RemoveBOM v2 webpage

This page has been accessed 8,126 times since the 21st August 2007.

I found a second showstopping problem with Microsoft Expression Web, besides the BOM breaking PHP, and it affects all versions up to v4: there is a bug where the formatting engine will slowly and randomly corrupt your HTML! This is a rather serious problem, so I resolved to work around it by having my preflight Python script validate new content to catch any errors.

The following Python script fixes this problem and is an enhancement of RemoveBOM v1. It takes your Microsoft Expression Web directory tree and copies it to another location, performing the following operations as it goes:

  1. It generates a full sitemap.xml in the document root.
  2. It tests .html files (not .htm; using the .htm extension is an easy way to exempt a file from validation) for the UTF-8 BOM. If present, it removes the BOM and checks that the remaining data is valid UTF-8.
  3. For .html files it prepends and appends PHP header-rewriting code which emits headers setting the content type to UTF-8 if the BOM was present.
  4. For .html files it also sets HTTP Last-Modified to the last modified time of the PHP-containing HTML file, which ensures that Apache gives an HTTP 304 Not Modified response should the web browser send an If-Modified-Since request (which most do), thus greatly lowering bandwidth costs and indeed the server load caused by idiotic spider robots.
  5. For .html files it also uses PHP output buffering to determine a correct Content-Length header and enables zlib compression should the source file exceed 64 KB - this adds latency for the compression and decompression, but halves or quarters the amount of data needing to be transmitted.
  6. It passes all XHTML declaring itself as such through a validating XHTML parser and opens a list of found errors, if any, after completion. It uses an HTML5 microdata-enabled XHTML DTD, so you can use HTML5 microdata just fine.
  7. It knows when to not copy files which are unchanged, so it is fast to run just before you upload your changes.
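
To make step 2 concrete, here is a minimal sketch of BOM detection, removal and UTF-8 validation. It is not taken from the RemoveBOM script itself: the function names strip_utf8_bom and is_valid_utf8 are hypothetical, and it assumes Python 3 and that each file has already been read in binary mode.

```python
import codecs
from typing import Tuple


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return the data without a leading UTF-8 BOM, plus whether one was found.

    Hypothetical helper for illustration, not the script's own function.
    """
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return data[len(codecs.BOM_UTF8):], True
    return data, False


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A caller would typically read a file with open(path, "rb"), pass the bytes through strip_utf8_bom, and only write the result to the output tree if is_valid_utf8 accepts it.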

You may find this script useful as a base for writing your own. No guarantees or support are given with this code. Enjoy!
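
If you do write your own, the "do not copy unchanged files" check from step 7 is an easy place to start. The sketch below is one possible approach and not necessarily how RemoveBOM decides: it assumes Python 3, compares modification times only (because the output .html files are rewritten and so differ in size from their sources), and uses the hypothetical names needs_copy and copy_if_changed.

```python
import os
import shutil


def needs_copy(src: str, dst: str) -> bool:
    """Return True if dst is missing or older than src (assumed rule)."""
    if not os.path.exists(dst):
        return True
    return os.stat(src).st_mtime > os.stat(dst).st_mtime


def copy_if_changed(src: str, dst: str) -> bool:
    """Copy src to dst only when needed; return True if a copy happened."""
    if not needs_copy(src, dst):
        return False
    # Create the destination directory if it does not exist yet.
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    # copy2 preserves the modification time, so the next run sees no change.
    shutil.copy2(src, dst)
    return True
```

A script that transforms files rather than copying them verbatim would need to set the output file's modification time itself, for example with os.utime, for the same comparison to work.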