Occasionally, when you get files from someone else or use a new text editor, you encounter an issue with hidden characters or incorrect encoding that can be a pain to track down.
A common one of these, which I hadn't encountered for ages (since we moved from Windows to OS X for development), is the Byte Order Mark (BOM).
This is a simple marker at the very start of a file. In UTF-8 it is the three bytes EF BB BF, and it acts as an encoding signature rather than indicating a byte order, so in itself it should be no problem. It does, however, cause an issue when the file is included within other files (by PHP as *.inc files, for example), such as in template files.
This can result in blank lines that are hard to track down or problems with page headers.
You may also find that the BOM causes problems for an ordinary PHP page. When sending custom HTTP headers the code to set the header must be called before output begins. A BOM at the start of the file causes the page to begin output before the header command is interpreted, and may lead to error messages and other problems in the displayed page.
It was exactly that issue that I encountered the other day, and it being such a blast from the past, I thought I would make this quick post about the issue and some ways of tracking it down and fixing it.
So, first off, what issue does it cause? Well, in my case, and in past cases too, it caused Drupal to complain and create a circular redirect on admin pages.
This is illustrated by errors such as the 'headers already sent' error (https://www.drupal.org/node/1424).
In fact it was that error that jogged my memory on the subject as I hadn't seen it for a long time.
The cause was an old module template file that I hadn't done any work on for a few years, since moving from a Windows development environment.
So in this case I was thrown a bone as that was obviously the latest change to the site.
In other cases, however, it can be tricky to track down, so let's look at how to locate and remove it.
You can search your files for BOMs. Bear in mind that the character sequence may appear elsewhere in a file, and it is only the first characters of the first line that we are interested in. The search here matches the sequence anywhere, so check each hit.
grep -rl '\xEF\xBB\xBF' .
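On Mac OS X you need to prepend a dollar sign (this is bash's $'...' quoting, which turns the \x escapes into the actual bytes before grep sees them):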
grep -rlI $'\xEF\xBB\xBF' .
Note the full stop is required: it tells grep to start from the current directory, which -r then searches recursively.
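Since grep reports the sequence wherever it occurs, here is a rough sketch of a check that only looks at the first three bytes of each file. The function names has_bom and list_bom_files are my own inventions for illustration, and I am assuming head -c, od and find behave the same on your system as on the common GNU and BSD versions.

```shell
# has_bom FILE: succeed if FILE starts with the UTF-8 BOM bytes EF BB BF.
# (has_bom is a hypothetical helper name, not a standard tool.)
has_bom() {
  # Read the first three bytes and render them as hex with whitespace removed.
  [ "$(head -c 3 < "$1" | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]
}

# list_bom_files DIR: print every regular file under DIR that starts with a BOM.
list_bom_files() {
  find "$1" -type f -exec sh -c '
    for f in "$@"; do
      sig=$(head -c 3 < "$f" | od -An -tx1 | tr -d "[:space:]")
      if [ "$sig" = "efbbbf" ]; then printf "%s\n" "$f"; fi
    done
  ' sh {} +
}
```

Running list_bom_files . from the site root should then list only the files that really begin with a BOM.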
We could remove this en masse with a script, but this may cause issues if a file is supposed to have one:
sed 's/\xEF\xBB\xBF//' < inputfile > outputfile
Again on Mac OS X (note the $):
sed $'s/\xEF\xBB\xBF//' < inputfile > outputfile
Note that these will remove the first instance on each line (if you wanted to remove every instance, which is not advisable, use sed's global modifier 'g'). Prefixing the s with the line address 1 restricts the substitution to the first line.
However, unless you have loads of them, it is likely to be only a single file that is causing an issue, so let's open that and see what it looks like.
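If you would rather only ever touch the start of the file, something along these lines might do. It is a sketch under assumptions: strip_bom is a made-up function name, the .bak suffix is my own choice for the backup copy, and it relies on tail -c +4 printing everything from the fourth byte onwards.

```shell
# strip_bom FILE: remove a leading UTF-8 BOM in place, keeping FILE.bak.
# (strip_bom and the .bak suffix are illustrative choices, not a standard.)
strip_bom() {
  file=$1
  # Only touch files whose first three bytes are EF BB BF.
  if [ "$(head -c 3 < "$file" | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]; then
    # Keep a backup, then rewrite the file from its fourth byte onwards.
    cp -p "$file" "$file.bak" &&
    tail -c +4 "$file.bak" > "$file"
  fi
}
```

Files without a leading BOM are left alone, and any later occurrence of the sequence inside a file is not touched.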
vim file.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Although the BOM is the first sequence of characters in the file, it is not displayed by default in most text editors. This makes BOMs hard to locate and assess.
But you can open the file in Vim in binary mode:
vim -b file.html
<feff><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Note the <feff> char sequence.
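If you don't want to open an editor at all, you can also dump the first few bytes as hex. This is just a sketch: first_bytes_hex is a hypothetical helper name, and I am assuming od -An -tx1 is available, as it is on the usual GNU and BSD systems.

```shell
# first_bytes_hex FILE: print the first three bytes of FILE as hex pairs.
# (first_bytes_hex is a hypothetical helper name.)
first_bytes_hex() {
  # od prints the bytes with padding; squeeze the whitespace and trim the ends.
  head -c 3 < "$1" | od -An -tx1 | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}
```

A file that starts with a UTF-8 BOM gives ef bb bf, while a file that starts directly with <!DOCTYPE gives 3c 21 44.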
When looking at the file in Vim in normal text mode you can check if there is a BOM:
:setlocal bomb?
bomb
And get it to save without the BOM:
:setlocal nobomb
:w
Most text editors allow you to set a global option to not write the BOM.
See the attached asciinema recording of a terminal session showing these steps (or go here: https://asciinema.org/a/46517).