iconv 2.2.4 doesn't handle UTF-8 BOM
From: Alexander Dupuy
Subject: iconv 2.2.4 doesn't handle UTF-8 BOM
Date: Fri, 21 Mar 2003 04:41:49 -0500

While iconv 2.2.4 (and/or libiconv 1.7) will eat a zero-width
no-break space (U+FEFF) at the beginning of a file (aka Byte Order
Mark, or BOM) in UTF-16* input (and output a BOM for UTF-16* output),
it doesn't ignore an initial BOM in UTF-8 data. While use of a BOM for
UTF-8 encoding isn't as common (since there are no byte-ordering issues
for 8-bit data), there are some applications and operating systems which
will use a BOM for UTF-8 to distinguish it from other 8-bit character
data in the default locale (I have heard rumors that Mac OS X does this).
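
To make the byte-level difference concrete, here is a small Python sketch. It is only an illustration using Python's own codecs, not iconv, and the helper name bom_bytes is made up for this example. It shows what U+FEFF looks like in each encoding, and that a plain UTF-8 decoder hands the BOM through as a character, which resembles the behavior described above.

```python
def bom_bytes(encoding: str) -> bytes:
    """Return the bytes that U+FEFF encodes to in the given encoding."""
    return "\ufeff".encode(encoding)

# The same code point, three different byte sequences.
utf8_bom = bom_bytes("utf-8")         # EF BB BF
utf16be_bom = bom_bytes("utf-16-be")  # FE FF
utf16le_bom = bom_bytes("utf-16-le")  # FF FE

# Python's plain "utf-8" decoder does not strip a leading BOM;
# it comes back as the first character of the decoded text.
decoded = (utf8_bom + b"abc").decode("utf-8")

for name, raw in [("UTF-8", utf8_bom),
                  ("UTF-16BE", utf16be_bom),
                  ("UTF-16LE", utf16le_bom)]:
    print(name, raw.hex(" "))
print(repr(decoded))
```

Only the two UTF-16 sequences say anything about byte order; the UTF-8 sequence is the same on every machine, so there it can only serve as an encoding signature.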

The Unicode website documents that a BOM may occur in any Unicode
transformation format (http://www.unicode.org/faq/utf_bom.html#23) and
explicitly notes that if you really want a zero-width no-break space at
the start of your data stream, you should double it. (Of course, even
that's not good enough, since GNU iconv will eat a BOM anywhere in
UTF-16, but that's another issue, and I'm not complaining about it.)

While I have no position on whether iconv should eat a BOM anywhere in
UTF-8 data (I'm inclined to say no, but don't feel very strongly about
it), it certainly seems that iconv should at least eat a BOM at the
start of the input being converted. Prepending a BOM to UTF-8 (or UTF-7)
output would probably be a bad idea, since many other applications, like
iconv currently, would just choke on the UTF-8 BOM.
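
The requested behavior can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not a patch for iconv: the function names strip_leading_utf8_bom and utf8_to_latin1 are hypothetical, and Latin-1 is just an arbitrary target encoding. Only one BOM at the very start of the input is dropped, so a doubled BOM still yields a real zero-width no-break space, and a BOM later in the data is left alone.

```python
import codecs

def strip_leading_utf8_bom(data: bytes) -> bytes:
    """Drop a single UTF-8 BOM at the very start of the input, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def utf8_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 input to Latin-1, ignoring one initial BOM.

    No BOM is written to the output.
    """
    text = strip_leading_utf8_bom(data).decode("utf-8")
    return text.encode("latin-1")

bom = codecs.BOM_UTF8
print(utf8_to_latin1(bom + "caf\u00e9".encode("utf-8")))
print(strip_leading_utf8_bom(bom + bom + b"x") == bom + b"x")
print(strip_leading_utf8_bom(b"a" + bom + b"b") == b"a" + bom + b"b")
```

Python also ships a "utf-8-sig" codec that skips one leading BOM when decoding, which is roughly the behavior being asked for here.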
@alex