bug-gnu-libiconv

Re: [bug-gnu-libiconv] iconv fails to convert utf8 with bom to cp1251


From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] iconv fails to convert utf8 with bom to cp1251
Date: Thu, 07 Dec 2017 01:57:22 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-101-generic; KDE/5.18.0; x86_64; ; )

Yan wrote:

> Arch linux, iconv (GNU libc) 2.26

Your report ought to have been directed to the glibc tracker, not to the
libiconv tracker.

But anyway, since glibc and GNU libiconv behave the same way in this regard,
I can answer it:

> Iconv doesn't understand utf8 with bom ("EF BB BF" prefix which is legal
> according to standard). It prints "iconv: illegal input sequence at
> position 0".

Quoting the standard [1]:

   U+FEFF in the first position of a stream MAY be interpreted as a
   zero-width non-breaking space, and is not always a signature.

      A protocol SHOULD also forbid use of U+FEFF as a signature for
      those textual protocol elements for which the protocol provides
      character encoding identification mechanisms, when it is expected
      that implementations of the protocol will be in a position to
      always use the mechanisms properly.

You provided the encoding identification "UTF-8" to iconv; therefore
iconv SHOULD NOT accept a BOM as a signature in this conversion.

In other words, use of the BOM is only for those cases where no
encoding identification is present and some software has to guess.
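
If the input may carry a BOM anyway, one workaround is to strip it before
converting. The following is only a minimal sketch in Python, not part of
iconv or libiconv; the function name utf8_to_cp1251 is made up for this
illustration, and it assumes the whole input fits in memory. It relies on
the standard-library constant codecs.BOM_UTF8 (the bytes EF BB BF).

```python
import codecs


def utf8_to_cp1251(data: bytes) -> bytes:
    """Convert UTF-8 bytes to CP1251, dropping one leading BOM if present.

    Hypothetical helper for illustration only.
    """
    # Remove the signature bytes EF BB BF only at the very start of the input.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Decode strictly as UTF-8, then encode to CP1251. Characters that
    # CP1251 cannot represent still raise UnicodeEncodeError.
    return data.decode("utf-8").encode("cp1251")


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "Привет".encode("utf-8")
    print(utf8_to_cp1251(sample))
```

A U+FEFF that appears anywhere other than the first position is left in
place, so it would still fail to encode, since CP1251 has no such character.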

Bruno

[1] https://tools.ietf.org/html/rfc3629#section-6



