Mingye Wang (Arthur2e5)
[bug-gnu-libiconv] GB2312 incompatible with GB18030; violation of GB 18030 "principles"
Thu, 29 Sep 2016 02:33:51 -0400
I am not sure if someone has brought this up before, as what I am
reporting is, in fact, a well-documented issue. 
iconv maps the GB byte sequences A1A4 and A1AA to different Unicode code
points depending on whether the input is declared as GB 2312 or GB 18030:
bytes  gb2312  gb18030
-----  ------  -------
A1A4   U+30FB  U+00B7
A1AA   U+2015  U+2014
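The mismatch can be checked outside iconv as well. The following is only a minimal sketch, assuming that Python's built-in `gb2312` and `gb18030` codecs follow the same mapping tables as the corresponding iconv converters for these two byte pairs; the names `DISPUTED`, `decode_both` and `report` are illustrative and not part of any library.

```python
# Byte pairs whose Unicode mapping differs between the two codecs.
DISPUTED = (b"\xa1\xa4", b"\xa1\xaa")


def decode_both(raw):
    """Return the same bytes decoded as (GB 2312, GB 18030)."""
    # Assumption: Python's codecs stand in for iconv's converters here.
    return raw.decode("gb2312"), raw.decode("gb18030")


def report():
    """Build one line per byte pair: bytes, GB 2312 result, GB 18030 result."""
    lines = []
    for raw in DISPUTED:
        as_gb2312, as_gb18030 = decode_both(raw)
        lines.append(
            f"{raw.hex().upper()}  U+{ord(as_gb2312):04X}  U+{ord(as_gb18030):04X}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

If the two columns printed for a row differ, text that is valid GB 2312 does not decode to the same characters when read as GB 18030.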
This slight difference breaks compatibility between these two encodings,
a principle of the mandatory GB 18030[^1] standard:
[^1]: Both GB 18030-2000 and GB 18030-2005. The 2000 edition says "de facto internal encoding".
> 3. Principles
> This standard is backwards compatible with the internal encoding
> defined in GB 2312.
This violation of standard principles is not rare in the FOSS world,
according to . Someone submitted a similar bug to Python, but it
got marked "wontfix" to ensure compatibility with "the rest of the FOSS
world" as well as round-trip safety (in case of a Ruby-like
normalization[^2]). I am submitting this bug in the hope that changes in
libiconv, an important reference implementation for "the rest of the
FOSS world", can lead to revisions in other libraries.
[^2]: Ruby uses a gb18030-compatible implementation internally, but
still accepts the Unicode code points produced by the incompatible mappings.