[bug-gnu-libiconv] unicode normalization / unorm (was: The utf-8-mac encoder on macOS gives incorrect output)
Thu, 26 Oct 2017 14:23:49 -0600
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
On 2017-10-26 05:50 AM, Marcin Sulikowski wrote:
> I've been trying to use libiconv on macOS to convert UTF-8 strings to
> their NFD form using libiconv's "utf-8-mac" encoding which is available
In GNU coreutils we are working on a Unicode normalization program
(unorm) which can perform NFD/NFC/NFKD/NFKC conversions and other
multibyte character processing.
It is still highly experimental, but produces the following output based
on your input:
$ ( printf "a%.0s" `seq 4094` ; echo -n ó ) \
| unorm --normalization=nfd \
| hexdump -e '8/1 "%02x " "\n"'
61 61 61 61 61 61 61 61
61 61 61 61 61 61 6f cc
Here U+00F3 (\xc3 \xb3) was normalized to "o" + U+0301 (\x6f \xcc \x81).
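To cross-check the expected bytes independently of unorm, the same NFD
decomposition can be reproduced with Python's standard unicodedata module.
This is only a sketch for comparison, not how unorm is implemented; the
helper name nfd_utf8 is made up for illustration.

```python
import unicodedata


def nfd_utf8(text: str) -> bytes:
    """Return the UTF-8 encoding of text after NFD normalization.

    Hypothetical helper for illustration; it is not part of unorm or libiconv.
    """
    return unicodedata.normalize("NFD", text).encode("utf-8")


# Same input as the shell pipeline above: 4094 "a" characters followed by U+00F3.
data = nfd_utf8("a" * 4094 + "\u00f3")

# Print the 8-byte row that contains the start of the decomposed character
# (requires Python 3.8+ for the separator argument to bytes.hex).
print(data[4088:4096].hex(" "))

# Print the last three bytes: "o" followed by U+0301 encoded as UTF-8.
print(data[-3:].hex(" "))
```

The first printed row should match the last hexdump row shown above, and the
final three bytes are the "o" + U+0301 sequence.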
More information about the multibyte implementation progress is here:
If you'd like to experiment with the program, a snapshot is here:
(note this is an unstable and unsupported snapshot of coreutils code).
Any feedback is appreciated.