

Re: u32_normalize UNINORM_NFKC on 0xD800

From: Bruno Haible
Subject: Re: u32_normalize UNINORM_NFKC on 0xD800
Date: Fri, 27 May 2011 01:49:25 +0200
User-agent: KMail/1.9.9

Simon Josefsson wrote:
> I'm doing some Unicode NFKC operations and noticing that u32_normalize
> fails for U+D800.

This is valid behaviour, because U+D800 is a "surrogate" code point
and therefore not a valid character code point.

See the Unicode standard, chapter 2 [1], pages 23..24:
Surrogate code points and other non-character code points "should never be
interchanged". For libunistring, this means that they are invalid input
and invalid output for all functions that take or return UTF-32 or
UTF-8 strings.

By contrast, code points assigned to characters, and code points in regions
that may be assigned in future Unicode versions, must not be rejected; these
are valid input.
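The policy above can be sketched as a small C check. This is a hypothetical illustration, not libunistring code: the helper names (`is_surrogate`, `is_noncharacter`, `is_interchangeable`, `first_invalid`) are invented here. It rejects values above U+10FFFF, surrogates (U+D800..U+DFFF) and, following the wording above about "other non-character code points", the 66 Unicode noncharacters (U+FDD0..U+FDEF and every code point ending in FFFE or FFFF). Whether a given libunistring function also rejects noncharacters is an assumption not confirmed here. Unassigned code points are deliberately accepted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libunistring API): true for the surrogate
   range U+D800..U+DFFF, which is reserved for UTF-16 encoding. */
static bool is_surrogate(uint32_t c)
{
    return c >= 0xD800 && c <= 0xDFFF;
}

/* Hypothetical helper: true for the 66 noncharacters, i.e.
   U+FDD0..U+FDEF and U+xxFFFE / U+xxFFFF in each of the 17 planes. */
static bool is_noncharacter(uint32_t c)
{
    return c <= 0x10FFFF
           && ((c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE);
}

/* Hypothetical policy check: rejects values outside the Unicode code
   space, surrogates and noncharacters. Everything else is accepted,
   including code points that are currently unassigned, because they
   may be assigned in a future Unicode version. */
static bool is_interchangeable(uint32_t c)
{
    if (c > 0x10FFFF)
        return false;
    if (is_surrogate(c) || is_noncharacter(c))
        return false;
    return true;
}

/* Returns the index of the first rejected code point in s[0..n),
   or n if every code point passes is_interchangeable(). */
static size_t first_invalid(const uint32_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!is_interchangeable(s[i]))
            return i;
    return n;
}
```

For example, `first_invalid` on the UTF-32 string { U+0041, U+D800, U+0042 } returns 1, pointing at the lone surrogate, while an unassigned code point such as U+0378 passes the check.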


[1] http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf
In memoriam Jeane Gardiner <http://en.wikipedia.org/wiki/Jeane_Gardiner>
