bug-bash

Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.


From: John Kearney
Subject: Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
Date: Thu, 23 Feb 2012 03:43:49 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20120129 Thunderbird/10.0

^ Caveat: you can represent the full range up to 0x10ffff in UTF-16; you
just need 2 UTF-16 code units (a surrogate pair) for anything above
0xffff. Check out the latest version of unicode.c for an example of how.
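
For reference, here is a minimal sketch of the surrogate-pair arithmetic.
It is not the code from unicode.c; the name u32_to_utf16 and its
signature are made up for illustration, and it assumes the input is a
plain Unicode code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT the function from bash's unicode.c.
   Encodes one Unicode code point into UTF-16 code units.
   Returns the number of units written (1 or 2), or 0 if cp is a
   surrogate value or lies above 0x10FFFF. */
static size_t
u32_to_utf16 (uint32_t cp, uint16_t out[2])
{
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;

  if (cp < 0x10000)
    {
      /* Basic plane: a single code unit holds the value directly. */
      out[0] = (uint16_t) cp;
      return 1;
    }

  /* Supplementary planes: subtract 0x10000, leaving a 20-bit value,
     then split it into two 10-bit halves. */
  cp -= 0x10000;
  out[0] = (uint16_t) (0xD800 | (cp >> 10));    /* high surrogate */
  out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));  /* low surrogate */
  return 2;
}
```

With this sketch, 0x1F600 comes out as the pair 0xD83D 0xDE00, and the
top of the range, 0x10FFFF, as 0xDBFF 0xDFFF.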

On 02/22/2012 11:32 PM, Eric Blake wrote:
> On 02/22/2012 03:01 PM, Linda Walsh wrote:
>>     My question had to do with an unqualified wint_t not
>> unsigned wint_t and what platform existed where an 'int' type or
>> wide-int_t, was, without qualifiers, unsigned.  I still would like
>> to know -- and posix allows int/wide-ints to be unsigned without
>> the unsigned keyword?
> 
> 'int' is signed, and at least 16 bits (these days, it's usually 32).  It
> can also be written 'signed int'.
> 
> 'unsigned int' is unsigned, and at least 16 bits (these days, it's
> usually 32).
> 
> 'wchar_t' is an arbitrary integral type, either signed or unsigned, and
> capable of holding the value of all valid wide characters.   It is
> possible to define a system where wchar_t and char are identical
> (limiting yourself to 256 valid characters), but that is not done in
> practice.  More common are platforms that use 65536 characters (only the
> basic plane of Unicode) for 16 bits, or full Unicode (0 to 0x10ffff) for
> 32 bits.  Platforms that use 65536 characters and 16-bit wchar_t must
> have wchar_t be unsigned; whereas platforms that have wchar_t wider than
> the largest valid character can choose signed or unsigned with no impact.
> 
> 'wint_t' is an arbitrary integral type, either signed or unsigned, at
> least as wide as wchar_t, and capable of holding the value of all valid
> wide characters and the sentinel WEOF.  Like wchar_t, it may hold values
> that are neither WEOF nor valid characters; and in fact, it is more
> likely to do so, since either wchar_t is saturated (all bit values are
> valid characters) and thus wint_t is a wider type, or wchar_t is sparse
> (as is the case with 32-bit wchar_t encoding Unicode), and the addition
> of WEOF to the set does not plug in the remaining sparse values; but
> using such values has unspecified results on any interface that takes a
> wint_t.  WEOF only has to be distinct, it does not have to be negative.
> 
> Don't think of it as 'wide-int', rather, think of it as 'the integral
> type that both contains wchar_t and WEOF'.  You cannot write 'signed
> wint_t' nor 'unsigned wint_t'.
> 
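
To make the quoted points concrete, here is a small sketch that asks the
compiler what it chose for wchar_t and wint_t rather than assuming a
signedness. The helper names are hypothetical, and the answers are
platform-dependent, so no particular result is implied. The one portable
rule it illustrates is to test for the sentinel by comparing against
WEOF, never by checking for a negative value.

```c
#include <wchar.h>   /* wchar_t, wint_t, WEOF */

/* Hypothetical helpers for illustration only. */

/* Nonzero if the platform's wchar_t is a signed type. */
static int
wchar_is_signed (void)
{
  return (wchar_t) -1 < (wchar_t) 0;
}

/* Nonzero if the platform's wint_t is a signed type. */
static int
wint_is_signed (void)
{
  return (wint_t) -1 < (wint_t) 0;
}

/* Nonzero if wint_t is at least as wide as wchar_t. */
static int
wint_holds_wchar (void)
{
  return sizeof (wint_t) >= sizeof (wchar_t);
}

/* WEOF only has to be distinct from every valid wide character, so
   compare against it directly instead of testing the sign. */
static int
is_weof (wint_t c)
{
  return c == WEOF;
}
```

Printing wchar_is_signed () and wint_is_signed () on a given system shows
which choice that implementation made; is_weof () behaves the same either
way.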



