Re: Multibyte support (round 2)
From: Eric Blake
Subject: Re: Multibyte support (round 2)
Date: Mon, 29 Aug 2016 12:13:12 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/27/2016 12:05 AM, Assaf Gordon wrote:
> Regarding wchar_t == UCS:
> And so, the question becomes:
> When the locale is "UTF-8", is the internal representation of 'wchar_t'
> identical to UCS2 or UCS4 (i.e. unicode code-points).
> While the standard explicitly says this can not be assumed,
> I think in practice it is always the case.
>
> It is so in glibc and musl-libc,
> and in OpenBSD,FreeBSD,NetBSD with "UTF-8" locales (but not in non-utf8
> locales).
But not in Cygwin, where wchar_t is 2 bytes. Cygwin already uses
surrogate pairs in wchar_t to represent Unicode characters beyond
0xffff. Such a representation violates the POSIX definition of wchar_t,
which is supposed to encode every possible character in a single
wchar_t value, but it was deemed a better solution than limiting Cygwin
to BMP characters, and it only affects code that explicitly uses
characters outside the BMP.
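
To illustrate what a surrogate pair means for code that assumes one
wchar_t per character, here is a minimal sketch in C. It uses uint16_t
as a stand-in for a 2-byte wchar_t so that it behaves the same on any
platform; the helper names are hypothetical and this is not Cygwin's
actual implementation, only the standard UTF-16 arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* High (leading) surrogates occupy 0xD800..0xDBFF. */
static int is_high_surrogate(uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

/* Low (trailing) surrogates occupy 0xDC00..0xDFFF. */
static int is_low_surrogate(uint16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

/* Combine a valid high/low pair into one code point above 0xFFFF. */
static uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000u
         + (((uint32_t)high - 0xD800u) << 10)
         + ((uint32_t)low - 0xDC00u);
}

/* Count characters, not 16-bit units: a well-formed pair counts once.
 * An unpaired surrogate is counted as one (malformed) character. */
static size_t count_code_points(const uint16_t *units, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_high_surrogate(units[i]) && i + 1 < n
            && is_low_surrogate(units[i + 1]))
            i++; /* skip the trailing half of the pair */
        count++;
    }
    return count;
}
```

For example, the pair 0xD83D 0xDE00 combines to 0x1F600, so a buffer
holding 'A' followed by that pair is three units long but only two
characters. Code that never sees anything outside the BMP never takes
the pair branch.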
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org