bug#7963: 16-bit wchar_t on Windows and Cygwin
From: Andy Koppe
Subject: bug#7963: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 2 Feb 2011 20:28:17 +0000

On 2 February 2011 16:35, Corinna Vinschen wrote:
> On Feb 2 17:28, Corinna Vinschen wrote:
>> On Feb 2 17:02, Bruno Haible wrote:
>> > But if you say that the application should convert UTF-16 surrogates
>> > to UTF-32 before calling iswalpha: That's certainly a requirement
>> > for Cygwin 1.7.x applications that want to support the entire Unicode
>> > character set. But it's outside of POSIX, and many GNU programs will
>> > not want to include this added complexity. Just try to apply this
>> > suggestion to gnulib's quotearg.c, then estimate the time someone
>> > would need to apply it also to regcomp.c, strftime.c, mbscasestr.c,
>> > coreutils/src/wc.c, and so on.
>>
>> Cygwin's regcomp is taken from FreeBSD and is UTF-16 capable, including
>> surrogate handling. It only required two changes in the code.
>
> Btw., I sure would be glad if Cygwin used a wchar_t of 4 bytes as
> well. The problem is that this requires too many changes at once to
> work right, and it would introduce a lot of backward compatibility
> problems which would have to be handled.

Cygwin 1.7 might have been a good point for that change, because the
lack of proper locale and charset support in previous versions meant
that backward compatibility was much less of a concern than it is now.
But it's a difficult change indeed, and it's not entirely clear that
it's worthwhile. I guess 64-bit Cygwin (if or when it happens) might
be the next opportunity.

> If only the ones who decided that wchar_t in Cygwin should have the
> same size as WCHAR in the underlying Windows would have thought twice
> about the implications...

Windows Unicode support was introduced with Windows NT in 1993,
whereas Unicode was only extended beyond 16 bits with version 2.0 in
1996. Cygwin was first released the year before. If the Unicode
extension was a consideration at all (which I'd doubt), wchar_t !=
WCHAR probably seemed far more daunting than having to deal with
surrogates at some point down the line.

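
For readers unfamiliar with what "dealing with surrogates" involves, here is
a minimal sketch in C of the UTF-16 to UTF-32 step discussed earlier in the
thread. The helper names (is_high_surrogate, is_low_surrogate, utf16_decode)
are hypothetical and are not taken from Cygwin, newlib, or gnulib. Replacing
an unpaired surrogate with U+FFFD is an assumption of this sketch, not
something the thread prescribes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Decode one code point from the UTF-16 units s[0..n).
   Stores the code point in *cp and returns the number of units
   consumed (1 or 2), or 0 if the input is empty.
   An unpaired surrogate is replaced by U+FFFD (an assumption of
   this sketch). */
static size_t utf16_decode(const uint16_t *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;

    if (is_high_surrogate(s[0])) {
        if (n >= 2 && is_low_surrogate(s[1])) {
            /* Combine the two 10-bit halves and add the 0x10000 offset. */
            *cp = 0x10000
                + (((uint32_t)(s[0] - 0xD800) << 10)
                   | (uint32_t)(s[1] - 0xDC00));
            return 2;
        }
        *cp = 0xFFFD;   /* high surrogate without a following low one */
        return 1;
    }

    if (is_low_surrogate(s[0])) {
        *cp = 0xFFFD;   /* stray low surrogate */
        return 1;
    }

    *cp = s[0];         /* ordinary BMP character */
    return 1;
}
```

The decoding itself is short. The cost raised in the thread is that every
loop stepping through a wchar_t string one unit at a time would need to call
something like this, and that the resulting 32-bit value no longer fits the
16-bit wchar_t argument that functions such as iswalpha accept on these
platforms.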
Andy