Please remove iconv_open (charset, "ASCII"); from unicode.c
From: John Kearney
Subject: Please remove iconv_open (charset, "ASCII"); from unicode.c
Date: Wed, 07 Mar 2012 05:47:22 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20120129 Thunderbird/10.0
Hi Chet, can you please remove the following line from unicode.c:
localconv = iconv_open (charset, "ASCII");
This is an invalid fallback. This line creates a conversion descriptor.
The primary attempt is UTF-8 to the destination codeset. If that
conversion fails, this tries ASCII to the codeset instead. But the code
still passes UTF-8 as input to iconv, which means this is less likely
to encode successfully than a simple assignment. Consider U+80: it
becomes UTF-8 "\xc2\x80", which, because we tell iconv this is ASCII,
becomes ASCII "\xc2\x80".
So this line takes a U+80 and turns it into a U+C2 and a U+80.
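
To make the byte-level point concrete, here is a minimal sketch, not the code from unicode.c. The helper names utf8_encode and is_7bit_ascii are hypothetical and exist only for this illustration. It encodes a code point as UTF-8 and checks whether the resulting bytes could legitimately be treated as ASCII input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
   Returns the number of bytes written, or 0 if cp is not encodable. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: nonzero only if every byte is in the 7-bit ASCII range. */
static int is_7bit_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] > 0x7F)
            return 0;
    return 1;
}
```

For U+80, utf8_encode writes the two bytes 0xC2 0x80 and is_7bit_ascii reports false for them, so labelling that buffer as "ASCII" when handing it to iconv misdescribes the input.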
The way I rewrote the iconv code made it cleaner, safer and quicker;
please consider using it. I avoided the need for the strcpy, among
other things.
On 02/21/2012 03:42 AM, Chet Ramey wrote:
> On 2/18/12 5:39 AM, John Kearney wrote:
>
>> Bash Version: 4.2 Patch Level: 10 Release Status: release
>>
>> Description: Current u32toutf8 only encodes values below 0xffff
>> correctly. wchar_t can be of ambiguous size; better, in my opinion,
>> to use unsigned long, or uint32_t, or something clearer.
>
> Thanks for the patch. It's good to have a complete
> implementation, though as a practical matter you won't see UTF-8
> characters longer than four bytes. I agree with you about the
> unsigned 32-bit int type; wchar_t is signed, even if it's 32 bits,
> on several systems I use.
>
> Chet
>