bug-bash


From: Chet Ramey
Subject: Re: Can somebody explain to me what u32tochar in /lib/sh/unicode.c is trying to do?
Date: Sat, 10 Mar 2012 17:17:07 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

On 3/7/12 12:07 AM, John Kearney wrote:
> You really should stop using this function. It is just plain wrong, and
> is not predictable.
> 
> 
> It may encode BIG5 and SJIS, but that is more by accident than intent.
> 
> If you want to do something like this then do it properly.
> 
> Basically, all of the multibyte systems have to have a detection method
> for multibyte characters; most of them rely on bit 7 to indicate a
> multibyte sequence or use vt100 SS3 escape sequences. You really can't
> just inject random data into a text buffer. Even returning UTF-8 as a
> fallback is a bug. The most that should be done is return ASCII in the
> error case, and I mean U+0-U+7f only, and ignore or warn about any
> unsupported characters.
> 
> Using this function is dangerous and pointless.
> 
> I mean seriously, in what world does it make sense to inject UTF-8 into a
> Big5 string? Or indeed into an ASCII string? Code should behave like an
> adult, not like a frightened kid. By which I mean it shouldn't pretend
> it knows what it's doing when it doesn't; it should admit the problem so
> that the problem can be fixed.

Wow.  Do you really think that personal insults are a good way to advance
an argument?

Listen: bottom line.  It's a fallback function.  It's called in the
unlikely event that iconv isn't available at all and we're not in a
UTF-8 locale.  Any fallback is as good as another, though maybe the
best one would be to return \uNNNN or \UNNNNNNNN (before you ask,
POSIX leaves the \u/\U failure cases unspecified).  The real question
is what to do with invalid input data, since any transformation is
going to "inject random data" into the buffer.  Maybe the identity
function would be better after all.  But then you'd ask whether or
not it makes sense to inject a C-style escape sequence into a big5
string.
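
To make the \uNNNN / \UNNNNNNNN idea concrete, here is a minimal sketch
of what such a fallback might look like.  This is not the code in
lib/sh/unicode.c: the name u32_to_escape, its signature, and the
convention of returning 0 when the buffer is too small are all
assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not bash's u32tochar): write the code point CP
 * into BUF as a C-style escape sequence made up only of ASCII bytes.
 * Values up to 0xFFFF become \uNNNN; larger values become \UNNNNNNNN.
 * Returns the number of bytes written, not counting the trailing NUL,
 * or 0 if BUF (of size BUFLEN) cannot hold the whole sequence.
 */
size_t
u32_to_escape(unsigned long cp, char *buf, size_t buflen)
{
    int n;

    if (cp <= 0xFFFFUL)
        n = snprintf(buf, buflen, "\\u%04lX", cp);
    else
        n = snprintf(buf, buflen, "\\U%08lX", cp & 0xFFFFFFFFUL);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= buflen)
        return 0;
    return (size_t)n;
}
```

Under these assumptions, a caller with no iconv and a non-UTF-8 locale
would pass something like U+20AC with a 16-byte buffer and get the six
bytes \u20AC back, so the unconverted character stays visible in the
output instead of being turned into bytes from another encoding.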

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/


