On 10.03.2012 23:17, Chet Ramey wrote:
On 3/7/12 12:07 AM, John Kearney wrote:
You really should stop using this function. It is just plain wrong, and
is not predictable.
It may encode BIG5 and SJIS, but that is more by accident than intent.
If you want to do something like this then do it properly.
Basically all of the multibyte systems have to have a detection method
for multibyte characters; most of them rely on bit 7 to indicate a
multibyte sequence or use VT100 SS3 escape sequences. You really can't
just inject random data into a text buffer. Even returning UTF-8 as a
fallback is a bug. The most that should be done is to return ASCII in
the error case, and I mean U+0000-U+007F only, and ignore or warn about
any unsupported characters.
Using this function is dangerous and pointless.
I mean seriously, in what world does it make sense to inject UTF-8
into a big5 string? Or indeed into an ASCII string? Code should behave
like an adult, not like a frightened kid. By which I mean it shouldn't
pretend it knows what it's doing when it doesn't; it should admit the
problem so that the problem can be fixed.
Wow. Do you really think that personal insults are a good way to
advance an argument?
Listen: bottom line. It's a fallback function. It's called in the
unlikely event that iconv isn't available at all and we're not in a
UTF-8 locale. Any fallback is as good as another, though maybe the
best one would be to return \uNNNN or \UNNNNNNNN (before you ask,
Posix leaves the \u/\U failure cases unspecified). The real question
is what to do with invalid input data, since any transformation is
going to "inject random data" into the buffer. Maybe the identity
function would be better after all. But then you'd ask whether or
not it makes sense to inject a C-style escape sequence into a big5
string.
Chet
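For reference, the \uNNNN / \UNNNNNNNN fallback floated above could look roughly like the sketch below. The function name escape_fallback and its buffer-size parameter are made up for illustration; nothing like this is claimed to exist in bash. It simply writes back an escape that names the code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from bash: writes an escape that names
   code point c into buf. Uses \uNNNN up to U+FFFF and \UNNNNNNNN above.
   Returns the number of characters written, or -1 if buf is too small. */
int
escape_fallback (unsigned long c, char *buf, size_t size)
{
  int n;

  assert (buf != NULL);
  if (c <= 0xffff)
    n = snprintf (buf, size, "\\u%04lX", c);
  else
    n = snprintf (buf, size, "\\U%08lX", c);
  /* snprintf reports the length it wanted; treat truncation as failure. */
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

The output is pure ASCII, so it cannot form a broken multibyte sequence, though as noted it still puts a C-style escape into the string.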
I guess I was a bit terse; I wouldn't call it a personal insult though.
Though I guess I do have pretty thick skin, so sorry if you felt it was
meant as one.
My point is that the fallback function/handler should report an
error/warning, do nothing else, and move on.
Trying to recover from an irrecoverable error just makes it more
difficult to figure out what is going on.
Basically this is a script/environment error, so report the error,
don't hide it.
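Something along these lines is what I have in mind for a fallback that reports instead of guessing. This is only a sketch: the name ascii_only_fallback, the return convention, and the warning text are all made up here and are not what bash does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical strict fallback, not the code in bash.
   Plain ASCII (U+0000-U+007F) is stored as one byte and 1 is returned.
   Anything else stores an empty string, warns on stderr, and returns 0,
   so the caller sees the failure instead of bytes in the wrong charset. */
int
ascii_only_fallback (unsigned long c, char *out)
{
  assert (out != NULL);
  if (c <= 0x7f)
    {
      out[0] = (char) c;
      out[1] = '\0';
      return 1;
    }
  fprintf (stderr,
           "warning: U+%04lX cannot be represented in the current locale\n",
           c);
  out[0] = '\0';
  return 0;
}
```

The caller needs a buffer of at least two bytes and can decide for itself whether a 0 return is fatal.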
It's a similar problem with the iconv fallback of returning UTF-8. If
iconv says it can't encode the Unicode value in the destination
charset, do we really know better? Again, it is better to report the
error and move on, because injecting UTF-8 into big5 or whatever is
also wrong. If UTF-8 were the destination charset, it would already
have been detected or iconv would have worked, so from context we
know this is wrong.
    if (iconv (localconv, (ICONV_CONST char **)&iptr, &sn, &optr, &obytesleft) == (size_t)-1)
      return n; /* You get utf-8 if iconv fails */
Now don't forget that at this point iconv knows both the source and
destination charsets, so what we have is a Unicode character that is
unsupported in the destination charset.
Or here:
    n = u32toutf8 (c, s);
    if (utf8locale || localconv == (iconv_t)-1)
      return n;
If the destination charset is UTF-8, OR the destination charset is NOT
UTF-8 and iconv didn't recognise the destination charset, encode it as
UTF-8.
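To spell the cases out, here is a small model of the decision as I read it from the two excerpts, next to what I am arguing for. The enum and both function names are invented for this sketch, and the flags stand in for the real state (utf8locale, whether localconv is valid, whether the iconv call succeeded).

```c
#include <assert.h>

/* Outcomes for one escape. The names are made up for this sketch. */
enum outcome { OUT_UTF8, OUT_CONVERTED, OUT_ERROR };

/* Behaviour as read from the two quoted excerpts.
   utf8locale: non-zero if the locale charset is UTF-8
   have_cd:    non-zero if localconv != (iconv_t)-1
   iconv_ok:   non-zero if the iconv call itself succeeded */
enum outcome
quoted_behaviour (int utf8locale, int have_cd, int iconv_ok)
{
  assert (have_cd || !iconv_ok);   /* iconv cannot succeed without a descriptor */
  if (utf8locale || !have_cd)
    return OUT_UTF8;
  return iconv_ok ? OUT_CONVERTED : OUT_UTF8;
}

/* Behaviour argued for here: UTF-8 only when the locale really is UTF-8. */
enum outcome
proposed_behaviour (int utf8locale, int have_cd, int iconv_ok)
{
  assert (have_cd || !iconv_ok);
  if (utf8locale)
    return OUT_UTF8;
  if (!have_cd || !iconv_ok)
    return OUT_ERROR;
  return OUT_CONVERTED;
}
```

The two functions differ only in the non-UTF-8 rows where conversion is impossible: the quoted code yields UTF-8 there, the proposal yields an error.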
Let's say CTYPE=BIG5 and you try to encode the Unicode char U+F000, which
is an invalid big5 char (at least I think it is),
so iconv returns an error.
Now the code inserts the UTF-8 encoding of U+F000, which is an invalid
byte sequence in a big5 string.
This isn't helping anybody.
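To make the U+F000 case concrete, here is a rough check. The encoder is a generic one written for this mail (it is not bash's u32toutf8), and the Big5 test assumes the commonly cited byte layout of lead byte 0x81-0xFE and trail byte 0x40-0x7E or 0xA1-0xFE; vendor variants may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative UTF-8 encoder. Stores the encoding of c in s and returns
   its length in bytes, or 0 if c is above U+10FFFF. Surrogates are not
   rejected, to keep the sketch short. s must hold at least 4 bytes. */
int
encode_utf8 (unsigned long c, unsigned char *s)
{
  assert (s != NULL);
  if (c < 0x80)
    {
      s[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      s[0] = (unsigned char) (0xC0 | (c >> 6));
      s[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      s[0] = (unsigned char) (0xE0 | (c >> 12));
      s[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      s[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  if (c < 0x110000)
    {
      s[0] = (unsigned char) (0xF0 | (c >> 18));
      s[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      s[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      s[3] = (unsigned char) (0x80 | (c & 0x3F));
      return 4;
    }
  return 0;
}

/* Assumed Big5 layout: lead 0x81-0xFE, trail 0x40-0x7E or 0xA1-0xFE. */
int
is_big5_pair (unsigned char lead, unsigned char trail)
{
  if (lead < 0x81 || lead > 0xFE)
    return 0;
  return (trail >= 0x40 && trail <= 0x7E) || (trail >= 0xA1 && trail <= 0xFE);
}
```

U+F000 encodes to the bytes EF 80 80. Under the assumed layout EF is a plausible lead byte, but 80 is not a valid trail byte, so a big5 consumer sees a broken character.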