Re: [Monotone-devel] iconv diffs [Was: Why is utf8...]

From: Ethan Blanton
Date: Sat, 17 Feb 2007 10:33:08 -0500
User-agent: Mutt/1.5.12-2006-07-14

Patrick Georgi spake unto us the following wisdom:
> but skipping a character should be possible:
> - build another iconv state that translates input encoding into input 
> encoding (unless that enables a fast-path, which I'm not sure of - 
> alternative might be some encoding that is the ultimate superset, if 
> such an encoding exists)
> - push first unknown byte into it. if that creates a response already, 
> discard (as it might be some header sequence) and restart with the same 
> byte in the next step, otherwise start at the next byte
> - until iconv emits a response, push byte after byte into it
> - skip that many bytes in the input, replace with one "?"

This is more or less what we do in Gaim, for some of our fallback
attempts.  This can still lead to junk in your output, particularly
given that a) there are non-UTF-8 character sets which look just like
valid UTF-8 (e.g., ISO-2022-{JP,KR}), and b) there are character sets
which will accept any byte as valid, though it may not be (e.g.,

The bottom line, though, is that if the user (or operating system) has
not successfully communicated the character set used for some chunk of
data, you _cannot_ do the right thing -- the best you can do is try
not to mess it up too much.  For us, this basically means filter out
anything that isn't UTF-8 before it gets to the user (normally
replacing invalid sequences with one or more '?' characters), as our
UI is guaranteed to be UTF-8 by design.  With monotone you aren't
given this guarantee, but a similar approach seems reasonable: try to
convert it to whatever character set the locale (LC_CTYPE) specifies,
restarting one byte at a time and replacing any bytes which fail to
convert with '?'.
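
As a rough illustration of the byte-at-a-time fallback, here is a
minimal sketch in Python rather than iconv.  It assumes the data is
supposed to be UTF-8 (or another stateless encoding); the function
name sanitize is made up for this example and is not code from Gaim
or monotone.

```python
import codecs


def sanitize(data: bytes, encoding: str = "utf-8") -> str:
    """Decode data, replacing each byte that fails to convert with '?'.

    Hypothetical helper for illustration only. It assumes a stateless
    encoding such as UTF-8, so a valid prefix can be decoded on its own.
    """
    decode = codecs.lookup(encoding).decode
    out = []
    pos = 0
    while pos < len(data):
        try:
            # Try to convert everything that is left in one go.
            text, consumed = decode(data[pos:], "strict")
            out.append(text)
            pos += consumed
        except UnicodeDecodeError as err:
            # err.start is the offset of the first bad byte in the slice.
            good, _ = decode(data[pos:pos + err.start], "strict")
            out.append(good)
            out.append("?")
            # Skip exactly one byte and restart the conversion after it.
            pos += err.start + 1
    return "".join(out)


if __name__ == "__main__":
    print(sanitize(b"caf\xe9 ok"))  # a Latin-1 byte in supposedly UTF-8 data
    print(sanitize("naïve".encode("utf-8")))
```

Note that this only catches bytes which are invalid in the assumed
encoding.  Mislabeled ISO-2022-JP data, for instance, is all 7-bit and
passes through without a single '?', which is the kind of junk
described above.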


The laws that forbid the carrying of arms are laws [that have no remedy
for evils].  They disarm only those who are neither inclined nor
determined to commit crimes.
                -- Cesare Beccaria, "On Crimes and Punishments", 1764
