Re: problems with german umlauts
From: Werner LEMBERG
Subject: Re: problems with german umlauts
Date: Fri, 26 Jan 2007 09:03:58 +0100 (CET)
> Well, it was an approximation (due to the previously mentioned lack
> of vocabulary).
Do you mean that your English isn't sufficient to describe things
correctly, or that the issue itself is difficult to describe?
> ISO 2022 (as well as SHIFT-JIS and other Japanese encodings of the
> same type) does indeed use "artificial" 8bit characters.
`Artificial'? Not at all! Almost all of the registered 8bit
character sets have been in use at some time and somewhere. The same
holds for the 16bit encodings. Note that even Unicode (encoded as
UTF-8) has been registered, and there exist proper escape sequences to
switch from ISO 2022 to Unicode and back to ISO 2022.[1]
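As a minimal sketch of that round trip, assuming the registered sequences ESC % G (switch from ISO 2022 to UTF-8) and ESC % @ (return to ISO 2022), the bytes can be framed like this. The helper names `wrap_utf8` and `unwrap_utf8` are hypothetical; they only add and strip the framing and are not a full ISO 2022 decoder.

```python
# ISO 2022 escape sequences for switching to another coding system.
TO_UTF8 = b"\x1b%G"       # ESC % G: switch from ISO 2022 to UTF-8
BACK_TO_2022 = b"\x1b%@"  # ESC % @: return to ISO 2022

def wrap_utf8(text: str) -> bytes:
    """Hypothetical helper: embed UTF-8 text in an ISO 2022 byte stream."""
    return TO_UTF8 + text.encode("utf-8") + BACK_TO_2022

def unwrap_utf8(data: bytes) -> str:
    """Extract the first UTF-8 segment framed by the two escape sequences."""
    start = data.index(TO_UTF8) + len(TO_UTF8)
    end = data.index(BACK_TO_2022, start)
    return data[start:end].decode("utf-8")

framed = wrap_utf8("Grüße")
print(framed)               # b'\x1b%GGr\xc3\xbc\xc3\x9fe\x1b%@'
print(unwrap_utf8(framed))  # Grüße
```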
> The 0-127 range is almost always compatible with ASCII
Uh, oh, you are entering muddy waters. In the old days, people didn't
actually use ASCII but officially approved variants of ISO-646.
IIRC, about 10 characters in the range 0x20-0x7F are variable.
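To illustrate with the case this thread is about, here is a sketch of decoding 7-bit text with a national ISO-646 variant. The table assumes the German DIN 66003 assignments (umlauts and ß on the bracket and brace positions, § on 0x40); `decode_646` is a hypothetical helper. It shows why German text read as plain ASCII turns umlauts into brackets and braces.

```python
# Assumed DIN 66003 (German ISO-646 variant) replacements; every other
# position in 0x20-0x7E is read exactly as in ASCII.
DIN_66003 = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}

def decode_646(data: bytes, variant: dict[int, str]) -> str:
    """Decode 7-bit bytes using a national ISO-646 variant table."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is not 7-bit")
        out.append(variant.get(b, chr(b)))
    return "".join(out)

raw = b"Gr}~e"
print(raw.decode("ascii"))         # Gr}~e
print(decode_646(raw, DIN_66003))  # Grüße
```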
> and there are 2 escape characters which work like double quotes.
> Inside the quotes, characters are multibyte (indeed, it's impossible to
> store so many kanji in only 128 slots)
Hmm, `double quotes' is perhaps a bad analogy. One escape
character (followed by a character set ID) activates a different
encoding for the next character only; the other does the same
permanently.
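The "next character only" versus "permanently" distinction can be sketched with the ISO 2022 shift functions in their 7-bit form: SO (0x0E) and SI (0x0F) are locking shifts that switch between G1 and G0 until further notice, while ESC N (single shift 2) uses G2 for exactly one character. The tiny tables G1 and G2 below are hypothetical stand-ins for registered character sets, and `decode_shifts` is an illustrative toy, not a complete decoder.

```python
SO, SI = 0x0E, 0x0F   # locking shifts (7-bit form): invoke G1 / G0
SS2 = b"\x1bN"        # ESC N: single shift 2, affects the next character only

# Hypothetical toy sets: G0 is plain ASCII, G1 and G2 remap a few letters.
G0: dict[int, str] = {}
G1 = {ord("a"): "α", ord("b"): "β"}
G2 = {ord("a"): "ä", ord("o"): "ö", ord("u"): "ü"}

def decode_shifts(data: bytes) -> str:
    out = []
    current = G0  # set currently selected by a locking shift
    i = 0
    while i < len(data):
        if data[i] == SO:              # lock G1 until further notice
            current = G1
            i += 1
        elif data[i] == SI:            # lock G0 again
            current = G0
            i += 1
        elif data[i:i + 2] == SS2:     # G2 for one character only
            if i + 2 >= len(data):
                raise ValueError("single shift at end of data")
            b = data[i + 2]
            out.append(G2.get(b, chr(b)))
            i += 3
        else:
            b = data[i]
            out.append(current.get(b, chr(b)))
            i += 1
    return "".join(out)

print(decode_shifts(b"x\x1bNay"))      # xäy (single shift)
print(decode_shifts(b"\x0eab\x0fab"))  # αβab (locking shifts)
```

Note that a single shift does not disturb a locking shift that is in effect: after `ESC N o`, decoding continues in whichever set was locked before.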
> But this option raises more issues than it solves... even
> if it is still widely used in Japan (ISO 2022 is still the default
> encoding for e-mail there)
The real problem is that you can encode a single character like `á' in
many ways; for example, you could switch to latin1, or to latin2,
or... Additionally, ISO 2022 is stateful, that is, if you encounter a
bad or missing byte, the rest of the document might be corrupted. For
this reason it has become standard practice to switch back to the
initial encoding at the end of each line and to switch again at the
beginning of the next line.
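Python's standard `iso2022_jp` codec can illustrate why the per-line reset helps. In this sketch each line is encoded separately (a choice made here, not something the codec enforces), so every encoded line starts from the initial ASCII state and ends with ESC ( B. Dropping one escape byte then garbles only its own line. The helper names are hypothetical.

```python
def encode_per_line(lines: list[str]) -> list[bytes]:
    # Each call starts in the ASCII state and ends with ESC ( B,
    # so every encoded line is self-contained.
    return [line.encode("iso2022_jp") for line in lines]

def decode_per_line(chunks: list[bytes]) -> list[str]:
    return [chunk.decode("iso2022_jp", errors="replace") for chunk in chunks]

lines = ["日本語", "問題なし"]
encoded = encode_per_line(lines)

# Simulate a transmission error: the ESC that switches the first line
# to JIS X 0208 is lost, so its kanji bytes are read as ASCII.
damaged = [encoded[0][1:], encoded[1]]

decoded = decode_per_line(damaged)
print(decoded[0] == lines[0])  # False: the first line is garbled
print(decoded[1] == lines[1])  # True: the second line is unaffected
```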
Werner
[1]: Just to make clear how ISO 2022 works (slightly simplified): The
byte range 0-255 is split into four areas: the `control code'
areas C0 (0x00-0x1F) and C1 (0x80-0x9F), and the left and right
`graphic code' areas (GL and GR, code ranges 0x20-0x7F and
0xA0-0xFF). Three character codes are always at the same
position: ESC (0x1B), SPACE (0x20), and DELETE (0x7F). In the
following, I ignore C0 and C1.
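The four areas can be expressed as a small classification function. This sketch follows the simplified layout given above, with GL covering 0x20-0x7F and GR covering 0xA0-0xFF; `iso2022_area` is a hypothetical name.

```python
def iso2022_area(b: int) -> str:
    """Classify a byte into the four ISO 2022 areas (8-bit code, simplified)."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x1F:
        return "C0"  # control codes, including ESC (0x1B)
    if b <= 0x7F:
        return "GL"  # left graphic area, including SPACE and DELETE
    if b <= 0x9F:
        return "C1"  # upper control codes
    return "GR"      # right graphic area

print([iso2022_area(b) for b in b"\x1bA\x85\xe1"])  # ['C0', 'GL', 'C1', 'GR']
```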
In the first step, registered character sets are assigned to GL and
GR. Normally, GL holds the standard version of ISO-646 (which is
equal to ASCII if combined with C0), but national variants exist.
For example, in Japan you'll often find that the backslash (at
position 0x5C) is replaced with the Yen sign. GR then gets the
`extended' character sets with either 96 characters (latin1, for
example) or 94x94 characters (JIS X 0208 for Japanese, GB 2312
for Chinese, etc.) or even 94x94x94 (CCCII, a Chinese encoding,
now defunct).
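The GL/GR relationship can be checked with Python's standard `iso2022_jp` and `euc_jp` codecs: ISO-2022-JP uses JIS X 0208 through GL, while EUC-JP places the same set in GR, so each byte should be the GL byte with the high bit set. The helper `jis_gl_bytes` is hypothetical and, as a sketch, accepts only text made entirely of JIS X 0208 characters.

```python
def jis_gl_bytes(text: str) -> bytes:
    """GL bytes of `text` in ISO-2022-JP, stripped of the switching escapes.

    Sketch only: assumes `text` consists solely of JIS X 0208 characters.
    """
    data = text.encode("iso2022_jp")
    if not (data.startswith(b"\x1b$B") and data.endswith(b"\x1b(B")):
        raise ValueError("text is not pure JIS X 0208")
    payload = data[3:-3]
    if b"\x1b" in payload:
        raise ValueError("text switches character sets midway")
    return payload

text = "日本"
gl = jis_gl_bytes(text)     # JIS X 0208 used through GL
gr = text.encode("euc_jp")  # the same set used through GR
print(gl.hex(), gr.hex())   # 467c4b5c c6fccbdc
print(bytes(b | 0x80 for b in gl) == gr)  # True
```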
It's even possible not to use GR at all: the above-mentioned
Japanese email encoding uses only the bytes 0x00-0x7F (since
in former times not all email clients handled 8bit data cleanly),
switching back and forth between encodings which share the range
0x20-0x7F.
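Python's standard `iso2022_jp` codec implements this 7-bit Japanese mail encoding (ISO-2022-JP, assuming that is the encoding referred to above). The sketch shows that the encoded output never goes above 0x7F, and that the same GL bytes decode either to ASCII or to kanji depending on which set is currently selected.

```python
data = "Mail: 日本 ok".encode("iso2022_jp")
print(all(b < 0x80 for b in data))  # True: nothing above 0x7F

# The same GL bytes mean different things depending on the active set.
payload = b"F|K\\"
print(payload.decode("ascii"))                                 # F|K\
print((b"\x1b$B" + payload + b"\x1b(B").decode("iso2022_jp"))  # 日本
```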