Re: More Cyrillic vs UTF-8
From: Simon Josefsson
Subject: Re: More Cyrillic vs UTF-8
Date: Sat, 26 Apr 2003 13:54:34 +0200
User-agent: Gnus/5.090019 (Oort Gnus v0.19) Emacs/21.3.50 (gnu/linux)

Kenichi Handa <address@hidden> writes:
> In article <address@hidden>, Simon Josefsson <address@hidden> writes:
>> (Same configuration as last mail)
>> Cut'n'paste the following string into a new file and save it:
>
>> Горбачев
>
>> UTF-8 isn't shown as an option, and indeed selecting UTF-8 destroys
>> the data. Doesn't Emacs CVS support the entire Unicode repertoire?
>
>> (The string above, encoded as shift_jis, is, according to od -x:
>> 0000000 4384 8084 8284 7184 7084 8984 7584 7284)
>
> Those characters belong to the charset japanese-jisx0208,
> and the current Emacs still can't encode them into UTF-8.
>
> How did you get such characters?

That may be interesting in itself. Go to
http://www.nns.ru/persons/gorbach.html using galeon (or mozilla, I
think). Cut'n'paste the first word and yank it into Emacs. It looks
single-width in galeon, but when yanked into Emacs it becomes
double-width. Yanking it into xterm or gnome-terminal doesn't change
the string; it still looks single-width. Save the HTML file and open
it in Emacs as a koi8 file (note that Emacs doesn't auto-detect it as
koi8, so you have to do that manually), and it is single-width there
too.

I guess it is the Emacs X cut'n'paste code that somehow turns the
string into double-width Japanese characters.
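
For anyone who wants to check the od -x dump quoted above, here is a
minimal Python sketch. It assumes the dump came from a little-endian
machine (so each 16-bit word is byte-swapped) and relies on Python's
stock shift_jis, utf-8 and koi8_r codecs; the helper name
od_words_to_bytes is made up for illustration. It shows nothing about
how Emacs handles the selection internally, only that the bytes are
ordinary Cyrillic letters from the JIS X 0208 repertoire, which
Unicode can also represent.

```python
import struct

# 16-bit words exactly as printed by `od -x` in the quoted message.
OD_WORDS = "4384 8084 8284 7184 7084 8984 7584 7284"


def od_words_to_bytes(words: str) -> bytes:
    """Undo the word grouping of `od -x` (hypothetical helper).

    Assumes a little-endian host, so the word 4384 stands for the
    byte pair 0x84 0x43.
    """
    return b"".join(struct.pack("<H", int(w, 16)) for w in words.split())


raw = od_words_to_bytes(OD_WORDS)

# Each character is a two-byte Shift_JIS sequence from the Cyrillic
# row of JIS X 0208.
text = raw.decode("shift_jis")
print(text)

# The same characters re-encoded: two bytes each in UTF-8,
# one byte each in KOI8-R.
print(text.encode("utf-8").hex())
print(text.encode("koi8_r").hex())
```

The decoded string compares equal to the word pasted at the top of
the thread, so the data itself survives; only the charset it was
assigned to differs.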