Re: Problem with multilingual input?

From: Jason Rumney
Subject: Re: Problem with multilingual input?
Date: Wed, 21 Nov 2007 09:28:26 +0000
User-agent: Thunderbird (Windows/20071031)

Bostjan Vilfan wrote:
>   character: б (332881, #o1212121, #x51451, U+0431)
>     charset: mule-unicode-0100-24ff
>          (Unicode characters of the range U+0100..U+24FF.)
>  code point: #x28 #x51
>      syntax: w     which means: word
>    category: y:Cyrillic
> buffer code: #x9C #xF4 #xA8 #xD1
>   file code: #xD0 #xB1 (encoded by coding system mule-utf-8-dos)
>     display: by this font (glyph code)
>      -outline-Bitstream Vera Sans
> Mono-bold-r-normal-normal-16-120-96-96-c-*-iso10646-1 (#x431)
> Comparing this result to yours in your previous message, it would
> appear that the font is the culprit. Namely I invoke Emacs with the
> command line options
> "C:\Program Files\Emacs\emacs-22.1\bin\runemacs.exe" -g -0 --font
> "-outline-Bitstream Vera Sans
> Mono-bold-r-normal-normal-*-*-96-96-c-*-iso8859-1"
Try the font "DejaVu Sans Mono". It is an extended version of Bitstream
Vera Sans Mono that supports many more characters, including Cyrillic.

> and the Cyrillic characters are clearly visible. However, this still
> does not exhaust the possible questions. Namely, when I invoke Emacs
> with the "problematic font" as described above, I can still display
> Cyrillic characters in a new file. Problems arise only when I
> _reopen_ the file.
The difference is the character encoding: when you enter characters,
they are entered as iso8859-5 encoded characters, so Emacs chooses a
Cyrillic font to display them. When you read them from a UTF-8 encoded
file, they are read as mule-unicode-0100-24ff encoded characters, so
Emacs chooses a Unicode font to display them. On Windows, every TrueType
font counts as a Unicode font for some subset of characters, but Emacs 22
does not check which subset a given font actually covers, so it can pick
a font that has no Cyrillic glyphs.
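
To make the difference concrete, here is a small Python sketch (not Emacs
code) that encodes the same character, U+0431, both ways. The helper
mule_unicode_0100_24ff is a hypothetical name, and its arithmetic assumes
the charset is laid out as a 96x96 grid starting at U+0100 with each
coordinate offset by 32; that assumption reproduces the "#x28 #x51" code
point shown in the quoted output above.

```python
# Sketch only: shows how one character ends up with different byte-level
# representations depending on the encoding it arrives in.

def mule_unicode_0100_24ff(code_point):
    """Map a Unicode code point to a (row, column) pair.

    Assumption: the charset is a 96x96 grid covering U+0100..U+24FF,
    with both coordinates offset by 32.
    """
    if not 0x100 <= code_point <= 0x24FF:
        raise ValueError("code point outside U+0100..U+24FF")
    offset = code_point - 0x100
    return (offset // 96 + 32, offset % 96 + 32)


ch = "\u0431"  # CYRILLIC SMALL LETTER BE, the character from the report

utf8_bytes = ch.encode("utf-8")          # what is stored in the UTF-8 file
iso_bytes = ch.encode("iso8859_5")       # the single-byte Cyrillic encoding
row, col = mule_unicode_0100_24ff(ord(ch))

print("UTF-8 file code:   ", utf8_bytes.hex())
print("ISO 8859-5 code:   ", iso_bytes.hex())
print("mule-unicode point: #x%02X #x%02X" % (row, col))
```

The glyph is the same in both cases; only the charset attached to the
character differs, and that charset is what drives font selection.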

This will improve with Emacs 23 (once the Unicode branch is merged),
since Cyrillic will then be encoded consistently as Unicode, and the new
font backend can look more closely at Unicode fonts to see which Unicode
subranges they support.
