Re: Problem with multilingual input?

Hello,
I followed your instructions, and I think I made some progress.

When I reopened the file containing Cyrillic characters with the language environment set to UTF-8, describe-char gave the following results:

character: b (98, #o142, #x62, U+0062)
    charset: ascii (ASCII (ISO646 IRV))
code point: #x62
     syntax: w     which means: word
   category: a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0]) l:Latin
buffer code: #x62
file code: #x62 (encoded by coding system mule-utf-8-dos)
    display: by this font (glyph code)
     -outline-Bitstream Vera Sans Mono-bold-r-normal-normal-16-120-96-96-c-*-iso8859-1 (#x62)

character: б (332881, #o1212121, #x51451, U+0431)
    charset: mule-unicode-0100-24ff
         (Unicode characters of the range U+0100..U+24FF.)
code point: #x28 #x51
     syntax: w     which means: word
   category: y:Cyrillic
buffer code: #x9C #xF4 #xA8 #xD1
file code: #xD0 #xB1 (encoded by coding system mule-utf-8-dos)
    display: by this font (glyph code)
     -outline-Bitstream Vera Sans Mono-bold-r-normal-normal-16-120-96-96-c-*-iso10646-1 (#x431)
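
As a sanity check on the numbers above, here is a minimal Python sketch (Python is used only for illustration, it is not part of my Emacs setup) that recomputes the file code and the code point for U+0431. The 96x96 layout of mule-unicode-0100-24ff with a +32 offset per byte is an assumption on my part; it does reproduce the #x28 #x51 reported by describe-char.

```python
# Cross-check the describe-char values for U+0431 (Cyrillic small letter be).
ch = "\u0431"

code_point = ord(ch)             # Unicode scalar value
utf8_bytes = ch.encode("utf-8")  # bytes a UTF-8 encoded file stores for it

# Assumption: mule-unicode-0100-24ff is a 96x96 charset starting at U+0100,
# and each of the two position bytes is offset by 32.
offset = code_point - 0x100
mule_code = (offset // 96 + 32, offset % 96 + 32)

print(hex(code_point))                             # 0x431
print(" ".join(f"#x{b:02X}" for b in utf8_bytes))  # #xD0 #xB1
print(" ".join(f"#x{b:02X}" for b in mule_code))   # #x28 #x51
```

So the file itself appears to be stored correctly as UTF-8, and only the display step differs between the two invocations.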

Comparing this result with the one in your previous message, it appears that the font is the culprit. I normally invoke Emacs with the command-line options

"C:\Program Files\Emacs\emacs-22.1\bin\runemacs.exe" -g -0 --font "-outline-Bitstream Vera Sans Mono-bold-r-normal-normal-*-*-96-96-c-*-iso8859-1"

If I invoke Emacs simply with the command line

Emacs

then the describe-char commands yield:

character: b (98, #o142, #x62, U+0062)
    charset: ascii (ASCII (ISO646 IRV))
code point: #x62
     syntax: w     which means: word
   category: a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0]) l:Latin
buffer code: #x62
file code: #x62 (encoded by coding system mule-utf-8-dos)
    display: by this font (glyph code)
     -outline-Courier New-normal-r-normal-normal-13-97-96-96-c-*-iso8859-1 (#x62)

character: б (332881, #o1212121, #x51451, U+0431)
    charset: mule-unicode-0100-24ff
         (Unicode characters of the range U+0100..U+24FF.)
code point: #x28 #x51
     syntax: w     which means: word
   category: y:Cyrillic
buffer code: #x9C #xF4 #xA8 #xD1
file code: #xD0 #xB1 (encoded by coding system mule-utf-8-dos)
    display: by this font (glyph code)
     -outline-Courier New-normal-r-normal-normal-13-97-96-96-c-*-iso10646-1 (#x431)

and the Cyrillic characters are clearly visible. However, this does not settle every question. When I invoke Emacs with the "problematic font" as described above, I can still display Cyrillic characters in a new file; problems arise only when I _reopen_ the file. To investigate, I invoked Emacs with

"C:\Program Files\Emacs\emacs-22.1\bin\runemacs.exe" -g -0 --font "-outline-Bitstream Vera Sans Mono-bold-r-normal-normal-*-*-96-96-c-*-iso8859-1"

and entered the same lines as before in a new file (even without setting the language environment to UTF-8). describe-char yields

character: b (98, #o142, #x62, U+0062)
    charset: ascii (ASCII (ISO646 IRV))
code point: #x62
     syntax: w     which means: word
   category: a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0]) l:Latin
buffer code: #x62
file code: #x62 (encoded by coding system iso-latin-1-dos)
    display: by this font (glyph code)
     -outline-Bitstream Vera Sans Mono-bold-r-normal-normal-16-120-96-96-c-*-iso8859-1 (#x62)

character: б (3665, #o7121, #xe51, U+0431)
    charset: cyrillic-iso8859-5
         (Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5): ISO-IR-144.)
code point: #x51
     syntax: w     which means: word
   category: y:Cyrillic
buffer code: #x8C #xD1
file code: not encodable by coding system iso-latin-1-dos
    display: by this font (glyph code)
     -outline-Arial-bold-r-normal-normal-16-120-96-96-p-*-iso8859-5 (#x431)
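
The charset values in this last output can be cross-checked the same way. The following Python sketch (again purely illustrative) encodes the same character in ISO 8859-5 and tries to encode it in Latin-1. Reading the #x51 code point as the ISO 8859-5 byte with its high bit cleared is my interpretation of the describe-char output, not something I found documented.

```python
# Cross-check the cyrillic-iso8859-5 values for U+0431.
ch = "\u0431"

iso5_bytes = ch.encode("iso8859_5")  # single byte in ISO 8859-5
low7 = iso5_bytes[0] & 0x7F          # assumption: code point = byte without high bit

# Latin-1 has no Cyrillic letters, so encoding should fail.
try:
    ch.encode("latin-1")
    latin1_encodable = True
except UnicodeEncodeError:
    latin1_encodable = False

print(f"#x{iso5_bytes[0]:02X}")  # #xD1
print(f"#x{low7:02X}")           # #x51
print(latin1_encodable)          # False
```

This matches "code point: #x51" and "not encodable by coding system iso-latin-1-dos" above, so here the character sits in a different charset (cyrillic-iso8859-5) than after reopening the file (mule-unicode-0100-24ff).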

In this case the Cyrillic characters are visible.

Thus it would appear that in this case Emacs is able to select a substitute font that renders the characters correctly. Why does it not do so when the file is reopened?

Regards,
Bostjan

----- Original Message ----
From: martin rudalics <rudalics@gmx.at>
To: Bostjan Vilfan <bvilf@yahoo.com>
Cc: Bug-Gnu-Emacs <bug-gnu-emacs@gnu.org>
Sent: Wednesday, November 21, 2007 8:23:07 AM
Subject: Re: Problem with multilingual input?

> On Windows I tried your suggestion (set-language-environment) and the
> result was the same (empty rectangles). Then I selected
> Options->Mule-Show All of Mule Status and read off the current
> language environment as UTF-8. Thus, language environment equals utf-8
> or English does not influence the outcome.

Sorry for the delay. I hoped someone else would respond but apparently
all language environment experts are busy at the moment. Please CC to
bug-gnu-emacs when answering - maybe we'll get qualified help.

> On Linux the outcome is OK (cyrillic characters visible), again

The correct name of this OS is GNU/Linux.

> regardless of the language environment settings (utf-8 or English)

When I have a file saved with mule-utf-8 containing the two lines

bla bla
бла бла

visit the file with `current-language-environment' utf-8 and do
`describe-char' for the first character of the first line I get

character: b (98, #o142, #x62, U+0062)
charset: ascii (ASCII (ISO646 IRV))
code point: #x62
syntax: w which means: word
category: a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0]) l:Latin
buffer code: #x62
file code: #x62 (encoded by coding system mule-utf-8-dos)
display: by this font (glyph code)
-outline-Courier New-normal-r-normal-normal-16-96-120-120-c-*-iso8859-1 (#x62)

`describe-char' for the first character of the second line gets me

character: б (332881, #o1212121, #x51451, U+0431)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: #x28 #x51
syntax: w which means: word
category: y:Cyrillic
buffer code: #x9C #xF4 #xA8 #xD1
file code: #xD0 #xB1 (encoded by coding system mule-utf-8-dos)
display: by this font (glyph code)
-outline-Courier New-normal-r-normal-normal-16-96-120-120-c-*-iso10646-1 (#x431)

on WindowsME. What do you get?

From:	Bostjan Vilfan
Subject:	Re: Problem with multilingual input?
Date:	Wed, 21 Nov 2007 00:31:48 -0800 (PST)