Re: converting between charsets
From: Alexander Kotelnikov
Subject: Re: converting between charsets
Date: Tue, 16 May 2006 00:30:44 +0400
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)
>>>>> On Mon, 15 May 2006 10:11:48 -0400
>>>>> "SM" == Stefan Monnier <address@hidden> wrote:
SM>
SM> Then show us how and when you call encode-coding-region.
SM> I.e. repeat the above but in elisp rather than English.
SM> Assume you're explaining it to a complete idiot.
SM>
SM> Please take seriously the bit about the idiot.
SM>
>>>> For example.
>>>> 1. (find-file "/tmp/test.txt")
SM>
SM> How did you start Emacs?
SM>
>> For example, 'emacs -q'; even if I do not suppress reading my ~/.emacs,
>> the result is the same.
SM>
SM> Under X or under a tty?
X
>>>> 2. enter some text in Russian (after I toggled xkb layout)
>>>> 3. M-: (encode-coding-region (point-min) (point-max) 'koi8-r) and
>>>> Russian characters become '?'.
SM> What did you expect instead?
>> I expect that Cyrillic characters will be encoded to their koi8-r values.
SM>
SM> If you put the cursor on the Russian chars before calling
SM> encode-coding-region and hit C-u C-x = what does it say?
character: Т (01212102, 332866, 0x51442)
charset: mule-unicode-0100-24ff
(Unicode characters of the range U+0100..U+24FF.)
code point: 40 66
syntax: word
category: y:Cyrillic
buffer code: 0x9C 0xF4 0xA8 0xC2
file code: 0xD0 0xA2 (encoded by coding system utf-8)
font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-iso10646-1
SM> If you put the cursor on the `?' that replaced that char and hit C-u C-x =
SM> what does it say?
character: ? (077, 63, 0x3f)
charset: ascii (ASCII (ISO646 IRV))
code point: 63
syntax: punctuation
category: a:ASCII l:Latin
buffer code: 0x3F
file code: 0x3F (encoded by coding system utf-8)
font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-adobe-standard
And for français I get:
character: ç (04347, 2279, 0x8e7)
charset: latin-iso8859-1
(Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100)
code point: 103
syntax: word
category: l:Latin
buffer code: 0x81 0xE7
file code: 0xC3 0xA7 (encoded by coding system utf-8)
font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-iso8859-1
and after (the representation is \347):
character: ç (0347, 231, 0xe7)
charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
code point: 231
syntax: whitespace
category:
buffer code: 0xE7
file code: 0xE7 (encoded by coding system utf-8)
font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-adobe-standard
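
As a cross-check outside Emacs, here is a minimal Python sketch of the byte values I would expect for these two characters. It is not part of the Emacs session above, and the helper name hex_bytes is made up for illustration. The last call only shows what a substitution character looks like when the target charset cannot represent the input; it is not a claim about what Emacs does internally.

```python
# Hypothetical cross-check, independent of Emacs: which bytes should each
# character become under the coding systems mentioned in this thread?

def hex_bytes(text, encoding, errors="strict"):
    """Encode text and return its bytes as a list of '0xNN' strings."""
    return [f"0x{b:02X}" for b in text.encode(encoding, errors)]

te = "\u0422"        # CYRILLIC CAPITAL LETTER TE
cedilla = "\u00E7"   # LATIN SMALL LETTER C WITH CEDILLA

print(hex_bytes(te, "utf-8"))         # ['0xD0', '0xA2'], the "file code" reported above
print(hex_bytes(te, "koi8_r"))        # ['0xF4'], the koi8-r value I expected
print(hex_bytes(cedilla, "utf-8"))    # ['0xC3', '0xA7']
print(hex_bytes(cedilla, "latin-1"))  # ['0xE7'], i.e. octal \347

# A charset that cannot represent the character substitutes '?' (0x3F)
# when asked to replace instead of failing.
print(hex_bytes(te, "latin-1", errors="replace"))  # ['0x3F']
```

So a correct koi8-r encoding of the Russian letter would be the single byte 0xF4, not 0x3F.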
--
Alexander Kotelnikov
Saint-Petersburg, Russia