Re: recognizing coding systems
Sat, 06 Nov 2004 09:45:54 +0100
Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3.50 (gnu/linux)
Alexandru Cardaniuc <address@hidden> writes:
> The emacs manual says:
> However, you can alter the priority list in detail with the command
> `M-x prefer-coding-system'. This command reads the name of a coding
> system from the minibuffer, and adds it to the front of the priority
> list, so that it is preferred to all others. If you use this command
> several times, each use adds one element to the front of the priority
> list.
> I added these lines to my .emacs file:
> (prefer-coding-system 'koi8-r)
> (prefer-coding-system 'cp866)
> (prefer-coding-system 'cp1251)
> after I run the command describe-coding-system I get this:
> Priority order for recognizing coding systems when reading files:
> 1. cp1251 (alias: windows-1251 microsoft-1251 microsoft-cp1251
> windows-cp1251 win-1251 win-cp1251)
> 2. iso-latin-1 (alias: iso-8859-1 latin-1)
> 3. iso-2022-jp (alias: junet)
> 4. iso-2022-7bit
> 5. iso-2022-7bit-lock (alias: iso-2022-int-1)
> 6. iso-2022-8bit-ss2
> 7. emacs-mule
> 8. raw-text
> 9. japanese-shift-jis (alias: shift_jis sjis)
> 10. chinese-big5 (alias: big5 cn-big5)
> 11. no-conversion (alias: binary)
> 12. mule-utf-8 (alias: utf-8)
> Other coding systems cannot be distinguished automatically
> from these, and therefore cannot be recognized automatically
> with the present coding system priorities.
> Only the last prefer-coding-system command appears on the priority
> list for recognizing coding systems. Am I doing something wrong?
Well, you have run into a rather special case.
Short answer: the three encodings you pass to `prefer-coding-system'
are special. For these (but not for others!), a later call such as
`(prefer-coding-system 'cp1251)' /overrides/ an earlier
`(prefer-coding-system 'cp866)'. Usually Emacs behaves as the manual
says; here you have hit an exception.
Long answer: the priority of coding systems is determined by the
variable `coding-category-list'. Each coding system belongs to a
coding category. For example, the coding system `iso-latin-1' belongs
to the category named `coding-category-iso-8-1'. You can determine the
category of a coding system by calling the function
`coding-system-category':
(coding-system-category 'iso-latin-1) ==> coding-category-iso-8-1
Look at the value of `coding-category-list' (C-h v). After
`(prefer-coding-system 'cp1251)', the first element of this list is
`coding-category-ccl'; this is the category that is tried first when
decoding text. Each category symbol, in turn, is bound to the name of
a coding system; this binding is also set by `prefer-coding-system'.
When Emacs decides which coding system to use, it tries each category
in turn: for each category it looks up the coding system bound to it
and checks whether that coding system can decode the text in
question. If you do `C-h v coding-category-ccl', you'll see that it
is bound to `cp1251'.
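The lookup just described can be sketched as a toy model in Emacs Lisp. It is an illustration under stated assumptions, not Emacs's C code: `toy-detect-coding' is hypothetical, and its ACCEPTABLE-P argument is a made-up predicate standing in for the real byte-level checks Emacs performs. What it shows is that the first category whose bound coding system accepts the text wins, so the category order decides the result.

```elisp
;; Toy model (NOT Emacs's C code) of the decoding loop described above.
(defun toy-detect-coding (categories bindings acceptable-p)
  "Return the first coding system that ACCEPTABLE-P accepts.
CATEGORIES is the category priority list, most preferred first.
BINDINGS is an alist (CATEGORY . CODING-SYSTEM), one coding system
per category.  ACCEPTABLE-P is a hypothetical predicate standing in
for Emacs's real byte-level checks."
  (catch 'found
    (dolist (category categories)
      (let ((coding (cdr (assq category bindings))))
        ;; Skip unbound categories; stop at the first acceptable one.
        (when (and coding (funcall acceptable-p coding))
          (throw 'found coding))))
    ;; No category's coding system accepted the text.
    nil))

;; cp1251 wins because its category is tried first, even though
;; iso-latin-1 would also accept this (hypothetical) input.
(toy-detect-coding '(coding-category-ccl coding-category-iso-8-1)
                   '((coding-category-ccl . cp1251)
                     (coding-category-iso-8-1 . iso-latin-1))
                   (lambda (coding) (memq coding '(cp1251 iso-latin-1))))
;; => cp1251
```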
Now to the behaviour that surprised you: /each/ of the three encodings
you pass to `prefer-coding-system' belongs to the category
`coding-category-ccl'. Since a category is bound to only one coding
system at a time, each call to `prefer-coding-system' /overrides/ the
previous one, unlike with coding systems that belong to different
categories.
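The override can be reproduced with a toy model in Emacs Lisp. This is a sketch under stated assumptions, not Emacs's implementation: `toy-prefer-coding-system', `toy-priority-report' and the table `toy-coding-categories' are hypothetical, and the table simply hard-codes the category assignments described above instead of calling the real `coding-system-category'. The model keeps the two pieces of state described here: a category priority list and, per category, exactly one bound coding system.

```elisp
;; Toy model (NOT Emacs's real implementation) of why repeated
;; `prefer-coding-system' calls collapse when the coding systems
;; share a category.
(defvar toy-coding-categories
  '((koi8-r . coding-category-ccl)
    (cp866 . coding-category-ccl)
    (cp1251 . coding-category-ccl)
    (iso-latin-1 . coding-category-iso-8-1))
  "Hypothetical stand-in for `coding-system-category'.")

(defun toy-prefer-coding-system (state coding)
  "Return the state that results from preferring CODING in STATE.
STATE is (CATEGORY-LIST . BINDINGS); BINDINGS is an alist of
\(CATEGORY . CODING-SYSTEM) holding one coding system per category."
  (let ((category (cdr (assq coding toy-coding-categories))))
    (unless category
      (error "Unknown coding system in toy model: %s" coding))
    (cons
     ;; Move CODING's category to the front of the priority list.
     (cons category (remq category (car state)))
     ;; Rebind the category, discarding its previous coding system.
     ;; `copy-alist' keeps `assq-delete-all' from mutating STATE.
     (cons (cons category coding)
           (assq-delete-all category (copy-alist (cdr state)))))))

(defun toy-priority-report (state)
  "List the bound coding systems in the order their categories are tried."
  (mapcar (lambda (category) (cdr (assq category (cdr state))))
          (car state)))

;; Replay the three calls from the .emacs file quoted above.
(let ((state (cons (list 'coding-category-iso-8-1 'coding-category-ccl)
                   (list (cons 'coding-category-iso-8-1 'iso-latin-1)))))
  (dolist (coding '(koi8-r cp866 cp1251))
    (setq state (toy-prefer-coding-system state coding)))
  (toy-priority-report state))
;; => (cp1251 iso-latin-1)
```

Preferring `iso-latin-1' afterwards would instead move `coding-category-iso-8-1' to the front while leaving `cp1251' bound to `coding-category-ccl', giving (iso-latin-1 cp1251) in the model. That is the stacking behaviour the manual describes, which only appears when the coding systems live in different categories.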
BTW, even if that were different, I wonder whether you'd see any
effect. I'd guess (though I don't /know/) that cp1251 has so many
valid characters that nearly any 8-bit text passes as cp1251.
Or do you have documents encoded in koi8-r that contain characters
which are not valid in cp1251?
More precisely: for /decoding/ it is determined by a C data
structure which is initialized based on `coding-category-list' by the
This happens in the C function detect_coding_mask, called from
detect_coding, called from Finsert_file_contents. For
coding-category-ccl, for instance, detect_coding_mask calls
16 Brumaire, Year 213 of the Revolution
Liberty, Equality, Fraternity!