Re: recognizing coding systems

From: Oliver Scholz
Subject: Re: recognizing coding systems
Date: Sat, 06 Nov 2004 09:45:54 +0100
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3.50 (gnu/linux)

Alexandru Cardaniuc <address@hidden> writes:

> The emacs manual says:
> "However, you can alter the priority list in detail with the command
> `M-x prefer-coding-system'.  This command reads the name of a coding
> system from the minibuffer, and adds it to the front of the priority
> list, so that it is preferred to all others.  If you use this command
> several times, each use adds one element to the front of the priority
> list."
> I added these lines to my .emacs file:
> (prefer-coding-system 'koi8-r)
> (prefer-coding-system 'cp866)
> (prefer-coding-system 'cp1251)
> After I run the command describe-coding-system, I get this:
> -------------------------------------
> Priority order for recognizing coding systems when reading files:
>   1. cp1251 (alias: windows-1251 microsoft-1251 microsoft-cp1251 windows-cp1251 win-1251 win-cp1251)
>   2. iso-latin-1 (alias: iso-8859-1 latin-1)
>   3. iso-2022-jp (alias: junet)
>   4. iso-2022-7bit 
>   5. iso-2022-7bit-lock (alias: iso-2022-int-1)
>   6. iso-2022-8bit-ss2 
>   7. emacs-mule 
>   8. raw-text 
>   9. japanese-shift-jis (alias: shift_jis sjis)
>   10. chinese-big5 (alias: big5 cn-big5)
>   11. no-conversion (alias: binary)
>   12. mule-utf-8 (alias: utf-8)
>   Other coding systems cannot be distinguished automatically
>   from these, and therefore cannot be recognized automatically
>   with the present coding system priorities.
> -----------------------------------------
> Only the last prefer-coding-system command appears on the priority
> list for recognizing coding systems. Am I doing something wrong?

Well, you encountered a rather special case.

Short answer: the three encodings you pass to `prefer-coding-system'
are special in that, for these (but not for others!), a later call such
as `(prefer-coding-system 'cp1251)' /overrides/ an earlier
`(prefer-coding-system 'cp866)'.  Usually Emacs behaves as the manual
says; here you encountered an exception.

Long answer: The priority of coding systems is determined by a
variable `coding-category-list'[1].  Each coding system belongs to a
coding category.  For example, the coding system `iso-latin-1' belongs
to the category named `coding-category-iso-8-1'. You can determine the
category of a coding system by calling the function

(coding-system-category 'iso-latin-1) ==> coding-category-iso-8-1

Look at the value of `coding-category-list' (C-h v).  After
`(prefer-coding-system 'cp1251)' the first element of this list is
`coding-category-ccl'; this is the category which is tried first when
it comes to decoding text.  Each category symbol is, in turn, bound (as
a variable) to the name of a coding system; this binding is also set by
`prefer-coding-system'.

When Emacs decides which coding system to use, it tries each category
in turn; for each category it looks up the coding-system bound to it
and checks whether it may be used to decode the characters in
question[2].  If you do `C-h v coding-category-ccl', you'll see that it
is bound to `cp1251'.

Now the problem which surprised you: /each/ of the three encodings you
pass to `prefer-coding-system' belongs to the category
`coding-category-ccl'.  Unlike with coding systems that belong to
different categories, each call to `prefer-coding-system' therefore
rebinds the same category symbol and /overrides/ the previous one.
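Here is a minimal sketch of why that happens.  It assumes, as a
simplification, that `prefer-coding-system' does just two things: move
the coding system's category to the front of `coding-category-list' and
bind the category symbol to that coding system.  The real function does
more than this, and the helpers `prefer' and `priority_list' are
hypothetical.

```python
from typing import Dict, List

# Category of each coding system, as stated in this message.
CATEGORY_OF: Dict[str, str] = {
    "koi8-r": "coding-category-ccl",
    "cp866": "coding-category-ccl",
    "cp1251": "coding-category-ccl",
    "iso-latin-1": "coding-category-iso-8-1",
}


def prefer(coding_system: str, categories: List[str], binding: Dict[str, str]) -> None:
    """Simplified prefer-coding-system: move the category to the front, rebind it."""
    category = CATEGORY_OF[coding_system]
    categories.remove(category)
    categories.insert(0, category)
    binding[category] = coding_system  # replaces whatever was bound before


def priority_list(categories: List[str], binding: Dict[str, str]) -> List[str]:
    """Coding systems in the order they would be tried."""
    return [binding[c] for c in categories if c in binding]


categories = ["coding-category-iso-8-1", "coding-category-ccl"]
binding = {"coding-category-iso-8-1": "iso-latin-1"}

for cs in ("koi8-r", "cp866", "cp1251"):   # the three lines from your .emacs
    prefer(cs, categories, binding)
print(priority_list(categories, binding))   # ['cp1251', 'iso-latin-1']

prefer("iso-latin-1", categories, binding)  # a different category: both survive
print(priority_list(categories, binding))   # ['iso-latin-1', 'cp1251']
```

Only one slot exists per category, so three calls with ccl coding
systems leave just the last one, while preferring a system from another
category stacks as the manual describes.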


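BTW, even if that were different, I wonder whether you'd see any
effect.  I'd guess (though I don't /know/) that cp1251 accepts a fairly
large number of byte values as valid characters, so most 8-bit Cyrillic
files would already be accepted as cp1251 before koi8-r is even tried.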

Or are there any documents encoded in koi8-r containing characters
which are not valid in cp1251?
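One rough way to check that guess, assuming Python's codec tables for
cp1251 and koi8-r match the ones Emacs uses (I haven't verified that):
list the single bytes a strict cp1251 decoder rejects, and see what a
koi8-r encoded word turns into.  The function `undecodable_bytes' is
just an illustrative helper.

```python
from typing import List


def undecodable_bytes(codec: str) -> List[int]:
    """Byte values that a strict decoder for CODEC rejects on their own."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


# cp1251 is a single-byte encoding, so checking each byte value is enough.
print([hex(v) for v in undecodable_bytes("cp1251")])  # ['0x98'] with CPython's tables

# Ordinary koi8-r Cyrillic text decodes as cp1251 without any error;
# it just comes out as the wrong letters.
koi8 = "Привет".encode("koi8_r")
print(koi8.decode("cp1251"))

# The one byte cp1251 rejects is, in koi8-r, the "less than or equal" sign.
print("≤".encode("koi8_r"))  # b'\x98'
```

If those tables are right for Emacs too, koi8-r text would only fail to
decode as cp1251 when it contains that one sign; everything else would
be silently accepted as cp1251.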


[1] More precisely: for /decoding/ it is determined by a C data
structure which is initialized based on `coding-category-list' by the
function `set-coding-priority-internal'.

[2] This happens in the C function detect_coding_mask, called from
detect_coding, called from Finsert_file_contents. For
coding-category-ccl, for instance, detect_coding_mask calls

16 Brumaire, year 213 of the Revolution
Liberty, Equality, Fraternity!
