help-gnu-emacs

Re: Coding system prefer


From: Sergio
Subject: Re: Coding system prefer
Date: Wed, 4 Mar 2009 03:01:58 -0800 (PST)
User-agent: G2/1.0

On 4 Mar, 15:11, Peter Dyballa <address@hidden> wrote:
> On 03.03.2009 at 21:55, Maze wrote:

>> I can't believe that Emacs doesn't auto-detect these encodings...

> How do you detect the encodings?
> Can you write the algorithm? I mean, not in Emacs Lisp, just in English?

The FAR file manager (http://en.wikipedia.org/wiki/FAR_Manager) does it
quite reliably using statistics about the character frequency
distribution. The tables themselves are quite small (about 1 Kbyte
each). The tables for the 8-bit encodings are language-dependent, while
the Unicode encodings are autodetected in a more general way.
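
To answer the "write the algorithm" question a bit more concretely, here
is a minimal sketch of the frequency idea in Python. It is not FAR's
actual algorithm and does not read its .reg tables: the frequency values
(approximate figures for the ten most common Russian letters), the
candidate list, and the names RUSSIAN_FREQ, score and guess_encoding are
all assumptions made up for illustration. The idea is to decode the
bytes with every candidate encoding and keep the one whose result looks
most like ordinary Russian text.

```python
# Approximate relative frequencies (percent) of the ten most common
# lowercase Russian letters. Illustrative values only, NOT FAR's tables.
RUSSIAN_FREQ = {
    "о": 10.97, "е": 8.45, "а": 8.01, "и": 7.35, "н": 6.70,
    "т": 6.26, "с": 5.47, "р": 4.73, "в": 4.54, "л": 4.40,
}

# Hypothetical candidate list: a few 8-bit Cyrillic codecs that ship
# with Python's standard library.
CANDIDATES = ("koi8_r", "cp1251", "iso8859_5", "cp866")


def score(text):
    """Sum the frequency weights of all characters found in the table.

    The comparison is case-sensitive on purpose: ordinary text is mostly
    lowercase, and a wrong decoding often turns it into uppercase
    letters or symbols, which then contribute nothing.
    """
    return sum(RUSSIAN_FREQ.get(ch, 0.0) for ch in text)


def guess_encoding(data, candidates=CANDIDATES):
    """Return the candidate whose decoding of `data` scores highest."""
    return max(
        candidates,
        # errors="replace" keeps going when a byte is undefined in a
        # codec instead of raising UnicodeDecodeError.
        key=lambda enc: score(data.decode(enc, errors="replace")),
    )


if __name__ == "__main__":
    sample = "Это простой пример текста на русском языке"
    print(guess_encoding(sample.encode("koi8_r")))
```

A real detector would need one table per language, would have to deal
with very short or mixed-language input, and would need extra care for
encodings that share most of their lowercase range (Windows-1251 and
Macintosh Cyrillic, for example), which is why the latter is left out of
the candidate list here.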

Here are the tables I have for Cyrillic:

,----
| c:/Program Files/FAR/Addons/Tables/Cyrillic:
| total used in directory 12 available 29790096
| drwxrwxrwx  1 spokrovs Domain Users    0 09-22 19:24 .
| drwxrwxrwx  1 spokrovs Domain Users    0 09-22 19:24 ..
| drwxrwxrwx  1 spokrovs Domain Users    0 09-22 19:24 E-Mail Double Conversion
| -rw-rw-rw-  1 spokrovs Domain Users 1079 2005-07-04  DKOI8 (Mainframe).reg
| -rw-rw-rw-  1 spokrovs Domain Users 1063 2005-07-01  DM (Amiga).reg
| -rw-rw-rw-  1 spokrovs Domain Users  723 2005-07-04  Descript.ion
| -rw-rw-rw-  1 spokrovs Domain Users 1111 2006-02-13  Dist.Rus.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1153 2006-02-13  Dist.Ukr.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1108 2006-03-23  ISO-8859-5.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1084 2006-03-23  KOI8-R.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1016 2006-03-23  KOI8-U.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1186 2006-03-23  Macintosh Standard.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1104 2005-07-01  RUSCII (GOST Ukrainian).reg
| -rw-rw-rw-  1 spokrovs Domain Users 1073 2006-03-23  Windows-1251.reg
`----

I think there is a similar package in Emacs, although its emphasis is
on language recognition rather than on the encoding.

--
Sergei


