Re: Coding system prefer

From: Sergio
Subject: Re: Coding system prefer
Date: Thu, 5 Mar 2009 00:06:14 -0800 (PST)
User-agent: G2/1.0

On Mar 5, 10:19 am, Miles Bader <address@hidden> wrote:
> Sergio <address@hidden> writes:
>> The FAR file manager, it
>> quite reliably using statistics about the character frequency
>> distribution.

> Does that work for anything except text files containing prose?

Yes, it does.

Of course it does not work for a binary file, but it works fine for a
text file in a formal language, like a C program with Russian strings
or a text with HTML markup.

I never explored the internals, but I guess that normally one can
ignore the ASCII part; only codes greater than 127 really matter.  Of
these, one can easily detect UTF-8 or another Unicode encoding (at
least for the alphabetic planes; I have never needed the CJK part).
And then there are the 8-bit codes, in which the distribution over the
upper half (128-255) is characteristic.
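
To make the UTF-8 part of that guess concrete, here is a minimal sketch
in Python.  It is not how FAR does it (I have not seen its code); the
function name looks_like_utf8 is made up for illustration.  The idea is
just that bytes above 127 in valid UTF-8 must form well-formed
multi-byte sequences, which text in an 8-bit Cyrillic code page almost
never does by accident.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data contains bytes > 127 and is valid UTF-8.

    Hypothetical helper for illustration only.
    """
    # Pure ASCII tells us nothing about the encoding, so do not claim UTF-8.
    if not any(b > 127 for b in data):
        return False
    try:
        # The strict decoder rejects malformed lead/continuation bytes.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, looks_like_utf8("Привет".encode("utf-8")) is true, while
the same word encoded in koi8-r or cp1251 is rejected because its high
bytes do not form valid sequences.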

And usually the noise part (like markup or formal language statements)
is in ASCII.
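
Since the noise is ASCII, one can simply drop it and score what is
left.  The following Python sketch is again only my assumption of how
such a detector could work, not FAR's actual method: the candidate list,
the set of "common" letters (roughly the ten most frequent letters in
Russian) and the name guess_8bit_encoding are all chosen for
illustration.  Each candidate code page decodes the high bytes, and the
one that yields the most common Russian letters wins.

```python
from typing import Optional

# Assumption: roughly the ten most frequent letters in Russian text.
COMMON = set("оеаинтсрвл")
# Assumption: the 8-bit Cyrillic code pages we want to tell apart.
CANDIDATES = ("koi8-r", "cp1251", "cp866")


def guess_8bit_encoding(data: bytes) -> Optional[str]:
    """Guess which candidate code page the high bytes of data are in."""
    # Ignore the ASCII part (code, markup); only bytes > 127 matter.
    high = bytes(b for b in data if b > 127)
    if not high:
        return None  # nothing to base a guess on

    def score(encoding: str) -> int:
        # Undefined bytes become U+FFFD and simply score nothing.
        text = high.decode(encoding, errors="replace")
        return sum(ch in COMMON for ch in text)

    # Pick the code page under which the text looks most like Russian.
    return max(CANDIDATES, key=score)
```

With a line like printf("Привет, мир! Это строка на русском языке.\n");
saved in any of the three code pages, the function returns the one that
was used, even though most of the line is C.  Very short strings can
still be ambiguous, so a real detector would want more text and better
statistics.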

I never needed EBCDIC or any other encoding which is not a superset of
ASCII.
