Re: Q: something like autoload for coding-systems?

From: Kenichi Handa
Subject: Re: Q: something like autoload for coding-systems?
Date: Tue, 13 Nov 2001 11:55:15 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1.30 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

Richard Stallman <address@hidden> writes:
>>  Isn't it possible for find-coding-system-region to work with a much
>>  smaller amount of information?  For instance, it could use a list of
>>  intervals (in the mathematical sense), saying that coding system FOO
>>  handles character codes X through Y and A through B and C through D.
>>  This would be far less information than the mapping table.

>     Such a way is not that worth for many cpXXX coding systems.
>     For instance, see this codepage (in codepage.el).

> If it doesn't save much for cpXXX coding systems,
> we don't have to use it for them.  It will save a large amount
> for the Chinese-based coding systems.

>     And, even if we adopt that way, such information is useful
>     only for answering the question "Is this character encodable
>     by this coding-system?".  To utilize that information in
>     find-coding-system-region, we must look up such lists for
>     ALL coding systems for all characters in the region.

> Not necessarily.  That is not the only way to implement it.

Ah, yes.  For instance, we can gradually shrink the set of
coding systems to check.
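
A minimal sketch of that narrowing idea, written in Python purely for illustration (it is not Emacs code). The names `covers`, `find_coding_systems`, and the `tables` layout are hypothetical: each coding system is assumed to map to a list of inclusive (FROM, TO) code ranges. The candidate set starts with every coding system and shrinks as each character is examined, so later characters are checked against fewer lists.

```python
def covers(intervals, code):
    """Return True if code falls inside any inclusive (lo, hi) interval."""
    return any(lo <= code <= hi for lo, hi in intervals)


def find_coding_systems(text_codes, tables):
    """Return the names of coding systems that can encode every code.

    tables maps a coding-system name to its interval list (an assumed
    layout for this sketch).  The candidate set only ever shrinks, and
    the scan stops early once no candidate is left.
    """
    candidates = set(tables)
    for code in text_codes:
        candidates = {name for name in candidates
                      if covers(tables[name], code)}
        if not candidates:
            break
    return candidates


# Toy tables: "foo" handles ASCII only, "bar" also handles a Han range.
tables = {
    "foo": [(0x00, 0x7F)],
    "bar": [(0x00, 0x7F), (0x4E00, 0x9FFF)],
}
result = find_coding_systems([0x41, 0x4E2D], tables)
```

With these toy tables, the first character keeps both candidates and the second one drops "foo", so only "bar" remains to be checked for the rest of the region.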

> Anyway, if we use this only for the large Han-character coding
> systems, it should not be very slow, and it will get big savings.

For Emacs 21.1, we already preload all the Chinese-based
coding systems.  That preloading doesn't require much
memory because, when a coding system supports a specific
Chinese-based charset, it supports all characters in that
charset, not just part of it.

But, for the future (Unicode-based) Emacs, of course, this
is not true.  I've just checked how big the interval list
will be for the GB2312 charset.  I tried to make a vector of
the form [ (FROM-CHAR . TO-CHAR) ... ], and, as an
optimization, if FROM-CHAR == TO-CHAR, put FROM-CHAR itself
in an element instead of a cons.  Provided that each number
consumes one word, and each cons consumes three words, the
vector consumes roughly 6300 words.  That is about 25K-byte.
It is surely smaller than the whole mapping table.  Provided
that we make such a list for about 10 Chinese-based charsets,
we need about 250K-byte.
What do you think about this amount of memory?
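
The estimate above can be reproduced with a small sketch, again in Python for illustration only. The helper names `to_intervals` and `words_used` are hypothetical, and the 4-byte word size is an assumption chosen because it matches 6300 words being about 25K-byte. A range entry is charged three words and a single-character entry one word, as in the accounting above; the sample code points are made up and are not GB2312 data.

```python
WORD_BYTES = 4  # assumption: 32-bit machine word


def to_intervals(codes):
    """Collapse code points into entries: (from, to) for a run of
    consecutive codes, or a bare number when the run has length one."""
    entries = []
    codes = sorted(set(codes))
    if not codes:
        return entries
    start = prev = codes[0]
    for c in codes[1:]:
        if c == prev + 1:
            prev = c          # still inside the current run
            continue
        entries.append(start if start == prev else (start, prev))
        start = prev = c      # begin a new run
    entries.append(start if start == prev else (start, prev))
    return entries


def words_used(entries):
    """Charge 3 words per range entry and 1 word per single entry."""
    return sum(3 if isinstance(e, tuple) else 1 for e in entries)


sample = to_intervals([1, 2, 3, 7, 10, 11])   # made-up code points
sample_words = words_used(sample)

# The arithmetic from the message: 6300 words per charset, 10 charsets.
bytes_per_charset = 6300 * WORD_BYTES
bytes_total = 10 * bytes_per_charset
```

Here `sample` comes out as `[(1, 3), 7, (10, 11)]`, costing 7 words, and the two byte totals are 25200 and 252000, which agree with the rounded 25K-byte and 250K-byte figures.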

Ken'ichi HANDA
