Re: Q: something like autoload for coding-systems?
From: Kenichi Handa
Subject: Re: Q: something like autoload for coding-systems?
Date: Tue, 13 Nov 2001 11:55:15 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1.30 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

Richard Stallman <address@hidden> writes:
>> Isn't it possible for find-coding-system-region to work with a much
>> smaller amount of information? For instance, it could use a list of
>> intervals (in the mathematical sense), saying that coding system FOO
>> handles character codes X through Y and A through B and C through D.
>> This would be far less information than the mapping table.
>
>     Such a way is not that worth for many cpXXX coding systems.
>     For instance, see this codepage (in codepage.el).
>
> If it doesn't save much for cpXXX coding systems,
> we don't have to use it for them. It will save a large amount
> for the Chinese-based coding systems.
>
>     And, even if we adopt that way, such information is useful
>     only for answering the question "Is this character encodable
>     by this coding-system?". To utilize that information in
>     find-coding-system-region, we must look up such lists for
>     ALL coding systems for all characters in the region.
>
> Not necessarily. That is not the only way to implement it.

Ah, yes. For instance, we can gradually shrink the set of
coding systems to check.
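
As an illustration of the shrinking idea, here is a minimal sketch in Python rather than Emacs Lisp. It is not how find-coding-system-region is implemented; the table INTERVALS, the names "foo", "bar" and "baz", and the ranges in it are all made up for the example. Each coding system gets a sorted list of inclusive (FROM-CHAR, TO-CHAR) ranges, and the candidate set is narrowed once per distinct character, so later characters are checked against fewer lists.

```python
from bisect import bisect_right

# Hypothetical interval tables: coding-system name -> sorted list of
# inclusive (from_char, to_char) ranges. Names and ranges are invented.
INTERVALS = {
    "foo": [(0x00, 0x7F), (0x4E00, 0x4FFF)],
    "bar": [(0x00, 0x7F), (0xA0, 0xFF)],
    "baz": [(0x00, 0xFF), (0x4E00, 0x9FFF)],
}


def encodable(intervals, ch):
    """Return True if code point ch lies inside one of the sorted ranges."""
    # Find the last range whose start is <= ch, then check its end.
    i = bisect_right(intervals, (ch, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= ch <= intervals[i][1]


def find_coding_systems(text, tables=INTERVALS):
    """Return the sorted names of coding systems that cover every character."""
    candidates = set(tables)
    for ch in set(map(ord, text)):  # look at each distinct character once
        # Only the systems that survived so far are consulted again.
        candidates = {name for name in candidates
                      if encodable(tables[name], ch)}
        if not candidates:
            break  # nothing can encode the text; stop early
    return sorted(candidates)
```

With these made-up tables, find_coding_systems("a\u4e2d") keeps "baz" and "foo", because "bar" is dropped as soon as the Han character is seen and is never consulted again.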
> Anyway, if we use this only for the large Han-character coding
> systems, it should not be very slow, and it will get big savings.

For Emacs 21.1, we already preload all the Chinese-based
coding systems. That preloading doesn't require much
memory because, when a coding system supports a specific
Chinese-based charset, it supports all the characters in
that charset, not just part of it.
But for the future (Unicode-based) Emacs, of course, this
is not true. I've just checked how big the interval list
would be for the GB2312 charset. I made a vector of the
form [ (FROM-CHAR . TO-CHAR) ... ] and, as an optimization,
when FROM-CHAR == TO-CHAR, stored FROM-CHAR itself as the
element instead of a cons. Assuming that each number consumes
one word and each cons consumes three words, the vector
consumes roughly 6300 words, which is about 25K bytes. That is
surely smaller than the whole mapping table. If we make such
a list for about 10 Chinese-based charsets, we need about
250K bytes.
What do you think about this amount of memory?
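
The layout and the cost estimate above can be sketched as follows, again in Python rather than Emacs Lisp. The function names compact_intervals and estimated_words are hypothetical, and the sample code points are invented; only the cost model (one word per number, three words per cons) comes from the text. The 4-byte word size is an assumption, chosen because it matches 6300 words being about 25K bytes.

```python
def compact_intervals(code_points):
    """Collapse code points into a list of (from_char, to_char) tuples,
    storing a lone code point as a bare integer instead of a pair."""
    points = sorted(set(code_points))
    result = []
    if not points:
        return result
    start = prev = points[0]
    for cp in points[1:]:
        if cp == prev + 1:
            prev = cp  # still inside the current run
            continue
        # The run ended: emit it, then start a new one at cp.
        result.append(start if start == prev else (start, prev))
        start = prev = cp
    result.append(start if start == prev else (start, prev))
    return result


def estimated_words(vector, words_per_number=1, words_per_cons=3):
    """Apply the cost model from the text: 1 word per number, 3 per cons."""
    return sum(words_per_cons if isinstance(elt, tuple) else words_per_number
               for elt in vector)


BYTES_PER_WORD = 4  # assumption: 32-bit words

sample = compact_intervals([1, 2, 3, 7, 10, 11])
sample_bytes = estimated_words(sample) * BYTES_PER_WORD
```

For the sample input, the result is two ranges and one bare number, which costs 3 + 1 + 3 = 7 words under this model. The same arithmetic gives 6300 * 4 = 25200 bytes for the GB2312 figure quoted above.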
---
Ken'ichi HANDA
address@hidden