Re: size of emacs executable after unicode merge
From: Kenichi Handa
Subject: Re: size of emacs executable after unicode merge
Date: Fri, 31 Oct 2008 14:29:28 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
In article <address@hidden>, "Richard M. Stallman" <address@hidden> writes:
> If I comment the load_charset_map_from_file call in unify_charset the
> data segment size is back to normal.
> Although these are loaded "on demand", perhaps something "demands" them
> at build time.
It's not that simple. Here is the strategy of the charset
map loading mechanism. I took this approach expecting that
char-tables that are garbage-collected before dumping would
not end up in the dumped file.
(0) First, Emacs assigns each big character set (e.g. GB,
JIS, KSC) its own linear character code space in the
area above Unicode (#x110000-) (*see the note at the
end). Decoding a character of such a charset into this
area is quite fast (just a few steps of arithmetic),
and so is encoding.
(1) While building Emacs, when unify-charset is called, we
update two char-tables, Vchar_unify_table and
Vchar_unified_charset_table. The former maps a
character in the above upper area to the Unicode area, and
the latter maps the character to a charset symbol.
Unify-charset also builds a deunifier char-table for each
character set, which maps a character in the Unicode area to
the upper area that is unique to that charset.
So at this point, the full maps are built.
(2) Just before dumping, clear-charset-maps is called. This
function sets all char-tables built in (1) (except for
Vchar_unified_charset_table) to nil. It then sets
Vchar_unify_table to Vchar_unified_charset_table, and
sets Vchar_unified_charset_table to nil.
Then garbage-collect is called. After that, the only living
char-table is Vchar_unify_table, and its contents are
not that big: it maps upper-area characters to their
charset, and each charset has a linear upper area, so
most consecutive characters have the same value.
(3) When the dumped Emacs runs and decodes or encodes
charsets that are unified as above, it checks whether the
value of Vchar_unify_table for a character is a symbol,
and from that Emacs knows whether it has to load the
mapping table again.
So, in this way, Emacs loads maps on demand.
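
To make steps (1) to (3) concrete, here is a toy model in Python.
It is only a sketch of the idea, not the code in Emacs: the class,
the map data, the offset and the plain dicts standing in for
char-tables are all made up for illustration.

```python
# Toy model of the build / clear / reload cycle in steps (1)-(3).
# Every name and value here is hypothetical; dicts stand in for char-tables.

MIN_CODE = 0x2121                       # first code point of the toy charset
OFFSETS = {"toy-charset": 0x110000}     # start of its linear upper area
# Stand-in for a charset map file: charset code point -> Unicode.
MAP_FILES = {"toy-charset": {0x2121: 0x3000, 0x2122: 0x3001, 0x2123: 0x3002}}


class ToyCharsetMaps:
    def __init__(self):
        self.unify_table = {}            # upper char -> Unicode (or charset name)
        self.unified_charset_table = {}  # upper char -> charset name
        self.deunifiers = {}             # charset name -> {Unicode -> upper char}
        self.load_count = 0              # how many times a map was read

    def load_map(self, charset):
        """Step (1): read the map and fill the tables."""
        self.load_count += 1
        base = OFFSETS[charset]
        deunifier = {}
        for code, unicode_char in MAP_FILES[charset].items():
            upper = base + (code - MIN_CODE)
            self.unify_table[upper] = unicode_char
            if self.unified_charset_table is not None:
                self.unified_charset_table[upper] = charset
            deunifier[unicode_char] = upper
        self.deunifiers[charset] = deunifier

    def clear_maps(self):
        """Step (2): keep only the char -> charset table, as unify_table."""
        self.deunifiers = {}
        self.unify_table = self.unified_charset_table
        self.unified_charset_table = None

    def to_unicode(self, upper):
        """Step (3): a charset name in the table means 'reload the map'."""
        value = self.unify_table.get(upper)
        if isinstance(value, str):
            self.load_map(value)
            value = self.unify_table.get(upper)
        return value
```

In this model, calling load_map at "build time" and then clear_maps
leaves only charset names in unify_table. The first to_unicode call
for a character of that charset reads the map again, and later calls
find the Unicode value directly.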
*Note:
The reason Emacs assigns those linear areas is that such
big charsets tend to have their own private use area, and we
must keep unique character codes for them. Those private
characters are decoded and encoded without being mapped to
the Unicode area.
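
As an illustration of the "few steps of arithmetic" in (0), here is
a small Python sketch. It assumes a 94x94 two-byte charset whose
bytes both lie in 0x21..0x7E, and a made-up base offset; the real
offsets and the real code in Emacs may well differ.

```python
# Sketch of linear decoding/encoding for a 94x94 charset.
# BASE is a hypothetical offset, not the value Emacs actually uses.

BASE = 0x110000          # assumed start of this charset's linear area
LOW, HIGH = 0x21, 0x7E   # valid range for each byte of a 94x94 charset
WIDTH = HIGH - LOW + 1   # 94 positions per row


def decode(code_point):
    """Map a two-byte charset code point to a char in the upper area."""
    row, col = code_point >> 8, code_point & 0xFF
    if not (LOW <= row <= HIGH and LOW <= col <= HIGH):
        raise ValueError("code point outside the 94x94 range")
    return BASE + (row - LOW) * WIDTH + (col - LOW)


def encode(char):
    """Inverse of decode: map an upper-area char back to the code point."""
    index = char - BASE
    if not 0 <= index < WIDTH * WIDTH:
        raise ValueError("character outside this charset's linear area")
    row, col = divmod(index, WIDTH)
    return ((row + LOW) << 8) | (col + LOW)
```

Because every position in the 94x94 grid gets a slot, positions
that a charset uses as its private use area also get a unique
character code, with no Unicode mapping needed.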
---
Kenichi Handa
address@hidden