Re: size of emacs executable after unicode merge

From: Kenichi Handa
Subject: Re: size of emacs executable after unicode merge
Date: Fri, 31 Oct 2008 14:29:28 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

In article <address@hidden>, "Richard M. Stallman" <address@hidden> writes:

>     If I comment the load_charset_map_from_file call in unify_charset the
>     data segment size is back to normal.

> Although these are loaded "on demand", perhaps something "demands" them
> at build time.

It's not that simple.  Here is the strategy of the charset
map loading mechanism.  I took this approach expecting that
char-tables that are garbage-collected before dumping would
not end up in the dumped file.

(0) First, Emacs assigns a unique linear character code
    space in the upper Unicode area (#x110000-) to each big
    character set (e.g. GB, JIS, KSC) (*see the note at the
    end).  Decoding a character of a specific charset into
    this area is quite fast (done in just a few steps of
    arithmetic calculation).  The same is true of encoding.

(1) While building Emacs, when unify-charset is called, we
    update two char-tables, Vchar_unify_table and
    Vchar_unified_charset_table.  The former maps a
    character in the above upper area to the Unicode area,
    and the latter maps the character to a charset symbol.
    Unify-charset also builds a deunifier char-table for
    each character set that maps a character in the Unicode
    area to the upper area that is unique to each charset.

    So at this time, the full maps are built.

(2) Just before dumping, clear-charset-maps is called.  This
    function sets all char-tables built in (1) (except for
    Vchar_unified_charset_table) to nil.  It then sets
    Vchar_unify_table to Vchar_unified_charset_table, and
    sets Vchar_unified_charset_table to nil.

    Then, garbage-collect is called.  After that, the only
    living char-table is Vchar_unify_table, and its contents
    are not that big because it maps upper area characters
    to charsets, and each charset has a linear upper area,
    so most consecutive characters have the same value.

(3) When the dumped Emacs runs and decodes/encodes charsets
    that are unified as above, it checks whether the value
    of Vchar_unify_table for a character is a symbol or not.
    From that, Emacs knows whether it has to load the
    mapping table again.

    So, that way, Emacs loads maps on demand.
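
The on-demand check in (2) and (3) can be illustrated with a small
Python model.  This is only a sketch under assumptions, not the Emacs
implementation: a dict stands in for Vchar_unify_table, a string
stands in for a charset symbol, and the names make_dumped_table,
unify and loaders are hypothetical.

```python
# Hypothetical model of the unify table; not Emacs's real data structures.

def make_dumped_table(charsets):
    """Build the table as it looks after step (2).

    `charsets` maps a charset name to (start, size) of its linear upper
    area.  Every code in that area maps to the charset name, which
    stands in for a charset symbol.
    """
    table = {}
    for name, (start, size) in charsets.items():
        for code in range(start, start + size):
            table[code] = name
    return table


def unify(table, loaders, code):
    """Return the Unicode code for `code`, loading the map if needed."""
    value = table.get(code)
    if isinstance(value, str):
        # A "symbol" means the map for this charset is not loaded yet.
        charset = value
        mapping = loaders[charset]()
        for c, v in list(table.items()):
            if v == charset:
                # None marks a character with no Unicode equivalent.
                table[c] = mapping.get(c)
        value = table[code]
    # Codes outside the table, or without a mapping, stay as they are.
    return code if value is None else value
```

In this model the first lookup for a charset calls its loader once and
replaces every placeholder for that charset, so later lookups never
load again.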


Note (*): Emacs assigns those linear areas because such big
charsets tend to have their own private use areas, and we
must keep unique character codes for them.  Those private
characters are decoded and encoded without being mapped to
the Unicode area.
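
The "few steps of arithmetic" from (0) might look like the following
Python sketch for a 94x94 charset.  The exact formula and the
constants BASE, ROW and FIRST are assumptions made for illustration;
the real base offset of each charset is not given here.

```python
# Hypothetical values for illustration only.
BASE = 0x110000   # assumed start of this charset's linear upper area
ROW = 94          # a 94x94 charset has 94 code points per row
FIRST = 0x21      # first valid byte value in each dimension


def decode(b1, b2):
    """Map a two-byte charset code to its linear upper-area code."""
    return BASE + (b1 - FIRST) * ROW + (b2 - FIRST)


def encode(code):
    """Inverse of decode: recover the two bytes from the linear code."""
    row, col = divmod(code - BASE, ROW)
    return row + FIRST, col + FIRST
```

Because both directions are plain arithmetic, a private-use character
can round-trip through the upper area without any mapping table.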

Kenichi Handa
