Re: size of emacs executable after unicode merge

From: Kenichi Handa
Subject: Re: size of emacs executable after unicode merge
Date: Mon, 10 Nov 2008 10:59:27 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

In article <address@hidden>, Chong Yidong <address@hidden> writes:

> Kenichi Handa <address@hidden> writes:
> > The problem is that lisp/international/characters.el sets up
> > the syntax-table and category-table for many characters using
> > map-charset-chars.
> >
> > Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)
> >
> > To know which (Unicode) characters belong to
> > chinese-gb2312, Emacs has to load a mapping table.

> Could you try to describe what needs to be done in more detail?  That
> way, even if you don't have time to implement this, someone else might
> be able to take a stab at it.

map-charset-chars calls FUNCTION (modify-category-entry in
the above case) on all characters in CHARSET.  But to know
which characters belong to CHARSET (chinese-gb2312 in the
above case), we must consult
"etc/charsets/GB2312.map".  Its contents look something like:

0x2121-0x2123 0x3000
0x2124 0x30FB
0x2125 0x02C9

From this file, we know that #x3000, #x3001, #x3002, #x30FB,
#x02C9, ... belong to chinese-gb2312.
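
To illustrate how those map lines expand into Unicode code points,
here is a minimal sketch in Python (not Emacs code).  The function
name parse_map_lines is hypothetical, and the handling of "#"
comments and blank lines is an assumption about the file format; the
only thing taken from the sample above is that a "FROM-TO" range maps
to consecutive code points starting at the given Unicode value.

```python
def parse_map_lines(lines):
    """Yield (charset_code, unicode_code_point) pairs from map-file lines.

    Hypothetical helper for illustration only.  Each line is either
    "CODE UNICODE" or "FROM-TO UNICODE"; a range maps to consecutive
    Unicode code points starting at UNICODE.
    """
    for line in lines:
        # Assumption: '#' starts a comment, and blank lines are ignored.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        code_field, unicode_field = line.split()
        base = int(unicode_field, 16)
        if "-" in code_field:
            first, last = (int(part, 16) for part in code_field.split("-"))
        else:
            first = last = int(code_field, 16)
        for offset in range(last - first + 1):
            yield first + offset, base + offset


sample = ["0x2121-0x2123 0x3000", "0x2124 0x30FB", "0x2125 0x02C9"]
pairs = list(parse_map_lines(sample))
unicode_chars = [unicode_cp for _, unicode_cp in pairs]
```

With the three sample lines, unicode_chars holds #x3000, #x3001,
#x3002, #x30FB and #x02C9, in that order.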

We must find a way to make map-charset-chars work without
loading that map into a char-table.

One idea is to have a single boolean vector of size #x110000
(139264 bytes) and set it up for CHARSET every time we call
map-charset-chars for a different charset.  In that
vector, only the bits for #x3000, #x3001, #x3002, etc. are 1
for chinese-gb2312.  Then map-charset-chars can know for
which characters FUNCTION must be called.
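
The idea can be sketched roughly as follows, again in Python rather
than in Emacs's own C or Lisp.  The class CharsetBitmap and its
methods are hypothetical names; the sketch only shows one reusable
vector of #x110000 bits that is cleared and refilled for each
charset, then scanned to decide where FUNCTION is called.

```python
MAX_CHAR = 0x110000  # number of bits: one per possible character code


class CharsetBitmap:
    """One reusable bit vector (hypothetical; for illustration only)."""

    def __init__(self):
        # 0x110000 bits / 8 = 139264 bytes, allocated once and reused.
        self.bits = bytearray(MAX_CHAR // 8)

    def load(self, code_points):
        """Clear the vector, then set the bit of each member character."""
        self.bits[:] = bytes(len(self.bits))
        for cp in code_points:
            self.bits[cp >> 3] |= 1 << (cp & 7)

    def contains(self, cp):
        return bool(self.bits[cp >> 3] & (1 << (cp & 7)))

    def map_chars(self, function):
        """Call FUNCTION on every character whose bit is 1."""
        for byte_index, byte in enumerate(self.bits):
            if byte == 0:
                continue  # skip eight absent characters at once
            for bit in range(8):
                if byte & (1 << bit):
                    function(byte_index * 8 + bit)


bitmap = CharsetBitmap()
bitmap.load([0x3000, 0x3001, 0x3002, 0x30FB, 0x02C9])
visited = []
bitmap.map_chars(visited.append)
```

Note that in this sketch the characters are visited in Unicode code
point order, not in the order of the charset's own code space, and
the per-character mapping back to charset codes is not kept.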

Kenichi Handa
