emacs-devel
Re: idn.el and confusables.txt


From: Eli Zaretskii
Subject: Re: idn.el and confusables.txt
Date: Sat, 14 May 2011 17:38:11 +0300

> From: Ted Zlatanov <address@hidden>
> Date: Sat, 14 May 2011 08:40:48 -0500
> 
> EZ> You see, the uni-*.el files we create out of the Unicode DB are not
> EZ> used anywhere in application code, AFAIK.  We use them to display
> EZ> character properties in the likes of "C-u C-x =", and that's it.  I'm
> EZ> not even sure they are organized in a way that makes them useful.
> 
> markchars.el could use other Unicode properties if people ask.

I'm talking about the details.  In the tables we currently generate in
uni-*.el, many of the values are symbols.  For example:

  (get-char-code-property ?1 'general-category) => Nd
  (get-char-code-property ?א 'bidi-class) => R
  (get-char-code-property ?\( 'mirrored) => Y

The `Nd', `R', and `Y' are symbols.

Now, suppose you wanted to use these values in some code that needs to
be fast -- how would you feel about having to write multi-branch
`cond' forms to compare the value against all the possibilities?

For bidi reordering, which runs in the innermost loop of the display
engine, using the `bidi-class' or `mirrored' properties that are
symbols would be prohibitively expensive.
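
To make the concern concrete, here is a minimal sketch of what a Lisp
caller ends up writing when the table values are symbols.  The helper
name `my-bidi-strong-direction' is hypothetical and exists only for
this illustration; only `get-char-code-property' and the `bidi-class'
property come from Emacs.

```elisp
;; Hypothetical helper, for illustration only: every call performs a
;; table lookup and then a chain of symbol comparisons.
(defun my-bidi-strong-direction (char)
  "Return `ltr', `rtl', or nil for CHAR, based on its `bidi-class'."
  (let ((class (get-char-code-property char 'bidi-class)))
    (cond ((eq class 'L) 'ltr)              ; strong left-to-right
          ((memq class '(R AL)) 'rtl)       ; strong right-to-left
          (t nil))))                        ; weak or neutral class

(my-bidi-strong-direction ?a)        ; Latin letter
(my-bidi-strong-direction ?\u05D0)   ; HEBREW LETTER ALEF
```

This handles only the strong classes; a full classifier would need a
branch for each remaining bidi class, which is the multi-branch `cond'
in question.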

For now, with markchars.el, all you need is a boolean value for each
character.  However, in other use cases, some other Lisp code will
want the paired character.  Yet another application will want to
compare characters such that confusable pairs will compare equal.  Can
a single table satisfy all these needs efficiently?  Maybe it can, but
we need to design that table carefully.
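
As a rough sketch of one possible design (not a proposal for the
actual layout), a single char-table could map each confusable
character to a prototype character, and the three uses above could be
derived from it.  All the `my-confusable...' names are made up for
this example, and the two entries are entered by hand rather than
generated from confusables.txt.

```elisp
;; Illustrative only: a char-table whose value for a confusable
;; character is the character it can be confused with, or nil.
(defvar my-confusables-table (make-char-table 'my-confusables)
  "Char-table mapping a confusable character to its prototype.")

(aset my-confusables-table #x0430 ?a)   ; CYRILLIC SMALL LETTER A
(aset my-confusables-table #x03BF ?o)   ; GREEK SMALL LETTER OMICRON

(defun my-confusable-p (char)
  "Return t if CHAR has an entry in the table (the boolean use)."
  (and (aref my-confusables-table char) t))

(defun my-confusable-prototype (char)
  "Return the paired character for CHAR, or CHAR itself if none."
  (or (aref my-confusables-table char) char))

(defun my-confusable-char-equal (c1 c2)
  "Return t if C1 and C2 are equal once confusables are folded."
  (= (my-confusable-prototype c1) (my-confusable-prototype c2)))
```

Note that in this layout ?a itself has no entry, so `my-confusable-p'
does not flag it even though it is half of a confusable pair.  Whether
that is acceptable depends on the application, which is the kind of
question that should be settled before the table is generated.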

> But specifically regarding the ones I'm proposing for inclusion,
> since we've started using the GNU ELPA more and markchars.el lives
> in it, we can put uni-confusables.el and uni-idn.el in the GNU ELPA
> instead of the Emacs trunk.

I'm not arguing about where to put them.  I'm saying that for such
basic infrastructure, we should consider the possible uses before we
rush into implementation.  Otherwise, we will repeat the mistake we
made with uni-bidi.el: the only real user of bidirectional
properties, the display engine, cannot use it!

> EZ> So I'd really like to avoid introducing yet another huge table whose
> EZ> only effects are to show one more property in "C-u C-x =" and bloat
> EZ> the ELisp manual some more.
> 
> IMO it's not a huge table

??? It's a char-table that can be indexed by any character supported
by Emacs.  Even if you count only the characters mentioned in
confusables.txt, there are 20 thousand of them.  char-tables are
memory-efficient, but their footprint is not negligible.

The bloat may be insignificant by comparison, but if the _only_ useful
effect is the bloat, why should we do that?

> Also the char-table doesn't have to
> cover the Asian confusables--I'm not sure anyone would need those.

Well, the Unicode consortium definitely thought they were needed.
Either we follow established standards, or we don't.



