Re: faster unicode character name completion

From: Kenichi Handa
Subject: Re: faster unicode character name completion
Date: Tue, 08 Dec 2009 10:45:56 +0900

In article <address@hidden>, Stefan Monnier <address@hidden> writes:

> > > I don't understand what ucs-name-filter is trying to do.

> > ?? It simply filters out the elements of NAMES (an alist)
> > that don't match STR.

> But then why is it needed?
> Doesn't `completion-table-dynamic' take care of that already?

I don't know.  The info says this:

 -- Function: completion-table-dynamic function
     This function is a convenient way to write a function that can act
     as programmed completion function.  The argument FUNCTION should be
     a function that takes one argument, a string, and returns an alist
     of possible completions of it.  You can think of
     `completion-table-dynamic' as a transducer between that interface
     and the interface for programmed completion functions.

I thought that FUNCTION should return an alist that contains
ONLY valid completions.
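
A minimal sketch of the question above, not taken from the patch under discussion: `my-demo-names' and `my-demo-table' are hypothetical names, and the lambda deliberately ignores its argument and returns the whole alist.  In the Emacs versions I have looked at, the table returned by `completion-table-dynamic' passes FUNCTION's result through the ordinary completion primitives, so `all-completions' still narrows it to the entries that match the input.

```elisp
;; Hypothetical candidate alist, for illustration only.
(defvar my-demo-names
  '(("LATIN SMALL LETTER A" . #x61)
    ("LATIN SMALL LETTER B" . #x62)
    ("GREEK SMALL LETTER ALPHA" . #x3B1)))

;; FUNCTION ignores the input string and returns every candidate,
;; i.e. it does no filtering of its own.
(defvar my-demo-table
  (completion-table-dynamic
   (lambda (_str) my-demo-names)))

;; The completion primitives filter the unfiltered alist by prefix.
(all-completions "GREEK" my-demo-table)
;; => ("GREEK SMALL LETTER ALPHA")
```

If that holds for the version in use, pre-filtering inside FUNCTION would be an optimization rather than a requirement.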

> But I have a better idea: most of the time is not spent building the
> completion table, but rather just weeding out all the "chars" that don't
> have names, or should I say, looking for the few rare chars that do
> have a name.

> So the patch below seems to be a good compromise: it uses up just about
> 1000 cons cells (i.e. 16KB on 64-bit systems) to keep the precomputed
> set of ~34K chars that do have a name, so that building the completion
> table takes only a couple of seconds.

Ah, interesting approach.  But I've just found that the
dotimes-with-progress-reporter loop in the original code didn't
exclude the big unused range U+30000..U+DFFFF (about 75% of
the range currently checked).  Just excluding that part in
the original code achieves almost the same performance as
your patch.  Attached is that simpler version.

Kenichi Handa

(defun ucs-names ()
  "Return alist of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
  (or ucs-names
      (let ((ranges
             '((#x00000 . #x033FF)
               ;; (#x03400 . #x04DBF) CJK Ideograph Extension A
               (#x04DC0 . #x04DFF)
               ;; (#x04E00 . #x09FFF) CJK Ideograph
               (#x0A000 . #x0D7FF)
               ;; (#x0D800 . #x0FAFF) Surrogate/Private
               (#x0FB00 . #x1FFFF)
               ;; (#x20000 . #xDFFFF) CJK Ideograph Extension B, etc., unused
               (#xE0000 . #xE01EF)))
            c end name names)
        (dolist (range ranges)
          (setq c (car range)
                end (cdr range))
          (while (<= c end)
            (if (setq name (get-char-code-property c 'name))
                (push (cons name c) names))
            (if (setq name (get-char-code-property c 'old-name))
                (push (cons name c) names))
            (setq c (1+ c))))
        (setq ucs-names names))))
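
To get a rough feel for how much scanning the range list above saves, the sketch below counts the code points it covers.  `my-demo-ranges' and `my-demo-count-code-points' are hypothetical helpers, not part of the attached function, and the comparison against a single scan of 0..#xE01EF is an assumption made only for illustration.

```elisp
;; Copy of the scanned ranges from `ucs-names' above.
(defvar my-demo-ranges
  '((#x00000 . #x033FF)
    (#x04DC0 . #x04DFF)
    (#x0A000 . #x0D7FF)
    (#x0FB00 . #x1FFFF)
    (#xE0000 . #xE01EF)))

(defun my-demo-count-code-points (ranges)
  "Return the number of code points covered by RANGES.
RANGES is a list of (START . END) pairs with inclusive bounds."
  (let ((total 0))
    (dolist (range ranges total)
      (setq total (+ total (1+ (- (cdr range) (car range))))))))

(my-demo-count-code-points my-demo-ranges)
;; => 95024
```

A single scan of 0..#xE01EF would visit 918000 code points, so under that assumption the listed ranges cover roughly a tenth of it.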
