emacs-devel

Re: Character group folding in searches


From: Eli Zaretskii
Subject: Re: Character group folding in searches
Date: Sat, 07 Feb 2015 17:31:57 +0200

> From: Stefan Monnier <address@hidden>
> Cc: address@hidden, address@hidden
> Date: Sat, 07 Feb 2015 10:02:52 -0500
> 
> To me the simplest option is to have a DFA which returns an integer
> (this integer being "the equivalence class number", and which will
> usually be one of the characters in the equivalence class).
> 
> Each DFA node could be a char-table.  So if all equivalence classes are
> made up of single chars, the DFA collapses to just a plain-old
> char-table mapping chars to the canonical element of their
> equivalence classes.  For 2-char elements, we'll arrange for the
> entry for the first char (in the main char-table) to be not an integer
> but another char-table.  Being a DFA, this could easily handle complex
> elements (matching arbitrary regular expressions), tho whether we'd make
> much use of this particular feature is not very important.
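
For illustration only, the nested-table structure described above might
be sketched as follows.  This is Python, not Emacs code: a char-table is
modelled as a plain dict, and the names TABLE, fold_step and fold, as
well as the None key used as a node's fallback value, are hypothetical
choices made for this sketch.

```python
# Hypothetical model: a "char-table" is a dict keyed by characters.
# An int entry is the equivalence class number (a canonical char).
# A dict entry is another DFA node that looks at the next character.
# Inside a nested node, the None key holds the class to return when
# no further transition applies (an assumption of this sketch).
TABLE = {
    "\u00e9": ord("e"),            # precomposed e-acute -> class of "e"
    "e": {
        "\u0301": ord("e"),        # "e" + combining acute -> class of "e"
        None: ord("e"),            # plain "e" followed by anything else
    },
}

def fold_step(table, text, pos):
    """Return (class_number, next_pos) for the element starting at pos."""
    node, i = table, pos
    while i < len(text) and text[i] in node:
        entry = node[text[i]]
        i += 1
        if isinstance(entry, int):
            return entry, i        # reached an accepting leaf
        node = entry               # descend into the nested table
    if None in node:
        return node[None], i       # nested node's fallback class
    return ord(text[pos]), pos + 1 # unmapped char is its own class

def fold(table, text):
    """Map text to its sequence of equivalence class numbers."""
    out, pos = [], 0
    while pos < len(text):
        cls, pos = fold_step(table, text, pos)
        out.append(cls)
    return out
```

With this table, fold(TABLE, "e\u0301t\u00e9") and fold(TABLE, "ete")
yield the same sequence, so the two strings compare equal after folding.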

I'm sorry, I don't understand how this will solve the use-cases
brought up in this thread.  Can you explain?

The use-cases I have in mind are:

  . exact match -- only exactly the same codepoints match

  . base-character match -- this ignores any combining marks,
    diacriticals, etc.

  . matching ligatures, such as ffi and ﬃ

  . ignoring punctuation, like string-collate-equalp does,
    i.e. "foobar" will match "foo.bar"

  . ignoring isolated zero-width or non-combining marks and
    directional controls

I understand very well how these can be handled by several different
char-tables, but you seem to say that a single char-table can do all
this, and I don't see how.
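
To make the "several different tables" reading concrete, here is a
rough sketch with one folding function per use-case.  It is Python
built on the standard unicodedata module, not Emacs code; the function
names are hypothetical, and the Unicode properties used are only
approximations of the behaviors listed above (for instance, NFKD folds
much more than ligatures).

```python
import unicodedata

def fold_exact(s):
    # Exact match: compare codepoints unchanged.
    return s

def fold_base(s):
    # Base-character match: decompose canonically, then drop
    # characters with a non-zero combining class.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def fold_ligatures(s):
    # Compatibility decomposition expands U+FB03 to "ffi"
    # (it also folds other compatibility characters).
    return unicodedata.normalize("NFKD", s)

def fold_punctuation(s):
    # Drop characters whose general category is punctuation (P*).
    return "".join(c for c in s
                   if not unicodedata.category(c).startswith("P"))

def fold_ignorable(s):
    # Drop format characters (category Cf), which include the
    # zero-width joiners and the directional marks.
    return "".join(c for c in s if unicodedata.category(c) != "Cf")

def matches(fold, a, b):
    # Two strings match under a mode if their folded forms are equal.
    return fold(a) == fold(b)
```

Each mode needs its own mapping here, e.g. matches(fold_punctuation,
"foobar", "foo.bar") holds while matches(fold_exact, "foobar",
"foo.bar") does not.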

Also, what does DFA have to do with all this?

> Since some of the nodes in the DFA would likely only handle a very few
> chars specially, we could later improve the representation so that those
> nodes don't use up a whole char-table.

Now I'm completely confused: char-tables don't need this optimization,
as you well know: they already are space-efficient for storing
characters that map to the table's default value.  So I probably
misunderstand your whole idea, if it does need such an optimization.

> PS: And this same kind of "char-table extended into a DFA" could be
> useful for syntax-tables in order to provide much more flexible support
> for multi-character comment markers or "paren-like nested elements".

If that's your itch to scratch, I'm impatiently waiting for patches ;-)
