Re: Character group folding in searches

From: Stefan Monnier
Subject: Re: Character group folding in searches
Date: Mon, 09 Feb 2015 11:33:29 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

> I guess I'm still struggling to understand your idea of using DFAs.
> E.g., you talk about each node of a DFA being a char-table, but AFAIK
> a DFA node is just a state of the automaton, so how can that be

A DFA node is a state with labeled arcs going out to other states.

It's usually implemented as a "table" (array, hash-table, char-table, ...)
that maps the labels to the next state.
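
[ A minimal sketch of that idea, in Python rather than Elisp to keep it
  short.  Plain dicts stand in for char-tables, and the names ACCEPT,
  make_node and run are made up for illustration; none of this is existing
  Emacs code.  ]

```python
# Each DFA node is a plain dict mapping a label (one character) to the next node.
# ACCEPT is a hypothetical marker key that flags an accepting state.
ACCEPT = object()

def make_node(accepting=False):
    """Create a node with no outgoing arcs, optionally marked as accepting."""
    node = {}
    if accepting:
        node[ACCEPT] = True
    return node

def run(start, text):
    """Follow the arc labeled by each character of text.

    Return True only if every character had an arc and the walk ends
    in an accepting state.
    """
    node = start
    for ch in text:
        node = node.get(ch)
        if node is None:
            return False
    return ACCEPT in node

# A three-state automaton that accepts exactly the string "=>".
s2 = make_node(accepting=True)
s1 = make_node()
s1[">"] = s2
s0 = make_node()
s0["="] = s1
```

With these definitions, run(s0, "=>") succeeds, while run(s0, "=") stops in
a non-accepting state and run(s0, "=>>") fails for lack of an outgoing arc.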

Does it make more sense, now?

>> But how do you use current char-tables to handle multi-char input
>> entities (i.e. to recognize things like "=>")?
> I don't understand the question, sorry.  The simple answer is that a
> char-table entry can be any Lisp object, including a string, but you
> already know that.

That doesn't tell me how you'd use it.  Would the ?= char be mapped to
a list of strings (one of them being "=>") and then you'd check if the
next (few) chars match one of those strings?
What I suggest is to map the ?= char to another char-table which then
maps the ?> char to (say) ?⇒.
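
[ Roughly like the following sketch, again in Python with nested dicts
  standing in for char-tables.  TABLE and fold are hypothetical names, and
  the "longest match wins, otherwise copy the char unchanged" policy is my
  assumption, not something settled in this thread.  ]

```python
# Hypothetical nested table: "=" maps to a second table, which maps ">" to "⇒".
# A str value is a result; a dict value means "keep reading input".
TABLE = {"=": {">": "⇒"}}

def fold(text, table=TABLE):
    """Replace every multi-char sequence known to table by its result.

    At each position, walk the nested tables as far as the input allows
    and remember the longest sequence that reached a result.
    """
    out = []
    i = 0
    while i < len(text):
        node = table
        j = i
        result = None
        end = i
        while j < len(text) and isinstance(node, dict) and text[j] in node:
            node = node[text[j]]
            j += 1
            if isinstance(node, str):
                result, end = node, j
        if result is None:
            # No known sequence starts here: copy the character as is.
            out.append(text[i])
            i += 1
        else:
            out.append(result)
            i = end
    return "".join(out)
```

Here fold("a => b") gives "a ⇒ b", while a lone "=" is left untouched
because its second-level table has no entry for the char that follows.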

> If you mean how to compare "=>" with "⇒", then the latter will be
> "folded" to the former using a char-table,

[ I always get confused by this terminology since "folding" to me
  implies making things smaller, so I'd call it "unfolding" in that
  direction.  ]

> and then the results will be compared, either as strings or character
> by character.  Is this what you were asking?

But how would this handle an equivalence class that includes both "=>"
and "->"?

>> > Who and how will create such a DFA?
>> They'd be mechanically constructed (by hand-written code), for example
>> driven by the existing Unicode tables.
> What would be the input language for specifying such a DFA?  I mean,
> how would we specify which sequences of states are acceptable (yielding
> a match for the search) and which aren't?

Depends.  For the Unicode-defined equivalence classes, we'd use the
Unicode tables directly and build the DFA nodes from them without going
through some intermediate "specification".

For other cases, we could specify the DFA with a list of strings.
Or with regular expressions.
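
[ For the list-of-strings case, a sketch in Python could look like this.
  The names FINAL, build_dfa and canonical are invented for illustration,
  the class grouping "⇒", "=>" and "->" is only a hypothetical example, and
  picking the first member as the canonical form is an arbitrary choice.  ]

```python
# Hypothetical key under which a node stores the canonical form of its class.
# Arc labels are always characters, so None cannot collide with them.
FINAL = None

def build_dfa(classes):
    """Build nested dict tables from equivalence classes.

    classes is a list of lists of strings.  Every member of a class ends in
    a node that records the class's first member as its canonical form.
    """
    root = {}
    for members in classes:
        canonical_form = members[0]
        for s in members:
            node = root
            for ch in s:
                node = node.setdefault(ch, {})
            node[FINAL] = canonical_form
    return root

def canonical(dfa, s):
    """Return the canonical form of s, or None if s is in no class."""
    node = dfa
    for ch in s:
        node = node.get(ch)
        if node is None:
            return None
    return node.get(FINAL)

# One hypothetical class whose members all count as the same thing.
dfa = build_dfa([["⇒", "=>", "->"]])
```

Both canonical(dfa, "=>") and canonical(dfa, "->") then yield "⇒", so two
strings match when their canonical forms are equal, and a prefix such as
"=" belongs to no class.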

