From: Artur Malabarba
Subject: Re: Char-folding: how can we implement matching multiple characters as a single "thing"?
Date: Tue, 1 Dec 2015 14:18:30 +0000

All right. For now, I've gone with Paul's suggestion and just made the
algorithm dumber. It won't catch every single scenario, but that's
better than catching none.

I too agree that the ideal approach would be to implement this
entirely in C, but so far we lack the necessary human effort.

There's also a 3rd option. I posted some code here a while ago that
implemented char-folding by temporarily replacing the
(current-case-table) with a char-fold-table. This was fast, and much
nicer than the current regexps, but it had the limitation of only
being a character-to-character relation. So it couldn't do something
as basic as 'a' matching "ä" written as 'a' followed by a combining
diaeresis (because that's 1 char matching 2).
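
To make the limitation concrete, here is a minimal sketch in Python (not
the Emacs Lisp code mentioned above, which is not reproduced here). The
table contents and the names FOLD_TABLE, fold and table_search are
hypothetical, chosen only for illustration. It shows a one-to-one fold
table finding a precomposed "ä" but missing the two-character form.

```python
# Hypothetical one-to-one fold table: each key is a single code point and
# each value is the single base character it folds to.
FOLD_TABLE = {ord("\u00e4"): "a", ord("\u00e9"): "e", ord("\u00f1"): "n"}


def fold(text: str) -> str:
    """Fold text one character at a time, as a case-table lookup would."""
    return text.translate(FOLD_TABLE)


def table_search(needle: str, haystack: str) -> int:
    """Return the index of needle in haystack after folding both, or -1."""
    return fold(haystack).find(fold(needle))


precomposed = "h\u00e4t"   # "ä" as one code point (U+00E4)
decomposed = "ha\u0308t"   # "a" followed by combining diaeresis (U+0308)

print(table_search("hat", precomposed))  # found: the table maps U+00E4 to "a"
print(table_search("hat", decomposed))   # not found: U+0308 has no 1:1 fold
```

The second search fails because the combining mark sits between "a" and
"t", and a character-to-character table has no way to say "skip this".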

However, it's possible that we could combine the two solutions, using
this case-table for as much as possible and then using regexps for
anything else. This way the regexp pattern that replaces each input
character would likely be considerably smaller than 45 chars (I'd
guess between 3 and 15 depending on the character).
The number of branches would still scale badly with the input string
size, but the smaller multiplicative factor should give us more leeway
before scaling up to 10k chars.
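
A rough sketch of the combined idea, again in Python rather than Emacs
Lisp and under stated assumptions: the fold table, the choice of the
U+0300..U+036F combining range, and the names hybrid_pattern and
hybrid_search are all hypothetical. The table handles every one-to-one
equivalence, so the regexp only has to cover the multi-character cases.

```python
import re

# Hypothetical one-to-one fold table (stands in for the case-table trick).
FOLD_TABLE = {ord("\u00e4"): "a", ord("\u00e9"): "e", ord("\u00f1"): "n"}

# Assumption: the only multi-character case handled here is a base
# character followed by marks from the Combining Diacritical Marks block.
COMBINING = "[\u0300-\u036f]*"


def hybrid_pattern(needle: str) -> str:
    """Fold the needle via the table, then let each character absorb marks."""
    folded = needle.translate(FOLD_TABLE)
    return "".join(re.escape(ch) + COMBINING for ch in folded)


def hybrid_search(needle: str, haystack: str):
    """Return the (start, end) span of the first match, or None.

    The table fold is one-to-one, so indices in the folded haystack are
    also valid indices in the original haystack.
    """
    match = re.search(hybrid_pattern(needle), haystack.translate(FOLD_TABLE))
    return match.span() if match else None


print(hybrid_search("hat", "h\u00e4t"))    # precomposed form, via the table
print(hybrid_search("hat", "ha\u0308t"))   # decomposed form, via the regexp
print(len(hybrid_pattern("a")))            # pattern length for one character
```

In this toy version each input character expands to a short fixed
pattern, because the table has already absorbed the single-character
equivalents. A real char-fold table would need more than one
multi-character rule per character, so the actual sizes would differ.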

2015-11-30 21:48 GMT+00:00 John Wiegley <address@hidden>:
>>>>>> Eli Zaretskii <address@hidden> writes:
>> Volunteers are welcome to work on the ultimate solution, which should indeed
>> include normalization of both the search string and the buffer/string text
>> that is searched.
> I imagine this would be done iteratively, with caching of what had been
> normalized if we happen to back-track within a certain bound.
> Any takers for working on the "ultimate solution"?
> John
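
For comparison, the normalization approach described in the quoted text
could look roughly like the Python sketch below. It is only an
illustration: it normalizes the whole text eagerly instead of
iteratively, does no caching, and treats "normalization" as NFD
decomposition with combining marks dropped, which is an assumption. The
function names are hypothetical.

```python
import unicodedata


def normalize_with_offsets(text: str):
    """Return (folded, offsets); offsets[i] is the index in text of the
    character that produced folded[i]."""
    folded, offsets = [], []
    for index, ch in enumerate(text):
        for part in unicodedata.normalize("NFD", ch):
            if not unicodedata.combining(part):  # drop combining marks
                folded.append(part)
                offsets.append(index)
    return "".join(folded), offsets


def normalized_search(needle: str, haystack: str) -> int:
    """Return the index in the original haystack of the first match, or -1."""
    folded_needle, _ = normalize_with_offsets(needle)
    if not folded_needle:
        return 0
    folded_haystack, offsets = normalize_with_offsets(haystack)
    position = folded_haystack.find(folded_needle)
    return -1 if position < 0 else offsets[position]


print(normalized_search("hat", "xx h\u00e4t"))    # precomposed text
print(normalized_search("hat", "xx ha\u0308t"))   # decomposed text
print(normalized_search("h\u00e4t", "hat"))       # accented search string
```

Because both sides are normalized, the search string and the text can
each use either form. The offsets list is what maps a match back to a
position in the unnormalized text.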
