bug-grep

Re: [bug #13859] matching '([A]|[B]){2}' in different locales


From: Stepan Kasal
Subject: Re: [bug #13859] matching '([A]|[B]){2}' in different locales
Date: Thu, 21 Jul 2005 11:25:24 +0200
User-agent: Mutt/1.4.1i

Hello Charles,

On Wed, Jul 20, 2005 at 05:51:14PM -0400, Charles Levert wrote:
> This begs the question:  what does grep's
> built-in DFA actually provide (in terms of
> performance, chiefly) once the regexp code has
> been updated to glibc/gnulib's current version?

Arnold removed the DFA from gawk when he adopted the new regexp code.

Later on, he had to back out this change, because the performance
went down too much, at least for some regexps.

> Is there still a justification for grep's DFA?

So yes, dfa.c is needed, at least for the C (a.k.a. POSIX) locale.

I'm afraid that a lot of work still has to be done on regex.c
before we can drop dfa.c.

> Does the answer depend on the chosen locale?

According to Tim's research, dfa.c doesn't speed things up for UTF-8.
Besides, for locales other than C, performance is not so critical.

So it makes sense to use dfa.c for the C locale only.

But we cannot remove the locale code from dfa.c, since gawk uses our
dfa.c for all locales.

Stepan



