bug-gnu-utils

Re: major gawk bug


From: Stepan Kasal
Subject: Re: major gawk bug
Date: Tue, 8 Jun 2004 15:47:10 +0200
User-agent: Mutt/1.4.1i

Hello Arnold,

On Tue, Jun 08, 2004 at 04:07:45PM +0300, Aharon Robbins wrote:
> RE_ICASE is not the answer. That flag is for use when the regex is
> *compiled*.  It would be a performance killer to have to recompile every
> regex if IGNORECASE changed since the last time the regex was compiled.

RE_ICASE is the answer: this bit is in fact part of the regex specification,
so it is fair that it is compiled in.

I believe that if the user changes IGNORECASE frequently, he is asking
for the regexes to be recompiled.  So the "normal" behaviour is to
recompile them.  We can use some tricks to improve performance, but that
is an "advanced" optimization.
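
To make the cost of that "normal" behaviour concrete, here is a minimal
sketch.  It assumes the POSIX regcomp()/regexec() interface with the
REG_ICASE flag, not gawk's own GNU regex calls or the RE_ICASE syntax
bit.  struct cached_re, ensure_compiled() and cached_match() are
hypothetical names, not code from re.c.  A regex is recompiled only
when IGNORECASE differs from the value it was last compiled with, so
every toggle of IGNORECASE costs one regcomp().

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry: one compiled pattern plus the
   IGNORECASE setting it was compiled with. */
struct cached_re {
    const char *pattern;   /* source text of the regex */
    regex_t     compiled;  /* meaningful only when `valid` is true */
    bool        valid;     /* does `compiled` hold a live regex_t? */
    bool        icase;     /* IGNORECASE value used at compile time */
    unsigned    compiles;  /* how many times regcomp() succeeded */
};

/* Recompile only if IGNORECASE changed since the last compilation.
   Returns 0 on success or a regcomp() error code. */
static int ensure_compiled(struct cached_re *re, bool ignorecase)
{
    assert(re != NULL && re->pattern != NULL);
    if (re->valid && re->icase == ignorecase)
        return 0;                   /* cached version is still correct */
    if (re->valid) {                /* stale variant: throw it away */
        regfree(&re->compiled);
        re->valid = false;
    }
    int flags = REG_EXTENDED | REG_NOSUB | (ignorecase ? REG_ICASE : 0);
    int rc = regcomp(&re->compiled, re->pattern, flags);
    if (rc == 0) {
        re->valid = true;
        re->icase = ignorecase;
        re->compiles++;
    }
    return rc;
}

/* True if `text` matches, honouring the current IGNORECASE value. */
static bool cached_match(struct cached_re *re, bool ignorecase,
                         const char *text)
{
    if (ensure_compiled(re, ignorecase) != 0)
        return false;
    return regexec(&re->compiled, text, 0, NULL, 0) == 0;
}
```

A script that flips IGNORECASE on every record would pay one compilation
per flip, which is exactly the price the user asked for.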

> Believe me, if regexec() had a flag to ignore case, I'd be using it.

If I were to implement such a feature in regexec, I would basically make
room for two compiled versions, compile both at regcomp time, and use the
appropriate one.

gawk already compiles certain regexes twice, if they are specified as RS or
FS.  I proposed an optimization: compile on demand (so in most cases only one
version is compiled), but you didn't accept it.

For "normal" regexes, I propose the same: keep two compiled versions, one
case-sensitive and one case-insensitive.  From a performance point of
view, it is important to compile them on demand.
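
The two-version proposal can be sketched as follows.  This again assumes
POSIX regcomp()/regexec() with REG_ICASE instead of gawk's GNU regex
calls, and struct dual_re, dual_get(), dual_match() and dual_free() are
hypothetical names.  Each regex has two slots, and a slot is compiled
the first time a match is requested under that IGNORECASE setting.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: both case variants of one regex, compiled lazily. */
struct dual_re {
    const char *pattern;
    regex_t     re[2];        /* [0] case-sensitive, [1] case-insensitive */
    bool        compiled[2];  /* which slots hold a live regex_t */
};

/* Return the variant for the current IGNORECASE value, compiling it
   the first time it is needed; NULL if regcomp() fails. */
static regex_t *dual_get(struct dual_re *d, bool ignorecase)
{
    assert(d != NULL && d->pattern != NULL);
    int i = ignorecase ? 1 : 0;
    if (!d->compiled[i]) {
        int flags = REG_EXTENDED | REG_NOSUB | (ignorecase ? REG_ICASE : 0);
        if (regcomp(&d->re[i], d->pattern, flags) != 0)
            return NULL;
        d->compiled[i] = true;
    }
    return &d->re[i];
}

/* True if `text` matches under the given IGNORECASE value. */
static bool dual_match(struct dual_re *d, bool ignorecase, const char *text)
{
    regex_t *r = dual_get(d, ignorecase);
    return r != NULL && regexec(r, text, 0, NULL, 0) == 0;
}

/* Release whichever variants were actually compiled. */
static void dual_free(struct dual_re *d)
{
    for (int i = 0; i < 2; i++) {
        if (d->compiled[i]) {
            regfree(&d->re[i]);
            d->compiled[i] = false;
        }
    }
}
```

With this layout, toggling IGNORECASE costs at most two compilations
per regex over the whole run, and a script that never sets IGNORECASE
compiles only the case-sensitive variant.  Compiling both variants
eagerly at regcomp time would be simpler, but it pays for variants that
are never used.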

I think all of this functionality belongs in re.c, most of it inside
research().
Once it is there, the FS and RS code in field.c and io.c can take
advantage of it as well.  (Of course, field.c and io.c still have to
take care of the problems that might arise from the "deferred parsing"
optimization.)

Yours,
        Stepan
