Re: major gawk bug
From: Aharon Robbins
Subject: Re: major gawk bug
Date: Wed, 9 Jun 2004 15:08:49 +0300

I'm glad my patches work. I may send you some further patches
for testing.
Code using tolower() is marginally slower for things like:

    BEGIN {
        IGNORECASE = 1
        for (i = 1; i < 10000000; i++)
            val += ("ONE STRING" == "one string")
        print val
    }

I have a fast machine, making it hard for me to judge whether the difference
is worth keeping the current code. I need to think about it some more.
I do believe that just using RE_ICASE will work, and I will probably make
that the main solution for re.c.
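
To illustrate the idea of letting the regex engine do the case folding instead of supplying a translate table, here is a minimal sketch. It is not gawk's re.c code: gawk uses the GNU regex interface, where the flag is RE_ICASE, while this sketch swaps in the analogous POSIX regcomp() flag REG_ICASE because it is more widely available. The function name matches_ignoring_case is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of gawk.
 * Returns 1 if text matches pattern ignoring case, 0 if it does not,
 * and -1 if the pattern fails to compile.
 */
int matches_ignoring_case(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_ICASE asks the regex library itself to ignore case,
       so no caller-supplied translate table is needed. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, matches_ignoring_case("one string", "ONE STRING") reports a match, and the case mapping follows whatever the C library does for the current locale.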
I am also concerned about portability issues; while GLIBC tolower() is
highly functional etc, GLIBC and Linux are not my entire customer base. :-)
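
For readers weighing the two approaches, the following sketch contrasts a per-character tolower() call with a lookup in a table filled once from tolower(). All names here (fold_table, fold_table_init, casecmp_tolower, casecmp_table) are hypothetical and are not gawk's identifiers; the sketch also assumes single-byte characters only.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical case-folding table, indexed by unsigned char value. */
static unsigned char fold_table[UCHAR_MAX + 1];

/* Fill the table from the C library once, e.g. after setlocale(). */
void fold_table_init(void)
{
    int c;

    for (c = 0; c <= UCHAR_MAX; c++)
        fold_table[c] = (unsigned char) tolower(c);
}

/* Compare len bytes, calling tolower() for every byte. */
int casecmp_tolower(const char *a, const char *b, size_t len)
{
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < len; i++) {
        int ca = tolower((unsigned char) a[i]);
        int cb = tolower((unsigned char) b[i]);

        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    return 0;
}

/* Compare len bytes using the precomputed table (one array load per byte). */
int casecmp_table(const char *a, const char *b, size_t len)
{
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < len; i++) {
        unsigned char ca = fold_table[(unsigned char) a[i]];
        unsigned char cb = fold_table[(unsigned char) b[i]];

        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    return 0;
}
```

Both functions give the same answers once fold_table_init() has run; the difference is only whether the mapping is looked up locally or through the C library on each byte, which is where any speed or portability difference between platforms would show up.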
Arnold
> Date: Wed, 9 Jun 2004 15:20:54 +0400
> From: Stanislav Ievlev <address@hidden>
> To: Aharon Robbins <address@hidden>
> Cc: Stepan Kasal <address@hidden>, address@hidden
> Subject: Re: major gawk bug
>
> Hello,
>
> On Tue, Jun 08, 2004 at 06:59:48PM +0300, Aharon Robbins wrote:
> > > I believe the right fix for regexes is to use the RE_ICASE flag instead
> > > of the translate table.
> > > The hard-coded table is also used in gawk for various case-insensitive
> > > comparisons; these should be replaced by a call to tolower().
> > > The hard-coded table should be then removed.
> >
> > I have some tentative changes in place that work this way. It passes
> > `make check'. I am still concerned about performance, especially
> > the use of tolower().
> >
> > If you or Mr. Ievlev can test them and give me some feedback, let
> > me know and I'll send them to you.
> Arnold, your patch works well.
> (little improvement:
> - if (strcmp(cp, "C") == 0 || strcmp(cp, "POSIX") == 0)
> + if (!cp || strcmp(cp, "C") == 0 || strcmp(cp, "POSIX") == 0)
> )
>
> As I understand, we also have a solution with toupper()/tolower() functions.
>
> I agree with Stepan that these functions are already well optimized in
> glibc. The solution with toupper()/tolower() is better, because currently we
> have two translation tables (one in glibc and a second in gawk) and copy one
> to the other during initialization (load_ignorecase), which looks strange.
>
> If gawk's algorithms interpret the contents of these two tables identically,
> it's easy to replace one with the other.
>
> --
> With best regards
> Stanislav Ievlev
>
> ALT Linux Team.