bug-grep

Re: [PATCH] fall back to glibc matcher if a MBCSET is found


From: Jim Meyering
Subject: Re: [PATCH] fall back to glibc matcher if a MBCSET is found
Date: Sun, 12 Sep 2010 10:46:30 +0200

Paolo Bonzini wrote:
> On 09/08/2010 11:05 AM, Jim Meyering wrote:
>> Thank you for the patch.
>>
>> If this change really does fix a correctness bug,
>> then it deserves a NEWS entry with enough detail to confirm that,
>> and, if at all possible, a test suite addition.
>
> It fixes equivalence classes (e.g. matching [[=a=]] against à), but
> only --without-included-regex.  See attached patches.
>
> The presence of this check in regex.m4
>
>              if (sizeof (regoff_t) < sizeof (ptrdiff_t)
>                  || sizeof (regoff_t) < sizeof (ssize_t))
>
> unfortunately means that all existing systems will use the inferior
> gnulib regex rather than glibc regex.  In turn, this means that grep
> will nowhere support equivalence classes out-of-the-box.
>
>> Similarly, if it works around a performance problem,
>> it would help me evaluate it if you were to provide evidence.
>
> yes 1234567890123456789012345678901234567890123456789012567890 | \
>   sed 100000q | time ./grep '[a-z]'
>
> shows 0.91s with the patch and 1.21s without.  Since this is not an
> asymptotic improvement, it is hard to test it reliably, and is
> secondary anyway compared to the correctness problem above.

Hi Paolo,

That patch induces a performance *decrease* on at least one system.

Built using --without-included-regex
Run on an idle i920 @ 2.67GHz, kernel 2.6.18-194.11.3.el5PAE, i686:

  yes 1234567890123456789012345678901234567890123456789012567890 \
    | sed 100000q > in
  for i in $(seq 10); do env time --f=%E env LC_ALL=fr_FR.UTF8 \
    ./grep '[a-z]' in;done

  With your patch:

  0:01.76
  0:01.76
  0:01.82
  0:01.77
  0:01.77
  0:01.84
  0:01.76
  0:01.78
  0:01.80
  0:01.80

  without it:

  0:01.71
  0:01.68
  0:01.70
  0:01.73
  0:01.72
  0:01.71
  0:01.70
  0:01.70
  0:01.71
  0:01.70
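
To compare the two runs at a glance, the timings above can be averaged.
The following is only a sketch: `mean_elapsed` is a hypothetical helper
(not part of grep or its test suite), and it assumes every timing has the
m:ss.cc form that GNU time prints for %E on runs shorter than an hour.

```shell
# Hypothetical helper: read one m:ss.cc timing per line on stdin
# and print the mean in seconds. LC_ALL=C keeps "." as the decimal point.
mean_elapsed() {
  LC_ALL=C awk -F: '
    NF == 2 { total += $1 * 60 + $2; n++ }
    END     { if (n) printf "%.3f\n", total / n }
  '
}

# Timings copied from the two runs listed above.
with_patch=$(printf '%s\n' \
  0:01.76 0:01.76 0:01.82 0:01.77 0:01.77 \
  0:01.84 0:01.76 0:01.78 0:01.80 0:01.80 | mean_elapsed)

without_patch=$(printf '%s\n' \
  0:01.71 0:01.68 0:01.70 0:01.73 0:01.72 \
  0:01.71 0:01.70 0:01.70 0:01.71 0:01.70 | mean_elapsed)

echo "with patch:    ${with_patch}s"
echo "without patch: ${without_patch}s"
```

A mean says nothing about noise; with only ten samples per run, it is
also worth checking that the ranges do not overlap, as is the case here
(1.76 to 1.84 versus 1.68 to 1.73).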

Also, on that same system, which happens to run CentOS 5.5 with
glibc-2.5-49.el5_5.4, your new test fails when grep is built with
--without-included-regex.

Sorry, I don't have time to investigate.


