bug-grep


From: Jim Meyering
Subject: Re: Bug#624387: [bug #33198] Incorrect bracket expression when parsing in ru_RU.KOI8-R (Russian locale)
Date: Sat, 04 Jun 2011 09:48:22 +0200

Jim Meyering wrote:

> Paolo Bonzini wrote:
>
>> On 06/02/2011 11:08 PM, Jim Meyering wrote:
>>>   #if MBS_SUPPORT
>>> -      int b2 = wctob ((unsigned char) b);
>>> -      if (b2 == EOF || b2 == b)
>>> +      /* Below, note how when b2 != b and we have a uni-byte locale
>>> +         (MB_CUR_MAX == 1), we set b = b2.  I.e., in a uni-byte locale,
>>> +         we can safely call setbit with a non-EOF value returned by wctob.  */
>>> +      int b2 = wctob (b);
>>> +      if (b2 == EOF || b2 == b || (MB_CUR_MAX == 1 ? (b=b2), 1 : 0))
>>
>> Can you explain again the reason for testing "b2 == EOF"?  It seems
>> wrong, and without it you can just make
>>
>> if (MB_CUR_MAX == 1 || b2 == b)
>>   setbit ((unsigned char) b, c);
>
> Hi Paolo,
>
> Your test would disable DFA-based matching for some bytes in a locale like
> ru_RU.KOI8-R, because a pattern like [\360] leads to "wint_t b" having
> the value 1055 (0x041F), and that is obviously too large to
> be used as the first argument to setbit.  However, converting
> that "B" back to a single-byte value, B2, gives us back \360,
> which is ok to use there.  Hence the "(b=b2)" part of that
> admittedly ugly expression.
>
> The b2 == EOF part is required for the somewhat similar bug I fixed
> a month ago:
>
>     fix a bug whereby echo c|grep '[c]' would fail for any c in 0x80..0xff
>     8da41c930e03a8635cbd8c89e3e591374c232c89
>
> The corresponding test demonstrates the need:
>
>     tests: exercise bug with 0x80..0xff in [...]
>     d98338ebf842ec9b69631837eee50ebdcd543505
>
> Thanks for the feedback.
> If you see a better way, I'm sure you'll let me know.
>
> BTW, seeing your cast, I now think it'd be prudent to
> guard that setbit use:
>
> #if MBS_SUPPORT
>       /* Below, note how when b2 != b and we have a uni-byte locale
>          (MB_CUR_MAX == 1), we set b = b2.  I.e., in a uni-byte locale,
>          we can safely call setbit with a non-EOF value returned by wctob.  */
>       int b2 = wctob (b);
>       if (b2 == EOF || b2 == b || (MB_CUR_MAX == 1 ? (b=b2), 1 : 0))
> #endif
>         if (b < 256)
>           setbit (b, c);

Ahem.

s/256/NOTCHAR/
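
For readers who want to experiment with the condition outside of dfa.c,
here is a minimal standalone sketch of the guarded test with the
s/256/NOTCHAR/ correction applied.  It is not the code from the grep tree:
the function name byte_for_setbit is hypothetical, NOTCHAR is assumed to be
1 << CHAR_BIT, and instead of calling setbit it returns the byte that would
be passed to setbit, or -1 when no byte is safe to use.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Assumption: this mirrors dfa.c's NOTCHAR (number of distinct byte values). */
enum { NOTCHAR = 1 << CHAR_BIT };

/* Hypothetical helper: return the byte that the patched code would hand
   to setbit for wide character B, or -1 if setbit would be skipped.  */
static int
byte_for_setbit (wint_t b)
{
  int b2 = wctob (b);

  /* Same shape as the condition in the patch: accept B when wctob fails,
     when it round-trips unchanged, or, in a uni-byte locale, after
     replacing B with the single byte B2.  */
  if (b2 == EOF || b2 == (int) b || (MB_CUR_MAX == 1 ? (b = b2, 1) : 0))
    /* The extra guard: never index the bit set with a value >= NOTCHAR.  */
    if (b < (wint_t) NOTCHAR)
      return (int) b;

  return -1;
}
```

Per the explanation above, calling this after a successful
setlocale (LC_ALL, "ru_RU.KOI8-R") with b == 0x041F is the case where the
"(b = b2)" branch matters.  That requires the locale to be installed, so it
is left as an exercise rather than asserted here.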


