bug-grep

bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine


From: Norihiro Tanaka
Subject: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Date: Wed, 05 Mar 2014 08:12:42 +0900

Paul Eggert wrote:
> IIRC it's because a CSET matches any byte, while the corresponding
> MBCSET only matches that byte if it is a single-byte character.
> So for example, say "\x82\x61" is a two-byte character.  The CSET "A"
> will match it but the corresponding MBCSET will not.
> 
> This can happen in the Shift-JIS encoding.
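To make the concern in the quoted text concrete, here is a minimal Python sketch. It assumes the standard Shift_JIS lead-byte ranges (0x81-0x9F and 0xE0-0xEF; CP932 extends the second range to 0xFC). The helper names (sjis_chars, cset_matches, mbcset_matches) are hypothetical. They only model the difference between a byte-wise CSET test and a character-aware MBCSET test, and are not grep's actual DFA code.

```python
# Hypothetical model, not grep's dfa.c. Assumes standard Shift_JIS lead bytes.

CSET_Aa = {ord("A"), ord("a")}  # bytes a CSET for [Aa] would accept


def is_sjis_lead(b: int) -> bool:
    # Standard Shift_JIS lead-byte ranges (CP932 widens 0xE0-0xEF to 0xFC).
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def sjis_chars(data: bytes) -> list[bytes]:
    """Split a Shift_JIS byte string into characters (hypothetical helper)."""
    chars, i = [], 0
    while i < len(data):
        width = 2 if is_sjis_lead(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + width])
        i += width
    return chars


def cset_matches(data: bytes) -> list[int]:
    """Offsets where a CSET-style byte test succeeds; ignores char boundaries."""
    return [i for i, b in enumerate(data) if b in CSET_Aa]


def mbcset_matches(data: bytes) -> list[int]:
    """Offsets of single-byte characters in [Aa]; respects char boundaries."""
    hits, offset = [], 0
    for ch in sjis_chars(data):
        if len(ch) == 1 and ch[0] in CSET_Aa:
            hits.append(offset)
        offset += len(ch)
    return hits


text = b"\x82\x61"           # one two-byte character whose trail byte is 'a'
print(cset_matches(text))    # [1]: false match on the trail byte
print(mbcset_matches(text))  # []
```

The byte-wise test reports a hit inside the two-byte character, while the character-aware test does not. This is the discrepancy Paul describes. Whether it matters depends on whether the DFA can ever test a CSET at a trail-byte position, which is what the reply below addresses.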

At first, I also thought of such a case.  But perhaps it's not a problem,
because the DFA will never come across a CSET on the second byte of a
character in Shift_JIS.

  "grep -i A" -> [Aa] -> CSET
  "grep -i $"\x82A" -> [$"\x82\x82A"$"\x82\x82"] -> \x82 A CAT \x82 \x82 CAT OR

The latter will never become \x82 [A\x82] -> \x82 CSET CAT.
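The argument above can be modeled with a small Python sketch. It is hypothetical and is not how grep's dfa.c is written. FOLDS is a made-up case-fold table that only encodes the two variants given in the example, and compile_char_i and lead_then_cset are illustrative names. Under these assumptions, a single-byte character under -i becomes one CSET. A multibyte character is spelled out as literal bytes joined by CAT and OR, so the sequence \x82 CSET CAT never arises.

```python
# Hypothetical sketch of `grep -i` tokenization per pattern character.
# NOT grep's dfa.c; FOLDS is a made-up fold table taken from the example.

FOLDS = {b"\x82A": [b"\x82A", b"\x82\x82"]}


def is_sjis_lead(b: int) -> bool:
    # Standard Shift_JIS lead-byte ranges (CP932 widens 0xE0-0xEF to 0xFC).
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def byte_token(b: int) -> str:
    # Render non-ASCII bytes as \xNN, ASCII bytes as themselves.
    return f"\\x{b:02x}" if b >= 0x80 else chr(b)


def compile_char_i(ch: bytes) -> list[str]:
    """Postfix tokens for one case-insensitive pattern character."""
    if len(ch) == 1:
        # Single-byte character: both cases go into one CSET, e.g. [Aa].
        c = chr(ch[0])
        return ["CSET[" + "".join(sorted({c.upper(), c.lower()})) + "]"]
    tokens = []
    for k, variant in enumerate(FOLDS.get(ch, [ch])):
        # Multibyte variant: literal bytes joined by CAT, never a CSET.
        tokens += [byte_token(variant[0]), byte_token(variant[1]), "CAT"]
        if k > 0:
            tokens.append("OR")  # alternatives are joined with OR
    return tokens


def lead_then_cset(tokens: list[str]) -> bool:
    """True if a lead-byte literal is immediately followed by a CSET token."""
    return any(
        a.startswith("\\x") and is_sjis_lead(int(a[2:], 16)) and b.startswith("CSET")
        for a, b in zip(tokens, tokens[1:])
    )


print(compile_char_i(b"A"))                     # ['CSET[Aa]']
print(" ".join(compile_char_i(b"\x82A")))       # \x82 A CAT \x82 \x82 CAT OR
print(lead_then_cset(compile_char_i(b"\x82A")))  # False
```

In this model a CSET only ever stands for a whole single-byte character, so it is never tested on the second byte of a Shift_JIS character.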





