bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
From: Norihiro Tanaka
Subject: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Date: Wed, 05 Mar 2014 08:12:42 +0900
Paul Eggert wrote:
> IIRC it's because a CSET matches any byte, while the corresponding
> MBCSET only matches that byte if it is a single-byte character.
> So for example, say "\x82\x61" is a two-byte character. The CSET "A"
> will match it but the corresponding MBCSET will not.
>
> This can happen in the Shift-JIS encoding.
At first I also thought of such a case. But it is probably not a problem,
because the DFA will never come across a CSET on the second byte of a
Shift_JIS character:
"grep -i A" -> [Aa] -> CSET
"grep -i $"\x82A" -> [$"\x82\x82A"$"\x82\x82"] -> \x82 A CAT \x82 \x82 CAT OR
The latter will never become \x82 [A\x82] -> \x82 CSET CAT.
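
To make the argument concrete, here is a rough Python sketch (not the dfa.c
code). The helper names is_sjis_lead, split_sjis and bracket_to_postfix are
hypothetical, and the expansion rule is only an assumption modelled on the
token sequences shown above: single-byte members collapse into one CSET, and
each two-byte member becomes a byte-by-byte concatenation.

```python
def is_sjis_lead(byte: int) -> bool:
    """True if byte is a Shift_JIS lead byte (first byte of a two-byte character)."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def split_sjis(data: bytes) -> list[bytes]:
    """Split a Shift_JIS byte string into one byte sequence per character."""
    chars, i = [], 0
    while i < len(data):
        step = 2 if is_sjis_lead(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + step])
        i += step
    return chars


def bracket_to_postfix(members: list[bytes]) -> list[str]:
    """Hypothetical expansion of a bracket expression into postfix tokens.

    Assumption for illustration: all single-byte members merge into one CSET,
    every multibyte member is spelled out byte by byte with CAT, and the
    alternatives are joined with OR.
    """
    tokens: list[str] = []
    alternatives = 0
    if any(len(m) == 1 for m in members):
        tokens.append("CSET")
        alternatives += 1
    for m in members:
        if len(m) > 1:
            for k, b in enumerate(m):
                tokens.append(chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02x}")
                if k > 0:
                    tokens.append("CAT")
            alternatives += 1
            if alternatives > 1:
                tokens.append("OR")
    return tokens


# "grep -i A": only single-byte members, so a single CSET.
print(" ".join(bracket_to_postfix([b"A", b"a"])))

# Two two-byte members: the bytes are concatenated, no CSET after \x82.
print(" ".join(bracket_to_postfix([b"\x82A", b"\x82\x82"])))

# The case Paul describes: a raw byte search finds "a" inside "\x82\x61",
# but a character-aligned scan does not.
data = b"\x82\x61"
print(b"a" in data, b"a" in split_sjis(data))
```

In this sketch the second byte of a two-byte character is only ever reached
through an explicit byte token followed by CAT, never through a CSET, which
is the point of the examples above.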