bug-grep
From: Jim Meyering
Subject: bug#19095: [PATCH] grep: grep -F fails to match at the next position after matched middle of a multi-byte character
Date: Thu, 20 Nov 2014 14:53:57 -0800

On Tue, Nov 18, 2014 at 3:18 PM, Norihiro Tanaka <address@hidden> wrote:
> On Tue, 18 Nov 2014 09:26:26 -0800
> Jim Meyering <address@hidden> wrote:
>> Condensing your example, and being careful to run on a system for
>> which such a locale is actually installed (check via "locale -a|grep
>> -i jis"; I had to adjust the locale name on this debian unstable
>> system). Before the patch:
>>
>>   $ printf '\203AA\n'|LC_ALL=ja_JP.SHIFT_JIS src/grep -qF A||echo fail
>>   fail
>>
>> After the patch, it matches and the above command prints nothing.
>>
>> This is a good argument for making the test framework work harder
>> to find a locale like that, and if not found, to suggest how to install
>> it, so the test is not skipped so often.
>
> Thanks for the review.
>
> I tested on CentOS, which does not have a SHIFT_JIS locale by default,
> so I added one before running the test.  However, the name given to
> such a locale can be chosen arbitrarily.
>
> On Linux I also think `ja_JP.SHIFT_JIS' is the most appropriate name,
> but `ja_JP.SJIS' may also be used.  The defaults elsewhere are
> `ja_JP.PCK' on Solaris, `ja_JP.SJIS' or `japanese.SJIS' on HP-UX, and
> `ja_JP.IBM-943' or `ja_JP' on AIX.  On the other hand, there are many
> locales whose names include `jis'.  Furthermore, EUC-JP is also called
> `ja_JP.ujis'.
>
> EUC-JISX0213.gz
> JIS_C6220-1969-JP.gz
> JIS_C6220-1969-RO.gz
> JIS_C6229-1984-A.gz
> JIS_C6229-1984-B-ADD.gz
> JIS_C6229-1984-B.gz
> JIS_C6229-1984-HAND-ADD.gz
> JIS_C6229-1984-HAND.gz
> JIS_C6229-1984-KANA.gz
> JIS_X0201.gz
> SHIFT_JIS.gz
> SHIFT_JISX0213.gz
>
> So it is difficult to determine whether a SHIFT_JIS locale is installed
> on a given machine.
>
> However, `SHIFT-JIS', `SJIS' and `PCK' in particular are likely to be
> used frequently.
> (See http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18983)
>
> By the way, most Japanese users do not use a SHIFT_JIS locale on Linux,
> and the number who do has also been decreasing.
>
>> Did you determine which commit introduced the bug?
>> In this project, we make a point of including that information
>> in the commit log for any bug fix.
>
> Yes, the bug was introduced by commit fb7d53887851476c84f38ecc9a63901d5d620806.

Thanks.  I've added the condensed "git desc" style string,
"v2.18-119-gfb7d538", to the commit log message.
I have also made two additional changes.  The comment just preceding
your change had been deceptively out of date for a while, so I rewrote
it.  I also changed NEWS to emphasize just how much of a hard-to-reach
corner case this really is.
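
To illustrate why the input in the reproduction above is a corner case,
here is a minimal sketch that dumps its bytes.  It assumes a POSIX shell
with od and tr available; dump_bytes is a hypothetical helper name used
only for this illustration and is not part of grep or its test suite.

```shell
# dump_bytes (hypothetical helper): print stdin as a run of hex bytes,
# with all whitespace stripped so the output is the same across od
# implementations.
dump_bytes() {
  od -An -tx1 | tr -d ' \n'
}

# The test input from the thread: octal \203 is 0x83, 'A' is 0x41.
printf '\203AA\n' | dump_bytes   # prints 8341410a
echo
```

In Shift_JIS, 0x83 is a lead byte and 0x41 is a valid trail byte, so
the first two bytes form a single two-byte character.  A candidate
match for "A" at byte offset 1 therefore falls in the middle of that
character, and the only real match is the stand-alone "A" at offset 2.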

Attachment: 0001-grep-F-could-erroneously-fail-to-match-in-non-UTF8-m.patch
Description: Binary data
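
On the question of finding a suitable locale for the test, the
following is a rough sketch of a filter over "locale -a" output that is
narrower than a plain "grep -i jis".  The function name
pick_sjis_locale and the exact pattern are assumptions made for
illustration; the candidate names are only those mentioned in this
thread, not an exhaustive list, and this is not how grep's test
framework actually works.

```shell
# pick_sjis_locale (hypothetical helper): read locale names on stdin,
# one per line as printed by "locale -a", and print the first one that
# looks like a Japanese Shift_JIS locale.
# Matches ja_JP.SHIFT_JIS, ja_JP.SJIS, ja_JP.PCK, ja_JP.IBM-943 and
# japanese.SJIS, case-insensitively, with or without the "_" or "-"
# in SHIFT_JIS.  It deliberately skips ja_JP.ujis (EUC-JP) and bare
# ja_JP, whose encoding differs between systems.
pick_sjis_locale() {
  grep -iE '^(ja_JP|japanese)\.(shift[_-]?jis|sjis|pck|ibm-943)$' | head -n 1
}

# Typical use (prints nothing when no candidate is installed):
#   locale -a | pick_sjis_locale
```

A test could then skip, with a hint on how to install the locale, only
when this prints nothing.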

