bug-gnu-utils

gawk: dfa bug


From: KIMURA Koichi
Subject: gawk: dfa bug
Date: Mon, 01 Aug 2005 09:07:55 +0900

Hi,

I think I have found a bug in the dfa code of gawk.

Situation:
In a Japanese Shift_JIS locale, half-width katakana in a character class
are not matched correctly.

Reproduce:
LANG=ja_JP.SJIS
export LANG
echo ABCDE | sed -ne '/[A-E]\+/p'

Here A, B, C, D and E stand for half-width katakana characters.
(Data to reproduce this is appended at the end of this mail as uuencoded SJIS data.)
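
For readers who cannot decode the attachment, the input line can be generated directly. This is only a sketch: it assumes A to E stand for the half-width katakana with Shift_JIS bytes 0xB1 to 0xB5, which is what the character class in the attached script appears to decode to. The names sample and write_sample are made up for illustration.

```c
#include <stdio.h>

/* Assumed sample: Shift_JIS (JIS X 0201) bytes 0xB1..0xB5, the half-width
   katakana A, I, U, E, O, followed by a newline. */
static const unsigned char sample[] = { 0xB1, 0xB2, 0xB3, 0xB4, 0xB5, '\n' };

/* Write the sample line to out; returns the number of bytes written. */
static size_t write_sample(FILE *out)
{
    return fwrite(sample, 1, sizeof sample, out);
}
```

Calling write_sample(stdout) from a small main and piping the result into the command above, in place of echo, should feed it the same bytes.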

Result:
Nothing is printed.

I guess the patch below solves this problem, but I'm not confident
that it has no side effects in other environments.

regards,

--- dfa.c.orig  2005-05-12 00:28:14.000000000 +0900
+++ dfa.c       2005-07-31 22:32:08.000000000 +0900
@@ -2890,7 +2900,8 @@ dfaexec (struct dfa *d, char const *begi
            {
              remain_bytes
                = mbrtowc(inputwcs + i, begin + i, end - begin - i + 1, &mbs);
-             if (remain_bytes <= 1)
+             if (remain_bytes < 1
+                  || (remain_bytes == 1 && inputwcs[i] == (wchar_t)begin[i]))
                {
                  remain_bytes = 0;
                  inputwcs[i] = (wchar_t)begin[i];
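
The difference between the two conditions in the patch can be isolated in a few lines. This is a minimal sketch, not code from dfa.c: the helper names old_is_raw_byte, new_is_raw_byte and probe are hypothetical, and it assumes remain_bytes is a signed int, so that the error returns of mbrtowc become negative.

```c
#include <string.h>
#include <wchar.h>

/* Pre-patch test: any mbrtowc result <= 1 is handled as a raw byte. */
static int old_is_raw_byte(int remain_bytes)
{
    return remain_bytes <= 1;
}

/* Patched test: a one-byte character is handled as a raw byte only when
   its wide-character value equals the byte value. */
static int new_is_raw_byte(int remain_bytes, wchar_t wc, char byte)
{
    return remain_bytes < 1 || (remain_bytes == 1 && wc == (wchar_t)byte);
}

/* Decode the first character of s in the current locale and report which
   path each test would take.  Returns the mbrtowc result as an int
   (assumption: (size_t)-1 and (size_t)-2 become -1 and -2). */
static int probe(const char *s, size_t len, int *old_raw, int *new_raw)
{
    mbstate_t mbs;
    wchar_t wc = 0;
    int n;

    memset(&mbs, 0, sizeof mbs);
    n = (int) mbrtowc(&wc, s, len, &mbs);
    *old_raw = old_is_raw_byte(n);
    *new_raw = new_is_raw_byte(n, wc, s[0]);
    return n;
}
```

For an ASCII byte both tests agree. For a half-width katakana byte such as 0xB1 under ja_JP.SJIS, mbrtowc would be expected to return 1 with a wide character that differs from the byte value (U+FF71 where wchar_t holds Unicode code points, as in glibc), so only the old test takes the raw-byte path and overwrites the converted character.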



begin 644 testkana.sh
M<V5T($Q!3D<]:F%?2E`N4TI)4PIE>'!O<address@hidden;F]T('!R:6YT"F5C!
;:&address@hidden;address@hidden"!G87=K("<O6[$MM5TK+R<*H
``
end
size 72

-- 
KIMURA Koichi




