gawk: dfa bug
From: KIMURA Koichi
Subject: gawk: dfa bug
Date: Mon, 01 Aug 2005 09:07:55 +0900
Hi,
I think I have found a bug in the dfa code of gawk.
Situation:
In the Japanese Shift_JIS locale, half-width katakana inside a character
class (bracket expression) do not match correctly.
Reproduce:
LANG=ja_JP.SJIS
export LANG
echo ABCDE | sed -ne '/[A-E]\+/p'
Here A, B, C, D and E stand for half-width katakana characters.
(The data to reproduce this is appended at the end of this mail as uuencoded SJIS data.)
Result:
Nothing is printed.
I think the patch below solves this problem, but I am not confident
that it has no side effects in other environments.
regards,
--- dfa.c.orig 2005-05-12 00:28:14.000000000 +0900
+++ dfa.c 2005-07-31 22:32:08.000000000 +0900
@@ -2890,7 +2900,8 @@ dfaexec (struct dfa *d, char const *begi
{
remain_bytes
= mbrtowc(inputwcs + i, begin + i, end - begin - i + 1, &mbs);
- if (remain_bytes <= 1)
+ if (remain_bytes < 1
+ || (remain_bytes == 1 && inputwcs[i] == (wchar_t)begin[i]))
{
remain_bytes = 0;
inputwcs[i] = (wchar_t)begin[i];
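
The condition changed by the patch can be sketched in isolation as follows. This is only an illustration: the helper names are hypothetical and do not exist in dfa.c, the mbrtowc result is assumed to be kept in a signed int, and wchar_t is assumed to hold Unicode code points, so that the Shift_JIS byte 0xB1 (half-width katakana A) decodes to U+FF71.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helpers for illustration; neither exists in dfa.c.
   nbytes is the mbrtowc result, assumed to be stored in a signed int so
   that the error returns (size_t) -1 and (size_t) -2 become negative.  */

/* Original test: any result of at most one byte takes the plain-byte
   branch, where the decoded wide character is overwritten.  */
bool
old_treats_as_plain_byte (int nbytes, wchar_t wc, char byte)
{
  (void) wc;
  (void) byte;
  return nbytes <= 1;
}

/* Patched test: a one-byte result takes the plain-byte branch only when
   the decoded wide character equals the byte value.  */
bool
new_treats_as_plain_byte (int nbytes, wchar_t wc, char byte)
{
  return nbytes < 1 || (nbytes == 1 && wc == (wchar_t) byte);
}

void
check_katakana_case (void)
{
  /* ASCII 'A' decodes to the same value, so both tests agree.  */
  assert (old_treats_as_plain_byte (1, L'A', 'A'));
  assert (new_treats_as_plain_byte (1, L'A', 'A'));

  /* Assumed decoding: Shift_JIS byte 0xB1 -> U+FF71.  */
  char kana_byte = (char) 0xB1;
  wchar_t kana_wc = (wchar_t) 0xFF71;
  assert (old_treats_as_plain_byte (1, kana_wc, kana_byte));
  assert (!new_treats_as_plain_byte (1, kana_wc, kana_byte));

  /* Error returns stay on the plain-byte path in both versions.  */
  assert (old_treats_as_plain_byte (-1, 0, kana_byte));
  assert (new_treats_as_plain_byte (-1, 0, kana_byte));
}
```

Under these assumptions the old test discards the decoded value of a one-byte katakana, while the patched test keeps it and leaves ASCII and error handling unchanged.
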
begin 644 testkana.sh
M<V5T($Q!3D<]:F%?2E`N4TI)4PIE>'!O<address@hidden;F]T('!R:6YT"F5C!
;:&address@hidden;address@hidden"!G87=K("<O6[$MM5TK+R<*H
``
end
size 72
--
KIMURA Koichi