more on failing test 'invalid-mb-seq-UMR.sh'
From: Assaf Gordon
Subject: more on failing test 'invalid-mb-seq-UMR.sh'
Date: Sun, 5 Jun 2016 01:16:17 -0400
Hello,
The test 'invalid-mb-seq-UMR.sh' still fails on a few systems even with the
latest update [1].
The test uses valgrind to ensure that an invalid multibyte sequence does not
cause uninitialized memory access,
but it also validates the returned output (after the invalid multibyte sequence
is processed).
The failure seems to be that the returned value does not match the expected
result (so it is not a valgrind error or an invalid memory access).
At least on Mac OS X 10.10, it seems the locale 'ja_JP.eucJP' behaves
differently from the other locales, and also differs from the same locale on
Debian, when presented with invalid input.
The following demonstrates:
On Mac OS X (ja_JP.eucJP results differ):
$ for l in $(locale -a | grep ja_JP) ; do
echo "Locale: $l" ;
echo a | LC_ALL="$l" ./sed/sed 's/a/b\U\xb2c/' | od -tx1co1 ;
done
Locale: ja_JP
0000000 62 b2 43 0a
b ≤ C \n
142 262 103 012
Locale: ja_JP.eucJP
0000000 62 b2 e3 0a
b ≤ „ \n
142 262 343 012
Locale: ja_JP.SJIS
0000000 62 b2 43 0a
b ≤ C \n
142 262 103 012
Locale: ja_JP.UTF-8
0000000 62 b2 43 0a
b ≤ C \n
142 262 103 012
While on Debian 8.4, all locales return the same result:
$ for l in $(locale -a | grep ja_JP) ; do
echo "Locale: $l" ;
echo a | LC_ALL="$l" ./sed/sed 's/a/b\U\xb2c/' | od -tx1co1 ;
done
Locale: ja_JP
0000000 62 b2 43 0a
b 262 C \n
142 262 103 012
Locale: ja_JP.eucjp
0000000 62 b2 43 0a
b 262 C \n
142 262 103 012
Locale: ja_JP.ujis
0000000 62 b2 43 0a
b 262 C \n
142 262 103 012
Locale: ja_JP.utf8
0000000 62 b2 43 0a
b 262 C \n
142 262 103 012
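To repeat this comparison on other systems without reading the od dumps by
eye, something like the following might help. It is only a sketch: SED,
EXPECTED, to_hex and check_locale are names made up for this example, and the
expected bytes are simply the ones Debian printed above.

```shell
#!/bin/sh
# Sketch: flag ja_JP locales whose sed output differs from the expected bytes.
# SED and EXPECTED are assumptions; adjust them to your build tree.
SED=${SED:-./sed/sed}
EXPECTED='62 b2 43 0a'

# Print stdin as hex bytes separated by single spaces, on one line.
to_hex() {
  # Unquoted on purpose: word splitting drops od's padding and newlines.
  set -- $(od -An -tx1)
  echo "$*"
}

# check_locale LOCALE: run the substitution under LOCALE and compare the bytes.
check_locale() {
  got=$(echo a | LC_ALL="$1" "$SED" 's/a/b\U\xb2c/' | to_hex)
  if [ "$got" = "$EXPECTED" ]; then
    echo "ok   $1"
  else
    echo "DIFF $1: got '$got', expected '$EXPECTED'"
  fi
}

for l in $(locale -a | grep ja_JP); do
  check_locale "$l"
done
```

Only the locales reported as DIFF would then need a closer look with the full
od command.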
I'm not sure where the problem is (I'm also not that familiar with EUC
encodings), but I'll continue to investigate.
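For reference, as far as I understand EUC-JP, a two-byte (JIS X 0208)
character needs both bytes in the range 0xA1..0xFE, so 0xb2 followed by 'c'
(0x63) is not a valid character. Below is a small sketch of that range check.
is_eucjp_pair is a hypothetical helper, and the single-byte ASCII form and the
0x8E and 0x8F prefixed forms are deliberately ignored.

```shell
#!/bin/sh
# is_eucjp_pair HEX1 HEX2: succeed if the two bytes (given as hex digits,
# without a 0x prefix) form a valid two-byte EUC-JP (JIS X 0208) character,
# i.e. both bytes are in 0xA1..0xFE (161..254 decimal).
# Assumption: only the plain two-byte form is checked here.
is_eucjp_pair() {
  lead=$((0x$1))
  trail=$((0x$2))
  [ "$lead" -ge 161 ] && [ "$lead" -le 254 ] &&
    [ "$trail" -ge 161 ] && [ "$trail" -le 254 ]
}

if is_eucjp_pair b2 63; then
  echo "b2 63: valid two-byte sequence"
else
  echo "b2 63: invalid two-byte sequence"
fi
```

With this check, "is_eucjp_pair b2 63" fails while "is_eucjp_pair b2 e3"
succeeds. Whether that is related to the Mac OS X output above is something I
have not verified.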
regards,
- assaf
[1] http://git.savannah.gnu.org/cgit/sed.git/commit/?id=49a0f87d9bbc66038de74afb9c25a53cd89a4ec5