Re: more on failing test ''

From: Assaf Gordon
Subject: Re: more on failing test ''
Date: Fri, 17 Jun 2016 00:06:34 -0400


> On Jun 5, 2016, at 01:16, Assaf Gordon <address@hidden> wrote:
> The test '' still fails on few systems even with the 
> latest update [1].

The search continues...

I think I noticed a strange (wrong?) behavior, specific to Mac OS X (and perhaps 
a few other OSes), with the ja_JP.eucJP and ja_JP.sjis locales.
With these locales, mbrtowc(3) seems to return incorrect results.
Test program attached: it calls mbrtowc(3) to convert a string starting 
with '\262' (=\xB2).
On its own, this byte is invalid in UTF-8, but it is a valid ja_JP.shiftjis character.
I haven't yet found an authoritative answer as to whether it is a valid 
ja_JP.eucJP character, but I suspect it is not:

  $ env printf '\262' | iconv -f EUC-JP -t UTF-16BE
  iconv: (stdin):1:0: incomplete character or shift sequence

Tested with:
    gcc -o test-ilseq test-ilseq.c
    for l in $(locale -a | grep 'ja_JP\.') ; do
       echo LOCALE=$l ; LC_ALL=$l ./test-ilseq
    done

On Ubuntu 14.04, results seem correct:

  test-ilseq: mbrtowc failed (n=-1): Invalid or incomplete multibyte or wide 
  mbtowc returned 1, wc = 65394 / ff72
  test-ilseq: mbrtowc failed (n=-1): Invalid or incomplete multibyte or wide 

On Mac OS X, the results are strange:
1. The conversion succeeds in 'eucJP', and also produces 2 characters.
This is one source of the failed test in sed (,
as it consumes 1 byte from the input string and produces two bytes.

2. The conversion is incorrect in 'SJIS': it should return the two-byte value
0xFF72, not the one-byte value 0xB2 (which is just copied from the input).

  mbtowc returned 2, wc = 45795 / b2e3
  mbtowc returned 1, wc = 178 / b2
  test-ilseq: mbrtowc failed (n=-1): Illegal byte sequence

A solution might be to:
1. change the locale tested in '' to ja_JP.UTF-8
2. use gnulib's mbrtowc() in such cases (though this is quite hard to detect if 
the system doesn't have ja_JP.eucJP locales).

to be continued,
 - assaf

Attachment: test-ilseq.c
Description: Binary data
