sed-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] sed: fix minor multibyte parsing bug


From: Jim Meyering
Subject: Re: [PATCH] sed: fix minor multibyte parsing bug
Date: Sat, 25 Jun 2016 10:06:27 -0600

On Wed, Jun 22, 2016 at 6:45 PM, Assaf Gordon <address@hidden> wrote:
> Hello,
>
> The attached patch fixes two somewhat obscure bugs in parsing of multibyte 
> characters under very specific conditions.
>
> Sadly it seems that "ja_JP.SJIS" locale under Mac OS X is buggy, and it 
> treats invalid sequences like '\203\060' as valid, so one test fails on Mac 
> OS X (similar to the situation with 'invalid-mb-sea-umr.sh').
>
> Perhaps a more robust solution is to add a detailed M4 configuration file to 
> detect valid Japanese locales, and also pull-in gnulib's mbrtowc replacement.

Hi Assaf,
Thank you for finding/fixing those bugs!
Since they are bugs, this patch deserves a NEWS entry, which requires
to identify (where possible and not too hard to do) which older
versions are affected.

Please also make these syntactic adjustments:

  - In prose (comments, documentation), use the so-called active
voice, not the "passive voice", i.e., make this change in at least two
comments:
     - * Returns non-zero if 'ch' is...
     + * Return non-zero if CH is...
    Note how that change also converts 'ch' to CH: this is from the
gnu coding guidelines: refer to variable names using ALL_CAPS.

  - please stick with the GNU formatting style, i.e.,
    - space-before-function-open-paren: s/ IS_MB_CHAR(/ IS_MB_CHAR (/
    - in a function definition, put its type and name on separate
lines: i.e., change "int is_mb_char(" to "int\nis_mb_char ("

I agree: I would like to skip (warning of broken OS X locale) any test
like mb-charclass-non-utf8, and think we can do that without resorting
to compile+run tests. This prints 1 with the broken locale (both with
coreutils' wc and with /usr/bin/wc), yet prints 2 as one would expect
on linux-based systems:

  printf '\203\060'|LC_ALL=ja_JP.eucJP wc -m



reply via email to

[Prev in Thread] Current Thread [Next in Thread]