Re: [Qemu-trivial] [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages

qemu-trivial

From:	Philippe Mathieu-Daudé
Subject:	Re: [Qemu-trivial] [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages
Date:	Wed, 21 Nov 2018 12:39:38 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0

On 20/11/18 23:01, Eric Blake wrote:

[adding Markus in CC, since git didn't do it automatically from the 'Reported-by']
On 11/20/18 3:28 PM, John Snow wrote:
On 11/20/18 3:36 PM, Eric Blake wrote:
While most developers are now using UTF-8 environments, it's
harder to guarantee that error messages will be output to
a multibyte locale. Rather than risking error messages that
get corrupted into mojibake when the user runs qemu in a
non-multibyte locale, let's stick to straight ASCII error
messages, rather than assuming that our use of UTF-8 in source
code string constants will work unchanged in other locales.

Reported-by: Markus Armbruster <address@hidden>
Signed-off-by: Eric Blake <address@hidden>
---
  hw/misc/tmp105.c | 2 +-
  hw/misc/tmp421.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)
Do we have any policy in place to prohibit this in the future?
(Presumably a policy that is automatic and won't interfere with QEMU
localization efforts which may rightly attempt to use UTF-8 for those
locales.)
Not that I know of.
Do you have a script or trick to find utf-8 containing strings in our
source?
Markus found these two, probably by reading over a list resulting from his claim of finding 217 out of 6455 files (53 of them binary, which don't count):
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg04017.html

My quick and dirty attempt, which does not quite reproduce his numbers:

$ LC_ALL=C git grep -l $'[\x80-\xff]' | wc
     279     279    7490
Thus, by forcing a unibyte locale (where encoding errors are impossible) with sane range expressions (POSIX says only the C locale is required to interpret regex ranges according to byte value - all bets are off in other locales) and using $'' to type non-UTF-8 bytes into my search, I found 279 files with at least one byte outside of ASCII. But the use of -l has no easy way to filter which of those files are binary; while dropping -l claims 2138 "lines" with non-ASCII, which gets tedious to scroll through, especially considering there ARE binary files in the mix.
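A hedged aside on the binary-file problem: git grep also accepts -I, which skips files that git itself treats as binary, so it can be combined with -l. The sketch below is only an illustration and not something proposed in this thread. The function name is made up, and the byte range is built with printf octal escapes rather than $'' so that it also works in shells that lack that quoting extension.

```shell
# Hypothetical helper (not part of QEMU): list tracked text files that
# contain at least one byte outside the ASCII range.

# Bracket expression for bytes 0x80-0xff, written with POSIX printf
# octal escapes so the sketch does not depend on $'' quoting.
NON_ASCII=$(printf '[\200-\377]')

list_non_ascii_text_files() {
    # LC_ALL=C makes the range match by byte value.
    # -I skips files git considers binary; -l prints file names only.
    # Optional arguments are passed through as pathspecs.
    LC_ALL=C git grep -I -l -e "$NON_ASCII" -- "$@"
}
```

Run from inside a checkout, `list_non_ascii_text_files '*.c' '*.h'` would limit the listing to C sources and headers. Whether git's notion of "binary" matches what one wants to exclude here is an assumption worth checking against the actual tree.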
Narrowing the search to a more specific pattern:

$ LC_ALL=C git grep $'".*[\x80-\xff].*"' | grep -v 'Binary file' | wc
     129     685    8808
is a bit more manageable, with MOST of the hits in pc-bios/qemu.rsrc (false positive hits, due to interesting? comments), in po/ (which doesn't count), or in scripts/ for python. And the proof for THIS patch:

$ LC_ALL=C git grep -l $'".*[\x80-\xff].*"' origin -- '**/*.[ch]' | cat
origin:hw/misc/tmp105.c
origin:hw/misc/tmp421.c


Can we add the last 3 lines in the commit message?


Only curious, don't hold this patch up on my account. I'm not raising a
challenge.


Maybe checkpatch.pl could be taught to do a similar check?


It looks easier in shell than perl...

We could add a checkpatch.sh which finally calls 'exec -l checkpatch.pl $@' or similar?
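
To make the idea concrete, here is a minimal sketch of what such a wrapper might look like. Everything in it is an assumption: no checkpatch.sh exists in the tree, the function names are invented, the pattern only looks at added ('+') lines of a patch, and it assumes being run from the top of a QEMU checkout where scripts/checkpatch.pl lives.

```shell
# Hypothetical checkpatch.sh sketch; not an existing QEMU script.

# Succeeds (exit 0) if the patch on stdin adds a line that has a
# byte in the range 0x80-0xff between two double quotes.
patch_adds_non_ascii_string() {
    # Pattern built with printf octal escapes; LC_ALL=C makes the
    # bracket range match by byte value.
    pattern=$(printf '^[+].*".*[\200-\377].*"')
    LC_ALL=C grep -q -e "$pattern"
}

# Check one patch file, then hand over to the Perl script.
# Assumption: the current directory is the top of a QEMU checkout.
check_patch() {
    if patch_adds_non_ascii_string < "$1"; then
        echo "ERROR: non-ASCII byte in a string literal: $1" >&2
        return 1
    fi
    exec scripts/checkpatch.pl "$@"
}
```

A real wrapper would end with something like `check_patch "$@"`. The line-based pattern is deliberately crude: it cannot tell a string literal from two unrelated quotes on one line, so it may report false positives.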


Reviewed-by: Philippe Mathieu-Daudé <address@hidden>

Current Thread

[Qemu-trivial] [PATCH] misc: Avoid UTF-8 in error messages, Eric Blake, 2018/11/20
- Re: [Qemu-trivial] [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages, John Snow, 2018/11/20
  - Re: [Qemu-trivial] [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages, Eric Blake, 2018/11/20
    - Re: [Qemu-trivial] [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages, John Snow, 2018/11/20
    - Re: [Qemu-trivial] [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages, Philippe Mathieu-Daudé, 2018/11/21
    - Re: [Qemu-trivial] [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages, Markus Armbruster, 2018/11/21
- Re: [Qemu-trivial] [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages, Thomas Huth, 2018/11/21
- Re: [Qemu-trivial] [PATCH] misc: Avoid UTF-8 in error messages, Laurent Vivier, 2018/11/22