Re: [PATCH v2] decodetree: Open files with encoding='utf-8'

On Fri, Jan 8, 2021 at 10:58 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
>
> On Fri, Jan 08, 2021 at 07:09:52PM +0100, Philippe Mathieu-Daudé wrote:
> > When decodetree.py was added in commit 568ae7efae7, QEMU was
> > using Python 2 which happily reads UTF-8 files in text mode.
> > Python 3 requires either a UTF-8 locale or an explicit encoding
> > passed to open(). Now that Python 3 is required, make the
> > UTF-8 encoding explicit for decodetree source files.
> >
> > To avoid further problems with the user locale, also make the
> > UTF-8 encoding explicit for the generated C files.
> >
> > Make it explicit that both input and output are plain text by
> > using the 't' mode.
>
> I believe the 't' is unnecessary. But it's harmless and makes it
> more explicit.
>
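For reference on the 't' point: text mode is already the default for open() in Python 3, so 'r' and 'rt' behave the same. A minimal sketch that checks this with a throwaway temporary file (the file contents are illustrative only):

```python
import io
import os
import tempfile


def read_both_ways(path):
    """Open the same file with 'r' and 'rt'; return (type, text) for each."""
    with open(path, 'r', encoding='utf-8') as f_r, \
            open(path, 'rt', encoding='utf-8') as f_rt:
        return (type(f_r), f_r.read()), (type(f_rt), f_rt.read())


# Write a small UTF-8 sample containing a non-ASCII character.
fd, path = tempfile.mkstemp(suffix='.decode')
with os.fdopen(fd, 'wb') as out:
    out.write('# Philippe Mathieu-Daudé\n'.encode('utf-8'))

try:
    r_result, rt_result = read_both_ways(path)
finally:
    os.remove(path)

print(r_result == rt_result)            # True
print(r_result[0] is io.TextIOWrapper)  # True
```

Both calls return the same wrapper type and decode the same text, so the 't' only documents intent.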
> >
> > This fixes:
> >
> > $ /usr/bin/python3 scripts/decodetree.py test.decode
> > Traceback (most recent call last):
> >   File "scripts/decodetree.py", line 1397, in <module>
> >     main()
> >   File "scripts/decodetree.py", line 1308, in main
> >     parse_file(f, toppat)
> >   File "scripts/decodetree.py", line 994, in parse_file
> >     for line in f:
> >   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> >     return codecs.ascii_decode(input, self.errors)[0]
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: ordinal not in range(128)
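The traceback can be reproduced without decodetree. Byte 0xc3 is the lead byte of a two-byte UTF-8 sequence (for example, 'é' is 0xc3 0xa9), which the ASCII codec rejects. A minimal sketch, assuming an illustrative sample string rather than the actual test.decode contents:

```python
import locale

# 'é' (as in "Daudé") is encoded as the two bytes 0xc3 0xa9 in UTF-8.
DATA = 'Mathieu-Daudé'.encode('utf-8')


def decode_with(encoding):
    """Decode DATA with the given codec; return the text or an error summary."""
    try:
        return DATA.decode(encoding)
    except UnicodeDecodeError as exc:
        bad = exc.object[exc.start]  # the offending byte, as an int
        return f'error: byte 0x{bad:02x} at {exc.start}: {exc.reason}'


# open() without encoding= uses the locale's preferred encoding. Under a
# C/POSIX locale this can be ASCII, which is what the traceback hit.
print('locale default:', locale.getpreferredencoding(False))
print(decode_with('ascii'))  # error: byte 0xc3 at 12: ordinal not in range(128)
print(decode_with('utf-8'))  # Mathieu-Daudé
```

Passing encoding='utf-8' to open() makes the second behaviour the only one, independent of the user's locale.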
> >
> > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
> However:
>
> > ---
> > v2: utf-8 output too (Peter)
> > explicit default text mode.
> > ---
> > scripts/decodetree.py | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> > index 47aa9caf6d1..d3857066cfc 100644
> > --- a/scripts/decodetree.py
> > +++ b/scripts/decodetree.py
> > @@ -1304,7 +1304,7 @@ def main():
> >
> >      for filename in args:
> >          input_file = filename
> > -        f = open(filename, 'r')
> > +        f = open(filename, 'rt', encoding='utf-8')
> >          parse_file(f, toppat)
> >          f.close()
> >
> > @@ -1324,7 +1324,7 @@ def main():
> >      prop_size(stree)
> >
> >      if output_file:
> > -        output_fd = open(output_file, 'w')
> > +        output_fd = open(output_file, 'wt', encoding='utf-8')

I misunderstood the cause; this is a better way.

> >      else:
> >          output_fd = sys.stdout
>
> This will still use the user locale encoding for sys.stdout. Can
> be solved with:
>
> output_fd = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
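A minimal sketch of how the two output paths could look together, assuming a hypothetical helper open_output() (not part of decodetree.py). It uses an in-memory stream in place of a real ASCII-only console:

```python
import io
import sys


def open_output(output_file=None, stdout=None):
    """Return a text stream that writes UTF-8 regardless of the locale.

    Hypothetical helper: mirrors the patch for named files and the
    TextIOWrapper suggestion for stdout.
    """
    if output_file:
        return open(output_file, 'wt', encoding='utf-8')
    if stdout is None:
        stdout = sys.stdout
    # Wrap the underlying binary buffer so the locale encoding is bypassed.
    return io.TextIOWrapper(stdout.buffer, encoding='utf-8')


# Simulate an ASCII-only console: a text stream over an in-memory buffer.
raw = io.BytesIO()
ascii_console = io.TextIOWrapper(raw, encoding='ascii')

out = open_output(stdout=ascii_console)
out.write('Mathieu-Daudé')  # writing this to ascii_console directly would raise
out.flush()
print(raw.getvalue())  # b'Mathieu-Daud\xc3\xa9'
```

One caveat: when such a wrapper is closed or garbage-collected, it also closes the buffer it wraps, i.e. the real stdout. On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') changes the encoding in place instead.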

For output to the console/terminal, I suggest using:

sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding=sys.stdout.encoding, errors="ignore")

That way, even when the console/terminal encoding cannot represent some character
in the decodetree file, the script won't fail. A failure of that kind cannot be
fixed by other means.

errors="ignore" is important: in my experience, there can be characters that
cannot be represented even in UTF-8.
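To illustrate what errors= changes, here is a minimal sketch using in-memory streams rather than a real console (the helper name write_through is hypothetical). One concrete example of a character that UTF-8 cannot encode is a lone surrogate code point, which is included below:

```python
import io


def write_through(text, encoding, errors='strict'):
    """Write text through a TextIOWrapper; return the bytes produced,
    or the exception name if encoding fails."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    return raw.getvalue()


# A console that cannot show 'é': strict fails, 'ignore' drops it.
print(write_through('Daudé', 'ascii'))            # UnicodeEncodeError
print(write_through('Daudé', 'ascii', 'ignore'))  # b'Daud'
# A lone surrogate cannot be encoded even as UTF-8.
print(write_through('bad\ud800', 'utf-8'))            # UnicodeEncodeError
print(write_through('bad\ud800', 'utf-8', 'ignore'))  # b'bad'
```

Note that 'ignore' silently drops characters, so the output may no longer match the input; errors='replace' writes '?' instead, which at least leaves a visible marker.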

>
> (Based on a suggestion from Yonggang Luo)
>

> --
> Eduardo
>

--
Yours sincerely,
Yonggang Luo (罗勇刚)

From:	Yonggang Luo
Subject:	Re: [PATCH v2] decodetree: Open files with encoding='utf-8'
Date:	Fri, 8 Jan 2021 21:41:29 -0800