From: jonas . hahnfeld
Subject: Re: output-distance: write HTML as UTF-8 (issue 563810043 by address@hidden)
Date: Sat, 04 Apr 2020 00:16:40 -0700

On 2020/04/03 22:15:06, hanwenn wrote:
> On 2020/04/03 22:00:02, dak wrote:
> > Is this likely related to the problems in `make check` that James
> > experiences?
> Yes. 
> Unfortunately, the default encoding depends on the environment
> "
>     In text mode, if encoding is not specified the encoding used is
>     platform dependent: locale.getpreferredencoding(False) is called to
>     get the current locale encoding.
> "
> This means that, depending on locale settings, you may get ASCII or
> some other encoding.
> I didn't get a problem at first, but if I set encoding='ascii' in the
> open_write_file definition, I also get encoding errors.

It's even weirder than that: Python changed its default in version
3.7. See also one of my commit messages from January:

commit e0c78a4c710c51e1ea87d2b144c0ae713923a2af
Author: Jonas Hahnfeld <address@hidden>
Date:   Wed Jan 15 16:39:56 2020 +0100

    Issue 5663/1: Use codecs.open() to decode as utf-8
    This is in preparation for Python 3.5 where the default encoding
    depends on the value of the LANG environment variable. As far as
    I can tell, this was changed later on and at least Python 3.7 and
    version 3.8 always default to 'utf-8' on Linux. As I'm proposing to
    make Python 3.5 the required minimum, we can't rely on this and need
    to force 'utf-8' when reading files that could contain Unicode.

So James is likely using Python 3.5 or 3.6, which would explain why
some of us (with other versions of Python) are not seeing the issue.
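
To see which default a given interpreter picks, a small check along
these lines could be run. This is only a sketch, not part of the patch:
the helper name default_text_encoding is made up for illustration, and
it simply reports what the standard library says open() will use in
text mode when no encoding is passed.

```python
import locale
import sys


def default_text_encoding():
    """Return the encoding open() falls back to in text mode.

    Hypothetical helper for illustration; it wraps the standard call
    that the Python documentation names for this purpose.
    """
    return locale.getpreferredencoding(False)


print("Python", sys.version.split()[0])
print("default text encoding:", default_text_encoding())
```

Running it once with LANG=C and once with a UTF-8 locale, under both an
older and a newer Python, should show whether the interpreter version
or the environment makes the difference.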

As such: LGTM! Please note that codecs.open() is not needed anymore in
Python 3; it was only needed for compatibility with Python 2.4. We
should likely replace all occurrences with plain open(), as this patch
does.
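
A minimal sketch of what forcing the encoding with plain open() could
look like. The name open_write_file comes from the quoted message, but
its real signature is not shown in this thread, so the one-argument
form below is an assumption, as are the sample text and the temporary
file path.

```python
import os
import tempfile


def open_write_file(path):
    """Open path for writing text, always encoded as UTF-8.

    Assumed signature, for illustration only. Passing encoding
    explicitly makes the result independent of LANG and of the
    Python version.
    """
    return open(path, "w", encoding="utf-8")


# Round-trip a string with non-ASCII characters through a temporary file.
text = "Ünïcödé ♪\n"
path = os.path.join(tempfile.mkdtemp(), "out.html")
with open_write_file(path) as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    assert f.read() == text
```

With the encoding spelled out on both the write and the read, the same
bytes end up on disk whatever the locale says.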
