bug#20623: XML and HTML files with encoding/charset="utf-8" declaration

bug-gnu-emacs


From:	Vincent Lefevre
Subject:	bug#20623: XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date:	Sat, 11 Aug 2018 17:41:01 +0200
User-agent:	Mutt/1.10.1+58 (10c1ac4b) vl-108074 (2018-07-29)

On 2018-08-11 13:45:17 +0300, Eli Zaretskii wrote:
> > Date: Sat, 11 Aug 2018 12:13:41 +0200
> > From: Vincent Lefevre <vincent@vinc17.net>
> > Cc: monnier@iro.umontreal.ca, rgm@gnu.org, sledergerber@gmx.net,
> >     a.s@realize.ch, 20623@debbugs.gnu.org
> > 
> > On 2018-08-11 12:15:31 +0300, Eli Zaretskii wrote:
> > > In this case, I cannot but express my extreme surprise to see such a
> > > minor issue described as "grave".  The alleged data loss is minor, if
> > > it exists at all (the BOM is not data important for the user,
> > 
> > You're completely wrong. Whether a BOM is present or not matters a
> > great deal for some applications, such as Firefox (not for determining
> > the charset, but for determining the MIME type of local files).
> 
> Please provide the details, including the use case, if possible.  I'm
> still in the dark regarding the importance of the BOM in UTF-8 encoded
> HTML stuff.

  https://bugzilla.mozilla.org/show_bug.cgi?id=1422889

for HTML; it was closed as WONTFIX because of:

  https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm

For text/plain only (but this is another example of a BOM mattering
in practice), there is

  https://bugzilla.mozilla.org/show_bug.cgi?id=1071816

(which is a bug that should be fixed).
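One way a BOM can change the sniffed MIME type is sketched below, assuming a simplified subset of the WHATWG "rules for identifying an unknown MIME type": HTML patterns may only be preceded by whitespace bytes, and a UTF-8 BOM is not whitespace, so a BOM-prefixed HTML document falls through to the BOM row, which yields text/plain. The tag list is abridged, the 1445-byte header limit is ignored, and this is not Firefox's actual code; whether bug 1422889 follows exactly this path is an assumption.

```python
# Simplified subset of the WHATWG "rules for identifying an unknown MIME type".
WHITESPACE = b"\t\n\x0c\r "          # bytes the HTML patterns may skip
TAG_TERMINATORS = b" >"              # byte that must follow a matched tag
HTML_TAGS = [b"<!DOCTYPE HTML", b"<HTML", b"<HEAD", b"<SCRIPT", b"<BODY"]  # abridged
BINARY_BYTES = set(range(0x00, 0x09)) | {0x0B} | set(range(0x0E, 0x1B)) | set(range(0x1C, 0x20))


def matches_html(data: bytes) -> bool:
    # Skip leading whitespace only; a BOM (EF BB BF) is NOT skipped.
    i = 0
    while i < len(data) and data[i] in WHITESPACE:
        i += 1
    rest = data[i:]
    for tag in HTML_TAGS:
        n = len(tag)
        # Case-insensitive match on ASCII letters, then a tag-terminating byte.
        if len(rest) > n and rest[:n].upper() == tag and rest[n] in TAG_TERMINATORS:
            return True
    return False


def sniff_unknown(data: bytes) -> str:
    if matches_html(data):
        return "text/html"
    if data.startswith(b"\xfe\xff") or data.startswith(b"\xff\xfe"):
        return "text/plain"  # UTF-16BE / UTF-16LE BOM
    if data.startswith(b"\xef\xbb\xbf"):
        return "text/plain"  # UTF-8 BOM
    if any(b in BINARY_BYTES for b in data):
        return "application/octet-stream"
    return "text/plain"


page = b"<!DOCTYPE html>\n<html><body>Hi</body></html>\n"
print(sniff_unknown(page))                    # text/html
print(sniff_unknown(b"\xef\xbb\xbf" + page))  # text/plain
```

The same bytes with and without the three BOM bytes are classified differently, which is why adding or dropping a BOM on save is not a neutral change for such consumers.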

> > It can be repaired, but the problems are that the user doesn't know
> > what's going on and that this breaks things.
> 
> I agree about the user not knowing, but that doesn't yet qualify as
> "data loss", which has a widely accepted meaning.

This is data corruption, which is a form of data loss, because some
information is lost in the process (recall that Emacs does not tell
the user anything about this transformation).
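The silent loss can be illustrated outside Emacs with a short sketch. Python's "utf-8-sig" and "utf-8" codecs are used here as stand-ins for Emacs's utf-8-with-signature and utf-8 coding systems; this is an analogy, not Emacs's code. Decoding strips the BOM, so whether it survives depends entirely on which codec is chosen for the save. The helper `encode_preserving_bom` is hypothetical.

```python
import codecs

# An XML file that starts with a UTF-8 BOM and also declares encoding="utf-8".
original = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?>\n<doc/>\n'


def round_trip(raw: bytes, write_codec: str) -> bytes:
    # "utf-8-sig" removes a leading BOM on decode, so the text itself has no BOM.
    text = raw.decode("utf-8-sig")
    # Whether the BOM comes back depends only on the codec used for saving.
    return text.encode(write_codec)


def encode_preserving_bom(text: str, had_bom: bool) -> bytes:
    # Hypothetical helper: remember the BOM at load time and restore it on save.
    return text.encode("utf-8-sig" if had_bom else "utf-8")


kept = round_trip(original, "utf-8-sig")  # BOM written again
lost = round_trip(original, "utf-8")      # BOM silently dropped
print(kept == original)                   # True
print(len(original) - len(lost))          # 3
```

Nothing in the saved text records that three bytes disappeared, which is the point being made about the user not being informed.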

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
