bug-gnu-emacs

bug#20623: XML and HTML files with encoding/charset="utf-8" declaration


From: Eli Zaretskii
Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Sun, 10 Dec 2017 21:17:00 +0200

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rgm@gnu.org,  a.s@realize.ch,  sledergerber@gmx.net,  
> 20623@debbugs.gnu.org
> Date: Mon, 04 Dec 2017 16:08:14 -0500
> 
> > Isn't it better to fix this in sgml-xml-auto-coding-function?  That's
> > where the root cause is, AFAIU.
> 
> I'd expect the same problem would affect all other uses.

Not sure what you meant by "all other uses".  Could you please
elaborate?

> > And I don't understand the comment about latin-1-mac: I don't think we
> > have such problems in Emacs.  The -with-signature variety is
> > different, because it is not about EOL format.
> 
> You might be right, but I don't know where/how this is handled.

I would like to propose the following alternative patch, which accepts
utf-8-with-signature and utf-8-hfs as variants of utf-8 for the
purposes of encoding of XML files.  Comments?  Do we want a similar
treatment for UTF-16?  (That doesn't seem to be required by the bug
report, and UTF-16 in XML files is non-standard anyway.  But what
about HTML?)

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 857fa80..5ff1acf 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2493,7 +2493,17 @@ sgml-xml-auto-coding-function
            (let* ((match (match-string 1))
                   (sym (intern (downcase match))))
              (if (coding-system-p sym)
-                 sym
+                  ;; If the encoding tag is UTF-8 and the buffer's
+                  ;; encoding is one of the variants of UTF-8, use the
+                  ;; buffer's encoding.  This allows, e.g., saving an
+                  ;; XML file as UTF-8 with BOM when the tag says UTF-8.
+                  (if (and (coding-system-equal 'utf-8
+                                                (coding-system-type sym))
+                           (coding-system-equal sym
+                                                (coding-system-type
+                                                 buffer-file-coding-system)))
+                      buffer-file-coding-system
+                   sym)
                (message "Warning: unknown coding system \"%s\"" match)
                nil))
           ;; Files without an encoding tag should be UTF-8. But users
@@ -2506,7 +2516,8 @@ sgml-xml-auto-coding-function
                    (coding-system-base
                     (detect-coding-region (point-min) size t)))))
             ;; Pure ASCII always comes back as undecided.
-            (if (memq detected '(utf-8 undecided))
+            (if (memq detected
+                      '(utf-8 utf-8-with-signature utf-8-hfs undecided))
                 'utf-8
               (warn "File contents detected as %s.
   Consider adding an encoding attribute to the xml declaration,
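
As a rough illustration of why the type check in the first hunk also
catches the BOM variant: coding-system-type reports the same type,
utf-8, for both utf-8 and utf-8-with-signature.  The helper name
my-utf-8-variant-p below is hypothetical and not part of the patch, and
it uses a plain eq on the type rather than the coding-system-equal
calls in the hunk; treat it as a sketch to evaluate in *scratch*, not
as the proposed implementation.

```elisp
;; Hypothetical helper, for illustration only (not part of the patch).
(defun my-utf-8-variant-p (coding-system)
  "Return non-nil if CODING-SYSTEM is a known coding system of type `utf-8'."
  (and (coding-system-p coding-system)
       (eq (coding-system-type coding-system) 'utf-8)))

(my-utf-8-variant-p 'utf-8)                 ; => t
(my-utf-8-variant-p 'utf-8-with-signature)  ; => t
(my-utf-8-variant-p 'latin-1)               ; => nil (its type is `charset')
```

Because the test looks at the type rather than the name, the tag's
utf-8 and the buffer's utf-8-with-signature count as the same family,
which is what lets the buffer's coding system win on save.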
