bug-gnu-emacs

bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM


From: Eli Zaretskii
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Tue, 11 May 2021 15:04:05 +0300

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: "R. Diez" <rdiezmail-emacs@yahoo.de>,  larsi@gnus.org,
>   48324@debbugs.gnu.org
> Date: Mon, 10 May 2021 20:05:33 +0200
> 
> On May 10 2021, Eli Zaretskii wrote:
> 
> > FTR, here's a shorter and easier recipe:
> >
> >   emacs -Q
> >   C-x C-f foo.txt RET
> >   C-x RET f utf-8-with-signature-dos RET
> >   1 2 3
> >   C-x C-s
> >   M-x hexl-mode RET
> >   M-x hexl-insert-hex-char RET 00 RET
> 
> I guess the gist is that hexl-mode not only needs to account for the EOL
> type, but also for the signature when computing original-point.

Actually, it turned out that wasn't the main problem.  (It was still a
problem, but the same problem happened in a buffer produced by
hexl-find-file.)  The main problems were that (a) hexl.el handled null
bytes as characters that need to be encoded before inserting them (as
if they were non-ASCII characters), and (b) its handling of non-ASCII
characters when the encoding of the original file used a BOM was
incorrect (because encode-coding-char didn't remove the BOM from the
encoded byte sequence).  By contrast, hexl-find-file visits the file
literally, so its encoding of a null byte was trivially correct.
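
To illustrate the two failure modes outside Emacs, here is a minimal Python sketch. It is not the hexl.el change itself, only an analogy under stated assumptions:

- Python's `utf-8-sig` codec stands in for Emacs's `utf-8-with-signature` coding system.
- The helper `encode_for_hexl` is hypothetical.
- The encoding is assumed to be ASCII-compatible (the UTF-8 family).

Failure mode (a) is handled by passing bytes below 0x80 through unencoded. Failure mode (b) is handled by stripping whatever signature the codec emits for an empty string.

```python
def encode_for_hexl(ch: str, codec: str) -> bytes:
    """Return the bytes to splice into the buffer for one inserted character.

    Hypothetical helper; assumes `codec` is ASCII-compatible (UTF-8 family),
    with Python's "utf-8-sig" standing in for utf-8-with-signature.
    """
    code = ord(ch)
    if code < 0x80:
        # (a) A null byte, like any ASCII value, is already a single raw
        #     byte in an ASCII-compatible encoding; it needs no encoding.
        return bytes([code])
    encoded = ch.encode(codec)
    # (b) Whatever the codec emits for an empty string is a per-"file"
    #     signature: the BOM for utf-8-sig, nothing for plain utf-8.
    #     Drop it so only the character's own bytes get inserted.
    signature = "".encode(codec)
    if signature and encoded.startswith(signature):
        encoded = encoded[len(signature):]
    return encoded
```

By contrast, encoding a lone null byte naively with the signature codec, as in `"\x00".encode("utf-8-sig")`, returns `b'\xef\xbb\xbf\x00'`. That is a fresh BOM in front of the byte, roughly analogous to the duplicated BOM the recipe produces. Deriving the signature from the empty-string encoding, instead of stripping any leading EF BB BF, keeps a genuine U+FEFF character intact when the codec is plain `utf-8`.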

This should now be fixed on the master branch.

The capability of inserting multibyte characters via Hexl is somewhat
problematic, so I made a point of describing the issues in the
relevant doc strings (because the problems are intrinsic and IMO hard
or impossible to solve in general).