bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
From: Eli Zaretskii
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Tue, 11 May 2021 15:04:05 +0300

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: "R. Diez" <rdiezmail-emacs@yahoo.de>, larsi@gnus.org,
> 48324@debbugs.gnu.org
> Date: Mon, 10 May 2021 20:05:33 +0200
>
> On Mai 10 2021, Eli Zaretskii wrote:
>
> > FTR, here's a shorter and easier recipe:
> >
> > emacs -Q
> > C-x C-f foo.txt RET
> > C-x RET f utf-8-with-signature-dos RET
> > 1 2 3
> > C-x C-s
> > M-x hexl-mode RET
> > M-x hexl-insert-hex-char RET 00 RET
>
> I guess the gist is that hexl-mode not only needs to account for the EOL
> type, but also for the signature when computing original-point.
Actually, it turned out that wasn't the main problem. (It was still a
problem, but the same problem happened in a buffer produced by
hexl-find-file.) The main problems were that (a) hexl.el handled null
bytes as characters that need to be encoded before inserting them (as
if they were non-ASCII characters), and (b) its handling of non-ASCII
characters when the encoding of the original file used a BOM was
incorrect (because encode-coding-char didn't remove the BOM from the
encoded byte sequence). By contrast, hexl-find-file visits the file
literally, so its encoding of a null byte was trivially correct.
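
To illustrate the two problems above outside of Emacs, here is a minimal sketch in Python (not the actual hexl.el code, and not Emacs Lisp). It assumes Python's "utf-8-sig" codec as a stand-in for a coding system with a signature; the helper names encode_char_naive and encode_char_for_insert are hypothetical and exist only for this example.

```python
import codecs

def encode_char_naive(ch, encoding="utf-8-sig"):
    """Encode one character with a BOM-carrying codec, as-is.

    Every call prepends the signature, so inserting characters one at a
    time would scatter extra BOMs through the data.
    """
    return ch.encode(encoding)

def encode_char_for_insert(ch, encoding="utf-8-sig"):
    """Encode one character for insertion into existing file contents.

    ASCII characters, including the null byte, are a single byte and need
    no encoding step.  For anything else, drop a leading UTF-8 BOM that
    the codec added, because the file already carries its own signature.
    """
    if ord(ch) < 0x80:
        return bytes([ord(ch)])
    raw = ch.encode(encoding)
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw

if __name__ == "__main__":
    for ch in ("\x00", "\u00e9"):
        print(repr(ch), encode_char_naive(ch).hex(" "),
              "|", encode_char_for_insert(ch).hex(" "))
```

In this sketch the naive path turns a null byte into four bytes (signature plus 00), which is the kind of duplication the report describes, while the second helper yields the single byte.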

This should now be fixed on the master branch.

The capability of inserting multibyte characters via Hexl is somewhat
problematic, so I made a point of describing the issues in the
relevant doc strings (because the problems are intrinsic and IMO hard
or impossible to solve in general).