bug-gnu-emacs

bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM


From: R. Diez
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Sun, 9 May 2021 23:38:18 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

I think that hexl-mode has problems with the UTF-8 BOM byte sequence at the 
beginning of a text file. The steps to reproduce this issue are:

Create a text file with a single line with 3 characters: 123

Evaluate (set-buffer-file-coding-system 'utf-8-with-signature-dos) in that 
buffer and save the file.

The file should now have the following contents (8 bytes):

ef bb bf 31 32 33 0d 0a

That is the UTF-8 BOM (ef bb bf), the ASCII digits 1, 2 and 3 (31 32 33), and 
the end-of-line sequence CR LF (0d 0a).
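
For anyone who wants to double-check the expected file contents outside Emacs, here is a minimal Python sketch. It assumes that Python's "utf-8-sig" codec writes the same BOM as Emacs' utf-8-with-signature, and it spells out the DOS line ending by hand; the variable names are only illustrative.

```python
# Build the bytes the saved file is expected to contain.
text = "123\r\n"                 # one line "123" with a DOS (CR LF) line ending
data = text.encode("utf-8-sig")  # "utf-8-sig" prepends the UTF-8 BOM (ef bb bf)
hex_dump = data.hex(" ")         # space-separated hex, like the dump above

print(len(data), "bytes:", hex_dump)
```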

Now switch to hexl-mode, place the cursor on the '1' character (31 in hex), 
call hexl-insert-hex-char, and enter 00 to replace the '1' with a binary zero 
(NUL character).

The result is puzzling. Instead of the '1' (31) being replaced with NUL (00), 
the UTF-8 BOM is duplicated: the characters '1', '2' and '3' are overwritten 
with the new copy of the BOM, the CR character is replaced with NUL, and the 
LF character is left intact:

ef bb bf ef bb bf 00 0a
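
To make the difference explicit, the following Python sketch compares the result I would have expected (only the byte at offset 3 changed to 00) with the bytes actually observed. It only manipulates the hex dumps quoted in this report and does not call Emacs; the variable names are illustrative.

```python
# Bytes of the file before the edit, as listed earlier in this report.
original = bytes.fromhex("ef bb bf 31 32 33 0d 0a")

# Expected: only the '1' at offset 3 (right after the 3-byte BOM) becomes NUL.
expected = bytearray(original)
expected[3] = 0x00

# Observed: what hexl-mode showed after hexl-insert-hex-char with 00.
observed = bytes.fromhex("ef bb bf ef bb bf 00 0a")

# Offsets at which the observed bytes differ from the expected ones.
differing = [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]

print("expected:", bytes(expected).hex(" "))
print("observed:", observed.hex(" "))
print("differing offsets:", differing)
```

The differing offsets cover the three digits and the CR, so the NUL appears to have landed three bytes later than intended, which happens to be the length of the BOM.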

If you save, close and reload the file, it has gained one byte. That is 
probably not important, just a consequence of having lost the CR character:

ef bb bf ef bb bf 00 0d 0a
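
The extra byte is consistent with my guess above. As a rough sketch, and assuming that saving with a DOS end-of-line coding system simply writes the remaining bare LF as CR LF (I have not checked how Emacs actually gets there), the 8 observed bytes turn into these 9:

```python
# Bytes shown by hexl-mode after the faulty edit (from this report).
observed = bytes.fromhex("ef bb bf ef bb bf 00 0a")

# Assumption: the DOS end-of-line conversion on save writes the bare LF as CR LF.
resaved = observed.replace(b"\n", b"\r\n")

print(len(resaved), "bytes:", resaved.hex(" "))
```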




