bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date:	Wed, 12 May 2021 16:50:15 +0300

> From: Glenn Morris <rgm@gnu.org>
> Cc: Andreas Schwab <schwab@linux-m68k.org>,  48324@debbugs.gnu.org,  
> rdiezmail-emacs@yahoo.de,  larsi@gnus.org
> Date: Tue, 11 May 2021 16:37:51 -0400
> 
> Eli Zaretskii wrote:
> 
> > This should be now fixed on the master branch.
> 
> The change to encode-coding-char in f3f1947e5b5b causes
> test subr-string-limit-coding to fail. Ref eg
> https://hydra.nixos.org/build/142879118

Thanks, I fixed that.

The original test results seemed strange, to say the least: it's as if
we shoot first and draw the target later so that it fits.  E.g., how
can the last 4 bytes of encoding "foóá" with UTF-16 be
"\376\377\000\341", with the first 2 bytes coming from the BOM?

This actually reveals a design flaw in string-limit: we cannot simply
use encode-coding-char to encode the characters one by one.  I added a
FIXME comment to explain why, as I don't currently have any clever
ideas for how to implement it more correctly, except by iterating,
which is inelegant.  Ideas welcome.
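
A minimal sketch of the problem, written in Python rather than Emacs
Lisp and not taken from string-limit itself.  It assumes a hypothetical
helper, encode_utf16_bom, standing in for a coding system that emits a
big-endian UTF-16 BOM on every call; encode_per_char and
limit_by_iteration are likewise illustrative names.  The second function
only handles limiting from the start of the string.

```python
BOM_BE = b"\xfe\xff"  # UTF-16 big-endian byte order mark, "\376\377" in octal


def encode_utf16_bom(text: str) -> bytes:
    """Hypothetical stand-in for a coding system that emits a BOM on every call."""
    return BOM_BE + text.encode("utf-16-be")


def encode_per_char(text: str) -> bytes:
    """Naive approach: encode each character separately and concatenate."""
    return b"".join(encode_utf16_bom(ch) for ch in text)


def limit_by_iteration(text: str, max_bytes: int) -> bytes:
    """Encode ever shorter prefixes as a whole until one fits in max_bytes."""
    for end in range(len(text), -1, -1):
        encoded = encode_utf16_bom(text[:end])
        if len(encoded) <= max_bytes:
            return encoded
    return b""  # not even the bare BOM fits


sample = "fo\u00f3\u00e1"           # "foóá"
whole = encode_utf16_bom(sample)    # one BOM, then four 2-byte code units
per_char = encode_per_char(sample)  # a BOM in front of every character
```

Here whole[-4:] holds the code units for "ó" and "á", whereas
per_char[-4:] is a BOM followed by "á", i.e. the "\376\377\000\341"
pattern.  The loop in limit_by_iteration may re-encode the text up to
len(text) + 1 times, which is the cost of the iterative approach.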
