From: Stefan Monnier
Subject: Re: emacs-26 8f18d12: Improve documentation of decoding into a unibyte buffer
Date: Tue, 28 May 2019 13:43:47 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

> "Use the source, Luke!"

But the dark side is so enticing!

>   (let* ((str1 (string-as-multibyte (string char)))
>        (str2 (string-as-multibyte (string char char)))

Why on earth do we call string-as-multibyte here?  AFAIK, the only case
where `string` returns a unibyte string is when char <128 (it could make
sense to also do that for char ≥128 and <160, but we don't seem to do
that currently), and such strings are better turned into multibyte via
string-TO-multibyte (tho here we don't even need that, since the unibyte
string works just as well for what we do) than via string-AS-multibyte.

I think this is an error.  The patch below seems in order.
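
To make the TO/AS distinction concrete, here is a small sketch (the
function name is made up for illustration, and the results noted in the
comments are what I'd expect from a recent Emacs rather than something
checked against every version):

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-multibyte-conversions ()
  "Return a plist contrasting `string', `string-to-multibyte'
and `string-as-multibyte' on small inputs."
  ;; Two raw bytes that happen to be the UTF-8 encoding of U+00E9.
  (let ((bytes (unibyte-string #xc3 #xa9)))
    (list
     ;; `string' gives a unibyte string for an ASCII char...
     :ascii-multibyte-p (multibyte-string-p (string ?a))   ; expected: nil
     ;; ...and a multibyte one for a non-ASCII char.
     :latin-multibyte-p (multibyte-string-p (string #xe9)) ; expected: t
     ;; string-TO-multibyte keeps each byte as its own raw-byte char.
     :to-length (length (string-to-multibyte bytes))       ; expected: 2
     ;; string-AS-multibyte reinterprets the bytes as internal text,
     ;; so the two bytes collapse into a single character.
     :as-length (length (string-as-multibyte bytes)))))    ; expected: 1
```

For ASCII-only input the two conversions should agree, which is why the
call in encode-coding-char is harmless in practice, just misleading.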

>        (found (find-coding-systems-string str1))
>       enc1 enc2 i1 i2)
>     (if (and (consp found)
>            (eq (car found) 'undecided))
>       str1  <<<<<<<<<<<<<<<<<<<<<<<<<
> If we return here, the value is str1, which is a multibyte string, see
> how it was calculated.

I think it's a bug.  Largely harmless since it only applies to ASCII
chars for which we conflate the char/byte status, but still, it's a wart.

> I didn't think enough about this to figure out if there can be less
> trivial use cases.  If you can describe all the cases where
> find-coding-systems-string will return a list whose 'car' is
> 'undecided', my hat off to you.

AFAIK it only happens for pure-ASCII strings.
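
A quick way to poke at this, as a sketch only (the helper name is
invented here; it just wraps the same test the quoted code performs):

```elisp
;; Hypothetical helper mirroring the (consp found) / (eq (car found)
;; 'undecided) test from encode-coding-char.
(defun demo-undecided-p (str)
  "Return non-nil if `find-coding-systems-string' reports `undecided' for STR."
  (let ((found (find-coding-systems-string str)))
    (and (consp found)
         (eq (car found) 'undecided))))

(demo-undecided-p "abc")          ; pure ASCII: expected non-nil
(demo-undecided-p (string #xe9))  ; non-ASCII char: expected nil
```

In the second case the result should instead be a list of concrete
coding systems, whose exact contents depend on the local setup.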


diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index 2b0aaca664..391efbedc8 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -2926,12 +2926,11 @@ encode-coding-char
 If CODING-SYSTEM can't safely encode CHAR, return nil.
 The 3rd optional argument CHARSET, if non-nil, is a charset preferred
 on encoding."
-  (let* ((str1 (string-as-multibyte (string char)))
-        (str2 (string-as-multibyte (string char char)))
+  (let* ((str1 (string char))
+        (str2 (string char char))
         (found (find-coding-systems-string str1))
        enc1 enc2 i1 i2)
-    (if (and (consp found)
-            (eq (car found) 'undecided))
+    (if (not (multibyte-string-p str1))
       (when (memq (coding-system-base coding-system) found)
        ;; We must find the encoded string of CHAR.  But, just encoding
