bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23

From: Reiner Steib
Subject: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
Date: Mon, 01 Dec 2008 23:48:32 +0100
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/22.1 (gnu/linux)

On Mon, Dec 01 2008, Stefan Monnier wrote:

> Having looked at the code again, I'm more than ever confident that
> string-to-multibyte is the right thing to use.  Maybe the code I installed
> back then failed to fall back to string-as-multibyte when string-to-multibyte
> was not available, which caused a bug for Simon?

Yes, it didn't fall back to string-as-multibyte:

--- nnimap.el   17 Aug 2004 14:27:16 -0000      7.7
+++ nnimap.el   30 Aug 2004 18:13:58 -0000      7.8
@@ -845,9 +847,12 @@
     (nnoo-status-message 'nnimap server)))
 (defun nnimap-demule (string)
-  (funcall (if (and (fboundp 'string-as-multibyte)
-                   (subrp (symbol-function 'string-as-multibyte)))
-              'string-as-multibyte
+  ;; BEWARE: we used to use string-as-multibyte here which is braindead
+  ;; because it will turn accidental emacs-mule-valid byte sequences
+  ;; into multibyte chars.  --Stef
+  (funcall (if (and (fboundp 'string-to-multibyte)
+                   (subrp (symbol-function 'string-to-multibyte)))
+              'string-to-multibyte
           (or string "")))

> In any case the newly committed code has a parenthesis typo that makes
> it still use the old code and ignore the new config var
> nnimap-demule-use-string-to-multibyte.

Oops, stupid me.

> Also I recommend to just use the patch below instead.  The first hunk
> removes an unnecessary use of nnimap-demule since the output will be
> inserted into a unibyte buffer.

Thanks for your analysis. Please install the patch.  I'll pull it into
Gnus CVS ASAP (unless Miles syncs first).

> +;; We used to use a string-as-multibyte here, but it is really incorrect.
> +;; This function is used when we're about to insert a unibyte string
> +;; into a potentially multibyte buffer.  The string is either an article
> +;; header or body (or both?), undecoded.  When Emacs is asked to convert
> +;; a unibyte string to multibyte, it may either use the equivalent of
> +;; nothing (e.g. non-Mule XEmacs), string-make-multibyte (i.e. decode using
> +;; locale), string-as-multibyte (decode using emacs-internal coding system)
> +;; or string-to-multibyte (keep the data undecoded as a sequence of bytes).
> +;; Only the last one preserves the data such that we can reliably later on
> +;; decode the text using the mime info.
> +(defalias 'nnimap-demule 'mm-string-to-multibyte)
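
To illustrate the point of that comment, here is a minimal sketch, assuming
Emacs 23 or later, where string-to-multibyte, string-to-unibyte and
unibyte-string are all built in.  The function name
demo-preserve-then-decode is made up for this example and is not part of
Gnus or nnimap.  It keeps the undecoded bytes intact while they sit in a
multibyte string, then decodes them with the charset that the MIME info
would supply later.

```elisp
;; Sketch only: `demo-preserve-then-decode' is a hypothetical helper,
;; not an nnimap or mm-util function.
(defun demo-preserve-then-decode (raw coding)
  "Carry unibyte RAW through a multibyte string, then decode it with CODING."
  (let ((kept (string-to-multibyte raw)))   ; each byte becomes one raw-byte char
    ;; Nothing has been decoded yet, so the original bytes can be
    ;; recovered exactly and decoded once the charset is known.
    (decode-coding-string (string-to-unibyte kept) coding)))

;; Two undecoded bytes, the UTF-8 encoding of U+00E9.
(defvar demo-raw (unibyte-string 195 169))

;; Still two elements after conversion: no decoding has taken place.
(length (string-to-multibyte demo-raw))

;; Decoding later with the right charset yields a single character.
(demo-preserve-then-decode demo-raw 'utf-8)
```

Had the bytes been decoded at conversion time instead, an accidental match
with the internal encoding would already have merged them into a character
before the MIME charset was consulted.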

In Emacs 21 (which Gnus still aims to be compatible with), we have
string-as-multibyte, but not string-to-multibyte.  So your proposed
code (i.e. mm-string-to-multibyte) runs
  (string-as-multibyte (char-to-string string))
whereas we used to run
  (string-as-multibyte string)
Does char-to-string matter here?

(defalias 'mm-string-to-multibyte
  (cond
   ((featurep 'xemacs)
    'identity)
   ((fboundp 'string-to-multibyte)
    'string-to-multibyte)
   (t
    (lambda (string)
      "Return a multibyte string with the same individual chars as string."
      (mapconcat
       (lambda (ch) (mm-string-as-multibyte (char-to-string ch)))
       string "")))))

Bye, Reiner.
      (o o)
