Re: decode-coding-string gone awry?

From: David Kastrup
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 03:28:32 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

Kenichi Handa <address@hidden> writes:

> In article <address@hidden>, David Kastrup <address@hidden> writes:
>> I have the problem that within preview-latex there is a function
>> that assembles UTF-8 strings from single characters.  This
>> function, when used manually, mostly works.
> It seems that you are caught in a trap of automatic
> unibyte->multibyte conversion.
>> (defun preview-error-quote (string)
>>   "Turn STRING with potential ^^ sequences into a regexp.
>> To preserve sanity, additional ^ prefixes are matched literally,
>> so the character represented by ^^^ preceding extended characters
>> will not get matched, usually."
>>   (let (output case-fold-search)
>>     (while (string-match "\\^\\{2,\\}\\(\\(address@hidden)\\|[8-9a-f][0-9a-f]\\)"
>>                       string)
>>       (setq output
>>          (concat output
>>                  (regexp-quote (substring string
>>                                           0
>>                                           (- (match-beginning 1) 2)))
> If STRING is taken from a multibyte buffer, it is a
> multibyte string.  Thus, the above substring also returns a
> multibyte string.
>>                    (char-to-string
>>                     (string-to-number (match-string 1 string) 16))))
> But this char-to-string produces a unibyte string.  So, on
> concatenating them, this unibyte string is automatically converted
> to multibyte by the string-make-multibyte function, which usually
> produces a multibyte string containing latin-1 chars.

Oh.  Latin-1 chars.  Can't I tell char-to-string to produce the same
sort of raw-marked chars that raw-text (as process-coding system)
appears to produce?
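
For what it is worth, here is a minimal sketch of one way to get such
raw-marked characters.  It assumes GNU Emacs 23 or later, where
unibyte-char-to-multibyte maps a byte in the 80-ff range to a raw-byte
character; it says nothing about the Emacs 22 development version
discussed in this thread or about XEmacs.  The helper name
example-raw-byte-string is hypothetical.

```elisp
;; Sketch only: assumes GNU Emacs 23 or later.
(defun example-raw-byte-string (byte)
  "Return a one-character string holding BYTE as a raw byte.
Hypothetical helper for illustration; not part of preview-latex."
  ;; For bytes >= #x80 this yields a raw-byte character rather than
  ;; the Latin-1 character with the same code.
  (char-to-string (unibyte-char-to-multibyte byte)))

(let* ((raw (example-raw-byte-string #xc3))
       (bytes (string-to-unibyte raw)))
  (list (multibyte-string-p raw)     ; raw byte inside a multibyte string
        (multibyte-string-p bytes)   ; converted back to a unibyte string
        (aref bytes 0)))             ; the original byte value
```

Because the character is marked as a raw byte, string-to-unibyte can
recover the original byte later instead of signalling an error or
silently keeping a Latin-1 character.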

>>   (setq output (decode-coding-string output buffer-file-coding-system))
> And this decode-coding-string treats the internal byte
> sequence of a multibyte string OUTPUT as utf-8, thus you get
> some garbage.
>> Unfortunately, when I call this stuff by hand instead of from the
>> process-sentinel, it mostly works
> That is because the string you give to preview-error-quote
> is a unibyte string in that case.  The Lisp reader generates
> a unibyte string when it sees an ASCII-only string.
> Ex: (multibyte-string-p "abc") => nil
> This will also return an incorrect string:
> (preview-error-quote
>   (string-to-multibyte "r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"))
> So, the easiest fix will be to do:
>   (setq string (string-as-unibyte string))
> at the head of preview-error-quote.
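
The same idea, keeping everything in bytes until a single decode at
the end, can be sketched without string-as-unibyte.  This is only an
illustration under assumptions: the name example-decode-caret-bytes
and its CODING-SYSTEM argument are hypothetical, unibyte-string needs
GNU Emacs 23 or later (so this does not help on XEmacs), and the
simplified regexp only handles ^^ followed by two lowercase hex digits
in the 80-ff range.  Unlike preview-error-quote, it returns decoded
text rather than a regexp.

```elisp
;; Sketch only: assumes GNU Emacs 23 or later.
(defun example-decode-caret-bytes (string coding-system)
  "Decode ^^xx byte escapes in STRING as CODING-SYSTEM text.
Hypothetical helper for illustration; not part of preview-latex."
  (let ((case-fold-search nil)
        (start 0)
        (bytes nil))
    (while (string-match "\\^\\^\\([89a-f][0-9a-f]\\)" string start)
      ;; Read the match data before calling anything that might clobber it.
      (let ((before (substring string start (match-beginning 0)))
            (byte (string-to-number (match-string 1 string) 16))
            (end (match-end 0)))
        (setq bytes (append bytes
                            ;; Literal text, encoded to a list of byte values.
                            (append (encode-coding-string before coding-system)
                                    nil)
                            (list byte))
              start end)))
    ;; Encode whatever follows the last escape.
    (setq bytes (append bytes
                        (append (encode-coding-string (substring string start)
                                                      coding-system)
                                nil)))
    ;; All bytes sit in one unibyte string, so decoding happens exactly once.
    (decode-coding-string (apply #'unibyte-string bytes) coding-system)))
```

With this definition, (example-decode-caret-bytes "erh^^c3^^b6ht" 'utf-8)
should yield the string erhöht, because the bytes c3 b6 are never
converted to Latin-1 characters before being decoded as UTF-8.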

Sigh.  XEmacs-21.4-mule does not seem to have string-as-unibyte.  I'll
have to see whether it happens to work without it on XEmacs.  If not,
I'll have to come up with something else.

Thanks for the analysis!

David Kastrup, Kriemhildstr. 15, 44793 Bochum

