Re: decode-coding-string gone awry?
From: David Kastrup
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 19:41:19 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
Stefan Monnier <address@hidden> writes:
>>> instead of being processed directly from the process filter, then
>>> you should also ensure that this buffer is unibyte.
>
>> Yuk. The problem is that this buffer is not only processed by
>> preview-latex, but also by AUCTeX, and the versions that get combined
>> may be different. AUCTeX uses the source code buffer's file encoding
>> by default, which is fine for basically unibyte based coding systems.
>
> If you can't change this part, then your best bet might be to do something
> like:
>
> (defun preview-error-quote (string)
>   "Turn STRING with potential ^^ sequences into a regexp.
> To preserve sanity, additional ^ prefixes are matched literally,
> so the character represented by ^^^ preceding extended characters
> will not get matched, usually."
>   (let (output case-fold-search)
>     (while (string-match
>             "\\^*\\(\\^\\^\\(\\(address@hidden)\\|[8-9a-f][0-9a-f]\\)\\)+"
>             string)
>       (setq output
>             (concat output
>                     (regexp-quote (substring string 0 (match-beginning 1)))
>                     (decode-coding-string
>                      (preview-dequote-thingies
>                       (substring string (match-beginning 1) (match-end 0)))
>                      buffer-file-coding-system))
>             string (substring string (match-end 0))))
>     (setq output (concat output (regexp-quote string)))
>     output))
>
> BTW, you can use the 3rd arg to string-match to avoid consing strings for
> `string'.
>
> This way you only apply decode-coding-string to the part of the
> string which is still undecoded but not to the rest.
No use. The catch is precisely that TeX may decide to split a _single_
Unicode character into some bytes that it lets through unchanged and
some bytes that it transcribes into ^^ba notation. If
decode-coding-string is to have any chance of reassembling this junk,
it must only be run once the byte stream has been completely
reconstructed. Yes, this is completely insane. No, I can't avoid
having to deal with it somehow.
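A minimal sketch of that order of operations, under loudly stated assumptions: the input is a unibyte string holding the bytes exactly as TeX wrote them, only the lowercase ^^xx escapes for bytes 0x80-0xff need undoing (the ^^@-style escapes for control characters are ignored), and the name `my-tex-reassemble-and-decode' is hypothetical, not part of preview-latex or AUCTeX. It is written for a current Emacs (it uses `unibyte-string'). The point it illustrates is that the byte stream is rebuilt completely first and `decode-coding-string' is called exactly once at the end; it also passes the START argument to `string-match' instead of consing tail strings.

```elisp
;; Hypothetical helper (not part of preview-latex or AUCTeX).
;; Assumes STRING is unibyte and holds the bytes exactly as TeX wrote them.
(defun my-tex-reassemble-and-decode (string coding-system)
  "Undo TeX's ^^xx escapes for bytes 0x80-0xff in STRING, then decode.
STRING must be a unibyte string.  Decoding with CODING-SYSTEM happens
only once, after the complete byte stream has been rebuilt."
  (let ((case-fold-search nil)
        (start 0)
        (chunks nil))
    ;; Pass START to `string-match' instead of consing a new tail
    ;; string on every iteration.
    (while (string-match "\\^\\^\\([89a-f][0-9a-f]\\)" string start)
      ;; Bytes TeX let through unchanged.
      (push (substring string start (match-beginning 0)) chunks)
      ;; The byte TeX wrote as ^^xx, as a one-byte unibyte string.
      (push (unibyte-string (string-to-number (match-string 1 string) 16))
            chunks)
      (setq start (match-end 0)))
    (push (substring string start) chunks)
    ;; A character split between raw bytes and ^^xx escapes only
    ;; decodes correctly once all of its bytes are adjacent again.
    (decode-coding-string (apply #'concat (nreverse chunks)) coding-system)))
```

For example, if TeX passed the first UTF-8 byte of "é" through raw and escaped the second, (my-tex-reassemble-and-decode "caf\xc3^^a9" 'utf-8) gives "café". Decoding "caf\xc3" and "\xa9" separately and concatenating the results would not.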
Give me a clue: what happens if a process inserts stuff with 'raw-text
encoding into a multibyte buffer? 'raw-text is a reconstructible
encoding, isn't it? So the stuff will get converted into some prefix
byte indicating "isolated single-byte entity instead of utf-8 char"
plus the byte itself, or something like that, right? And
decode-coding-string does not want to work on something like that?
I have to admit to total cluelessness.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum