Re: Coding system robustness?

From: Kenichi Handa
Subject: Re: Coding system robustness?
Date: Sat, 19 Mar 2005 09:52:26 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, David Kastrup <address@hidden> writes:
> I'd like to know whether coding systems in general are supposed to be
> robust, meaning that decoding some random byte string with a coding
> system and re-encoding it is guaranteed to deliver the same byte string
> again?

In general, no.
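To make "robust" concrete, here is a minimal sketch, assuming a modern Emacs, that checks whether one particular byte string survives decoding followed by re-encoding. The name `my/coding-round-trip-p` is hypothetical, not an Emacs function. A nil result for some input shows that a coding system is not transparent; a non-nil result for a handful of inputs proves nothing in general.

```elisp
;; Hypothetical helper (not part of Emacs): return non-nil if the
;; unibyte string BYTES comes back unchanged after decoding it with
;; CODING and encoding the result with CODING again.
(defun my/coding-round-trip-p (bytes coding)
  "Return non-nil if unibyte string BYTES round-trips through CODING."
  (let* ((decoded (decode-coding-string bytes coding))
         (reencoded (encode-coding-string decoded coding)))
    (string= bytes reencoded)))

;; An explicit -unix variant performs no end-of-line conversion, so
;; CR/LF handling cannot be the reason for a mismatch.
(my/coding-round-trip-p "\351" 'iso-latin-1-unix)  ; => t
```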

> Background for that question: I do error association in preview-latex
> (via AUCTeX) with the original source text, and in general non-robust
> transformations of the input may happen, such as splitting a
> multibyte char in the middle, or transliterating some of those chars
> but not others.  I currently handle this by having the process use a
> raw-text encoding, replacing potentially questionable stuff, and
> re-encoding when it turns out that the contexts do not match the
> source file.
> This has two disadvantages:

> a) I need to go through all of this even when TeX is set up well
> enough to deliver mostly working characters, since the raw encoding
> will match far less often than a properly decoded stream.

> b) The displayed output looks like junk unnecessarily.  For
> multi-file documents written in different encodings, this cannot be
> avoided with tolerable effort, but when all files of a document use
> the same encoding, it would be nicer for AUCTeX to produce a cleaner
> output buffer.

> So what encodings are expected to be "transparent" for what versions
> of Emacs (we are only interested in 21.3 and newer)?

Run on the latest sources, the attached code automatically detects
these coding systems as transparent:

chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2
iso-latin-3 iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9
iso-safe japanese-iso-8bit japanese-shift-jis
korean-iso-8bit raw-text

I expect that more CCL-based coding systems (many of the cpXXX ones)
are also transparent (at least the utf-XX ones are), but they can't
be detected automatically.
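For the coding systems that cannot be classified from their definitions, one rough alternative is an empirical check. This is a sketch assuming Emacs 23 or later (for `unibyte-string`); `my/all-bytes-round-trip-codings` is a hypothetical name. Feeding every byte value once, in order, can only expose some failures: for multi-byte encodings such as Shift-JIS or UTF-8, a pass says nothing about other byte sequences.

```elisp
;; Hypothetical helper (not part of Emacs).  It decodes and re-encodes
;; one unibyte string holding the bytes 0..255 in order, and keeps the
;; candidates for which the bytes come back unchanged.  Passing is a
;; necessary condition for transparency, not a proof of it.
(defun my/all-bytes-round-trip-codings (candidates)
  "Return the members of CANDIDATES that round-trip all 256 byte values."
  (let ((bytes (apply #'unibyte-string (number-sequence 0 255)))
        (result nil))
    (dolist (coding candidates)
      ;; Silently skip symbols that are not defined coding systems.
      (when (and (coding-system-p coding)
                 (string= bytes
                          (encode-coding-string
                           (decode-coding-string bytes coding)
                           coding)))
        (push coding result)))
    (nreverse result)))

;; Use -unix variants so that end-of-line conversion plays no part.
(my/all-bytes-round-trip-codings
 '(iso-latin-1-unix windows-1252-unix utf-8-unix))
```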

Ken'ichi HANDA

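;; List the coding systems whose decode/encode round trip is
;; transparent.  Written for the Emacs 21/22 API, where
;; `coding-system-type' returns an integer.
(let ((round-trip-safe nil))
  (dolist (elt (coding-system-list t))
    ;; Skip coding systems with Lisp pre-write or post-read
    ;; conversion functions, which can transform the text arbitrarily.
    (and (not (coding-system-pre-write-conversion elt))
         (not (coding-system-post-read-conversion elt))
         (let ((type (coding-system-type elt)))
           ;; Types 0 (emacs-mule), 1 (Shift-JIS), 3 (Big5) and
           ;; 5 (raw-text) are accepted as they are.
           (if (memq type '(0 1 3 5))
               (push elt round-trip-safe)
             ;; Type 2 (ISO-2022) qualifies only if no graphic
             ;; register G0-G3 has a list of alternative charsets
             ;; and flag 8 (locking shift) is unset.
             (if (eq type 2)
                 (let ((flags (coding-system-flags elt)))
                   (if (and (not (consp (aref flags 0)))
                            (not (consp (aref flags 1)))
                            (not (consp (aref flags 2)))
                            (not (consp (aref flags 3)))
                            (not (aref flags 8)))
                       (push elt round-trip-safe))))))))
  (pp round-trip-safe))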
