Re: Coding system robustness?
From: David Kastrup
Subject: Re: Coding system robustness?
Date: Sat, 19 Mar 2005 10:10:07 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
Kenichi Handa <address@hidden> writes:
> In article <address@hidden>, Stefan Monnier <address@hidden> writes:
>
>>> I'd like to know whether coding systems in general are supposed to be
>>> robust, meaning that decoding some random byte string into the coding
>>> system and reencoding it is guaranteed to deliver the same byte string
>>> again?
>
>> AFAIK, (encode-coding-string (decode-coding-string STR 'foo) 'foo)
>> should always return STR, otherwise it's a bug.
>> With the introduction of eight-bit-*, this should be true of "all"
>> coding-systems in Emacs-21,
>
> No. Redundant escape sequences in iso-2022 based coding
> systems are just ignored. For instance,
>
> (decode-coding-string "\e(J" 'iso-2022-jp) => ""
>
> And we can't recover "\e(J" on encoding.
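The round-trip property Stefan describes, and the way a redundant escape breaks it, can be reproduced outside Emacs. What follows is a minimal sketch that uses Python's standard `codecs` as a stand-in for Emacs coding systems, so it only illustrates the idea and says nothing authoritative about Emacs itself. `round_trips` is a hypothetical helper. `surrogateescape` plays a role roughly like eight-bit-* characters (it keeps undecodable bytes), and a redundant `ESC ( B` stands in for Handa's `ESC ( J` example.

```python
def round_trips(data: bytes, codec: str, errors: str = "strict") -> bool:
    """Hypothetical helper: True if decode-then-encode returns the same bytes."""
    text = data.decode(codec, errors)
    return text.encode(codec, errors) == data

# Invalid UTF-8 bytes survive because surrogateescape keeps them as
# placeholder characters (loosely analogous to Emacs's eight-bit-* chars).
print(round_trips(b"abc\xff\xfe", "utf-8", "surrogateescape"))  # True

# A redundant designation of ASCII decodes to nothing, so the encoder
# has no way of reproducing it: the round trip fails.
print(round_trips(b"\x1b(Babc", "iso2022_jp"))  # False
```

The second case fails for the reason Handa gives. The decoder discards information (the escape) that the decoded string has no way of representing.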
OK, to narrow the problem down somewhat: suppose I have a file that
was written _by_ _Emacs_ in some coding system. Externally, I chop
parts of it into pieces (without dropping any material) without regard
to multibyte boundaries, and convert these pieces (with interspersed
ASCII) into the original decoding. I then encode the result again into
a unibyte string, properly replace the ASCII-fied pieces with the
original material, and decode it to the original decoding (phew). I am
pretty sure that I get round-trip behavior in that case, right?
Well, almost. With escape-based coding systems, I don't see how one
could encode or decode parts of a string in isolation in the first
place, so I am afraid it is not really feasible to promise anything.
Do the escapes at least start fresh on every line? I am just curious
here; there is no real chance that I am going to support such a coding
system, and I don't see how I sensibly could.
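The problem with decoding pieces of an escape-based encoding in isolation is that the escape state set in one piece governs how the bytes in the next piece are read. A minimal sketch, again using Python's `codecs` rather than Emacs, with hypothetical helper names: each piece decoded on its own restarts in the initial (ASCII) state, while a single incremental decoder carries the state across pieces. It deliberately does not answer whether escapes reset at each line, since that depends on the encoder.

```python
import codecs

def decode_isolated(pieces, codec):
    # Hypothetical helper: every piece starts from the codec's initial state.
    return "".join(p.decode(codec, "replace") for p in pieces)

def decode_stateful(pieces, codec):
    # Hypothetical helper: one incremental decoder keeps the escape state
    # (and any partial sequence) between pieces.
    dec = codecs.getincrementaldecoder(codec)()
    out = [dec.decode(p) for p in pieces]
    out.append(dec.decode(b"", final=True))
    return "".join(out)

text = "日本語 text"
data = text.encode("iso2022_jp")
# Split right after the opening 3-byte escape that switches to JIS X 0208.
pieces = [data[:3], data[3:]]

print(decode_isolated(pieces, "iso2022_jp") == text)  # False
print(decode_stateful(pieces, "iso2022_jp") == text)  # True
```

In the isolated case the second piece is read as ASCII, so the kanji come out as pairs of printable ASCII characters instead of failing loudly, which is exactly why no per-piece guarantee can be given.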
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum