bug#23701: Decoding broken by sequence ESC comma

From: Taylan Ulrich Bayırlı/Kammer
Subject: bug#23701: Decoding broken by sequence ESC comma
Date: Mon, 06 Jun 2016 01:35:26 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Andreas Schwab <address@hidden> writes:

> address@hidden (Taylan Ulrich "Bayırlı/Kammer") writes:
>> The occurrence of the sequence of the bytes 1B 2C (ASCII ESC and comma)
>> messes up Emacs's decoding of an ASCII file from that point on.
> This is one of the ISO 2022 escape sequences.
>> This doesn't happen in any other text-displaying application I tested,
>> including a terminal emulator (given it's an escape sequence and all).
> None of them know about ISO 2022, apparently.
> Andreas.

Hmm, OK.  I figure it's an obscure use case, but perhaps so is its
accidental(?) occurrence in a text file.

In the meantime, I found that C-x RET r us-ascii RET
(revert-buffer-with-coding-system) fixes my issue.
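Outside Emacs, the same "just treat it as plain ASCII" decision can be sketched in Python. `show_as_ascii` is a hypothetical helper, not part of any library: it decodes the bytes as US-ASCII, so no ISO 2022 escape sequence is interpreted, and then writes each ESC byte (0x1B) as ^[ so it stays visible, the same notation used in the snippets quoted here.

```python
def show_as_ascii(data: bytes) -> str:
    """Decode bytes as plain US-ASCII without interpreting ISO 2022 escapes.

    Hypothetical helper for illustration. Each ESC (0x1B) is rendered in
    caret notation as "^[" so the escape sequences remain visible.
    """
    # Raises UnicodeDecodeError if any byte is >= 0x80, i.e. the file is not pure ASCII.
    text = data.decode("ascii")
    return text.replace("\x1b", "^[")


print(show_as_ascii(b"G\x1b,Avdel vs Godel vs Goedel"))
```

Nothing after the ESC is reinterpreted, which is why decoding as us-ascii leaves the rest of the file readable.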

The file in which I encountered this (the R6RS mailing list archives)
actually contains the sequence ESC, comma, capital A, in places where it
seems to be positioned intentionally, such as between sentences.  I
wonder what this is about.  Whatever it means, if this usage is more
common than genuine uses of that ISO 2022 sequence, that would be a
problem, I suppose.  Here's the relevant snippet from the file, with
literal ESC characters changed to ^[:

>  | On Fri, Sep 11, 2009 at 10:46 PM, Aubrey Jaffer<agj at alum.mit.edu> wrote:
>  | > ^[,A | Date: Wed, 9 Sep 2009 00:30:18 -0400
>  | > ^[,A | From: Lynn Winebarger <owinebar at gmail.com>
>  | > ^[,A |
>  | > ^[,A | ...
>  | > ^[,A | The advent of hygeinic macros marked the end of the era in which
>  | > ^[,A | symbols could be equated with identifiers. ^[,A Identifiers have a lot
>  | > ^[,A | more information in them.
>  | >
>  | > The SLIB implementations of syntactic-closures, syntax-case,

I just grepped all the files, and the archives seem to contain a few
more files in which the ESC , sequence appears, such as:

    G^[,Avdel vs Godel vs Goedel

    ^[,Hylem vs ^[,Hylen vs the same with proper vowel symbols

    ... I know that there is a single bit sequence that specifies
    strings, and it's not ^[,A+;^[(Bs; I know that there's another
    single sequence that specifies ellipsis, and it's not ^[$,1s&^[(B

These aren't ISO-8859-1 either.  I don't know what encoding they're
supposed to be in.  It could also be a mail server breaking things.
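One reading worth testing, offered as an assumption rather than a diagnosis: if ESC , A designates the right half of ISO-8859-1 (a 96-character set) into G0, then each following byte from 0x20 to 0x7F stands for the Latin-1 character 0x80 higher, until ESC ( B designates ASCII back into G0. The toy decoder below (`decode_iso2022_g0` is a hypothetical name, and this is a simplified model, not Emacs's actual decoder) handles only those two sequences. It also shows why the damage runs "from that point on": without a closing ESC ( B, every later byte stays shifted.

```python
ESC = 0x1B


def decode_iso2022_g0(data: bytes) -> str:
    """Toy decoder for two 7-bit ISO 2022 G0 designations (simplified sketch).

    ESC , A  -> Latin-1 right half (96-character set) designated to G0
    ESC ( B  -> ASCII designated back to G0
    Any other ESC byte is passed through unchanged.
    """
    out = []
    latin1_g0 = False  # True while ESC , A is in effect
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and data[i + 1:i + 3] == b",A":
            latin1_g0 = True
            i += 3
            continue
        if b == ESC and data[i + 1:i + 3] == b"(B":
            latin1_g0 = False
            i += 3
            continue
        if latin1_g0 and 0x20 <= b <= 0x7F:
            # GL byte b selects Latin-1 code point b + 0x80 (e.g. 'v' 0x76 -> U+00F6).
            out.append(chr(b + 0x80))
        else:
            # Control bytes, and everything while ASCII is designated, map to the
            # same code point (bytes >= 0x80 are read as Latin-1 here).
            out.append(chr(b))
        i += 1
    return "".join(out)


print(decode_iso2022_g0(b"G\x1b,Av\x1b(Bdel"))
print(decode_iso2022_g0(b"not \x1b,A+;\x1b(Bs"))
```

Under this model, G ESC , A v ESC ( B del decodes to "Gödel", while G^[,Avdel with no ESC ( B becomes "Göäåì" because "del" is shifted too. Likewise ^[,A+;^[(Bs would read "«»s", and ^[,A followed by a space would yield a no-break space (U+00A0), which would fit the sequences that sit between sentences.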

All in all, I'm just throwing this out there; I have no idea how
commonly ISO 2022 is used, but handling it by default certainly breaks
some files that contain ESC , either by accident or for some other
purpose.  Maybe it should not be handled by default.

