Re: Email text that confuses charset recognition in emacs

From: Paul Eggert
Subject: Re: Email text that confuses charset recognition in emacs
Date: Tue, 16 Apr 2013 21:37:08 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130329 Thunderbird/17.0.5

On 04/16/2013 09:27 AM, Giorgos Keramidas wrote:
> the attached email message confuses the charset
> detection machinery of Emacs, and it starts interpreting all text as
> Japanese text -- even though most of the contents of the file are plain
> us-ascii text.

Although the text is US-ASCII, it contains a valid ISO-2022-7bit
coding sequence (the two things are not incompatible),
which Emacs properly detects and converts.  The problem is that
the text later contains the invalid escape sequence

   ESC LF > > SP ( B

This sequence was intended to switch out of a Japanese charset (the
immediately preceding text is valid ISO-2022-7bit Japanese).  However, a
mailer that *thought* the text was ASCII inserted LF > > SP after the ESC
and before the ( B, corrupting the ESC ( B, so Emacs remains
in Japanese mode until the end of the input.
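
To make the failure concrete, here is a minimal Python sketch of the
state tracking involved.  It is not how Emacs decodes; it assumes only
two designations (ESC $ B to enter Japanese, ESC ( B to return to ASCII),
and the helper name final_mode and the sample bytes are made up for
illustration.

```python
ESC = b"\x1b"
TO_JAPANESE = ESC + b"$B"  # ESC $ B: designate JIS X 0208
TO_ASCII = ESC + b"(B"     # ESC ( B: designate ASCII

def final_mode(data: bytes) -> str:
    """Return the charset in effect at the end of data (hypothetical helper).

    Only the two designations above are recognized; any other ESC is
    skipped without changing the mode, which models a decoder that
    ignores invalid escape sequences.
    """
    mode = "ascii"
    i = 0
    while i < len(data):
        if data.startswith(TO_JAPANESE, i):
            mode = "japanese"
            i += len(TO_JAPANESE)
        elif data.startswith(TO_ASCII, i):
            mode = "ascii"
            i += len(TO_ASCII)
        else:
            i += 1
    return mode

# Three JIS X 0208 characters, stored as 7-bit byte pairs.
kanji = b"F|K\\8l"

# Intact stream: the Japanese run is closed by ESC ( B.
intact = TO_JAPANESE + kanji + TO_ASCII + b" plain text\n"

# Corrupted stream: a quoting mailer put LF > > SP between ESC and ( B.
corrupted = TO_JAPANESE + kanji + ESC + b"\n> > " + b"(B" + b" plain text\n"

print(final_mode(intact))
print(final_mode(corrupted))
```

In the corrupted stream the bytes ( B are still present, but they no
longer follow the ESC directly, so nothing ever designates ASCII again.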

Perhaps when Emacs is decoding ISO-2022-7bit and sees an invalid
escape sequence, it should switch back to ASCII.  That would have
fixed your problem, and wouldn't break the decoding of any valid
ISO-2022-7bit sequence.
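
A rough Python sketch of that recovery rule, under loud assumptions: it
knows only four designations (the real coding system has many more),
it labels runs of bytes rather than decoding them, and the names
split_runs and reset_on_invalid are invented here, not anything in Emacs.

```python
ESC = 0x1B

# Assumed subset of designations; the two bytes after ESC select the charset.
DESIGNATIONS = {
    b"$@": "japanese",  # JIS X 0208-1978
    b"$B": "japanese",  # JIS X 0208-1983
    b"(B": "ascii",     # US-ASCII
    b"(J": "ascii",     # JIS X 0201 Roman, treated as ASCII in this sketch
}

def split_runs(data: bytes, reset_on_invalid: bool):
    """Split data into (mode, bytes) runs at each ESC (hypothetical helper).

    When reset_on_invalid is true, an unrecognized escape sequence
    switches the mode back to ASCII instead of leaving it unchanged.
    """
    runs = []
    mode = "ascii"
    start = 0
    i = 0
    while i < len(data):
        if data[i] != ESC:
            i += 1
            continue
        if i > start:
            runs.append((mode, data[start:i]))
        new_mode = DESIGNATIONS.get(data[i + 1:i + 3])
        if new_mode is not None:
            mode = new_mode
            i += 3  # consume ESC plus the two designator bytes
        else:
            if reset_on_invalid:
                mode = "ascii"
            i += 1  # drop only the stray ESC
        start = i
    if start < len(data):
        runs.append((mode, data[start:]))
    return runs

kanji = b"F|K\\8l"
corrupted = b"\x1b$B" + kanji + b"\x1b\n> > (B plain text\n"
valid = b"\x1b$B" + kanji + b"\x1b(B plain text\n"

print(split_runs(corrupted, reset_on_invalid=False))
print(split_runs(corrupted, reset_on_invalid=True))
```

With the reset enabled, the tail of the corrupted message is labeled
ASCII again; a valid stream contains no unrecognized escape, so both
settings split it identically.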
