Re: dashes and non-breaking spaces

From: Benjamin Riefenstahl
Subject: Re: dashes and non-breaking spaces
Date: Sat, 15 Jan 2005 18:30:31 +0100
User-agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)

Hi Karl,

> Benjamin Riefenstahl <address@hidden> writes:
>> Those are not "escapes" strictly speaking.  If you decode UTF-8 as
>> cp1252 or latin-1 you just get sequences of unusual non-ASCII
>> characters.

Karl Eichwalder writes:
> The point is that Emacs treats them as escapes...

What does "escape" mean here?  There are no escapes of any kind in
latin-1 in the non-ASCII region, as I understand it.  If this is just
another term for "control character", the only valid control
character, if you want to call it that, is \u00A0, NBSP (see subject).

In email, if a text labelled latin-1 contains characters from the
reserved C1 region (\u0080-\u009F), that just indicates that it is not
actually latin-1 but windows-1252 (cp1252).  That confusion is
extremely common.
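
To illustrate the C1 check, here is a minimal Python sketch.  The
function name is made up for this example, and the choice to substitute
U+FFFD for the five bytes that cp1252 leaves undefined is an assumption
of mine, not something any particular mail client does.

```python
def decode_latin1_or_cp1252(data: bytes) -> str:
    """Decode bytes labelled latin-1, switching to cp1252 if C1 bytes occur.

    Hypothetical helper for illustration only.
    """
    # 0x80-0x9F is the reserved C1 range in latin-1; printable text should
    # not contain it, so its presence suggests the label is wrong.
    if any(0x80 <= b <= 0x9F for b in data):
        # cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so a
        # strict decode could fail; errors="replace" yields U+FFFD instead.
        return data.decode("cp1252", errors="replace")
    return data.decode("latin-1")
```

For example, the byte 0x96 comes out as an en dash (U+2013) rather than
as a C1 control character, while NBSP (0xA0) is the same in both
encodings.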

If there are non-ASCII byte sequences in there that are valid UTF-8,
that means that it *is* UTF-8 (with a very high degree of certainty).
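
The UTF-8 heuristic can be sketched in Python as "try a strict UTF-8
decode first".  The function name and the cp1252 fallback are
assumptions for this example; treat it as a sketch of the idea rather
than a complete charset detector.

```python
def guess_decode(data: bytes) -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else fall back to cp1252.

    Hypothetical helper for illustration only.
    """
    try:
        # A strict decode rejects malformed sequences.  Text in latin-1 or
        # cp1252 with non-ASCII characters very rarely happens to be valid
        # UTF-8, which is why success is strong evidence.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: cp1252, with U+FFFD for its undefined bytes.
        return data.decode("cp1252", errors="replace")
```

Pure ASCII input decodes identically either way, so the order of the
two attempts only matters when non-ASCII bytes are present.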

> (And as I said, those messages are broken; only a part (the quoted
> text) is "wrong".)

Well, that could mean that the sender has seen it in this form and
wants you to see it this way.  Or it can mean that the sender was too
lazy to correct it.

If you wanted to fix it in Emacs, you'd have to treat each quoted
block separately.  The algorithm that I gave would still be

