Re: dashes and non-breaking spaces

From: Benjamin Riefenstahl
Subject: Re: dashes and non-breaking spaces
Date: Sat, 15 Jan 2005 15:05:52 +0100
User-agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)

Hi Karl, all,

Karl Eichwalder writes:
> Broken by the mail program of my mail partner.  His mail program
> treats all mail as iso-8859-1 or windows-1252 encoded, and when he
> replies, the encodings get mixed.
> [...]
> I think it isn't worth the trouble (too many false positives?).  If
> you want to try: if an iso-8859-1-labeled text contains escapes,
> it is most probably windows-1252 encoded; if there are still escapes
> and the text is quoted, try to treat it as UTF-8.

Those are not "escapes", strictly speaking.  If you decode UTF-8 as
cp1252 or latin-1, you just get sequences of unusual non-ASCII
characters.
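A minimal Python sketch of this point, using an en dash and a no-break space (the characters this thread is about) as sample input; the sample string is only an illustration.  Decoding the UTF-8 bytes as cp1252 or latin-1 raises no error; each byte simply becomes one character:

```python
# Sketch: what "decoding UTF-8 as cp1252 or latin-1" produces.
original = "a\u2013b\u00a0c"          # "a", en dash U+2013, "b", no-break space U+00A0, "c"
raw = original.encode("utf-8")         # b'a\xe2\x80\x93b\xc2\xa0c'

as_cp1252 = raw.decode("cp1252")       # one cp1252 character per UTF-8 byte
as_latin1 = raw.decode("latin-1")      # same, but 0x80-0x9F become C1 control characters

for label, s in (("cp1252", as_cp1252), ("latin-1", as_latin1)):
    # ascii() spells out the non-ASCII code points explicitly
    print(label, ascii(s))
```

Nothing is lost in this misreading: encoding the garbled string with the same codec gives back the original UTF-8 bytes, which can then be decoded correctly.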

If the problem occurs regularly with texts marked as iso-8859-1, you
can try UTF-8 first and then fall back to cp1252.

First, try to decode the text as UTF-8.  Because UTF-8 follows some
very strict rules, you can check a text against them, and then the
probability of mistaking a non-UTF-8 text for UTF-8 is very low in
general (< 1%, I believe, even for short texts).  This is even more
so for latin-1 or cp1252 texts, because these encode languages where
sequences of non-ASCII characters are rare in the first place.
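A strict UTF-8 check can lean on Python's built-in codec, which by default rejects malformed input (bytes that can never start a sequence, lead bytes without the right continuation bytes, overlong forms).  This is a sketch; `is_valid_utf8` is a hypothetical helper name, and the samples are made up for illustration.  The last sample shows the rare kind of false positive described above: it takes two particular non-ASCII latin-1 characters in a row to form valid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict by default: malformed input raises
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "UTF-8 German": "Grüße".encode("utf-8"),        # well-formed UTF-8
    "latin-1 German": "Grüße".encode("latin-1"),    # 0xFC can never appear in UTF-8
    "latin-1 French": "déjà vu".encode("latin-1"),  # 0xE9 is not followed by continuation bytes
    "latin-1 'Ã©'": "Ã©".encode("latin-1"),         # b'\xc3\xa9' happens to be UTF-8 for 'é'
}
for name, data in samples.items():
    print(f"{name:16} {is_valid_utf8(data)}")
```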

If the text is not UTF-8, just treat it as cp1252.  Encoding-wise, all
texts that are latin-1 can be displayed as cp1252 without any
problems.
