Re: Casting as wide a net as possible

From: Random832
Subject: Re: Casting as wide a net as possible
Date: Tue, 15 Dec 2015 14:03:55 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

I’ve been too clever for my own good. My “C1 controls” example
was not properly encoded as UTF-8, and I ignored the warnings
provided by Gnus for this situation. Below is, I hope, my
message as it was intended to appear (all properly encoded as
UTF-8).

Random832 <address@hidden> writes:
> There are occasional accented words e.g. naïve, borrowed from
> other languages. And also punctuation marks (more common with
> people who use certain word processing software packages that
> automatically replace typewriter quotes with them).
>
> Well, obviously there’s Latin-1 and UTF-8. There’s also
> Windows-1252, which is semi-compatible with Latin-1. You can
> sometimes end up with the Windows-1252 bytes treated as if they
> were Latin-1 C1 controls (and perhaps encoded further into
> UTF-8). There are also older encodings that aren’t used much
> anymore e.g. DOS 437/850, MacRoman, etc.
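
To make the Windows-1252/Latin-1 confusion concrete, here is a minimal Python 3 sketch. It uses only the standard `cp1252`, `latin-1` and `utf-8` codecs, and the helper names are made up for illustration. The curly apostrophe is the single byte 0x92 in Windows-1252. Read as Latin-1, that byte becomes the invisible C1 control U+0092, which turns into the two bytes C2 92 once it is re-encoded as UTF-8.

```python
# Illustrative helpers (names are made up for this example).

def cp1252_as_latin1_to_utf8(text: str) -> bytes:
    """Encode as Windows-1252, misread those bytes as Latin-1,
    then encode the misread text as UTF-8."""
    cp1252_bytes = text.encode("cp1252")      # ’ -> b'\x92'
    misread = cp1252_bytes.decode("latin-1")  # bytes 0x80-0x9F -> C1 controls
    return misread.encode("utf-8")            # each C1 control -> 2 bytes


def c1_controls(text: str) -> list[str]:
    """Return the C1 control characters (U+0080..U+009F) found in text."""
    return [ch for ch in text if 0x80 <= ord(ch) <= 0x9F]


broken = cp1252_as_latin1_to_utf8("aren’t")
print(broken)  # b'aren\xc2\x92t'
print([hex(ord(c)) for c in c1_controls(broken.decode("utf-8"))])  # ['0x92']
```

Since C1 controls usually display as nothing at all, scanning decoded text for code points in the U+0080..U+009F range is a cheap way to catch this particular failure.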
>
> I¹ve also seen content that was mechanically translated from one
> to another using an 8-bit mapping table, with incompatible
> characters mapped arbitrarily. For example, if you ever see
> something with quotes/apostrophes replaced with superscripts,
> like in this paragraph, this probably means the text originated
> in MacRoman and was translated to Latin-1 with the ³André
> Pirard² mapping.
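
The real André Pirard table isn't reproduced here. The sketch below assumes a hypothetical three-entry fragment inferred only from the substitutions visible in the paragraph above (’ becomes ¹, “ becomes ³, ” becomes ²). Every other byte falls back to a character-by-character MacRoman-to-Latin-1 conversion, with '?' standing in for characters Latin-1 lacks. Byte values come from Python's standard `mac_roman` and `latin-1` codecs.

```python
# HYPOTHETICAL fragment of an 8-bit MacRoman -> Latin-1 table. These three
# entries are inferred from the example text; they are not copied from the
# real André Pirard mapping.
QUOTE_FRAGMENT = {
    0xD5: 0xB9,  # MacRoman ’ (U+2019) -> Latin-1 ¹ (U+00B9)
    0xD2: 0xB3,  # MacRoman “ (U+201C) -> Latin-1 ³ (U+00B3)
    0xD3: 0xB2,  # MacRoman ” (U+201D) -> Latin-1 ² (U+00B2)
}


def translate_macroman_to_latin1(data: bytes) -> bytes:
    """Translate MacRoman bytes to Latin-1 bytes, one byte at a time."""
    out = bytearray()
    for b in data:
        if b in QUOTE_FRAGMENT:
            # Table entry: the quote mark becomes an unrelated Latin-1 byte.
            out.append(QUOTE_FRAGMENT[b])
        else:
            # Otherwise keep the same character if Latin-1 has it, else '?'.
            ch = bytes([b]).decode("mac_roman")
            out += ch.encode("latin-1", errors="replace")
    return bytes(out)


mac_bytes = "I’ve seen “André Pirard”".encode("mac_roman")
print(translate_macroman_to_latin1(mac_bytes).decode("latin-1"))
# prints: I¹ve seen ³André Pirard²
```

A real table would map all 256 byte values with a single lookup. The per-character fallback just keeps the sketch short, and it shows why é survives (MacRoman 0x8E becomes Latin-1 0xE9) while the quotes do not.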
>
> Anyway, the point is, since non-ASCII characters aren’t
> pervasive, it’s easy to miss noticing that something’s wrong
> with them. For one last demo, this paragraph features UTF-8,
> treated as Windows-1252, and then re-encoded as UTF-8 again.
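
Here is a minimal Python 3 sketch of the round trip this last paragraph describes, again using only standard codecs. The helper names are illustrative. UTF-8 bytes misread as Windows-1252 turn ’ into the three characters â€™, and re-encoding that result as UTF-8 makes the damage permanent. Reversing one round (encode as Windows-1252, decode as UTF-8) recovers the original whenever both steps succeed. The sketch assumes exactly one round of damage and is only a heuristic: text that legitimately contains something like "Ã©" would be "repaired" too. One caveat is that Python's strict `cp1252` codec leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so the misreading step raises `UnicodeDecodeError` for some inputs, for example any text containing Á (UTF-8 C3 81).

```python
# Illustrative helpers (names are made up for this example).

def utf8_misread_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def try_repair(text: str) -> str:
    """Undo one round of UTF-8-read-as-Windows-1252 if it round-trips
    cleanly; otherwise return the text unchanged."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


garbled = utf8_misread_as_cp1252("it’s naïve")
print(garbled)                  # itâ€™s naÃ¯ve
print(garbled.encode("utf-8"))  # re-encoded bytes: each garbled char is 2 or 3 bytes
print(try_repair(garbled))      # it’s naïve
```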

P.S.  It may be instructive to note that my message was
apparently detected by Gnus as being in some kind of Japanese
encoding.
