[Pan-users] Re: Missing Content-Type ignors non-ascii chracters
From: Duncan
Subject: [Pan-users] Re: Missing Content-Type ignors non-ascii chracters
Date: Thu, 10 May 2007 15:02:09 +0000 (UTC)
User-agent: Pan/0.129 (Benson & Hedges Moscow Gold)
Seppo Mylläri <address@hidden>
posted address@hidden, excerpted below, on
Wed, 09 May 2007 17:41:01 +0300:
> Between pan_0.127 and 0.129 I have notised that in some newsmessages
> there are any local (FI_fi) characters outside us-ascii.
>
> All messages with "Content-Type: charset=something" are shown right.
>
> Messages with:
> X-Newsreader: Microsoft Outlook Express 6.00.2900.3028
> X-RFC2646: Format=Flowed; Original
> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
> have no characters outside us-ascii, e.g. "Tämä" -> "Tm"
>
> Pan works in Ubuntu 7.10 with LANGUAGE=fi_FI:fi:en_GB:en and
> LC_CTYPE="fi_FI.UTF-8"
There were some recent changes to pan's charset handling. As an
English-only speaker, I don't know enough about the subject to comment
with the normal degree of certainty, but with that caveat, from my
parsing of the changelog and bugs, pan now assumes UTF-8 in certain
instances where it formerly assumed ISO-8859-1 or -15. This will
unavoidably break display of certain messages that leave the charset
unset and rely on the old assumption, while the former way broke display
of other messages. FWIW, the new way is more accepted as the "proper"
modern way of doing it, UTF-8 being a "universal" representation,
although it is recognized that this WILL break against certain MS
software in particular.

If I read the discussion in the bug correctly, the problem is that the
old ISO-8859-X charsets encode each accented character as a single
high-bit-set octet, and UTF-8 reserves those same octets as the leading
byte of a multi-byte character. A lone ISO-8859-X octet followed by
plain ASCII is therefore an invalid UTF-8 sequence and gets dropped,
which would explain "Tämä" showing up as "Tm". Since there are messages
being sent without a charset that make incompatible, opposing
assumptions, display of something WILL be broken, so it's a matter of
choosing which /something/ is higher priority and ensuring that works at
the expense of the other. The "accepted" method is to go with UTF-8, as
pan now does, even though it is recognized that doing so will break
direct interoperability with certain MS implementations.
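To make the byte-level conflict concrete, here is a small Python sketch.
It is an illustration only, not pan's code (pan is not written in
Python), and the helper name decode_as is made up for the example. It
takes the word from the report, encodes it the way an ISO-8859-1 sender
would, and then decodes the same bytes under both assumptions.

```python
# Illustration only; this is NOT pan's code. decode_as is a hypothetical helper.

# "Tämä" as an ISO-8859-1 sender would put it on the wire: b'T\xe4m\xe4'
raw = "T\u00e4m\u00e4".encode("iso-8859-1")

def decode_as(data, charset, errors="strict"):
    """Decode raw bytes under an assumed charset."""
    return data.decode(charset, errors=errors)

# Reader assumes ISO-8859-1: every octet maps to exactly one character.
as_latin1 = decode_as(raw, "iso-8859-1")

# Reader assumes UTF-8 and silently skips invalid sequences: 0xE4 announces
# a three-byte sequence, but no continuation bytes follow, so it is dropped.
as_utf8_lossy = decode_as(raw, "utf-8", errors="ignore")

# Reader assumes UTF-8 and is strict: the same bytes are rejected outright.
try:
    decode_as(raw, "utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

print(as_latin1, as_utf8_lossy, strict_utf8_ok)
```

The lossy UTF-8 result has lost both accented letters, which matches the
symptom described in the report.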
Check the related bugs in the last few releases if you need more. Maybe
it's possible to add a bit more special-case code to pan, or maybe
something's still more broken than it has to be and pan can yet be made
better in this area, but unfortunately, AFAIK there's no way to fix it
for every case, because conflicting assumptions make that impossible.
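As a rough idea of what such special-case code might look like, here is
a hedged Python sketch of one common heuristic: try strict UTF-8 first,
and only if that fails fall back to a legacy charset. This is my own
illustration, not something taken from pan or its bug reports; the
function name decode_unlabeled and the ISO-8859-15 default are
assumptions.

```python
# Hypothetical heuristic for messages with no declared charset.
# Not pan's implementation; the name and the fallback default are assumptions.

def decode_unlabeled(data, fallback="iso-8859-15"):
    """Try strict UTF-8 first; if the bytes are not valid UTF-8, use fallback."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)

latin_bytes = "T\u00e4m\u00e4".encode("iso-8859-1")  # legacy sender
utf8_bytes = "T\u00e4m\u00e4".encode("utf-8")        # modern sender

print(decode_unlabeled(latin_bytes))
print(decode_unlabeled(utf8_bytes))

# The guess can still be wrong: these two octets are valid UTF-8 for one
# character, but a legacy sender could have meant two separate characters.
ambiguous = b"\xc3\xa4"
print(decode_unlabeled(ambiguous))
```

This handles both senders of "Tämä" correctly, but the last case shows
why no heuristic can be right every time: some byte sequences are valid
under both assumptions and the reader can only guess.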
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman