[Pan-users] Re: Missing Content-Type ignors non-ascii chracters


From: Duncan
Subject: [Pan-users] Re: Missing Content-Type ignors non-ascii chracters
Date: Thu, 10 May 2007 15:02:09 +0000 (UTC)
User-agent: Pan/0.129 (Benson & Hedges Moscow Gold)

Seppo Mylläri <address@hidden>
posted address@hidden, excerpted below, on 
Wed, 09 May 2007 17:41:01 +0300:

> Between pan_0.127 and 0.129 I have noticed that in some news messages
> there aren't any local (fi_FI) characters outside us-ascii.
> 
> All messages with "Content-Type: charset=something" are shown right.
> 
> Messages with:
>         X-Newsreader: Microsoft Outlook Express 6.00.2900.3028
>         X-RFC2646: Format=Flowed; Original
>         X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
> have no characters outside us-ascii, e.g. "Tämä" -> "Tm"
> 
> Pan works in Ubuntu 7.10 with LANGUAGE=fi_FI:fi:en_GB:en and 
> LC_CTYPE="fi_FI.UTF-8"

There were some recent changes to pan's charset handling.  As an
English-only speaker, I don't know enough about the subject to comment
with the normal degree of certainty, but with that caveat, from my
parsing of the changelog and bugs, pan now assumes UTF-8 in certain
instances where it formerly assumed ISO-8859-1 or -15.  This will
unavoidably break display of certain messages that leave the charset
unset and rely on the old assumption, while the former way broke
display of other messages.

FWIW, the new way is more accepted as the "proper" modern way of doing
it, UTF-8 being a "universal" representation, although it is recognized
that this WILL break against certain MS software in particular.  If I
read the discussion in the bug correctly, the problem is that the old
ISO-8859-X charsets use high-bit-set octets for their accented
characters, and UTF-8 reserves those same octets as the leading or
continuation bytes of a multi-byte character.  A lone ISO-8859-X octet
is therefore not valid UTF-8, which would explain characters vanishing
as in your "Tämä" -> "Tm" example.

Since there are messages being sent without charset set that make
incompatible, opposing assumptions, display of something WILL be
broken, so it's a matter of choosing which /something/ is higher
priority, and ensuring that works at the expense of the other.  The
"accepted" method is to go with UTF-8, as pan now does, even though it
is recognized that doing so will break direct interoperability with
certain MS implementations.
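
To make the byte-level clash concrete, here is a minimal Python sketch.
It is NOT pan's code (pan is not written in Python, and I have not
checked how pan treats invalid octets); the helper names are made up
for illustration.  It only shows that the ISO-8859-1 encoding of "Tämä"
is not valid UTF-8, and that a decoder which silently drops invalid
octets would produce the "Tm" you describe.

```python
# Illustration only: not pan's implementation, just the byte-level behaviour.
word = "Tämä"

latin1_bytes = word.encode("iso-8859-1")  # one octet per character
utf8_bytes = word.encode("utf-8")         # each "ä" becomes two octets


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_dropping_invalid(raw: bytes) -> str:
    """Decode as UTF-8, silently discarding octets that are not valid UTF-8.

    Hypothetical helper: it mimics a reader that assumes UTF-8 and throws
    away whatever it cannot interpret.
    """
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    print(latin1_bytes)                           # the 0xE4 octets are "ä" in ISO-8859-1
    print(utf8_bytes)
    print(is_valid_utf8(latin1_bytes))
    print(is_valid_utf8(utf8_bytes))
    print(decode_dropping_invalid(latin1_bytes))
```

In ISO-8859-1 the octet 0xE4 is "ä", but in UTF-8 it announces a
three-octet sequence, and the "m" that follows is not a continuation
octet, so the 0xE4 is rejected.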

Check the related bugs in the last few releases if you need more.  Maybe 
it's possible to add a bit more special-case code to pan, or maybe 
something's still more broken than it has to be and pan can yet be made 
better in this area, but unfortunately, AFAIK there's no way to fix it 
for every case, because conflicting assumptions make that impossible.
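
As a rough idea of what such special-case code could look like, and why
it still can't cover every case, here is a small Python sketch.  Again,
this is an assumption-laden illustration and not anything taken from
pan: the function name and the choice of ISO-8859-1 as the fallback are
mine.  It tries strict UTF-8 first and falls back to a legacy charset
only when that fails.

```python
# Hypothetical fallback heuristic; not taken from pan's source.
def decode_with_fallback(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode raw as strict UTF-8, falling back to a legacy charset on error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so guess that the sender used the legacy charset.
        return raw.decode(fallback)


if __name__ == "__main__":
    # Both encodings of the same word come out right.
    print(decode_with_fallback("Tämä".encode("iso-8859-1")))
    print(decode_with_fallback("Tämä".encode("utf-8")))

    # But some ISO-8859-1 text happens to be valid UTF-8 as well, so the
    # heuristic picks the wrong reading: the sender meant the two
    # characters "Ã¤", and the reader sees a single "ä".
    ambiguous = "Ã¤".encode("iso-8859-1")
    print(decode_with_fallback(ambiguous))
```

The last case is the conflicting-assumptions problem in miniature: the
same octets are legal under both charsets, so without a declared
charset no decoder can know which one the sender meant.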

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
