lynx-dev

Re: lynx-dev Possible bug report - content type


From: Leonid Pauzner
Subject: Re: lynx-dev Possible bug report - content type
Date: Sat, 14 Feb 2004 23:28:05 +0300 (MSK)

11-Feb-2004 10:19 Henry Nelson wrote:
> On Tue, Feb 10, 2004 at 07:55:06AM -0800, Doug Kaufman wrote:
>> On Tue, 10 Feb 2004, Henry Nelson wrote:
>> > On Tue, Feb 10, 2004 at 12:04:23AM -0800, Doug Kaufman wrote:
>> > > recovery. I tried to go to "http://sourceforge.jp/projects/lha/", but

> This document has (as you posted earlier):
>   `` <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP"> ''
> Lynx honors the content and charset declarations in a meta tag.  The server
> also sends a charset declaration in the header, which Lynx honors:
>   `` Content-Type: text/html; charset=euc-jp ''
> (I _think_ the server declaration has precedence, but I'm not sure.)

>> mode or toggling into CJK mode doesn't change anything. If I set the
>> display character set to euc-jp, shift-jis, or transparent, then
>> it displays OK (like other Japanese web pages). For example, I can

> AFAIK, Lynx's chartrans code was written such that this is "expected"
> behavior.  I once discussed this aspect of Lynx with Klaus a number of
> years ago.  Unfortunately my brain power wouldn't let me keep up with
> him.  I do recall both of us testing this very problem of honoring or
> ignoring the meta tag in relation to the 3 Japanese encodings commonly
> used in web pages, and it definitely was an advantage to have Lynx honor
> the tag.  "Advantage" means no/less "mojibake" or distortion of characters.

>> display the following page without problems with my usual settings
>> (display charset=cp437):
>> "http://www2m.biglobe.ne.jp/~dolphin/lha/lha.htm"

> This page does not have a charset declaration, neither within the
> document as a meta tag, nor from the server.  The server only states
> "Content-Type: text/html".

>> That site has content type "text/html". I think that the charset
>> appended to the content-type is causing the problem.

> Yes, I think it is fairly certain that Lynx is acting on the charset
> declaration.  From my perspective, I wouldn't call it a "problem", but
> rather a feature.  Leonid is the man to talk to, though.

> __Henry

Well, since I was mentioned several times in this thread, I'll add a few words.

I just visited the first page referred to above:

The HTTP charset declaration of euc-jp forces my lynx to download the page to
disk, while the same page, with its META charset tag, is at least displayed
when read from the local disk.

Apparently lynx acts differently on an HTTP charset and on a META charset.
My guess (without looking at the code):
at the HTTP header stage lynx tests UC_CanTranslateFromTo() and refuses to
load the page if the result is false (this cannot happen for non-CJK
charsets(*)); when reading META it is probably too late to stop loading, so
lynx just silently ignores the requested translation.
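
To make the guess above concrete, here is a toy model of the two stages. It is not Lynx code and was not checked against the Lynx source: can_translate() is a hypothetical stand-in for whatever UC_CanTranslateFromTo() really tests, the charset sets are illustrative, and the three result strings are invented labels for "load normally", "offer a download", and "display without translating".

```python
# Toy model of the guessed behaviour; NOT taken from the Lynx source.
# All names below are hypothetical and exist only for illustration.

EIGHT_BIT = {"iso-8859-1", "iso-8859-2", "koi8-r", "cp437"}  # sample 8-bit sets
JAPANESE = {"euc-jp", "shift_jis"}                           # sample CJK sets


def can_translate(src, dst):
    """Assumed rule: same charset, both 8-bit, or both Japanese."""
    src, dst = src.lower(), dst.lower()
    if src == dst:
        return True
    if src in EIGHT_BIT and dst in EIGHT_BIT:
        return True  # 8-bit to 8-bit always works (see the footnote)
    return src in JAPANESE and dst in JAPANESE


def on_http_header(doc_charset, display_charset):
    """Header stage: nothing is rendered yet, so loading can be refused."""
    if not can_translate(doc_charset, display_charset):
        return "download"
    return "render"


def on_meta_tag(doc_charset, display_charset):
    """META stage: parsing has started, so the request is just ignored."""
    if not can_translate(doc_charset, display_charset):
        return "render-untranslated"
    return "render"
```

With a cp437 display charset, on_http_header("euc-jp", "cp437") gives "download" while on_meta_tag("euc-jp", "cp437") gives "render-untranslated", which matches what was observed for the first page when fetched over HTTP and when read from disk.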

Perhaps we can relax the restriction at the HTTP stage if we reach a consensus.

(*) any 8-bit charset can be translated to any other 8-bit charset,
using the "7bit approximation" fallback when necessary.
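
As a rough illustration of why such a translation can always produce something, the sketch below converts between two 8-bit charsets and falls back to a 7-bit approximation for characters the target lacks. It uses Python's codecs and Unicode decomposition as an assumed substitute; Lynx uses its own translation tables, which are not reproduced here, and the function names are hypothetical.

```python
import unicodedata


def approximate_7bit(ch):
    """Strip accents via Unicode decomposition; '?' if nothing ASCII remains."""
    decomposed = unicodedata.normalize("NFKD", ch)
    ascii_part = "".join(c for c in decomposed if ord(c) < 128)
    return ascii_part or "?"


def translate_8bit(data, src, dst):
    """Re-encode bytes from charset src to charset dst, character by character.

    Assumes every byte in data is defined in src; otherwise decode() raises.
    """
    out = bytearray()
    for ch in data.decode(src):
        try:
            out += ch.encode(dst)  # direct mapping exists in the target
        except UnicodeEncodeError:
            out += approximate_7bit(ch).encode("ascii")  # 7-bit fallback
    return bytes(out)
```

For example, translate_8bit("š".encode("iso-8859-2"), "iso-8859-2", "cp437") falls back to the plain letter "s" because cp437 has no "š", whereas "é" maps directly to its cp437 byte. No comparable fallback exists in this sketch for multibyte CJK text, which is the case the footnote excludes.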


