Re: lynx-dev lynx: have bug (fwd)

From: Leonid Pauzner
Subject: Re: lynx-dev lynx: have bug (fwd)
Date: Sun, 21 Mar 1999 23:59:01 +0300 (MSK)

21-Mar-99 12:38 Klaus Weide wrote:
> On Sun, 21 Mar 1999, Leonid Pauzner wrote:
>> One certain "problem" I personally run into is utf-8 URL encoding:
>> when HREF= contains *open 8-bit text*, the remote server (script)
>> may (1) expect such bytes %xx-encoded,
>> but lynx now (2) translates URLs from the document charset to utf-8
>> and then sends each byte %xx-encoded (an obvious check:
>> the number of %xx-encoded bytes increases).

> But URLs should never *have* unencoded 8-bit chars - and lynx
> never generates such URLs as a result of form submission (I hope).

In real life I saw a server that sent dynamically generated HTML
with an embedded HREF= containing unencoded 8-bit chars as a CGI request...
I won't say whether that is correct or not, but lynx converts such text to utf-8
and the resulting request fails (I assume the BIG TWO succeed).
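
To make the difference between (1) and (2) concrete, here is a minimal Python sketch. It is not lynx code; the function names, the example HREF and the choice of iso-8859-9 as the document charset are assumptions made only for illustration.

```python
from urllib.parse import quote

# Characters that keep their URL meaning and are left unescaped in this sketch.
SAFE = "/?&=:"

def encode_blind(href, doc_charset):
    """Option (1): %xx-encode the bytes as they appear in the document charset."""
    return quote(href.encode(doc_charset), safe=SAFE)

def encode_via_utf8(href):
    """Option (2): translate to UTF-8 first, then %xx-encode each byte."""
    return quote(href.encode("utf-8"), safe=SAFE)

# Hypothetical HREF with one open 8-bit character (Turkish g-breve).
href = "/cgi-bin/find?q=\u011f"

print(encode_blind(href, "iso-8859-9"))  # one escaped byte
print(encode_via_utf8(href))             # two escaped bytes
```

A script that expects (1) sees a single %F0 byte, while (2) produces %C4%9F, which is the increased number of %xx bytes mentioned above.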

>> UTF-8 URL-encoding was proposed in several recent drafts
>> (not handy, but I remember a note that certain protocols
>> or servers may expect blind %xx encoding, not utf-8
>> so we may need a configurable option between (1) and (2) for compatibility.
>> Also I doubt lynx does (2) in all cases, I saw it only for HTML -

I mean cases where the translation to utf-8 exists and the document charset is not iso-8859-1.

> It may not do it if in raw or transparent mode, or if Display character set ==
> document charset (or assumed charset?), or if CJK, or some other combination
> of factors.  It shouldn't have anything to do with HTML or not though.

>> a proper solution here may be for page authors to not include open 8-bit
>> bytes in HREF=url, but only %xx-encoded ones).

> Right, at least for now.   At some point in the future, that may be
> different.

>> At least I18N (RFC2070) describes the problem:  [ snipped ]

>> > It may not be a bug, but you have to set up lynx correctly.
>> > Try it with -raw (or the equivalent '@' key toggle), or with
>> > -assume_charset=iso-8859-9 (you possibly also want
>> > -assume_local_charset=iso-8859-9).

> This could also apply to the UTF-8-in-URLs problem.

> But we don't know whether this has anything to do with Turan Yuksel's
> problem.  We don't even know yet whether that problem has anything to
> do with METHOD=GET forms, the only ones where the data becomes part
> of the URL; it might be different for METHOD=POST.

>    Klaus
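
The GET/POST distinction Klaus mentions can be sketched as follows. This is only an illustration in Python, not how lynx builds requests; `build_request`, the action path and the field name are hypothetical.

```python
from urllib.parse import urlencode

def build_request(method, action, fields, charset):
    """Return (url, body) for a form submission.

    With GET the encoded data becomes part of the URL;
    with POST it travels in the request body and the URL stays unchanged.
    """
    data = urlencode(fields, encoding=charset)
    if method.upper() == "GET":
        return action + "?" + data, None
    return action, data.encode("ascii")

# Hypothetical form with one field holding an 8-bit character.
fields = {"q": "\u011f"}

print(build_request("GET", "/cgi-bin/find", fields, "iso-8859-9"))
print(build_request("POST", "/cgi-bin/find", fields, "iso-8859-9"))
```

Only in the GET case does any later URL translation step get a chance to touch the form data.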
