lynx-dev

Re: LYNX-DEV Chartrans patches impressions..


From: Hynek Med
Subject: Re: LYNX-DEV Chartrans patches impressions..
Date: Sun, 2 Mar 1997 13:51:34 +0100 (MET)

On Sat, 1 Mar 1997, Klaus Weide wrote:

> I am not particularly eager to claim space on the limited real estate of
> the Options screen. (until someone rewrites the whole thing...)

What about another Option screen just for character set things? :-)

> Especially for a "feature" that is only needed to accommodate careless (or
> ignorant) content providers and bad precedents, and will ideally go away
> soon. (we can hope...)

Well, the problem is that some 95% of the pages in the .cz domain are still
unmarked (this is just my guess).

> Note that I don't actually use my own code... There just aren't many
> non-ISO-8859-1 pages I {want to,am able to} read. 

I know that, and thank you for the work on it.

> So maybe you should
> explain to me how you use the code.  With some real examples.  Maybe then
> I can better understand why the toggling that the '@' key provides,
> together with -assume_charset, is not enough.

Many Czech pages (the 95% I mentioned above), and I guess the same is true
of most other non-US-ASCII pages, are in one of the various Czech/Central
European character sets, but they don't have their character set marked.
(This is wrong, but that's how life goes. Marked pages, mostly marked as
Windows-1250, would be unreadable in other lynxes anyway.) The character
set is usually determined by the user's choice (there's a link to something
like "select your encoding" on the page) or by a hint from the browser; the
recoding is done by CGI scripts or Apache httpd modules. So, when I want
to see such a page, I must either select raw mode or use the new
-assume_charset command line switch to get the character set right
(otherwise lynx would try to translate between ISO-8859-1 and my display
character set, which would produce incorrect results). When I select raw
mode/assume_charset, it works right; now the only thing needed is a way
for system administrators to put these options into lynx.cfg, which, if
I understand it right, is not possible now.

Oh, I'm trying it as I write, and it looks like -assume_charset doesn't
work.

A real example. Load an ISO-8859-2 font (setfont lat2-16.psf on your Linux
console). Then select the following in your Options screen, save your
options and quit lynx. (You have to select ISO-8859-2 as the preferred
document charset to make the httpd send you an ISO-8859-2 encoded
document; by default it sends Windows-1250, unmarked.)

     display (C)haracter set      : ISO Latin 2
     Raw 8-bit or CJK m(O)de      : OFF
     preferred document lan(G)uage: cz,en
     preferred document c(H)arset : ISO-8859-2
 
Then run lynx http://pes.eunet.cz. What you see is wrong: for example, the
first option in the form reads [Dne^1ni eislo_____] (with acutes above the
i's). When you run lynx -raw pes.eunet.cz, it's right: [Dnesni
cislo______], with acutes above the i's and a caron above the s.
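The garbled form field is consistent with ISO-8859-2 bytes being read as
ISO-8859-1 and then fitted to the Latin 2 display. A minimal Python sketch,
assuming that reading; `show_on_latin2_display` is a hypothetical helper,
and the fallback strings are only the ones visible in this report, not
Lynx's actual approximation table:

```python
# Hypothetical model of the behaviour in this report; NOT Lynx's real code.
# Fallbacks for Latin-1 characters with no ISO-8859-2 equivalent, copied
# from the garbled output quoted here (Lynx's own table is larger).
OBSERVED_FALLBACKS = {
    "\u00b9": "^1",    # superscript one: 0xB9, which is s-caron in Latin-2
    "\u00e8": "e",     # e-grave: 0xE8, which is c-caron in Latin-2
    "\u00c8": "E",     # E-grave: 0xC8, which is C-caron in Latin-2
    "\u00be": " 3/4",  # three-quarters: 0xBE, which is z-caron in Latin-2
    "\u00ec": "i",     # i-grave: 0xEC, which is e-caron in Latin-2
}


def show_on_latin2_display(raw: bytes, assumed_charset: str = "iso-8859-1") -> str:
    """Decode raw with the assumed charset, then fit each character to Latin 2."""
    shown = []
    for ch in raw.decode(assumed_charset):
        try:
            ch.encode("iso-8859-2")  # character exists on the Latin 2 display
            shown.append(ch)
        except UnicodeEncodeError:
            shown.append(OBSERVED_FALLBACKS.get(ch, "?"))
    return "".join(shown)


# What the server sends when ISO-8859-2 is the preferred document charset.
field = "Dnešní číslo".encode("iso-8859-2")

print(show_on_latin2_display(field))                # Dne^1ní eíslo
print(show_on_latin2_display(field, "iso-8859-2"))  # Dnešní číslo
```

The second call models raw mode (or an -assume_charset=ISO-8859-2 that
works): the bytes already match the display, so nothing is substituted.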

All in all, lynx -raw works.

On the other hand, lynx -assume_charset=ISO-8859-2 or lynx -assume_charset
ISO-8859-2 (which of these is correct, btw?) doesn't work: it produces the
same output as without the raw switch, which it shouldn't, and even the
document info is wrong:

Charset: iso-8859-1 (assumed)

If you want to see an example that works, try http://modrysvet.codalan.cz
or http://www.atlas.cz. These are Windows-1250 and marked; when your
display charset is ISO Latin 2, the translation is done right.
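For a marked Windows-1250 page, translating to a Latin 2 display only has
to move the few letters that sit at different code points in the two
charsets. A small Python sketch of that step, using Python's `cp1250` and
`iso-8859-2` codecs; the `translate` helper is illustrative, not a Lynx
function:

```python
def translate(raw: bytes, doc_charset: str, display_charset: str) -> bytes:
    """Re-encode a document from its declared charset into the display charset."""
    # errors="replace" turns unmappable characters into b"?" instead of failing.
    return raw.decode(doc_charset).encode(display_charset, errors="replace")


text = "složitější dotaz"
from_server = text.encode("cp1250")  # what a marked Windows-1250 page contains
for_display = translate(from_server, "cp1250", "iso-8859-2")

# Both charsets are single-byte, so characters and bytes line up one-to-one.
for ch, before, after in zip(text, from_server, for_display):
    if before != after:
        print(f"{ch}: 0x{before:02X} -> 0x{after:02X}")
# ž: 0x9E -> 0xBE
# š: 0x9A -> 0xB9
```

Letters such as í, ě and Č have the same code in both charsets, which is
why only ž and š are reported as moved.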

Looking at it in more detail, there's one minor thing, though: the Linkname
in the info screen (=) is still wrong, both for the current document and
for the link you are on. Try the "slozitejsi dotaz" link on www.atlas.cz:
when you are on it and press the = key, the info screen shows
"slo 3/4itij^1i dotaz" instead, so it looks like it didn't undergo the
translation. Neither did the Title at the top of the screen: it reads
"ATLAS: vyhledavani v Eeskem Internetu" instead of "ATLAS: vyhledavani v
Ceskem Internetu" (with some accents). The problem is that the C with
caron changed to E in the word "Ceskem".
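Both strings look like text that skipped the charset translation and was
read as ISO-8859-1. A quick Python check, assuming the bytes are
ISO-8859-2 (C with caron is 0xC8 in both ISO-8859-2 and Windows-1250, so
the title comes out the same either way); `misread_as_latin1` and
`missing_on_latin2` are illustrative names:

```python
def misread_as_latin1(text: str, real_charset: str = "iso-8859-2") -> str:
    """Encode text in its real charset, then wrongly decode it as ISO-8859-1."""
    return text.encode(real_charset).decode("iso-8859-1")


def missing_on_latin2(text: str) -> list:
    """Return the characters an ISO Latin 2 display cannot show directly."""
    missing = []
    for ch in text:
        try:
            ch.encode("iso-8859-2")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


for word in ("Českém", "složitější"):
    misread = misread_as_latin1(word)
    print(word, "->", misread, missing_on_latin2(misread))
# Českém -> Èeském ['È']
# složitější -> slo¾itìj¹í ['¾', 'ì', '¹']
```

Those missing characters are exactly where the title and Linkname show
"E", " 3/4", "i" and "^1".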

> (I am not even sure that the '@' key, together with the new
> -assume_charset etc. options, work.  I am sure there are situations where
> they don't.  More feedback requested.  I also don't know whether I have
> messed up the CJK charset handling.)  

Well, as you see, even I didn't know. At first I thought everything
worked right and only needed to be saveable, but I learned that the
-assume_charset switch doesn't work at all and that some things don't get
translated. :-)

BTW, if there's someone else from a non-US-ASCII country (Drazen?), could
he/she tell us what his/her opinion on all this is?

Hynek

--
Hynek Med, address@hidden


