Re: LYNX-DEV Character set support 2


From: Klaus Weide
Subject: Re: LYNX-DEV Character set support 2
Date: Wed, 14 May 1997 18:39:15 -0500 (CDT)

On Wed, 14 May 1997, Michael Sokolov wrote:

>    This is a continuation of the character set support discussion. Klaus
> was right when he said that KOI8-R "probably" is not what I'm looking for.
> I'm NOT using Windows (this is lynx-dev, not netscape-dev), and I want to
> use the encoding that all Russian DOS users use. It's even called "DOS
> Cyrillic Encoding" in vernacular.

I find the number of character sets used for Cyrillic, and the various
names associated with them (vernacular or official), rather confusing.
There are various KOIs; I don't know whether there is one "alt" or many,
or whether one of them is the same as "DOS Cyrillic Encoding",
and so on.  So I go by published specs (the KOI8-R RFC, the ISO-8859-5 table
available from many sources) where I think I know what they're talking
about and where it is clear to me what the name means, and leave the rest
to those who want to actually use it.

>    I did install the current developmental version, and I did appreciate
> the improvements in character set support. However, the main problem
> remains unsolved. I have looked at the README* files and tables in
> src/charsets, but what I haven't seen anywhere is a description of how the
> new character set support system handles high control codes used in the IBM
> PC character set and many others.
>    I have said in the previous request that some WWW sites (most actually)
> don't bother to specify a character set in the HTTP response. The only way
> to support such sites is to use raw mode. Also the character sets are so
> numerous and some of them are so weird that supporting all of them
> transparently is an unrealistic dream. Therefore, the correct
> implementation of raw mode is the most important thing for me and many
> other Russian users for now.
>    In particular, if the terminal character set selected in Options uses
> high control codes, Lynx must pass them unchanged in raw mode. 

That is the intention: "high control codes" (and 8-bit characters in
general) may be passed to the terminal when the selected display character
set allows it, but not when they are not displayable characters in that
display character set.  Note that "raw mode" doesn't mean "absolutely
raw, under all circumstances" - it never has meant that in the Lynx code
(as far as I know).  The details depend on the selected display character
set.  For example, C1 control chars which are not allowed are blocked;
and for the "ISO Latin 1" setting (and other ISO Latin N) the bytes
0xA0 and 0xAD (corresponding to &nbsp; and &shy;) are always treated
specially, whether "raw mode" is in effect or not.
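
To make the rule above concrete, here is a minimal sketch in Python of a
per-byte decision.  It is not taken from the Lynx sources: the function
name, the two flags, and the three result strings are all hypothetical,
chosen only to illustrate the idea that the outcome depends on the display
character set rather than on "raw mode" alone.

```python
# Hypothetical illustration only; names and flags are assumptions,
# not Lynx's actual implementation.
C1_RANGE = range(0x80, 0xA0)   # the "high control codes" (C1 area)
NBSP, SHY = 0xA0, 0xAD         # bytes for &nbsp; and &shy; in ISO Latin N


def classify_byte(byte, display_uses_c1_as_chars, display_is_iso_latin):
    """Return 'pass', 'block', or 'special' for one incoming byte."""
    if byte < 0x80:
        # 7-bit bytes are not affected by this rule.
        return "pass"
    if display_is_iso_latin and byte in (NBSP, SHY):
        # Treated specially whether or not raw mode is in effect.
        return "special"
    if byte in C1_RANGE and not display_uses_c1_as_chars:
        # Not a displayable character in this display character set.
        return "block"
    return "pass"
```

With a display character set that assigns printable characters to the
0x80-0x9F range, `classify_byte(0x85, True, False)` gives "pass"; with an
ISO Latin N setting the same byte is blocked.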

I take "raw mode" to mean: treat an incoming document (with unspecified
charset) as if it were in the charset which corresponds to the display
character set.  That seems to be the logical extension of what "raw" does
in the standard Lynx code, and still allows for recognizing characters
specially if appropriate (NBSP etc.).
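
As a sketch of that reading of "raw mode" (in Python, with hypothetical
names; the fallback of iso-8859-1 when raw mode is off is my assumption
for the example, not something stated above):

```python
# Hypothetical illustration only; not Lynx's actual code.
def assumed_charset(declared, display_charset, raw_mode,
                    fallback="iso-8859-1"):
    """Pick the charset an incoming document is treated as being in."""
    if declared:
        # The document specified a charset: use it.
        return declared
    # Unspecified charset: raw mode means "assume the display charset".
    # The fallback used otherwise is an assumption of this sketch.
    return display_charset if raw_mode else fallback
```

For a document without a charset, `assumed_charset(None, "koi8-r", True)`
returns "koi8-r", so no translation is needed before display, while
recognition of special characters can still happen afterwards.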

If an incoming document *does* have a specified charset, then my view is
that "raw" shouldn't do much either way.  If the incoming charset agrees
with the currently selected display character set, the text should be
shown as is, and "raw" doesn't make a difference.  If the incoming charset is
different from the display character set, Lynx should translate, whatever
"raw" says (and if you don't want that, change the display character set).
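
A small sketch of that rule, using Python's standard codecs as a stand-in
for Lynx's own translation tables (the function name is hypothetical, and
comparing charset names by lowercase string is a simplification that
ignores aliases).  Note that there is deliberately no "raw" parameter:

```python
# Hypothetical illustration only; Python codecs stand in for the
# translation tables, and alias handling is omitted.
def render_bytes(data, doc_charset, display_charset):
    """Return the bytes to send to the terminal for a document whose
    charset was specified."""
    if doc_charset.lower() == display_charset.lower():
        # Charsets agree: show the text unchanged.
        return data
    # Charsets differ: translate, replacing what cannot be represented.
    text = data.decode(doc_charset, errors="replace")
    return text.encode(display_charset, errors="replace")
```

For example, KOI8-R input shown on a terminal set to Python's "cp866"
codec is re-encoded, while KOI8-R input on a KOI8-R display passes
through untouched.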

For more "rawness" there should be the "Transparent" pseudo-"display
character set" which can be selected from the Options screen.  I think it
doesn't quite live up to that idea yet (but you may want to test it and
share your observations).

> But that's
> not the current behavior in the current developmental version.

It is the intended behavior (as described above).  It may not quite work
that way yet.  I will look at the URL you gave and try to see what's
happening there.  I am making some changes right now; they should appear
in the next version of the development code.

There haven't been many testers of these things.  Significant feedback has
come from one person (Hynek), and he is mostly interested in iso-8859-2 and
related character sets.  I am mostly just using Latin-1 myself, so I may not
notice when things don't work unless I am told about it...

   Klaus
