Re: lynx-dev lynx and other character sets

From: David Woolley
Subject: Re: lynx-dev lynx and other character sets
Date: Sun, 27 Jun 1999 10:25:11 +0100 (BST)

> 1. What does lynx do about character-sets other than US-ASCII?  Are they

The default transmission character set for HTML over HTTP is 
ISO 8859/1, not US-ASCII.  Internally, and for the purposes of entity
values, it is Unicode (in particular, &#146; is an invalid Unicode
character, not some sort of directed quote mark).
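
To illustrate the point about &#146;, here is a minimal Python sketch (not
anything Lynx itself does). It compares Unicode code point 146 with what
byte 0x92 becomes when decoded as CP-1252. The helper name describe is
made up for this example.

```python
import unicodedata

def describe(code_point):
    """Return (U+XXXX label, general category, name) for a code point."""
    ch = chr(code_point)
    # C1 control characters have no name, so supply a placeholder.
    return (f"U+{code_point:04X}",
            unicodedata.category(ch),
            unicodedata.name(ch, "<unnamed>"))

# What a numeric reference to 146 means when read as Unicode.
numeric_ref = describe(146)

# What byte 0x92 means when the document really is CP-1252.
cp1252_byte = describe(ord(bytes([0x92]).decode("cp1252")))

print(numeric_ref)
print(cp1252_byte)
```

Code point 146 comes out in category Cc (a control character), whereas
the CP-1252 byte maps to U+2019, which is the quote mark authors
usually intend.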

> supported?  I mean things like Cyrillic or Greek.

It translates from the document character set to the declared display
character set, within the limitations of the latter, and with some
ability to override the former for those pages where (an implied)
ISO 8859/1 really means my favourite local character set.  It may
also be configurable to cope with cases where an explicit CP-1252
means the font mapping of my favourite Windows font, but I think it
may honour the explicit character set instead.

It is limited to the character sets for which people have written
translation tables.
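
As a rough sketch of the idea (using Python's built-in codecs as a
stand-in for Lynx's translation tables, which are a separate thing),
the function below decodes from a document character set and re-encodes
into a display character set, substituting "?" where the display set
has no equivalent. The name translate and the choice of KOI8-R,
ISO 8859-5 and ISO 8859-1 are assumptions made for the example.

```python
def translate(raw, document_charset, display_charset):
    """Re-encode raw bytes from the document charset to the display charset.

    Characters the display charset cannot represent are replaced with '?'.
    Passing a different document_charset plays the role of overriding a
    wrongly implied or wrongly declared character set.
    """
    text = raw.decode(document_charset)
    return text.encode(display_charset, errors="replace")

# A Cyrillic word as it might arrive in a KOI8-R document.
raw = "Привет".encode("koi8_r")

# Display set with Cyrillic: every character has a mapping.
to_cyrillic = translate(raw, "koi8_r", "iso8859_5")

# Display set without Cyrillic: nothing survives the translation.
to_latin1 = translate(raw, "koi8_r", "latin_1")

print(to_cyrillic.decode("iso8859_5"))
print(to_latin1)
```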

It doesn't support the abuse of fonts to misrepresent the transmitted
character codes, most often seen with Symbol.

> 2. Is there any work on implementing BIDI tags (<BDO> and dir) in HTML4.0?
> Do you think this can be done without redesigning the engine?

Note that <BDO and dir= are not necessary for bidirectional text in 
HTML4; the browser is expected to know the natural direction for
every Unicode character which has a unique direction (HTML is always
Unicode from an internal processing point of view) and to honour Unicode
direction overrides.  It is probably desirable to have dir= on the HTML
tag to allow for direction neutral characters (it is also recommended
to have lang=, but I never see it!).

BDO is only needed for material where characters of well defined 
directionality violate the Unicode rules.
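
The per-character direction information can be inspected with Python's
unicodedata module. This is only a sketch of the property lookup, not
of the full bidirectional algorithm a browser would need; the helper
name bidi_classes is made up for the example.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Latin letter, Hebrew letter alef, Arabic letter alef, space.
sample = "a\u05d0\u0627 "
classes = bidi_classes(sample)

# U+202E is one of the explicit direction override characters.
override = unicodedata.bidirectional("\u202e")

print(classes)
print(override)
```

Letters carry a strong direction (L, R or AL), while the space is
direction neutral (WS) and takes its direction from context, which is
where dir= on an enclosing element comes in.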

The point here is that most of the complexity arises from bi-directional
text, not from the associated HTML elements and attributes.

> 3. What about Unicode? Is there any program to support it somehow and in
> some way?

That's nebulous.  Windows NT supports Unicode, but the UK version won't
display Hebrew characters.  Lynx supports it internally and might even
drive a Unicode terminal, but Unicode terminals with the full character
set are rare.  (Linux can be put into Unicode mode, but can only display
the subset corresponding to the loaded code page; I don't think that
it does BIDI.)

Actually, in spite of the lack of BIDI, Lynx's support of non-Western
European character sets was strong before this became an issue with the
GUI browsers; one of the prime movers appears to have Greek ancestry,
there is a significant contributor to the list from the CIS and
there is a native English speaker living in Japan and using Lynx for
Japanese pages.  Although one person on the list for a long time had
an Indian name, there seems to be no interest in Indic scripts, and
I'm not aware of anyone interested in Korean, Arabic, Hebrew, or, to
the extent that it is not covered by Japanese support, Chinese.
