
Re: LYNX-DEV Lynx 2.7.1: broken multilaguage bookmarks


From: Klaus Weide
Subject: Re: LYNX-DEV Lynx 2.7.1: broken multilaguage bookmarks
Date: Wed, 27 Aug 1997 19:38:21 -0500 (CDT)

On Wed, 27 Aug 1997, Leonid Pauzner wrote:

> Lynx Version 2.7.1ac-0.49:
> 
> When I bookmark a document with a non-US-ASCII title
> between <TITLE> </TITLE> (actually it was KOI8-R with a <META...> tag),
> it is saved without any charset info (I checked lynx_bookmark.html).
> 
> Therefore, titles in the bookmark file look like garbage:
> they appear in ISO Latin-1, but the display charset is koi8-r.
> Try to bookmark  http://www.siber.com/sib/russify/ache/title.html
> 
> I got Lynx for DOS from http://www.fdisk.com/doslynx/wlynx/lynx_386.zip
> and it identifies itself from "Options menu" as Lynx Version 2.7.1ac-0.49

Yes, there is a problem, and I don't know what the best general solution
is.  However, there are workarounds, so it may be that for you, and most
people with "usual" requirements, it isn't really a problem once you know
what to do.

First, let me try to describe what Lynx does.  A document's TITLE is
immediately translated, when it is encountered, to the current Display
character set (as given on the Options screen).  Everything that is done
later with the Title is operating on this translated string.  That
includes writing to the Bookmark file after 'a' is pressed, and the
"Title: " prompt immediately before writing.  (So if you are using cp866
as display character set, Cyrillic characters would be written in cp866
encoding, not in KOI8-R encoding.)
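To make the translation step concrete, here is a minimal Python sketch. It is not Lynx's actual C code; the function name `translate_title` and the sample title are made up for illustration. It decodes a KOI8-R title and re-encodes it for a cp866 display, so the bytes that would later end up in the bookmark file are cp866, not KOI8-R.

```python
def translate_title(raw: bytes, doc_charset: str, display_charset: str) -> bytes:
    """Hypothetical helper: re-encode a TITLE for the display charset."""
    # Decode using the charset the document declared (e.g. via <META>) ...
    text = raw.decode(doc_charset)
    # ... and re-encode for the display; only this translated string is kept.
    return text.encode(display_charset)

# A Cyrillic title as it arrives from a KOI8-R document.
title_koi8r = "Заголовок".encode("koi8_r")

# What would be shown, prompted for, and written with a cp866 display.
as_cp866 = translate_title(title_koi8r, "koi8_r", "cp866")
```

The original KOI8-R bytes are gone at this point; anything that later reads `as_cp866` has to know it is cp866.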

When the bookmark file later is read in ('v' key), it is treated like any
other local file (for the purpose of character translation).  That means
that by default the bytes will be interpreted as ISO-8859-1 characters,
but this can be changed with -assume_local_charset=... on the command line
or ASSUME_LOCAL_CHARSET in the lynx.cfg file.  So if local files with
8-bit characters on your computer are always in charset XYZ, AND you are
using the Lynx display character set corresponding to XYZ, then just
set ASSUME_LOCAL_CHARSET to XYZ, and things should work as expected.
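A small Python sketch of the read side, assuming the bookmark file is decoded exactly like any other local file (the helper `read_local_file` is hypothetical). The same KOI8-R bytes come out as Latin-1 garbage under the default, and correctly once the assumed local charset matches the display charset that was in effect when the title was written.

```python
def read_local_file(data: bytes, assume_local_charset: str = "iso-8859-1") -> str:
    """Hypothetical helper: interpret local-file bytes under the assumed charset."""
    return data.decode(assume_local_charset)

# A title written to the bookmark file while the display charset was KOI8-R.
bookmark_bytes = "Заголовок".encode("koi8_r")

garbled = read_local_file(bookmark_bytes)            # default: ISO-8859-1
correct = read_local_file(bookmark_bytes, "koi8_r")  # assumed charset set to KOI8-R
```

ISO-8859-1 accepts every byte value, so the wrong guess never fails loudly; it just produces the wrong characters.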

When the 'a' key _creates_ a _new_ bookmark file (i.e. when there was no
bookmark file before), Lynx writes a META tag at the top, with the current
"Display character set"'s charset.  This is only done if it would differ
from the current "ASSUME" default charset, and it is just "guessing"
that future additions to the bookmark file will probably use the same
character set.  If this guess is wrong, the bookmark file may have to
be changed by hand, to adjust or remove the <META ...> line.
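The META decision made when a new bookmark file is created might look like the following. This is a sketch under assumptions: the helper name is hypothetical, charset names are compared case-insensitively, and the exact META markup Lynx writes may differ from the string shown.

```python
from typing import Optional

def meta_for_new_bookmark_file(display_charset: str,
                               assume_charset: str) -> Optional[str]:
    """Return a charset META line for a brand-new bookmark file, or None.

    The META is only emitted when the display charset differs from the
    "ASSUME" default; it is a guess about future additions to the file.
    """
    if display_charset.lower() == assume_charset.lower():
        return None  # default interpretation already matches, no META needed
    # Hypothetical markup; the line Lynx really writes may look different.
    return ('<META http-equiv="content-type" '
            f'content="text/html;charset={display_charset}">')
```

Because the check only runs at creation time, a later change of the display character set is not reflected in the file, which is why the META line may need hand editing.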

All this doesn't solve the problem for people who want to add bookmark
entries in different charsets (which presumably means, pressing 'a' while
different "display character set" settings are in effect), or the general
problem of mixing character encodings in files.  That is basically an HTML
"limitation", since an HTML document can only have one document character
set and a file can have only one character encoding.  There are various
things that could be done, but I don't like any of them much:

- We could invent some incompatible, nonstandard extension to label the
  charset of a bookmark title (or generally, of anchor text) and use that
  in bookmark files, something like
   ...
   <a href="..." text-charset=ISO-8859-2>Title With Latin-2 8-bit Chars</a>
   <a href="..." text-charset=KOI8-R>Title With KOI8-R 8-bit Chars</a>
   ...
  and give up the idea that bookmark files should be (basically) valid
  HTML files that can be understood by any competent HTML tool.
  (There is a CHARSET attribute in the i18n RFC and the HTML 4.0 draft,
  but it means something else.)

- We could translate titles (to always the same charset) before writing to
  the bookmark file.  To be general enough, such a target charset for
  bookmark file translation would have to be one that could encode all
  the characters from the various possible source charsets.  That means
  it should be Unicode/UTF-8.  Nice idea, but then the bookmark files
  couldn't be handled correctly by any other tools (editors etc.) that
  are not UTF-8 capable.  It also wouldn't currently work for CJK
  character sets and others where there is no Unicode translation table or
  mechanism (Mac, NeXT...).

- As a variation of the last point, translate all non-ASCII characters to
  &#nnnn;character references before writing.  Again this wouldn't work
  for CJK character sets, and of course would make such text unreadable
  within an editor etc.
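The last two alternatives can be sketched in Python to show the trade-off. These are hypothetical helpers, not anything Lynx does today, and both assume the display charset is known and has a Unicode mapping, which is exactly what fails for CJK and the other charsets mentioned above.

```python
import html

def to_utf8_title(title_bytes: bytes, display_charset: str) -> bytes:
    """Translate a title to UTF-8 before writing it (hypothetical)."""
    return title_bytes.decode(display_charset).encode("utf-8")

def to_char_ref_title(title_bytes: bytes, display_charset: str) -> str:
    """Keep the file pure ASCII using &#nnnn; references (hypothetical)."""
    # Escape &, <, > so the result is safe inside anchor text.
    text = html.escape(title_bytes.decode(display_charset), quote=False)
    # Replace each non-ASCII character with a decimal character reference.
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

# Titles captured under two different display character sets.
latin2_title = "Łódź".encode("iso8859_2")
koi8r_title = "Заголовок".encode("koi8_r")

utf8_file = b"\n".join([to_utf8_title(latin2_title, "iso8859_2"),
                        to_utf8_title(koi8r_title, "koi8_r")])
ascii_file = "\n".join([to_char_ref_title(latin2_title, "iso8859_2"),
                        to_char_ref_title(koi8r_title, "koi8_r")])
```

Both files can be decoded with a single rule, so Latin-2 and KOI8-R titles can coexist. The cost is the one described above: `utf8_file` needs a UTF-8 capable editor, and `ascii_file` shows a human reader only numeric references.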

So I think if bookmarks should be usable by other programs (at least, a
text editor) in addition to Lynx, the current behavior is about the best
we can do without additional complications.

To summarize and extend,
- Set ASSUME_LOCAL_CHARSET to something appropriate (or use 
  `lynx -assume_[local_]charset')
- and/or change/add/delete the META tag near the top of the bookmark file,
  if necessary, with an editor.
- If you must save titles in various character encodings, and don't
  always use the same setting for "display character set" (and maybe
  Raw flag?), try to use multiple bookmark files, with different META
  tags if necessary (not tested).
- The simplest "solution" is not to accept the default proposed by Lynx,
  but at the "Title:" prompt after 'a' change it to something that will
  be displayed correctly when 'v'iewing the bookmark file.


   Klaus


