lynx-dev

Re: LYNX-DEV ac81 chartrans problems


From: Klaus Weide
Subject: Re: LYNX-DEV ac81 chartrans problems
Date: Mon, 13 Oct 1997 10:26:00 -0500 (CDT)

On Mon, 13 Oct 1997, Leonid Pauzner wrote:

> Klaus,
> 
> I just tested ac81 for DOS, there are several chartrans problems found.
> They are not new, but now "time".

(What do you mean by 'now "time"' ?)

> 1) if I fill up the input form from keyboard
> few characters in cyrillic not displayed properly (cp866 keyb&display):
> cyrillic small "p" replaced by "/"
> cyrillic small "a" replaced by space
> cyrillic caps "Che" replaced by nothing
> cyrillic caps "yer~I" override the next coming character

I'll look into those, but please give more info: what was the charset of
the document you were looking at?  was it explicit (META etc.) or
"assumed"?  A local file?  Try to give all the relevant settings.
Also if it's on the web give a URL; even if you see the problem
in "all documents" giving a URL doesn't hurt.

> 2) Are you sure in "A)ssume charset if unknown" (Options)?
> There are several "assume charset" in command line (from help):
>     -assume_charset=MIMEname  charset for documents that don't specify it
>     -assume_local_charset=MIMEname  charset assumed for local files
>     -assume_unrec_charset=MIMEname  use this instead of unrecognized charsets

I am not sure of much :) but I think I don't understand your question.
The Options Screen "A)ssume charset if unknown" corresponds to the first
of those three command line options, not the third.  At least it should;
if it doesn't work as you think it should, please report in detail.  I don't
think I'll squeeze the other two onto the Options Screen...

> Seems we definitely need something other than assume_unrec_charset
> (there are almost no such pages for Lynx
> because of the huge list of chartrans you support).
> Instead, a lot of real pages don't specify their charset
> but "keep in mind" something other than ISOLatin1, which is the default.
> So, I asked you about assume_charset: "Assume charset if not specified".

Maybe just the wording confuses you?
  "if not specified"  ==  "if unknown"  (-assume_charset)
  "if not recognized"  ==  specified somehow but we don't understand it
                           (-assume_unrec_charset), doesn't occur normally
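
To make the distinction concrete, here is a minimal sketch in Python (not
Lynx's actual C code).  The function name, the settings class, the set of
"known" charsets and the defaults are all hypothetical; only the three
option names come from the help text quoted above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical list of charsets the browser can translate (not Lynx's real table).
KNOWN_CHARSETS = {"iso-8859-1", "koi8-r", "cp866", "windows-1251", "utf-8"}


@dataclass
class AssumeSettings:
    # Field names mirror the three command line options; the defaults are assumptions.
    assume_charset: str = "iso-8859-1"
    assume_local_charset: str = "iso-8859-1"
    assume_unrec_charset: str = "iso-8859-1"


def effective_charset(specified: Optional[str], is_local_file: bool,
                      settings: AssumeSettings) -> str:
    """Pick the charset to use for a document (illustrative only)."""
    if specified is None:
        # "if not specified" == "if unknown"
        if is_local_file:
            return settings.assume_local_charset
        return settings.assume_charset
    name = specified.strip().lower()
    if name in KNOWN_CHARSETS:
        return name
    # "if not recognized": specified somehow, but we don't understand it
    return settings.assume_unrec_charset
```

With this model, a page that carries no charset at all never reaches the
assume_unrec_charset branch; only assume_charset (or assume_local_charset
for a local file) applies to it.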

> 3) if I "print" the html source to local file,
> it saves in "current display charset",
> but META charset (if exist on the page) not changed.
> Therefore later I got messed up html file.

Yes.  You have to change it by hand.
Of course if people would specify the charset in the HTTP header where it
belongs, instead of a META tag which is just a hack, this problem wouldn't
occur in this form...

I have an idea how to work around that.  It probably won't be perfect.

It's just a fact of life that, if you copy a HTML document to your local
disk, not everything will work as in the original context.  Especially
if you use 'P' instead of 'D'.  Another thing that needs adjustment after
you have copied a HTML document from the web is the BASE.  Lynx works
around that by prepending a <BASE HREF=...> line, but this creates invalid
documents, so even though it appears to "work" you should modify the
resulting file (put the BASE in the HEAD element where it belongs).  The
same hack wouldn't work for "META ... charset" anyway (unless, when there
are two "META ... charset" tags, the second has no effect; I haven't
really checked that).
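
The two manual fixes could be scripted roughly as follows.  This is only a
sketch in Python under stated assumptions: the function name and the regular
expressions are mine, it assumes the saved file is already in the display
charset (as described above), it rewrites only the first "META ... charset"
it finds, and it moves a BASE tag only when that tag sits before the HEAD
element.  It is not how Lynx itself does or will do anything.

```python
import re

# Matches up to the charset value inside a META tag's CONTENT attribute.
META_CHARSET = re.compile(
    r'(<meta\b[^>]*\bcontent\s*=\s*["\'][^"\'>]*charset\s*=\s*)([\w.:-]+)',
    re.IGNORECASE,
)
# A BASE tag plus the rest of its line, and the opening HEAD tag.
BASE_TAG = re.compile(r'<base\b[^>]*>[ \t]*\r?\n?', re.IGNORECASE)
HEAD_OPEN = re.compile(r'<head\b[^>]*>', re.IGNORECASE)


def fix_saved_html(text: str, display_charset: str) -> str:
    """Adjust a saved HTML source so it matches the charset it was saved in."""
    # 1. Make the first META charset agree with the display charset.
    text = META_CHARSET.sub(lambda m: m.group(1) + display_charset, text, count=1)

    # 2. If a BASE tag precedes HEAD, move it inside the HEAD element.
    base = BASE_TAG.search(text)
    head = HEAD_OPEN.search(text)
    if base and head and base.start() < head.start():
        tag = base.group(0).strip()
        text = text[:base.start()] + text[base.end():]
        head = HEAD_OPEN.search(text)  # positions shifted after the removal
        text = text[:head.end()] + "\n" + tag + text[head.end():]
    return text
```

For example, fix_saved_html(saved_text, "cp866") would be applied to a file
printed while the display charset was cp866.  A file without a HEAD tag or
without a META charset is returned unchanged in that respect.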

> 4) a known problem with history mechanism for "assumed" charset
> while browsing the source "/".
> I think "history" should come from the state if you press "/" again,
> not from any other file which was viewed before, as you explain:

(/ == \)

That might require a significant change :(
It wouldn't work anyway for "lynx -source" since then there never IS any
previous charset information that could be remembered.
Another problem that would not occur if people did specify the charset
where it belongs.

> > If a "charset" is only in a META tag, Lynx can only know about it if the
> > HTML is parsed and the META is interpreted.  Normally Lynx would forget
> > what it knows about a document when you switch to SOURCE with '\', since
> > it has to be reloaded, and then the normal ASSUME_* would be in effect.
> >
> > As a partial workaround, there is some "remember the charset from the last
> > time" when '\' is used, but it is not complete.  It does not work if Lynx
> > _totally_ forgets about a loaded text before reloading, and whether that
> > happens depends on various things in the wwwlib mechanism which keeps
> > track of documents and links between them, and which didn't change from
> > previous versions.  Approximately, if there are links from other loaded
> > documents to the current one, then the "total forgetting" does not happen.
> > So if you have just followed a link to a HTML text, and then type '\',
> > Lynx should use the same charset; if you went to the current doc with 'g'
> > and there are no links to it from where you were before, or if it is the
> > startup (first) document, Lynx doesn't remember the previous charset.
> >
> > Is this acceptable?
> >
> > As a workaround, you could try to go to the text whose source you want to
> > see through a link; adding it to a bookmark file (maybe temporarily) and
> > then going from there should have this effect, also going through the
> > 'V'isited Links Page (but not the History Page), also, if it is a local
> > file, going through a directory listing (but not if local dired is in
> > effect since there is some extra expiring going on - I think this caveat
> > doesn't apply for Lynx386 since dired is not compiled in (?)).
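
As a rough illustration of the behaviour described in the quoted text, here
is a toy model in Python.  It is not the wwwlib mechanism: the class, its
method names and the single "links from loaded documents" rule are
simplifying assumptions based only on the "approximately" description above.

```python
from typing import Dict, Optional, Set


class DocumentRegistry:
    """Toy stand-in for the document bookkeeping; not wwwlib's real structures."""

    def __init__(self, assume_charset: str = "iso-8859-1") -> None:
        self.assume_charset = assume_charset
        self.remembered: Dict[str, str] = {}     # url -> charset learned from META
        self.links_to: Dict[str, Set[str]] = {}  # url -> loaded documents linking to it

    def load(self, url: str, meta_charset: Optional[str] = None,
             followed_from: Optional[str] = None) -> None:
        """Record a parsed document and, if given, the document we came from."""
        if meta_charset:
            self.remembered[url] = meta_charset
        if followed_from:
            self.links_to.setdefault(url, set()).add(followed_from)

    def charset_for_source(self, url: str) -> str:
        """Charset used when the document is reloaded for SOURCE view."""
        # The document is forgotten completely unless another loaded
        # document links to it; then only the assumed charset is left.
        if not self.links_to.get(url):
            self.remembered.pop(url, None)
        return self.remembered.get(url, self.assume_charset)
```

In this model a document reached by following a link keeps its META charset
for the source view, while one reached with 'g' (no followed_from) falls
back to the assumed charset.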


    Klaus
