lynx-dev

Re: lynx-dev Lynx character entity references fix


From: Klaus Weide
Subject: Re: lynx-dev Lynx character entity references fix
Date: Fri, 12 Mar 1999 00:54:29 -0600 (CST)

On Thu, 11 Mar 1999, Leonid Pauzner wrote:

> OK, changing of "assume charset" for unlabelled document gives the following
> (grep UC_MapGN from trace log):
> 
> UC_MapGN: Using 1 <- 26 (windows-1251)
> UC_MapGN: Using 1 <- 1 (iso-8859-15)
> UC_MapGN: Using 2 <- 2 (cp850)
> UC_MapGN: Using 1 <- 3 (windows-1252)
> UC_MapGN: Using 2 <- 4 (cp437)
> UC_MapGN: Using 1 <- 5 (dec-mcs)
> UC_MapGN: Using 2 <- 6 (macintosh)
> UC_MapGN: Using 1 <- 7 (next)
> UC_MapGN: Using 2 <- 8 (hp-roman8)

Ok, you have probably just now done more runtime testing of what switching
occurs than I ever did.  :)   Note that those TRACE lines only occur when
one of the four slots is changed (not when it is re-used).

> It is for "forward" translation and apparently slots #3 and #4 are not used.

Make that #0 and #3.

But #0 is used for iso-8859-1.  The four slots are initially set to fixed
tables, whose correspondence to charsets can be seen in UCdomap.h:

    CONST char *UC_GNsetMIMEnames[4] =
        {"iso-8859-1", "x-dec-graphics", "cp437", "x-transparent"};

The first and last one are then never changed.  This code in UC_MapGN()
flips between using (and changing) the two middle ones:

        if (UC_lastautoGN == GRAF_MAP) {
            Gn = IBMPC_MAP;
        } else {
            Gn = GRAF_MAP;
        }

So it's a primitive caching scheme.  As long as one switches between
a set of documents with two different charsets, or three different
charsets of which one is iso-8859-1, no re-initializing of the
tables that depend on charset->Unicode mapping is necessary.
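
A minimal sketch of that flip-flop, to make the caching behaviour concrete.
It is not the actual UC_MapGN() code: GRAF_MAP and IBMPC_MAP are the names
used above, but their numeric values are only inferred from the trace, and
everything else (map_slot, slot_charset, last_auto_slot, rebuild_count, the
CS_* handles, the -1 "empty" marker) is hypothetical.

```c
#include <stdio.h>

/* GRAF_MAP and IBMPC_MAP come from the snippet above; the numeric values
 * and the other two slot names are assumptions inferred from the trace. */
enum { LAT1_SLOT = 0, GRAF_MAP = 1, IBMPC_MAP = 2, TRANS_SLOT = 3, NUM_SLOTS = 4 };

/* Hypothetical charset handles, used only for this illustration. */
enum { CS_LATIN1 = 0, CS_TRANSPARENT = 99 };

/* Charset currently held by each slot.  -1 stands for "not loaded by us yet"
 * (a simplification: the real middle slots start out with fixed tables). */
static int slot_charset[NUM_SLOTS] = { CS_LATIN1, -1, -1, CS_TRANSPARENT };
static int last_auto_slot = IBMPC_MAP;  /* so the first load lands in GRAF_MAP */
static int rebuild_count = 0;           /* how often a slot had to be re-initialized */

/* Return the slot holding `charset`, re-initializing a middle slot if needed. */
static int map_slot(int charset)
{
    int i, gn;

    /* Re-use: no table rebuild and no trace line. */
    for (i = 0; i < NUM_SLOTS; i++)
        if (slot_charset[i] == charset)
            return i;

    /* Miss: alternate between the two middle slots; 0 and 3 stay fixed. */
    gn = (last_auto_slot == GRAF_MAP) ? IBMPC_MAP : GRAF_MAP;
    slot_charset[gn] = charset;
    last_auto_slot = gn;
    rebuild_count++;
    printf("Using %d <- %d\n", gn, charset);
    return gn;
}
```

With this sketch, alternating between two charsets (plus CS_LATIN1) triggers
only the first two rebuilds; a third non-Latin-1 charset starts evicting the
other two in turn.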

The "x-transparent" slot may never actually get used, and I am not sure
whether it ever was.

> > Not invented by me, taken from the original linux code.

And so are the initial contents of the four slots - they are given by the
four hardwired tables in UCdomap.c.

That means that there is some very minimal support for doing some forward
translations before any reading of the .tbl data, but this may never be
used (since chartrans initialization occurs early), and changes in other
functions may be necessary.

> >> So I just "add" num_n256 so things works without index overrun
> >> (and hopefully with a proper result) and postpone more UCDomap.c changes
> >> for dev.Next - patch from your side really welcome :-)
> 
> > Are changes necessary, and for what purpose?
> Removing of num_n256 stuff gives core dump at startup.

How about just testing for (UCInfo[UC_charset_in_hndl].unicount == NULL)
in the places where you use UCInfo[UC_charset_in_hndl].num_n256?
That should make num_n256 unnecessary, and shows more directly what
you are trying to avoid - access of unicount and unitable tables that are
not there.
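
Roughly what I have in mind, as a sketch only.  The struct below is a
hypothetical stand-in for a UCInfo[] entry: the field names unicount and
unitable are the real ones, but the types, the assumed layout (unicount[i] =
number of Unicode values for position i, unitable = all those values
concatenated), and the helper first_unicode_for() are assumptions made for
illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for one UCInfo[] entry; types are assumptions. */
struct charset_info {
    const unsigned char  *unicount;  /* per-position count of Unicode values, or NULL */
    const unsigned short *unitable;  /* all Unicode values, concatenated, or NULL */
};

/* Return the first Unicode value listed for position `pos`, or -1 when the
 * charset has no tables at all or lists nothing for that position.
 * The caller must pass a `pos` that is within the table. */
static long first_unicode_for(const struct charset_info *ci, int pos)
{
    size_t offset = 0;
    int i;

    /* The direct test: are the tables there at all? */
    if (ci->unicount == NULL || ci->unitable == NULL)
        return -1;

    /* Skip the entries that belong to earlier positions. */
    for (i = 0; i < pos; i++)
        offset += ci->unicount[i];

    if (ci->unicount[pos] == 0)
        return -1;
    return (long) ci->unitable[offset];
}
```

The point is only that the NULL test guards every access, so no separate
counter has to be kept in sync with the presence of the tables.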

> Another way may be to set UChndl = -1 in LYRegister_with_LYCharSets()
> to simulate "old" style behaviour (but not for utf-8).
> All UCTrans* functions preserved by UChndl >= 0 check.

UChndl = -1 used to have a useful meaning: that a character set
is known to the "old method", but not known to the "new method".  This was
used in the UCCan* functions (therefore also in UCSetTransParams())
[and maybe other places?].  With your changes (as of dev.19) UChndl != -1
has become invariantly true (except in the case of some internal error).
So you had to make some changes in UCCan* / UCNeedNot*.  [Hey, I'm sure
you know all this; I'm kind of recapitulating for myself.]

In general your changes seem to aim at simplifying things (with the
final goal to get completely rid of "old" stuff?) and at making
things clearer.  I think using UChndl = -1 to mean something other than
it used to doesn't make things clearer though.

I leave it to you to find the best way (and reserve the right to complain...)

There are just too many combinations of settings/flags that can occur:
N document charsets X N display charsets X context (plain text, HTML text,
ALT text, HREF) X raw flag X <???>.  It's near impossible to systematically
test all these cases when some internal changes are made.  Well, that's my
excuse for not wanting to change too much (if it ain't broke, don't fix
it).  [The other excuse is to keep things as flexible as possible (leaving
some stuff in that is "currently" unused or underused - for "one day" using
it), but OTOH getting rid of some redundancy is a worthwhile goal...]


      Klaus
