lynx-dev

Re: lynx-dev Lynx character entity references fix


From: Klaus Weide
Subject: Re: lynx-dev Lynx character entity references fix
Date: Tue, 9 Mar 1999 12:45:29 -0600 (CST)

On Sun, 7 Mar 1999, Leonid Pauzner wrote:

> > On Fri, 5 Mar 1999, Leonid Pauzner wrote:
> 
> >> >      * From: Jacob Poon <address@hidden>
> >> >         - Fixed some typos in the old references. (fixed: b.delta)
> >> Thanks, I'm now working on the old-style entities code and will integrate your
> >> fix.
> 
> BTW, I found an interesting side effect:
> if you look at /test/unicode.html with Lynx dev.19
> and set "display charset" to x-transparent,
> you get a nice picture.
> I was not sure whether the chars < 128 would be converted properly (they are OK),
> but occasionally Latin1 chars got reverse-translated to character entities,
> even though the original source used numeric entities!!!
> See around line 0x0100.
> This is due to my recent changes; 2.8.1 does not do this.
> Apparently x-transparent should fall back to 7-bit for Unicode chars, as CJK does,
> but some interesting internals became visible.

Do you mean this is good, bad, or just interesting?  Do you want to
leave it this way?  (I think it would be better to restore the
use-SevenBitApproximations behavior.)  Can you explain why this is
happening?

Also, I haven't seen a patch for the ifdef'd entities.h tables - did I just
miss it?

  ----

Among the previous changes (that are in dev.18/dev.19), the following
looks wrong.  In UC_con_set_trans():

    for (i = 0; i < UCInfo[UC_charset_in_hndl].num_n256; i++) {
        if ((j = UCInfo[UC_charset_in_hndl].unicount[i])) {
            ptrans[i] = *p;
            for (; j; j--) {
                p++;
            }
        } else {
            ptrans[i] = 0xfffd;
        }
    }

Here ptrans points to one of the four tables (slots) in translations[].
With your change the loop only writes entries 0 .. num_n256-1; everything
above that (the whole table, when num_n256 is 0) is left unchanged when it
should be re-initialized.  So the to-Unicode translation for one charset
could effectively inherit the translations of a completely different
charset that used the same slot before.

The closer equivalent to the previous behavior would be to first initialize
all 256 elements to 0xfffd.

It *seems* that *currently* this code will never be called for any of the
charsets with num_n256==0 -- as long as they also have num_uni==0.
UC_con_set_trans() is only called from UC_MapGN(), and all calls to
UC_MapGN() are "protected" by a preceding

    if (!UCInfo[UChndl_in].num_uni)
        return -11;


   Klaus






