lynx-dev

Re: lynx-dev Patch for "stopping when viewing a site" hang


From: Leonid Pauzner
Subject: Re: lynx-dev Patch for "stopping when viewing a site" hang
Date: Wed, 18 Aug 1999 13:37:13 +0400 (MSD)

18-Aug-99 02:59 Klaus Weide wrote:
> On Wed, 18 Aug 1999, Henry Nelson wrote:

>> > Could you please summarize the two problems you are talking about?
>>
>> They are both evident in the trace I sent:
>>    http://www.flora.org/lynx-dev/html/month0799/msg00564.html.

> Hmm, it was never clear to me that that thread was about a chartrans
> problem.  It was never mentioned that the problem only occurs with
> a CJK display character set, or with more than the given site.
> So I thought it was just a weird networking problem.  (Yes, you did
> say the same happened with a local copy...  I ignored that part....)

> I saw the "Unknown entity" line in the trace, but didn't think it was
> significant for the hanging.  I was wrong.

>>      HTML:begin_element[8]: adding style to stack - HeadingLeft
>>      SGML: Unknown entity 'reg' 174 -3

I reported that strange trace line to lynx-dev a couple of months ago,
but nobody replied; now I have a clear picture (it is specific to ALT= attributes, etc.).

>> This is the one I (hope I) fixed.  I've been aware of it for a while,
>> but in general don't much care for those "extra" characters, so I didn't
>> pursue it until now.  In the past, EUC-JP (so as to not generalize) always
>> defaulted to the 7 bit approximations, and suddenly stopped doing so.
>> Not being a programmer, plus there seemingly being a *bunch* of dead code
>> lying around, it's hard for me to say, but it seems that someone didn't
>> think about what all LYCharSets[] in LYCharSets.c was doing.

> I haven't looked at your patch in detail (or tried it), I hope Leonid
> will.

I have already replied to that message separately (IMHO Henry's fix is redundant).
As for the 1999-03-04 changes you asked me to clarify, the intent was:

* chartrans: old-style declarations of charsets which do not have Unicode
  tables (CJK, x-transparent, also UTF-8) now moved from LYCharSets.c to
  UCdomap.h and now included with UCInit() in UCdomap.c in a standard way - LP
  (Please re-test CJK and UTF-8)

Yes, at that time I was planning to remove the "old" entities code like
>                   name = HTMLGetEntityName(code - 160);
from Lynx entirely, but postponed it for a while, sorry.
It seems you found a caveat.


>> > I tried to find out what happens to 'entities in the decimal 160-255 range'
>> > by setting display character set to a CJK one (I picked Korean, also
>> > tried EUC-JP), then
>> > loading <http://sol.slcc.edu/lynx/current/lynx2-8-3/test/ALT88592.html>.
>> > I got a lynx hanging(!) (looping?) in LYUCFullyTranslateString_1.
>>
>> Yes, Lynx is not in as good health as some would like to think (what I
>> was grumbling about the other day).  I assume you did not apply my patch,
>> and so you are seeing the second problem I refer to.

> Yes.

>> The last two lines
>> in the trace I sent, which is the last output before Lynx hangs, give a
>> hint to what is happening:
>>      SGML: Start <IMG>
>>      stop_curses: done.
>> If you have an entity which is unknown within an ALT string, Lynx will
>> hang.

> Well, only for some display character sets, which means most people don't
> see the problem even when they try to reproduce it, unless they know the
> necessary condition.

>> Since my patch makes entities become "known", it ends up hiding
>> the real problem.

> Appended is a patch that solves the other (more severe) half of the
> problem.  The "hang" problem was caused by a combination of removing
> too much under 'case S_check_name' in 'LYUCFullyTranslateString_1',
> _and_ having some Latin 1 character codes that are untranslatable
> to the display character set.

>> Another complicating factor is that if you had
>> "0x5c U+00a5" (gives me a true yen sign on a Japanese Windows machine)
>> instead of "U+00a5:YEN" in def7_uni.tbl you wouldn't be aware of the
>> problem either (alt="&yen;").

> 0x5c is '\' (backslash), does that mean Japanese Windows machines
> cannot display a backslash but show a yen sign instead???

>> "ALT88592.html" is sort of overkill :).
>> What tipped me off was:
>>         <img src="/design/pentium/qit/pix/pent1.gif" align="left" hspace="0"
>>         width="175" height="184" ALT="Pentium&#174; processor package">
>> That person trying to read the Chinese page was hanging on:
>>      <a href="http://www.educities.edu.tw/"><img src="/images/brand.gif"
>>         border="0" alt="&uml;&Egrave;&uml;&ocirc;&yen;&laquo;" WIDTH="61"
>>         HEIGHT="21"></a>
>>
>> Don't you just love that ascii art?

> Mother of bogosities.

>> __Henry
>>
>> BTW, on another topic, Lynx doesn't know about "hspace".  Is that okay?

> Just another unrecognized attribute, why should it matter?  Look at traces
> for other sites, they are often full of "SGML: Unknown attribute" lines.

>    Klaus


> Index: lynx2-8-3/src/LYCharUtils.c
> --- lynx2-8-3.old/src/LYCharUtils.c Sat, 26 Jun 1999 03:47:04 -0500 lynxdev
> +++ lynx2-8-3/src/LYCharUtils.c Wed, 18 Aug 1999 00:32:19 -0500 lynxdev
> @@ -2290,6 +2290,22 @@
>                   */
>                   state = S_got_outchar;
>                   break;
> +
> +                 /* The following disabled section doesn't make sense
> +                 ** any more.  It used to make sense in the past, when
> +                 ** S_check_name would look in "old style" tables
> +                 ** in addition to what it does now.
> +                 ** Disabling of going to S_check_name here prevents
> +                 ** endless looping between S_check_uni and S_check_name
> +                 ** states, which could occur here for Latin 1 codes
> +                 ** for some cs_to if they had no translation in that
> +                 ** cs_to.  Normally all cs_to *should* now have valid
> +                 ** translations via UCTransUniChar or UCTransUniCharStr
> +                 ** for all Latin 1 codes, so that we would not get here
> +                 ** anyway, and no loop could occur.  Still, if we *do*
> +                 ** get here, FALL THROUGH to case S_recover now.  - kw
> +                 */
> +#if 0
>                   /*
>                   **  If we get to here, convert and handle
>                   **  the character as a named entity. - FM
> @@ -2298,6 +2314,7 @@
>                   name = HTMLGetEntityName(code - 160);
>                   state = S_check_name;
>                   break;
> +#endif
>               }

>       case S_recover:





