Re: LYNX-DEV new release?

From: Foteos Macrides
Subject: Re: LYNX-DEV new release?
Date: Sun, 17 Aug 1997 19:44:15 -0500 (EST)

Klaus Weide <address@hidden> wrote:
>On Sat, 16 Aug 1997, Foteos Macrides wrote:
>> Klaus Weide <address@hidden> wrote:
>> >[...]
>> >- Review and change string translation stuff in LYCharUtils.c, to finally
>> >  make all kinds of charsets work in attributes.  Pain ita.
>> >  Probably means lots of changes throughout HTML.c, possibly elsewhere.
>>      I did that long ago in the fotemods.  Did you find a problem with
>> it, or just not notice it?
>The change of function LYUnEscapeEntities from less than 500 lines to
>more than 1000 lines did not go unnoticed.  I admit I am intimidated
>by the pure size and number of levels of it.  So I have successfully
>managed to avoid looking at it in detail, so far.  Of course I can't
>really complain, I think my changes in SGML.c for chartrans don't look
>much better to someone else...  The function seems to work fine for what
>it does, as far as I have tested it (although not the UTF-8 output) on

        LYUnEscapeEntities() in the current fotemods reproduces, linearly,
so to speak, for a string passed to it, all of the things being done in
SGML.c and the functions it calls in HTML.c for named and numeric character
references as SGML_character() is called, byte-by-byte, for a text/html
input stream.  It is much more complicated than it needs to be, I think, and
it is intimidating because all the macros and functions you're using in
SGML.c are "strung out", linearly, so to speak, so you do see the complexity
and redundancies directly.  I doubt, frankly, that anyone besides you or me
could understand either the SGML.c/HTML.c or LYCharUtils.c chartrans code
without investing at least a month or so trying to understand it, and that's
just because you and I already have spent at least that kind of time studying
it and/or writing it.  I did add a lot of comments in LYUnEscapeEntities so
that I can "figure it out again" reasonably quickly after I've been away from
it a while.  When I need to "figure out again" what all that chartrans stuff
in SGML.c and the functions it calls in HTML.c are actually doing, I usually
do it with an ALT value identical to a text string.  I don't have the range
of terminals and charset options that would be needed to "see" the results
for most of the display character sets, but the byte streams created for ALT
values versus text appear to be the same (and should be, in theory, whether
or not they are actually "correct" in either case :).
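For anyone following along who hasn't opened LYCharUtils.c, here is a
minimal sketch of what "linearly" means: a single left-to-right pass over
an attribute string that does, for numeric and named references, roughly
what the entity and character-reference states of SGML_character() do for a
stream.  It is NOT LYUnEscapeEntities().  The function name, the six-entry
entity table, and the ISO-8859-1-only output (code points above 255 are
copied through literally instead of being translated via chartrans) are all
assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table: a handful of named entities and their ISO-8859-1 bytes. */
struct named_entity { const char *name; unsigned char byte; };

static const struct named_entity entity_table[] = {
    { "amp", '&' }, { "lt", '<' }, { "gt", '>' }, { "quot", '"' },
    { "nbsp", 0xA0 }, { "eacute", 0xE9 }
};

/*
 * One left-to-right pass over a NUL-terminated attribute value, rewriting it
 * in place.  Decimal (&#65;) and hex (&#x41;) numeric references and the
 * named entities above become single ISO-8859-1 bytes; the trailing ';' is
 * optional.  Anything unrecognized, a zero code point, or a code point above
 * 255 is copied through literally.  Output is never longer than input.
 */
void unescape_refs_sketch(char *s)
{
    char *in = s, *out = s;

    assert(s != NULL);
    while (*in) {
        if (*in == '&') {
            char *p = in + 1, *end;
            long code = -1;

            if (*p == '#') {                      /* numeric character reference */
                int base = 10;
                p++;
                if (*p == 'x' || *p == 'X') {
                    base = 16;
                    p++;
                }
                /* require a digit first so strtol() can't skip spaces or signs */
                if (base == 16 ? isxdigit((unsigned char)*p)
                               : isdigit((unsigned char)*p)) {
                    code = strtol(p, &end, base);
                    p = end;
                }
            } else {                              /* named entity reference */
                size_t i;
                for (i = 0; i < sizeof entity_table / sizeof entity_table[0]; i++) {
                    size_t n = strlen(entity_table[i].name);
                    if (strncmp(p, entity_table[i].name, n) == 0) {
                        code = entity_table[i].byte;
                        p += n;
                        break;
                    }
                }
            }
            if (code > 0 && code <= 255) {        /* representable: emit one byte */
                if (*p == ';')
                    p++;
                *out++ = (char)code;
                in = p;
                continue;
            }
        }
        *out++ = *in++;                           /* ordinary byte, or unhandled '&' */
    }
    *out = '\0';
}
```

With this sketch, "caf&eacute; &amp; &#65;&#x42; &#8364;" becomes "caf"
followed by byte 0xE9, then " & AB &#8364;", the last reference being left
alone because it has no ISO-8859-1 equivalent.  Seeing every case spelled
out in one loop like this is exactly why the real function gets long.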

>But the "problem" is that it solves only part of the problem.  It
>does translation of entities and numerical character references, but
>translation of raw bytes in a charset different from ISO-8859-1 is
>still not done correctly.  IOW you have generalized LYUnEscapeEntities,
>but LYExpandString also needs to be generalized.  Those two functions
>are nearly always used together in HTML.c, like for example
>            if (current_char_set)
>                LYExpandString(&temp);
>            /*
>             *  Convert any HTML entities or decimal escaping. - FM
>             */
>            LYUnEscapeEntities(temp, TRUE, FALSE);
>(and the 'if (current_char_set)' criterion isn't really valid any
>more).  So my idea is to fold them into one function which would do
>all the required translations of a string, entities and NCRs and raw
>bytes; and (ideally) for all possible combinations of 'from' and 'to'
>charsets.  I started writing a function for that (well, the
>LYExpandString-corresponding part), but haven't finished or tested it.
>It is not trivial since there can be a lot of different cases for the
>kind of 'from' and 'to' encodings.

        You're mixing apples and oranges.  LYUnEscapeEntities() does
for named and numeric character references in attribute values the
kinds of things being done in the S_ero, S_cro, S_incro, S_entity and
related states of SGML_character().  LYExpandString() is called first,
and is for doing the homologous raw byte character conversions and/or
the accumulation and conversion of multibyte characters, as on entry into
SGML_character().  It has not yet been updated for the chartrans
support.  You had returned from your extended disappearance before
I got to that, and I was assuming you'd be incorporating the fotemods'
LYUnEscapeEntities() into the development code, and moving on to
LYExpandString(), or at least generating "feedback" before I moved
on to it.  If you combine them into an LYFruitJuice(), you are likely
to lose flexibility, and the ability to see the parallels between the
text versus attribute value handling, and special things being done
in addition for attribute values when their context dictates special
handling (i.e., because they are not simply ALT values to be handled
as if just text).

>Can you remind me why we are doing all this from HTML.c, instead of in
>SGML.c?  I keep coming up with reasons and then forgetting or
>discarding them.  I think hidden (and other?) form fields are part of
>it, they should go untranslated.  (Which brings up another area that
>still should be dealt with better, labelling and/or translation of
>form submissions.) [...]

        That's not a simple request with a short answer, nor one with
unambiguously "correct" answers, particularly given the latest
round of capitulations to "market forces" in the W3C HTML 4.0 draft,
which are unlikely to be overturned.  Since I've written replies to that
question or its equivalents before, perhaps it would be better to wait
until it's not likely to be just more spare time wasted due to forgetting,
or new developments, or differences in our judgments on how to deal with
bad situations. :)


 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
