lynx-dev lynx2.8.2dev.19 patch #5 (long - entities and more...)

From:	Leonid Pauzner
Subject:	lynx-dev lynx2.8.2dev.19 patch #5 (long - entities and more...)
Date:	Wed, 10 Mar 1999 04:04:44 +0300 (MSK)
This patch also includes:

Subject: lynx-dev Lynx character entity references fix
From: Jacob Poon <address@hidden>
Date: Thu, 4 Mar 1999 16:38:43 -0500
  and
From: address@hidden
Subject: Re: lynx-dev lynx2.8.2dev.18 UCdomap.c on OS/390
Date: Fri, 5 Mar 1999 13:07:50 -0700 (MST)


* entities.h: added a clean HTML 4.0 entities table, #ifdef'ed with
  ENTITIES_HTML40_ONLY (may be useful for page validation);
  entities.h moved to the src/chrtrans directory - LP
* entities.h: added support for &euro, fixed duplicate &loz definitions,
  fixed b.delta mapping - Jacob Poon
* save a few KB of static memory by storing Unicode values as 'u16'
  (was 'long') - LP
* trace log toggle is now really interruptible - LP
* UCdomap.h: fix expansion of &#1234 for the x-transparent display charset - LP
* Remove non-ANSI (struct unimapdesc_str) cast and use of
  initializers in expressions from UCdomap.h - PG
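
A rough sketch of where the static-memory saving in the 'u16' item comes
from. It is not part of the patch. The names u16_sketch, unipair_long,
unipair_u16, bytes_saved() and fits_in_u16() are made up for illustration,
and 'unsigned short' is only assumed to match the u16 typedef in UCkd.h.
The saving holds only because every stored code point fits in 16 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: u16 from UCkd.h is a 16-bit unsigned type;
 * unsigned short stands in for it here. */
typedef unsigned short u16_sketch;

/* Table entry layout before the patch (two longs per pair). */
typedef struct {
    long upper;
    long lower;
} unipair_long;

/* Table entry layout after the patch (two 16-bit values per pair). */
typedef struct {
    u16_sketch upper;
    u16_sketch lower;
} unipair_u16;

/* Bytes saved by switching a static table of n pairs to the u16 layout. */
size_t bytes_saved(size_t n)
{
    assert(sizeof(unipair_long) >= sizeof(unipair_u16));
    return n * (sizeof(unipair_long) - sizeof(unipair_u16));
}

/* Returns nonzero when a code point can be stored in 16 bits without loss. */
int fits_in_u16(long code)
{
    return code >= 0 && code <= 0xFFFFL;
}
```

For example, fits_in_u16(0x20AC) is true for the euro sign added by this
patch, while any value above 0xFFFF would not survive the narrower type.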


Please move the entities.h file from the WWW tree to src/chrtrans
before applying this patch.
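
For readers wondering why entities.h insists on a case-sensitive
alphabetical order and why the duplicate &loz entry mattered: a name
lookup over such a table is naturally a binary search, which assumes each
name appears once and in strcmp() order. The sketch below is only an
illustration under that assumption. It is not the code behind
HTMLGetEntityUCValue(), which this patch does not show. The names
entity_sketch, sample_entities and lookup_entity_sketch() are hypothetical,
and the table is a small excerpt with values taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the UC_entity_info layout used in entities.h. */
typedef struct {
    const char *name;      /* sorted in strcmp() order (case-sensitive) */
    unsigned short code;   /* Unicode value, assumed to fit in 16 bits */
} entity_sketch;

/* Small excerpt; uppercase names sort before lowercase ones in ASCII. */
static const entity_sketch sample_entities[] = {
    {"AElig", 198},
    {"Delta", 916},
    {"amp",   38},
    {"delta", 948},
    {"euro",  8364},
    {"loz",   9674},
    {"nbsp",  160},
};

/* Binary search by entity name; returns the code, or -1 if not found. */
long lookup_entity_sketch(const char *name)
{
    size_t lo = 0;
    size_t hi = sizeof(sample_entities) / sizeof(sample_entities[0]);

    assert(name != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, sample_entities[mid].name);

        if (cmp == 0)
            return (long) sample_entities[mid].code;
        if (cmp < 0)
            hi = mid;       /* continue in the lower half */
        else
            lo = mid + 1;   /* continue in the upper half */
    }
    return -1;
}
```

With this ordering "Delta" and "delta" are distinct keys, and a name that
is not in the table (for instance "EURO") simply yields -1.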



diff -u old/src/chrtrans/caselowe.h ./src/chrtrans/caselowe.h
--- old/src/chrtrans/caselowe.h Wed Jan 13 03:37:34 1999
+++ ./src/chrtrans/caselowe.h   Fri Mar  5 11:26:42 1999
@@ -21,10 +21,11 @@

  */

+#include <UCkd.h> /* typedef u16 */

 typedef struct {
-       long upper;
-       long lower;
+       u16 upper;
+       u16 lower;
 } unipair;

 static CONST unipair unicode_to_lower_case[] =

diff -u old/src/chrtrans/entities.h ./src/chrtrans/entities.h
--- old/src/chrtrans/entities.h Sat Dec 12 20:10:36 1998
+++ ./src/chrtrans/entities.h   Wed Mar 10 02:50:24 1999
@@ -1,33 +1,332 @@
 /*     Entity Names to Unicode table
-**     -----------------------------
-**
-*
-*      Whole entities[] thing (and much more) now present
-*      in this kind of structure.  The structured streams to which
-*      the SGML modules sends its output could then easily have access
-*      to both entity names and unicode values for each (special)
-*      character.  Probably the whole translation to display characters
-*      should be done at that later stage (e.g., in HTML.c).
-*      What's missing is a way for the later stage to return info
-*      to SGML whether the entity could be displayed or not.
-*      (like between SGML_character() and handle_entity() via FoundEntity.)
-*      Well, trying to do that now.
-*      Why keep two structures for entities?  Backward compatibility..
-*/
-
-#ifndef ENTITIES_H
-#define ENTITIES_H 1
-
-#include <HTUtils.h>
-#include <SGML.h>
-
-/* UC_entity_info structure is defined in SGML.h.
-   This has to be sorted alphabetically (case-sensitive),
-   bear this in mind when you add some more entities..  */
+ *     -----------------------------
+ *
+ *     This is a one-way mapping to Unicode, so the chartrans implementation
+ *     now processes character entities like &nbsp the same way it handles
+ *     numeric entities like &#123.
+ *     The only access to this structure is via HTMLGetEntityUCValue().
+ *

-/*
+Unlike the numeric entities &#234 which may be for any Unicode character,
+the character references should be defined within HTML standards
+to get a compatibility between browsers.

-This table available from ftp://ftp.unicode.org/
+Now we have a choice: use clean HTML4.0 entities list
+(and reject everything else), or use a relaxed list
+with lots of synonyms and new symbols found at
+ftp://ftp.unicode.org/MAPPINGS/VENDORS/MISC/SGML.TXT
+
+We hold both: #define ENTITIES_HTML40_ONLY for strict version,
+otherwise relaxed.
+
+ */
+
+#include <UCkd.h> /* typedef u16 */
+typedef struct {
+    char* name; /* sorted alphabetically (case-sensitive) */
+    u16 code;
+} UC_entity_info;
+
+static CONST UC_entity_info unicode_entities[] =
+
+
+#ifdef ENTITIES_HTML40_ONLY
+/*********************************************************************
+
+   The full list of character references defined as part of HTML 4.0.
+   http://www.w3.org/TR/PR-html40/sgml/entities.html
+
+   Informal history:
+   * ISO Latin 1 entities for 160-255 range were introduced in HTML 2.0
+   * few important entities were added, including &lt, &gt, &amp.
+   * Greek letters and some math symbols were finally added in HTML 4.0
+
+   252 entries in total (Nov 1997 HTML 4.0 draft); it is a 1:1 mapping.
+   Please do not add more unless a new HTML version is released;
+   try the #else table for experiments and fun...
+
+****/
+{
+ {"AElig",  198}, /* latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1 */
+ {"Aacute", 193}, /* latin capital letter A with acute, U+00C1 ISOlat1 */
+ {"Acirc",  194}, /* latin capital letter A with circumflex, U+00C2 ISOlat1 */
+ {"Agrave", 192}, /* latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1 */
+ {"Alpha",    913}, /* greek capital letter alpha, U+0391 */
+ {"Aring",  197}, /* latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1 */
+ {"Atilde", 195}, /* latin capital letter A with tilde, U+00C3 ISOlat1 */
+ {"Auml",   196}, /* latin capital letter A with diaeresis, U+00C4 ISOlat1 */
+ {"Beta",     914}, /* greek capital letter beta, U+0392 */
+ {"Ccedil", 199}, /* latin capital letter C with cedilla, U+00C7 ISOlat1 */
+ {"Chi",      935}, /* greek capital letter chi, U+03A7 */
+ {"Dagger",  8225}, /* double dagger, U+2021 ISOpub */
+ {"Delta",    916}, /* greek capital letter delta, U+0394 ISOgrk3 */
+ {"ETH",    208}, /* latin capital letter ETH, U+00D0 ISOlat1 */
+ {"Eacute", 201}, /* latin capital letter E with acute, U+00C9 ISOlat1 */
+ {"Ecirc",  202}, /* latin capital letter E with circumflex, U+00CA ISOlat1 */
+ {"Egrave", 200}, /* latin capital letter E with grave, U+00C8 ISOlat1 */
+ {"Epsilon",  917}, /* greek capital letter epsilon, U+0395 */
+ {"Eta",      919}, /* greek capital letter eta, U+0397 */
+ {"Euml",   203}, /* latin capital letter E with diaeresis, U+00CB ISOlat1 */
+ {"Gamma",    915}, /* greek capital letter gamma, U+0393 ISOgrk3 */
+ {"Iacute", 205}, /* latin capital letter I with acute, U+00CD ISOlat1 */
+ {"Icirc",  206}, /* latin capital letter I with circumflex, U+00CE ISOlat1 */
+ {"Igrave", 204}, /* latin capital letter I with grave, U+00CC ISOlat1 */
+ {"Iota",     921}, /* greek capital letter iota, U+0399 */
+ {"Iuml",   207}, /* latin capital letter I with diaeresis, U+00CF ISOlat1 */
+ {"Kappa",    922}, /* greek capital letter kappa, U+039A */
+ {"Lambda",   923}, /* greek capital letter lambda, U+039B ISOgrk3 */
+ {"Mu",       924}, /* greek capital letter mu, U+039C */
+ {"Ntilde", 209}, /* latin capital letter N with tilde, U+00D1 ISOlat1 */
+ {"Nu",       925}, /* greek capital letter nu, U+039D */
+ {"OElig",   338}, /* latin capital ligature OE, U+0152 ISOlat2 */
+ {"Oacute", 211}, /* latin capital letter O with acute, U+00D3 ISOlat1 */
+ {"Ocirc",  212}, /* latin capital letter O with circumflex, U+00D4 ISOlat1 */
+ {"Ograve", 210}, /* latin capital letter O with grave, U+00D2 ISOlat1 */
+ {"Omega",    937}, /* greek capital letter omega, U+03A9 ISOgrk3 */
+ {"Omicron",  927}, /* greek capital letter omicron, U+039F */
+ {"Oslash", 216}, /* latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1 */
+ {"Otilde", 213}, /* latin capital letter O with tilde, U+00D5 ISOlat1 */
+ {"Ouml",   214}, /* latin capital letter O with diaeresis, U+00D6 ISOlat1 */
+ {"Phi",      934}, /* greek capital letter phi, U+03A6 ISOgrk3 */
+ {"Pi",       928}, /* greek capital letter pi, U+03A0 ISOgrk3 */
+ {"Prime",    8243}, /* double prime = seconds = inches, U+2033 ISOtech */
+ {"Psi",      936}, /* greek capital letter psi, U+03A8 ISOgrk3 */
+ {"Rho",      929}, /* greek capital letter rho, U+03A1 */
+ {"Scaron",  352}, /* latin capital letter S with caron, U+0160 ISOlat2 */
+/* there is no Sigmaf, and no U+03A2 character either */
+ {"Sigma",    931}, /* greek capital letter sigma, U+03A3 ISOgrk3 */
+ {"THORN",  222}, /* latin capital letter THORN, U+00DE ISOlat1 */
+ {"Tau",      932}, /* greek capital letter tau, U+03A4 */
+ {"Theta",    920}, /* greek capital letter theta, U+0398 ISOgrk3 */
+ {"Uacute", 218}, /* latin capital letter U with acute, U+00DA ISOlat1 */
+ {"Ucirc",  219}, /* latin capital letter U with circumflex, U+00DB ISOlat1 */
+ {"Ugrave", 217}, /* latin capital letter U with grave, U+00D9 ISOlat1 */
+ {"Upsilon",  933}, /* greek capital letter upsilon, U+03A5 ISOgrk3 */
+ {"Uuml",   220}, /* latin capital letter U with diaeresis, U+00DC ISOlat1 */
+ {"Xi",       926}, /* greek capital letter xi, U+039E ISOgrk3 */
+ {"Yacute", 221}, /* latin capital letter Y with acute, U+00DD ISOlat1 */
+ {"Yuml",    376}, /* latin capital letter Y with diaeresis, U+0178 ISOlat2 */
+ {"Zeta",     918}, /* greek capital letter zeta, U+0396 */
+ {"aacute", 225}, /* latin small letter a with acute, U+00E1 ISOlat1 */
+ {"acirc",  226}, /* latin small letter a with circumflex, U+00E2 ISOlat1 */
+ {"acute",  180}, /* acute accent = spacing acute, U+00B4 ISOdia */
+ {"aelig",  230}, /* latin small letter ae = latin small ligature ae, U+00E6 ISOlat1 */
+ {"agrave", 224}, /* latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1 */
+ {"alefsym",  8501}, /* alef symbol = first transfinite cardinal, U+2135 NEW */
+/* alef symbol is NOT the same as hebrew letter alef, U+05D0 although the same glyph could be used to depict both characters */
+ {"alpha",    945}, /* greek small letter alpha, U+03B1 ISOgrk3 */
+ {"amp",     38}, /* ampersand, U+0026 ISOnum */
+ {"and",      8743}, /* logical and = wedge, U+2227 ISOtech */
+ {"ang",      8736}, /* angle, U+2220 ISOamso */
+ {"aring",  229}, /* latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1 */
+ {"asymp",    8776}, /* almost equal to = asymptotic to, U+2248 ISOamsr */
+ {"atilde", 227}, /* latin small letter a with tilde, U+00E3 ISOlat1 */
+ {"auml",   228}, /* latin small letter a with diaeresis, U+00E4 ISOlat1 */
+ {"bdquo",   8222}, /* double low-9 quotation mark, U+201E NEW */
+ {"beta",     946}, /* greek small letter beta, U+03B2 ISOgrk3 */
+ {"brvbar", 166}, /* broken bar = broken vertical bar, U+00A6 ISOnum */
+ {"bull",     8226}, /* bullet = black small circle, U+2022 ISOpub  */
+/* bullet is NOT the same as bullet operator, U+2219 */
+ {"cap",      8745}, /* intersection = cap, U+2229 ISOtech */
+ {"ccedil", 231}, /* latin small letter c with cedilla, U+00E7 ISOlat1 */
+ {"cedil",  184}, /* cedilla = spacing cedilla, U+00B8 ISOdia */
+ {"cent",   162}, /* cent sign, U+00A2 ISOnum */
+ {"chi",      967}, /* greek small letter chi, U+03C7 ISOgrk3 */
+ {"circ",    710}, /* modifier letter circumflex accent, U+02C6 ISOpub */
+ {"clubs",    9827}, /* black club suit = shamrock, U+2663 ISOpub */
+ {"cong",     8773}, /* approximately equal to, U+2245 ISOtech */
+ {"copy",   169}, /* copyright sign, U+00A9 ISOnum */
+ {"crarr",    8629}, /* downwards arrow with corner leftwards = carriage return, U+21B5 NEW */
+ {"cup",      8746}, /* union = cup, U+222A ISOtech */
+ {"curren", 164}, /* currency sign, U+00A4 ISOnum */
+ {"dArr",     8659}, /* downwards double arrow, U+21D3 ISOamsa */
+ {"dagger",  8224}, /* dagger, U+2020 ISOpub */
+ {"darr",     8595}, /* downwards arrow, U+2193 ISOnum */
+ {"deg",    176}, /* degree sign, U+00B0 ISOnum */
+ {"delta",    948}, /* greek small letter delta, U+03B4 ISOgrk3 */
+ {"diams",    9830}, /* black diamond suit, U+2666 ISOpub */
+ {"divide", 247}, /* division sign, U+00F7 ISOnum */
+ {"eacute", 233}, /* latin small letter e with acute, U+00E9 ISOlat1 */
+ {"ecirc",  234}, /* latin small letter e with circumflex, U+00EA ISOlat1 */
+ {"egrave", 232}, /* latin small letter e with grave, U+00E8 ISOlat1 */
+ {"empty",    8709}, /* empty set = null set = diameter, U+2205 ISOamso */
+ {"emsp",    8195}, /* em space, U+2003 ISOpub */
+ {"ensp",    8194}, /* en space, U+2002 ISOpub */
+ {"epsilon",  949}, /* greek small letter epsilon, U+03B5 ISOgrk3 */
+ {"equiv",    8801}, /* identical to, U+2261 ISOtech */
+ {"eta",      951}, /* greek small letter eta, U+03B7 ISOgrk3 */
+ {"eth",    240}, /* latin small letter eth, U+00F0 ISOlat1 */
+ {"euml",   235}, /* latin small letter e with diaeresis, U+00EB ISOlat1 */
+ {"euro",   8364}, /* euro sign, U+20AC NEW */
+ {"exist",    8707}, /* there exists, U+2203 ISOtech */
+ {"fnof",     402}, /* latin small f with hook = function = florin, U+0192 ISOtech */
+ {"forall",   8704}, /* for all, U+2200 ISOtech */
+ {"frac12", 189}, /* vulgar fraction one half = fraction one half, U+00BD ISOnum */
+ {"frac14", 188}, /* vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum */
+ {"frac34", 190}, /* vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum */
+ {"frasl",    8260}, /* fraction slash, U+2044 NEW */
+ {"gamma",    947}, /* greek small letter gamma, U+03B3 ISOgrk3 */
+ {"ge",       8805}, /* greater-than or equal to, U+2265 ISOtech */
+ {"gt",      62}, /* greater-than sign, U+003E ISOnum */
+ {"hArr",     8660}, /* left right double arrow, U+21D4 ISOamsa */
+ {"harr",     8596}, /* left right arrow, U+2194 ISOamsa */
+ {"hearts",   9829}, /* black heart suit = valentine, U+2665 ISOpub */
+ {"hellip",   8230}, /* horizontal ellipsis = three dot leader, U+2026 ISOpub  */
+ {"iacute", 237}, /* latin small letter i with acute, U+00ED ISOlat1 */
+ {"icirc",  238}, /* latin small letter i with circumflex, U+00EE ISOlat1 */
+ {"iexcl",  161}, /* inverted exclamation mark, U+00A1 ISOnum */
+ {"igrave", 236}, /* latin small letter i with grave, U+00EC ISOlat1 */
+ {"image",    8465}, /* blackletter capital I = imaginary part, U+2111 ISOamso */
+ {"infin",    8734}, /* infinity, U+221E ISOtech */
+ {"int",      8747}, /* integral, U+222B ISOtech */
+ {"iota",     953}, /* greek small letter iota, U+03B9 ISOgrk3 */
+ {"iquest", 191}, /* inverted question mark = turned question mark, U+00BF ISOnum */
+ {"isin",     8712}, /* element of, U+2208 ISOtech */
+ {"iuml",   239}, /* latin small letter i with diaeresis, U+00EF ISOlat1 */
+ {"kappa",    954}, /* greek small letter kappa, U+03BA ISOgrk3 */
+ {"lArr",     8656}, /* leftwards double arrow, U+21D0 ISOtech */
+/* Unicode does not say that lArr is the same as the 'is implied by' arrow
+    but also does not have any other character for that function. So ? lArr can
+    be used for 'is implied by' as ISOtech suggests */
+ {"lambda",   955}, /* greek small letter lambda, U+03BB ISOgrk3 */
+ {"lang",     9001}, /* left-pointing angle bracket = bra, U+2329 ISOtech */
+/* lang is NOT the same character as U+003C 'less than' or U+2039 'single left-pointing angle quotation mark' */
+ {"laquo",  171}, /* left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum */
+ {"larr",     8592}, /* leftwards arrow, U+2190 ISOnum */
+ {"lceil",    8968}, /* left ceiling = apl upstile, U+2308 ISOamsc  */
+ {"ldquo",   8220}, /* left double quotation mark, U+201C ISOnum */
+ {"le",       8804}, /* less-than or equal to, U+2264 ISOtech */
+ {"lfloor",   8970}, /* left floor = apl downstile, U+230A ISOamsc  */
+ {"lowast",   8727}, /* asterisk operator, U+2217 ISOtech */
+ {"loz",      9674}, /* lozenge, U+25CA ISOpub */
+ {"lrm",     8206}, /* left-to-right mark, U+200E NEW RFC 2070 */
+ {"lsaquo",  8249}, /* single left-pointing angle quotation mark, U+2039 ISO proposed */
+/* lsaquo is proposed but not yet ISO standardised */
+ {"lsquo",   8216}, /* left single quotation mark, U+2018 ISOnum */
+ {"lt",      60}, /* less-than sign, U+003C ISOnum */
+ {"macr",   175}, /* macron = spacing macron = overline = APL overbar, U+00AF ISOdia */
+ {"mdash",   8212}, /* em dash, U+2014 ISOpub */
+ {"micro",  181}, /* micro sign, U+00B5 ISOnum */
+ {"middot", 183}, /* middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum */
+ {"minus",    8722}, /* minus sign, U+2212 ISOtech */
+ {"mu",       956}, /* greek small letter mu, U+03BC ISOgrk3 */
+ {"nabla",    8711}, /* nabla = backward difference, U+2207 ISOtech */
+ {"nbsp",   160}, /* no-break space = non-breaking space, U+00A0 ISOnum */
+ {"ndash",   8211}, /* en dash, U+2013 ISOpub */
+ {"ne",       8800}, /* not equal to, U+2260 ISOtech */
+ {"ni",       8715}, /* contains as member, U+220B ISOtech */
+/* should there be a more memorable name than 'ni'? */
+ {"not",    172}, /* not sign = discretionary hyphen, U+00AC ISOnum */
+ {"notin",    8713}, /* not an element of, U+2209 ISOtech */
+ {"nsub",     8836}, /* not a subset of, U+2284 ISOamsn */
+ {"ntilde", 241}, /* latin small letter n with tilde, U+00F1 ISOlat1 */
+ {"nu",       957}, /* greek small letter nu, U+03BD ISOgrk3 */
+ {"oacute", 243}, /* latin small letter o with acute, U+00F3 ISOlat1 */
+ {"ocirc",  244}, /* latin small letter o with circumflex, U+00F4 ISOlat1 */
+ {"oelig",   339}, /* latin small ligature oe, U+0153 ISOlat2 */
+ {"ograve", 242}, /* latin small letter o with grave, U+00F2 ISOlat1 */
+ {"oline",    8254}, /* overline = spacing overscore, U+203E NEW */
+ {"omega",    969}, /* greek small letter omega, U+03C9 ISOgrk3 */
+ {"omicron",  959}, /* greek small letter omicron, U+03BF NEW */
+ {"oplus",    8853}, /* circled plus = direct sum, U+2295 ISOamsb */
+ {"or",       8744}, /* logical or = vee, U+2228 ISOtech */
+ {"ordf",   170}, /* feminine ordinal indicator, U+00AA ISOnum */
+ {"ordm",   186}, /* masculine ordinal indicator, U+00BA ISOnum */
+ {"oslash", 248}, /* latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1 */
+ {"otilde", 245}, /* latin small letter o with tilde, U+00F5 ISOlat1 */
+ {"otimes",   8855}, /* circled times = vector product, U+2297 ISOamsb */
+ {"ouml",   246}, /* latin small letter o with diaeresis, U+00F6 ISOlat1 */
+ {"para",   182}, /* pilcrow sign = paragraph sign, U+00B6 ISOnum */
+ {"part",     8706}, /* partial differential, U+2202 ISOtech  */
+ {"permil",  8240}, /* per mille sign, U+2030 ISOtech */
+ {"perp",     8869}, /* up tack = orthogonal to = perpendicular, U+22A5 ISOtech */
+ {"phi",      966}, /* greek small letter phi, U+03C6 ISOgrk3 */
+ {"pi",       960}, /* greek small letter pi, U+03C0 ISOgrk3 */
+ {"piv",      982}, /* greek pi symbol, U+03D6 ISOgrk3 */
+ {"plusmn", 177}, /* plus-minus sign = plus-or-minus sign, U+00B1 ISOnum */
+ {"pound",  163}, /* pound sign, U+00A3 ISOnum */
+ {"prime",    8242}, /* prime = minutes = feet, U+2032 ISOtech */
+ {"prod",     8719}, /* n-ary product = product sign, U+220F ISOamsb */
+/* prod is NOT the same character as U+03A0 'greek capital letter pi' though the same glyph might be used for both */
+ {"prop",     8733}, /* proportional to, U+221D ISOtech */
+ {"psi",      968}, /* greek small letter psi, U+03C8 ISOgrk3 */
+ {"quot",    34}, /* quotation mark = APL quote, U+0022 ISOnum */
+ {"rArr",     8658}, /* rightwards double arrow, U+21D2 ISOtech */
+/* Unicode does not say this is the 'implies' character but does not have
+     another character with this function so ?
+     rArr can be used for 'implies' as ISOtech suggests */
+ {"radic",    8730}, /* square root = radical sign, U+221A ISOtech */
+ {"rang",     9002}, /* right-pointing angle bracket = ket, U+232A ISOtech */
+/* rang is NOT the same character as U+003E 'greater than' or U+203A 'single right-pointing angle quotation mark' */
+ {"raquo",  187}, /* right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum */
+ {"rarr",     8594}, /* rightwards arrow, U+2192 ISOnum */
+ {"rceil",    8969}, /* right ceiling, U+2309 ISOamsc  */
+ {"rdquo",   8221}, /* right double quotation mark, U+201D ISOnum */
+ {"real",     8476}, /* blackletter capital R = real part symbol, U+211C ISOamso */
+ {"reg",    174}, /* registered sign = registered trade mark sign, U+00AE ISOnum */
+ {"rfloor",   8971}, /* right floor, U+230B ISOamsc  */
+ {"rho",      961}, /* greek small letter rho, U+03C1 ISOgrk3 */
+ {"rlm",     8207}, /* right-to-left mark, U+200F NEW RFC 2070 */
+ {"rsaquo",  8250}, /* single right-pointing angle quotation mark, U+203A ISO proposed */
+/* rsaquo is proposed but not yet ISO standardised */
+ {"rsquo",   8217}, /* right single quotation mark, U+2019 ISOnum */
+ {"sbquo",   8218}, /* single low-9 quotation mark, U+201A NEW */
+ {"scaron",  353}, /* latin small letter s with caron, U+0161 ISOlat2 */
+ {"sdot",     8901}, /* dot operator, U+22C5 ISOamsb */
+/* dot operator is NOT the same character as U+00B7 middle dot */
+ {"sect",   167}, /* section sign, U+00A7 ISOnum */
+ {"shy",    173}, /* soft hyphen = discretionary hyphen, U+00AD ISOnum */
+ {"sigma",    963}, /* greek small letter sigma, U+03C3 ISOgrk3 */
+ {"sigmaf",   962}, /* greek small letter final sigma, U+03C2 ISOgrk3 */
+ {"sim",      8764}, /* tilde operator = varies with = similar to, U+223C ISOtech */
+/* tilde operator is NOT the same character as the tilde, U+007E, although the same glyph might be used to represent both */
+ {"spades",   9824}, /* black spade suit, U+2660 ISOpub */
+/* black here seems to mean filled as opposed to hollow */
+ {"sub",      8834}, /* subset of, U+2282 ISOtech */
+ {"sube",     8838}, /* subset of or equal to, U+2286 ISOtech */
+ {"sum",      8721}, /* n-ary sumation, U+2211 ISOamsb */
+/* sum is NOT the same character as U+03A3 'greek capital letter sigma' though the same glyph might be used for both */
+ {"sup",      8835}, /* superset of, U+2283 ISOtech */
+/* note that nsup, 'not a superset of, U+2283' is not covered by the Symbol
+     font encoding and is not included. Should it be, for symmetry?
+     It is in ISOamsn */
+ {"sup1",   185}, /* superscript one = superscript digit one, U+00B9 ISOnum */
+ {"sup2",   178}, /* superscript two = superscript digit two = squared, U+00B2 ISOnum */
+ {"sup3",   179}, /* superscript three = superscript digit three = cubed, U+00B3 ISOnum */
+ {"supe",     8839}, /* superset of or equal to, U+2287 ISOtech */
+ {"szlig",  223}, /* latin small letter sharp s = ess-zed,  U+00DF ISOlat1 */
+ {"tau",      964}, /* greek small letter tau, U+03C4 ISOgrk3 */
+ {"there4",   8756}, /* therefore, U+2234 ISOtech */
+ {"theta",    952}, /* greek small letter theta, U+03B8 ISOgrk3 */
+ {"thetasym", 977}, /* greek small letter theta symbol, U+03D1 NEW */
+ {"thinsp",  8201}, /* thin space, U+2009 ISOpub */
+ {"thorn",  254}, /* latin small letter thorn with, U+00FE ISOlat1 */
+ {"tilde",   732}, /* small tilde, U+02DC ISOdia */
+ {"times",  215}, /* multiplication sign, U+00D7 ISOnum */
+ {"trade",    8482}, /* trade mark sign, U+2122 ISOnum */
+ {"uArr",     8657}, /* upwards double arrow, U+21D1 ISOamsa */
+ {"uacute", 250}, /* latin small letter u with acute, U+00FA ISOlat1 */
+ {"uarr",     8593}, /* upwards arrow, U+2191 ISOnum */
+ {"ucirc",  251}, /* latin small letter u with circumflex, U+00FB ISOlat1 */
+ {"ugrave", 249}, /* latin small letter u with grave, U+00F9 ISOlat1 */
+ {"uml",    168}, /* diaeresis = spacing diaeresis, U+00A8 ISOdia */
+ {"upsih",    978}, /* greek upsilon with hook symbol, U+03D2 NEW */
+ {"upsilon",  965}, /* greek small letter upsilon, U+03C5 ISOgrk3 */
+ {"uuml",   252}, /* latin small letter u with diaeresis, U+00FC ISOlat1 */
+ {"weierp",   8472}, /* script capital P = power set = Weierstrass p, U+2118 ISOamso */
+ {"xi",       958}, /* greek small letter xi, U+03BE ISOgrk3 */
+ {"yacute", 253}, /* latin small letter y with acute, U+00FD ISOlat1 */
+ {"yen",    165}, /* yen sign = yuan sign, U+00A5 ISOnum */
+ {"yuml",   255}, /* latin small letter y with diaeresis, U+00FF ISOlat1 */
+ {"zeta",     950}, /* greek small letter zeta, U+03B6 ISOgrk3 */
+ {"zwj",     8205}, /* zero width joiner, U+200D NEW RFC 2070 */
+ {"zwnj",    8204}, /* zero width non-joiner, U+200C NEW RFC 2070 */
+};
+
+#else /* not ENTITIES_HTML40_ONLY: */
+/***************************************************************************
+
+This table prepared from ftp://ftp.unicode.org/MAPPINGS/VENDORS/MISC/SGML.TXT
 original comment follows:


@@ -50,26 +349,44 @@
 # set DTD) in Column 4.  The mapping is not reversible, because many
 # distinctions are unified away in Unicode, particularly between
 # mathematical symbols.
-#
-# The table is sorted case-blind by SGML character entity name.
-#
-# The contents of this table are drawn from various sources, and
-# are in the public domain.
-#
-########################
+

    We just sort it and move column 2 away (line too long, sorry;
    look at sgml.html in test/ directory for details).
-   Also we add a few (obsolete) synonyms:
-   "brkbar"  for "brvbar" 0x00A6
-   "emdash"  for "mdash" 0x2014
-   "endash"  for "ndash" 0x2013
-   "hibar"  for "macr" 0x00AF
-   for exact compatibility with entities[] and previous bevavior.
-   BTW, lots of synonyms found in this table, we shouldn't worry about...
-*/
+
+Changes:
+   * Add a few (obsolete) synonyms for compatibility with Lynx/2.5 and up:
+          "brkbar"  for "brvbar" 0x00A6
+          "emdash"  for "mdash" 0x2014
+          "endash"  for "ndash" 0x2013
+          "hibar"  for "macr" 0x00AF
+     BTW, lots of synonyms found in this table, we shouldn't worry about...
+     Around 1000 entries in total.
+
+
+Modified by Jacob Poon <address@hidden>
+
+This table is modified to improve support of HTML 4.0 character entity references,
+including Euro symbol support ("euro" 0x20AC).
+
+Known issues:
+
+The original table includes two different definitions of &loz; reference.
+Since HTML 4.0 only uses U+25CA, the U+2727 definition is commented out,
+until there is a good reason to put it back in.
+
+"b.delta" mapping fixed (was 0x03B3 = small gamma).
+
+At the end of the table, there are several unnumbered, commented references.
+These are not defined in HTML 4.0, and will remain so until they are defined
+in future SGML/HTML standards.
+
+Support for obsolete references is for backwards compatibility only.  New
+SGML/HTML documents should not depend on these references just because Lynx can
+display them.

-static CONST UC_entity_info unicode_entities[] = {
+****/
+{
   {"AElig",    0x00C6},  /* LATIN CAPITAL LETTER AE                       */
   {"Aacgr",    0x0386},  /* GREEK CAPITAL LETTER ALPHA WITH TONOS         */
   {"Aacute",   0x00C1},  /* LATIN CAPITAL LETTER A WITH ACUTE             */
@@ -326,7 +643,7 @@
   {"b.alpha",  0x03B1},  /* GREEK SMALL LETTER ALPHA                      */
   {"b.beta",   0x03B2},  /* GREEK SMALL LETTER BETA                       */
   {"b.chi",    0x03C7},  /* GREEK SMALL LETTER CHI                        */
-  {"b.delta",  0x03B3},  /* GREEK SMALL LETTER GAMMA                      */
+  {"b.delta",  0x03B4},  /* GREEK SMALL LETTER DELTA                      */
   {"b.epsi",   0x03B5},  /* GREEK SMALL LETTER EPSILON                    */
   {"b.epsis",  0x03B5},  /* GREEK SMALL LETTER EPSILON                    */
   {"b.epsiv",  0x03B5},  /* GREEK SMALL LETTER EPSILON                    */
@@ -532,6 +849,7 @@
   {"eta",      0x03B7},  /* GREEK SMALL LETTER ETA                        */
   {"eth",      0x00F0},  /* LATIN SMALL LETTER ETH                        */
   {"euml",     0x00EB},  /* LATIN SMALL LETTER E WITH DIAERESIS           */
+  {"euro",     0x20AC},  /* EURO SIGN                                     */
   {"excl",     0x0021},  /* EXCLAMATION MARK                              */
   {"exist",    0x2203},  /* THERE EXISTS                                  */
   {"fcy",      0x0444},  /* CYRILLIC SMALL LETTER EF                      */
@@ -679,7 +997,8 @@
   {"lowast",   0x2217},  /* ASTERISK OPERATOR                             */
   {"lowbar",   0x005F},  /* LOW LINE                                      */
   {"loz",      0x25CA},  /* LOZENGE                                       */
-  {"loz",      0x2727},  /* WHITE FOUR POINTED STAR                       */
+ /*  {"loz",   0x2727},  WHITE FOUR POINTED STAR                        */
+ /* Warning: Duplicated &loz; entry.  HTML 4.0 defines it as U+25CA. */
   {"lozf",     0x2726},  /* BLACK FOUR POINTED STAR                       */
   {"lpar",     0x0028},  /* LEFT PARENTHESIS                              */
   {"lrarr2",   0x21C6},  /* LEFTWARDS ARROW OVER RIGHTWARDS ARROW         */
@@ -1089,4 +1408,4 @@
 /* {"smid",    0x????},  shortmid                               # ISOamsr */
 };

-#endif /* ENTITIES_H */
+#endif /* not ENTITIES_HTML40_ONLY */

diff -u old/samples/sgml.htm ./samples/sgml.htm
--- old/samples/sgml.htm        Sat Dec 12 20:10:36 1998
+++ ./samples/sgml.htm  Sun Mar  7 22:26:18 1999
@@ -38,7 +38,15 @@
 # The contents of this table are drawn from various sources, and
 # are in the public domain.
 #
+<!-- Changes:
++   {"euro",    0x20AC},  /* EURO SIGN                                     */
+    {"loz",     0x25CA},  /* LOZENGE                                       */
+! /*  {"loz",   0x2727},  WHITE FOUR POINTED STAR                          */
+!  /* Warning: Duplicated &loz; entry.  HTML 4.0 defines it as U+25CA. */
+-   {"b.delta", 0x03B3},  /* GREEK SMALL LETTER GAMMA                      */
++   {"b.delta", 0x03B4},  /* GREEK SMALL LETTER DELTA                      */

+-->

 This test illuminating SGML character entities implementation in your browser.
 We sort the entities according to unicode numbers.
@@ -394,12 +402,12 @@
 0x03B2    &b.beta;           ISOgrk4   # GREEK SMALL LETTER BETA
 0x03B2    &beta;             ISOgrk3   # GREEK SMALL LETTER BETA
 0x03B2    &bgr;              ISOgrk1   # GREEK SMALL LETTER BETA
-0x03B3    &b.delta;          ISOgrk4   # GREEK SMALL LETTER GAMMA
 0x03B3    &b.gamma;          ISOgrk4   # GREEK SMALL LETTER GAMMA
 0x03B3    &gamma;            ISOgrk3   # GREEK SMALL LETTER GAMMA
 0x03B3    &ggr;              ISOgrk1   # GREEK SMALL LETTER GAMMA
 0x03B4    &delta;            ISOgrk3   # GREEK SMALL LETTER DELTA
 0x03B4    &dgr;              ISOgrk1   # GREEK SMALL LETTER DELTA
+0x03B4    &b.delta;          ISOgrk4   # GREEK SMALL LETTER DELTA
 0x03B5    &b.epsi;           ISOgrk4   # GREEK SMALL LETTER EPSILON
 0x03B5    &b.epsis;          ISOgrk4   # GREEK SMALL LETTER EPSILON
 0x03B5    &b.epsiv;          ISOgrk4   # GREEK SMALL LETTER EPSILON
@@ -625,6 +633,7 @@
 0x2041    &caret;            ISOpub    # CARET INSERTION POINT
 0x2043    &hybull;           ISOpub    # HYPHEN BULLET
 0x2044    &frasl;            HTMLsymbol        # FRACTION SLASH
+0x20AC    &euro;             new       # EURO SIGN
 0x20DB    &tdot;             ISOtech   # COMBINING THREE DOTS ABOVE
 0x20DC    &DotDot;           ISOtech   # COMBINING FOUR DOTS ABOVE
 0x2105    &incare;           ISOpub    # CARE OF
@@ -1037,7 +1046,7 @@
 0x2717    &cross;            ISOpub    # BALLOT X
 0x2720    &malt;             ISOpub    # MALTESE CROSS
 0x2726    &lozf;             ISOpub    # BLACK FOUR POINTED STAR
-0x2727    &loz;              ISOpub    # WHITE FOUR POINTED STAR
+<!-- 0x2727    &loz;         ISOpub    # WHITE FOUR POINTED STAR -->
 0x2736    &sext;             ISOpub    # SIX POINTED BLACK STAR
 0x????    &epsiv;            ISOgrk3   # variant epsilon
 0x????    &fjlig;            ISOpub    # fj ligature
diff -u old/samples/unicode.htm ./samples/unicode.htm
--- old/samples/unicode.htm     Thu Feb  5 09:00:20 1998
+++ ./samples/unicode.htm       Mon Mar  8 21:24:12 1999
@@ -38,11 +38,13 @@
 # The contents of this table are drawn from various sources, and
 # are in the public domain.
 #
+<!-- Changes:
++   {"euro",    0x20AC},  /* EURO SIGN                                     */

+-->

 This test is illuminated Unicode numeric entities like &amp;#x22AB;
 We sort the entities according to unicode numbers.
-(Sorry, many lines duplicated).
 You should see visible characters if your display character set support them
 or some substitution string picked up from  src/chrtrans/def7_uni.tbl

@@ -92,8 +94,6 @@
 0x00A6    &#x00A6;             # BROKEN BAR
 0x00A7    &#x00A7;             # SECTION SIGN
 0x00A8    &#x00A8;             # DIAERESIS
-0x00A8    &#x00A8;             # DIAERESIS
-0x00A8    &#x00A8;             # DIAERESIS
 0x00A9    &#x00A9;             # COPYRIGHT SIGN
 0x00AA    &#x00AA;             # FEMININE ORDINAL INDICATOR
 0x00AB    &#x00AB;             # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
@@ -115,7 +115,6 @@
 0x00BB    &#x00BB;             # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
 0x00BC    &#x00BC;             # VULGAR FRACTION ONE QUARTER
 0x00BD    &#x00BD;             # VULGAR FRACTION ONE HALF
-0x00BD    &#x00BD;             # VULGAR FRACTION ONE HALF
 0x00BE    &#x00BE;             # VULGAR FRACTION THREE QUARTERS
 0x00BF    &#x00BF;             # INVERTED QUESTION MARK
 0x00C0    &#x00C0;             # LATIN CAPITAL LETTER A WITH GRAVE
@@ -324,64 +323,28 @@
 0x038F    &#x038F;             # GREEK CAPITAL LETTER OMEGA WITH TONOS
 0x0390    &#x0390;             # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
 0x0391    &#x0391;             # GREEK CAPITAL LETTER ALPHA
-0x0391    &#x0391;             # GREEK CAPITAL LETTER ALPHA
-0x0392    &#x0392;             # GREEK CAPITAL LETTER BETA
 0x0392    &#x0392;             # GREEK CAPITAL LETTER BETA
 0x0393    &#x0393;             # GREEK CAPITAL LETTER GAMMA
-0x0393    &#x0393;             # GREEK CAPITAL LETTER GAMMA
-0x0393    &#x0393;             # GREEK CAPITAL LETTER GAMMA
-0x0394    &#x0394;             # GREEK CAPITAL LETTER DELTA
-0x0394    &#x0394;             # GREEK CAPITAL LETTER DELTA
 0x0394    &#x0394;             # GREEK CAPITAL LETTER DELTA
 0x0395    &#x0395;             # GREEK CAPITAL LETTER EPSILON
-0x0395    &#x0395;             # GREEK CAPITAL LETTER EPSILON
-0x0396    &#x0396;             # GREEK CAPITAL LETTER ZETA
 0x0396    &#x0396;             # GREEK CAPITAL LETTER ZETA
 0x0397    &#x0397;             # GREEK CAPITAL LETTER ETA
-0x0397    &#x0397;             # GREEK CAPITAL LETTER ETA
-0x0398    &#x0398;             # GREEK CAPITAL LETTER THETA
-0x0398    &#x0398;             # GREEK CAPITAL LETTER THETA
 0x0398    &#x0398;             # GREEK CAPITAL LETTER THETA
 0x0399    &#x0399;             # GREEK CAPITAL LETTER IOTA
-0x0399    &#x0399;             # GREEK CAPITAL LETTER IOTA
-0x039A    &#x039A;             # GREEK CAPITAL LETTER KAPPA
 0x039A    &#x039A;             # GREEK CAPITAL LETTER KAPPA
 0x039B    &#x039B;             # GREEK CAPITAL LETTER LAMDA
-0x039B    &#x039B;             # GREEK CAPITAL LETTER LAMDA
-0x039B    &#x039B;             # GREEK CAPITAL LETTER LAMDA
 0x039C    &#x039C;             # GREEK CAPITAL LETTER MU
-0x039C    &#x039C;             # GREEK CAPITAL LETTER MU
-0x039D    &#x039D;             # GREEK CAPITAL LETTER NU
 0x039D    &#x039D;             # GREEK CAPITAL LETTER NU
 0x039E    &#x039E;             # GREEK CAPITAL LETTER XI
-0x039E    &#x039E;             # GREEK CAPITAL LETTER XI
-0x039E    &#x039E;             # GREEK CAPITAL LETTER XI
-0x039F    &#x039F;             # GREEK CAPITAL LETTER OMICRON
 0x039F    &#x039F;             # GREEK CAPITAL LETTER OMICRON
 0x03A0    &#x03A0;             # GREEK CAPITAL LETTER PI
-0x03A0    &#x03A0;             # GREEK CAPITAL LETTER PI
-0x03A0    &#x03A0;             # GREEK CAPITAL LETTER PI
 0x03A1    &#x03A1;             # GREEK CAPITAL LETTER RHO
-0x03A1    &#x03A1;             # GREEK CAPITAL LETTER RHO
-0x03A3    &#x03A3;             # GREEK CAPITAL LETTER SIGMA
 0x03A3    &#x03A3;             # GREEK CAPITAL LETTER SIGMA
-0x03A3    &#x03A3;             # GREEK CAPITAL LETTER SIGMA
-0x03A4    &#x03A4;             # GREEK CAPITAL LETTER TAU
 0x03A4    &#x03A4;             # GREEK CAPITAL LETTER TAU
 0x03A5    &#x03A5;             # GREEK CAPITAL LETTER UPSILON
-0x03A5    &#x03A5;             # GREEK CAPITAL LETTER UPSILON
-0x03A5    &#x03A5;             # GREEK CAPITAL LETTER UPSILON
-0x03A5    &#x03A5;             # GREEK CAPITAL LETTER UPSILON
-0x03A6    &#x03A6;             # GREEK CAPITAL LETTER PHI
 0x03A6    &#x03A6;             # GREEK CAPITAL LETTER PHI
-0x03A6    &#x03A6;             # GREEK CAPITAL LETTER PHI
-0x03A7    &#x03A7;             # GREEK CAPITAL LETTER CHI
 0x03A7    &#x03A7;             # GREEK CAPITAL LETTER CHI
 0x03A8    &#x03A8;             # GREEK CAPITAL LETTER PSI
-0x03A8    &#x03A8;             # GREEK CAPITAL LETTER PSI
-0x03A8    &#x03A8;             # GREEK CAPITAL LETTER PSI
-0x03A9    &#x03A9;             # GREEK CAPITAL LETTER OMEGA
-0x03A9    &#x03A9;             # GREEK CAPITAL LETTER OMEGA
 0x03A9    &#x03A9;             # GREEK CAPITAL LETTER OMEGA
 0x03AA    &#x03AA;             # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
 0x03AB    &#x03AB;             # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
@@ -391,105 +354,41 @@
 0x03AF    &#x03AF;             # GREEK SMALL LETTER IOTA WITH TONOS
 0x03B0    &#x03B0;             # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
 0x03B1    &#x03B1;             # GREEK SMALL LETTER ALPHA
-0x03B1    &#x03B1;             # GREEK SMALL LETTER ALPHA
-0x03B1    &#x03B1;             # GREEK SMALL LETTER ALPHA
-0x03B2    &#x03B2;             # GREEK SMALL LETTER BETA
 0x03B2    &#x03B2;             # GREEK SMALL LETTER BETA
-0x03B2    &#x03B2;             # GREEK SMALL LETTER BETA
-0x03B3    &#x03B3;             # GREEK SMALL LETTER GAMMA
 0x03B3    &#x03B3;             # GREEK SMALL LETTER GAMMA
-0x03B3    &#x03B3;             # GREEK SMALL LETTER GAMMA
-0x03B3    &#x03B3;             # GREEK SMALL LETTER GAMMA
-0x03B4    &#x03B4;             # GREEK SMALL LETTER DELTA
 0x03B4    &#x03B4;             # GREEK SMALL LETTER DELTA
 0x03B5    &#x03B5;             # GREEK SMALL LETTER EPSILON
-0x03B5    &#x03B5;             # GREEK SMALL LETTER EPSILON
-0x03B5    &#x03B5;             # GREEK SMALL LETTER EPSILON
-0x03B5    &#x03B5;             # GREEK SMALL LETTER EPSILON
-0x03B5    &#x03B5;             # GREEK SMALL LETTER EPSILON
-0x03B5    &#x03B5;             # GREEK SMALL LETTER EPSILON
-0x03B6    &#x03B6;             # GREEK SMALL LETTER ZETA
 0x03B6    &#x03B6;             # GREEK SMALL LETTER ZETA
-0x03B6    &#x03B6;             # GREEK SMALL LETTER ZETA
-0x03B7    &#x03B7;             # GREEK SMALL LETTER ETA
-0x03B7    &#x03B7;             # GREEK SMALL LETTER ETA
 0x03B7    &#x03B7;             # GREEK SMALL LETTER ETA
 0x03B8    &#x03B8;             # GREEK SMALL LETTER THETA
-0x03B8    &#x03B8;             # GREEK SMALL LETTER THETA
-0x03B8    &#x03B8;             # GREEK SMALL LETTER THETA
-0x03B8    &#x03B8;             # GREEK SMALL LETTER THETA
-0x03B9    &#x03B9;             # GREEK SMALL LETTER IOTA
 0x03B9    &#x03B9;             # GREEK SMALL LETTER IOTA
-0x03B9    &#x03B9;             # GREEK SMALL LETTER IOTA
-0x03BA    &#x03BA;             # GREEK SMALL LETTER KAPPA
 0x03BA    &#x03BA;             # GREEK SMALL LETTER KAPPA
-0x03BA    &#x03BA;             # GREEK SMALL LETTER KAPPA
-0x03BB    &#x03BB;             # GREEK SMALL LETTER LAMDA
 0x03BB    &#x03BB;             # GREEK SMALL LETTER LAMDA
-0x03BB    &#x03BB;             # GREEK SMALL LETTER LAMDA
-0x03BC    &#x03BC;             # GREEK SMALL LETTER MU
 0x03BC    &#x03BC;             # GREEK SMALL LETTER MU
-0x03BC    &#x03BC;             # GREEK SMALL LETTER MU
-0x03BD    &#x03BD;             # GREEK SMALL LETTER NU
 0x03BD    &#x03BD;             # GREEK SMALL LETTER NU
-0x03BD    &#x03BD;             # GREEK SMALL LETTER NU
-0x03BE    &#x03BE;             # GREEK SMALL LETTER XI
 0x03BE    &#x03BE;             # GREEK SMALL LETTER XI
-0x03BE    &#x03BE;             # GREEK SMALL LETTER XI
-0x03BF    &#x03BF;             # GREEK SMALL LETTER OMICRON
 0x03BF    &#x03BF;             # GREEK SMALL LETTER OMICRON
 0x03C0    &#x03C0;             # GREEK SMALL LETTER PI
-0x03C0    &#x03C0;             # GREEK SMALL LETTER PI
-0x03C0    &#x03C0;             # GREEK SMALL LETTER PI
-0x03C1    &#x03C1;             # GREEK SMALL LETTER RHO
 0x03C1    &#x03C1;             # GREEK SMALL LETTER RHO
-0x03C1    &#x03C1;             # GREEK SMALL LETTER RHO
-0x03C2    &#x03C2;             # GREEK SMALL LETTER FINAL SIGMA
-0x03C2    &#x03C2;             # GREEK SMALL LETTER FINAL SIGMA
-0x03C2    &#x03C2;             # GREEK SMALL LETTER FINAL SIGMA
 0x03C2    &#x03C2;             # GREEK SMALL LETTER FINAL SIGMA
 0x03C3    &#x03C3;             # GREEK SMALL LETTER SIGMA
-0x03C3    &#x03C3;             # GREEK SMALL LETTER SIGMA
-0x03C3    &#x03C3;             # GREEK SMALL LETTER SIGMA
-0x03C4    &#x03C4;             # GREEK SMALL LETTER TAU
 0x03C4    &#x03C4;             # GREEK SMALL LETTER TAU
-0x03C4    &#x03C4;             # GREEK SMALL LETTER TAU
-0x03C5    &#x03C5;             # GREEK SMALL LETTER UPSILON
 0x03C5    &#x03C5;             # GREEK SMALL LETTER UPSILON
-0x03C5    &#x03C5;             # GREEK SMALL LETTER UPSILON
-0x03C5    &#x03C5;             # GREEK SMALL LETTER UPSILON
-0x03C6    &#x03C6;             # GREEK SMALL LETTER PHI
-0x03C6    &#x03C6;             # GREEK SMALL LETTER PHI
 0x03C6    &#x03C6;             # GREEK SMALL LETTER PHI
-0x03C6    &#x03C6;             # GREEK SMALL LETTER PHI
-0x03C7    &#x03C7;             # GREEK SMALL LETTER CHI
-0x03C7    &#x03C7;             # GREEK SMALL LETTER CHI
 0x03C7    &#x03C7;             # GREEK SMALL LETTER CHI
 0x03C8    &#x03C8;             # GREEK SMALL LETTER PSI
-0x03C8    &#x03C8;             # GREEK SMALL LETTER PSI
-0x03C8    &#x03C8;             # GREEK SMALL LETTER PSI
-0x03C9    &#x03C9;             # GREEK SMALL LETTER OMEGA
 0x03C9    &#x03C9;             # GREEK SMALL LETTER OMEGA
 0x03CA    &#x03CA;             # GREEK SMALL LETTER IOTA WITH DIALYTIKA
 0x03CB    &#x03CB;             # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
 0x03CC    &#x03CC;             # GREEK SMALL LETTER OMICRON WITH TONOS
-0x03CD    &#x03CD;             # GREEK SMALL LETTER UPSILON WITH TONOS
-0x03CE    &#x03CE;             # GREEK SMALL LETTER OMEGA WITH TONOS
 0x03CE    &#x03CE;             # GREEK SMALL LETTER OMEGA WITH TONOS
 0x03D1    &#x03D1;             # GREEK THETA SYMBOL
-0x03D1    &#x03D1;             # GREEK THETA SYMBOL
-0x03D1    &#x03D1;             # GREEK THETA SYMBOL
 0x03D2    &#x03D2;             # GREEK UPSILON WITH HOOK SYMBOL
 0x03D5    &#x03D5;             # GREEK PHI SYMBOL
-0x03D5    &#x03D5;             # GREEK PHI SYMBOL
-0x03D6    &#x03D6;             # GREEK PI SYMBOL
 0x03D6    &#x03D6;             # GREEK PI SYMBOL
 0x03DC    &#x03DC;             # GREEK LETTER DIGAMMA
-0x03DC    &#x03DC;             # GREEK LETTER DIGAMMA
-0x03F0    &#x03F0;             # GREEK KAPPA SYMBOL
 0x03F0    &#x03F0;             # GREEK KAPPA SYMBOL
 0x03F1    &#x03F1;             # GREEK RHO SYMBOL
-0x03F1    &#x03F1;             # GREEK RHO SYMBOL
 0x0401    &#x0401;             # CYRILLIC CAPITAL LETTER IO
 0x0402    &#x0402;             # CYRILLIC CAPITAL LETTER DJE
 0x0403    &#x0403;             # CYRILLIC CAPITAL LETTER GJE
@@ -627,6 +526,7 @@
 0x2041    &#x2041;             # CARET INSERTION POINT
 0x2043    &#x2043;             # HYPHEN BULLET
 0x2044    &#x2044;             # FRACTION SLASH
+0x20AC    &#x20AC;             # EURO SIGN
 0x20DB    &#x20DB;             # COMBINING THREE DOTS ABOVE
 0x20DC    &#x20DC;             # COMBINING FOUR DOTS ABOVE
 0x2105    &#x2105;             # CARE OF
@@ -668,8 +568,6 @@
 0x2192    &#x2192;             # RIGHTWARDS ARROW
 0x2193    &#x2193;             # DOWNWARDS ARROW
 0x2194    &#x2194;             # LEFT RIGHT ARROW
-0x2194    &#x2194;             # LEFT RIGHT ARROW
-0x2194    &#x2194;             # LEFT RIGHT ARROW
 0x2195    &#x2195;             # UP DOWN ARROW
 0x2196    &#x2196;             # NORTH WEST ARROW
 0x2197    &#x2197;             # NORTH EAST ARROW
@@ -716,13 +614,10 @@
 0x21CE    &#x21CE;             # LEFT RIGHT DOUBLE ARROW WITH STROKE
 0x21CF    &#x21CF;             # RIGHTWARDS DOUBLE ARROW WITH STROKE
 0x21D0    &#x21D0;             # LEFTWARDS DOUBLE ARROW
-0x21D0    &#x21D0;             # LEFTWARDS DOUBLE ARROW
 0x21D1    &#x21D1;             # UPWARDS DOUBLE ARROW
 0x21D2    &#x21D2;             # RIGHTWARDS DOUBLE ARROW
-0x21D2    &#x21D2;             # RIGHTWARDS DOUBLE ARROW
 0x21D3    &#x21D3;             # DOWNWARDS DOUBLE ARROW
 0x21D4    &#x21D4;             # LEFT RIGHT DOUBLE ARROW
-0x21D4    &#x21D4;             # LEFT RIGHT DOUBLE ARROW
 0x21D5    &#x21D5;             # UP DOWN DOUBLE ARROW
 0x21DA    &#x21DA;             # LEFTWARDS TRIPLE ARROW
 0x21DB    &#x21DB;             # RIGHTWARDS TRIPLE ARROW
@@ -740,19 +635,15 @@
 0x220D    &#x220D;             # SMALL CONTAINS AS MEMBER
 0x220F    &#x220F;             # N-ARY PRODUCT
 0x2210    &#x2210;             # N-ARY COPRODUCT
-0x2210    &#x2210;             # N-ARY COPRODUCT
-0x2210    &#x2210;             # N-ARY COPRODUCT
 0x2211    &#x2211;             # N-ARY SUMMATION
 0x2212    &#x2212;             # MINUS SIGN
 0x2213    &#x2213;             # MINUS-OR-PLUS SIGN
 0x2214    &#x2214;             # DOT PLUS
 0x2216    &#x2216;             # SET MINUS
-0x2216    &#x2216;             # SET MINUS
 0x2217    &#x2217;             # ASTERISK OPERATOR
 0x2218    &#x2218;             # RING OPERATOR
 0x221A    &#x221A;             # SQUARE ROOT
 0x221D    &#x221D;             # PROPORTIONAL TO
-0x221D    &#x221D;             # PROPORTIONAL TO
 0x221E    &#x221E;             # INFINITY
 0x221F    &#x221F;             # RIGHT ANGLE
 0x2220    &#x2220;             # ANGLE
@@ -761,8 +652,6 @@
 0x2223    &#x2223;             # DIVIDES
 0x2224    &#x2224;             # DOES NOT DIVIDE
 0x2225    &#x2225;             # PARALLEL TO
-0x2225    &#x2225;             # PARALLEL TO
-0x2226    &#x2226;             # NOT PARALLEL TO
 0x2226    &#x2226;             # NOT PARALLEL TO
 0x2227    &#x2227;             # LOGICAL AND
 0x2228    &#x2228;             # LOGICAL OR
@@ -773,7 +662,6 @@
 0x2234    &#x2234;             # THEREFORE
 0x2235    &#x2235;             # BECAUSE
 0x223C    &#x223C;             # TILDE OPERATOR
-0x223C    &#x223C;             # TILDE OPERATOR
 0x223D    &#x223D;             # REVERSED TILDE
 0x2240    &#x2240;             # WREATH PRODUCT
 0x2241    &#x2241;             # NOT TILDE
@@ -782,8 +670,6 @@
 0x2245    &#x2245;             # APPROXIMATELY EQUAL TO
 0x2247    &#x2247;             # NEITHER APPROXIMATELY NOR ACTUALLY EQUAL TO
 0x2248    &#x2248;             # ALMOST EQUAL TO
-0x2248    &#x2248;             # ALMOST EQUAL TO
-0x2248    &#x2248;             # ALMOST EQUAL TO
 0x2249    &#x2249;             # NOT ALMOST EQUAL TO
 0x224A    &#x224A;             # ALMOST EQUAL OR EQUAL TO
 0x224C    &#x224C;             # ALL EQUAL TO
@@ -803,16 +689,10 @@
 0x2261    &#x2261;             # IDENTICAL TO
 0x2262    &#x2262;             # NOT IDENTICAL TO
 0x2264    &#x2264;             # LESS-THAN OR EQUAL TO
-0x2264    &#x2264;             # LESS-THAN OR EQUAL TO
-0x2265    &#x2265;             # GREATER-THAN OR EQUAL TO
 0x2265    &#x2265;             # GREATER-THAN OR EQUAL TO
 0x2266    &#x2266;             # LESS-THAN OVER EQUAL TO
 0x2267    &#x2267;             # GREATER-THAN OVER EQUAL TO
 0x2268    &#x2268;             # LESS-THAN BUT NOT EQUAL TO
-0x2268    &#x2268;             # LESS-THAN BUT NOT EQUAL TO
-0x2268    &#x2268;             # LESS-THAN BUT NOT EQUAL TO
-0x2269    &#x2269;             # GREATER-THAN BUT NOT EQUAL TO
-0x2269    &#x2269;             # GREATER-THAN BUT NOT EQUAL TO
 0x2269    &#x2269;             # GREATER-THAN BUT NOT EQUAL TO
 0x226A    &#x226A;             # MUCH LESS-THAN
 0x226B    &#x226B;             # MUCH GREATER-THAN
@@ -820,8 +700,6 @@
 0x226E    &#x226E;             # NOT LESS-THAN
 0x226F    &#x226F;             # NOT GREATER-THAN
 0x2270    &#x2270;             # NEITHER LESS-THAN NOR EQUAL TO
-0x2270    &#x2270;             # NEITHER LESS-THAN NOR EQUAL TO
-0x2271    &#x2271;             # NEITHER GREATER-THAN NOR EQUAL TO
 0x2271    &#x2271;             # NEITHER GREATER-THAN NOR EQUAL TO
 0x2272    &#x2272;             # LESS-THAN OR EQUIVALENT TO
 0x2273    &#x2273;             # GREATER-THAN OR EQUIVALENT TO
@@ -830,8 +708,6 @@
 0x227A    &#x227A;             # PRECEDES
 0x227B    &#x227B;             # SUCCEEDS
 0x227C    &#x227C;             # PRECEDES OR EQUAL TO
-0x227C    &#x227C;             # PRECEDES OR EQUAL TO
-0x227D    &#x227D;             # SUCCEEDS OR EQUAL TO
 0x227D    &#x227D;             # SUCCEEDS OR EQUAL TO
 0x227E    &#x227E;             # PRECEDES OR EQUIVALENT TO
 0x227F    &#x227F;             # SUCCEEDS OR EQUIVALENT TO
@@ -842,20 +718,10 @@
 0x2284    &#x2284;             # NOT A SUBSET OF
 0x2285    &#x2285;             # NOT A SUPERSET OF
 0x2286    &#x2286;             # SUBSET OF OR EQUAL TO
-0x2286    &#x2286;             # SUBSET OF OR EQUAL TO
-0x2287    &#x2287;             # SUPERSET OF OR EQUAL TO
 0x2287    &#x2287;             # SUPERSET OF OR EQUAL TO
 0x2288    &#x2288;             # NEITHER A SUBSET OF NOR EQUAL TO
-0x2288    &#x2288;             # NEITHER A SUBSET OF NOR EQUAL TO
-0x2289    &#x2289;             # NEITHER A SUPERSET OF NOR EQUAL TO
 0x2289    &#x2289;             # NEITHER A SUPERSET OF NOR EQUAL TO
 0x228A    &#x228A;             # SUBSET OF WITH NOT EQUAL TO
-0x228A    &#x228A;             # SUBSET OF WITH NOT EQUAL TO
-0x228A    &#x228A;             # SUBSET OF WITH NOT EQUAL TO
-0x228A    &#x228A;             # SUBSET OF WITH NOT EQUAL TO
-0x228B    &#x228B;             # SUPERSET OF WITH NOT EQUAL TO
-0x228B    &#x228B;             # SUPERSET OF WITH NOT EQUAL TO
-0x228B    &#x228B;             # SUPERSET OF WITH NOT EQUAL TO
 0x228B    &#x228B;             # SUPERSET OF WITH NOT EQUAL TO
 0x228E    &#x228E;             # MULTISET UNION
 0x228F    &#x228F;             # SQUARE IMAGE OF
@@ -880,7 +746,6 @@
 0x22A3    &#x22A3;             # LEFT TACK
 0x22A4    &#x22A4;             # DOWN TACK
 0x22A5    &#x22A5;             # UP TACK
-0x22A5    &#x22A5;             # UP TACK
 0x22A7    &#x22A7;             # MODELS
 0x22A8    &#x22A8;             # TRUE
 0x22A9    &#x22A9;             # FORCES
@@ -951,8 +816,6 @@
 0x231E    &#x231E;             # BOTTOM LEFT CORNER
 0x231F    &#x231F;             # BOTTOM RIGHT CORNER
 0x2322    &#x2322;             # FROWN
-0x2322    &#x2322;             # FROWN
-0x2323    &#x2323;             # SMILE
 0x2323    &#x2323;             # SMILE
 0x2329    &#x2329;             # LEFT-POINTING ANGLE BRACKET
 0x232A    &#x232A;             # RIGHT-POINTING ANGLE BRACKET
@@ -1005,7 +868,6 @@
 0x2592    &#x2592;             # MEDIUM SHADE
 0x2593    &#x2593;             # DARK SHADE
 0x25A1    &#x25A1;             # WHITE SQUARE
-0x25A1    &#x25A1;             # WHITE SQUARE
 0x25AA    &#x25AA;             # BLACK SMALL SQUARE
 0x25AD    &#x25AD;             # WHITE RECTANGLE
 0x25AE    &#x25AE;             # BLACK VERTICAL RECTANGLE
@@ -1020,7 +882,6 @@
 0x25C2    &#x25C2;             # BLACK LEFT-POINTING SMALL TRIANGLE
 0x25C3    &#x25C3;             # WHITE LEFT-POINTING SMALL TRIANGLE
 0x25CA    &#x25CA;             # LOZENGE
-0x25CB    &#x25CB;             # WHITE CIRCLE
 0x25CB    &#x25CB;             # WHITE CIRCLE
 0x2605    &#x2605;             # BLACK STAR
 0x2606    &#x2606;             # WHITE STAR


diff -u old/src/lycharse.c ./src/lycharse.c
--- old/src/lycharse.c  Thu Mar  4 02:39:46 1999
+++ ./src/lycharse.c    Fri Mar  5 12:26:16 1999
@@ -825,9 +825,8 @@
 }

 /*
- *  Function to return the UCode_t (long int) value for entity names
- *  in the ISO_Latin1 and UC_entity_info unicode_entities arrays.
- *  It returns 0 if not found. - FM
+ *  Function to return the UCode_t (long int) value for entity names.
+ *  It returns 0 if not found.
  *
  *  unicode_entities[] handles all the names from old style entities[] too.
  *  Lynx now calls unicode_entities[] only through this function:
@@ -841,10 +840,12 @@
 PUBLIC UCode_t HTMLGetEntityUCValue ARGS1(
        CONST char *,   name)
 {
+#include <entities.h>
+
     UCode_t value = 0;
     size_t i, high, low;
     int diff = 0;
-    CONST UC_entity_info * unicode_entities = HTML_dtd.unicode_entity_info;
+    size_t number_of_unicode_entities = sizeof(unicode_entities)/sizeof(unicode_entities[0]);

     /*
      * Make sure we have a non-zero length name. - FM
@@ -856,12 +857,12 @@
      * Try UC_entity_info unicode_entities[].
      */
 #ifdef    NOT_ASCII  /* S/390 -- gil -- 1656 */
-    for (i = 0; i < HTML_dtd.number_of_unicode_entities; i++ ) {
+    for (i = 0; i < number_of_unicode_entities; i++ ) {
        /*
        **  Linear search for NOT_ASCII.
        */
 #else  /* NOT_ASCII */
-    for (low = 0, high = HTML_dtd.number_of_unicode_entities;
+    for (low = 0, high = number_of_unicode_entities;
         high > low;
         diff < 0 ? (low = i+1) : (high = i)) {
        /*
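
[Note on the hunk above: the non-NOT_ASCII branch is a binary search over
a table kept in strcmp() order by name. Below is a minimal standalone
sketch of the same loop shape. The names demo_entity, demo_entities and
demo_get_entity_value are made up for illustration, and the six-entry
table is only a stand-in for the real one in src/chrtrans/entities.h.]

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the name-to-Unicode table.
 * Entries must stay in strcmp() order by name. */
typedef struct {
    const char *name;
    unsigned short code;        /* stands in for the patch's u16 */
} demo_entity;

static const demo_entity demo_entities[] = {
    {"amp",   0x0026},
    {"delta", 0x03B4},
    {"euro",  0x20AC},
    {"gt",    0x003E},
    {"lt",    0x003C},
    {"nbsp",  0x00A0},
};

/* Returns the code for an entity name, or 0 if it is not in the table. */
long demo_get_entity_value(const char *name)
{
    size_t n = sizeof(demo_entities) / sizeof(demo_entities[0]);
    size_t i = 0, low, high;
    int diff = 0;

    /* Make sure we have a non-zero length name. */
    if (name == NULL || *name == '\0')
        return 0;

    /* Same update expression as in the patch: a table entry that sorts
     * before the name moves 'low' up, anything else moves 'high' down. */
    for (low = 0, high = n;
         high > low;
         diff < 0 ? (low = i + 1) : (high = i)) {
        i = low + (high - low) / 2;
        diff = strcmp(demo_entities[i].name, name);
        if (diff == 0)
            return (long) demo_entities[i].code;
    }
    return 0;
}
```

[With this table, demo_get_entity_value("euro") gives 0x20AC and an
unknown name gives 0. Taking the element count with sizeof at the point
of use, as the patch does, only works because the table is included
into the same scope as a real array rather than reached via a pointer.]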

diff -u old/src/ucdomap.h ./src/ucdomap.h
--- old/src/ucdomap.h   Thu Mar  4 02:39:46 1999
+++ ./src/ucdomap.h     Tue Mar  9 01:17:24 1999
@@ -77,30 +77,34 @@
    *  from Unicode mechanism).  For now we use the MIME name that describes
    *  what is output to the terminal. - KW
    */
+static CONST struct unimapdesc_str dfont_replacedesc_fallback = {0,NULL,0,1};
+
 #define UC_CHARSET_SETUP_euc_cn UC_Charset_Setup("euc-cn","Chinese",\
-       NULL,NULL,0,(struct unimapdesc_str){0,NULL,0,1},\
+       NULL,NULL,0,dfont_replacedesc_fallback,\
        128,UCT_ENC_CJK,0)
 #define UC_CHARSET_SETUP_euc_jp UC_Charset_Setup("euc-jp","Japanese (EUC-JP)",\
-       NULL,NULL,0,(struct unimapdesc_str){0,NULL,0,1},\
+       NULL,NULL,0,dfont_replacedesc_fallback,\
        128,UCT_ENC_CJK,0)
 #define UC_CHARSET_SETUP_shift_jis UC_Charset_Setup("shift_jis","Japanese (Shift_JIS)",\
-       NULL,NULL,0,(struct unimapdesc_str){0,NULL,0,1},\
+       NULL,NULL,0,dfont_replacedesc_fallback,\
        128,UCT_ENC_CJK,0)
 #define UC_CHARSET_SETUP_euc_kr UC_Charset_Setup("euc-kr","Korean",\
-       NULL,NULL,0,(struct unimapdesc_str){0,NULL,0,1},\
+       NULL,NULL,0,dfont_replacedesc_fallback,\
        128,UCT_ENC_CJK,0)
 #define UC_CHARSET_SETUP_big5 UC_Charset_Setup("big5","Taipei (Big5)",\
-       NULL,NULL,0,(struct unimapdesc_str){0,NULL,0,1},\
+       NULL,NULL,0,dfont_replacedesc_fallback,\
        128,UCT_ENC_CJK,0)
   /*
    *  Placeholder for non-translation mode. - FM
    */
 #define UC_CHARSET_SETUP_x_transparent UC_Charset_Setup("x-transparent","Transparent",\
-       NULL,NULL,0,(struct unimapdesc_str){0,NULL,0,0},\
+       NULL,NULL,0,dfont_replacedesc_fallback,\
        128,1,0)

+static CONST struct unimapdesc_str dfont_replacedesc_NO_fallback = {0,NULL,0,0};
+
 #define UC_CHARSET_SETUP_utf_8 UC_Charset_Setup("utf-8","UNICODE (UTF-8)",\
-       NULL,NULL,0,(struct unimapdesc_str){0,NULL,0,0},\
+       NULL,NULL,0,dfont_replacedesc_NO_fallback,\
        128,UCT_ENC_UTF8,0)



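[Note on the ucdomap.h change above: a cast-style initializer such as
(struct unimapdesc_str){0,NULL,0,1} is a compound literal, which is not
part of ANSI C (C89), so strict compilers reject it. The patch replaces
it with a named static const object. Below is a minimal sketch of that
pattern. struct demo_mapdesc, its field names and demo_charset_setup()
are hypothetical stand-ins, not the real unimapdesc_str declaration or
UC_Charset_Setup() signature.]

```c
#include <stddef.h>

/* Hypothetical stand-in for struct unimapdesc_str; the real field
 * names and types may differ. */
struct demo_mapdesc {
    unsigned short count;
    const unsigned short *entries;
    int is_default;
    int try_fallback;
};

/* Named objects instead of compound literals: plain C89 initializers. */
static const struct demo_mapdesc demo_fallback    = {0, NULL, 0, 1};
static const struct demo_mapdesc demo_no_fallback = {0, NULL, 0, 0};

/* Hypothetical consumer that takes the descriptor by value, the way
 * the macros in the patch pass it along. */
int demo_charset_setup(const char *mime_name,
                       struct demo_mapdesc desc,
                       int encoding)
{
    (void) mime_name;
    (void) encoding;
    return desc.try_fallback;
}

#define DEMO_SETUP_euc_cn demo_charset_setup("euc-cn", demo_fallback, 0)
#define DEMO_SETUP_utf_8  demo_charset_setup("utf-8", demo_no_fallback, 0)
```

[Here DEMO_SETUP_euc_cn evaluates to 1 and DEMO_SETUP_utf_8 to 0. Note
that in the patch the x-transparent entry moves from {0,NULL,0,0} to the
fallback variant, which presumably goes with the x-transparent item in
the change list.]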
diff -u old/www/library/implementation/htmldtd.c ./www/library/implementation/htmldtd.c
--- old/www/library/implementation/htmldtd.c    Mon Jan 18 04:29:20 1999
+++ ./www/library/implementation/htmldtd.c      Tue Mar  9 01:00:24 1999
@@ -9,12 +9,19 @@
 #include <HTMLDTD.h>
 #include <LYLeaks.h>

+/*
+ *     Character entities like &nbsp are now excluded from our DTD tables;
+ *     they are mapped to Unicode and handled by the chartrans code directly,
+ *     the same way numeric entities like &#123 are.
+ *     See  src/chrtrans/entities.h  for the real mapping.
+ */
+
 /*     Entity Names
 **     ------------
 **
 **     This table must be matched exactly with ALL the translation tables
-**             (this is an obsolete translation mechanism,
-**             currently replaced with unicode chartrans in most cases...)
+**             (this is an obsolete translation mechanism, probably unused,
+**             currently replaced with Unicode chartrans in most cases...)
 */
 static CONST char* entities[] = {
   "AElig",     /* capital AE diphthong (ligature) */
@@ -131,9 +138,6 @@
   "yuml",      /* small y, dieresis or umlaut mark */
 };

-#define HTML_ENTITIES 112
-
-#include <entities.h>

 /*             Attribute Lists
 **             ---------------
@@ -1591,10 +1595,8 @@
 PUBLIC CONST SGML_dtd HTML_dtd = {
        tags,
        HTML_ELEMENTS,
-       entities,
+       entities, /* probably unused */
        sizeof(entities)/sizeof(entities[0]),
-       unicode_entities,
-       sizeof(unicode_entities)/sizeof(unicode_entities[0])
 };

 /* This function fills the "tags" part of the HTML_dtd structure with
diff -u old/www/library/implementation/htplain.c ./www/library/implementation/htplain.c
--- old/www/library/implementation/htplain.c    Thu Mar  4 02:39:46 1999
+++ ./www/library/implementation/htplain.c      Fri Mar  5 14:29:38 1999
@@ -424,7 +424,7 @@

        /*
        **  If CJK mode is on, we'll assume the document matches
-       **  the user's selected character set, and if not, the
+       **  the user's display character set, and if not, the
        **  user should toggle off raw/CJK mode to reload. - FM
        */
        if (HTCJK != NOCJK) {
diff -u old/www/library/implementation/sgml.c ./www/library/implementation/sgml.c
--- old/www/library/implementation/sgml.c       Thu Mar  4 02:39:46 1999
+++ ./www/library/implementation/sgml.c Tue Mar  9 00:44:28 1999
@@ -704,7 +704,7 @@
        }

        if (stackpos == 0 && old_tag->contents != SGML_EMPTY) {
-           CTRACE(tfp, "SGML: Still open %s, no open %s for </%s>\n",
+           CTRACE(tfp, "SGML: Still open %s, ***no open %s for </%s>\n",
                        context->element_stack ?
                        context->element_stack->tag->name : "none",
                        old_tag->name,
@@ -844,7 +844,7 @@
            for (; i< new_tag->number_of_attributes && !has_attributes; i++)
                has_attributes = context->present[i];
            if (!has_attributes) {
-               CTRACE(tfp, "SGML: Still open %s, converting invalid <%s> to </%s>\n",
+               CTRACE(tfp, "SGML: Still open %s, ***converting invalid <%s> to </%s>\n",
                            context->element_stack->tag->name,
                            new_tag->name,
                            new_tag->name);
diff -u old/www/library/implementation/sgml.h ./www/library/implementation/sgml.h
--- old/www/library/implementation/sgml.h       Mon Feb  8 02:33:00 1999
+++ ./www/library/implementation/sgml.h Tue Mar  9 00:58:58 1999
@@ -2,17 +2,14 @@
                                SGML AND STRUCTURED STREAMS

    The SGML parser is a state machine.  It is called for every character
-
    of the input stream.  The DTD data structure contains pointers
-
    to functions which are called to implement the actual effect of the
-
    text read. When these functions are called, the attribute structures pointed to by the
    DTD are valid, and the function is passed a pointer to the current tag structure, and an
    "element stack" which represents the state of nesting within SGML elements.

-   The following aspects are from Dan Connolly's suggestions:  Binary search, Structured
-   object scheme basically, SGML content enum type.
+   The following aspects are from Dan Connolly's suggestions:  Binary search,
+   Structured object scheme basically, SGML content enum type.

    (c) Copyright CERN 1991 - See Copyright.html

@@ -130,23 +127,13 @@
 **  Not the whole DTD, but all this parser uses of it.
 */
 typedef struct {
-    char* name;
-    long code;
-} UC_entity_info;
-
-typedef struct {
     HTTag *             tags;           /* Must be in strcmp order by name */
     int                 number_of_tags;
     CONST char **       entity_names;   /* Must be in strcmp order by name */
     size_t              number_of_entities;
-    CONST UC_entity_info * unicode_entity_info; /* strcmp order by name */
-    size_t              number_of_unicode_entities;
-                       /*
-                       **  All calls to unicode_entities table should be done
-                       **  through HTMLGetEntityUCValue (LYCharSets.c) only.
-                       **  unicode_entities table now holds *all*
-                       **  old-style entities too.
-                       */
+                               /*  "entity_names" table probably unused,
+                               **  see comments in HTMLDTD.c near the top
+                               */
 } SGML_dtd;



diff -u old/src/lyutils.c ./src/lyutils.c
--- old/src/lyutils.c   Thu Mar  4 02:39:46 1999
+++ ./src/lyutils.c     Wed Mar 10 03:49:18 1999
@@ -2145,10 +2145,12 @@
         return((int)FALSE);
 #endif /* USE_SLANG */

-    /** Keyboard 'Z' or 'z', or Control-G or Control-C **/
 #if defined (DOSPATH) && defined (NCURSES)
     nodelay(stdscr,TRUE);
 #endif /* DOSPATH */
+    /*
+     * 'c' contains whatever character we're able to read from keyboard
+     */
     c = LYgetch();
 #if defined (DOSPATH) && defined (NCURSES)
     nodelay(stdscr,FALSE);
@@ -2175,24 +2177,38 @@
        return((int)TRUE);
     }

-    /** Keyboard 'Z' or 'z', or Control-G or Control-C **/
+    /*
+     * 'c' contains whatever character we're able to read from keyboard
+     */
     c = typeahead();

 #endif /* !VMS */

     /*
-     * 'c' contains whatever character we're able to read from type-ahead
+     * 'c' contains whatever character we're able to read from keyboard
      */
+
+       /** Keyboard 'Z' or 'z', or Control-G or Control-C **/
     if (TOUPPER(c) == 'Z' || c == 7 || c == 3)
        return((int)TRUE);
-#ifdef DISP_PARTIAL
-    else if (display_partial && (NumOfLines_partial > 2))
-    /* OK, we got several lines from new document and want to scroll... */
-    {
+
        /* There is a subset of mainloop() actions available at this stage:
        ** no new getfile() cyrcle possible until the previous finished.
-       ** Currently we have scrolling and toggling of trace log here.
+       ** Currently we have scrolling in partial mode and toggling of trace log.
        */
+    switch (keymap[c+1])
+    {
+       case LYK_TRACE_TOGGLE :         /*  Toggle TRACE mode. */
+           WWW_TraceFlag = ! WWW_TraceFlag;
+           if (LYOpenTraceLog())
+               HTUserMsg(WWW_TraceFlag ? TRACE_ON : TRACE_OFF);
+           break ;
+       default :
+
+#ifdef DISP_PARTIAL
+      if (display_partial && (NumOfLines_partial > 2))
+      /* OK, we got several lines from new document and want to scroll... */
+      {

        int res;
        switch (keymap[c+1])
@@ -2258,22 +2274,21 @@
            break;
        case LYK_REFRESH :
            break ;
-       case LYK_TRACE_TOGGLE:  /*  Toggle TRACE mode. */
-           WWW_TraceFlag = ! WWW_TraceFlag;
-           if (LYOpenTraceLog())
-               HTUserMsg(WWW_TraceFlag ? TRACE_ON : TRACE_OFF);
-           break;
        default :
+           /** Other or no keystrokes **/
            return ((int)FALSE) ;
-       }
+       } /* end switch */
        if (Newline_partial < 1)
            Newline_partial = 1;
        NumOfLines_partial = HText_getNumOfLines();
        HText_pageDisplay(Newline_partial, "");
-    }
+
+      }
 #endif /* DISP_PARTIAL */
-    /** Other or no keystrokes **/
-    return((int)FALSE);
+
+           /** Other or no keystrokes **/
+           return((int)FALSE);
+    } /* end switch */
 }

 /*
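
[Note on the lyutils.c change above: the trace log toggle moves out of
the DISP_PARTIAL-only switch into an outer dispatch, so it is reachable
whether or not partial display is active. Below is a minimal sketch of
that control flow. The enum, the two counters and demo_check_interrupt()
are hypothetical simplifications; the real code dispatches on
keymap[c+1] and calls LYOpenTraceLog() and HText_pageDisplay().]

```c
/* Hypothetical, reduced set of key actions. */
enum demo_action { DEMO_NONE, DEMO_TRACE_TOGGLE, DEMO_SCROLL };

static int demo_trace_flag = 0;   /* stands in for WWW_TraceFlag */
static int demo_scrolls = 0;      /* counts partial-display scrolls */

/* Returns 1 if the transfer should be interrupted, 0 otherwise. */
int demo_check_interrupt(int c, enum demo_action action, int partial_ready)
{
    /* Keyboard 'Z' or 'z', or Control-G or Control-C */
    if (c == 'Z' || c == 'z' || c == 7 || c == 3)
        return 1;

    switch (action) {
    case DEMO_TRACE_TOGGLE:
        /* Handled first, independent of partial display. */
        demo_trace_flag = !demo_trace_flag;
        break;
    default:
        /* Scrolling only makes sense once a few lines have arrived. */
        if (partial_ready && action == DEMO_SCROLL)
            demo_scrolls++;
        break;
    }

    /* Other or no keystrokes */
    return 0;
}
```

[The sketch returns 0 explicitly after the switch. As far as the visible
hunks show, the toggle branch in the patch leaves the outer switch with
"break" and then reaches the end of the function without a return
statement, so that path may be worth a second look.]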