lynx-dev
[Lynx-dev] ISO-8859-8-I


From: Owen Leibman
Subject: [Lynx-dev] ISO-8859-8-I
Date: Tue, 21 Feb 2012 13:07:56 -0800 (PST)

The W3C recommends (see
http://www.w3.org/TR/html4/struct/dirlang.html#bidi88598) using the
character set ISO-8859-8-I rather than ISO-8859-8. Lynx recognizes
ISO-8859-8 as a valid encoding, but it does not recognize the character
set ISO-8859-8-I (or ISO-8859-8-E), and it treats the encoding as
ISO-8859-1 when either of them is specified.
This is true whether the character set is specified in a meta tag (using
either Content-type or Charset) or in an HTTP header. Test pages that
demonstrate the problem are at:
http://www.dayenu.com/lieberman.iso88598i.htm (8859-8-i handled incorrectly)
http://www.dayenu.com/lieberman.iso88598.htm  (8859-8   handled   correctly)

LYCharSets.c contains code intended to recognize these two encodings, but
that code seems to have no effect, however the site specifies the
character set. On the other hand, it seems sufficient, in all of these
cases, to modify UCdomap.c so that it treats ISO-8859-8-I and
ISO-8859-8-E as aliases of ISO-8859-8.
A diff to accomplish this follows:

--- src/UCdomap.c.orig  2012-02-21 05:11:03.519199979 -0800
+++ src/UCdomap.c       2012-02-21 05:14:10.120125290 -0800
@@ -1559,6 +1559,10 @@ int UCGetLYhndl_byMIME(const char *value
     if (!strncasecomp(value, "iso", 3) && !StrNCmp(value + 3, "8859", 4)) {
        return getLYhndl_byCP("iso-", value + 3);
     }
+    if (!strcasecomp(value, "iso-8859-8-i") ||
+       !strcasecomp(value, "iso-8859-8-e")) {
+       return UCGetLYhndl_byMIME("iso-8859-8");
+    }
 #if !NO_CHARSET_euc_jp
     if (!strcasecomp(value, "x-euc-jp") ||
        !strcasecomp(value, "eucjp")) {
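
For illustration, the aliasing idea behind the patch can also be written as a
small standalone lookup table. This is only a sketch and not Lynx code: the
names ascii_casecmp, charset_aliases, and canonical_charset are hypothetical,
and ascii_casecmp merely stands in for Lynx's strcasecomp. It maps the two
alternate MIME names to the name the existing lookup already understands.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for Lynx's strcasecomp(): returns 0 when the two
 * strings are equal, ignoring case. */
static int ascii_casecmp(const char *a, const char *b)
{
    while (*a && *b) {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return ca - cb;
        a++;
        b++;
    }
    return tolower((unsigned char) *a) - tolower((unsigned char) *b);
}

/* Hypothetical alias table: each alternate MIME name maps to the name
 * that the charset lookup is assumed to handle already. */
static const struct {
    const char *alias;
    const char *canonical;
} charset_aliases[] = {
    { "iso-8859-8-i", "iso-8859-8" },
    { "iso-8859-8-e", "iso-8859-8" },
};

/* Return the canonical name for an aliased charset, or the input
 * pointer unchanged when no alias matches. */
const char *canonical_charset(const char *value)
{
    size_t i;
    size_t n = sizeof(charset_aliases) / sizeof(charset_aliases[0]);

    for (i = 0; i < n; i++) {
        if (ascii_casecmp(value, charset_aliases[i].alias) == 0)
            return charset_aliases[i].canonical;
    }
    return value;
}
```

With this sketch, canonical_charset("ISO-8859-8-I") yields "iso-8859-8", while
a name with no alias entry, such as "utf-8", is returned unchanged. A table
keeps further aliases to a one-line addition each, whereas the patch above
uses two explicit comparisons, which is the smaller change to the existing
function.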


