lynx-dev changes for Japanese (was: Re: reading SJIS docs)

From:	Hataguchi Takeshi
Subject:	lynx-dev changes for Japanese (was: Re: reading SJIS docs)
Date:	Thu, 6 Jan 2000 01:30:34 +0900 (JST)
I wrote a patch against dev17 for Japanese support. These are the main changes:

(1) If a Japanese document's charset is specified explicitly
    by a META tag or by the HTTP response header, Lynx will use
    that charset.
(2) The Japanese charset detection strategy has changed for the
    case where the charset isn't specified explicitly.

Please test these changes with CJK_EX defined and, if necessary,
also with USE_TH_JP_AUTO_DETECT.

# This patch doesn't handle ASSUME_CHARSET yet. I will probably work
# on it in the near future. Please give me time.

(1):
Until now, Lynx forgot the specified kanji code as soon as it found
the first ' ' (or '\n' or '\r'). With this patch, Lynx never forgets
it, because it is kept in the new variable specified_kcode.

So if the specified kanji code is correct (I hope it is), Lynx will
process Japanese completely. Not only Shift_JIS and EUC-JP but also
x-sjis and x-euc-jp are accepted as charset names.

Examples:
    metaEUC.html, metaEUC2.html, metaSJIS.html, metaSJIS2.html
        These specify the charset in a META tag, and Lynx now processes
        them correctly. The old version doesn't.

    nometaEUC.html, nometaSJIS.html
        These don't specify a charset, and Lynx still doesn't process
        them entirely correctly. The old version doesn't either.
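To illustrate point (1), here is a minimal standalone sketch in C, not Lynx code: it maps an explicit charset name to a kanji code and keeps it in a separate field that survives when the running guess is reset (for example at whitespace). The names kcode_t, kcode_from_charset, doc_kcode, doc_set_charset and doc_on_whitespace are hypothetical. The accepted names follow the ones mentioned in this patch (Shift_JIS, x-sjis, x-shift-jis, EUC-JP, x-euc-jp), and iso-2022-jp is deliberately not mapped to EUC, as in the UCdomap.c hunk.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for Lynx's HTkcode values. */
typedef enum { NOKANJI, EUC, SJIS, JIS } kcode_t;

/* Portable case-insensitive equality (strcasecmp is POSIX, not ISO C). */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Map an explicit charset name to the 8-bit kanji code it specifies.
 * iso-2022-jp is not mapped to EUC: JIS text switches codes with escape
 * sequences, so it does not pin down a single 8-bit code. */
static kcode_t kcode_from_charset(const char *charset)
{
    if (charset == NULL)
        return NOKANJI;             /* nothing specified: auto-detect */
    if (same_name(charset, "shift_jis") || same_name(charset, "x-sjis") ||
        same_name(charset, "x-shift-jis"))
        return SJIS;
    if (same_name(charset, "euc-jp") || same_name(charset, "x-euc-jp"))
        return EUC;
    return NOKANJI;
}

/* Hypothetical per-document state, modelled on HText's kcode and
 * specified_kcode fields. */
typedef struct {
    kcode_t kcode;       /* current guess; may be reset while parsing */
    kcode_t specified;   /* from META/HTTP charset; kept for the whole document */
} doc_kcode;

static void doc_set_charset(doc_kcode *d, const char *charset)
{
    d->specified = kcode_from_charset(charset);
    d->kcode = d->specified;
}

/* Resetting the running guess (e.g. at ' ', '\n', '\r') no longer
 * loses the explicit charset, because `specified` is left alone. */
static void doc_on_whitespace(doc_kcode *d)
{
    d->kcode = NOKANJI;
}
```

The design point is simply that the explicit charset and the running guess live in two fields, so code that resets one cannot clobber the other.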

(2):
I added a new Japanese charset detection routine.
This is enabled when the macro USE_TH_JP_AUTO_DETECT is defined.

With the old detection strategy, Lynx always assumed the document might
be written in a mixture of all three kanji codes (JIS, EUC and SJIS).
With the new one, Lynx first assumes the document is written in a single
kanji code, possibly combined with JIS (JIS, EUC, SJIS, EUC+JIS or
SJIS+JIS). Only when that assumption turns out to be wrong does Lynx
treat the document as a mixture of all three.
The first assumption is usually correct, so I believe this makes Lynx
a better guesser.
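The new strategy can be sketched as two independent byte-level state machines, one testing the hypothesis "this is SJIS" and one testing "this is EUC", mirroring the SJIS_status/EUC_status switches in the GridText.c hunk. When one hypothesis sees an impossible byte, the other becomes the answer; when both fail, the document is treated as mixed. This is a minimal standalone sketch, not the patch itself: the type and function names are hypothetical, 7-bit JIS bytes are simply ignored, and the Shift_JIS second-byte range (0x40-0xFC except 0x7F) is an assumption, since the patch uses Lynx's IS_SJIS_LO without showing its definition. In the patch, detection is also skipped entirely when the charset was specified as SJIS or EUC.

```c
#include <stddef.h>

/* Hypothetical names modelled on the patch's detected_kcode, SJIS_status
 * and EUC_status fields. */
typedef enum { DET_SJIS, DET_EUC, DET_NOTYET, DET_MIXED } detected_t;
typedef enum { ST_NEUTRAL, ST_IN_KANJI, ST_IN_KANA, ST_BAD } hyp_status;

typedef struct {
    detected_t detected;
    hyp_status sjis;    /* never ST_IN_KANA: SJIS kana are single bytes */
    hyp_status euc;
} jp_detector;

static int sjis_lead(unsigned char c)  { return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xEF); }
static int sjis_trail(unsigned char c) { return c >= 0x40 && c <= 0xFC && c != 0x7F; } /* assumed */
static int hw_kana(unsigned char c)    { return c >= 0xA1 && c <= 0xDF; }
static int euc_byte(unsigned char c)   { return c >= 0xA1 && c <= 0xFE; }

/* One hypothesis failed: the other wins, unless it already failed too. */
static void refute_sjis(jp_detector *d)
{
    d->sjis = ST_BAD;
    d->detected = (d->euc == ST_BAD) ? DET_MIXED : DET_EUC;
}

static void refute_euc(jp_detector *d)
{
    d->euc = ST_BAD;
    d->detected = (d->sjis == ST_BAD) ? DET_MIXED : DET_SJIS;
}

static void jp_detector_feed(jp_detector *d, unsigned char c)
{
    if (d->detected == DET_MIXED)
        return;                         /* both hypotheses already failed */

    switch (d->sjis) {                  /* hypothesis: the text is SJIS */
    case ST_NEUTRAL:
        if (sjis_lead(c))
            d->sjis = ST_IN_KANJI;
        else if ((c & 0x80) && !hw_kana(c))
            refute_sjis(d);             /* high byte SJIS cannot start with */
        break;
    case ST_IN_KANJI:
        if (sjis_trail(c))
            d->sjis = ST_NEUTRAL;
        else
            refute_sjis(d);
        break;
    default:
        break;                          /* ST_BAD: stays refuted */
    }

    switch (d->euc) {                   /* hypothesis: the text is EUC-JP */
    case ST_NEUTRAL:
        if (euc_byte(c))
            d->euc = ST_IN_KANJI;
        else if (c == 0x8E)
            d->euc = ST_IN_KANA;        /* SS2: a halfwidth kana byte follows */
        else if (c & 0x80)
            refute_euc(d);
        break;
    case ST_IN_KANJI:
        if (euc_byte(c))
            d->euc = ST_NEUTRAL;
        else
            refute_euc(d);
        break;
    case ST_IN_KANA:
        if (hw_kana(c))
            d->euc = ST_NEUTRAL;
        else
            refute_euc(d);
        break;
    default:
        break;
    }
}

/* Convenience wrapper: run the detector over a whole buffer. */
static detected_t jp_detect(const unsigned char *buf, size_t len)
{
    jp_detector d = { DET_NOTYET, ST_NEUTRAL, ST_NEUTRAL };
    for (size_t i = 0; i < len; i++)
        jp_detector_feed(&d, buf[i]);
    return d.detected;
}
```

For example, "nihon" (Japan) encoded in EUC-JP (C6 FC CB DC) is classified as EUC after its second byte, because 0xFC cannot begin an SJIS character, while the same word in Shift_JIS (93 FA 96 7B) is classified as SJIS at its first byte.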

Examples:
    EUC.html, SJIS.html
        These include halfwidth kana. The old Lynx fails to guess their
        kanji code; the new one handles them perfectly.
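Why halfwidth kana confuses a pair-by-pair guesser can be seen with the classification macros from the HTCJK.h hunk. The sketch below restates IS_SJIS_2BYTE and IS_EUC as functions (IS_SJIS_LO is again assumed to be 0x40-0xFC except 0x7F) and classifies a byte pair. Two SJIS halfwidth kana such as 0xB1 0xB2 form a valid EUC pair but not an SJIS one, while an EUC halfwidth kana such as 0x8E 0xB1 is also a valid SJIS pair. The enum and function names are hypothetical.

```c
/* Byte-range tests restated from the patch's HTCJK.h macros. */
static int sjis_hi(unsigned c)  { return (0x81 <= c && c <= 0x9F) || (0xE0 <= c && c <= 0xEF); }
/* Assumed definition of IS_SJIS_LO (not shown in the patch). */
static int sjis_lo(unsigned c)  { return 0x40 <= c && c <= 0xFC && c != 0x7F; }
static int euc_hi(unsigned c)   { return 0xA1 <= c && c <= 0xFE; }
static int euc_lox(unsigned c)  { return 0xA1 <= c && c <= 0xFE; }

/* IS_SJIS_2BYTE(hi,lo) */
static int is_sjis_2byte(unsigned hi, unsigned lo)
{
    return sjis_lo(lo) && sjis_hi(hi);
}

/* IS_EUC(hi,lo): code set 1 kanji, or code set 2 halfwidth kana (0x8E xx). */
static int is_euc(unsigned hi, unsigned lo)
{
    return (euc_hi(hi) && euc_lox(lo)) ||
           (hi == 0x8E && 0xA1 <= lo && lo <= 0xDF);
}

typedef enum { PAIR_NEITHER, PAIR_EUC_ONLY, PAIR_SJIS_ONLY, PAIR_BOTH } pair_class;

/* Which encodings would accept this two-byte sequence as one character? */
static pair_class classify_pair(unsigned char hi, unsigned char lo)
{
    int e = is_euc(hi, lo);
    int s = is_sjis_2byte(hi, lo);

    if (e && s)
        return PAIR_BOTH;
    if (e)
        return PAIR_EUC_ONLY;
    if (s)
        return PAIR_SJIS_ONLY;
    return PAIR_NEITHER;
}
```

A guesser that only looks at such pairs will call a run of SJIS halfwidth kana "EUC"; a per-byte detector that lets SJIS accept a lone 0xA1-0xDF byte does not have to commit on them.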

Note:
The detection can still fail sometimes. Try nometaEUC.html and
nometaSJIS.html.


(?) Other things:
Support for broken iso-2022 code. Some search engines generate pages
containing broken iso-2022 code.

Examples:
    broken_jis.html
        Try it with the display character set set to Japanese (EUC or SJIS).
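The GridText.c hunk handles broken iso-2022 input by leaving two-byte (S_nonascii_text) mode when a control character arrives, so a missing ESC ( B no longer garbles the rest of the text. Below is a minimal sketch of that idea as a standalone ISO-2022-JP to EUC-JP converter; the function name is hypothetical, only ESC $ @, ESC $ B, ESC ( B and ESC ( J are recognized, and any other escape sequence is simply dropped.

```c
#include <stddef.h>

typedef enum { MODE_ASCII, MODE_KANJI } jis_mode;
typedef enum { ESC_NONE, ESC_SEEN, ESC_DOLLAR, ESC_PAREN } esc_state;

/* Hypothetical converter: ISO-2022-JP in, EUC-JP out. `out` must have room
 * for `len` bytes (output is never longer than input). Returns bytes written. */
static size_t jis_to_euc(const unsigned char *in, size_t len, unsigned char *out)
{
    jis_mode mode = MODE_ASCII;
    esc_state esc = ESC_NONE;
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        switch (esc) {
        case ESC_SEEN:                  /* byte after ESC */
            esc = (c == '$') ? ESC_DOLLAR : (c == '(') ? ESC_PAREN : ESC_NONE;
            continue;
        case ESC_DOLLAR:                /* ESC $ @ or ESC $ B: JIS X 0208 */
            if (c == '@' || c == 'B')
                mode = MODE_KANJI;
            esc = ESC_NONE;
            continue;
        case ESC_PAREN:                 /* ESC ( B or ESC ( J: one-byte text */
            if (c == 'B' || c == 'J')
                mode = MODE_ASCII;
            esc = ESC_NONE;
            continue;
        case ESC_NONE:
            break;
        }

        if (c == 0x1B) {
            esc = ESC_SEEN;
            continue;
        }
        /* Broken input: a control character inside two-byte mode means the
         * closing ESC ( B was lost, so fall back to one-byte text. */
        if (mode == MODE_KANJI && c < 0x20)
            mode = MODE_ASCII;

        /* JIS X 0208 bytes 0x21-0x7E become EUC bytes 0xA1-0xFE. */
        out[n++] = (mode == MODE_KANJI) ? (unsigned char)(c | 0x80) : c;
    }
    return n;
}
```

Without the control-character check, the 'A' after a newline in a line that lacks ESC ( B would come out as the stray high byte 0xC1 instead of plain 'A'.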
--
Takeshi Hataguchi
E-mail: address@hidden

%%% Created Tue Jan  4 23:08:21 JST 2000 by target lynx.patch. %%%
diff -bru orig/lynx2-8-3/WWW/Library/Implementation/HTCJK.h lynx2-8-3/WWW/Library/Implementation/HTCJK.h
--- orig/lynx2-8-3/WWW/Library/Implementation/HTCJK.h   Sat Jul 31 00:39:54 1999
+++ lynx2-8-3/WWW/Library/Implementation/HTCJK.h        Tue Jan  4 21:12:40 2000
@@ -37,11 +37,21 @@
 #define IS_SJIS_HI1(hi) ((0x81<=hi)&&(hi<=0x9F))       /* 1st lev. */
 #define IS_SJIS_HI2(hi) ((0xE0<=hi)&&(hi<=0xEF))       /* 2nd lev. */
 #define IS_SJIS(hi,lo,in_sjis) (!IS_SJIS_LO(lo)?0:IS_SJIS_HI1(hi)?(in_sjis=1):in_sjis&&IS_SJIS_HI2(hi))
+#define IS_SJIS_2BYTE(hi,lo) (IS_SJIS_LO(lo)&&(IS_SJIS_HI1(hi)||IS_SJIS_HI2(hi)))
+#define IS_SJIS_HWKANA(lo) ((0xA1<=lo)&&(lo<=0xDF))
 
+#if 0 /* IS_EUC_LOS isn't used because we are interested only in EUC-JP's
+       * code set 0 to 2 now. -- TH
+       * ref: http://www.isi.edu/in-notes/iana/assignments/character-sets
+       */
 #define IS_EUC_LOS(lo) ((0x21<=lo)&&(lo<=0x7E))        /* standard */
+#endif
 #define IS_EUC_LOX(lo) ((0xA1<=lo)&&(lo<=0xFE))        /* extended */
 #define IS_EUC_HI(hi)  ((0xA1<=hi)&&(hi<=0xFE))
-#define IS_EUC(hi,lo) (IS_EUC_HI(hi) && (IS_EUC_LOS(lo) || IS_EUC_LOX(lo)))
+#define IS_EUC_HWKANA(hi,lo) ((hi==0x8E)&&(0xA1<=lo)&&(lo<=0xDF))
+#define IS_EUC(hi,lo) ((IS_EUC_HI(hi) && IS_EUC_LOX(lo))||IS_EUC_HWKANA(hi,lo))
+
+#define IS_JAPANESE_2BYTE(hi,lo) (IS_SJIS_2BYTE(hi,lo) || IS_EUC(hi,lo))
 
 #define IS_BIG5_LOS(lo)        ((0x40<=lo)&&(lo<=0x7E))        /* standard */
 #define IS_BIG5_LOX(lo)        ((0xA1<=lo)&&(lo<=0xFE))        /* extended */
diff -bru orig/lynx2-8-3/WWW/Library/Implementation/SGML.c lynx2-8-3/WWW/Library/Implementation/SGML.c
--- orig/lynx2-8-3/WWW/Library/Implementation/SGML.c    Thu Nov  4 11:41:38 1999
+++ lynx2-8-3/WWW/Library/Implementation/SGML.c Tue Jan  4 21:16:56 2000
@@ -157,6 +157,7 @@
                S_esc_dq, S_dollar_dq, S_paren_dq, S_nonascii_text_dq,
                S_dollar_paren_dq,
                S_in_kanji, S_junk_tag, S_junk_pi} state;
+    unsigned char kanji_buf;
 #ifdef CALLERDATA
     void *                     callerData;
 #endif /* CALLERDATA */
@@ -1690,6 +1691,9 @@
        HTCJK == NOCJK)
        goto after_switch;
 
+#if 0  /* This halfwidth kana to fullwidth conversion is/should be
+        * done in the HTextAppendCharacter. -- TH
+        */
 #ifdef CJK_EX  /* 1998/11/24 (Tue) 17:02:31 */
     if (HTCJK == JAPANESE && last_kcode == SJIS) {
        if (sjis_1st == '\0' && (IS_SJIS_HI1(c) || IS_SJIS_HI2(c))) {
@@ -1708,6 +1712,7 @@
        }
     }
 #endif
+#endif
 
     /*
     ** Ignore 127 if we don't have HTPassHighCtrlRaw
@@ -1727,6 +1732,26 @@
        !(PASSHICTRL || HTCJK != NOCJK))
        goto after_switch;
 
+    /* Almost all CJK characters are double byte but only Japanese
+     * halfwidth kana is single byte. To prevent to fail SGML parsing
+     * we have to care halfwidth kana here. -- TH
+     */
+    if ((HTCJK==JAPANESE) && (context->state==S_in_kanji) &&
+       !IS_JAPANESE_2BYTE(context->kanji_buf,(unsigned char)c)) {
+#ifdef CJK_EX
+       if (IS_SJIS_HWKANA(context->kanji_buf)) {
+           JISx0201TO0208_SJIS(context->kanji_buf, &sjis_hi, &sjis_lo);
+           PUTC(sjis_hi);
+           PUTC(sjis_lo);
+       }
+       else
+           PUTC('=');
+#else
+       PUTC('=');
+#endif
+       context->state = S_text;
+    }
+
     /*
     ** Handle character based on context->state.
     */
@@ -1744,6 +1769,7 @@
        **  (see below). - FM
        */
        context->state = S_text;
+       PUTC(context->kanji_buf);
        PUTC(c);
        break;
 
@@ -1772,7 +1798,7 @@
            **  to having raw mode off with CJK. - FM
            */
            context->state = S_in_kanji;
-           PUTC(c);
+           context->kanji_buf = c;
            break;
        } else if (HTCJK != NOCJK && TOASCII(c) == '\033') {  /* S/390 -- gil -- 0881 */
            /*
@@ -4075,6 +4101,8 @@
            context->state = S_esc;
        }
        PUTC(c);
+       if (c < 32)
+           context->state = S_text;
        break;
 
     case S_esc_sq:     /* Expecting '$'or '(' following CJK ESC. */
@@ -4361,6 +4389,7 @@
 /*    context->extra_tags = dtd->tags + dtd->number_of_tags; */
     context->current_tag = context->slashedtag = NULL;
     context->state = S_text;
+    context->kanji_buf = '\0';
     context->element_stack = 0;                        /* empty */
     context->inSELECT = FALSE;
     context->no_lynx_specialcodes = NO;        /* special codes normally generated */
diff -bru orig/lynx2-8-3/src/GridText.c lynx2-8-3/src/GridText.c
--- orig/lynx2-8-3/src/GridText.c       Tue Dec 28 20:13:52 1999
+++ lynx2-8-3/src/GridText.c    Tue Jan  4 22:12:22 2000
@@ -385,6 +385,16 @@
        STable_info *           stbl;
 
        HTkcode                 kcode;                  /* Kanji code? */
+       HTkcode                 specified_kcode;        /* Specified Kanji code */
+#ifdef USE_TH_JP_AUTO_DETECT
+       enum _detected_kcode  { DET_SJIS, DET_EUC, DET_NOTYET, DET_MIXED } 
+                               detected_kcode;         /* Detected Kanji code */
+       enum _SJIS_status     { SJIS_state_neutral, SJIS_state_in_kanji, 
+                               SJIS_state_has_bad_code } SJIS_status;
+       enum _EUC_status      { EUC_state_neutral, EUC_state_in_kanji, 
+                               EUC_state_in_kana, EUC_state_has_bad_code } 
+                               EUC_status;
+#endif
        enum grid_state       { S_text, S_esc, S_dollar, S_paren,
                                S_nonascii_text, S_dollar_paren,
                                S_jisx0201_text }
@@ -823,6 +833,12 @@
     HTMainAnchor = anchor;
     self->display_on_the_fly = 0;
     self->kcode = NOKANJI;
+    self->specified_kcode = NOKANJI;
+#ifdef USE_TH_JP_AUTO_DETECT
+    self->detected_kcode = DET_NOTYET;
+    self->SJIS_status = SJIS_state_neutral;
+    self->EUC_status = EUC_state_neutral;
+#endif
     self->state = S_text;
     self->kanji_buf = '\0';
     self->in_sjis = 0;
@@ -3471,6 +3487,100 @@
        text->halted = 3;
        return;
     }
+#ifdef USE_TH_JP_AUTO_DETECT
+    if ((HTCJK == JAPANESE) && (text->detected_kcode != DET_MIXED) &&
+       (text->specified_kcode != SJIS) && (text->specified_kcode != EUC)) {
+       unsigned char c;
+       int save_d_kcode;
+
+       c = ch;
+       save_d_kcode = text->detected_kcode;
+       switch (text->SJIS_status) {
+       case SJIS_state_has_bad_code:
+           break;
+       case SJIS_state_neutral:
+           if (IS_SJIS_HI1(c) || IS_SJIS_HI2(c)) {
+               text->SJIS_status = SJIS_state_in_kanji;
+           }
+           else if ((c & 0x80) && !IS_SJIS_HWKANA(c)) {
+               text->SJIS_status = SJIS_state_has_bad_code;
+               if (text->EUC_status == EUC_state_has_bad_code)
+                   text->detected_kcode = DET_MIXED;
+               else
+                   text->detected_kcode = DET_EUC;
+           }
+           break;
+       case SJIS_state_in_kanji:
+           if (IS_SJIS_LO(c)) {
+               text->SJIS_status = SJIS_state_neutral;
+           }
+           else {
+               text->SJIS_status = SJIS_state_has_bad_code;
+               if (text->EUC_status == EUC_state_has_bad_code)
+                   text->detected_kcode = DET_MIXED;
+               else
+                   text->detected_kcode = DET_EUC;
+           }
+           break;
+       }
+       switch (text->EUC_status) {
+       case EUC_state_has_bad_code:
+           break;
+       case EUC_state_neutral:
+           if (IS_EUC_HI(c)) {
+               text->EUC_status = EUC_state_in_kanji;
+           }
+           else if (c == 0x8e) {
+               text->EUC_status = EUC_state_in_kana;
+           }
+           else if (c & 0x80) {
+               text->EUC_status = EUC_state_has_bad_code;
+               if (text->SJIS_status == SJIS_state_has_bad_code)
+                   text->detected_kcode = DET_MIXED;
+               else
+                   text->detected_kcode = DET_SJIS;
+           }
+           break;
+       case EUC_state_in_kanji:
+           if (IS_EUC_LOX(c)) {
+               text->EUC_status = EUC_state_neutral;
+           }
+           else {
+               text->EUC_status = EUC_state_has_bad_code;
+               if (text->SJIS_status == SJIS_state_has_bad_code)
+                   text->detected_kcode = DET_MIXED;
+               else
+                   text->detected_kcode = DET_SJIS;
+           }
+           break;
+       case EUC_state_in_kana:
+           if ((0xA1<=c)&&(c<=0xDF)) {
+               text->EUC_status = EUC_state_neutral;
+           }
+           else {
+               text->EUC_status = EUC_state_has_bad_code;
+               if (text->SJIS_status == SJIS_state_has_bad_code)
+                   text->detected_kcode = DET_MIXED;
+               else
+                   text->detected_kcode = DET_SJIS;
+           }
+           break;
+       }
+       if (save_d_kcode != text->detected_kcode) {
+           switch (text->detected_kcode) {
+           case DET_SJIS:
+               CTRACE((tfp, "TH_JP_AUTO_DETECT: This document's kcode seems SJIS.\n"));
+               break;
+           case DET_EUC:
+               CTRACE((tfp, "TH_JP_AUTO_DETECT: This document's kcode seems EUC.\n"));
+               break;
+           case DET_MIXED:
+               CTRACE((tfp, "TH_JP_AUTO_DETECT: This document's kcode seems mixed!\n"));
+               break;
+           }
+       }
+    }
+#endif /* USE_TH_JP_AUTO_DETECT */
     /*
      *  Make sure we don't hang on escape sequences.
      */
@@ -3540,6 +3650,8 @@
                 */
                if (ch == '@' || ch == 'B' || ch=='A') {
                    text->state = S_nonascii_text;
+                   if (ch == '@' || ch == 'B')
+                       text->kcode = JIS;
                    return;
                } else if (ch == '(') {
                    text->state = S_dollar_paren;
@@ -3578,6 +3690,7 @@
                     *  Can split here. - FM
                     */
                    text->permissible_split = text->last_line->size;
+                   text->kcode = JIS;
                    return;
                } else {
                    text->state = S_text;
@@ -3591,7 +3704,16 @@
                if (ch == CH_ESC) {  /* S/390 -- gil -- 1553 */
                    text->state = S_esc;
                    text->kanji_buf = '\0';
+                   if (HTCJK == JAPANESE) {
+                       text->kcode = NOKANJI;
+                   }
                    return;
+               } else if ((0 <= ch) && (ch < 32)) {
+                   text->state = S_text;
+                   text->kanji_buf = '\0';
+                   if (HTCJK == JAPANESE) {
+                       text->kcode = NOKANJI;
+                   }
                } else {
                    ch |= 0200;
                }
@@ -3604,6 +3726,7 @@
                if (ch == CH_ESC) {  /* S/390 -- gil -- 1570 */
                    text->state = S_esc;
                    text->kanji_buf = '\0';
+                   text->kcode = NOKANJI;
                    return;
                } else {
                    text->kanji_buf = '\216';
@@ -3627,7 +3750,15 @@
                /*
                 *  JIS X0201 Kana in SJIS support. - by ASATAKU
                 */
+#ifdef CJK_EX
+               if (((text->kcode == SJIS) || (last_kcode == SJIS) ||
+#ifdef USE_TH_JP_AUTO_DETECT
+                    (text->detected_kcode == DET_SJIS) ||
+#endif
+                    ((text->kcode == NOKANJI) && (text->specified_kcode == SJIS))) &&
+#else
                if ((text->kcode == SJIS) &&
+#endif
                    ((unsigned char)ch >= 0xA1) &&
                    ((unsigned char)ch <= 0xDF))
                {
@@ -4057,6 +4188,57 @@
            lo = (unsigned char)ch;
 
            if (HTCJK == JAPANESE) {
+               if (text->kcode != JIS) {
+                   if (text->specified_kcode == EUC) {
+                       if (IS_EUC(hi, lo))
+                           text->kcode = EUC;
+                       else if (IS_SJIS_2BYTE(hi, lo)) {
+                           text->kcode = SJIS;
+                           CTRACE((tfp, "Specified_kcode is EUC, "));
+                           CTRACE((tfp, "but this character (%X:%X) seems SJIS\n", hi, lo));
+                       }
+                       else {
+                           CTRACE((tfp, "Specified_kcode is EUC, "));
+                           CTRACE((tfp, "but this character (%X:%X) doesn't seem EUC\n", hi, lo));
+                           hi = lo = '=';
+                       }
+                   }
+                   else if (text->specified_kcode == SJIS) {
+                       if (IS_SJIS_2BYTE(hi, lo))
+                           text->kcode = SJIS;
+                       else if (IS_EUC(hi, lo)) {
+                           text->kcode = EUC;
+                           CTRACE((tfp, "Specified_kcode is SJIS, "));
+                           CTRACE((tfp, "but this character (%X:%X) seems EUC\n", hi, lo));
+                       }
+                       else {
+                           CTRACE((tfp, "Specified_kcode is SJIS, "));
+                           CTRACE((tfp, "but this character (%X:%X) doesn't seem SJIS\n", hi, lo));
+                           hi = lo = '=';
+                       }
+                   }
+                   else {
+                       if (IS_EUC(hi, lo) && ! IS_SJIS_2BYTE(hi, lo)) {
+                           text->kcode = EUC;
+                       }
+                       else if (!IS_EUC(hi, lo) && IS_SJIS_2BYTE(hi, lo)) {
+                           text->kcode = SJIS;
+                       }
+#ifdef USE_TH_JP_AUTO_DETECT
+                       else if (text->detected_kcode == DET_EUC) {
+                           text->kcode = EUC;
+                       }
+                       else if (text->detected_kcode == DET_SJIS) {
+                           text->kcode = SJIS;
+                       }
+#endif
+                       else if (IS_EUC_HWKANA(hi, lo) && (text->kcode != EUC)) {
+                           text->kcode = SJIS;
+                       }
+                   }
+               }
+               /* This judgement routine is replaced by above one. -- TH */
+#if 0
                if (text->kcode == NOKANJI)
                {
                    if (IS_SJIS(hi, lo, text->in_sjis) && IS_EUC(hi, lo)) {
@@ -4067,6 +4249,7 @@
                        text->kcode = EUC;
                    }
                }
+#endif
 
                switch (kanji_code) {
                case EUC:
@@ -4074,7 +4257,7 @@
                        SJIS_TO_EUC1(hi, lo, tmp);
                        line->data[line->size++] = tmp[0];
                        line->data[line->size++] = tmp[1];
-                   } else if (text->kcode == EUC) {
+                   } else if (IS_EUC(hi, lo)) {
                        JISx0201TO0208_EUC(hi, lo, &hi, &lo);
                        line->data[line->size++] = hi;
                        line->data[line->size++] = lo;
@@ -4082,7 +4265,8 @@
                    break;
 
                case SJIS:
-                   if (last_kcode != SJIS && text->kcode == EUC)
+                   if (last_kcode != SJIS && 
+                       ((text->kcode == EUC) || (text->kcode == JIS)))
                    {
                        EUC_TO_SJIS1(hi, lo, tmp);
                        line->data[line->size++] = tmp[0];
@@ -4095,7 +4279,7 @@
                                hi = '=';
                                lo = '=';
                            } else if (hi == 0x8e) {
-                               text->kcode = NOKANJI;
+                               text->kcode = EUC;
                                JISx0201TO0208_EUC(hi, lo, &hi, &lo);
                                EUC_TO_SJIS1(hi, lo, tmp);
                                hi = tmp[0];
@@ -11151,6 +11335,8 @@
        CONST char *,   charset,
        LYUCcharset *,  p_in)
 {
+    BOOL explicit;
+
     if (!text)
        return;
 
@@ -11160,6 +11346,7 @@
     if (!charset && !p_in) {
        return;
     }
+    explicit = charset ? TRUE : FALSE;
     /*
     **  If no explicit charset string, use the implied one. - kw
     */
@@ -11188,8 +11375,10 @@
               !strcmp(charset, "x-euc") ||     /* 1997/11/28 (Fri) 18:11:24 */
               !strcmp(charset, "euc-jp") ||
               !strncmp(charset, "x-euc-", 6) ||
+#if 0 /* iso-2022-jp* shouldn't be treated as euc-jp */
               !strcmp(charset, "iso-2022-jp") ||
               !strcmp(charset, "iso-2022-jp-2") ||
+#endif
               !strcmp(charset, "euc-kr") ||
               !strcmp(charset, "iso-2022-kr") ||
               !strcmp(charset, "big5") ||
@@ -11214,6 +11403,7 @@
                HTCJK = NOCJK;
        }
     }
+    text->specified_kcode = explicit ? text->kcode : NOKANJI;
 
     return;
 }
@@ -13641,6 +13831,19 @@
        HTkcode,        kcode)
 {
     text->kcode = kcode;
+}
+
+PUBLIC HTkcode HText_getSpecifiedKcode ARGS1(
+       HText *,        text)
+{
+    return text->specified_kcode;
+}
+
+PUBLIC void HText_updateSpecifiedKcode ARGS2(
+       HText *,        text,
+       HTkcode,        kcode)
+{
+    text->specified_kcode = kcode;
 }
 #endif
 
diff -bru orig/lynx2-8-3/src/GridText.h lynx2-8-3/src/GridText.h
--- orig/lynx2-8-3/src/GridText.h       Wed Dec  1 12:33:02 1999
+++ lynx2-8-3/src/GridText.h    Tue Jan  4 22:59:24 2000
@@ -317,6 +317,8 @@
 
 extern HTkcode HText_getKcode PARAMS((HText * text));
 extern void HText_updateKcode PARAMS((HText * text, HTkcode kcode));
+extern HTkcode HText_getSpecifiedKcode PARAMS((HText * text));
+extern void HText_updateSpecifiedKcode PARAMS((HText * text, HTkcode kcode));
 
 #endif
 
diff -bru orig/lynx2-8-3/src/HTML.c lynx2-8-3/src/HTML.c
--- orig/lynx2-8-3/src/HTML.c   Tue Dec 28 20:13:52 1999
+++ lynx2-8-3/src/HTML.c        Mon Jan  3 15:51:20 2000
@@ -4966,6 +4966,7 @@
            BOOL IsSubmitOrReset = FALSE;
 #ifdef CJK_EX
            HTkcode kcode = 0;
+           HTkcode specified_kcode = 0;
 #endif
            /* init */
            I.align=NULL; I.accept=NULL; I.checked=NO; I.class=NULL;
@@ -5401,6 +5402,8 @@
                if (HTCJK != NOCJK) {
                    kcode = HText_getKcode(me->text);
                    HText_updateKcode(me->text, kanji_code);
+                   specified_kcode = HText_getSpecifiedKcode(me->text);
+                   HText_updateSpecifiedKcode(me->text, kanji_code);
                }
 #endif
                if (me->sp[0].tag_number == HTML_PRE ||
@@ -5443,8 +5446,10 @@
                        HTML_put_character(me, HT_NON_BREAK_SPACE);
                }
 #ifdef CJK_EX
-               if (HTCJK != NOCJK)
+               if (HTCJK != NOCJK) {
                    HText_updateKcode(me->text, kcode);
+                   HText_updateSpecifiedKcode(me->text, specified_kcode);
+               }
 #endif
            }
            HText_setIgnoreExcess(me->text, FALSE);
@@ -7551,8 +7556,25 @@
                for (; ptr && *ptr != '\0'; ptr++) {
                    if (*ptr == ' ')
                        HText_appendCharacter(me->text,HT_NON_BREAK_SPACE);
-                   else
+                   else {
+#ifdef CJK_EX
+                       HTkcode kcode = 0;
+                       HTkcode specified_kcode = 0;
+                       if (HTCJK != NOCJK) {
+                           kcode = HText_getKcode(me->text);
+                           HText_updateKcode(me->text, kanji_code);
+                           specified_kcode = HText_getSpecifiedKcode(me->text);
+                           HText_updateSpecifiedKcode(me->text, kanji_code);
+                       }
+#endif
                        HText_appendCharacter(me->text,*ptr);
+#ifdef CJK_EX
+                       if (HTCJK != NOCJK) {
+                           HText_updateKcode(me->text, kcode);
+                           HText_updateSpecifiedKcode(me->text, specified_kcode);
+                       }
+#endif
+                   }
                }
                /*
                 *  Add end option character.
@@ -8810,19 +8832,19 @@
 #ifdef SH_EX   /* 1998/04/02 (Thu) 16:02:00 */
 
     /* for proxy server 1998/12/19 (Sat) 11:53:30 */
-    if (stricmp(newtitle + 1, "internal-gopher-menu") == 0) {
+    if (AS_casecomp(newtitle + 1, "internal-gopher-menu") == 0) {
        StrAllocCopy(newtitle, "+");
-    } else if (stricmp(newtitle + 1, "internal-gopher-unknown") == 0) {
+    } else if (AS_casecomp(newtitle + 1, "internal-gopher-unknown") == 0) {
        StrAllocCopy(newtitle, " ");
     } else {
        /* normal title */
        ptr = strrchr(newtitle, '.');
        if (ptr) {
-         if (stricmp(ptr, ".gif") == 0)
+         if (AS_casecomp(ptr, ".gif") == 0)
            *ptr = '\0';
-         else if (stricmp(ptr, ".jpg") == 0)
+         else if (AS_casecomp(ptr, ".jpg") == 0)
            *ptr = '\0';
-         else if (stricmp(ptr, ".jpeg") == 0)
+         else if (AS_casecomp(ptr, ".jpeg") == 0)
            *ptr = '\0';
        }
        StrAllocCat(newtitle, "]");
diff -bru orig/lynx2-8-3/src/LYMail.c lynx2-8-3/src/LYMail.c
--- orig/lynx2-8-3/src/LYMail.c Wed Sep 29 20:40:38 1999
+++ lynx2-8-3/src/LYMail.c      Tue Dec 28 21:01:36 1999
@@ -2048,7 +2048,7 @@
     while ((n = fread(buf, 1, sizeof(buf), fd)) != 0) {
        fwrite(buf, 1, n, fp);
     }
-#if defined(DOSPATH) || defined(SH_EX)
+#if defined(DOSPATH) || (defined(SH_EX) && defined(WIN_EX))
 #ifdef SH_EX   /* 1998/05/04 (Mon) 22:40:35 */
     if (mail_is_blat) {
        StrAllocCopy(command,
diff -bru orig/lynx2-8-3/src/UCdomap.c lynx2-8-3/src/UCdomap.c
--- orig/lynx2-8-3/src/UCdomap.c        Thu Nov  4 11:41:38 1999
+++ lynx2-8-3/src/UCdomap.c     Tue Jan  4 23:08:08 2000
@@ -1578,13 +1578,13 @@
     }
 #endif
 #if !NO_CHARSET_euc_jp
-    if (!strncasecomp(value, "iso-2022-jp", 11) ||
-       !strcasecomp(value, "x-euc-jp")) {
+    if (!strcasecomp(value, "x-euc-jp")) {
        return UCGetLYhndl_byMIME("euc-jp");
     }
 #endif
 #if !NO_CHARSET_shift_jis
-    if (!strcasecomp(value, "x-shift-jis")) {
+    if ((!strcasecomp(value, "x-shift-jis")) ||
+       (!strcasecomp(value, "x-sjis"))) {
        return UCGetLYhndl_byMIME("shift_jis");
     }
 #endif
examples.tar.gz
Description: Binary data