Re: LYNX-DEV Charset names in ac105

From: Foteos Macrides
Subject: Re: LYNX-DEV Charset names in ac105
Date: Tue, 20 Jan 1998 17:26:51 -0500 (EST)

Leonid Pauzner <address@hidden> wrote to me (and TD) instead of
>> Also, both 2.7.2 and dev code
>> use strange "strip high bit" technique for Cyrillic koi8-r texts
>Thanks for incorporating koi8-r fallback in 2.7.2 so quickly!!!
>Sorry, in def7_uni.tbl please correct 4 letters
>against the patch I posted few days ago:
>! U+042e:JU
>! U+042f:JA
>! U+044e:ju
>! U+044f:ja
>replace this with  YU, YA, yu, ya  - respectively.
>It is more common usage. Hope no more.

        Thanks.  I added those additional fixes to

>I wonder, though, why you leave few lines in UCAux.c (quoted below).
>(1) There is no more koi-8 and cp1251 here since we pass through aliasing,
>and (2) it may mess somebody up if one more cyrillic charset
>would be incorporated in the future...
>Why we need any specials for cyrillic if unicode now in action?
>Also, stripping out 8bit (commented out stuff) may be interesting
>for history purpose only: >50% pages here around in windows-1251 and if
>someone really want to read KOI7 (which is definitely less readable) let he
>correct def7_uni that way...
>>             return TQ_NO;
>>         }
>> !       if (!strcmp(fromname, "koi8-r")) {
>> !           /*
>> !            *  Will try to use stripping of high bit...
>> !            */
>> !           tqmin = TQ_POOR;
>> !       }
>> !
>> !       if (!strcmp(fromname, "koi8-r") || /* from cyrillic */
>> !           !strcmp(fromname, "iso-8859-5") ||
>> !           !strcmp(fromname, "cp866") ||
>> !           !strcmp(fromname, "cp1251") ||
>> !           !strcmp(fromname, "koi-8")) {
>> !           if (strcmp(toname, "iso-8859-5") &&
>> !               strcmp(toname, "koi8-r") &&
>> !               strcmp(toname, "cp866") &&
>> !               strcmp(toname, "cp1251"))
>> !               tqmax = TQ_POOR;
>> !       }

        I didn't include Klaus' "probabilistic" TQ_foo stuff in Lynx
v2.7.2 because it was too complex, with unpredictable consequences,
IMHO.  In v2.7.2, that function returns YES or NO, and now should always
return YES if the input charset is any of the Russian Cyrillic charsets
for which we have Unicode support to generate the corresponding characters
or 7 bit approximations.  That fix is in as well.

        In the development code, the "cp1251" should be changed to
"windows-1251", the check for "koi-8" should be eliminated (the strings
always will correspond to the internal MIME names, and never to their
synonyms; koi-8 is treated as a synonym for koi8-r to deal with an
accentsoft bug), and possibly other tweaks are needed to do what Klaus
intended, but the way it's used in the development code is too complex
to be sure without empirical testing.
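
        A minimal sketch of the Cyrillic upper-bound check with just those
two string changes applied ("cp1251" becomes "windows-1251", the "koi-8"
test is dropped).  It is not the actual UCAux.c code: the enum values, the
TQ_GOOD default and both function names are hypothetical stand-ins chosen
for illustration, and the koi8-r "strip high bit" tqmin case is left out.

```c
#include <string.h>

/* Hypothetical stand-ins for the translation-quality levels; the real
 * definitions in the Lynx sources may differ. */
enum tq_level { TQ_NO = 0, TQ_POOR = 1, TQ_GOOD = 2 };

/* Nonzero if name is one of the Cyrillic charsets from the quoted check,
 * spelled with internal MIME names only (no synonyms such as "koi-8"). */
static int is_cyrillic_mime_name(const char *name)
{
    return !strcmp(name, "koi8-r") ||
           !strcmp(name, "iso-8859-5") ||
           !strcmp(name, "cp866") ||
           !strcmp(name, "windows-1251");
}

/* Upper bound on translation quality: going from a Cyrillic charset to a
 * non-Cyrillic one is capped at TQ_POOR, as in the quoted tqmax logic.
 * The TQ_GOOD default for every other pair is an assumption. */
static enum tq_level cyrillic_tqmax(const char *fromname, const char *toname)
{
    if (is_cyrillic_mime_name(fromname) && !is_cyrillic_mime_name(toname))
        return TQ_POOR;
    return TQ_GOOD;
}
```

        With this, cyrillic_tqmax("koi8-r", "iso-8859-1") gives TQ_POOR,
while a synonym such as "koi-8" is no longer recognized, on the assumption
that aliasing has already mapped it to "koi8-r" before the check runs.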


 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
