Re: lynx-dev cleanup chartrans [patch]
From: Leonid Pauzner
Subject: Re: lynx-dev cleanup chartrans [patch]
Date: Fri, 26 Feb 1999 00:16:34 +0300 (MSK)
25-Feb-99 07:36 Klaus Weide wrote:
> On Thu, 25 Feb 1999, Leonid Pauzner wrote:
>> 25-Feb-99 04:49 Klaus Weide wrote:
>> > What's special about 8859-15, to be the only one left intact here
>> > besides 8859-1? "7 Bit Approximations" would much rather deserve that
>> > honor.
>> Well, yes, but this got hidden on the other side...
> It would be more logical to put "7 Bit Approximations" in the 2nd place
> though. That would change the order in the "Display character set" option
> list, but maybe that's not a bad idea anyway.
OK, no problem (next time).
I would probably suggest moving "7 Bit Approximations" to the first place
(current_char_set = 0) instead. I have replaced many such explicit checks
with the LATIN1 macro, but I cannot guarantee that I caught every place.
This is somewhat close to the logic of ISO_Latin1 usage...
>> >> @@ -394,41 +369,6 @@
>> >> * Placeholders for Unicode tables. - FM
>> >> */
>> >> {-1,"iso-8859-15", UCT_ENC_8BIT,0,0,0, UCT_R_8BIT,UCT_R_ASCII},
>> >> - {-1,"cp850", UCT_ENC_8BIT,0,
>> >> - UCT_REP_SUPERSETOF_LAT1,
>> >> - 0, UCT_R_8BIT,UCT_R_ASCII},
>> > [ etc - including CJK, 7-bit approx., transparent ]
>>
>> > The various tables here served to provide some minimal information
>> > (without taking much space) about several charsets / Display character
>> > sets even in the case where chartrans table files for them were not
>> > included. Yes it's redundant; however, sometimes redundancy *may* be
>> > good.
>> Yes, but in this case I think this redundancy may be misleading for other
>> changes. In fact, no fields from this struct are used except the MIME name
>> and the encoding; only UCT_REP_* _may_ be useful when we are very close to
>> the old-style LATIN1 charset.
> Yes, most of those bits are underused... probably even more so now.
> I always liked to keep the possibility open to one day do something more
> with that info, or put more detailed info in that struct (like *what kind*
> of CJK encoding, or which scripts of Unicode were present in a charset's
> repertoire). But it hasn't happened, and leaving it open is not exactly
> compatible with your goal of cleaning up.
"*What kind* of CJK encoding" can be mapped to 'enc' value as a region.
The check for "160" and "173" can be done dynamically (it is rarely needed), e.g.
if (160 == UCTransChar(160, from_charset, to_unicode)) {}
Other info can be incorporated into the *_uni.h format when necessary.
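
To illustrate the dynamic check, here is a minimal sketch. It does not call the real UCTransChar; the toy_charset struct and all toy_* helpers are hypothetical stand-ins invented for this example, and the "cp850-like" table sets only the two cp850 positions that matter here. The idea is simply to ask the translation table at run time whether bytes 160 and 173 map to U+00A0 and U+00AD, rather than storing a flag for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a chartrans table: maps the upper half
 * (bytes 128..255) of an 8-bit charset to Unicode code points. */
typedef struct {
    const char *mime_name;
    unsigned short high[128];   /* high[i] is the code point for byte 128 + i */
} toy_charset;

/* Translate one byte to Unicode; bytes below 128 are taken as ASCII. */
static unsigned toy_to_unicode(const toy_charset *cs, unsigned char byte)
{
    assert(cs != NULL);
    return (byte < 128) ? (unsigned)byte : (unsigned)cs->high[byte - 128];
}

/* Nonzero if byte 160 is NO-BREAK SPACE (U+00A0) and byte 173 is
 * SOFT HYPHEN (U+00AD), as they are in ISO-8859-1. */
static int toy_has_latin1_specials(const toy_charset *cs)
{
    return toy_to_unicode(cs, 160) == 0x00A0u
        && toy_to_unicode(cs, 173) == 0x00ADu;
}

/* Fill a table with the identity mapping, i.e. ISO-8859-1. */
static void toy_init_latin1(toy_charset *cs)
{
    int i;

    cs->mime_name = "iso-8859-1";
    for (i = 0; i < 128; i++)
        cs->high[i] = (unsigned short)(128 + i);
}

/* Toy table: identity everywhere except the two cp850 positions of
 * interest (0xA0 is U+00E1, 0xAD is U+00A1). NOT a full cp850 table. */
static void toy_init_cp850_like(toy_charset *cs)
{
    toy_init_latin1(cs);
    cs->mime_name = "cp850";
    cs->high[160 - 128] = 0x00E1;   /* LATIN SMALL LETTER A WITH ACUTE */
    cs->high[173 - 128] = 0x00A1;   /* INVERTED EXCLAMATION MARK */
}
```

With tables set up this way, toy_has_latin1_specials() is true for the Latin-1 table and false for the cp850-like one, which is the information a static flag would otherwise have to carry.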
>> > still the case; maybe it's not wanted. It probably hasn't been tested
>> > by anyone in a long time. An example would be the case where someone
>> > wanted to not have the large 7-bit approximations file, but still have
>> > 7-bit approximations available as Display character set to at least
>> > deal with the "classical" ISO-8859-1 chars and entities.
Well, these two old-style tables in LYCharSets.c and the corresponding code
could probably be #ifdef'ed with OLD_STYLE_CLASSIC (iso-latin1 for any display).
I will look at this more closely (it is a little harder than just removing them).
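
A rough sketch of what such guarding could look like follows. This is not the actual LYCharSets.c code: the table contents and the toy_classic_approx() helper are hypothetical and heavily shortened, and the macro is defined inline here only so that the sketch compiles on its own (in a real build it would come from the makefile or a config header).

```c
#include <stddef.h>

/* Assumption for this sketch: normally set by the build (-DOLD_STYLE_CLASSIC),
 * defined here only to make the example self-contained. */
#define OLD_STYLE_CLASSIC 1

#ifdef OLD_STYLE_CLASSIC
/* Hypothetical, shortened stand-in for an old-style compiled-in table:
 * 7-bit replacements for a few ISO-8859-1 characters. */
static const struct {
    unsigned char ch;
    const char *approx;
} toy_classic[] = {
    { 160, " "   },   /* NO-BREAK SPACE */
    { 169, "(c)" },   /* COPYRIGHT SIGN */
    { 173, "-"   },   /* SOFT HYPHEN */
};
#endif

/* Returns a 7-bit replacement string, or NULL when there is no compiled-in
 * entry (always NULL when built without OLD_STYLE_CLASSIC). */
const char *toy_classic_approx(unsigned char ch)
{
#ifdef OLD_STYLE_CLASSIC
    size_t i;

    for (i = 0; i < sizeof toy_classic / sizeof toy_classic[0]; i++) {
        if (toy_classic[i].ch == ch)
            return toy_classic[i].approx;
    }
#else
    (void)ch;   /* no fallback table compiled in */
#endif
    return NULL;
}
```

Callers would need to handle the NULL case, which is the part that makes this harder than plain removal: every place that currently assumes the tables exist needs a fallback path.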
>> How about Euro/(TM)/Copyright/emdash/etc requests?
> It's not strictly 8859-1 but with some extensions - &trade, &copy, &emdash
> were "classically" covered,
Yes, and HTML 4.0 has no &emdash, only &mdash :)
> euro is much too new (and isn't listed in
> entities.h even now, as of dev.17).
The table dates from 1997 and is a superset of the HTML 4.0 entities.