Re: LYNX-DEV more chartrans
Tue, 25 Nov 1997 13:28:29 -0600 (CST)
On Tue, 25 Nov 1997, Leonid Pauzner wrote:
> One more here, about charset assumption if we have 7bit text.
> A huge majority of texts have only us-ascii symbols,
> but if you check '=' or p)rint to email
> there is an assumed charset name introduced,
> non us-ascii in any case and possible non iso-latin-1.
These two charsets are quite different:
- The (possibly "assumed") charset shown with '=' refers to the original
text as received from the net (or read from disk for "file:" URLs etc.)
[That's how it should be; if you find it isn't so, as usual report it as
a bug.]
- The charset parameter in generated e-mail headers refers to the contents
of that mail; since the mail body is generated from already
charset-translated text, it has to reflect the Display character set,
not the charset of the original text.
> This may confuse people and some wrongly implemented mailreaders
> and possible increase chaos.
> Is it possible to test new coming page whether it have 8bit or not,
> and set the appropriate variable to us-ascii in the code?
Currently this is already done for the generated mail headers (for
mail generated from the 'P'rint menu): if the _rendered_ text (i.e. after
charset translation) doesn't have any chars with high bit set, then a
charset parameter will not be generated in the mail headers. [As usual,
if you find it doesn't work like that, report it...]
This doesn't apply to the charset shown with '=', but I don't see why it
should. If you use '=', I "assume" you want to know what charset was in
effect for a document, independent of whether this has any practical
effect for a specific document, or has no effect (because there weren't
any non-ASCII raw characters).
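The rule described above for mail generated from the 'P'rint menu can be sketched roughly as follows. This is only an illustration in Python, not Lynx's actual C code; the function name `content_type_for_mail` and its parameters are made up for the example.

```python
def content_type_for_mail(rendered: bytes, display_charset: str) -> str:
    """Return a Content-Type value for mail built from already-rendered text.

    `rendered` is the text after charset translation, so the only charset
    that can describe it is the Display character set.
    """
    # Any byte with the high bit set means the body is not pure 7-bit.
    has_high_bit = any(b >= 0x80 for b in rendered)
    if has_high_bit:
        return f"text/plain; charset={display_charset}"
    # Pure 7-bit body: no charset parameter is generated.
    return "text/plain"
```

Note that the check looks at the rendered bytes, not at the document as received, which is why it says nothing about what '=' should show.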
> If yes, you may remove x00-x7f (or at least x20-x7e) region
> from every chartrans table (is there any exception from us-ascii conformance?
> maybe x7F somewhere?)
> I just care of information (dis-information) purpose,
> but even it may save CPU out of chartrans "projection" if you need no such.
It is possible to remove (comment out) the x00-x7f range from more
of the tables. There is a small memory (not CPU) saving possible.
But I don't really see what this has to do with the previous topic.
In at least one case mappings from the x00-x7f range are done (VISCII has
some visible glyphs at 7-bit control char positions); and if someone
really wants to create a "charset" that includes the PC characters at
x00-x1f, or make a table for a 7-bit national ASCII variant, it could be
done this way (some code changes would also be necessary); so the mapping
mechanism should also cover this range, even if it is normally not used.
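To illustrate why the mechanism should still cover x00-x7f even when most tables leave that range out, here is a small sketch of a sparse table with an identity fallback for 7-bit positions. The names `VISCII_LOW` and `to_unicode` are hypothetical, this is not how the chartrans tables are actually stored, and the three VISCII values are given as I recall them from RFC 1456, so check them against that document before relying on them.

```python
# Hypothetical sparse table: only entries that differ from US-ASCII are listed.
# VISCII puts visible glyphs at some 7-bit control positions
# (values as recalled from RFC 1456; verify before use).
VISCII_LOW = {
    0x02: 0x1EB2,  # LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE
    0x05: 0x1EB4,  # LATIN CAPITAL LETTER A WITH BREVE AND TILDE
    0x06: 0x1EAA,  # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND TILDE
}


def to_unicode(byte: int, table: dict) -> int:
    """Map one byte of a single-byte charset to a Unicode code point."""
    if byte in table:
        # An explicit entry wins, even below 0x80.
        return table[byte]
    if byte < 0x80:
        # Range omitted from the table: assume US-ASCII identity.
        return byte
    # No mapping known for this 8-bit position.
    return 0xFFFD  # REPLACEMENT CHARACTER
```

With a layout like this, dropping the x00-x7f entries from a table saves a little memory, while a charset such as VISCII can still override individual 7-bit positions.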
> (of cause, I mean one-byte encodings only.
> From the other hand, you still need to translate &copy;
> and others &...; in HTML.)
A "&copy;" gets translated to an 8-bit character if the display character
set has it; that is independent of whatever charset is in effect for the
original text. If you mail the result, Lynx will normally generate a
charset parameter in the mail headers, so the recipient can know what is
what. If you really have to avoid that, you can use "7 bit approximations"
as display character set. Or convince the page authors not to put
copyright notices on their pages...
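A rough sketch of the entity handling described above, again in Python rather than Lynx's C: the entity is rendered in the display character set if that set has the character, otherwise a 7-bit approximation is used. The names `render_entity` and `APPROX_7BIT` are made up for the example, and the "(c)" fallback is an assumption about what an approximation table might contain.

```python
from html.entities import name2codepoint

# Hypothetical 7-bit fallbacks, keyed by Unicode code point.
APPROX_7BIT = {0xA9: "(c)"}


def render_entity(name: str, display_charset: str) -> bytes:
    """Render a named HTML entity in the display character set.

    The charset of the original document plays no part here; only the
    display character set decides what comes out.
    """
    ch = chr(name2codepoint[name])
    try:
        # The display character set has the character: emit it directly.
        return ch.encode(display_charset)
    except UnicodeEncodeError:
        # Not representable: fall back to a 7-bit approximation.
        return APPROX_7BIT.get(ord(ch), "?").encode("ascii")
```

For example, `render_entity("copy", "iso-8859-1")` yields the single byte 0xA9, so mail generated from that page would carry a charset parameter, while `render_entity("copy", "ascii")` yields "(c)" and the mail stays 7-bit.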