Re: [Lynx-dev] rendering &#151; (0x97)

From: Mouse
Subject: Re: [Lynx-dev] rendering &#151; (0x97)
Date: Mon, 29 Jun 2020 16:27:12 -0400 (EDT)

>> Content-Encoding=Windows-1252
> I meant Charset, and I hadn't read the other replies.

> If it is the document character set I'm not sure how one should
> interpret that for variable length codes.

As a codepoint, rather than as an encoding octet, I would guess.

Content-Type:'s charset= is actually two things.  (It arguably
shouldn't be, but since when has that made any difference to
HTTP-family protocols?)  It is a charset in the strict sense, a mapping
from integer codepoints to abstract characters, and it is an encoding,
a way of turning a stream of integer codepoints into a stream of
octets.  The latter really should be split out into a separate header;
I speculate that that wasn't done because everyone used the trivial
encoding for single-octet character sets, then added UTF-8, and nobody
noticed that they were silently adding an encoding spec to the charset
spec until after it got entrenched.
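
To make the two roles concrete, here is a rough sketch in Python.  It
is only an illustration: Python's codec names ("utf-8", "utf-16-be",
"cp1252") stand in for charset= values, and the variable names are made
up for the example.

```python
# Sketch only: Python codec names stand in for charset= values.
dash = "\u2014"  # EM DASH, used here purely as a sample character

# Role 1, character set: integer codepoint <-> abstract character.
unicode_codepoint = ord(dash)            # 0x2014 when the charset is Unicode

# Role 2, encoding: codepoint stream -> octet stream.
# One charset (Unicode), two encodings, two different octet sequences.
utf8_octets = dash.encode("utf-8")       # three octets
utf16_octets = dash.encode("utf-16-be")  # two octets

# Single-octet charset with the trivial encoding: the codepoint is the octet.
cp1252_octets = dash.encode("cp1252")    # one octet, value 0x97 (151)

print(hex(unicode_codepoint), utf8_octets.hex(),
      utf16_octets.hex(), cp1252_octets.hex())
```

For Windows-1252 the charset and the encoding are indistinguishable,
which is presumably how the two got folded into one parameter.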

I could argue it either way whether something like &#151; should be
"octet 151 for the encoding specified by charset=" or "codepoint 151
for the character set specified by charset=".  I do strongly believe
it is broken for it to be "Unicode codepoint 151" even if the charset=
specifies something very non-Unicode like 8859-14 or KOI-8.  If nothing
else, it makes it completely impossible to represent non-single-octet
codepoints when using a character set that is not a subset of Unicode.
But what I believe doesn't matter....
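
For what it's worth, the readings of a numeric reference with value
151 can be compared in a few lines of Python.  This is a sketch under
assumptions, not a description of what Lynx does: "cp1252" and "koi8_r"
are Python's codec names, and for a single-octet charset the "octet
151" and "codepoint 151" readings coincide, so one decode covers both.

```python
# Sketch only: compares readings of a numeric reference with value 151.
N = 151  # 0x97, the number carried by the reference

# Octet (or charset codepoint) 151 under the declared charset=.
as_cp1252 = bytes([N]).decode("cp1252")  # Windows-1252
as_koi8 = bytes([N]).decode("koi8_r")    # a non-Latin single-octet charset

# Unicode codepoint 151, ignoring charset= entirely.
as_unicode = chr(N)                      # U+0097, a C1 control character

for label, ch in [("cp1252", as_cp1252),
                  ("koi8_r", as_koi8),
                  ("unicode", as_unicode)]:
    print(f"{label}: U+{ord(ch):04X}")
```

Under Windows-1252 the first reading gives U+2014, the dash the page
author presumably wanted; the Unicode reading gives a control character
whatever charset= says.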

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML      
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
