lynx-dev
Re: script to convert those =?? to HTML tags [was Re: LYNX-DEV ftp files


From: David Woolley
Subject: Re: script to convert those =?? to HTML tags [was Re: LYNX-DEV ftp files]
Date: Tue, 23 Dec 1997 00:14:02 +0000 (GMT)

>   it's so-called "quoted printable" encoding.  AFAIK, it's a MICROSOFTism.

It's nothing to do with Microsoft.  It is the internet mail way of handling
text with a bias to seven bit characters over mail routers that are not
eight bit clean, have line length restrictions or lose trailing spaces.
It is becoming more common because people are handling eight bit data 
correctly, rather than sending it in violation of the protocol specs.  The
only real problem is that it tends to be sent unnecessarily, e.g. most
of the mail on this list which triggers metamail doesn't actually require
the use of MIME features.   (Microsoft may tend to use it because they think
paragraphs are one long line; however it predated Microsoft's discovery
of the internet.)
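
As a rough illustration of the points above (not anything from the
original thread), here is a minimal sketch using Python's standard quopri
module.  The sample text is invented; it has an eight bit character, a
trailing space, and a paragraph sent as one long line.

```python
import quopri

# Hypothetical sample: ISO 8859/1 text with an accented letter, a trailing
# space before the newline, and a "paragraph" sent as one very long line.
original = ("caf\u00e9 \n" + "word " * 40 + "end\n").encode("iso-8859-1")

# Encode as quoted printable.  The result should be pure seven bit data,
# with short lines and no bare trailing spaces.
encoded = quopri.encodestring(original)
print(encoded.decode("ascii"))

# Decoding should give back exactly the bytes we started with.
decoded = quopri.decodestring(encoded)
print(decoded == original)
```

The eight bit byte is sent as an = escape, the trailing space is escaped
so that a router cannot silently drop it, and the long line is split with
= at the end of each piece.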

The other problem is that in many countries using eight bit character sets
a tradition has built up of using the local character set, undeclared,
and with all 8 bits, in violation of the SMTP protocol; this has caused
a lot of hostility to MIME in the past, but that was before Outlook
started becoming the de facto standard mail user agent for the whole world
(Eudora and Pegasus are also quite comfortable with it, as are many lesser
known ones).

(Data which is not biased towards 7 bit codes is transmitted in base64,
which is a sort of improved and standardised UUENCODE.)
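
For comparison, a small sketch of base64 using Python's standard base64
module.  The payload is an arbitrary example, chosen only because it
covers every possible byte value.

```python
import base64

# Arbitrary binary payload: every byte value from 0 to 255.
payload = bytes(range(256))

# base64 maps each group of 3 input bytes to 4 ASCII characters,
# so the output is safe for seven bit transports.
encoded_b64 = base64.b64encode(payload)
print(encoded_b64[:40].decode("ascii"))

# Decoding restores the original bytes.
restored = base64.b64decode(encoded_b64)
print(restored == payload)
```

The cost is a fixed expansion of about one third, whatever the data, which
is why it suits data with no bias towards seven bit codes.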

> the form is "=<numeric_value>" of the character in the MS-DOS character-set
> being used.

It's actually the hex value of the character in the character set declared
in the Content-Type: header, which defaults to US-ASCII, with the result 
that most MIME Quoted Printable will have an explicit character set 
specified, as ASCII is pure 7 bit.  = at the end of the line is a special
case, and escapes the end of line so that you can have indefinitely long
lines.  (Note that the most likely explicit character sets would be 
ISO 8859/1 and Windows code page 1252, neither of which has much in common
with MS-DOS character sets.  Unfortunately, much that is declared as ISO
8859/1 is likely really to be CP 1252.)
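
The decoding rules described above can be sketched as follows, again with
Python's standard quopri module.  The helper name decode_qp and the sample
strings are made up for illustration; a real mail reader would take the
character set from the Content-Type: header.

```python
import quopri

def decode_qp(raw: bytes, charset: str = "us-ascii") -> str:
    """Undo quoted printable, then interpret the bytes in the given charset.

    decode_qp is a hypothetical helper, not part of any mail library.
    """
    return quopri.decodestring(raw).decode(charset)

# =E9 is the hex value of e-acute in ISO 8859/1; the = at the end of the
# first line escapes the line break, so the two lines are rejoined.
sample = b"caf=E9 au lait, bro=\nken across lines"
text = decode_qp(sample, "iso-8859-1")
print(text)

# Bytes 0x93 and 0x94 are curly quotes in CP 1252 but control characters
# in ISO 8859/1, so a wrong declaration changes what the reader sees.
quoted = b"=93quoted=94"
as_cp1252 = decode_qp(quoted, "cp1252")
as_latin1 = decode_qp(quoted, "iso-8859-1")
print(repr(as_cp1252), repr(as_latin1))
```

The second half shows why text declared as ISO 8859/1 but really in
CP 1252 goes wrong: the same escaped bytes decode to different characters.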

You might want to note that HTTP is based on MIME, the standard which also
defines this coding.  However, HTTP assumes an 8 bit path and defaults to
ISO 8859/1.
