Re: uuencode: multi-bytes char in remote file name contains bytes >0x80
From: Bruno Haible
Subject: Re: uuencode: multi-bytes char in remote file name contains bytes >0x80
Date: Wed, 6 Jul 2011 22:56:00 +0200
User-agent: KMail/1.9.9

Hi Bruce,
> I pick the way that is most robust and prone to the fewest problems.
> You tell me, please. :)
OK :)
> > a) Do the charset conversion on the receiver's side, and on the sender's
> > side only embed the charset. The most well-known encoding of this
> > kind is probably the way subject lines are encoded in MIME:
> > "jörg" would become
> > =?iso-8859-1?Q?j=F6rg?=
> > or
> > =?utf-8?Q?j=C3=B6rg?=
> > or
> > hex-encode:3F69736F2D383835392D313F513F6A3D463672673F
This approach was preferred between ca. 1995 and 1999, because at that time,
it was not clear that Unicode would succeed in the way it did.
> > b) Do the charset conversion both on the sender's side and on the
> > receiver's side, and always send filenames converted to UTF-8.
> > Example:
> > j=C3=B6rg
> > or
> > hex-encode:6AC3B67267
Approach b), by contrast, has been the preferred one since ca. 2001.
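
To make the difference between a) and b) concrete, here is a minimal Python sketch that reproduces the example strings quoted above. The helper names (q_encode, encode_a, encode_b, hex_b) are hypothetical and chosen for illustration only; this is not uuencode's implementation, and the Q-style escaping is deliberately simplified (every byte that is not an ASCII letter or digit is written as =XX).

```python
def q_encode(raw: bytes) -> str:
    """Simplified Q-style escaping: keep ASCII letters/digits, write other bytes as =XX."""
    return "".join(
        chr(b) if b < 128 and chr(b).isalnum() else f"={b:02X}"
        for b in raw
    )


def encode_a(name: str, charset: str) -> str:
    # Approach a): the sender keeps its own charset and embeds a label;
    # the receiver has to interpret that label correctly.
    return f"=?{charset}?Q?{q_encode(name.encode(charset))}?="


def encode_b(name: str) -> str:
    # Approach b): the sender always converts to UTF-8, so no label is needed.
    return q_encode(name.encode("utf-8"))


def hex_b(name: str) -> str:
    # Hex variant of approach b): the UTF-8 bytes written as hex digits.
    return name.encode("utf-8").hex().upper()


if __name__ == "__main__":
    print(encode_a("jörg", "iso-8859-1"))  # =?iso-8859-1?Q?j=F6rg?=
    print(encode_a("jörg", "utf-8"))       # =?utf-8?Q?j=C3=B6rg?=
    print(encode_b("jörg"))                # j=C3=B6rg
    print(hex_b("jörg"))                   # 6AC3B67267
```

With a), the receiver must parse and trust the label; with b), it only ever has to convert from UTF-8 to its local encoding.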
> I'll do what you suggest and run the result
> past both you and our new friend, =?GB2312?B?j4jI/g==?=
You are presenting a good argument for b) and against a): the charset
label is often wrong. Your example claims to be GB2312, but is in
fact CP936, an extension of GB2312 [1].
$ echo -n j4jI/g== | base64 -d | iconv -f GB2312 -t UTF-8
iconv: (stdin):1:0: cannot convert
$ echo -n j4jI/g== | base64 -d | iconv -f CP936 -t UTF-8
張叁
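
The same check can be sketched in Python for readers without iconv at hand. This assumes that Python's built-in "gb2312" and "cp936" codecs behave like the iconv converters used above for these four bytes; the helper try_decode is hypothetical.

```python
import base64
from typing import Optional

# The payload of the encoded word =?GB2312?B?j4jI/g==?= (bytes 8F 88 C8 FE).
raw = base64.b64decode("j4jI/g==")


def try_decode(data: bytes, charset: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in that charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # The label says GB2312, but 0x8F is not a valid GB2312 lead byte.
    print("GB2312:", try_decode(raw, "gb2312"))
    # Decoding as CP936 (the superset actually in use) succeeds.
    print("CP936: ", try_decode(raw, "cp936"))
```

A receiver following approach a) that trusts the label literally fails here, just as the first iconv call does.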
Such mislabeling is present in email and HTML, for historical reasons. It is
better to use approach b), because it does not require that the sender and
receiver have a common understanding of what they mean by "GB2312" (or worse:
by "Big5").
Additionally, approach b) usually leads to shorter strings than
approach a). This matters, given that uuencode's output
should fit in 80 columns.
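
As a rough illustration of the size difference, the following Python sketch simply measures the example strings from earlier in this thread; the variable names are hypothetical, and real file names will of course vary.

```python
# Example encodings of "jörg" quoted earlier in this thread.
a_latin1 = "=?iso-8859-1?Q?j=F6rg?="  # approach a), ISO-8859-1 label
a_utf8 = "=?utf-8?Q?j=C3=B6rg?="      # approach a), UTF-8 label
b_utf8 = "j=C3=B6rg"                  # approach b), no label

lengths = {
    "a) iso-8859-1": len(a_latin1),
    "a) utf-8": len(a_utf8),
    "b) utf-8": len(b_utf8),
}

if __name__ == "__main__":
    for label, n in lengths.items():
        print(f"{label}: {n} characters")
```

For this name, the label and delimiters of approach a) cost more characters than the encoded name itself.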
Bruno
[1] http://www.haible.de/bruno/charsets/conversion-tables/GB2312.html
--
In memoriam Jan Hus <http://en.wikipedia.org/wiki/Jan_Hus>