bug-gnu-utils

Re: uuencode: multi-bytes char in remote file name contains bytes >0x80


From: Bruno Haible
Subject: Re: uuencode: multi-bytes char in remote file name contains bytes >0x80
Date: Wed, 6 Jul 2011 21:55:01 +0200
User-agent: KMail/1.9.9

Bruce Korb wrote:
> I think the arguments are sufficient to make the changes.
> The change will include uudecode changes so it can detect
> and handle the encoded file names, and uudecode will get
> an "encode-filename" ("-e") option.

Where and how will the charset conversion of the filenames be handled?

Remember, in the scope of a single user or a single machine, it can be OK
to treat a file name as a mere sequence of bytes - assuming all users on
that machine use the same encoding. But when a user in a UTF-8 locale
sends a file named "jörg" to some recipients, and some of the recipients
get a file named "jörg" created on their disk, others a file named
"jÃ¶rg" (namely those in an ISO-8859-15 locale), and others a file named
"j枚rg" (namely those in a GB18030 locale), that will be viewed as a bug.
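
To make the problem concrete, here is a small Python sketch of what happens
when the sender's bytes are passed through unchanged. The helper name
seen_as is made up for this illustration and has nothing to do with the
uuencode sources; it only decodes the same byte sequence under each
receiver's locale charset.

```python
# Bytes of the file name as written by a sender in a UTF-8 locale.
raw = "jörg".encode("utf-8")


def seen_as(raw_bytes, charset):
    """Decode raw file name bytes the way a receiver using `charset` would.

    Hypothetical helper for illustration only.
    """
    return raw_bytes.decode(charset, errors="replace")


# The same bytes, interpreted in three different receiver locales.
for charset in ("utf-8", "iso-8859-15", "gb18030"):
    print(charset, repr(seen_as(raw, charset)))
```

Only the receiver whose locale charset matches the sender's sees the
intended name; the other two get a different name from the identical bytes.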

There are two ways to deal with it:
  a) Do the charset conversion on the receiver's side, and on the sender's
     side only embed the charset. The most well-known encoding of this
     kind is probably the way subject lines are encoded in MIME:
     "jörg" would become
        =?iso-8859-1?Q?j=F6rg?=
     or
        =?utf-8?Q?j=C3=B6rg?=
     or
        hex-encode:3F69736F2D383835392D313F513F6A3D463672673F
  b) Do the charset conversion both on the sender's side and on the
     receiver's side, and always send filenames converted to UTF-8.
     Example:
        j=C3=B6rg
     or
        hex-encode:6AC3B67267
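
The two approaches can be sketched in Python roughly as follows. This is
only an illustration under assumptions: the function names are invented
here, the "hex-encode:" prefix is copied from the examples above rather
than from any implemented uuencode format, and the Q encoder simply escapes
every byte that is not an ASCII letter or digit, which is more than MIME
strictly requires.

```python
def q_encode(name, charset):
    """Approach (a): send the sender's bytes, labelled with their charset."""
    raw = name.encode(charset)
    body = "".join(
        chr(b) if b < 0x80 and chr(b).isalnum() else "=%02X" % b
        for b in raw
    )
    return "=?%s?Q?%s?=" % (charset, body)


def utf8_hex(name):
    """Approach (b), sender side: always convert to UTF-8, then hex-encode."""
    return "hex-encode:" + name.encode("utf-8").hex().upper()


def receive_utf8_hex(token, local_charset):
    """Approach (b), receiver side: decode UTF-8, re-encode for the locale."""
    raw = bytes.fromhex(token[len("hex-encode:"):])
    return raw.decode("utf-8").encode(local_charset)


print(q_encode("jörg", "iso-8859-1"))
print(q_encode("jörg", "utf-8"))
print(utf8_hex("jörg"))
print(receive_utf8_hex(utf8_hex("jörg"), "iso-8859-15"))
```

With (a) the receiver has to know every charset a sender might name; with
(b) both sides only need to convert between UTF-8 and their own locale
charset.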

Bruno
-- 
In memoriam Jan Hus <http://en.wikipedia.org/wiki/Jan_Hus>