bug-gnu-utils

Re: uuencode: multi-bytes char in remote file name contains bytes >0x80


From: Eric Blake
Subject: Re: uuencode: multi-bytes char in remote file name contains bytes >0x80
Date: Fri, 08 Jul 2011 17:25:11 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.10

On 07/08/2011 05:11 PM, Bruce Korb wrote:
> 
> Hi Eric(s),
> 
> This mojibake stuff is mumbo jumbo to me.

Mojibake is what happens when you interpret bytes encoded in one
character set as though they were characters in another character set,
and then convert them according to that wrong assumption.  A common
symptom is that when you view UTF-8 text as single-byte Latin-1, each
multibyte UTF-8 character appears as several seemingly random 8-bit
characters from Latin-1.
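
To make that symptom concrete, here is a minimal sketch in plain C.  The
helper name misread_as_latin1() is hypothetical, invented only for this
illustration.  It treats every input byte as one Latin-1 character and
re-encodes it as UTF-8, which is exactly the wrong assumption described
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: misread each byte of `in` as one Latin-1
   character and re-encode it as UTF-8 into `out`.
   Returns the number of bytes written (excluding the terminating NUL),
   or (size_t)-1 if `out` is too small. */
static size_t misread_as_latin1(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return (size_t)-1;

    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        size_t need = (*p < 0x80) ? 1 : 2;
        if (n + need + 1 > outsz)       /* keep room for the NUL */
            return (size_t)-1;
        if (*p < 0x80) {
            out[n++] = (char)*p;        /* ASCII is unchanged */
        } else {
            /* U+0080..U+00FF take two bytes in UTF-8 */
            out[n++] = (char)(0xC0 | (*p >> 6));
            out[n++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the two UTF-8 bytes of "é" (0xC3 0xA9) yields the four bytes
0xC3 0x83 0xC2 0xA9, which a UTF-8 terminal displays as "Ã©": one
character has turned into two unrelated ones.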

> 
> I looked into the iconv(3p) function a bit and it seems to be dependent
> upon some character strings that are different from what one might
> put in LANG or LC_ALL or LC_NAME environment variables.  Those guys
> take things like EN_us, for example, not character set specifications.
> So how am I to know what the current character set is if all I know is
> CN_hk, for example?

I suggest using the gnulib module localcharset, which provides the
function locale_charset().  That should give an answer which is safe to
pass to iconv_open() as one of the two charsets, with "UTF-8" being the
other charset.
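
A rough sketch of how that could fit together follows.  To keep it
buildable without gnulib, it swaps in POSIX nl_langinfo(CODESET) where
locale_charset() would go; that is a stand-in, and on some platforms the
name it reports may be less portable than what gnulib returns.  The
helper names current_charset() and convert_charset() are hypothetical.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for gnulib's locale_charset(): the name of the current
   locale's character encoding, via POSIX nl_langinfo(CODESET). */
static const char *current_charset(void)
{
    /* Adopt LANG / LC_* from the environment; a real program would
       normally do this once at startup. */
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}

/* Convert the NUL-terminated string `in` from charset `from` to
   charset `to`.  Returns the number of bytes written (excluding the
   terminating NUL), or (size_t)-1 on any failure. */
static size_t convert_charset(const char *from, const char *to,
                              const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL && outsz > 0);

    iconv_t cd = iconv_open(to, from);   /* note: target charset first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;              /* iconv() only reads the input */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsz - 1;          /* reserve room for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *outp = '\0';
    return (size_t)(outp - out);
}
```

A caller would then use something like
convert_charset(current_charset(), "UTF-8", name, buf, sizeof buf) to
turn a file name from the locale's encoding into UTF-8, and swap the
first two arguments to go the other way.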

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
