bug-wget

Re: [Bug-wget] bad filenames (again)


From: Andries E. Brouwer
Subject: Re: [Bug-wget] bad filenames (again)
Date: Mon, 17 Aug 2015 19:58:31 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Aug 17, 2015 at 06:27:05PM +0300, Eli Zaretskii wrote:

>> (ii) [about possibly using iconv]
>> 
>>>> How do you guess the original character set?
>
> The answer is to call "nl_langinfo (CODESET)".

I think we are not communicating.

wget fetches a file from a remote machine.
We know the filename (as a sequence of bytes).
As far as I can see, there is no information on what character set
(if any) that sequence of bytes might be in.

In order to call iconv, I need a from-charset and a to-charset.
I think your answer tells me how to find a reasonable to-charset.
But the problem is how to find a from-charset.

[Even when from-charset and to-charset are known there is
a can of worms involved in conversion. But without from-charset
one cannot even start thinking about conversion.]
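In C terms, the asymmetry shows up in the signature of `iconv_open(tocode, fromcode)`. `nl_langinfo(CODESET)` can fill in the first argument from the local locale, but nothing fills in the second. The following is a minimal sketch, not wget code. The helper name `convert_remote_name` is hypothetical, and the caller has to pass in an *assumed* from-charset.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of wget): convert the NUL-terminated
   remote filename IN from FROM_CHARSET to the local locale's charset,
   writing a NUL-terminated result into OUT. FROM_CHARSET is an
   assumption supplied by the caller; the fetched bytes do not say what
   it is. Returns 0 on success, -1 on any failure (unknown charset,
   bytes invalid in FROM_CHARSET or unrepresentable in the target,
   or OUT too small). */
int convert_remote_name(const char *from_charset, const char *in,
                        char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;

    /* to-charset: the codeset of the current LC_CTYPE locale. A program
       must call setlocale(LC_CTYPE, "") first for this to reflect the
       user's environment; otherwise it describes the "C" locale. */
    const char *to_charset = nl_langinfo(CODESET);

    iconv_t cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t) -1)
        return -1;

    char *inbuf = (char *) in;      /* POSIX iconv takes char ** */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize - 1;   /* reserve a byte for the NUL */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    if (rc != (size_t) -1)
        /* Flush any pending shift sequence for stateful encodings. */
        rc = iconv(cd, NULL, NULL, &outbuf, &outleft);
    iconv_close(cd);

    if (rc == (size_t) -1)
        return -1;
    *outbuf = '\0';
    return 0;
}
```

Note that a guess such as `"ISO-8859-1"` never fails on the input side, because every byte value maps to some character in that charset. A wrong guess therefore produces a plausible-looking but wrong name rather than an error, which is the heart of the from-charset problem.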

> > Unix filenames are not necessarily in any particular character set.
> > They are sequences of bytes different from NUL and '/'.
> > A different sequence of bytes is a different filename.
> 
> As long as you treat them as UTF-8 encoded strings, ...
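The quoted definition is easy to state as code: a filename component is just bytes, and two names are the same only if their bytes are identical. This is a minimal sketch with hypothetical helper names. It ignores the special names `.` and `..` and any `NAME_MAX` limit, and it treats the empty sequence as invalid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the LEN bytes at NAME form a usable Unix
   filename component, i.e. a nonempty run of bytes none of which is
   NUL or '/'. No character set is consulted. LEN is explicit because
   an embedded NUL would be invisible to strlen. */
bool is_valid_name_component(const char *name, size_t len)
{
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++)
        if (name[i] == '\0' || name[i] == '/')
            return false;
    return true;
}

/* Hypothetical helper: filenames are equal only if byte-for-byte equal. */
bool same_filename(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

For example, `"caf\xc3\xa9"` (precomposed é in UTF-8) and `"cafe\xcc\x81"` (e followed by a combining acute accent) may display identically, yet `same_filename` reports them as different names, and the Latin-1 bytes `"caf\xe9"` are a perfectly valid component.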

I don't understand how one can treat sequences of bytes
that are not valid UTF-8 as UTF-8 encoded strings.
If all the world is UTF-8, then fine. But the remote machine
is an unknown system. We just have a byte sequence, that is all.
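Whether a byte sequence is valid UTF-8 at all can be checked mechanically. The sketch below uses a hypothetical helper `is_valid_utf8` that follows the well-formedness rules of RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). It shows that a Latin-1 name such as the bytes `caf\xe9` fails, so there is no UTF-8 string to treat it as.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the LEN bytes at BUF are well-formed
   UTF-8 per RFC 3629. */
bool is_valid_utf8(const char *buf, size_t len)
{
    const unsigned char *s = (const unsigned char *) buf;
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;            /* number of continuation bytes */
        unsigned long cp;    /* decoded code point */

        if (c < 0x80) { i++; continue; }          /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return false;   /* 0x80..0xC1, 0xF5..0xFF cannot lead */

        if (len - i <= n)
            return false;    /* sequence truncated */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;                     /* not a continuation */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong 3- and 4-byte forms, surrogates, and values
           past U+10FFFF (2-byte overlongs are excluded by the 0xC2 bound). */
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += n + 1;
    }
    return true;
}
```

What a program should do with a name that fails this check (keep the raw bytes, escape them, or convert under an assumed charset) is exactly the open question here; the sketch only detects the case and does not decide it.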

Andries


