Re: [Bug-wget] bad filenames (again)

From:	Andries E. Brouwer
Subject:	Re: [Bug-wget] bad filenames (again)
Date:	Mon, 17 Aug 2015 19:58:31 +0200
User-agent:	Mutt/1.5.21 (2010-09-15)

On Mon, Aug 17, 2015 at 06:27:05PM +0300, Eli Zaretskii wrote:

>> (ii) [about possibly using iconv]
>> 
>>>> How do you guess the original character set?
>
> The answer is to call "nl_langinfo (CODESET)".

I think we are not communicating.

wget fetches a file from a remote machine.
We know the filename (as a sequence of bytes).
As far as I can see, there is no information on what character set
(if any) that sequence of bytes might be in.

In order to call iconv, I need a from-charset and a to-charset.
I think your answer tells me how to find a reasonable to-charset.
But the problem is how to find a from-charset.

[Even when from-charset and to-charset are known there is
a can of worms involved in conversion. But without from-charset
one cannot even start thinking about conversion.]
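To make the from-charset problem concrete, here is a minimal Python sketch (not wget code). The filename bytes and the list of candidate charsets are made up for illustration, and locale.getpreferredencoding(False) is used only as a rough Python stand-in for nl_langinfo (CODESET). It shows that the local codeset gives a to-charset, while the same bytes decode to different names depending on which from-charset one guesses.

```python
import locale


def to_charset() -> str:
    # Rough analogue of nl_langinfo(CODESET): the codeset of the local
    # environment. This only answers the "to" side of the conversion.
    return locale.getpreferredencoding(False)


# Hypothetical filename bytes received from a remote server.
raw_name = b"caf\xe9.txt"

# Hypothetical guesses for the from-charset. Nothing in the byte
# sequence itself says which one (if any) is right.
candidates = ["latin-1", "cp1251", "koi8-r"]

# Every guess decodes without error, but each gives a different name.
decoded = {cs: raw_name.decode(cs) for cs in candidates}

if __name__ == "__main__":
    print("to-charset:", to_charset())
    for cs, name in decoded.items():
        print(f"from-charset {cs!r} gives {name!r}")
```

All three decodings succeed, so a conversion that merely "works" says nothing about whether the guessed from-charset was the right one.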

> > Unix filenames are not necessarily in any particular character set.
> > They are sequences of bytes different from NUL and '/'.
> > A different sequence of bytes is a different filename.
> 
> As long as you treat them as UTF-8 encoded strings, ...

I don't understand how one can treat sequences of bytes
that are not valid UTF-8 as UTF-8 encoded strings.
If all the world is UTF-8 then fine. But the remote machine
is an unknown system. We just have a byte sequence, that is all.
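As an illustration of the point, a small Python sketch under stated assumptions: the two helper functions and the sample names are hypothetical and not part of wget. It checks the two properties separately: whether a byte sequence is an acceptable Unix filename component (non-empty, no NUL, no '/'), and whether it happens to be valid UTF-8.

```python
def is_valid_unix_component(name: bytes) -> bool:
    # A Unix filename component is any non-empty byte sequence that
    # contains neither NUL nor '/'. No character set is implied.
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


def is_valid_utf8(name: bytes) -> bool:
    # Strict decoding fails on any byte sequence that is not
    # well-formed UTF-8.
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same visible name, encoded by two hypothetical remote systems.
latin1_name = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt'
utf8_name = "caf\u00e9.txt".encode("utf-8")      # b'caf\xc3\xa9.txt'

if __name__ == "__main__":
    for name in (latin1_name, utf8_name):
        print(name, is_valid_unix_component(name), is_valid_utf8(name))
```

Both byte sequences are legal filenames and they are different filenames, but only one of them can be read as UTF-8 at all.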

Andries
