bug-wget

Re: [Bug-wget] bad filenames (again)


From: Andries E. Brouwer
Subject: Re: [Bug-wget] bad filenames (again)
Date: Tue, 18 Aug 2015 12:55:50 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Aug 18, 2015 at 11:58:54AM +0200, Tim Ruehsen wrote:

> > Unix filenames are sequences of bytes, they do not have a character set.
> 
> The character encoding determines with which symbols these bytes
> (or byte sequences, aka multibyte / codepoints) are displayed to you.

Sure. So each time I load a different font, I see different glyphs
for my symbols. The file with single-byte name 0xff will look like
a Dutch ligature ij in some fonts, and quite different in other fonts.

The point is: it is the user's choice to load a font. (Or to set a locale.)
The filenames themselves do not carry additional information
about their character set.
For historical reasons a single directory can have files with names
in several character sets.
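To make the point concrete, here is a minimal sketch (not from the original thread) that takes the single-byte name 0xff and decodes it under a few character sets. The codec list is an illustrative assumption; any set of 8-bit charsets would do. Nothing in the bytes says which reading is "right": only the chosen charset (or font, or locale) decides what is shown, and in UTF-8 the byte is not even valid.

```python
# Sketch: a Unix filename is a byte string; its displayed form depends
# entirely on the charset the viewer picks. Codec choice here is illustrative.

name = b"\xff"  # the single-byte filename from the example above

def views(raw: bytes, codecs: list[str]) -> dict[str, str]:
    """Decode the same raw bytes under each codec; mark invalid decodings."""
    out: dict[str, str] = {}
    for codec in codecs:
        try:
            out[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            out[codec] = f"<not valid {codec}>"
    return out

result = views(name, ["latin-1", "cp1251", "koi8_r", "utf-8"])
for codec, text in result.items():
    print(f"{codec:8} -> {text!r}")
```

The same byte reads as 'ÿ' in Latin-1, 'я' in CP1251 and 'Ъ' in KOI8-R, while UTF-8 rejects it. Re-encoding any of the successful decodings with its own codec gives back the original byte, which is why the bytes, not any decoded form, are the filename.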

All this is about the local situation. One cannot know "the character set"
of a filename because that concept does not exist in Unix.
About the remote situation even less is known. It would be terrible
if wget decided to use obscure heuristics to invent a remote character set
and then invoke iconv.

Andries
