Re: [Bug-wget] bad filenames (again)
From: Eli Zaretskii
Subject: Re: [Bug-wget] bad filenames (again)
Date: Tue, 18 Aug 2015 19:39:40 +0300

> Date: Tue, 18 Aug 2015 17:28:34 +0200
> From: "Andries E. Brouwer" <address@hidden>
> Cc: "Andries E. Brouwer" <address@hidden>, address@hidden,
> address@hidden
>
> > > About the remote situation even less is known.
> >
> > Assuming UTF-8 will go a long way towards resolving this. When this
> > is not so, we have the --remote-encoding switch.
>
> This is wget. The user is recursively downloading a file hierarchy.
> Only after downloading does it become clear what one has got.
In some use cases, yes. In most others, no: the encoding is known in
advance.
> I download a collection of East Asian texts on some topic.
> Upon examination, part is in SJIS, part in Big5, part in EUC-JP,
> part in UTF-8. Since the downloaded stuff does not have a uniform
> character set, and surely the server is not going to specify
> character sets, any invocation of iconv will corrupt my data.
> When I get the unmodified data I look using browser or editor
> or xterm+luit for which character set setting I get readable text.
I already said that wget should support this use case. I just don't
think it should be the default.
> > > It would be terrible if wget decided to use obscure heuristics to
> > > invent a remote character set and then invoke iconv.
> >
> > But what you suggest instead -- create a file name whose bytes are an
> > exact copy of the remote -- is just another heuristic.
>
> No. An exact copy allows me to decide what I have.
Which is the heuristic you want to solve this with. IMO, such a
heuristic will not serve most of the users in most use cases.
Users just want wget to DTRT automatically, and have the file names
legible.
> Conversion leads to data loss.
When it does, or there's a risk that it does, users should use
optional features to countermand that.
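
To make the trade-off discussed above concrete, here is a minimal sketch
in Python, not wget's actual code. It assumes the remote file name is
UTF-8 by default and keeps an exact, reversible copy of the bytes when
that assumption fails. The function name local_file_name and the
percent-escaping fallback are hypothetical choices made for this
illustration only.

```python
from typing import Tuple
from urllib.parse import quote, unquote_to_bytes


def local_file_name(raw: bytes, remote_encoding: str = "utf-8") -> Tuple[str, bool]:
    """Return (name, converted) for the raw bytes of a remote file name.

    Hypothetical helper: try the assumed remote encoding first; if the
    bytes are not valid in it, percent-escape them so that no byte is lost.
    """
    try:
        # Strict decoding raises instead of silently substituting characters.
        return raw.decode(remote_encoding), True
    except UnicodeDecodeError:
        # Fallback: a printable, reversible copy of the original bytes.
        return quote(raw, safe=""), False


utf8_name = "日本語.txt".encode("utf-8")
sjis_name = "日本語.txt".encode("shift_jis")

print(local_file_name(utf8_name))   # decodes under the UTF-8 assumption
print(local_file_name(sjis_name))   # not valid UTF-8, so it is escaped

# The escaped form can always be turned back into the original bytes.
escaped, converted = local_file_name(sjis_name)
assert not converted
assert unquote_to_bytes(escaped) == sjis_name
```

The risky case is a wrong guess that still decodes. A single-byte
encoding such as ISO-8859-1 accepts every byte sequence, so guessing it
for the Shift_JIS name above yields unreadable characters rather than an
error, which is the kind of conversion a user would need an option to
turn off.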