Re: [Bug-wget] bad filenames (again)
From: Andries E. Brouwer
Subject: Re: [Bug-wget] bad filenames (again)
Date: Sat, 22 Aug 2015 00:39:01 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Aug 21, 2015 at 08:54:28PM +0200, Tim Rühsen wrote:
> > Content-Disposition: attachment;
> > filename="20101202_%EB...%A8-%EB%B0%B1_.sgf"
> > This encodes a valid utf-8 filename, and that name should be used.
> > So wget should save this file under the name
> > 20101202_농심신라면배_바둑(다카오신지9단-백_.sgf
>
> This is a different issue. Here we are talking about the encoding of HTTP
> headers, especially 'filename' values within Content-Disposition HTTP header.
> Wget simply does not parse this correctly - it is just not coded in.
> It is just Wget missing some code here (worth opening a separate bug).
Good, saved for later.
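A minimal sketch of the decoding step the quoted exchange says is missing, written in Python for brevity rather than in wget's C. It assumes the filename parameter value has already been extracted from the Content-Disposition header (quoting rules and the RFC 6266 `filename*` form are not handled). It percent-decodes the value and accepts the result only if the bytes are valid UTF-8. The helper name `decode_cd_filename` is hypothetical, not part of wget. Because the middle of the quoted value is elided, the example uses only its `-%EB%B0%B1_.sgf` tail.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def decode_cd_filename(value: str) -> Optional[str]:
    """Hypothetical helper: percent-decode a Content-Disposition filename
    value and return it as text if the bytes are valid UTF-8, else None."""
    raw = unquote_to_bytes(value)  # "%EB" -> b"\xeb"; other characters kept
    try:
        return raw.decode("utf-8")  # strict: raises on malformed sequences
    except UnicodeDecodeError:
        return None  # not UTF-8: the caller must choose another strategy


name = decode_cd_filename("20101202_-%EB%B0%B1_.sgf")  # '20101202_-백_.sgf'
```

Returning None rather than guessing leaves the fallback (escaping, keeping the raw bytes, or trying another charset) to the caller.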
> If the server AND the document do not explicitly specify the character
> encoding, there still is one - namely the default. That was ISO-8859-1
> a while ago. AFAIR, HTML5 might have changed that (too late for me now
> to look it up).
Yes - that is our main difference. You read the standard and find there
what everyone is supposed to do, or what the default is.
I download stuff from the net and encounter lots of things people do
that perhaps do not follow the most recent standard,
and may differ from the default.
As a consequence I prefer to base the decision about what to do
on the form of the filename (ASCII / UTF-8 / other), not on the
headers encountered on the way to this file.
Fortunately, almost all URLs are in ASCII - no problem.
Fortunately, almost all that are not in ASCII are UTF-8.
The good thing about UTF-8 is that it has a very characteristic bit pattern.
A non-ASCII filename that is valid UTF-8 is very likely UTF-8.
So, one can recognize ASCII and UTF-8 rather reliably.
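The recognition step can be sketched as follows, in Python, looking only at the bytes of the name and not at any header, in line with the approach argued for above. The byte ranges are the UTF-8 well-formedness rules of RFC 3629 (so overlong forms and surrogates are rejected too); the function `classify` and its three labels are illustrative, not wget code.

```python
def _second_byte_range(lead: int) -> tuple:
    """Allowed range for the byte right after a multi-byte lead (RFC 3629)."""
    if lead == 0xE0:
        return 0xA0, 0xBF  # excludes overlong 3-byte forms
    if lead == 0xED:
        return 0x80, 0x9F  # excludes UTF-16 surrogates U+D800..U+DFFF
    if lead == 0xF0:
        return 0x90, 0xBF  # excludes overlong 4-byte forms
    if lead == 0xF4:
        return 0x80, 0x8F  # excludes code points above U+10FFFF
    return 0x80, 0xBF      # ordinary continuation byte 10xxxxxx


def classify(name: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'other' from the byte pattern alone."""
    if all(b < 0x80 for b in name):
        return "ascii"
    i = 0
    while i < len(name):
        lead = name[i]
        if lead < 0x80:             # 0xxxxxxx: single byte
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:    # 110xxxxx: 2-byte sequence
            need = 1
        elif 0xE0 <= lead <= 0xEF:  # 1110xxxx: 3-byte sequence
            need = 2
        elif 0xF0 <= lead <= 0xF4:  # 11110xxx: 4-byte sequence
            need = 3
        else:                       # stray continuation byte, C0/C1, F5..FF
            return "other"
        tail = name[i + 1:i + 1 + need]
        if len(tail) != need:       # sequence truncated by end of name
            return "other"
        lo, hi = _second_byte_range(lead)
        if not lo <= tail[0] <= hi:
            return "other"
        if any(c & 0xC0 != 0x80 for c in tail[1:]):  # must be 10xxxxxx
            return "other"
        i += 1 + need
    return "utf-8"
```

Latin-1 names rarely pass: `café` in ISO-8859-1 ends in the byte 0xE9, which announces a 3-byte sequence that never arrives, so it is classified as "other".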
(By the way, I checked my conjecture that iconv from UTF-8
to UTF-8 need not be the identity map, and that is indeed the case.
On my Ubuntu machine iconv from UTF-8 to UTF-8 converts NFD to NFC.)
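The non-identity is easy to see without iconv, because the same visible character can have two valid UTF-8 spellings. The sketch below uses Python's `unicodedata` module as a stand-in for a UTF-8 to UTF-8 conversion that normalizes to NFC, which is the behaviour reported above for iconv on Ubuntu; it does not call iconv, and the helper name `nfc_utf8_to_utf8` is hypothetical.

```python
import unicodedata


def nfc_utf8_to_utf8(data: bytes) -> bytes:
    """Hypothetical stand-in: decode UTF-8, normalize to NFC, re-encode."""
    return unicodedata.normalize("NFC", data.decode("utf-8")).encode("utf-8")


# "é" in NFD: 'e' followed by U+0301 COMBINING ACUTE ACCENT
nfd_bytes = "e\u0301".encode("utf-8")    # b'e\xcc\x81'
nfc_bytes = nfc_utf8_to_utf8(nfd_bytes)  # b'\xc3\xa9' (precomposed U+00E9)

# Both inputs are valid UTF-8, yet the byte strings differ, so a converter
# like this is not the identity map on UTF-8 input.
print(nfd_bytes, nfc_bytes, nfd_bytes == nfc_bytes)
```

For a downloader this could matter: a name stored in NFD on the server might end up on disk under different bytes if such a conversion is applied to it.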
Andries