Re: [Bug-wget] bad filenames (again)
From: Andries E. Brouwer
Subject: Re: [Bug-wget] bad filenames (again)
Date: Wed, 19 Aug 2015 02:52:57 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Aug 19, 2015 at 01:43:51AM +0200, Ángel González wrote:
> And of course, there's the question of what to do if the filename we
> are trying to convert to utf-16 is not in fact valid utf-8.
My current understanding:
(i) there is a current patch, that fixes most problems on Unix
and can be applied today
(ii) one also wants to fix Windows problems, and in the process
do something more general for Unix. We can discuss a future
patch that does something like:
Look at the remote filename.
Assign a character set as follows:
- if the user specified a from-charset, use that
- if the name is printable ASCII (in 0x20-0x7e), take ASCII
- if the name is non-ASCII and valid UTF-8, take UTF-8
- otherwise take Unknown.
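
The classification rules above can be sketched as follows. This is only an illustration in Python, not wget code: the function name and the string labels are made up for the example, and the filename is assumed to arrive as raw bytes. Note that a name containing ASCII control characters is not "printable ASCII" and so falls through to the UTF-8 test; the list above does not say what should happen there.

```python
from typing import Optional


def guess_remote_charset(name: bytes, from_charset: Optional[str] = None) -> str:
    """Assign a charset label to a remote filename (hypothetical helper)."""
    # 1. An explicit from-charset given by the user always wins.
    if from_charset:
        return from_charset
    # 2. Only printable ASCII bytes (0x20-0x7e): call it ASCII.
    if all(0x20 <= b <= 0x7E for b in name):
        return "ASCII"
    # 3. Otherwise try a strict UTF-8 decode, which rejects malformed input.
    try:
        name.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # 4. Neither ASCII nor valid UTF-8.
        return "Unknown"
```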
Determine a local character set as follows:
- if the user specified a to-charset, use that
- if the locale uses UTF-8, use that
- otherwise take ASCII
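
A sketch of the local-charset choice, again in Python purely for illustration. The function and parameter names are assumptions; `locale.getpreferredencoding(False)` stands in for whatever wget would use to query the locale codeset (nl_langinfo(CODESET) in C).

```python
import locale
from typing import Optional


def pick_local_charset(to_charset: Optional[str] = None,
                       locale_codeset: Optional[str] = None) -> str:
    """Choose the charset for local filenames (hypothetical helper)."""
    # 1. An explicit to-charset given by the user always wins.
    if to_charset:
        return to_charset
    # 2. Ask the locale unless the caller already supplied its codeset.
    if locale_codeset is None:
        locale_codeset = locale.getpreferredencoding(False)
    # Accept the common spellings "UTF-8", "utf8", "utf-8".
    if locale_codeset.replace("-", "").lower() == "utf8":
        return "UTF-8"
    # 3. Anything else: fall back to ASCII.
    return "ASCII"
```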
Convert the name from from-charset to to-charset:
- if the user asked for unmodified filenames, do nothing
- if the name is ASCII, do nothing
- if the name is UTF-8 and the locale uses UTF-8, do nothing
- convert from Unknown by hex-escaping the entire name
- convert to ASCII by hex-escaping the entire name
- otherwise invoke iconv(); upon failure, escape the illegal bytes
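
The conversion step might look roughly like this. Several things here are assumptions, not decisions from this thread: the %XX escape format, the reading of "hex-escaping the entire name" as escaping every byte outside printable ASCII, and the choice to escape characters the target charset cannot represent as their UTF-8 bytes. Python codecs stand in for iconv().

```python
import codecs


def hex_escape_name(data: bytes) -> bytes:
    """Escape every byte outside printable ASCII as %XX (assumed format)."""
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E else "%%%02X" % b for b in data
    ).encode("ascii")


def _escape_handler(exc):
    """Codec error handler: replace only the offending span with %XX escapes."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]                  # illegal input bytes
    elif isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end].encode("utf-8")  # unencodable chars
    else:
        raise exc
    return "".join("%%%02X" % b for b in bad), exc.end


codecs.register_error("hexescape", _escape_handler)


def convert_name(name: bytes, from_cs: str, to_cs: str,
                 keep_unmodified: bool = False) -> bytes:
    """Convert a filename between charsets following the rules listed above."""
    if keep_unmodified or from_cs == "ASCII":
        return name
    if from_cs == "UTF-8" and to_cs == "UTF-8":
        return name
    if from_cs == "Unknown" or to_cs == "ASCII":
        return hex_escape_name(name)
    # General case: decode and re-encode, escaping whatever fails.
    text = name.decode(from_cs, errors="hexescape")
    return text.encode(to_cs, errors="hexescape")
```

For example, convert_name(b"caf\xe9", "ISO-8859-1", "UTF-8") gives b"caf\xc3\xa9", while the same bytes with from_cs "Unknown" give b"caf%E9".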
See whether the resulting name can be used. On Unix all strings
(without NUL and '/') are ok. On Windows there are many restrictions.
Further hex escape problematic characters on Windows.
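
To make the last step concrete, here is a small sketch of the usability check and the extra Windows escaping. The function names and the %XX format are assumptions for the example. The forbidden set is the documented Windows one (< > : " / \ | ? * and control characters, plus a trailing space or dot); reserved device names such as CON or NUL are deliberately not handled here.

```python
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def usable_on_unix(name: bytes) -> bool:
    """On Unix any non-empty byte string without NUL and '/' is acceptable."""
    return bool(name) and b"\0" not in name and b"/" not in name


def escape_for_windows(name: str) -> str:
    """Hex-escape characters that Windows does not allow in a filename."""
    out = []
    for ch in name:
        if ch in WINDOWS_FORBIDDEN or ord(ch) < 0x20:
            out.append("%%%02X" % ord(ch))
        else:
            out.append(ch)
    escaped = "".join(out)
    # Windows also rejects names that end in a space or a dot.
    if escaped and escaped[-1] in " .":
        escaped = escaped[:-1] + "%%%02X" % ord(escaped[-1])
    return escaped
```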
Since conversions to 8-bit character sets will often fail,
it is desirable to convince Windows to use Unicode as current codeset.
Maybe that requires a copy of the common fileio routines.

That is my view of the result of the present conversation.
Probably some refinements will be needed. Moreover, there is
interference with iri stuff that should be looked at.

Once we know what we want it is trivial to write the code,
but it may take a while to figure out what we want.

I think we should start applying the current patch.

Andries