Re: [Bug-wget] bad filenames (again)
From: Andries E. Brouwer
Subject: Re: [Bug-wget] bad filenames (again)
Date: Sun, 16 Aug 2015 22:21:20 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
On Sun, Aug 16, 2015 at 05:43:50PM +0300, Eli Zaretskii wrote:
(i)
>> #if defined(WINDOWS) || defined(MSDOS) || defined(__CYGWIN__)
>> /* insert some test for Windows */
>> #else
>> ... code that uses getenv to test LC_ALL, LC_CTYPE, LANG ...
>> #endif
> I'm not sure this is the right way to fix this. First, relying on
> UTF-8 locale to be announced in the environment is less portable than
> it could be: it's better to call 'setlocale' with the 2nd argument
> NULL to glean the same information. Then the ugly #ifdef above could
> be dropped, and at least Cygwin will not be excluded from this
> feature.
I left the wget behaviour for MSDOS / Windows / Cygwin unchanged
because I do not know anything about these platforms. It is quite
possible that the #ifdef is unneeded.
Are you saying that it in fact is needed when getenv() is used,
but unneeded when setlocale() is used? And then what about LANG?
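
For illustration, the two detection strategies discussed in (i) can be put side by side. This is a minimal sketch, not the code from the patch: the helper names name_says_utf8, env_says_utf8 and setlocale_says_utf8 are hypothetical, and the variable order LC_ALL, LC_CTYPE, LANG is the usual POSIX precedence. Note that setlocale(LC_CTYPE, NULL) reports the program's current locale, which is "C" at startup unless the program has earlier called setlocale(LC_ALL, "") or setlocale(LC_CTYPE, "").

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: does a locale name such as "en_US.UTF-8" or
   "nl_NL.utf8@euro" name UTF-8 as its codeset? */
int name_says_utf8(const char *name)
{
    const char *p;

    if (name == NULL || (p = strchr(name, '.')) == NULL)
        return 0;
    p++;                                   /* skip the '.' */
    if (tolower((unsigned char) p[0]) != 'u' ||
        tolower((unsigned char) p[1]) != 't' ||
        tolower((unsigned char) p[2]) != 'f')
        return 0;
    p += 3;
    if (*p == '-')                         /* accept "UTF-8" and "UTF8" */
        p++;
    /* the codeset may be followed by an "@modifier" */
    return p[0] == '8' && (p[1] == '\0' || p[1] == '@');
}

/* Environment-based test: the first non-empty variable among
   LC_ALL, LC_CTYPE, LANG decides. */
int env_says_utf8(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v != NULL && *v != '\0')
            return name_says_utf8(v);
    }
    return 0;
}

/* setlocale-based test: asks the C library for the current LC_CTYPE
   locale. It reflects the environment only if the program has already
   called setlocale(LC_ALL, "") or setlocale(LC_CTYPE, ""). */
int setlocale_says_utf8(void)
{
    return name_says_utf8(setlocale(LC_CTYPE, NULL));
}
```

With the setlocale variant, LANG is taken into account by the C library itself when the locale is initialised from the environment, so no explicit getenv calls are needed. How locale names look on Windows is not covered by this sketch.
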
(ii)
> Moreover, even if the locale is not UTF-8, wget should attempt to
> convert the file names to the current locale using iconv (which I
> believe was what Tim suggested). This will DTRT in much more cases
> than the above UTF-8 centric approach, IMO.
Hmm. My own point of view is almost the opposite. In my life I have
spent countless hours trying to repair the damage done by software
that helpfully modified my data.
I prefer my data as-is, unless I explicitly ask for conversion.
I think Tim suggested something else (namely, just checking whether
the filename was valid UTF-8), but never mind.
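
A check of the kind attributed to Tim above, whether a filename is valid UTF-8, could look roughly like this. It is a sketch under the assumption that "valid" means well-formed per RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF); the function name is_valid_utf8 is hypothetical and not taken from wget.

```c
#include <stddef.h>

/* Return 1 if the len bytes at s form well-formed UTF-8, else 0. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t need, k;

        if (c < 0x80) {                    /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {   /* 2-byte sequence */
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {   /* 3-byte sequence */
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {   /* 4-byte sequence */
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                      /* stray continuation or bad lead byte */
        }

        if (len - i - 1 < need)
            return 0;                      /* sequence is cut off */

        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                  /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* reject overlong encodings, surrogates, and out-of-range values */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += need + 1;
    }
    return 1;
}
```

A name that passes this check can be kept byte for byte; a Latin-1 name such as "caf\xE9" fails it, since 0xE9 opens a 3-byte sequence that never completes.
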
The patch enlarges the number of cases where the original data
is preserved. Yes, I am all in favour of enlarging that number
even further; this is only a first step. But in my eyes
applying iconv would be a step back. It can be really tricky to
decode the mojibake obtained by converting from A to C when
the original really was in B.
How do you guess the original character set?
What should happen when iconv() returns EILSEQ?
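
To make the EILSEQ question concrete, here is one possible policy, sketched under assumptions and not something the thread has settled on: attempt the conversion, and on any failure keep the original bytes untouched. The function name convert_or_keep and its parameters are hypothetical; the source charset still has to be supplied by the caller, so the guessing problem raised above remains open.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Returns a malloc'ed string (or NULL if out of
   memory): the converted name if iconv succeeds, otherwise a copy of
   the original bytes. *converted is set to 1 or 0 accordingly. */
char *convert_or_keep(const char *name, const char *from,
                      const char *to, int *converted)
{
    size_t inleft = strlen(name);
    size_t outsize = 4 * inleft + 1;   /* assumption: roomy enough for common targets */
    size_t outleft = outsize - 1;
    char *out = malloc(outsize);
    char *inp = (char *) name;         /* iconv wants char **, input bytes are not modified */
    char *outp = out;
    iconv_t cd;

    *converted = 0;
    if (out == NULL)
        return NULL;

    cd = iconv_open(to, from);
    if (cd != (iconv_t) -1) {
        size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
        if (r != (size_t) -1)          /* flush any pending shift state */
            r = iconv(cd, NULL, NULL, &outp, &outleft);
        iconv_close(cd);
        if (r != (size_t) -1) {
            *outp = '\0';
            *converted = 1;
            return out;
        }
        /* failure: errno was EILSEQ (invalid input), EINVAL (input
           cut off mid-character) or E2BIG (output buffer too small) */
    }

    strcpy(out, name);                 /* keep the data as-is */
    return out;
}
```

Called as convert_or_keep(name, "UTF-8", "ISO-8859-1", &ok), a well-formed UTF-8 name comes back converted, while a name that is not valid UTF-8 comes back unchanged with ok set to 0. Whether silent fallback is preferable to reporting an error is a design choice this sketch does not answer.
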
Andries