Re: [Bug-wget] bad filenames (again)

From:	Eli Zaretskii
Subject:	Re: [Bug-wget] bad filenames (again)
Date:	Wed, 19 Aug 2015 17:12:08 +0300

> Date: Tue, 18 Aug 2015 22:28:21 +0200
> From: "Andries E. Brouwer" <address@hidden>
> Cc: "Andries E. Brouwer" <address@hidden>, address@hidden,
>         address@hidden
> 
> > What is needed to have full Unicode support in wget on Windows is to
> > provide replacements for all the file-name related libc functions
> > ('fopen', 'open', 'stat', 'access', etc.) which will accept file names
> > encoded in UTF-8, convert them internally into UTF-16, and call the
> > wchar_t equivalents of those functions ('_wfopen', '_wopen', '_wstat',
> > '_waccess', etc.) with the converted file name.  Another thing that is
> > needed is similar replacements for 'printf', 'puts', 'fprintf',
> > etc. when they are used for writing file names to the console --
> > because we cannot write UTF-8 sequences to the Windows console.
> 
> Aha. That reminds me of a patch by, I think, Aleksey Bykov.
> Yes - see http://lists.gnu.org/archive/html/bug-wget/2014-04/msg00080.html
> 
> There we had a similar discussion, and he wrote mswindows.diff with
> 
> +int 
> +wc_utime (unsigned char *filename, struct _utimbuf *times)
> +{
> +  wchar_t *w_filename;
> +  int buffer_size;
> +
> +  buffer_size = sizeof (wchar_t) * MultiByteToWideChar(65001, 0, filename, -1, w_filename, 0);
> +  w_filename = alloca (buffer_size);
> +  MultiByteToWideChar(65001, 0, filename, -1, w_filename, buffer_size);
> +  return _wutime (w_filename, times);
> +}
> 
> and similar for stat, open, etc. Something similar is what would be needed
> on Windows?

Yes, thanks for pointing out those patches.  Any reason they weren't
accepted back then?

> Is his patch usable?

It needs some minor polishing, but in general it should do the job,
yes.
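
To make the kind of polishing concrete, here is a minimal sketch of such
a replacement for 'fopen'.  The names utf8_to_utf16 and utf8_fopen are
hypothetical, not taken from the patch or from wget.  The sketch assumes
heap allocation instead of alloca, and it passes the buffer length to
MultiByteToWideChar in wide characters rather than bytes, which is the
unit that function expects.  CP_UTF8 is the symbolic name for code page
65001.  On non-Windows systems the wrapper simply calls fopen.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <wchar.h>
#include <windows.h>

/* Hypothetical helper: convert a UTF-8 string to a malloc'ed UTF-16
   string.  Returns NULL on failure; the caller frees the result.  */
static wchar_t *
utf8_to_utf16 (const char *s)
{
  /* First call: ask for the required size, in wide characters,
     including the terminating null.  */
  int n = MultiByteToWideChar (CP_UTF8, 0, s, -1, NULL, 0);
  wchar_t *w;

  if (n <= 0)
    return NULL;
  w = malloc (n * sizeof (wchar_t));
  if (w && MultiByteToWideChar (CP_UTF8, 0, s, -1, w, n) == 0)
    {
      free (w);
      w = NULL;
    }
  return w;
}

/* Hypothetical replacement for 'fopen' that takes a UTF-8 file name.  */
FILE *
utf8_fopen (const char *name, const char *mode)
{
  wchar_t *wname = utf8_to_utf16 (name);
  wchar_t *wmode = utf8_to_utf16 (mode);
  FILE *fp = NULL;
  int saved_errno;

  if (wname && wmode)
    fp = _wfopen (wname, wmode);
  else
    errno = EINVAL;
  saved_errno = errno;          /* keep the error code across free */
  free (wname);
  free (wmode);
  errno = saved_errno;
  return fp;
}
#else
/* Elsewhere file names are plain byte strings, so pass them through.  */
FILE *
utf8_fopen (const char *name, const char *mode)
{
  return fopen (name, mode);
}
#endif
```

Wrappers for 'open', 'stat', 'access', etc. could presumably share the
same conversion helper and differ only in the wide-character function
they call.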

I admit that I don't understand the need for the url.c patch.  Why do
we need to convert to wchar_t when the locale's codeset is already
UTF-8?  (I could understand that for non-UTF-8 locales, but the patch
explicitly limits the conversion to wchar_t and back to UTF-8 locales,
where the normal string functions should do the job.)  Is this only
for converting to upper/lower-case?

There's still the part with writing UTF-8 encoded file/URL names to
the Windows console; that will have to be added.
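
One possible shape for that part, as a rough sketch only: convert the
UTF-8 text to UTF-16 and hand it to WriteConsoleW when standard output
is a real console, and fall back to ordinary byte output when it is
redirected.  The function name utf8_console_puts is hypothetical, and
using GetConsoleMode as the "is this a console" test is an assumption of
this sketch, not something decided in this thread.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: write the UTF-8 string S to standard output.
   Returns 0 on success, -1 on failure.  */
int
utf8_console_puts (const char *s)
{
#ifdef _WIN32
  HANDLE h = GetStdHandle (STD_OUTPUT_HANDLE);
  DWORD mode, written;

  /* GetConsoleMode succeeds only for a real console; redirected
     output (file or pipe) falls through to the byte-oriented path.  */
  if (h != INVALID_HANDLE_VALUE && h != NULL && GetConsoleMode (h, &mode))
    {
      int n = MultiByteToWideChar (CP_UTF8, 0, s, -1, NULL, 0);
      wchar_t *w;
      BOOL ok;

      if (n <= 0)
        return -1;
      w = malloc (n * sizeof (wchar_t));
      if (!w)
        return -1;
      ok = MultiByteToWideChar (CP_UTF8, 0, s, -1, w, n) != 0;
      if (ok)
        {
          fflush (stdout);      /* keep ordering with earlier stdio output */
          /* N counts the terminating null, which is not written.  */
          ok = WriteConsoleW (h, w, (DWORD) (n - 1), &written, NULL);
        }
      free (w);
      return ok ? 0 : -1;
    }
#endif
  return fputs (s, stdout) == EOF ? -1 : 0;
}
```

Replacements for 'printf' and 'fprintf' could format into a buffer first
and then pass the result to a function like this one.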
