bug-wget

Re: [Bug-wget] Problem with ÅÄÖ and wget


From: Bykov Aleksey
Subject: Re: [Bug-wget] Problem with ÅÄÖ and wget
Date: Sun, 15 Sep 2013 01:59:23 +0300
User-agent: Opera Mail/12.14 (Win32)

Greetings

Many thanks for pushing me in the right direction.

With the attached patch, Wget on Windows can work with UTF-8 names. But, again, only with "--restrict-file-names=nocontrol"...

Windows needs a conversion to wide characters for all of this work. I chose MultiByteToWideChar() because it allows forcing the input encoding. After the conversion, each character can be checked separately against the restrictions. As a variant, the restricted symbols are replaced and the whole string is converted back to UTF-8 with WideCharToMultiByte(). Is it possible on UNIX to use mbstowcs()/wcstombs() with setlocale(LC_ALL, "UTF-8") for the same purpose? Or is there a better way to convert a narrow string to a wide string during character quoting?
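
A minimal sketch of the UNIX variant asked about above, not taken from the attached patch. It assumes a UTF-8 locale can be selected; the locale names tried are guesses, since the bare name "UTF-8" may not be accepted by setlocale() on every system. The helpers set_utf8_locale(), is_restricted() and quote_utf8_name() are hypothetical, and the restricted set is only a stand-in for Wget's real rules. Changing the locale also affects the whole process.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Try a few common UTF-8 locale names (assumed, system dependent).
   Returns 1 if one of them could be selected, 0 otherwise. */
static int set_utf8_locale(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "en_US.utf8" };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical stand-in for the restriction check: control
   characters plus the characters Windows forbids in file names. */
static int is_restricted(wchar_t c)
{
    return c < 0x20 || c == 0x7f || wcschr(L"\\|/:?\"*<>", c) != NULL;
}

/* Convert to wide characters, replace each restricted character
   with '_', convert back. Returns a malloc'ed string that the
   caller frees, or NULL on invalid input or allocation failure. */
static char *quote_utf8_name(const char *in)
{
    size_t n = mbstowcs(NULL, in, 0);   /* length in wide characters */
    size_t i, m;
    wchar_t *w;
    char *out;

    if (n == (size_t)-1)
        return NULL;                    /* invalid multibyte sequence */
    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    mbstowcs(w, in, n + 1);

    for (i = 0; i < n; i++)
        if (is_restricted(w[i]))
            w[i] = L'_';

    m = wcstombs(NULL, w, 0);           /* length in bytes */
    if (m == (size_t)-1) {
        free(w);
        return NULL;
    }
    out = malloc(m + 1);
    if (out != NULL)
        wcstombs(out, w, m + 1);
    free(w);
    return out;
}
```

Call set_utf8_locale() once at startup and fall back to the existing byte-wise quoting if it returns 0. Checking whole characters rather than bytes is the point: the bytes of "ÅÄÖ" are never mistaken for control characters.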

--
Best regards, Alex

On Fri, 13 Sep 2013 16:13:10 +0300, Tim Ruehsen <address@hidden> wrote:

On Friday 13 September 2013 12:43:53 Bykov Aleksey wrote:
Greetings
Yes, you show a correct Cyrillic filename.
Sorry, I do not agree that this bug is ready to be closed.
Your method is mentioned in it.
This bug is about filenames in non-UTF-8 locales.

Main quote:
> If you are using a unix-like OS where the filesystem interface uses
> utf-8, there is a workaround of using --restrict-file-names=nocontrol
> (which is still too big, as that would allow problematic control
> characters %01 or %09).
> If you are using Windows, --restrict-file-names=nocontrol still gives
> garbage (the utf-8 characters are treated as if they were in latin1).

Thanks for pointing this out. I missed it.

I tried to solve this bug by adding a new option,
--local-filesystem-encoding
http://lists.gnu.org/archive/html/bug-wget/2013-05/msg00102.html
but the patch was (rejected?)/(frozen?)/(lack of demand?).

It seems there has been no discussion about it. I interpret that as a
lack of interest, but I am not sure.

But a quick net search reveals that NTFS uses UTF-16 (Unicode) while fopen()
demands ASCII!?
[1] suggests feeding wide (UTF-16) strings to CreateFile() or _wfopen() when
built with UNICODE. For a non-UNICODE build, use CreateFileW() or _wfopen().

So maybe your patch used the wrong approach.
You should try to use the above-mentioned functions for Windows builds.
If that works, the patch will be just a few lines...
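
A rough sketch of what such a wrapper could look like, not the attached patch. It assumes the name arrives as UTF-8 and converts it to UTF-16 before calling _wfopen(); the name fopen_utf8() is hypothetical, and the fixed MAX_PATH buffer is a simplification. On other systems it just forwards to fopen().

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Open a file whose name is given in UTF-8 (hypothetical helper).
   Returns NULL on failure, like fopen(). */
static FILE *fopen_utf8(const char *name, const char *mode)
{
#ifdef _WIN32
    wchar_t wname[MAX_PATH];
    wchar_t wmode[16];

    /* A length of -1 means the input is NUL-terminated and the
       terminator is converted too. A return of 0 means failure. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            name, -1, wname, MAX_PATH) == 0)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) == 0)
        return NULL;
    return _wfopen(wname, wmode);
#else
    /* Assumes the filesystem interface takes the bytes as-is. */
    return fopen(name, mode);
#endif
}
```

With a wrapper like this the file name never goes through the ANSI code page, so it does not matter whether that code page can represent "ÅÄÖ" or Cyrillic.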

Sorry, I don't know how Björn Mattsson switched his Windows Vista (x64)
filesystem to UTF-8.
In Russian locales, Windows 98, XP (x86) and Vista (x86) use CP866 as the
filesystem encoding.

Wasn't there something like international language support even for Windows 98? Together with perhaps some new fonts, that should do it... but hey, I have been out of
the Windows business for 12 years now and I have never regretted it.


[1] http://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as
[2] http://en.wikipedia.org/wiki/Filename

Attachment: win_utf-8.diff
Description: Binary data

