[Bug-wget] IRIs not encoded in UTF-8 locale


From: Magnus Holmgren
Subject: [Bug-wget] IRIs not encoded in UTF-8 locale
Date: Thu, 17 Feb 2011 21:08:54 +0100
User-agent: KMail/1.13.5 (Linux/2.6.32-5-amd64; KDE/4.4.5; x86_64; ; )

Hi!

Unless someone has fixed it since version 1.12, there is a bug in iri.c that
causes international domain names not to be IDN-encoded when the current
locale is a UTF-8 one.

The problem is that idn_encode expects remote_to_utf8 to return true iff there
was something to encode, but remote_to_utf8 returns false whenever
do_conversion didn't change the string, which is the case if the string is
pure ASCII *or* already UTF-8 encoded. The test on line 290 needs to be
changed to a check for high bits. If i->uri_encoding is "UTF-8", the whole
iconv step can of course be skipped or replaced with a check for valid UTF-8.
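
To illustrate the kind of high-bit check meant here, a minimal sketch follows.
The helper name has_high_bits is hypothetical and not a function in wget; it
only shows how "is there anything non-ASCII in this string?" can be decided
without relying on whether a conversion changed the bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of wget): return true if any byte of the
   NUL-terminated string S has its high bit set, i.e. S is not pure ASCII
   and may therefore contain something that needs IDN encoding.  */
bool
has_high_bits (const char *s)
{
  assert (s != NULL);
  /* Walk the bytes as unsigned so the bit test is well defined.  */
  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++)
    if (*p & 0x80)
      return true;
  return false;
}
```

With a check like this, a host name that is already UTF-8 encoded would still
be recognised as needing IDN encoding, while a pure ASCII name would not.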

Alternatively, idn_encode should not return NULL immediately when 
remote_to_utf8 returns false. remote_to_utf8 may need to differentiate between 
"error" and "nothing to encode".
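
A rough sketch of how that distinction could look, under stated assumptions:
the enum conv_result and both functions below are hypothetical names invented
for illustration, not wget's actual interface (remote_to_utf8 returns a plain
bool). The conversion itself is left out; only the three-way result and the
caller's decision are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-way result replacing a plain true/false.  */
enum conv_result
{
  CONV_ERROR,      /* the conversion failed */
  CONV_UNCHANGED,  /* input was pure ASCII or already in the target encoding */
  CONV_CONVERTED   /* the conversion produced a different string */
};

/* Hypothetical helper: classify the outcome of a conversion step, given the
   input and the output it produced (NULL output stands for failure).  */
enum conv_result
classify_conversion (const char *input, const char *output)
{
  assert (input != NULL);
  if (output == NULL)
    return CONV_ERROR;
  return strcmp (input, output) == 0 ? CONV_UNCHANGED : CONV_CONVERTED;
}

/* Hypothetical caller-side test: give up only on a real error, instead of
   also bailing out when the string was merely left unchanged.  */
bool
may_proceed_to_idn (enum conv_result r)
{
  return r != CONV_ERROR;
}
```

Under this scheme an unchanged string is no longer treated as a failure, so
the caller can still go on to IDN-encode a host name that was already UTF-8.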

-- 
Magnus Holmgren        address@hidden
