bug-wget

[Bug-wget] Failing tests


From: Tim Ruehsen
Subject: [Bug-wget] Failing tests
Date: Thu, 02 Oct 2014 17:10:26 +0200
User-agent: KMail/4.14.1 (Linux/3.16-2-amd64; KDE/4.14.1; x86_64; ; )

With a non-"C" locale, Wget reproducibly fails three tests:

FAIL: Test-iri.px
FAIL: Test-iri-percent.px
FAIL: Test-iri-forced-remote.px

Example (of course you must have en_US.utf8 installed):
TESTS_ENVIRONMENT="LC_ALL=en_US.utf8" make check

The simplest is Test-iri-percent.px, so I added -d to the Wget command line 
and ran it twice:

1. success with LC_ALL=C
$ cd tests
$ LC_ALL=C ./Test-iri-percent.px
#### snip ####
Loaded index.html (size 195).
URI encoding = 'ANSI_X3.4-1968'
index.html: merge('http://localhost:57052/', 
'http://localhost:57052/hello_%E7\351.html') -> 
http://localhost:57052/hello_%E7\351.html
Incomplete or invalid multibyte sequence encountered
appending 'http://localhost:57052/hello_%E7\351.html' to urlpos.
no-follow in index.html: 0
Deciding whether to enqueue "http://localhost:57052/hello_%E7�.html".
Decided to load it.
URI encoding = 'ISO-8859-15'
Enqueuing http://localhost:57052/hello_%E7\351.html at depth 1
Queue count 1, maxcount 1.
[IRI Enqueuing 'http://localhost:57052/hello_%E7\351.html' with 'ISO-8859-15'
Dequeuing http://localhost:57052/hello_%E7\351.html at depth 1
Queue count 0, maxcount 1.
--2014-10-02 16:39:13--  http://localhost:57052/hello_%E7%C3%A9.html
Reusing existing connection to localhost:57052.
Reusing fd 4.

---request begin---
GET /hello_%E7%C3%A9.html HTTP/1.1
#### snap ####

But did you see "Incomplete or invalid multibyte sequence encountered"? This 
indicates a faulty charset conversion, even though the test succeeds.
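
One plausible reading of that message, sketched in Python rather than taken
from the Wget sources: the log reports the URI encoding as ANSI_X3.4-1968
(plain ASCII), but the URL contains the raw byte 0xE9, which is not valid
ASCII. The helper name try_decode is hypothetical, and the assumption that
Wget's conversion fails for this reason is mine.

```python
# The URL from the log, with \351 written as the raw byte 0xE9.
raw = b"http://localhost:57052/hello_%E7\xe9.html"


def try_decode(data: bytes, charset: str) -> str:
    """Report whether `data` is a valid byte sequence in `charset`."""
    try:
        data.decode(charset)
        return "ok"
    except UnicodeDecodeError as err:
        return f"invalid byte 0x{data[err.start]:02X} at offset {err.start}"


# ANSI_X3.4-1968 is the name reported for the "C" locale; it is ASCII.
print("ascii:      ", try_decode(raw, "ascii"))
# The page itself is served as ISO-8859-15, where 0xE9 is a valid character.
print("iso-8859-15:", try_decode(raw, "iso-8859-15"))
```

Under ASCII the decode fails on the 0xE9 byte, while under ISO-8859-15 the
same bytes are accepted.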


2. failure with LC_ALL=en_US.UTF-8
$ cd tests
$ LC_ALL=en_US.UTF-8 ./Test-iri-percent.px
#### snip ####
Loaded index.html (size 195).
URI encoding = ‘UTF-8’
index.html: merge(‘http://localhost:54675/’, 
‘http://localhost:54675/hello_%E7\351.html’) -> 
http://localhost:54675/hello_%E7\351.html
appending ‘http://localhost:54675/hello_%E7%E9.html’ to urlpos.
no-follow in index.html: 0
Deciding whether to enqueue "http://localhost:54675/hello_%E7%E9.html".
Decided to load it.
URI encoding = ‘ISO-8859-15’
Enqueuing http://localhost:54675/hello_%E7%E9.html at depth 1
Queue count 1, maxcount 1.
[IRI Enqueuing ‘http://localhost:54675/hello_%E7%E9.html’ with ‘ISO-8859-15’
Dequeuing http://localhost:54675/hello_%E7%E9.html at depth 1
Queue count 0, maxcount 1.
--2014-10-02 16:37:16--  http://localhost:54675/hello_%E7%E9.html
Reusing existing connection to localhost:54675.
Reusing fd 4.

---request begin---
GET /hello_%E7%E9.html HTTP/1.1
...
---response begin---
HTTP/1.1 400 Bad Request
...
#### snap ####

The ISO-8859-15 URL should be percent-decoded, converted to UTF-8 and 
percent-encoded again before it is put into the GET request line. It looks 
like this is not being done correctly.
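
To make the conversion concrete, here is a minimal Python sketch; it is not
Wget's code, and both function names are hypothetical. The first function
leaves existing %XX escapes alone and only converts raw non-ASCII bytes,
which reproduces the request line seen in the LC_ALL=C run. The second
follows the description above literally and also decodes the existing
escapes first. Which of the two the test suite expects is not something
this sketch decides.

```python
from urllib.parse import quote, unquote_to_bytes

# Path from the log, with \351 written as the raw byte 0xE9.
PATH = b"/hello_%E7\xe9.html"


def reencode_raw_bytes(path: bytes, charset: str) -> str:
    """Keep existing %XX escapes; convert raw bytes to UTF-8 and escape them."""
    text = path.decode(charset)
    # '%' is marked safe so that the existing %E7 escape is not touched.
    return quote(text, safe="/%", encoding="utf-8")


def reencode_fully(path: bytes, charset: str) -> str:
    """Percent-decode everything, convert to UTF-8, then percent-encode."""
    decoded = unquote_to_bytes(path)   # %E7 becomes the byte 0xE7
    text = decoded.decode(charset)     # bytes interpreted in the page charset
    return quote(text, safe="/", encoding="utf-8")


print(reencode_raw_bytes(PATH, "iso-8859-15"))  # /hello_%E7%C3%A9.html
print(reencode_fully(PATH, "iso-8859-15"))      # /hello_%C3%A7%C3%A9.html
```

The failing run sends /hello_%E7%E9.html instead, i.e. the raw byte was 
percent-encoded without being converted to UTF-8 first.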


I won't be online much over the next three days, so maybe someone else could 
have a look at the Wget sources?

Tim

Attachment: signature.asc
Description: This is a digitally signed message part.