Re: [Bug-wget] [PATCH] improved Test-idn-robots.txt


From: Giuseppe Scrivano
Subject: Re: [Bug-wget] [PATCH] improved Test-idn-robots.txt
Date: Tue, 08 Oct 2013 15:07:51 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Tim Rühsen <address@hidden> writes:

> I added two links/urls to follow in index.html, now there are three in total.
> All three links/urls point to the same host, but have different host 
> encodings 
> (plain international text, punycoding, percent escaping).
>
> Wget should recognize these three codings as being the same and thus I 
> removed 
> the -H (host spanning) option to verify that.
>
> Now, Wget fails this test, I guess it needs a fix.
>
> Regards, Tim
>
> From 2e6f527121497b3b148496a9a9c774451d2e0017 Mon Sep 17 00:00:00 2001
> From: Tim Ruehsen <address@hidden>
> Date: Mon, 7 Oct 2013 23:37:42 +0200
> Subject: [PATCH] improved Test-idn-robots.px
>
> ---
>  tests/ChangeLog          |  5 +++++
>  tests/Test-idn-robots.px | 27 ++++++++++++++++++++++++++-
>  2 files changed, 31 insertions(+), 1 deletion(-)

Thanks for your test.  The IRI support is a bit of a mess, and I am not
sure how this issue should be fixed:

Should we check whether the two domains are the same in recur.c
(somewhere near line 633)?  That means we would need to check there for
different encodings and convert among them.  Another solution would be
for append_url to store the URL in one specific format.

The latter solution would probably also allow us to deal with a
page-specific locale when one is specified.
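
To make the "one specific format" idea concrete, here is a rough sketch
in Python rather than Wget's C.  It is only an illustration, not Wget
code: canonical_host and same_host are hypothetical names, percent
escapes are assumed to carry UTF-8, and Python's built-in "idna" codec
(IDNA 2003) stands in for whatever IDN library Wget would use.

```python
from urllib.parse import unquote


def canonical_host(host: str) -> str:
    """Return a lowercase ASCII (punycode) form of a host name.

    Hypothetical helper: accepts plain international text, punycode,
    or percent-escaped UTF-8 and maps all three to one representation.
    """
    # Undo percent escaping; unquote() decodes the bytes as UTF-8 by default.
    decoded = unquote(host)
    # The built-in "idna" codec converts non-ASCII labels to their ACE
    # ("xn--") form; labels that are already ASCII pass through unchanged.
    ascii_host = decoded.encode("idna").decode("ascii")
    # ASCII labels are not case-folded by the codec, so lowercase explicitly.
    return ascii_host.lower()


def same_host(a: str, b: str) -> bool:
    """Compare two host names after normalizing both to the same form."""
    return canonical_host(a) == canonical_host(b)


if __name__ == "__main__":
    # The same host written in the three encodings mentioned in the test.
    variants = [
        "bücher.example",          # plain international text
        "xn--bcher-kva.example",   # punycode
        "b%C3%BCcher.example",     # percent escaping
    ]
    for v in variants:
        print(v, "->", canonical_host(v))
```

If every stored URL carried its host in such a form, the comparison in
recur.c could stay a plain string comparison.  Doing the conversion only
at comparison time would work too, but it would have to be repeated for
every pair of hosts.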

Have you already looked into this issue?  Do you have any
idea/suggestion?

Thanks,
Giuseppe
