bug-wget
Re: Wget recursive option not working correctly with scheme relative URLs


From: Tim Rühsen
Subject: Re: Wget recursive option not working correctly with scheme relative URLs
Date: Sat, 1 Jul 2023 18:22:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0

Hey Jan,

On 7/1/23 15:16, Jan Bidler via Primary discussion list for GNU Wget wrote:
> Hello,
> I have part of a website (`example.com/index.html`) I want to mirror, which
> contains scheme-relative URLs (`//otherexample.com/image.png`). Trying to
> download these with the -r flag results in wget converting them to a wrong
> URL (`example.com//otherexample.com`).
>
> So using
> `wget -r example.com/index.html`
> will produce links with
> `https://example.com/index.html\/\/otherexample.com\/image.png` in the output.
> Using the debug flag reveals this:
> `merge(»example.com/index.html «, » //otherexample.com/image.png«) ->
> https://example.com/index.html\/\/otherexample.com\/image.png`

This is unexpected, since these kinds of links are relatively common and so far nobody has complained about them.

I just added a new test function for uri_merge(), the function that does this job. It has no trouble merging a relative URL like '//otherexample.com/image.png'.
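
For reference, here is a minimal sketch of the result one would expect from such a merge under RFC 3986 reference resolution. It uses Python's `urllib.parse.urljoin` purely as a stand-in; it is not wget's uri_merge(), and the `merge` helper name is only illustrative.

```python
from urllib.parse import urljoin


def merge(base: str, link: str) -> str:
    """Resolve `link` against `base` (illustrative stand-in, not wget code)."""
    return urljoin(base, link)


base = "https://example.com/index.html"

# A scheme-relative link keeps the base scheme but replaces the host.
print(merge(base, "//otherexample.com/image.png"))
# -> https://otherexample.com/image.png

# A plain relative link stays on the base host.
print(merge(base, "image.png"))
# -> https://example.com/image.png
```

If the link text reaching the merge step is not literally `//otherexample.com/image.png`, the result can of course differ, which is why the exact input matters.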

Could you share a real-world wget command line that reproduces the issue?

Regards, Tim
