[Bug-wget] [bug #50320] Bad link conversion with mixed HTTP/HTTPS content plus --mirror --adjust-extension
From: anonymous
Subject: [Bug-wget] [bug #50320] Bad link conversion with mixed HTTP/HTTPS content plus --mirror --adjust-extension
Date: Wed, 15 Feb 2017 13:08:55 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
URL:
<http://savannah.gnu.org/bugs/?50320>
Summary: Bad link conversion with mixed HTTP/HTTPS content
plus --mirror --adjust-extension
Project: GNU Wget
Submitted by: None
Submitted on: Wed 15 Feb 2017 06:08:54 PM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: Thomas Claveirole
Originator Email: address@hidden
Open/Closed: Open
Discussion Lock: Any
Release: trunk
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Details:
Hello,
I set up a local web server that serves the following page:
<!DOCTYPE html>
<html>
<head>
<title>Wget test</title>
</head>
<body>
<script src="http://localhost/wget-test/script.js?foo=bar"></script>
<script src="https://localhost/wget-test/script.js?foo=bar"></script>
</body>
</html>
in response to requests for /wget-test/ over either HTTP or HTTPS, along with
a /wget-test/script.js resource (served regardless of the scheme and query
string; the content of this file is irrelevant).
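For anyone who wants to reproduce this, the two files described above could be generated with something like the following sketch. The file name index.html and the function write_fixture are assumptions of mine (the report only says the page is served for /wget-test/); configuring the web server to serve the directory over both HTTP and HTTPS is left out.

```python
from pathlib import Path

# The test page from the report, with one http:// and one https:// script link.
INDEX_HTML = """<!DOCTYPE html>
<html>
<head>
<title>Wget test</title>
</head>
<body>
<script src="http://localhost/wget-test/script.js?foo=bar"></script>
<script src="https://localhost/wget-test/script.js?foo=bar"></script>
</body>
</html>
"""


def write_fixture(docroot):
    """Create <docroot>/wget-test/ holding the page and the script.

    Hypothetical helper: it assumes the server maps /wget-test/ to
    index.html, which the report does not state.
    """
    site = Path(docroot) / "wget-test"
    site.mkdir(parents=True, exist_ok=True)
    (site / "index.html").write_text(INDEX_HTML, encoding="utf-8")
    # The report says the script's content is irrelevant.
    (site / "script.js").write_text("// placeholder\n", encoding="utf-8")
    return site
```

Calling write_fixture with the server's document root should be enough to recreate the content side of the setup.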
Then,
wget --mirror --adjust-extension --convert-links http://localhost/wget-test/
rewrites the script links as follows:
<!DOCTYPE html>
<html>
<head>
<title>Wget test</title>
</head>
<body>
<script src="script.js%3Ffoo=bar"></script>
<script src="script.js%3Ffoo=bar.html"></script>
</body>
</html>
Note that the second link has an incorrect .html suffix appended. On the
filesystem, the downloaded file does not have this suffix, so the link is
broken. I guess the correct behavior should be not to append the .html
suffix, but I am unsure whether two URLs that differ only in scheme (http://
vs. https://) should be considered the same resource and rewritten to point to
the same location.
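To spot this kind of breakage in a larger mirror, a small checker along these lines might help. It is only a sketch, not part of Wget: find_broken_local_links is a name I made up, it looks only at src and href attributes, and it assumes relative links are resolved against the directory containing the HTML file.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class _LinkCollector(HTMLParser):
    """Collects the values of src and href attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.links.append(value)


def find_broken_local_links(html_text, base_dir):
    """Return relative links whose target file is missing under base_dir."""
    collector = _LinkCollector()
    collector.feed(html_text)
    broken = []
    for link in collector.links:
        parts = urlsplit(link)
        if parts.scheme or parts.netloc:
            continue  # absolute URL, not a converted local link
        # Converted links percent-encode characters such as "?" (%3F),
        # so decode before comparing with names on the filesystem.
        local_name = unquote(parts.path)
        if local_name and not (Path(base_dir) / local_name).exists():
            broken.append(link)
    return broken
```

Run against the converted page above, with the downloaded file named script.js?foo=bar next to it, this would report only the link carrying the extra .html suffix.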
(This test case was derived from trying to mirror a much bigger site and it
took me some time to pinpoint the issue. The bug also arises when multiple
pages from the website link to the same resource using mixed http and https
schemes -- which is a more realistic scenario.)
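To illustrate the "same resource" question, here is a sketch of a scheme-independent key under which both URLs would collapse to one local file. This is not how Wget is implemented; local_key and group_by_local_key are hypothetical names, and the sketch assumes the local layout depends only on host, path and query (it ignores ports).

```python
from urllib.parse import urlsplit


def local_key(url):
    """Hypothetical scheme-independent key: host + path + optional query."""
    parts = urlsplit(url)
    key = (parts.hostname or "") + parts.path
    if parts.query:
        key += "?" + parts.query
    return key


def group_by_local_key(urls):
    """Group URLs that would share one local file under this assumption."""
    groups = {}
    for url in urls:
        groups.setdefault(local_key(url), []).append(url)
    return groups
```

With this key, the http:// and https:// script URLs from the test page land in the same group, so a link converter using it would rewrite both to the same file name.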
Looking at the bug tracker, I get the feeling that this bug might be related
to #50173 and #25340, but this is unclear to me.
Attached is a debug log from running the following command on my setup:
wget -o wget.log --debug --no-check-certificate --mirror --adjust-extension \
     --convert-links http://localhost/wget-test/
Regards,
Thomas Claveirole
_______________________________________________________
File Attachments:
-------------------------------------------------------
Date: Wed 15 Feb 2017 06:08:54 PM UTC Name: wget.log Size: 9kB By: None
<http://savannah.gnu.org/bugs/download.php?file_id=39762>
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?50320>