[Bug-wget] [bug #48708] Wget downloads file but refuses to examine it fo
From: Dale Worley
Subject: [Bug-wget] [bug #48708] Wget downloads file but refuses to examine it for links to follow
Date: Fri, 5 Aug 2016 15:15:16 +0000 (UTC)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0
URL:
<http://savannah.gnu.org/bugs/?48708>
Summary: Wget downloads file but refuses to examine it for
links to follow
Project: GNU Wget
Submitted by: worley
Submitted on: Fri 05 Aug 2016 03:15:13 PM GMT
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: worley
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: 1.16.1
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Details:
To demonstrate (including useful debugging output):
$ wget -d -r --include-directories=/assignments,/protocols http://www.iana.org/protocols/index.html
Naively, I expect wget to download the index.html file and scan it for links
to recurse on.
The complication seems to arise from the fact that /protocols/index.html is
redirected to http://www.iana.org/protocols. That file is fetched and stored
as www.iana.org/protocols/index.html; that is, the file name is based on the
original URL, not the redirected one.
However, wget does not examine the file for links to follow. Wget gives the
following messages:
Deciding whether to enqueue "http://www.iana.org/protocols".
http://www.iana.org/protocols () is excluded/not-included.
Decided NOT to load it.
Redirection "http://www.iana.org/protocols" failed the test.
This is unexpected; I expect that the file is treated consistently in regard
to (1) whether to download it, (2) what file name to store it in, and (3)
whether to examine it for links, in that all three decisions would be made
based on either the original URL or the ultimate redirected URL. (The
decision to use the original URL seems to be the correct choice to me.) But
wget's behavior is to make decision (3) based on the redirected name, not the
original name.
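
The mismatch can be illustrated with a small model of the include-directory
test. This is only a sketch under stated assumptions, not wget's actual code:
the helper names (url_directory, is_included) are hypothetical, and it assumes
the test compares the directory part of the URL path against the include list,
treating the last path segment as a file name. That assumption matches the
empty "()" shown in the debug output for the redirected URL.

```python
from posixpath import dirname
from urllib.parse import urlsplit

# Hypothetical model of the --include-directories check; not wget's real code.
INCLUDE_DIRS = ["/assignments", "/protocols"]


def url_directory(url):
    """Return the directory part of the URL path.

    The last path segment is treated as a file name, so
    "/protocols/index.html" yields "/protocols" and "/protocols" yields "".
    """
    directory = dirname(urlsplit(url).path)
    return "" if directory == "/" else directory


def is_included(url, include_dirs=INCLUDE_DIRS):
    """True if the URL's directory is an included directory or lies below one."""
    directory = url_directory(url)
    return any(
        directory == inc or directory.startswith(inc + "/")
        for inc in include_dirs
    )


original = "http://www.iana.org/protocols/index.html"
redirected = "http://www.iana.org/protocols"

# The original URL passes the test; the redirect target does not.
print(repr(url_directory(original)), is_included(original))
print(repr(url_directory(redirected)), is_included(redirected))
```

Under this model, applying the test to the original URL for all three
decisions would give a consistent answer, whereas applying it to the
redirected URL for decision (3) alone produces the behavior reported here.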
In addition, as I read the documentation, wget will read all URLs that are
named on the command line, regardless of whether they meet the include/exclude
criteria, so I expect that with -r all of those URLs would be scanned for
links. However, it is clear that wget does not always scan a provided URL for
links.
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?48708>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/