[Bug-wget] What ought to be a simple use of wget
From: Dale R. Worley
Subject: [Bug-wget] What ought to be a simple use of wget
Date: Tue, 02 Aug 2016 12:38:25 -0400
I want to make a local copy of the "IANA protocol assignments" web
pages. It seems to me that this ought to be a simple use of wget in
recursive mode, and indeed, it seems like someone else must have run
into this need before. But I can't get a combination of wget options
that has the behavior I want.
The goal is to make a local file tree that mirrors these URLs:
  - http://www.iana.org/assignments/index.html
    (That page should be in a file named 'index.html'.)
  - every HTML page under http://www.iana.org/assignments/ that can be
    reached from index.html
  - page requisites for those pages, even if they aren't under
    http://www.iana.org/assignments/
The trouble comes from all the pages under http://www.iana.org that are
not under http://www.iana.org/assignments/ but are linked from the
pages listed above.
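
To pin down the selection rule I have in mind, here is a minimal sketch
in shell. It is only an illustration of the goal, not anything wget
does internally; the function name want_url and the "page"/"requisite"
labels are made up for this example, and the non-/assignments/ URLs in
the usage note are illustrative paths only.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a URL belongs in the desired mirror.
#   $1 = URL
#   $2 = "page" for an ordinary followed link,
#        "requisite" for an inline resource (image, stylesheet, ...)
PREFIX="http://www.iana.org/assignments/"

want_url() {
    url=$1
    kind=$2
    # Page requisites are wanted wherever they live on the site.
    if [ "$kind" = "requisite" ]; then
        return 0
    fi
    # Ordinary links are wanted only when they fall under the prefix.
    case "$url" in
        "$PREFIX"*) return 0 ;;
        *)          return 1 ;;
    esac
}
```

For example, want_url http://www.iana.org/assignments/index.html page
succeeds, want_url http://www.iana.org/about page fails, and
want_url http://www.iana.org/_css/site.css requisite succeeds.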
To dispose of the simple part first: it appears that --page-requisites
does fetch the page requisites, even if they aren't under
http://www.iana.org/assignments/. So that part of the solution works
fine.
But I can't figure out the right combination of options to fetch the
HTML files that I want:

    wget --mirror --convert-links --no-parent --page-requisites \
        http://www.iana.org/assignments/index.html

Follows links outside of /assignments/.

    wget --mirror --convert-links --exclude-directories=/ --page-requisites \
        http://www.iana.org/assignments/index.html

This doesn't recurse beyond index.html.

    wget --mirror --convert-links --no-parent --page-requisites \
        http://www.iana.org/assignments

Follows links outside of /assignments/.

    wget --mirror --convert-links --exclude-directories=/ --page-requisites \
        http://www.iana.org/assignments

This doesn't recurse beyond index.html.

    wget --mirror --convert-links --no-parent --page-requisites \
        http://www.iana.org/assignments/

This doesn't recurse beyond index.html.

    wget --mirror --convert-links --exclude-directories=/ --page-requisites \
        http://www.iana.org/assignments/

This doesn't recurse beyond index.html.

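
For anyone who wants to reproduce the attempts, the six command lines
are just two scoping options crossed with three spellings of the start
URL. The sketch below only prints them; the function name
list_attempts is made up for this example, and the order differs from
the listing above.

```shell
#!/bin/sh
# Print (do not run) every combination of scoping option and start URL
# tried above: 2 scoping options x 3 start URLs = 6 command lines.
list_attempts() {
    for scope in --no-parent --exclude-directories=/; do
        for url in http://www.iana.org/assignments/index.html \
                   http://www.iana.org/assignments \
                   http://www.iana.org/assignments/; do
            echo "wget --mirror --convert-links $scope --page-requisites $url"
        done
    done
}

list_attempts
```

Each printed command is best run in its own empty directory, since
every run writes into a www.iana.org/ tree under the current directory
and the results would otherwise be mixed together.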
I'm hoping that this is a known problem and someone can tell me the
answer without having to think about it.
I also think the documentation could be made clearer in some places, but
that can wait.
Dale