bug-wget

[Bug-wget] What ought to be a simple use of wget


From: Dale R. Worley
Subject: [Bug-wget] What ought to be a simple use of wget
Date: Tue, 02 Aug 2016 12:38:25 -0400

I want to make a local copy of the "IANA protocol assignments" web
pages.  This ought to be a simple use of wget in recursive mode, and
surely someone else has run into this need before.  But I can't find a
combination of wget options that gives the behavior I want.

The goal is to make a local file tree that mirrors these URLs:

    http://www.iana.org/assignments/index.html
    (That page should be in a file named 'index.html'.)

    every HTML page under http://www.iana.org/assignments/ that can be
    reached from index.html

    page requisites for those pages, even if they aren't under
    http://www.iana.org/assignments/
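
The selection rule above can be restated as a per-URL test.  Below is a
minimal sketch of a hypothetical shell function, `wanted`, written only
to pin down the intended behavior.  It is not a wget option.  It does
not check whether a page is really HTML, and reachability from
index.html is left to wget's recursion.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): restates the selection rule above.
# Usage: wanted URL KIND
#   KIND is "link" for a page reached through a hyperlink,
#   or "requisite" for an inline image, stylesheet, or script.
# Exit status 0 means the URL belongs in the local copy.
wanted() {
    url=$1
    kind=$2
    case $kind in
        requisite)
            # Page requisites are wanted wherever they live.
            return 0 ;;
        link)
            # Linked pages are wanted only under /assignments/.
            # (Whether the page is really HTML is not checked here.)
            case $url in
                http://www.iana.org/assignments/*) return 0 ;;
                *) return 1 ;;
            esac ;;
        *)
            echo "wanted: unknown kind '$kind'" >&2
            return 2 ;;
    esac
}
```

For example, `wanted http://www.iana.org/assignments/index.html link`
succeeds.  A link to a page elsewhere on the site (say, a made-up
http://www.iana.org/about/) fails unless it is a requisite.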

The difficulty comes from everything under http://www.iana.org that is
not under http://www.iana.org/assignments/ but is linked from the pages
listed above.

The simpler part of the problem is already handled: --page-requisites
does fetch the page requisites, even if they aren't under
http://www.iana.org/assignments/.  So that part of the solution works
fine.

But I can't figure out the right combination of options to fetch the
HTML files that I want:


wget --mirror --convert-links --no-parent --page-requisites \
    http://www.iana.org/assignments/index.html
Follows links outside of /assignments/.

wget --mirror --convert-links --exclude-directories=/ --page-requisites \
    http://www.iana.org/assignments/index.html
This doesn't recurse beyond index.html.

wget --mirror --convert-links --no-parent --page-requisites \
    http://www.iana.org/assignments
Follows links outside of /assignments/.

wget --mirror --convert-links --exclude-directories=/ --page-requisites \
    http://www.iana.org/assignments
This doesn't recurse beyond index.html.

wget --mirror --convert-links --no-parent --page-requisites \
    http://www.iana.org/assignments/
This doesn't recurse beyond index.html.

wget --mirror --convert-links --exclude-directories=/ --page-requisites \
    http://www.iana.org/assignments/
This doesn't recurse beyond index.html.
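
Two wget rules may account for some of these results.  The wget manual
notes that for HTTP the trailing slash matters to --no-parent: in
http://www.iana.org/assignments (no slash), "assignments" is taken as a
file name, so the parent is "/" and --no-parent restricts nothing.  And
it is plausible that an -X entry of "/" covers every directory,
/assignments/ included, so recursion stops at the start page.  The
functions below (url_dir, no_parent_ok, excluded) are hypothetical, a
rough model of those two rules assuming one host, no query strings, and
-X entries matched as whole leading path components.  They are not
wget's code, and they don't cover the index.html and trailing-slash
--no-parent results.

```shell
#!/bin/sh
# Rough, hypothetical model of two wget rules; this is NOT wget's source.
# Assumes the same host throughout and URLs without query strings.

# Directory of a URL as --no-parent sees it: the path up to and including
# the last "/". Without a trailing slash, the last component is a file name.
url_dir() {
    path=${1#*://}                 # drop the scheme, e.g. "http://"
    case $path in
        */*) path=/${path#*/} ;;   # keep only the path after the host
        *)   path=/ ;;             # bare host name: the path is "/"
    esac
    printf '%s\n' "${path%/*}/"    # strip the last component, keep the "/"
}

# --no-parent model: CANDIDATE ($2) is allowed if its directory lies at
# or below the directory of START ($1).
no_parent_ok() {
    start_dir=$(url_dir "$1")
    cand_dir=$(url_dir "$2")
    case $cand_dir in
        "$start_dir"*) return 0 ;;
        *) return 1 ;;
    esac
}

# --exclude-directories model (an assumption about wget's matching):
# a leading "/" is dropped from both sides, and ENTRY ($1) matches DIR ($2)
# when it equals DIR or is a leading run of whole path components of DIR.
# An entry of "/" therefore becomes "" and matches every directory.
excluded() {
    entry=${1#/}
    dir=${2#/}
    [ -z "$entry" ] && return 0
    case $dir in
        "$entry" | "$entry"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Under this model, `url_dir http://www.iana.org/assignments` prints "/"
while `url_dir http://www.iana.org/assignments/` prints "/assignments/",
and `excluded / /assignments/` succeeds, which is consistent with every
-X / command stopping at index.html.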


I'm hoping that this is a known problem and someone can tell me the
answer without having to think about it.

I also think the documentation could be made clearer in some places, but
that can wait.

Dale
