Re: [Bug-wget] What ought to be a simple use of wget
From: Tim Rühsen
Subject: Re: [Bug-wget] What ought to be a simple use of wget
Date: Tue, 02 Aug 2016 20:15:45 +0200
User-agent: KMail/5.2.3 (Linux/4.6.0-1-amd64; KDE/5.23.0; x86_64; ; )
Hi Dale,
If you have a look at the --page-requisites entry in 'man wget', this is
explained quite well. To me it looks like you are missing --level 2.
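For illustration, the first command from the quoted message with the depth
limit added might look like the sketch below. It assumes that --level=2 given
after --mirror overrides the unlimited depth that --mirror implies, and that a
depth of 2 is enough for this site; neither has been checked against
www.iana.org. The variable names are only for illustration, and the script
prints the command instead of running it.

```shell
# Sketch only: builds and prints the command; nothing is downloaded.
url='http://www.iana.org/assignments/index.html'

# --mirror implies unlimited recursion depth; --level=2 placed after it is
# assumed to cap the depth at 2 (assumption, not verified against this site).
opts='--mirror --level=2 --convert-links --no-parent --page-requisites'

cmd="wget $opts $url"
printf '%s\n' "$cmd"

# To actually run it, uncomment the next line:
# $cmd
```

If the result still pulls in pages outside /assignments/, the depth limit alone
is probably not the whole answer.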
If --level 2 is not what you want, you could make your point clear by making
up a small document tree as an example.
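Such an example tree could be as small as the sketch below. Every file and
directory name in it is made up for illustration; it only mimics the situation
described in the quoted message (a start page, a wanted page under
/assignments/, an unwanted page outside it, and a page requisite outside it).
Serving the tree over HTTP is left out here.

```shell
# Hypothetical layout; all names below are invented for illustration.
tree=$(mktemp -d)
mkdir -p "$tree/assignments/sub" "$tree/other" "$tree/style"

# Start page: one link inside /assignments/, one link outside it,
# and one page requisite (a stylesheet) outside it.
cat > "$tree/assignments/index.html" <<'EOF'
<html><head><link rel="stylesheet" href="/style/site.css"></head>
<body>
<a href="sub/page.html">wanted: under /assignments/</a>
<a href="/other/page.html">unwanted: outside /assignments/</a>
</body></html>
EOF

# A page under /assignments/ that should be part of the mirror.
cat > "$tree/assignments/sub/page.html" <<'EOF'
<html><body><a href="../index.html">back to index</a></body></html>
EOF

# A page outside /assignments/ that should not be mirrored,
# and the stylesheet that should be fetched as a requisite.
printf '<html><body>outside the subtree</body></html>\n' > "$tree/other/page.html"
printf 'body { margin: 0; }\n' > "$tree/style/site.css"

# List the files so the wanted and the actual mirror can be compared against it.
(cd "$tree" && find . -type f | sort)
```

With a tree like this, the files you expect in the local copy can be listed
next to the files a given set of options actually produces.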
Regards, Tim
On Tuesday, 2 August 2016 12:38:25 CEST Dale R. Worley wrote:
> I want to make a local copy of the "IANA protocol assignments" web
> pages. It seems to me that this ought to be a simple use of wget in
> recursive mode, and indeed, it seems like someone else must have run
> into this need before. But I can't get a combination of wget options
> that has the behavior I want.
>
> The goal is to make a local file tree that mirrors these URLs:
>
> http://www.iana.org/assignments/index.html
> (That page should be in a file named 'index.html'.)
>
> every HTML page under http://www.iana.org/assignments/ that can be
> reached from index.html
>
> page requisites for those pages, even if they aren't under
> http://www.iana.org/assignments/
>
> The interference comes from all the stuff under http://www.iana.org that
> is not under http://www.iana.org/assignments, but which is pointed to by
> the pages listed above.
>
> To dispose of the simple part first: it appears that --page-requisites
> does fetch the page requisites, even if they aren't under
> http://www.iana.org/assignments/. So that part of the solution works
> fine.
>
> But I can't figure out the right combination of options to fetch the
> HTML files that I want:
>
>
>   wget --mirror --convert-links --no-parent --page-requisites http://www.iana.org/assignments/index.html
>     Follows links outside of /assignments/.
>
>   wget --mirror --convert-links --exclude-directories=/ --page-requisites http://www.iana.org/assignments/index.html
>     This doesn't recurse beyond index.html.
>
>   wget --mirror --convert-links --no-parent --page-requisites http://www.iana.org/assignments
>     Follows links outside of /assignments/.
>
>   wget --mirror --convert-links --exclude-directories=/ --page-requisites http://www.iana.org/assignments
>     This doesn't recurse beyond index.html.
>
>   wget --mirror --convert-links --no-parent --page-requisites http://www.iana.org/assignments/
>     This doesn't recurse beyond index.html.
>
>   wget --mirror --convert-links --exclude-directories=/ --page-requisites http://www.iana.org/assignments/
>     This doesn't recurse beyond index.html.
>
>
> I'm hoping that this is a known problem and someone can tell me the
> answer without having to think about it.
>
> I also think the documentation could be made clearer in some places, but
> that can wait.
>
> Dale