Re: [Bug-wget] What ought to be a simple use of wget
From: Ander Juaristi
Subject: Re: [Bug-wget] What ought to be a simple use of wget
Date: Tue, 2 Aug 2016 19:43:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0

Hi Dale,
From what I can see, it always redirects to www.iana.org/protocols.
Would -A protocols work for you?
e.g.
wget --mirror --convert-links --no-parent --page-requisites -A
protocols http://www.iana.org/protocols
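
For reference, here is a rough sketch of how a single -A element is
matched, going by the wget manual: an element containing a wildcard
(*, ?, [ or ]) is treated as a pattern, anything else as a file name
suffix. The accepts function below is a hypothetical helper written
only for illustration. It is not part of wget, and it ignores details
such as comma-separated lists and wget's handling of HTML files during
recursion.

```shell
# Hypothetical helper, NOT part of wget: approximates how one -A element
# is compared against a file name (wildcard => pattern, otherwise suffix).
accepts() {
    element=$1
    name=$2
    case $element in
        *\**|*\?*|*\[*|*\]*)
            # The element contains a wildcard: match it as a glob pattern.
            case $name in
                $element) return 0 ;;
            esac
            ;;
        *)
            # No wildcard: accept any name that ends with the element.
            case $name in
                *"$element") return 0 ;;
            esac
            ;;
    esac
    return 1
}

# Example calls; the file names are made up for illustration.
for name in protocols index.html; do
    if accepts protocols "$name"; then
        echo "$name: accepted by -A protocols"
    else
        echo "$name: rejected by -A protocols"
    fi
done
```

Under this sketch, "protocols" is accepted by -A protocols because it
ends with that suffix, while "index.html" is not.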
On 02/08/16 18:38, Dale R. Worley wrote:
> I want to make a local copy of the "IANA protocol assignments" web
> pages. It seems to me that this ought to be a simple use of wget in
> recursive mode, and indeed, it seems like someone else must have run
> into this need before. But I can't get a combination of wget options
> that has the behavior I want.
>
> The goal is to make a local file tree that mirrors these URLs:
>
> http://www.iana.org/assignments/index.html
> (That page should be in a file named 'index.html'.)
>
> every HTML page under http://www.iana.org/assignments/ that can be
> reached from index.html
>
> page requisites for those pages, even if they aren't under
> http://www.iana.org/assignments/
>
> The interference comes from all the stuff under http://www.iana.org that
> is not under http://www.iana.org/assignments, but which is pointed to by
> the pages listed above.
>
> To resolve the simple problem, it appears that --page-requisites does
> fetch the page requisites, even if they aren't under
> http://www.iana.org/assignments/. So that part of the solution works
> fine.
>
> But I can't figure out the right combination of options to fetch the
> HTML files that I want:
>
>
> wget --mirror --convert-links --no-parent --page-requisites
> http://www.iana.org/assignments/index.html
> Follows links outside of /assignments/.
>
> wget --mirror --convert-links --exclude-directories=/ --page-requisites
> http://www.iana.org/assignments/index.html
> This doesn't recurse beyond index.html.
>
> wget --mirror --convert-links --no-parent --page-requisites
> http://www.iana.org/assignments
> Follows links outside of /assignments/.
>
> wget --mirror --convert-links --exclude-directories=/ --page-requisites
> http://www.iana.org/assignments
> This doesn't recurse beyond index.html.
>
> wget --mirror --convert-links --no-parent --page-requisites
> http://www.iana.org/assignments/
> This doesn't recurse beyond index.html.
>
> wget --mirror --convert-links --exclude-directories=/ --page-requisites
> http://www.iana.org/assignments/
> This doesn't recurse beyond index.html.
>
>
> I'm hoping that this is a known problem and someone can tell me the
> answer without having to think about it.
>
> I also think the documentation could be made clearer in some places, but
> that can wait.
>
> Dale
>