Re: [Bug-wget] What ought to be a simple use of wget
From: Dale R. Worley
Subject: Re: [Bug-wget] What ought to be a simple use of wget
Date: Wed, 03 Aug 2016 11:46:22 -0400
Matthew White <address@hidden> writes:
> wget --recursive \
> --page-requisites \
> --convert-links \
> --domains="www.iana.org" \
> --reject "robots.txt","reports","contact" \
> --exclude-directories="/go,/assignments,/_img,/_js,/_css,/domains,/performance,/about,/protocols,/procedures,/dnssec,/reports,/help,/abuse,/numbers,/reviews,/time-zones,/2000,/2001" \
> http://www.iana.org/assignments/index.html
True, using --exclude-directories I can isolate what I want, but as you
note, that requires knowing all of the children of the root in advance.
It seems to me that there should be a straightforward way of instructing
wget to exclude "everything but X".
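For illustration, here is a rough sketch of what an "everything but X"
invocation might look like, using wget's --include-directories and
--no-parent options. The helper name keep_only is hypothetical, and it
only prints the command rather than running it. Whether page requisites
that live outside /assignments (for example under /_css or /_js) are
still fetched with these options is an assumption to check against the
manual for the installed wget version.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print, without running, a wget
# command that tries to keep a recursive crawl inside one directory.
keep_only() {
    dir=$1    # directory to keep, e.g. /assignments
    url=$2    # starting URL for the crawl
    # --no-parent: do not ascend above the starting directory.
    # --include-directories: only follow links into the listed directories.
    printf '%s\n' "wget --recursive --no-parent --page-requisites --convert-links --include-directories=$dir $url"
}

# Print the command for the case discussed in this thread.
keep_only /assignments http://www.iana.org/assignments/index.html
```

Printing the command first makes it easy to inspect the option list
before letting wget loose on the site.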
> wget --recursive \
> --no-clobber \
> --page-requisites \
> --adjust-extension \
> --convert-links \
> --span-hosts \
> --domains="www.iana.org" \
> http://www.iana.org/assignments/index.html
As you said, that command returned lots of things that aren't in
http://www.iana.org/assignments.
Dale