[Bug-wget] hotlinked page requisites
From: Ben Fino-Radin
Subject: [Bug-wget] hotlinked page requisites
Date: Tue, 27 Mar 2012 13:18:54 -0400
Hello wget list,
A question that has been bugging me for quite some time…
If a site has a large number of hotlinked images, videos, etc., how could
one perform an infinite recursive crawl that also fetched those hotlinked
resources, without invoking -H, which would grab unwanted material and in
some cases get far out of control?
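One partial workaround (a sketch of my own, not something wget documents for this exact purpose): if the hotlinking hosts are known in advance, -H can be constrained with -D/--domains so that host spanning is limited to a whitelist. The host names below are placeholders.

```shell
# Hypothetical workaround: enable host spanning (-H) but restrict it to an
# explicit whitelist of hosts with -D/--domains.
# example.com stands in for the site being archived; cdn.example.net and
# img.example.org stand in for third-party hosts serving its hotlinked media.
wget --recursive --level=inf \
     --page-requisites \
     --span-hosts \
     --domains=example.com,cdn.example.net,img.example.org \
     http://example.com/
```

This still recurses into the whitelisted hosts rather than fetching only the directly hotlinked files, so it only approximates what Heritrix and HTTrack offer, and it requires knowing the hotlink hosts up front.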
Heritrix has an option for this:
https://webarchive.jira.com/wiki/display/Heritrix/unexpected+offsite+content
HTTrack has a similar option, the --near flag:
http://www.httrack.com/html/fcguide.html
This is essentially the only thing preventing me from using wget exclusively
for my web archiving needs… am I missing something?
Thanks,
Ben