bug-wget

[Bug-wget] hotlinked page requisites


From: Ben Fino-Radin
Subject: [Bug-wget] hotlinked page requisites
Date: Tue, 27 Mar 2012 13:18:54 -0400

Hello wget list,

A question that has been bugging me for quite some time…

If a site has a large number of hotlinked images, videos, etc., how could
one perform an infinite recursive crawl that includes those hotlinked page
requisites without invoking -H, which would grab unwanted material and in
some cases get way out of control?
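
To make the desired behavior concrete, here is a rough sketch of the scope
rule in Python. It is not wget functionality, and the names LinkClassifier,
REQUISITE_TAGS and plan_fetches are hypothetical, chosen only for this
illustration. The assumption is that links in <a href> are recursed into
only when they stay on the starting host, while embedded requisites are
fetched from any host but never recursed into.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumed set of tags whose src attribute counts as a page requisite.
REQUISITE_TAGS = {"img", "video", "audio", "source", "embed", "script"}


class LinkClassifier(HTMLParser):
    """Split a page's URLs into navigation links and embedded requisites."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []       # <a href> targets: candidates for recursion
        self.requisites = []  # src targets: needed to render this page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag in REQUISITE_TAGS and attrs.get("src"):
            self.requisites.append(urljoin(self.base_url, attrs["src"]))


def plan_fetches(page_url, html):
    """Return (urls_to_recurse_into, urls_to_fetch_without_recursing)."""
    parser = LinkClassifier(page_url)
    parser.feed(html)
    home = urlsplit(page_url).hostname
    # Recursion stays on the starting host, as wget does without -H.
    to_recurse = [u for u in parser.links if urlsplit(u).hostname == home]
    # Requisites are fetched wherever they live, but never followed further.
    to_fetch_only = list(parser.requisites)
    return to_recurse, to_fetch_only
```

With a rule like this, an offsite <a href> is ignored, while an offsite
<img src> is still downloaded once.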

Heritrix has an option for this:
https://webarchive.jira.com/wiki/display/Heritrix/unexpected+offsite+content

Httrack has an option, using the --near flag:
http://www.httrack.com/html/fcguide.html

This is essentially the only thing preventing me from solely using wget for
my web archiving needs… am I missing something?

Thanks,
Ben

