[Bug-wget] hotlinked page requisites

From:	Ben Fino-Radin
Subject:	[Bug-wget] hotlinked page requisites
Date:	Tue, 27 Mar 2012 13:18:54 -0400

Hello wget list,

A question that has been bugging me for quite some time…

If a site has a large number of hotlinked images, videos, etc., how could
one perform an infinite recursive crawl that includes those hotlinked
resources without invoking -H, which would grab unwanted material and in some
cases get way out of control?

Heritrix has an option for this:
https://webarchive.jira.com/wiki/display/Heritrix/unexpected+offsite+content

Httrack has an option, using the --near flag:
http://www.httrack.com/html/fcguide.html

This is essentially the only thing preventing me from solely using wget for
my web archiving needs… am I missing something?

Thanks,
Ben
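
The crawl policy asked about above can be illustrated with a minimal Python sketch. It is not wget code, and it does not claim to reflect how Heritrix or HTTrack implement their options. The names `LinkClassifier` and `plan_fetches` are hypothetical, and the set of tags treated as page requisites is an assumption kept deliberately small. The idea: recurse only into links on the starting host, but fetch embedded resources from any host, without recursing into them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumption: these (tag, attribute) pairs embed a resource in the page.
REQUISITE_ATTRS = {("img", "src"), ("script", "src"), ("video", "src"),
                   ("audio", "src"), ("source", "src"), ("embed", "src")}
# Assumption: only <a href> counts as a navigational link.
NAVIGATION_ATTRS = {("a", "href")}


class LinkClassifier(HTMLParser):
    """Hypothetical helper: split a page's URLs into requisites and links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.requisites = []
        self.navigation = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            url = urljoin(self.base_url, value)  # resolve relative URLs
            if (tag, name) in REQUISITE_ATTRS:
                self.requisites.append(url)
            elif (tag, name) in NAVIGATION_ATTRS:
                self.navigation.append(url)


def plan_fetches(base_url, html):
    """Return (links to recurse into, requisites to fetch once)."""
    parser = LinkClassifier(base_url)
    parser.feed(html)
    home = urlsplit(base_url).hostname
    # Recurse only within the starting host (no host spanning for links).
    to_recurse = [u for u in parser.navigation
                  if urlsplit(u).hostname == home]
    # Requisites are fetched from any host, but never recursed into.
    to_fetch_once = parser.requisites
    return to_recurse, to_fetch_once
```

Under this policy an offsite `<a href>` is ignored while an offsite `<img src>` is still downloaded, which is the distinction that a plain -H crawl does not make.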
