Re: [Bug-wget] Prevent wget from redownloading when using, recursive option?
Mon, 28 Dec 2009 13:05:45 -0500
Thunderbird 18.104.22.168 (Windows/20090812)
I have a similar issue. I'm using wget recursively as a link checking
spider. I don't save the files downloaded, so the -c and -N options
won't help me. What I'd love is if wget could keep a list of the links
it follows and not follow any link on the list. As it is, I download
250K links of which only 70K are unique.
I'm thinking this is a feature request, but if there's a way I can cut
down on the extra downloads today, I'd love to know it.
Here's the command I use:
wget --input-file=spider_pages.html --force-html --no-cache
--no-check-certificate --recursive --page-requisites --no-parent -e
"robots=off" --delete-after --no-directories --no-host-directories
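The "keep a list of the links it follows" idea described above can be sketched outside wget. This is only an illustration under stated assumptions, not how wget works internally: `crawl_unique`, `normalize`, and the `get_links` callback (something that fetches a page and returns the href values found on it) are hypothetical names made up for the sketch.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin


def normalize(base: str, href: str) -> str:
    """Resolve href against base and drop any #fragment."""
    return urldefrag(urljoin(base, href)).url


def crawl_unique(
    start_urls: Iterable[str],
    get_links: Callable[[str], Iterable[str]],  # hypothetical: fetch a page, return its hrefs
    max_pages: int = 1000,
) -> List[str]:
    """Visit each distinct URL at most once, breadth-first.

    Returns the URLs in the order they were requested.
    """
    seen = set()     # every URL ever queued, so it is never queued twice
    queue = deque()  # URLs waiting to be fetched

    for url in start_urls:
        clean = urldefrag(url).url
        if clean not in seen:
            seen.add(clean)
            queue.append(clean)

    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        fetched.append(url)
        for href in get_links(url):
            target = normalize(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return fetched
```

With a `get_links` that actually issues the request and reports failures, each of the unique links would be checked once no matter how many pages point at it.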
Date: Sun, 27 Dec 2009 13:10:25 -0800
From: Micah Cowan <address@hidden>
Subject: Re: [Bug-wget] Prevent wget from redownloading when using
To: David <address@hidden>
Content-Type: text/plain; charset=ISO-8859-1
Is there a way to prevent wget from redownloading files it has already
downloaded when using the recursive -r option? I know that -c is used
when downloading a large file but I wasn't sure if it also could be
used to accomplish this. It seems like even if it was set not to
download files it would still have to check to make sure the file had
been completely downloaded. Right now it's hard for me to tell if this
is its behavior when using -rc as the individual files are small and
thus do not take long to download (I cannot tell if wget is actually
downloading the full file or just requesting the file's size from the
server and moving on upon seeing that the file is already complete).
I typically use -rc. -rN is also a possibility.
- Re: [Bug-wget] Prevent wget from redownloading when using, recursive option?,
Allan Spiegel <=