

Re: [Bug-wget] Prevent wget from redownloading when using, recursive opt

From: Allan Spiegel
Subject: Re: [Bug-wget] Prevent wget from redownloading when using, recursive option?
Date: Mon, 28 Dec 2009 13:05:45 -0500
User-agent: Thunderbird (Windows/20090812)


I have a similar issue. I'm using wget recursively as a link-checking spider. I don't save the downloaded files, so the -c and -N options won't help me. What I'd love is for wget to keep a list of the links it has already followed and skip any link on that list. As it is, I download 250K links, of which only 70K are unique. I suspect this is a feature request, but if there's a way to cut down on the redundant downloads today, I'd love to know it.

Here's the command I use:

wget --input-file=spider_pages.html --force-html --no-cache --no-check-certificate --recursive --page-requisites --no-parent -e "robots=off" --delete-after --no-directories --no-host-directories --no-verbose
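What Allan describes is the "visited set" that crawlers commonly keep: a URL is recorded when it is first queued, and any later link to it is skipped. The Python below is a minimal sketch of that idea, not wget's own code. `crawl`, `fetch`, and `normalize` are hypothetical names; `fetch` stands in for whatever retrieves a page and extracts its links, and treating URLs that differ only by their `#fragment` as the same link is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag


def normalize(url: str) -> str:
    """Drop any #fragment, so '/page#top' and '/page' count as one link (an assumption)."""
    return urldefrag(url).url


def crawl(start_urls: Iterable[str], fetch: Callable[[str], List[str]]) -> List[str]:
    """Breadth-first spider that fetches each normalized URL at most once.

    `fetch` is a hypothetical callback: it retrieves a page and returns
    the links found on it. Returns the URLs in the order they were fetched.
    """
    seen = set()      # the "list of links already followed"
    queue = deque()
    for url in start_urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            queue.append(key)

    fetched = []
    while queue:
        url = queue.popleft()
        fetched.append(url)
        for link in fetch(url):
            key = normalize(link)
            # Marking URLs as seen when they are queued (not when fetched)
            # keeps duplicates out of the queue as well.
            if key not in seen:
                seen.add(key)
                queue.append(key)
    return fetched
```

For example, with a toy site where `/a` links to `/b` twice and to `/c#top`, `crawl(["/a"], lambda u: site.get(u, []))` fetches `/a`, `/b`, and `/c` exactly once each. The memory cost is one set entry per unique URL, which for roughly 70K unique links is small.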


Message: 1
Date: Sun, 27 Dec 2009 13:10:25 -0800
From: Micah Cowan <address@hidden>
Subject: Re: [Bug-wget] Prevent wget from redownloading when using recurise option?
To: David <address@hidden>
Cc: address@hidden
Message-ID: <address@hidden>
Content-Type: text/plain; charset=ISO-8859-1

David wrote:
Is there a way to prevent wget from redownloading files it has already
downloaded when using the recursive -r option? I know that -c is used
when downloading a large file, but I wasn't sure if it could also be
used to accomplish this. It seems like even if it were set not to
download files, it would still have to check to make sure each file had
been completely downloaded. Right now it's hard for me to tell if this
is its behavior when using -rc, as the individual files are small and
thus do not take long to download (I cannot tell if wget is actually
downloading the full file or just requesting the file's size from the
server and moving on upon seeing that the file is already complete).

I typically use -rc. -rN (timestamping) is also a possibility.
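David's underlying question is what -c and -N decide for each file. The wget manual describes the rules roughly like this: with -c, wget asks the server to resume from an offset equal to the length of the local file (for HTTP this needs a server that supports the Range header); with -N, it downloads when there is no local file, when the local and remote sizes differ, or when the remote copy is newer. In both cases wget still contacts the server for each file; what these options can save is re-transferring the body. The Python below only models those documented rules under stated assumptions. The function names and parameters are hypothetical, it does no networking, and it is not wget's implementation.

```python
from typing import Optional


def continue_range_header(local_size: Optional[int]) -> Optional[str]:
    """Model of -c: the Range header to send, or None to fetch the whole file.

    local_size is the byte length of the existing local file, or None if absent.
    """
    if not local_size:  # no local file (or an empty one): nothing to resume
        return None
    # Ask only for the bytes after the ones already on disk.
    return f"bytes={local_size}-"


def timestamping_should_download(
    local_size: Optional[int],
    local_mtime: Optional[float],
    remote_size: Optional[int],
    remote_mtime: Optional[float],
) -> bool:
    """Model of -N, following the rules described in the wget manual."""
    if local_size is None:
        return True  # no local copy at all
    if remote_size is not None and remote_size != local_size:
        return True  # sizes differ: download regardless of time-stamps
    if remote_mtime is None or local_mtime is None:
        return True  # assumption: without both time-stamps, fall back to downloading
    return remote_mtime > local_mtime  # fetch only if the server copy is newer
```

Note that both rules compare against a file on disk, which is why neither helps in Allan's --delete-after setup: there is no local copy left to compare with.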

