Re: [Bug-wget] Bug in WGET?


From: Patrick Steil
Subject: Re: [Bug-wget] Bug in WGET?
Date: Sat, 23 Jul 2011 16:26:57 -0500

Thanks!

I will try this when I get time...

One note: I am now using the --spider option, and it looks like it also
downloads each file, saves it to disk, and removes it when it is done. I
don't mind that, but it doesn't match the documentation.

Also, when I run wget in spider mode, the end of the log output lists all
the broken links. But I also need to know which page each broken link
appears on (when that page is on the site I am crawling). That would help
me find the source of the 404s on my site.
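
To make the idea concrete, here is a rough sketch in Python of the kind of
report I mean. This is not wget code and says nothing about how wget works
internally; it is only an illustration. The function name find_broken_links
and the fetch callable (anything that returns a status code and the page
HTML) are made up for the example, and the tiny in-memory "site" stands in
for real HTTP requests.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(start_url, fetch):
    """Crawl one host breadth-first and map broken URLs to their referrers.

    fetch(url) is a hypothetical callable returning (status, html_text).
    Returns {broken_url: sorted list of pages that link to it}.
    """
    host = urlparse(start_url).netloc
    referrers = defaultdict(set)  # target URL -> pages linking to it
    broken = set()
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        status, body = fetch(page)
        if status >= 400:
            broken.add(page)
            continue
        if urlparse(page).netloc != host:
            continue  # external pages are checked but not followed
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            target = urldefrag(urljoin(page, href)).url
            if urlparse(target).scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, and similar
            referrers[target].add(page)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return {url: sorted(referrers[url]) for url in broken}


# Stand-in for a real site: two pages that both link to a missing one.
site = {
    "http://example.org/": (200, '<a href="/news">n</a> <a href="/gone">g</a>'),
    "http://example.org/news": (200, '<a href="/gone">g</a> <a href="/">h</a>'),
}


def fake_fetch(url):
    # Any URL not in the dictionary is treated as a 404.
    return site.get(url, (404, ""))


report = find_broken_links("http://example.org/", fake_fetch)
for url, pages in report.items():
    print(url, "is linked from", ", ".join(pages))
```

The point is the last step: each broken URL is listed together with every
page that links to it, so I would know where to go and fix the link.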

I have a vision for how this should work to make it awesome...

Any way to do that, or anyone want to add this functionality?

Thanks!


On Sat, Jul 23, 2011 at 7:12 AM, Giuseppe Scrivano <address@hidden> wrote:

> Hello,
>
> Patrick Steil <address@hidden> writes:
>
> > If I run this command:
> >
> > wget www.domain.org/news?page=1 options= -r --no-clobber --html-extension
> > --convert-links -np --include-directories=news
>
> > Here is what it does today:
> >
> > 1.  When --html-extension is turned on, --no-clobber is not changing the
> > name of the downloaded files, but it DOES rewrite the file, as the date/time
> > stamp changes every time I run the above command.
>
> I couldn't reproduce it.  I have `strace'd but I can't see any syscall
> which could modify the time stamp.  Can you please attach the strace
> and the wget debug log?  You can get it by:
>
> strace -o strace.log wget <args> -d -o wget.log
>
>
>
> > 2.  If I turn off --html-extension, then as soon as WGET sees that the first
> > file has already been downloaded it stops and does not continue to
> > spider/download any further pages.
>
> AFAICS, the behaviour you get using --no-clobber and -r is documented,
> and it should work exactly as you described it (a newer version is
> ignored).  The old version is still traversed for links.
>
> Cheers,
> Giuseppe
>



-- 

Patrick Steil  |  ChurchBuzz.org

Church Website Optimization <http://www.churchbuzz.org/>
Like us on Facebook <http://facebook.com/churchbuzz>!

Mobile: 940-391-9250

