bug-wget

Re: [Bug-wget] wget not stop when using -e robots=off option


From: Tim Ruehsen
Subject: Re: [Bug-wget] wget not stop when using -e robots=off option
Date: Wed, 30 Nov 2016 10:11:15 +0100
User-agent: KMail/5.2.3 (Linux/4.8.0-1-amd64; KDE/5.28.0; x86_64; ; )

On Sunday, November 27, 2016 5:40:09 PM CET Sethi Badhan wrote:
> Hello
> 
> When I run wget in a for loop it works fine, but with -e robots=off it
> does not stop: it keeps downloading pages recursively even though I have
> set a limit on the for loop. Here is my code:
> 
> #!/bin/bash
> 
> lynx --dump https://en.wikipedia.org/wiki/Cloud_computing | awk '/http/{print $2}' | grep 'https://en\.' | grep -v '\.svg\|\.png\|\.jpg\|\.pdf\|\.JPG\|\.php' > Pages.txt
> grep -vwE "(http://www.enterprisecioforum.com/en/blogs/gabriellowy/value-data-platform-service-dpaas)" Pages.txt > newpage.txt
> rm Pages.txt
> egrep -v "#|^$" newpage.txt > try.txt
> awk '!a[$0]++' try.txt > new.txt
> rm newpage.txt
> rm try.txt
> mkdir -p htmlpagesnew
> cd htmlpagesnew
> j=0
> for i in $(cat ../new.txt); do
>     if [ "$j" -lt 10 ]; then
>         let j=j+1
>         echo "$j"
>         wget -N -nd -r -e robots=off --wait=.25 "$i"
>     fi
> done

Maybe you don't want '-r' ?

robots=off circumvents the robots.txt exclusion list... so with '-r' wget may 
download much more (and thus perhaps 'never' stop). 
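One way to keep the limit honest is to drop '-r' and let head pick the first ten URLs, so each page is fetched exactly once. A dry-run sketch (the leading 'echo' prints the wget command instead of running it; the generated list is a stand-in for the real new.txt):

```shell
# Stand-in for new.txt (the real file comes from the pipeline above).
printf 'https://example.org/page%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > new.txt

# First 10 URLs, one non-recursive fetch each; remove the leading
# 'echo' to actually run wget.
head -n 10 new.txt | while IFS= read -r url; do
    echo wget -N -nd -e robots=off --wait=.25 "$url"
done
```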

Tim


