
From: Tim Ruehsen
Subject: [Bug-wget] [bug #30999] wget should respect robots.txt directive crawl-delay
Date: Thu, 09 Apr 2015 20:25:43 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.6.0

Follow-up Comment #6, bug #30999 (project wget):

Crawl-delay is host/domain specific. Thus a wget -r 'domain1 domain2 domain3'
can't simply wait 'crawl-delay' seconds after each download. We need
host-specific logic when dequeuing the next file. Also, how does --wait come
into play? The user might be able to override crawl-delay for domain1 but not
for domain2 and domain3.
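
To illustrate the dequeuing problem, here is a minimal sketch (in Python, not
wget's actual C internals) of a per-host scheduler: each host gets its own
"next allowed fetch" timestamp, and a per-host user override takes precedence
over the robots.txt value. The names (crawl_delay, wait_override, next_ok)
are illustrative assumptions, not anything in wget's source.

```python
import time
from collections import deque

class HostScheduler:
    """Sketch: dequeue the next URL whose host's crawl-delay has elapsed.

    crawl_delay:   host -> seconds from that host's robots.txt
    wait_override: host -> user-forced delay (think --wait), per host
    (Hypothetical names; this is not wget's real data structure.)
    """

    def __init__(self, crawl_delay, wait_override=None):
        self.crawl_delay = crawl_delay
        self.wait_override = wait_override or {}
        self.next_ok = {}      # host -> earliest allowed fetch time
        self.queue = deque()   # (host, url) pairs in discovery order

    def delay_for(self, host):
        # A per-host user override wins over robots.txt; default is no delay.
        if host in self.wait_override:
            return self.wait_override[host]
        return self.crawl_delay.get(host, 0)

    def enqueue(self, host, url):
        self.queue.append((host, url))

    def dequeue(self, now=None):
        """Return the first queued URL whose host is ready, else None."""
        now = time.monotonic() if now is None else now
        for i, (host, url) in enumerate(self.queue):
            if now >= self.next_ok.get(host, 0):
                del self.queue[i]
                self.next_ok[host] = now + self.delay_for(host)
                return url
        return None
```

Even this toy version shows the corner cases: a host with a long crawl-delay
can starve the queue head, so the scheduler must skip past blocked hosts
rather than sleep, and the override has to be resolved per host.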

Today, web servers often allow 50+ parallel connections from one client, so I
really don't see the point in implementing crawl-delay.

I could change my mind if someone has a *real* good reason for it *and* comes
up with a good algorithm / patch to handle all corner cases.


Reply to this item at:


  Message sent via Savannah
