Re: [Bug-wget] robots.txt seemingly ignored

From: Darshit Shah
Subject: Re: [Bug-wget] robots.txt seemingly ignored
Date: Tue, 15 May 2018 11:34:33 +0200
User-agent: NeoMutt/20180323


You are using a very old version of Wget; v1.12 was released in 2009, if I
remember correctly.

The current version of Wget doesn't seem to have any issues with the parsing of
that robots.txt. I just tried it locally and it downloads no files at all.
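The expected behavior can be checked independently of wget with Python's
standard-library robots.txt parser (an illustration of the rules quoted below,
not wget's own parser; the user-agent strings are examples):

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the report: a blanket Disallow for all agents,
# plus an empty Disallow (i.e. allow everything) for W3C-checklink.
rules = """\
User-agent: *
Disallow: /

User-Agent: W3C-checklink
Disallow:
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A generic crawler matches "User-agent: *" and is denied everything:
print(rp.can_fetch("Wget/1.19", "/"))              # False
# W3C-checklink has an empty Disallow, so it may fetch anything:
print(rp.can_fetch("W3C-checklink", "/any/page"))  # True
```

With a correct parser, a recursive fetch by a generic agent should therefore
download nothing, which matches what current Wget does.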

Please update your version of Wget.

* Daniel Feenberg <address@hidden> [180514 16:51]:
> I have the following wget command line:
>
>    wget -r  http://wwwdev.nber.org/
>
> http://wwwdev.nber.org/robots.txt is:
>
>   User-agent: *
>   Disallow: /
>   User-Agent: W3C-checklink
>   Disallow:
>
> However wget fetches thousands of pages from wwwdev.nber.org. I would have
> thought nothing would be found. (This is a demonstration; obviously in real
> life I'd have a more detailed robots.txt to control the process.)
> Obviously too, I don't understand something about wget or robots.txt. Can
> anyone help me out?
>
> This is GNU Wget 1.12 built on linux-gnu.
> Thank you
> Daniel Feenberg

Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6

Attachment: signature.asc
Description: PGP signature
