
Re: [Bug-wget] robots.txt seemingly ignored


From: Daniel Feenberg
Subject: Re: [Bug-wget] robots.txt seemingly ignored
Date: Tue, 15 May 2018 15:11:24 -0400

Thank you. Updating to 1.19 fixed the problem. Version 1.12 came from the
Scientific Linux 6 repository; I didn't realize it was so old. Installing
1.19 was easy: just ./configure; make; make install.
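
(For the record, the from-source build amounted to roughly the following; the
tarball name and the use of sudo are an approximation rather than an exact
transcript:)

    tar xzf wget-1.19.tar.gz     # GNU release tarball
    cd wget-1.19
    ./configure                  # may need GnuTLS/OpenSSL dev headers for HTTPS support
    make
    sudo make install            # installs under /usr/local by default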

Thanks again.
Daniel Feenberg

On Tue, May 15, 2018 at 5:34 AM, Darshit Shah <address@hidden> wrote:

> Hi,
>
> You are using a very old version of Wget.  v1.12 was released in 2009, if I
> remember correctly.
>
> The current version of Wget doesn't seem to have any issues parsing that
> robots.txt. I just tried it locally and it downloads no files at all.
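>
> For anyone who wants to reproduce that kind of local test, something like
> the following should work (the python3 -m http.server helper and port 8000
> here are just one convenient setup, not necessarily what I used):
>
>     mkdir -p /tmp/robots-test && cd /tmp/robots-test
>     printf 'User-agent: *\nDisallow: /\n' > robots.txt
>     echo '<a href="page.html">link</a>' > index.html
>     python3 -m http.server 8000 &
>     wget -r http://localhost:8000/
>     # a current wget should consult robots.txt and refuse to crawl the disallowed tree
>     kill %1    # stop the test server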
>
> Please update your version of Wget.
>
> * Daniel Feenberg <address@hidden> [180514 16:51]:
> >
> > I have the following wget command line:
> >
> >    wget -r  http://wwwdev.nber.org/
> >
> > http://wwwdev.nber.org/robots.txt  is:
> >
> >   User-agent: *
> >   Disallow: /
> >
> >   User-Agent: W3C-checklink
> >   Disallow:
> >
> >
> > However, wget fetches thousands of pages from wwwdev.nber.org. I would
> > have thought nothing would be fetched. (This is a demonstration; obviously
> > in real life I'd have a more detailed robots.txt to control the process.)
> >
> > Obviously too, I don't understand something about wget or robots.txt. Can
> > anyone help me out?
> >
> > This is GNU Wget 1.12 built on linux-gnu.
> >
> > Thank you
> > Daniel Feenberg
> >
>
> --
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
>

