Re: [Bug-wget] robots.txt not working
From: Micah Cowan
Subject: Re: [Bug-wget] robots.txt not working
Date: Fri, 16 Mar 2012 23:38:37 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120302 Thunderbird/11.0
I think you're misunderstanding what was supposed to happen.
The robots.txt file is only honored for links that wget is
following automatically. This means (a) wget has to be in
recursive-descent mode (-r or -m), and (b) it applies only to links
that weren't explicitly requested by the user. In other words, it
applies only to the links that wget is actually robotting on.
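To illustrate the distinction, here is a minimal Python sketch of that policy using the standard-library robots.txt parser. The `should_fetch` helper is hypothetical (it is not wget's code), and it parses the same two-line robots.txt the poster served; the point is that a user-requested URL bypasses the check, while a link discovered during recursion is tested against the rules:

```python
from urllib.robotparser import RobotFileParser

# Parse the robots.txt from the poster's test server:
#   User-agent: *
#   Disallow: /
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

def should_fetch(url, user_requested):
    """Hypothetical helper mirroring the behavior described above:
    URLs the user asked for directly are always fetched; only links
    discovered while recursing are checked against robots.txt."""
    if user_requested:
        return True
    return rp.can_fetch("Wget", url)

# The URL given on the command line is fetched regardless of robots.txt.
print(should_fetch("http://127.0.0.1:56/", user_requested=True))
# A link found inside index.html during -r recursion is blocked.
print(should_fetch("http://127.0.0.1:56/page.html", user_requested=False))
```

So in the poster's transcript, both `wget` invocations name their URLs explicitly, which is exactly the case robots.txt never applies to.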
Hope that helps.
-mjc
On 03/16/2012 01:04 PM, phil curb wrote:
> I just tried creating a web server locally, putting robots.txt in there,
> and using wget, and it didn't work.
>
>
>
> http://pastebin.com/raw.php?i=kt1mV2af
>
>
> C:\r>wget 127.0.0.1:56
> ....
> 2012-03-16 19:45:32 (20.0 KB/s) - `index.html' saved [3/3]
>
> C:\r>wget 127.0.0.1:56/robots.txt
> ....
> 2012-03-16 19:45:43 (175 KB/s) - `robots.txt' saved [26/26]
>
> C:\r>type robots.txt
> User-agent: *
> Disallow: /
>
> C:\r>