
[Bug-wget] can't reject robots.txt in recursive mode


From: Ilya Basin
Subject: [Bug-wget] can't reject robots.txt in recursive mode
Date: Wed, 6 Aug 2014 12:52:05 +0400

Here's my script to download IBM javadocs:

(
    rm -rf wget-test
    mkdir wget-test
    cd wget-test

    starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"
    wget -d -r -R robots.txt --page-requisites -nH --cut-dirs=5 --no-parent \
        "$starturl" 2>&1 | tee wget.log
)

Regardless of the '-R' option, wget still downloads robots.txt and then
refuses to follow links starting with "/support/knowledgecenter/api/".

Workaround:

    touch robots.txt
    chmod 400 robots.txt
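
For what it's worth, robots processing can also be switched off entirely
via the wgetrc `robots` setting, passed on the command line with `-e`. A
minimal variant of the invocation above, assuming it is acceptable to
ignore robots.txt altogether for this mirror:

    # robots=off tells wget not to fetch or honor robots.txt at all
    wget -e robots=off -d -r --page-requisites -nH --cut-dirs=5 --no-parent \
        "$starturl" 2>&1 | tee wget.log

With robots=off, wget never requests robots.txt, so no '-R' pattern is
needed; the original report still stands, since '-R robots.txt' ought to
work on its own.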

GNU Wget 1.15 built on linux-gnu



