bug-wget

[bug #60865] wget puts robots.txt and headers into the downloaded file -


From: Aleksej Serdyukov
Subject: [bug #60865] wget puts robots.txt and headers into the downloaded file --no-clobber -A without robots.txt
Date: Wed, 30 Jun 2021 19:46:56 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0

URL:
  <https://savannah.gnu.org/bugs/?60865>

                 Summary: wget puts robots.txt and headers into the downloaded
file --no-clobber -A without robots.txt
                 Project: GNU Wget
            Submitted by: watersky
            Submitted on: Wed 30 Jun 2021 23:46:54
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
                 Release: trunk
         Discussion Lock: Any
        Operating System: GNU/Linux
         Reproducibility: Every Time
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: No

    _______________________________________________________

Details:

--no-directories is used here for convenience only; it is not part of the
problem.


wget --no-clobber --page-requisites --accept "101547:*" --no-directories \
    https://www.linux.org.ru/people/xaizek/profile


robots.txt is downloaded as robots.txt.tmp and is not deleted afterwards:


$ ls -1s
total 2380
   8 101547:617472904.jpg
   4 robots.txt.tmp


Delete the image so that only robots.txt.tmp is left, then run the same
command again. This time downloading the image takes a long time, and the
resulting .jpg file contains the contents of robots.txt and HTTP headers
before the actual image data.
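
One way to confirm the corruption without opening the image is to check
whether the file still begins with the JPEG start-of-image marker (bytes
FF D8). The following is only a sketch: is_jpeg_start is a hypothetical
helper name, and it assumes head -c and od are available, as on a typical
GNU/Linux system.

```shell
# Hypothetical helper: succeeds if the file starts with the JPEG SOI marker.
is_jpeg_start() {
    # Read the first two bytes and print them as hex without separators.
    magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "ffd8" ]
}

# Report every .jpg in the current directory whose first bytes are not FF D8.
for f in ./*.jpg; do
    [ -e "$f" ] || continue
    is_jpeg_start "$f" || echo "unexpected data before image: $f"
done
```

A file that begins with robots.txt text or HTTP headers, as described above,
would be reported by the loop; a clean download would not.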

However, with "robots.txt" in --accept, robots.txt is saved directly as
robots.txt, and the image is OK:


./wget --no-clobber --page-requisites --accept "101547:*,robots.txt" \
    --no-directories https://www.linux.org.ru/people/xaizek/profile






    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60865>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



