[bug #60865] wget puts robots.txt and headers into the downloaded file -
From: Aleksej Serdyukov
Subject: [bug #60865] wget puts robots.txt and headers into the downloaded file --no-clobber -A without robots.txt
Date: Wed, 30 Jun 2021 19:46:56 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0
URL:
<https://savannah.gnu.org/bugs/?60865>
Summary: wget puts robots.txt and headers into the downloaded
file --no-clobber -A without robots.txt
Project: GNU Wget
Submitted by: watersky
Submitted on: Wed 30 Jun 2021 23:46:54
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Release: trunk
Discussion Lock: Any
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: No
_______________________________________________________
Details:
--no-directories is for convenience only.
wget --no-clobber --page-requisites --accept "101547:*" --no-directories \
  https://www.linux.org.ru/people/xaizek/profile
robots.txt is downloaded into robots.txt.tmp and not deleted:
$ ls -1s
total 2380
8 101547:617472904.jpg
4 robots.txt.tmp
Delete the image so that only robots.txt.tmp is left, then run the same command
again. This time downloading the image takes a long time, and the resulting .jpg
file contains robots.txt and HTTP headers before the image data.
However, with "robots.txt" in --accept, robots.txt is saved directly as
robots.txt, and the image is OK:
./wget --no-clobber --page-requisites --accept "101547:*,robots.txt" \
  --no-directories https://www.linux.org.ru/people/xaizek/profile
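To tell whether a downloaded image is affected, one rough check is to look at
the first bytes of each .jpg: a valid JPEG starts with FF D8 FF, while a file
with robots.txt or HTTP headers prepended will not. The sketch below is only an
illustration; the helper names is_jpeg and check_jpegs are hypothetical and are
not part of wget, and it assumes head and od are available.

```shell
# Hypothetical helper (not part of wget): succeed if the file given as $1
# starts with the JPEG signature FF D8 FF.
is_jpeg() {
    # Read the first three bytes and print them as hex without separators.
    sig=$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "ffd8ff" ]
}

# Hypothetical helper: report every .jpg in the current directory.
check_jpegs() {
    for f in ./*.jpg; do
        # Skip the unexpanded pattern when no .jpg files exist.
        [ -e "$f" ] || continue
        if is_jpeg "$f"; then
            echo "ok: $f"
        else
            echo "corrupt: $f"
        fi
    done
}
```

Running check_jpegs in the download directory after the second wget run should
list the image as corrupt if the problem described above occurred.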
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?60865>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/