[bug #64714] --no-clobber not working with --mirror??
From: anonymous
Subject: [bug #64714] --no-clobber not working with --mirror??
Date: Sun, 24 Sep 2023 12:17:57 -0400 (EDT)
URL:
<https://savannah.gnu.org/bugs/?64714>
Summary: --no-clobber not working with --mirror??
Group: GNU Wget
Submitter: None
Submitted: Sun 24 Sep 2023 04:17:55 PM UTC
Category: None
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Release: trunk
Discussion Lock: Any
Operating System: GNU/Linux
Reproducibility: None
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Follow-up Comments:
-------------------------------------------------------
Date: Sun 24 Sep 2023 04:17:55 PM UTC By: Anonymous
Hi, I'm not entirely certain that the behavior I'm seeing is a bug and not me
using it incorrectly. But it definitely is not intuitive.
I tried to mirror multiple sites into the same folder, as I wanted them to be
able to reference each other, but go "deeper" on some of the referenced pages
and not as deep on others. So I thought I would "just" delete the index.html
file of these pages and re-invoke wget with --mirror to mirror that webpage as
well and write it into that place (so that the reference from the one mirrored
before would still work).
However, even though I specified --no-clobber, wget sometimes overwrote
already downloaded and adjusted webpages with a non-adjusted version from the
server. It looks like this is some kind of recursion issue.
When pages are cross-linked like this:
a.com/index.html links to a.com/page2.html, which has an out-link to
b.com/page3.html. The first invocation will download all three pages but none
of the out-links of b.com/page3.html (so the created page3.html file will
still have the original links in it).
b.com/page3.html has a backlink to a.com/index.html. The first invocation of
wget, tasked to download a.com/index.html, doesn't care about this, and may
even correctly adjust the backlink to a.com/index.html. HOWEVER, when the local
copy of "b.com/page3.html" is deleted and wget is invoked a second time, tasked
to download only "b.com/page3.html", potentially with different arguments
(like a specified recursion depth, an option to not download any out-links, or
a domain restriction), it'll sometimes overwrite a.com/index.html with a
version that no longer has relative (adjusted) URLs but the original
non-adjusted ones, effectively breaking the local copy of the page, even
though "--no-clobber" was specified.
The full commands I used were:
* `wget --mirror --recursive=on --level=1 --convert-links --adjust-extension
--page-requisites --span-hosts --no-clobber -e robots=off`
* `wget --mirror --http-user=user --http-password=pass --recursive=on
--level=1 --convert-links --adjust-extension --page-requisites --span-hosts
--no-clobber -e robots=off`
* `wget --mirror --recursive=on --level=0 --convert-links --adjust-extension
--page-requisites --no-parent --no-clobber -e robots=off`
(All three were run with different target URLs, and each time I first deleted
the local index.html to get a "deeper" copy of that subtree of linked web
pages.)
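
To confirm which files a later invocation actually overwrote, one option is to
checksum the mirror directory before and after the run and compare the two
listings. The following is a minimal sketch and not part of wget:
`snapshot_tree` and `changed_files` are hypothetical helper names, and it
assumes GNU `sha256sum`, `find`, and `awk` are available and that file names
contain no newlines or backslashes.

```shell
#!/bin/sh
# Hypothetical helpers (not provided by wget) for spotting overwritten files.

# snapshot_tree DIR
# Print one "checksum  ./relative/path" line per regular file under DIR,
# sorted by path so two snapshots are easy to compare.
snapshot_tree() {
    (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# changed_files BEFORE AFTER
# Print every path that appears in both snapshot files with a different
# checksum. Files that were deleted or newly created are not reported.
changed_files() {
    # A SHA-256 hex digest is 64 characters followed by two separator
    # characters, so the path starts at column 67.
    awk '{ path = substr($0, 67) }
         NR == FNR { sum[path] = $1; next }
         (path in sum) && sum[path] != $1 { print path }' "$1" "$2"
}

# Example use (assumed layout: the mirror lives in ./mirror, snapshots are
# stored outside it so they are not checksummed themselves):
#   snapshot_tree mirror > before.txt
#   ...delete the chosen index.html and re-run wget here...
#   snapshot_tree mirror > after.txt
#   changed_files before.txt after.txt
```

Any path printed by `changed_files` existed before the run and has different
content afterwards, which is the situation --no-clobber was expected to
prevent.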
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?64714>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/