Re: Exclusion failures

From: Tim Rühsen
Subject: Re: Exclusion failures
Date: Mon, 5 Jul 2021 16:09:09 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 28.06.21 19:36, Roger Brooks wrote:
I am trying to use wget 1.19.1 to back up a club website.  Here is a reduced
version of my wget command, which only accesses the public parts of the

cd /volume1/Backup/
wget -EkKrNpH \
      --output-file=wget.log \
      --domains=imcz.club,sf.wildapricot.org \
      --exclude-domains=webmail.imcz.club \
      --ignore-case \
      --level=2 \
      --no-parent \
      --no-proxy \
      --random-wait \
      --reject=ashx,"overlay*" \
      --rejected-log=wget-rejected.log \
      --restrict-file-names=windows \
      --wait=1 \

Two of the exclusions in the command are failing:

1. --exclude-directories=Fonts, fonts
This is a workaround for wget’s creation of spurious font directories.  The
server has only one such directory, but the website’s backend platform (over
which I have no control) sometimes addresses it as “fonts” and sometimes as
I expected that the option "--ignore-case" in the absence of "--no-clobber"
would take care of this problem, but since the contents are static, I don’t
need to back it up regularly.  Despite the exclusion, wget still insists on
creating the following directories:
The resulting backup website does not find the fonts in the "_Conflict"
directories; they have to be copied into the "fonts" directory for the pages
in the mirrored site to display properly.

So wget does not automatically delete the fonts/ directory when it ends up empty. It was used for temporary files during the download. This is a known "issue", but since an empty directory takes up hardly any disk space, it hasn't been fixed yet (maybe nobody thought it was relevant). Wget2 doesn't have this issue.
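If the leftover empty directories are a nuisance, they can be pruned after the wget run. The following is only a sketch: the helper name prune_empty_dirs is made up for illustration, and it assumes a find that supports -mindepth, -empty and -delete (GNU findutils does; other implementations may not).

```shell
# Hypothetical helper: remove empty directories below a mirror root.
# Assumes a find with -mindepth, -empty and -delete (e.g. GNU findutils).
# -delete implies depth-first traversal, so nested empty directories
# are removed bottom-up; the root directory itself is never removed.
prune_empty_dirs() {
    find "$1" -mindepth 1 -type d -empty -delete
}

# Example call after the backup has finished (path taken from the
# original command; adjust to wherever the mirror actually lives):
#   prune_empty_dirs /volume1/Backup
```

Directories that still contain files are left untouched, so this only cleans up what wget left behind empty.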

I don't know where the *_Conflict/ directories come from. It seems like a server thing.

This is an attempt to prevent duplicate downloading of files. The following
file is downloaded, even though https://regex101.com says that it matches my
It is effectively a duplicate of:
Increasing "--level" produces additional examples.

Why should '@CalendarView' match 'calendar[@/?]'? That pattern requires '@', '/' or '?' directly after 'calendar', but here the '@' comes before it.
Maybe your regex should be '[@\?]calendar.*'?
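The two patterns can be compared outside wget with grep -E, which uses POSIX extended regular expressions. This is a rough check, not a reproduction of wget's matching: the helper name check is made up, only the string '@CalendarView' from above is tested rather than the full URL, and it is not confirmed here whether --ignore-case affects wget's regex options, so both case-sensitive and case-insensitive runs are shown.

```shell
# Hypothetical helper: report whether a POSIX ERE matches a string.
#   $1 = pattern, $2 = string, $3 = optional extra grep flag (e.g. -i)
check() {
    if printf '%s\n' "$2" | grep -Eq ${3:+"$3"} -e "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

check 'calendar[@/?]'   '@CalendarView'       # prints "no match"
check 'calendar[@/?]'   '@CalendarView' -i    # prints "no match"
check '[@\?]calendar.*' '@CalendarView'       # prints "no match" (capital C)
check '[@\?]calendar.*' '@CalendarView' -i    # prints "match"
```

The original pattern fails even when case is ignored, because 'Calendar' is followed by 'V' rather than '@', '/' or '?'. The suggested pattern matches only when case is ignored, or when it is written with the capitalization the URLs actually use.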

Regards, Tim

