wget and --reject-regex
From: Frans de Boer
Subject: wget and --reject-regex
Date: Wed, 23 Dec 2020 13:48:17 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1
Hello all,
I found that wget 1.20 and later support some basic regular
expressions. I had good results with --accept-regex, but the reject side
is more troublesome. I can't use EREs, since only BREs seem to be
supported, and the pattern has to account for the whole URL.
I use wget to mirror some sites, but I do not want certain
subdirectories included in the download, for example subdirectories
named rpm, debug, temp, etc.
Example:
wget -4 --mirror -nH -np --retr-symlinks=no --passive-ftp --no-verbose \
    --cut-dirs=1 --regex-type posix \
    --reject-regex "ftp\:\/\/mirror\.netcologne\.de\/savannah\/smc\/Screensaver\/" \
    -P ./debugdir/nongnu ftp://mirror.netcologne.de/savannah/smc/
I tried this example both with and without some of the backslashes, but
none of the variants works. I also tried it with a single file, to no
avail. I understand that one can add multiple reject statements, but
specifying them all by hand is rather cumbersome; I would rather use the
ERE .*(dir1|dir2|dir3|...|dirx|(..ERE..)). I already have such an ERE
string ready and would like to use it directly. Breaking this string
down again into multiple reject statements might not work either if I
can't even reject a single file or subdirectory.
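Before handing a pattern to wget, it can help to check which URLs an ERE of the intended shape actually matches. The following is a minimal sketch that uses `grep -E`, not wget's own matcher, so it only approximates what `--reject-regex` would do. The pattern, the helper name `check_reject`, and the sample file names (`foo.tar.gz`, `README`) are hypothetical; only the host path and the directory names come from the examples above.

```sh
#!/bin/sh
# Hypothetical reject pattern: any URL containing one of these path components.
# Tested with grep -E, which only approximates wget's own regex matching.
reject_ere='/(rpm|debug|temp|Screensaver)/'

# Hypothetical helper: print REJECT or KEEP for a URL depending on
# whether the ERE matches anywhere in it.
check_reject() {
  if printf '%s\n' "$1" | grep -Eq -- "$reject_ere"; then
    echo "REJECT $1"
  else
    echo "KEEP   $1"
  fi
}

# Sample URLs (file names are made up for illustration).
for url in \
  ftp://mirror.netcologne.de/savannah/smc/Screensaver/ \
  ftp://mirror.netcologne.de/savannah/smc/debug/foo.tar.gz \
  ftp://mirror.netcologne.de/savannah/smc/README
do
  check_reject "$url"
done
```

The first two sample URLs should be reported as REJECT and the README as KEEP; if a pattern behaves as expected here but not in wget, the difference lies in how wget compiles or applies it.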
Is there a way to accomplish the above without having to resort to
loops and sed as the filtering tool?
Regards, Frans.
--
A: Yes, just like that
Q: Oh, just like reading a book backwards
A: Because it upsets the natural flow of a story
Q: Why is top-posting annoying?