Re: [Bug-wget] Behaviour of spanning to accepted domains
From: Tim Rühsen
Subject: Re: [Bug-wget] Behaviour of spanning to accepted domains
Date: Fri, 05 Jun 2015 22:24:16 +0200
User-agent: KMail/4.14.2 (Linux/4.0.0-1-amd64; KDE/4.14.2; x86_64; ; )
On Friday, 5 June 2015, 08:01:03, Tony Lewis wrote:
> On June 03, 2015 Tim Ruehsen wrote:
> > This has already been fixed to:
> >
> > "Set domains to be followed. domain-list is a comma-separated list of
> > domains. Note that it does not turn on -H."
> First, I have not dug into the source code to see how -H is implemented.
> However, it makes sense to me that one ought to be able to specify both -H
> and -D together.
Hi Tony,
-H enables spanning to all domains.
To exclude some sites, use --exclude-domains domain-list.
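
As a rough sketch of how these options combine, the two variants below use the placeholder hosts from the scenario quoted in this thread. The commands are only assembled and printed, not run, and the exact host names and start URL are assumptions for illustration.

```shell
# Hosts are the placeholder names from the scenario quoted in this thread.
start_url="http://www.somesite.com/"

# Variant 1: span to all hosts (-H), then exclude the unwanted one.
exclude_cmd="wget --mirror -H --exclude-domains www.relatedsite.com $start_url"

# Variant 2: span hosts (-H), but follow only the comma-separated -D list.
accept_cmd="wget --mirror -H -D www.somesite.com,images.somesite.com $start_url"

# Print the commands for review instead of running them.
echo "$exclude_cmd"
echo "$accept_cmd"
```

Note that the -D list is comma-separated, with no spaces between the domains.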
>
> Consider this scenario: I want to mirror a site including the images that
> are stored in a sub-domain, but I don't want to mirror every external site
> referenced by the site. So I would try this:
>
> wget --mirror http://www.somesite.com -H -D www.somesite.com
> images.somesite.com
>
> Which should get all the files from www.somesite.com and images.somesite.com
> without getting files from www.relatedsite.com.
You can also play with:
-A acclist --accept acclist
-R rejlist --reject rejlist
Specify comma-separated lists of file name suffixes or patterns to accept
or reject. Note that if any of the wildcard characters, *, ?, [ or ],
appear in an element of acclist or rejlist, it will be treated as a
pattern, rather than a suffix. In this case, you have to enclose the
pattern in quotes to prevent your shell from expanding it, like in
-A "*.mp3" or -A '*.mp3'.
--accept-regex urlregex
--reject-regex urlregex
Specify a regular expression to accept or reject the complete URL.
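
One way to try out a URL regex before a long crawl is to test it against a few sample URLs first. The sketch below uses grep -E as a stand-in for wget's matcher, which is an assumption: wget's default regex type is POSIX, but small differences in matching are possible. The regex, the verdict helper, and the sample URLs are hypothetical and reuse the placeholder hosts from this thread.

```shell
# Hypothetical regex: accept only URLs on the www and images hosts.
regex='^https?://(www|images)\.somesite\.com/'

# Print "accept" or "reject" for one URL, using grep -E as a stand-in
# for wget's own matcher.
verdict() {
  if printf '%s\n' "$1" | grep -Eq "$regex"; then
    echo accept
  else
    echo reject
  fi
}

verdict "http://www.somesite.com/index.html"
verdict "http://images.somesite.com/logo.png"
verdict "http://www.relatedsite.com/page.html"

# The corresponding wget call (not executed here):
# wget --mirror -H --accept-regex "$regex" http://www.somesite.com/
```

The single quotes around the regex keep the shell from interpreting the parentheses and the question mark.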
Regards, Tim