wget-dev

Re: [Wget-dev] wget2 | -X does not work (#365)


From: Tim Rühsen
Subject: Re: [Wget-dev] wget2 | -X does not work (#365)
Date: Thu, 05 Jul 2018 13:27:01 +0000

Sorry, regex means regular expression (see 
https://en.wikipedia.org/wiki/Regular_expression).

Then let's apply shell-style directory name matching, including wildcards. We 
should use `fnmatch`, which is backed by gnulib (and therefore portable), 
ideally with FNM_EXTMATCH and FNM_PATHNAME.
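
As a rough sketch of what such a check could look like (not actual wget2 code): the helper name `dir_matches` is made up for illustration, and the fallback define only exists so the snippet still compiles where FNM_EXTMATCH is not provided (gnulib would normally take care of that).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* glibc exposes FNM_EXTMATCH only with this defined */
#endif
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

#ifndef FNM_EXTMATCH
#define FNM_EXTMATCH 0  /* assumption: without it, ksh-style patterns are simply unsupported */
#endif

/* Hypothetical helper: returns 1 if 'dir' matches the shell pattern, else 0.
 * FNM_PATHNAME keeps '*' and '?' from matching a '/', so "/pub/*" only
 * matches direct children of /pub. */
static int dir_matches(const char *pattern, const char *dir)
{
    assert(pattern != NULL && dir != NULL);
    return fnmatch(pattern, dir, FNM_PATHNAME | FNM_EXTMATCH) == 0;
}
```

With this, `dir_matches("/pub/*", "/pub/fine")` matches while `dir_matches("/pub/*", "/pub/fine/sub")` does not, because of FNM_PATHNAME.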

**Combination of -I/-X:**

We append all arguments to one list (vector) in the order they appear on the 
command line. Each list entry should have a flag for include/exclude. For each 
directory name to check, we traverse the list from beginning to end. The last 
match decides whether we include or exclude.

The default value depends... If -I is given without -X, it means "default is 
exclude, so we only save files that match the given -I options".

If -X is given without -I, it means "default is include, so we save all files 
except those that match the given -X options".

If both -I and -X are given, it could mean anything; it's ambiguous. But we 
can simply define (and document) how we deal with it. E.g. if -I comes first, 
the default is 'exclude'. If -X comes first, the default is 'include'.
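
To make the proposal concrete, here is a minimal sketch of the "one ordered list, last match wins" idea, with the default taken from whichever option comes first. The names `struct dir_rule` and `dir_allowed` are invented for this example and are not existing wget2 code; only FNM_PATHNAME is used to keep it short.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* One -I or -X argument, stored in command-line order (hypothetical type). */
struct dir_rule {
    const char *pattern;
    int include;  /* 1 for -I, 0 for -X */
};

/* Returns 1 if 'dir' should be saved, 0 if not (hypothetical helper). */
static int dir_allowed(const struct dir_rule *rules, size_t n, const char *dir)
{
    assert(dir != NULL);

    if (n == 0)
        return 1;  /* neither -I nor -X given: nothing is filtered */

    /* Default is the opposite of the first option:
     * -I first means "exclude by default", -X first means "include by default". */
    int allowed = !rules[0].include;

    /* Walk the whole list; a later match overrides an earlier one. */
    for (size_t i = 0; i < n; i++) {
        if (fnmatch(rules[i].pattern, dir, FNM_PATHNAME) == 0)
            allowed = rules[i].include;
    }
    return allowed;
}
```

For example, with the rules `-I '/pub/*' -X /pub/worthless`, `/pub/fine` would be saved, `/pub/worthless` would not, and `/other` would fall back to the 'exclude' default.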

> will we download HTML from pub/worthless for scanning (but not saving) or not 
> ?
>
> What would be this useful for?

In your example, in `/pub/worthless/` there could be HTML files with links to 
`/pub/fine/` which we want to download.

And we should not apply -I/-X to user-provided URLs (given on the command line 
or via --input-file).

WDYT ?

-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget2/issues/365#note_86286242

