[Bug-wget] [bug #45801] Allowing to configure HTML engine which links to follow


From: Tim Ruehsen
Subject: [Bug-wget] [bug #45801] Allowing to configure HTML engine which links to follow
Date: Tue, 03 Nov 2015 15:25:31 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.3.0

Follow-up Comment #1, bug #45801 (project wget):

Wget already provides --accept-regex and --reject-regex, which control which
URLs are followed.

For your example below you could use

  wget -e robots=off -r --regex-type=pcre \
       --accept-regex '(20151027/$|Scrolling_Survival_Turn_)' \
       --reject-regex ";+" \
       http://replays.wesnoth.org/1.12/

1. --reject-regex ";+" skips the 'sorting' URLs, i.e. any URL containing a ';'.
2. --accept-regex makes Wget descend only into the subdirectory 20151027 and,
from there, download only URLs containing 'Scrolling_Survival_Turn_'.
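
To see how the two patterns combine, here is a minimal sketch in Python that
applies them to a few URLs. It only approximates the filtering: it uses
Python's re module rather than Wget's own matching code, the helper name
would_follow is hypothetical, and the sample URLs (file names and query
strings) are invented for illustration.

```python
import re

# Patterns copied from the wget command above.
ACCEPT = re.compile(r"(20151027/$|Scrolling_Survival_Turn_)")
REJECT = re.compile(r";+")


def would_follow(url: str) -> bool:
    """Approximation: keep a URL that matches ACCEPT and does not match REJECT."""
    return bool(ACCEPT.search(url)) and not REJECT.search(url)


# Hypothetical URLs, made up for this example.
samples = [
    "http://replays.wesnoth.org/1.12/20151027/",
    "http://replays.wesnoth.org/1.12/20151027/?C=N;O=D",
    "http://replays.wesnoth.org/1.12/20151027/Scrolling_Survival_Turn_12.gz",
    "http://replays.wesnoth.org/1.12/20151027/Scrolling_Survival_Turn_12.gz?a;b",
    "http://replays.wesnoth.org/1.12/20151026/",
]

for url in samples:
    print("follow" if would_follow(url) else "skip  ", url)
```

In this sketch only the first and third URLs are kept: the others either
contain a ';' or match neither alternative of the accept pattern.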

Note that --regex-type=pcre requires a Wget built with PCRE support (just try
it out); otherwise you can use POSIX regexes.


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?45801>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



