Hello,
I'm a long-time user of wget, and I'm trying to switch to wget2, but
I'm having trouble understanding what exactly is going on. wget2
appears to be fetching files outside of what my regexes allow, but on
closer inspection, those files don't exist on my filesystem.
Aside: I would attach the complete wget2 log output to this email, but
it's 27MB uncompressed, and even compressed with xz it still comes out
to 1MB.
I'm not sure what this mailing list's policy on large attachments is.
Normally I have to get special permission from the list admin.
If there's some fine documentation which explains all this, I haven't
found it, so feel free to point me to it.
Normally you'd expect an HTTP response code such as 200 or 404, but
wget2 reports "HTTP response 0". What does that mean?
Checking something normally implies that you already have it, but wget2
doesn't appear to have downloaded the files that it says it's checking
(although I may have failed to keep them for the purposes of this
email).
So what does '[3] Checking $URL ...' mean?
When a URL is added, one would normally expect it to be downloaded, but
that doesn't appear to be the case with wget2. What does
"Adding URL: $URL" mean?
As you have probably noticed, I'm rather confused. Here's a portion of
wget2's output, followed by the command that I used.
Thanks,
David
#############################################################################
Adding URL:
https://web.archive.org/web/20220305001008js_/https:/americasfrontlinedoctors.org/_next/static/bBQU-7wbyVqBHhpUeRiRF/_middlewareManifest.js
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCtr6Uw9.woff
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCvr6Ew9.woff
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCs16Ew9.woff
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCtr6Ew9.woff
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCtZ6Ew9.woff
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCu170w9.woff
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCuM70w9.woff
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCvr70w9.woff
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCvC70w9.woff
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WRhyyTh89ZNpQ.woff2
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459W1hyyTh89ZNpQ.woff2
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WZhyyTh89ZNpQ.woff2
Adding URL:
https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WdhyyTh89ZNpQ.woff2
###################...#######################################################
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/modernwisdompodcast'
... [2] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/IsaacArthur'
... HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/MLChristiansen]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/BlacktipH]
[1] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/TomAntosFilms'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/DonaldJTrumpJr]
[2] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/KimIversen'
...
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/Homesteadonomics'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/MariaBartiromo]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/StevenCrowder]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/LifeStories]
[1] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/TheAdventureAgents'
...
[2] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/NDWoodworkingArt'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/Styxhexenhammer666]
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/EarthTitan'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/NYPost]
HTTP response 0
[https://web.archive.org/web/20220130164746js_/https://www.americasfrontlinedoctors.org/_next/static/chunks/d0447323-9a7a3aa3a90e5cd2.js]
[1] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/KenDBerryMD'
...
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/HeresyFinancial'
...
[2] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/RepJimBanks'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/PageSix]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/SamuelEarpArtist]
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/HOTDANGSHOW'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/Decider]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/JohnStossel]
[1] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ATRestoration'
...
[2] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ThisSouthernGirlCan'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/MikhailaPeterson]
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/RockFeed'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/Locals]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/Timcast]
[1] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/CountryCast'
...
[2] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ShaunAttwood'
...
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/diywife'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/CWLemoine]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/TimcastIRL]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/Entrepreneur]
[1] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/RekietaLaw'
...
[2] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/MontyFranklin'
...
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/GeeksandGamers'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/TheBodyLanguageGuy]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/Yarnhub]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/DrDrew]
[1] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/SportsWars'
...
[2] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/nfldaily'
...
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/nbanow'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/TulsiGabbard]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/MattKohrs]
[1] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/GamingWithGeeks'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/HabibiPowerHour]
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/FactsChannel]
[3] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ParkHoppin'
...
[2] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/GeeksAndGamersClips'
...
HTTP response 0
[https://web.archive.org/web/20220404154147/https:/rumble.com/c/phetasy]
[1] Checking
'https://web.archive.org/web/20220404154147/https:/rumble.com/c/chiefstv'
...
#############################################################################
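Since the full log is too large to post, a short tally of the three
message types seen above may be easier to share. The following is only a
sketch: the function name summarize_log is hypothetical, the log path
2wget.log is taken from the tee at the end of the command in this
message, and the match strings are copied from the excerpt above and may
not cover every message wget2 prints.

```shell
#!/bin/sh
# summarize_log (hypothetical helper): count log lines by message type.
# Counts are per line, and the strings matched are only the ones visible
# in the excerpt above.
summarize_log() {
    printf 'adding=%s checking=%s response0=%s\n' \
        "$(grep -c '^Adding URL:' "$1")" \
        "$(grep -cF '] Checking' "$1")" \
        "$(grep -cF 'HTTP response 0' "$1")"
}

# Only run against the real log if it is present.
if [ -f 2wget.log ]; then
    summarize_log 2wget.log
fi
```

The three counts fit on one line, so they can go in a message body
without needing an attachment.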
The wget2 command is as follows. I had to wrap it for email, so the
line breaks inside the quoted regexes are not part of the actual
command.
wget2 -NEkrl9 -t 13 --regex-type=posix --timeout 45 --reject-regex
'http.*http.*http|\.html?.*\.html?.*\.html?|www\..*www\..*www\.|\{|url[^/]+query|data[^/]+\.(url|image)|
/activity|/members|/groups|/%5C$|xmlrpc\.php|/phpBB/|/socialauth/|/googlebooks/|xmlrpc\.php|/admin\.php|
/rsc/|/htsrv/|/skins/|/activate/|blogger.com/.*(profile|share-post|delete|comment|post-edit)|delete-comment.g|
from=|target=|/public\.api/|/mshots/v1/|/public.api/|/(remote-login|press-this|wp-signup|wp-login)\.php|
Translations:|Sandbox:|Template:|title=User_talk|/pricecompare|\?[rs]=[x0-9]+&|/likers|/following|new\?user=|
/discussion-|\?(resize|w|h)&|signin\?|/signup|/messages|/followers|/likesandfollows|/add\?|/destroy/|
/create\?|/Layout|/Selected_page|redirect=no|Template:|_talk:|User:|User_talk:|sign_up|sign_in|
\.img(\.(xz|gz|bz2))?|/secured_requests|/usenet/|/rss\.php|/design-tools|/supportLink|eesimUrl|
/reliabilityLink|/markets|/storefront.html|distributorData|/mymaxim|/samplecart|/comment-subscriptions|
/walkthroughs|like_comment=|screenToRender=|/UserAccount|/myprofile|layout=siteinfo|/Subscribe|\?cid=|
\+url\+|captcha|utm_(medium|source)=|bc(lid|tid)|pubdate|HQS|tid|eid|kcid|pid=|screenToView=login|
[^[:alnum:]]search/|companies/|directory/|cat/(news|reviews|previews-unboxing)|PrintView|contentItemId|
/[Aa]uth|comment_mail|replytocom=|[^[:alnum:]]search\?|amp$|\.(rss|atom|json)$|/maintenance|
/lib/exe/indexer.php|dataflt|datasrt|\.iso|(show|focused)Comment(Area|s|Id)|decoration|(bookmarks|
browsespace|changes|diffpages(byversion)?|listattachmentsforspace|login|peopledirectory|recentlyupdated|
replycomment|report|space-bookmarks|tinyurl|view(follow|info|mailarchive|page(attachments|src)|
previousversions|recentblogposts|spacesummary|userprofile))\.action|edit$|recentchanges|revisions|
/WantedPages|/forum|cgi-bin/|(do|sectok|mode|action|oldid|diff|showComment|share|replyto)=|Talk:|
Special:|wp-admin/|feed|login|/(EU|FR-FR|anp|ar|az|bg|bgn|bn|ca|cn|cs|da|de|de-de|diq|el|en-au|en-ca|
en-gb|en-sg|en-za|eo|es|es-co|es-mx|es-es|eu|fa|fr|fr-fr|he|hi|hr|hu|hy|ia|id|it|it-it|ja|jbo|jp|kk|ko|
lb|lt|map-bms|ml|mni|nb|ne|nl|nl-nl|no|oc|pa|pl|pl-pl|pt|pt-br|ro|ru|sco|sd|sl|si|sq|sr|sr-ec|ta|te|th|
tr|ua|udm|ug-arab|uk|ur|vi|zh|zh_CN|zh-cn|zh_cn|zh_tw)(:|$)'
--accept-regex
'(.*\.(css|gif|png|jpe?g)$|https?://web\.archive\.org/web/[^ *]+/
https?://?(i0.wp.com|i[0-9].wp.com|s[0-9].wp.com|([0-9]\.)?bp.blogspot.com|
www.blogger.com|www.blogblog.com|lh[0-9]\.googleusercontent.com|
fonts.googleapies.com|(ssl|www|fonts).gstatic.com|(www[0-9]*?\.)?americasfrontlinedoctors.org))'
https://web.archive.org/web/20220305001008/https://americasfrontlinedoctors.org/
|& tee -a 2wget.log
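
One way to narrow this down without rerunning the crawl is to test
individual URLs from the log against the two expressions offline. The
sketch below uses grep -E, which implements POSIX extended regular
expressions. It assumes that wget2 matches each expression against the
complete URL and uses the same extended syntax, neither of which this
message confirms. The helper names match and check_url are hypothetical,
and the two expressions are shortened stand-ins for the full ones above.
It reports the two matches separately and says nothing about how wget2
combines them.

```shell
#!/bin/sh
# match URL REGEX: print "yes" if the POSIX extended regex matches
# anywhere in the URL, otherwise "no".
match() {
    if printf '%s\n' "$1" | grep -Eq -- "$2"; then
        echo yes
    else
        echo no
    fi
}

# check_url URL REJECT ACCEPT: report both matches independently.
check_url() {
    echo "reject=$(match "$1" "$2") accept=$(match "$1" "$3")"
}

# Shortened stand-ins (assumptions) for the full expressions above.
reject='http.*http.*http|/forum|login'
accept='\.(css|gif|png|jpe?g)$|https?://web\.archive\.org/web/[^ *]+/https?://?(ssl|www|fonts).gstatic.com'

check_url 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/Timcast' "$reject" "$accept"
check_url 'https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WRhyyTh89ZNpQ.woff2' "$reject" "$accept"
```

With these stand-ins, the rumble.com URL matches neither expression,
while the fonts.gstatic.com URL matches only the accept expression.
Substituting the full expressions (with the wrapping removed) would show
whether the same holds for the real command.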