bug-wget

Not sure if this list for wget2?? Have an issue after recent updates.


From: Michael D. Setzer II
Subject: Not sure if this list for wget2?? Have an issue after recent updates.
Date: Sun, 08 Oct 2023 19:08:12 +1000

I have used wget2 to download 69 to 70 pages from a University
College Campus directory. The process has worked with no
problems for many years and reduced the download time to about
25 seconds.

But now I get errors if I set it to more than 32 threads.
wget2 --max-threads=32 --secure-protocol=PFS
--base="https://www.uog.edu/" -i testlistuog

works fine.
testlistuog contains:
directory/?page=01
directory/?page=02
...
...
directory/?page=68
directory/?page=69

I know that wget2 was recently updated in the Fedora 38 repo:
GNU Wget2 2.1.0 - multithreaded metalink/file/website downloader

+digest +https +ssl/gnutls +ipv6 +iri +large-file +nls -ntlm -opie
+psl -hsts +iconv +idn2 +zlib -lzma +brotlidec +zstd -bzip2 -lzip
+http2 +gpgme

I don't know whether that update changed something with threads,
or whether some other update is responsible.

I had found that the Windows version of wget2 did not work well
with threads, so I have it run with threads set to 1.
The download time on Windows is:
Time to Download Campus Directory 154.332887 Seconds

The Linux version with 32 threads now takes:
Time to Download Campus Directory 138.430772 Seconds
Previously it ran in about 25 seconds with 70 threads.
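
For anyone who wants to reproduce timings like the ones above, here is a
minimal sketch of one way to measure the wall-clock time of a command from
C++. It is not the code from my program; the function name time_command is
made up for illustration, and it assumes std::system is available and that a
non-zero exit status only needs to be reported, not handled.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: run a shell command and return the wall-clock
// time it took, in seconds.
double time_command(const std::string& command) {
    assert(!command.empty());  // an empty command would only probe for a shell
    const auto start = std::chrono::steady_clock::now();
    const int status = std::system(command.c_str());
    const auto stop = std::chrono::steady_clock::now();
    if (status != 0) {
        std::fprintf(stderr, "command exited with status %d\n", status);
    }
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_command with the full wget2 command line and printing the
result with "%f Seconds" would give output in the same form as the lines
above. steady_clock is used so the measurement is not affected by changes
to the system clock.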

Original lines in the program:
Call to get page 1 to find the total number of pages in the directory:
    system("wget2 --restrict-file-names=windows --secure-protocol=PFS -q \"https://www.uog.edu/directory/?page=01\"");

The program then creates the testlistuog file with the entries ?page=01
through ?page=NN, where NN is the last page number.
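
The list-building step can be sketched as follows. This is an illustration
rather than the actual program source: the function names make_page_list and
write_page_list are hypothetical, and the sketch assumes the last page number
has already been parsed from page 1 and that page numbers are zero-padded to
two digits, as in the list shown earlier.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Build the relative URLs "?page=01" ... "?page=NN" (zero-padded to two
// digits). They are meant to be resolved against the --base URL.
std::vector<std::string> make_page_list(int last_page) {
    assert(last_page >= 1);  // page 1 always exists
    std::vector<std::string> entries;
    for (int page = 1; page <= last_page; ++page) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "?page=%02d", page);
        entries.emplace_back(buf);
    }
    return entries;
}

// Write one entry per line to the list file; returns false on I/O failure.
bool write_page_list(const std::string& path, int last_page) {
    std::ofstream out(path);
    if (!out) return false;
    for (const auto& entry : make_page_list(last_page)) {
        out << entry << '\n';
    }
    return static_cast<bool>(out);
}
```

With this sketch, write_page_list("testlistuog", 69) would produce a file
with 69 lines that can be passed to wget2 with -i.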

Call on Linux (runs wget2 in the background and loops to display progress
while it downloads):
    system("wget2 --restrict-file-names=windows --max-threads=70 "
           "--secure-protocol=PFS -q "
           "--base=\"https://www.uog.edu/directory/\" -i testlistuog 2>error & PID=$! ; "
           "printf '[' ; while ps hp $PID >/dev/null ; do printf '▓'; sleep 1 ; done ; "
           "printf '] done!\n'");
This produces an individual file for each page; the program then combines
them into one allraw.uog file when the download is done.
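
The combining step could look roughly like the sketch below. It is not the
code from the program: the function name concat_files is hypothetical, and
the names of the per-page files are passed in by the caller because they
depend on how wget2 names the downloads.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: append each input file, in order, to `output`.
// Returns the number of input files that could be opened.
std::size_t concat_files(const std::vector<std::string>& inputs,
                         const std::string& output) {
    assert(!output.empty());  // an output path is required
    std::ofstream out(output, std::ios::binary | std::ios::trunc);
    if (!out) return 0;
    std::size_t copied = 0;
    for (const auto& name : inputs) {
        std::ifstream in(name, std::ios::binary);
        if (!in) continue;  // skip pages that were not downloaded
        // Streaming an empty rdbuf() would set failbit on `out`,
        // so only copy when the file has at least one byte.
        if (in.peek() != std::ifstream::traits_type::eof()) {
            out << in.rdbuf();
        }
        ++copied;
    }
    return copied;
}
```

Comparing the return value with the number of expected pages gives a quick
check for pages that failed to download.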

On Windows it uses a single thread, downloads pages 1 to last, and sends
the output to allraw.uog:
    system("wget2 --max-threads=1 --restrict-file-names=windows "
           "--secure-protocol=PFS --progress=none "
           "--base=\"https://www.uog.edu/directory/\" -O \"allraw.uog\" -i testlistuog");

I ran the wget2 commands outside the cpp program to make sure the program
was not causing the issue.

Going from 25 seconds to 138 isn't a huge problem, but seeing the change in
how the program is working is concerning.

Perhaps the maximum number of threads was changed, or perhaps the cause is
some other update in Fedora or in the kernel? 6.5.5-200.fc38.x86_64
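
To narrow down where the thread limit starts to matter, the same download
can be repeated with different --max-threads values. Below is a minimal
sketch of building those command lines. The function name
build_wget2_command is hypothetical, and it uses only the options already
shown in this message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the wget2 command line used in this message
// for a given thread count. base_url and list_file are passed through
// as-is, so they must not contain double quotes or shell metacharacters.
std::string build_wget2_command(int max_threads,
                                const std::string& base_url,
                                const std::string& list_file) {
    assert(max_threads >= 1);
    return "wget2 --max-threads=" + std::to_string(max_threads) +
           " --secure-protocol=PFS -q --base=\"" + base_url +
           "\" -i " + list_file;
}
```

Running the resulting commands for values such as 16, 32, 33, 48, and 70,
and noting which ones produce errors, would show whether the failures begin
exactly above 32 threads.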







+------------------------------------------------------------+
 Michael D. Setzer II - Computer Science Instructor (Retired)
 mailto:mikes@guam.net
 mailto:msetzerii@gmail.com
 Guam - Where America's Day Begins
 G4L Disk Imaging Project maintainer
 http://sourceforge.net/projects/g4l/
+------------------------------------------------------------+



