
[Bug-wget] [RFC] Extend concurrency support


From: Jure Grabnar
Subject: [Bug-wget] [RFC] Extend concurrency support
Date: Tue, 20 May 2014 01:23:55 +0200

Hello,

For my GSoC project I will do the following:
1. implement downloading one file through a mirror-list
2. implement downloading multiple files from multiple servers
3. fix Metalink support.

I'd like to get your opinions regarding the implementation of the first
one, although I will soon send an RFC for the second one as well.

1. Single file through a mirror-list 

a) Backend
A user would specify a number of threads N and a list of mirror servers.
A flowchart would look like this:

1) Go through the mirrors and find the first available server (available
meaning it responds in < MAX_RETRIES retries).
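
A rough sketch of step 1, in Python rather than wget's C just to keep it
short. The names first_available and probe and the value of MAX_RETRIES
are assumptions made for illustration; the probe callback stands in for
whatever connection attempt wget would really make.

```python
from typing import Callable, Iterable, Optional

MAX_RETRIES = 3  # assumed value; the proposal does not fix one


def first_available(mirrors: Iterable[str],
                    probe: Callable[[str], bool],
                    max_retries: int = MAX_RETRIES) -> Optional[str]:
    """Return the first mirror that responds, or None if none does."""
    for mirror in mirrors:
        # One initial attempt plus (max_retries - 1) retries, so a mirror
        # counts as available only if it needs fewer than max_retries retries.
        for _attempt in range(max_retries):
            if probe(mirror):
                return mirror
    return None
```

Mirrors are tried strictly in list order, so the order of the
mirror list doubles as a priority order in this sketch.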

2) Try to figure out the file size from the Content-Length header. If the
size is unknown, fall back to a single-thread download. Would it be
sensible to allow the user to specify the file size with some switch?
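
The decision in step 2 could look roughly like this (a Python sketch, not
wget code). The size_override parameter models the hypothetical
user-supplied switch asked about above; no such option exists today, and
the function names are made up for the example.

```python
from typing import Mapping, Optional


def content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length value, or None if missing or malformed."""
    for name, value in headers.items():
        if name.lower() == "content-length":  # header names are case-insensitive
            value = value.strip()
            if value.isascii() and value.isdigit():
                return int(value)
            return None
    return None


def threads_to_use(headers: Mapping[str, str], requested: int,
                   size_override: Optional[int] = None) -> int:
    """Fall back to one thread when the file size cannot be determined."""
    size = size_override if size_override is not None else content_length(headers)
    if size is None:
        return 1  # unknown size: the file cannot be split into chunks
    return max(1, requested)
```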

3) The main thread maintains a pool of available servers. It spawns at
most min(N, M) threads, where M is the number of available mirrors.
Every thread downloads its own chunk from its own mirror using the
current implementation of concurrent download for Metalink. If some
mirror becomes unavailable during the download in the i-th thread, that
thread terminates and notifies the main thread. The main thread then
spawns a new thread on an available mirror; if none is available at the
moment, it waits until some mirror becomes available (whenever some
other thread finishes downloading its chunk).
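
The scheduling in step 3 can be sketched as follows (Python, for brevity;
this is not the existing Metalink code). The fetch callback, the
(first_byte, last_byte) chunk tuples and the function name are
assumptions. A failed mirror is simply dropped from the pool here, and
its chunk goes back on the pending list.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

Chunk = Tuple[int, int]  # (first_byte, last_byte), inclusive


def download_chunks(chunks: List[Chunk], mirrors: List[str], n_threads: int,
                    fetch: Callable[[str, Chunk], bytes]) -> Dict[Chunk, bytes]:
    pending = list(chunks)
    idle = list(mirrors)               # pool of available mirrors
    events: queue.Queue = queue.Queue()  # workers notify the main thread here
    results: Dict[Chunk, bytes] = {}
    limit = min(n_threads, len(mirrors))
    active = 0

    def worker(mirror: str, chunk: Chunk) -> None:
        try:
            data = fetch(mirror, chunk)
        except OSError:
            data = None                # mirror became unavailable
        events.put((mirror, chunk, data))

    while pending or active:
        # Spawn workers while there is work, an idle mirror and a free slot.
        while pending and idle and active < limit:
            args = (idle.pop(0), pending.pop(0))
            threading.Thread(target=worker, args=args, daemon=True).start()
            active += 1
        if active == 0:
            raise RuntimeError("no available mirrors left")
        # Otherwise wait until some thread finishes or fails.
        mirror, chunk, data = events.get()
        active -= 1
        if data is None:
            pending.append(chunk)      # retry the chunk on another mirror
        else:
            results[chunk] = data
            idle.append(mirror)        # mirror is free for the next chunk
    return results
```

Only the main thread touches pending, idle and results, so the queue is
the only shared structure and no extra locking is needed in this sketch.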

It might occur that a mirror that was unavailable becomes available
during the download. Such mirrors should be added to the pool of
available mirrors. I was thinking about creating another thread that
would occasionally "poke" unavailable servers and add them to the pool
if they respond.
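
One possible shape for the "poke" thread, again as a Python sketch under
assumptions: MirrorPool, poke_once, start_poker and the fixed polling
interval are all invented for illustration, and probe stands in for a
real connection test.

```python
import threading
from typing import Callable, List


class MirrorPool:
    """Available and unavailable mirrors, guarded by one lock."""

    def __init__(self, available: List[str], unavailable: List[str]) -> None:
        self._lock = threading.Lock()
        self.available = list(available)
        self.unavailable = list(unavailable)

    def poke_once(self, probe: Callable[[str], bool]) -> List[str]:
        """Probe every unavailable mirror; move responders to the pool."""
        with self._lock:
            candidates = list(self.unavailable)
        # Probe outside the lock so slow mirrors do not block other threads.
        revived = [m for m in candidates if probe(m)]
        with self._lock:
            for mirror in revived:
                if mirror in self.unavailable:
                    self.unavailable.remove(mirror)
                    self.available.append(mirror)
        return revived


def start_poker(pool: MirrorPool, probe: Callable[[str], bool],
                interval: float, stop: threading.Event) -> threading.Thread:
    """Poke unavailable mirrors every `interval` seconds until `stop` is set."""
    def loop() -> None:
        while not stop.wait(interval):  # wait() returns True once stop is set
            pool.poke_once(probe)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The main thread would set the stop event once the download has
finished so that the poker exits cleanly.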

It might occur that when M < N and therefore M threads were spawned, a
fresh mirror is added to the pool (see previous paragraph). In this
case it's probably best to divide the file into N pieces no matter what -
but only M threads will be active at the beginning. The newly added
server can be used to spawn another thread.
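
Dividing the file into N pieces independently of M could be done like
this (Python sketch; the function names are made up). Chunks are
inclusive byte ranges so they map directly onto an HTTP Range header.

```python
from typing import List, Tuple


def split_into_chunks(size: int, n: int) -> List[Tuple[int, int]]:
    """Split `size` bytes into at most n inclusive (first, last) byte ranges."""
    if size <= 0 or n <= 0:
        raise ValueError("size and n must be positive")
    n = min(n, size)                 # never create empty chunks
    base, extra = divmod(size, n)
    chunks = []
    start = 0
    for i in range(n):
        length = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append((start, start + length - 1))
        start += length
    return chunks


def range_header(chunk: Tuple[int, int]) -> str:
    """Format a chunk as the value of an HTTP Range request header."""
    return f"bytes={chunk[0]}-{chunk[1]}"
```

With the pieces fixed up front, a mirror that joins the pool later just
picks up the next piece that nobody has started yet.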

4) A file would be downloaded to a single temporary file as described
here: http://lists.gnu.org/archive/html/bug-wget/2014-05/msg00025.html
I'm still fixing the patch, because at least one memory corruption bug
is still lurking in it and has yet to be found.
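
The general idea of writing all chunks into one temporary file can be
illustrated as below. This is a Python sketch and does not reproduce the
linked patch; the ChunkFile class and its methods are assumptions. The
file is sized up front and each thread writes at its chunk's offset.

```python
import os
import tempfile
import threading
from typing import Optional


class ChunkFile:
    """A single temporary file that several threads fill in at given offsets."""

    def __init__(self, size: int, directory: Optional[str] = None) -> None:
        fd, self.path = tempfile.mkstemp(dir=directory)
        self._file = os.fdopen(fd, "r+b")
        self._file.truncate(size)      # reserve the final length up front
        self._lock = threading.Lock()

    def write_at(self, offset: int, data: bytes) -> None:
        # seek + write must happen as one step, hence the lock.
        with self._lock:
            self._file.seek(offset)
            self._file.write(data)

    def close(self) -> None:
        self._file.close()
```

Chunks may arrive in any order; once all of them are written and the
file is closed, it can be renamed to the final name.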

b) Front end
What would be a good way to specify the mirror list? Specifying a switch
and listing all mirrors could be quite awkward. Should we introduce
some sort of simple file format?
I believe we should also take number 2 into consideration: downloading
multiple files from multiple servers. Do we want to apply different
switches (options) to different files?
What if we want to combine 1. and 2.: multiple files from multiple
mirror lists? The simplest way would be to use a Metalink file for that
purpose, but is it the most elegant?

All your suggestions are greatly appreciated.

Best Regards,

Jure Grabnar




