Re: Architecture to reduce download time when pulling multiple packages

From: Josh Marshall
Subject: Re: Architecture to reduce download time when pulling multiple packages – historic success with magnet URLs, BTIHs, & Aria2c!
Date: Tue, 17 Oct 2023 21:44:56 -0400

How long is it customary to wait before I can bump a thread?

On Sun, Oct 15, 2023 at 2:21 PM Josh Marshall
<> wrote:
> So it sounds like my first steps are to re-implement the downloads
> using aria2c.  This would affect the minimum base package, no?  Can I
> get some buy-in from maintainers that such changes are acceptable?
> On Fri, Oct 13, 2023 at 2:06 PM James R. Haigh (+ML.GNU.Guix
> subaddress) <> wrote:
> >
> > Hi Josh,
> >
> > At Z-0400=2023-10-13Fri12:36:01, Josh Marshall sent:
> > > This is to parallelize connections, which should never hurt downloading 
> > > but can help.  Mirroring would be parallelizing for providing packages, 
> > > what I want to implement is to parallelize obtaining packages.  Server 
> > > side vs client side.
> >
> >         If you are going to do something like this, please use a 
> > torrent architecture like BitTorrent or GNUnet – I suggest Aria2c as a very 
> > good CLI download backend that can be daemonised and sent instructions over 
> > a socket to add, pause, remove downloads, etc., and it supports magnet URLs 
> > including the existing nontorrent servers (via ‘as’ parameters, iirc.).
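
A minimal sketch of what "sent instructions over a socket" can look like, assuming aria2c was started with --enable-rpc and listens on its default JSON-RPC endpoint (http://localhost:6800/jsonrpc). The method names aria2.addUri, aria2.pause, aria2.unpause and aria2.remove are aria2's own; the helper functions, example URLs and GIDs are hypothetical and are not existing Guix or APT Daemon code.

```python
import json
import urllib.request
from itertools import count

# Hypothetical client helpers for aria2's JSON-RPC interface. Assumes aria2c
# was started with --enable-rpc and listens on its default port, 6800.
ENDPOINT = "http://localhost:6800/jsonrpc"
_request_ids = count(1)


def rpc_body(method, *params, secret=None):
    """Serialise one JSON-RPC 2.0 request for an aria2 method."""
    args = list(params)
    if secret is not None:
        # aria2 expects the --rpc-secret value as the first parameter.
        args.insert(0, "token:" + secret)
    return json.dumps({"jsonrpc": "2.0", "id": str(next(_request_ids)),
                       "method": method, "params": args})


def add_download(uris, options=None, secret=None):
    # All URIs in one addUri call must point at the same file (i.e. mirrors).
    return rpc_body("aria2.addUri", list(uris), options or {}, secret=secret)


def pause(gid, secret=None):
    return rpc_body("aria2.pause", gid, secret=secret)


def unpause(gid, secret=None):
    return rpc_body("aria2.unpause", gid, secret=secret)


def remove(gid, secret=None):
    return rpc_body("aria2.remove", gid, secret=secret)


def send(body, endpoint=ENDPOINT):
    """POST a request body to the daemon and return its 'result' field."""
    request = urllib.request.Request(
        endpoint, data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

With a daemon running, gid = send(add_download(["https://a.example/p.nar", "https://b.example/p.nar"])) returns the GID that pause, unpause and remove then take.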
> >
> >         I actually implemented this in a local copy of APT Daemon many 
> > years ago (circa 2011), but the change was not accepted upstream to 
> > Launchpad (because I was not on the bleeding edge; I was too slow to keep up 
> > with the upstream development).  My fork got forgotten about because, to 
> > get the full benefit, the server would have needed to add a BitTorrent 
> > Info Hash (BTIH) to the metadata of each package, alongside the MD5, 
> > SHA-256, etc. that it already included (not a big ask, really).  That said, 
> > even without the metadata, it did provide immediate benefit, and I used it 
> > for many years, not upgrading the Ubuntu 11.04 Natty Narwhal that I was 
> > using back then until I really had to.
> >
> >         The immediate benefit that it provided was exactly as you 
> > described: It allowed parallelisation of nontorrent downloads, be it from 
> > the same server or from multiple mirrors.  Iirc., I achieved this by simply 
> > passing the download list to Aria2c in daemon mode; I think I also 
> > converted all the HTTP URLs to ‘as’ parameters in magnet links, so that 
> > multiple mirrors could be passed using multiple ‘as’ parameters in each 
> > magnet link.  Then I simply relied on Aria2c being amazing at parallelising 
> > everything that I had given it!  I then also implemented progress updates 
> > such that APT Daemon could reflect where Aria2c was up to.
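
A minimal sketch of the client-side conversion described above, in Python rather than the actual APT Daemon code (which isn't shown here): it turns the mirror URLs for one file into a magnet link with one ‘as’ parameter per mirror and an optional BTIH. That current Aria2c fetches from ‘as’ sources when no BTIH is present is an assumption taken from James's recollection; if it doesn't, passing the same mirror list straight to aria2.addUri is the plain-HTTP equivalent. The function name and example URLs are hypothetical.

```python
from urllib.parse import quote, urlencode


def magnet_link(mirror_urls, btih=None, display_name=None):
    """Build a magnet URI with one 'as' (acceptable source) per mirror URL.

    The BTIH is optional: without it the link only carries HTTP(S) sources;
    with it, a BitTorrent client can also look for peers sharing the file.
    """
    params = []
    if btih is not None:
        params.append(("xt", "urn:btih:" + btih))
    if display_name is not None:
        params.append(("dn", display_name))
    params.extend(("as", url) for url in mirror_urls)
    # Percent-encode every value (including '/' in the mirror URLs) but keep
    # ':' unescaped so "urn:btih:" stays readable.
    return "magnet:?" + urlencode(params, quote_via=quote, safe=":")
```

Several mirrors become repeated ‘as’ parameters in a single link, so the download backend is free to split one file across all of them.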
> >
> >         The way I implemented this using Aria2c and magnet URLs meant that 
> > if additional hashes were known, they could be used as well, so if the 
> > server metadata simply added BTIHs, swarming could occur, which in turn 
> > would massively reduce load on the central servers and allow anyone who 
> > wants to be a mirror to become one simply by seeding indefinitely.  A 
> > default share ratio of 1.0 means that no user is a burden on the network 
> > unless they deliberately change that.  Users can donate to the running 
> > costs of the project simply by increasing their share ratio, which adds 
> > another means of contribution that they may find easier than the others.
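
The share-ratio arithmetic above, as a small sketch: a peer that seeds until it has uploaded ratio times what it downloaded gives back exactly that much, so its net load on the swarm is (1 - ratio) times the download. Aria2c's --seed-ratio option (default 1.0) is the stopping rule for this on BitTorrent downloads; the function name is hypothetical, and the sum counts bytes only (somebody still has to upload the first complete copy).

```python
def net_swarm_load(downloaded_bytes, share_ratio):
    """Bytes a peer takes from the swarm beyond what it uploads in return.

    Seeding until uploaded == share_ratio * downloaded gives back
    share_ratio * downloaded bytes, so the net load is
    (1 - share_ratio) * downloaded. Negative means the peer is a net donor.
    """
    uploaded = share_ratio * downloaded_bytes
    return downloaded_bytes - uploaded


# A 100 MiB download at the default ratio of 1.0 costs the swarm nothing net;
# at 2.0 the peer donates a further 100 MiB; at 0.5 it still takes 50 MiB.
MIB = 1024 * 1024
for ratio in (1.0, 2.0, 0.5):
    print(ratio, net_swarm_load(100 * MIB, ratio) / MIB)
```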
> >
> >         Anyone keen to keep old packages online can simply seed them 
> > indefinitely, so this is also really great for archival purposes.  Even if 
> > the central project loses interest in the old packages and deletes them, 
> > anyone else can keep them up.  The hashes ensure that they have not been 
> > tampered with.
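
A sketch of that tamper check, assuming the repository metadata publishes a hex SHA-256 for each file (Guix itself usually writes hashes in its own base32 form, so a real client would convert first). Whoever seeded the bytes, the client only accepts them if they match the hash from metadata it already trusts; the function names are hypothetical.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256_hex):
    """True if the file matches the hash published in trusted metadata."""
    return sha256_file(path) == expected_sha256_hex.strip().lower()
```

The guarantee is only as strong as the trust in the metadata itself, which is why the hashes have to come from the project rather than from the peers.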
> >
> >         There is also a really cool benefit that occurs, or can occur, on a 
> > LAN.  An entire network of computers can all swarm locally with each other, 
> > so that each package needs to be downloaded through the metered last-mile 
> > bottleneck from the WAN only once, provided that local broadcasting is 
> > supported.  I think this requires Avahi, and I believe that Aria2c 
> > supports it, but I can't remember for certain.  I don't remember ever 
> > getting this bit working, but I also did not try hard, because it would 
> > have required the metadata that I didn't have until after download, so 
> > even if I got it working it would not have been directly useful unless 
> > the APT repositories that I was using included the BTIHs.
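
On the LAN point: with N machines fetching the same S-byte package, no local swarming means N times S over the WAN, and ideally just S with it. As far as I know, Aria2c's LAN support is BitTorrent Local Peer Discovery (BEP 14, option --bt-enable-lpd), which uses its own multicast announcements rather than Avahi and, since peers announce an info hash, only helps once downloads carry a BTIH. A sketch of assembling such a command line; the function and the input file name are hypothetical, while the options shown are real aria2c options.

```python
import shlex


def aria2c_command(input_file, lan_discovery=True, seed_ratio=1.0, daemon=False):
    """Assemble an aria2c command line for a batch of downloads.

    input_file lists one download per line (plain URIs or magnet links).
    """
    argv = ["aria2c", "--input-file=" + input_file,
            "--seed-ratio=" + str(seed_ratio)]
    if lan_discovery:
        # BitTorrent Local Peer Discovery; aria2 skips it for torrents
        # marked private.
        argv.append("--bt-enable-lpd=true")
    if daemon:
        argv += ["--enable-rpc=true", "--daemon=true"]
    return argv


print(shlex.join(aria2c_command("downloads.txt")))
```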
> >
> >         So yeah, loads of great benefits to this architecture, and I 
> > highly recommend it: convert all existing URLs to magnet links (can be done 
> > client-side as I did; or server-side); optionally add any additional 
> > mirrors as additional ‘as’ parameters (again client-side or server-side); 
> > add ‘btih’ parameters to the magnet links (the BTIH must be included in the 
> > server metadata to get the full benefit of the swarming, but conversion to 
> > magnet link format can be done client-side or server-side); then simply 
> > pass all this to a really good parallelising backend such as Aria2c; then 
> > update any progress data and relay pause, resume, cancel, etc. to the 
> > backend.
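
For the last step, relaying progress, a sketch that aggregates the status structs returned by aria2.tellStatus or aria2.tellActive. The field names (totalLength, completedLength, status) and state names are aria2's; the functions are hypothetical, and how a front end displays the result is left open.

```python
from collections import Counter


def overall_progress(statuses):
    """Fraction of bytes completed across several aria2 status structs.

    aria2 reports totalLength and completedLength as decimal strings. A zero
    total (e.g. before any sizes are known) yields 0.0 instead of an error.
    """
    total = sum(int(s["totalLength"]) for s in statuses)
    done = sum(int(s["completedLength"]) for s in statuses)
    return done / total if total else 0.0


def status_counts(statuses):
    """Count downloads per aria2 state (active, waiting, paused, error, ...)."""
    return Counter(s["status"] for s in statuses)
```

Pause, resume and cancel from the front end then map onto aria2.pause, aria2.unpause and aria2.remove for the corresponding GIDs.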
> >
> >         One final note, as I am sure that there are a lot of GNUnet fans on 
> > this list, is that I would try Aria2c first to see how well it can work, 
> > and then try GNUnet or whatever else once you have a standard to benchmark 
> > against.  Both are Free Software, so no concern there.  Aria2c is an 
> > all-round download manager CLI that works with or without swarming, i.e. it 
> > is just as good at HTTPS as it is BitTorrent, and can do both at the same 
> > time.  GNUnet has the advantage of working from SHA-256, iirc., which is 
> > generally already included in the metadata of the repositories of various 
> > distributions, but I think it lacks many of the other features, the 
> > stability, and the ecosystem of alternative backends that the BitTorrent 
> > network has.
> >
> >         Of course, there is no harm in including other hashes along with 
> > BTIH, to allow people to experiment with alternative backends, while always 
> > ensuring that what works works well.  Another hash that may be useful to 
> > include is the Tiger Tree Hash, which is structurally very similar to BTIH, 
> > but stronger, iirc.
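
For comparison, a sketch of the tree structure behind the Tiger Tree Hash, following the THEX scheme as I understand it: the file is split into 1024-byte leaves, leaves are hashed with a 0x00 prefix and interior nodes with a 0x01 prefix, and an unpaired node is carried up unchanged. SHA-256 is swapped in for Tiger because Python's hashlib does not provide Tiger, so the output is not a real TTH. A v1 BTIH, by contrast, covers a flat list of piece hashes rather than a tree; BitTorrent v2 moved to SHA-256 Merkle trees.

```python
import hashlib

LEAF_SIZE = 1024  # THEX splits the input into 1024-byte leaf segments


def _digest(data):
    # Stand-in for Tiger: Python's hashlib does not ship Tiger, so SHA-256 is
    # used here. The tree shape is the point, not the underlying hash.
    return hashlib.sha256(data).digest()


def tree_hash(data: bytes) -> bytes:
    """Root of a THEX-style Merkle tree over data."""
    segments = [data[i:i + LEAF_SIZE] for i in range(0, len(data), LEAF_SIZE)]
    # Leaves are hashed with a 0x00 prefix; empty input is one empty leaf.
    level = [_digest(b"\x00" + seg) for seg in segments or [b""]]
    while len(level) > 1:
        parents = [_digest(b"\x01" + level[i] + level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            parents.append(level[-1])  # an unpaired node is promoted as-is
        level = parents
    return level[0]
```

A practical advantage of the tree form is that any single 1024-byte segment can be verified against the root with a logarithmic number of sibling hashes.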
> >
> >         The first thing that the Guix project can do to signal interest in 
> > this architecture is to simply include the BTIH of each package in the 
> > repository metadata.  Be it in magnet URL form or not does not matter 
> > because the client can later convert that as needed.  The important thing 
> > is an authoritative statement in metadata that this version of this package 
> > has this BTIH.  Once that metadata is available, the game is on to 
> > implement swarming support, be it with Aria2c as a backend (as I recommend 
> > at least starting with) or otherwise.
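
To make that statement concrete, a sketch of how a v1 BTIH is derived for a single-file torrent: it is the SHA-1 of the bencoded ‘info’ dictionary, which holds the file name, length, piece length and the concatenated SHA-1 of every piece. One consequence for the metadata: the BTIH changes if the name or piece length changes, so whoever publishes it has to pin those choices. The 256 KiB default below is an arbitrary assumption, not a Guix convention, and the helpers are hypothetical.

```python
import hashlib


def bencode(value):
    """Minimal bencoding for ints, strings, bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, sorted by their raw bytes.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda item: item[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


def btih_v1(name, data, piece_length=256 * 1024):
    """Hex v1 BitTorrent info hash for a single file held in memory."""
    pieces = b"".join(hashlib.sha1(data[i:i + piece_length]).digest()
                      for i in range(0, len(data), piece_length))
    info = {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}
    return hashlib.sha1(bencode(info)).hexdigest()
```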
> >
> >         I know that this architecture works well from first-hand 
> > experience with APT Daemon written in Python.  The only failure I had with 
> > it was lack of upstream support.  So I consider it important to first 
> > obtain upstream approval before investing much more time into this.  
> > I seem to remember suggesting this to the Nix project many years ago and 
> > didn't get anywhere, and now I don't have the energy to try to improve 
> > upstream projects if they reject my ideas, so I'll be interested to see 
> > whether you have any success with your attempt to do the same.
> >
> >         Good luck! ;-)
> >
> > Kind regards,
> > James.
> > --
> > Wealth doesn't bring happiness, but poverty brings sadness.
> > Sent from Debian with Claws Mail, using email subaddressing as an 
> > alternative to error-prone heuristical spam filtering.
> > Postal: James R. Haigh, Middle Farm, Vennington, nr. Westbury, nr. 
> > Shrewsbury, Salop, SY5 9RG, Britain
