Re: [Bug-wget] [RFC] Extend concurrency support


From: Tim Ruehsen
Subject: Re: [Bug-wget] [RFC] Extend concurrency support
Date: Tue, 20 May 2014 16:10:49 +0200
User-agent: KMail/4.12.4 (Linux/3.14-1-amd64; KDE/4.13.1; x86_64; ; )

On Tuesday 20 May 2014 12:56:48 Giuseppe Scrivano wrote:
> Tim Ruehsen <address@hidden> writes:
> > most of this is already solved in https://github.com/rockdaboot/mget which
> > was originally thought as a 'modern' Wget. I would like to see Mget and
> > Wget merge into something like 'Wget2'. At least, feel free to move code
> > from Mget into Wget as you wish (I am the author and copyright holder of
> > Mget, both projects have the same license).
> 
> I'm afraid that Jure can't copy any existing code for his Summer of Code
> project but has to reinvent the wheel if needed...

I was already afraid of that.

> > History...
> > I have been at the same point as you some years ago. And after looking at
> > Wget I found Wget's code has to be redesigned. I had two choices:
> > struggling with grown code or restart from scratch. I did the second
> > because I didn't see a chance to get huge code changes into Wget. Either
> > you have to discuss every little change or you end up with your own code
> > branch, which might become integrated into master during the next few
> > years.
> > 
> > It has been asked many times and I do it again: shouldn't we start with
> > Wget2 development, maybe having Jure as "project leader" (if you want). I
> > made a start with Mget (e.g. consequently putting reusable code into a
> > library)... and I would spend some time helping to merge Mget and Wget.
> > Due to the library-based character of Mget, it shouldn't be too hard.
> 
> ...but in the long term we can avoid that task and re-use existing
> wheels.  Not sure what other people think about it, but I think wget2,
> whatever it will be, should be based on libcurl and focus the wget
> development on what wget does better, eg recursive downloads.

Libcurl is one option (and not the worst). At least it would replace the HTTP 
and FTP send and receive code, plus the underlying TCP network handling (what 
about DNS caching?). But that is only a small part of Wget's code.

You still need (just to name a few):
- basic data structures like hashmaps (e.g. stringmaps), vectors, lists / 
queues, buffers / growables, etc.
- an HTML/XML scanner for HTML, Metalink, sitemaps, Atom / RSS feeds
- a CSS scanner
- file hashing
- locale en/decoding for filenames (IDNA)
- HSTS functions
- Cookie logic (incl. public suffix handling)
- robots.txt handling
- threading abstraction API
- a threading model incl. communication

Mget / libmget has all of these, including tests, and the functionality is 
strongly geared towards what a tool like Wget needs. Libmget is modular in 
such a way that it would be easy to replace the HTTP/HTTPS and networking 
code with libcurl.
I assume that one of Wget2's most interesting features will be a library that 
gives third parties easy access to Wget's functionality. So why not merge the 
existing code with libmget? Most of it would be Windows and VMS compatibility 
code.
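
To illustrate what I mean by "modular", here is a minimal sketch in C of a 
swappable transport layer. It is not libmget's actual API: transport_t, 
download() and both fetch functions are names I made up for this example, 
and the two backends are stubs that do no network I/O. A real libcurl 
backend would wrap libcurl's easy interface behind the same function pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interface: these names are illustrative only and are
 * not taken from libmget, libcurl or Wget. */
typedef struct {
    const char *name;
    /* Store the response for url in buf as a NUL-terminated string.
     * Return the number of bytes stored, or -1 if buf is too small. */
    int (*fetch)(const char *url, char *buf, size_t buflen);
} transport_t;

/* Stand-in for a built-in HTTP implementation: it only echoes the URL. */
static int builtin_fetch(const char *url, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "builtin:%s", url);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Stand-in for a libcurl-backed implementation: it only echoes the URL. */
static int curl_stub_fetch(const char *url, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "libcurl-stub:%s", url);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

static const transport_t builtin_transport = { "builtin", builtin_fetch };
static const transport_t curl_stub_transport = { "libcurl-stub", curl_stub_fetch };

/* The caller (recursion, robots.txt, cookies, ...) depends only on the
 * interface, so the backend can be exchanged without touching it. */
static int download(const transport_t *t, const char *url,
                    char *buf, size_t buflen)
{
    assert(t != NULL && t->fetch != NULL);
    return t->fetch(url, buf, buflen);
}
```

Calling download(&builtin_transport, url, buf, sizeof buf) and 
download(&curl_stub_transport, url, buf, sizeof buf) runs the same caller 
code against either backend, which is the property that would let the 
networking part be replaced while everything else stays in place.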

> Daniel will probably agree with me :-)
He definitely will.

Regards, Tim



