bug-wget

Re: [Bug-wget] Async webcrawling


From: James Read
Subject: Re: [Bug-wget] Async webcrawling
Date: Tue, 31 Jul 2018 19:17:22 +0100

Thanks,

As I understand it, though, there is only so much you can do with threading.
For more scalable solutions you need asynchronous programming techniques.
See http://www.kegel.com/c10k.html for a summary of the problem. I want to do
large-scale web crawling and am not sure whether wget2 is up to the job.
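
To illustrate the kind of readiness-based multiplexing I mean, here is a rough
sketch. It is Python rather than C, it uses the standard selectors module
(which wraps select/epoll/kqueue), and it is not wget or wget2 code. The name
read_all is made up for this example, and the sockets are assumed to be
connected already; a real crawler would also need non-blocking connect, DNS,
TLS, HTTP parsing and URL queueing.

```python
import selectors
import socket


def read_all(socks):
    """Read every socket to EOF from one thread; returns {index: bytes}.

    Hypothetical helper for illustration: `socks` is a list of already
    connected sockets, and the index in that list identifies each one.
    """
    sel = selectors.DefaultSelector()
    buffers = {}
    for i, s in enumerate(socks):
        s.setblocking(False)
        buffers[i] = bytearray()
        sel.register(s, selectors.EVENT_READ, data=i)

    remaining = len(socks)
    while remaining:
        # Block until at least one socket is readable, then service each
        # ready socket in turn; no per-connection thread is needed.
        for key, _events in sel.select():
            try:
                chunk = key.fileobj.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup, try again on the next round
            if chunk:
                buffers[key.data] += chunk
            else:
                # Empty read means the peer closed the connection.
                sel.unregister(key.fileobj)
                remaining -= 1
    sel.close()
    return {i: bytes(b) for i, b in buffers.items()}


if __name__ == "__main__":
    # Local demo with socket pairs standing in for server connections.
    r1, w1 = socket.socketpair()
    r2, w2 = socket.socketpair()
    w1.sendall(b"first response")
    w2.sendall(b"second response")
    w1.close()
    w2.close()
    print(read_all([r1, r2]))
    r1.close()
    r2.close()
```

The point is that one thread services all connections as they become
readable, which is the approach the c10k page describes as an alternative to
one thread per connection.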

On Tue, Jul 31, 2018 at 6:22 PM, Tim Rühsen <address@hidden> wrote:

> On 31.07.2018 18:39, James Read wrote:
> > Hi,
> >
> > how much work would it take to convert wget into a fully fledged
> > asynchronous webcrawler?
> >
> > I was thinking something like using select. Ideally, I want to be able to
> > supply wget with a list of starting point URLs and then for wget to crawl
> > the web from those starting points in an asynchronous fashion.
> >
> > James
> >
>
> Just use wget2. It is already packaged in Debian sid.
> To build from git source, see https://gitlab.com/gnuwget/wget2.
>
> To build from tarball (much easier), download from
> https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.
>
> Regards, Tim
>
>

