[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [igraph] betweenness running time estimate and
From: |
Gabor Csardi |
Subject: |
Re: [igraph] betweenness running time estimate and |
Date: |
Mon, 14 Jul 2008 03:35:38 -0500 |
User-agent: |
Mutt/1.5.17+20080114 (2008-01-14) |
On Sun, Jul 13, 2008 at 02:16:13PM -0700, David Hunkins wrote:
> Thank you Gabor, Tamas and others very much for the igraph library.
>
> I am analyzing some online photo sharing data donated to me by a large
> photo sharing site.
>
> In my first dataset I have about 5 million vertices and 5 million edges.
> In the next dataset I'll get about 50 million vertices and 200 million
> edges. I am currently running analysis on a 2GB/2.0GHz Intel Core Duo
> Macbook Pro. I am conditioning and loading the weighted, directed graph
> data in Pajek format.
>
> A couple of questions with which your user community may have some
> experience.
>
> First, I am having a trouble determining how long I should expect
> various operations to take even with the 5M/5M dataset, and I am
> wondering whether there is a simple metric (such as density of the
> graph) that can be computed and then used to calculate order-of-
> magnitude running times for the various other functions of interest.
> Yesterday, for example I ran betweenness.estimate(graph, cutoff=2) and
> it hasn't completed after 24 hours (and the Macbook is not paging at
> all, just getting a little 'warm').
David,
some igraph functions support a progress bar, betweenness estimation
supports this at the C level, but not at the R level, unfortunately.
(I forgot to add it, basically.) :(
I can easily add it if you want, and send you an updated R package.
> Second, do you know anybody who has used commercial (or otherwise)
> scalable computing resources to run analyses on datasets of this size? I
> fear that, even if I can melt my laptops to solve the problem for the
> 5M/5M dataset, I will need other resources when I go to 50M/200M
> dataset.
I don't know. The biggest data set I had was about 10M/21M and
you cannot calculate too many thing on this, or it really takes a long time.
An important issue is that if the graph does not fit into the physical
memory then igraph cannot handle it efficiently. Another issue is that
the igraph algorithms are not parallel themselves, you have to "parallelize"
them by hand, and that basically mean rewriting them.
I calculated the diameter of my 10M/21M graph, just for sport, and
it took about a month on a 10-20 machine cluster. The diameter calculation
can be parallelized easily, betweenness is a bit harder, but essentially
the running time is about the same.
> Thanks for any assistance you and the community can provide. Kösz!
Szivesen. :)
G.
> David Hunkins
>
>
>
>
>
>
> _______________________________________________
> igraph-help mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/igraph-help
--
Csardi Gabor <address@hidden> UNIL DGM