Re: [igraph] Average Path Length / Diameter
From: Gabor Csardi
Subject: Re: [igraph] Average Path Length / Diameter
Date: Tue, 12 Feb 2008 09:11:44 +0100
User-agent: Mutt/1.5.13 (2006-08-11)
Tom,
I think you can do two things:
1) In http://www.cs.cmu.edu/~jure/pubs/powergrowth-tkdd.pdf the authors
introduce a method to approximate the diameter (although their
diameter is not the classic diameter). They claim that it works well.
2) Calculate the real diameter. Yes, this can take a long time.
Here is a first rough sketch of how I would do it:
a) compute the shortest paths from one vertex to all the others,
one vertex at a time, and build a path length histogram from
these path lengths. Obviously, you cannot store all the path
lengths individually, but the diameter should be small, so you
can store a histogram, and that is more information than just
storing the average and the maximum.
b) create a simple text file which contains a single number,
the vertex for which the calculation will be performed next.
c) create another text file which will contain the results, one
line for each histogram, prefixed by the id of the vertex
for which the path lengths are calculated.
d) If you would like to parallelize the calculation then
you'll need some locking mechanism to avoid many processes
reading/writing the files at the same time.
e) at the end, scan the result file for vertices whose
calculation is missing or corrupt because of power failures,
or because the machine froze, or other evil things.
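The per-vertex step could be sketched in pure Python, without igraph at all. This is only a sketch: the adjacency-list representation, the `bfs_histogram` name, and the "distance:count" encoding of a result line are my own choices, not anything prescribed by igraph.

```python
from collections import deque

def bfs_histogram(adj, source):
    """BFS from `source`; return a {path_length: count} histogram
    over all vertices reachable from it (step a)."""
    dist = {source: 0}
    hist = {}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                hist[dist[w]] = hist.get(dist[w], 0) + 1
                queue.append(w)
    return hist

# Tiny undirected path graph 0-1-2-3, as an adjacency list
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hist = bfs_histogram(adj, 0)
print(hist)  # {1: 1, 2: 1, 3: 1}

# One result line per vertex, prefixed by the vertex id (step c);
# the "distance:count" encoding is an arbitrary illustrative choice
line = str(0) + " " + " ".join(f"{d}:{c}" for d, c in sorted(hist.items()))
print(line)  # 0 1:1 2:1 3:1
```

For a graph with tens of millions of edges you would of course use a compact adjacency structure (or igraph's own BFS) rather than Python dicts, but the histogram bookkeeping stays the same.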
That is it, basically. I think it is good because a) from a single
run (for one vertex) you can estimate the total running time
quite precisely; b) it is moderately fail-safe; consider backing up
the result file every day (or hour) or so, a small script can do this;
c) you can do it in any language that igraph supports (actually it
is not very hard to do it without using igraph at all); d) from the
stored histograms you can calculate other measures, like the 90th
percentile (effective) diameter they suggest in the paper I mentioned above, etc.
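Point d) is easy to see with a small sketch: the average, the diameter, and the 90th-percentile effective diameter all fall straight out of the merged histogram. The helper names (`merge`, `summarize`) and the tiny made-up histogram below are mine, purely for illustration.

```python
def merge(hists):
    """Sum per-vertex {distance: count} histograms into one global histogram."""
    total = {}
    for h in hists:
        for d, c in h.items():
            total[d] = total.get(d, 0) + c
    return total

def summarize(hist, q=0.9):
    """Average path length, diameter, and q-percentile effective diameter."""
    n = sum(hist.values())
    avg = sum(d * c for d, c in hist.items()) / n
    diameter = max(hist)
    running = 0
    for d in sorted(hist):
        running += hist[d]
        if running >= q * n:  # smallest d covering a fraction q of all pairs
            return avg, diameter, d

# Made-up data: 6 pairs at distance 1, 5 at distance 2, 1 at distance 3
avg, diam, eff = summarize(merge([{1: 6, 2: 5}, {3: 1}]))
print(avg, diam, eff)  # average 19/12, diameter 3, effective diameter 2
```

Note that dropping the per-vertex histograms and keeping only running averages would lose exactly this ability to compute percentiles afterwards.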
Hope this helps,
Gabor
On Mon, Feb 11, 2008 at 08:43:10PM -0500, Thomas F. Brantle wrote:
> Gabor and Others:
>
> I know that those computations involving “shortest path” analyses will be
> fairly “costly and time consuming” from a computational point of view (i.e.,
> m*n). Please let me give you a general idea of my network data and other
> potential computer resource constraints.
>
> The primary network data that I want to analyze includes a few datasets of
> approximately 2 – 3 million nodes and 10 – 30 million links, each. For these
> shortest path (e.g., average shortest path length) computations I assume that
> my network is “undirected” and I also expect the largest or giant component to
> include 90-99% of the nodes.
>
> For these analyses I’m using an 8GB/2800MHz Linux (Ubuntu) server. Given these
> computing constraints, I would greatly appreciate your suggestions on a
> computing (and computational) strategy for my network dataset analyses. I’ve
> estimated that the computation could take a month or more.
>
> Given this issue, is there a convergence strategy, computational sampling,
> numerical approximation or other alternative that may be incorporated? I’ve
> searched the literature and haven’t seen this issue directly addressed, unless
> I just missed it.
>
> If you have any ideas or a suggested computational strategy I’d greatly
> appreciate it.
>
> Thanks,
>
> --Tom
>
> _______________________________________________
> igraph-help mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/igraph-help
--
Csardi Gabor <address@hidden> UNIL DGM