
From: Gabor Csardi
Subject: Re: [igraph] Average Path Length / Diameter
Date: Tue, 12 Feb 2008 09:11:44 +0100
User-agent: Mutt/1.5.13 (2006-08-11)


I think you can do two things:
1) In http://www.cs.cmu.edu/~jure/pubs/powergrowth-tkdd.pdf the authors
   introduce a method for approximating the diameter (their "diameter"
   is a 90th-percentile effective diameter, not the classic maximum
   distance). They claim that it works well.

2) Calculate the real diameter. Yes, this can take a long time.
   Here is a first rough sketch of how I would do it:

   a) compute the shortest paths from one vertex to all the others,
      one source vertex at a time, and build a path-length histogram
      for each source. Obviously you cannot store all the individual
      path lengths, but the diameter should be small, so a histogram
      is compact to store and carries more information than just the
      average and the maximum.
   b) create a simple text file which contains a single number:
      the id of the vertex for which the calculation will be 
      performed next.
   c) create another text file for the results, one line per
      histogram, prefixed by the id of the source vertex for which
      the path lengths were calculated.
   d) if you would like to parallelize the calculation, you'll need
      some locking mechanism to keep multiple processes from reading
      or writing the files at the same time.
   e) at the end, look over the result file for vertices whose 
      calculation is missing or corrupt because of power failures,
      machine freezes, or other evil things, and redo those.
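   The per-vertex loop in steps (a)-(c) can be done without igraph at
   all, as noted below. A minimal stdlib-only Python sketch; the
   adjacency-list format and the file names (next_vertex.txt,
   results.txt) are illustrative assumptions, not a fixed convention:

```python
# Sketch of steps (a)-(c): BFS from one source vertex, record a
# path-length histogram, append it to a result file, and advance the
# "next vertex" counter. File names are illustrative assumptions.
from collections import Counter, deque

def bfs_histogram(adjacency, source):
    """Return {distance: count} for all vertices reachable from source
    (the source itself appears at distance 0)."""
    dist = {source: 0}
    queue = deque([source])
    hist = Counter()
    while queue:
        v = queue.popleft()
        hist[dist[v]] += 1
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dict(hist)

def process_next(adjacency, state_file="next_vertex.txt",
                 result_file="results.txt"):
    """Read the next vertex id from state_file, compute its histogram,
    append one line ("<vertex> <d0>:<n0> <d1>:<n1> ...") to
    result_file, then bump the counter in state_file."""
    with open(state_file) as f:
        v = int(f.read().strip())
    hist = bfs_histogram(adjacency, v)
    line = str(v) + " " + " ".join(
        f"{d}:{n}" for d, n in sorted(hist.items()))
    with open(result_file, "a") as f:
        f.write(line + "\n")
    with open(state_file, "w") as f:
        f.write(str(v + 1))
```

   For parallel workers (step d), the appends to the result file would
   additionally need advisory locking, e.g. POSIX flock, around the
   read-modify-write of the state file.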

   That is basically it. I think it is a good approach because:
   a) from a single run (for one vertex) you can estimate the total
   running time quite precisely; b) it is moderately fail-safe: 
   consider backing up the result file every day (or hour) or so, a 
   small script can do this; c) you can do it in any language that 
   igraph supports (actually it is not very hard to do it without 
   using igraph at all); d) from the stored histograms you can 
   calculate other measures, like the 90th-percentile effective 
   diameter suggested in the paper I proposed above, etc.
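   The summary measures all fall out of the merged histogram. A hedged
   sketch, assuming each per-vertex histogram has been parsed back into
   a {distance: count} dictionary, and taking the 90th-percentile
   effective diameter as the smallest distance covering at least 90% of
   reachable ordered pairs (self-distances excluded):

```python
from collections import Counter

def summarize(histograms):
    """Merge per-vertex {distance: count} histograms and return
    (average path length, diameter, 90th-percentile effective
    diameter) over reachable ordered pairs. Distance-0 entries
    (a vertex to itself) are ignored."""
    total = Counter()
    for h in histograms:
        for d, n in h.items():
            if d > 0:
                total[d] += n
    pairs = sum(total.values())
    avg = sum(d * n for d, n in total.items()) / pairs
    diameter = max(total)
    cum = 0
    for d in sorted(total):
        cum += total[d]
        if cum >= 0.9 * pairs:  # first distance covering 90% of pairs
            eff = d
            break
    return avg, diameter, eff
```

   On a 4-vertex path graph this merges the four BFS histograms into
   12 ordered pairs and reads off the three measures in one pass.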

Hope this helps,

On Mon, Feb 11, 2008 at 08:43:10PM -0500, Thomas F. Brantle wrote:
> Gabor and Others:
> I know that those computations involving “shortest path” analyses will be
> fairly “costly and time consuming” from a computational point of view (i.e.,
> m*n).  Please let me give you a general idea of my network data and other
> potential computer resource constraints.
> The primary network data that I want to analyze includes a few datasets of
> approximately 2 – 3 million nodes and 10 – 30 million links, each. For these
> shortest path (e.g., average shortest path length) computations I assume that
> my network is “undirected” and I also expect the largest or giant component to
> include 90-99% of the nodes.
> For these analyses I’m using an 8GB/2800MHz Linux (Ubuntu) server. Given these
> computing constraints, I would greatly appreciate your suggestions on a
> computing (and computational) strategy for my network dataset analyses. I’ve
> estimated that the computation could take a month or more.
> Given this issue, is there a convergence strategy, computational sampling,
> numerical approximation or other alternative that may be incorporated? I’ve
> searched the literature and haven’t seen this issue directly addressed, unless
> I just missed it.
> If you have any ideas or a suggested computational strategy I’d greatly
> appreciate it.
> Thanks,
> --Tom

> _______________________________________________
> igraph-help mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/igraph-help

Csardi Gabor <address@hidden>    UNIL DGM
