[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] Working with large networks and how to sample from a graph?

From: Tamás Nepusz
Subject: Re: [igraph] Working with large networks and how to sample from a graph?
Date: Wed, 4 Apr 2012 13:26:46 +0200

> One idea I had was to take a small random sample from the network (say 5,000 
> nodes) but I am not sure exactly how to do this in igraph.

Well, it depends on how you want to do it. You can try selecting 5000 nodes 
randomly from the entire network and then take the subgraph; this is relatively 

vs <- sample.int(vcount(g), 5000)-1
g2 <- subgraph(g, vs)

However, if your graph is large and sparse enough, there is a chance that the 
resulting graph will not be connected at all, and then your estimates will bear 
no resemblance at all to the "real" betweenness values.

Another option is to use "snowball sampling", in which you start out from a 
selected (and preferably well-connected) node and take the subgraph consisting 
of the vertices that are at most k steps away from the seed node. This can be 
done with the neighborhood() function, but I think this is largely equivalent 
to estimating betweenness by cutting paths after length k.

Note that there are quite a few papers about estimating betweenness centrality 
in very large graphs. I would start reading the following paper first:


Basically, they propose calculating shortest paths only from selected pivot 
nodes and then estimate the real betweenness values by numerical manipulations 
of the results. igraph implements shortest path calculations (see 
get.all.shortest.paths), so in theory it is possible to come up with an R 
implementation of their algoritm using igraph. (And if you manage to implement 
it, let us know so we can include it in the next version).


reply via email to

[Prev in Thread] Current Thread [Next in Thread]