igraph-help
[Top][All Lists]

Re: [igraph] Working with large networks and how to sample from a graph?

 From: Tamás Nepusz Subject: Re: [igraph] Working with large networks and how to sample from a graph? Date: Wed, 4 Apr 2012 13:26:46 +0200

```> One idea I had was to take a small random sample from the network (say 5,000
> nodes) but I am not sure exactly how to do this in igraph.

Well, it depends on how you want to do it. You can try selecting 5000 nodes
randomly from the entire network and then take the subgraph; this is relatively
simple:

library(igraph)
vs <- sample.int(vcount(g), 5000)-1
g2 <- subgraph(g, vs)

However, if your graph is large and sparse enough, there is a chance that the
resulting graph will not be connected at all, and then your estimates will bear
no resemblance at all to the "real" betweenness values.

Another option is to use "snowball sampling", in which you start out from a
selected (and preferably well-connected) node and take the subgraph consisting
of the vertices that are at most k steps away from the seed node. This can be
done with the neighborhood() function, but I think this is largely equivalent
to estimating betweenness by cutting paths after length k.

Note that there are quite a few papers about estimating betweenness centrality
in very large graphs. I would start reading the following paper first:

http://www.inf.uni-konstanz.de/algo/publications/bp-celn-06.pdf

Basically, they propose calculating shortest paths only from selected pivot
nodes and then estimate the real betweenness values by numerical manipulations
of the results. igraph implements shortest path calculations (see
get.all.shortest.paths), so in theory it is possible to come up with an R
implementation of their algoritm using igraph. (And if you manage to implement
it, let us know so we can include it in the next version).

Best,
T.

```