igraph-help

## Re: [igraph] FW: Calculating betweennes centrality in large graphs

From: Tamás Nepusz
Subject: Re: [igraph] FW: Calculating betweennes centrality in large graphs
Date: Sat, 19 Nov 2011 22:18:45 +0100

```Hi,

> Secondly, I am not much of a programmer, but after some trial and error I was
> able to read the graph file in igraph (in Python) and got to
> "filename.betweenness(vertices=1, directed=True, cutoff=2)" which yielded the
> centrality index for, I think, the vertex with ID 1 (see above).
Yup, but watch out: the IDs in the GML file are *not* necessarily the same as
the IDs in igraph. The reason is that igraph assigns a numeric vertex ID for
each of the vertices from the range [0; n-1], where n is the total number of
vertices. Since nothing guarantees that the IDs in an arbitrary GML file are
from this range, igraph simply uses its own numeric IDs and stores the "real"
numeric IDs from the GML file in a vertex attribute called "id". E.g.:

print g.vs["id"]

This prints a list where element i is the value of the "id" attribute for
vertex i (according to igraph's IDs). If you need a mapping from "your" IDs to
igraph's IDs, you can construct a Python dictionary for that easily:

my_id_to_igraph_id = dict((v, k) for k, v in enumerate(g.vs["id"]))

Then you can run g.betweenness(vertices=my_id_to_igraph_id[1],
directed=True, cutoff=2) to achieve what you want.

> 1) I believe when I use "vertices=None" centrality for all nodes is
> calculated.
That's true.

> How can I get igraph to write a CSV file that contains either (or both) "ID"
> or "Label" in the one column, and the corresponding centrality index in the
> other?
I think I've just answered that question on Stack Overflow ;) Basically it
boils down to this:

import csv
from itertools import izip

scores = g.betweenness(directed=True, cutoff=2)
f = open("output.csv", "wb")   # csv.writer needs a file object, not a filename
writer = csv.writer(f)
writer.writerows(izip(g.vs["id"], scores))
f.close()

> 2) Is there anyway to determine what a reliable cutoff point would be? I
> don't mind letting my computer run for a couple of days, but preferably not
> more than a day or three.
Well, I don't think so. Running g.betweenness with a cutoff basically
excludes from the calculation all paths that are longer than the cutoff. If
you want to ensure that, say, 95% of all the shortest paths are included in
the calculation, then you need the 95th percentile of the shortest path
length distribution, for which you would have to find all the shortest paths
anyway -- which is exactly the expensive computation you are trying to avoid.

One thing you could try is to take a random vertex, find the lengths of the
shortest paths from this vertex to all other vertices using
graph.shortest_paths(), then take the maximum of this vector and use it as a
rough estimate for the graph diameter (i.e. the "longest shortest path"). Then
use a cutoff of diameter/2.
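
For example, a rough sketch of that heuristic (assuming "g" is your igraph
Graph object and the graph is unweighted):

import random

# pick a random starting vertex
v = random.randrange(g.vcount())
# one row of shortest path lengths, from v to every other vertex;
# unreachable vertices show up as infinity, so we filter them out
lengths = g.shortest_paths(source=v)[0]
reachable = [l for l in lengths if l != float("inf")]
# longest finite shortest path, halved, as a rough cutoff
cutoff = max(reachable) // 2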

Alternatively, take the vertex with the largest degree, which is likely to sit
somewhere "in the middle of the graph" and be equally far from the "perimeter"
of the graph. Then find the maximum of the shortest path lengths from this
vertex and use it as cutoff.
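
Sketched with the same assumptions (an igraph Graph object in "g"):

degrees = g.degree()
hub = degrees.index(max(degrees))   # vertex with the largest degree
hub_lengths = g.shortest_paths(source=hub)[0]
# longest finite shortest path from the hub, used directly as the cutoff
cutoff = max(l for l in hub_lengths if l != float("inf"))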

Cheers,
Tamas

```