[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] FW: Calculating betweennes centrality in large graphs

From: Tamás Nepusz
Subject: Re: [igraph] FW: Calculating betweennes centrality in large graphs
Date: Sat, 19 Nov 2011 22:18:45 +0100


> Secondly, I am not much of a programmer, but after some trial and error I was 
> able to read the graph file in igraph (in Python) and got to 
> "filename.betweenness(vertices=1, directed=True, cutoff=2)" which yielded the 
> centrality index for, I think, the vertex with ID 1 (see above).
Yup, but watch out: the IDs in the GML file are *not* necessarily the same as 
the IDs in igraph. The reason is that igraph assigns a numeric vertex ID for 
each of the vertices from the range [0; n-1], where n is the total number of 
vertices. Since nothing guarantees that the IDs in an arbitrary GML file are 
from this range, igraph simply uses its own numeric IDs and stores the "real" 
numeric IDs from the GML file in a vertex attribute called "id". E.g.:

from igraph import load

g = load("your_graph.gml")
print g.vs["id"]

This prints a list where element i is the value of the "id" attribute for 
vertex i (according to igraph's IDs). If you need a mapping from "your" IDs to 
igraph's IDs, you can construct a Python dictionary for that easily:

my_id_to_igraph_id = dict((v, k) for k, v in enumerate(g.vs["id"]))

Then you can run graph.betweenness(vertices=my_id_to_igraph_id[1], 
directed=True, cutoff=2) to achieve what you want.

> 1) I believe when I use "vertices=None" centrality for all nodes is 
> calculated.
That's true.

> How can I get igraph to write a CSV file that contains either (or both) "ID" 
> or "Label" in the one column, and the corresponding centrality index in the 
> other?
I think I've just answered that question on Stack Overflow ;) Basically it 
boils down to this:

import csv
from itertools import izip

scores = graph.betweenness(directed=True, cutoff=2)
writer = csv.writer("output.csv")
writer.writerows(izip(g.vs["id"], scores))

> 2) Is there anyway to determine what a reliable cutoff point would be? I 
> don't mind letting my computer run for a couple of days, but preferably not 
> more than a day or three.
Well, I don't think so. Running graph.betweenness with a cutoff basically 
excludes paths from the calculation that are longer than the cutoff. If you 
want to ensure that, say, 95% of all the shortest paths are included in the 
calculation, then you need the 95% percentile of the shortest path length 
distribution, for which you will need to find all the shortest paths anyway -- 
but betweenness does that already.

One thing you could try is to take a random vertex, find the lengths of the 
shortest paths from this vertex to all other vertices using 
graph.shortest_paths(), then take the maximum of this vector and use it as a 
rough estimate for the graph diameter (i.e. the "longest shortest path"). Then 
use a cutoff of diameter/2. 

Alternatively, take the vertex with the largest degree, which is likely to sit 
somewhere "in the middle of the graph" and be equally far from the "perimeter" 
of the graph. Then find the maximum of the shortest path lengths from this 
vertex and use it as cutoff.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]