[igraph] Real Network Analysis

igraph-help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[igraph] Real Network Analysis

From:	Lorenzo Isella
Subject:	[igraph] Real Network Analysis
Date:	Thu, 18 Jun 2009 14:34:02 +0200
User-agent:	Thunderbird 2.0.0.21 (X11/20090409)

Dear All,
An apology for this long email, but I am planning to use extensively the
igraph library this year and beyond and I have gathered together a few
questions.
I would now apply the igraph library to study a real network, e.g. the
US airport network.
I found out that some data are freely available at

http://cxnets.googlepages.com/US_largest500_airportnetwork.txt

There a few topological quantities of interest I would like to calculate
(I hope that this may be useful to other igraph learners)

(1) The degree distribution P(k) [which I estimate using an empirical
frequency] as a function of k (node degree)
(2) Fit (1) to a power law
(3) The clustering coefficients (sometimes called transitivity) as a
function of k
(4) The distribution of the vertex betweenness
(5) The average nearest neighbour degree as a function of k
(6) The mean shortest path in the network
(7) The average number of nodes within a distance l from any given node
(as a function of l)

I have written an R code which hopefully performs these tasks (see
below), but I have a few questions

(a) Do I have any problems if there are some missing degrees in my
distribution (other than getting some warnings in a few log-log plots)?
(b) When performing (2), whatever xmin I set by hand, I get an error...why?

(c) In (3), I set equal to zero by hand the NaN returned by the igraphtransitivity routine (I think it is part of the definition of theclustering coefficients when the numerator is zero).

(d) I have no idea of how to calculate the quantity in (7).
(e) Can I tune the parameters (e.g. number of bins) of the histogram
generated by path.length.hist ?

I think this really amounts to knowing the graph topology in detail. Thelast questions are really about the R/Python bindings and the speed penalty.

(f) I resorted to the R bindings of the igraph library, but can Iperform the same tasks with the Python bindings?(g) Approximately, what is the penalty in using the R/Python bindingsrather than the C library directly? A factor 2 or three is not big deal,but one order of magnitude would be.




Many thanks for any suggestions


Lorenzo





rm(list=ls())

library(Cairo)
library(igraph)

###########################################################
#   http://cxnets.googlepages.com/US_largest500_airportnetwork.txt
###########################################################

graph_input<-read.table("US_largest500_airportnetwork.txt",header=FALSE)
graph_input<-as.matrix(graph_input)

graph_input_unweighted <- graph_input[ ,1:2]

# Now I have the graph as a two-column matrix

g <- graph(t(graph_input_unweighted), directed=FALSE)

g <- simplify(g)


#save the graph in a convenient format (e.g. edgelist)

write.graph(g, "edgelist-airport", format="edgelist")


#Now plot the graph

l <- layout.fruchterman.reingold(g)
l <- layout.norm(l, -1,1, -1,1)


E(g)$color <- "grey"
E(g)$width <- 1
V(g)$label.color <- "blue"
V(g)$color  <- "SkyBlue2"



pdf("airport-network.pdf")

plot(g, layout=layout.fruchterman.reingold,
     vertex.label.dist=0.5,vertex.label=NA, vertex.size=5)


dev.off()


############################################################################################
############################################################################################
############################################################################################


#Now some useful statistics

#degree of each vertex

dd <- degree(g)


print("The mean degree is, ")
print(mean(dd))

#degree distribution

dd2 <- degree.distribution(g)

#dd2 is calculated for a node degree ranging from 0 to the max degreefound in the network, hence I

#define the following

k_range <- 0:(length(dd2)-1)

#Get also the cumulative distribution

dd2_cum <- degree.distribution(g, cumulative=TRUE)

#From now on, I get some warnings in the log-log plots, but it shouldnot be anything to worry about



pdf("frequency_degree2.pdf")
par( mar = c(4.5,5, 2, 1) + 0.1)

plot(k_range,dd2, xlab="Degree",

ylab="Frequency", pch=3, col=3,type="b",cex.axis=1.4,cex.lab=1.6)




dev.off()


pdf("frequency_degree2-loglog.pdf")

par( mar = c(4.5,5, 2, 1) + 0.1)

plot(k_range,dd2, xlab="Degree",

ylab="Frequency",log="xy", pch=3, col=3, type="b",ylim=c(2e-3,3e-1), yaxt="n",cex.axis=1.4,cex.lab=1.6)


axis(side=2, at=c( 2e-3, 3e-2, 3e-1),
labels=expression(2%*%10^3,3%*%10^2,3%*%10^1 ),cex.lab=1.4,cex.axis=1.6)

dev.off()

pdf("frequency_degree2-cumulative-loglog.pdf")

par( mar = c(4.5,5, 2, 1) + 0.1)


plot(0:(length(dd2)-1),dd2_cum, xlab="Degree",

ylab="Cumulative frequency",log="xy", pch=3, col=3,type="b",cex.axis=1.4,cex.lab=1.6)

dev.off()





pdf("frequency_degree2-cumulative.pdf")

par( mar = c(4.5,5, 2, 1) + 0.1)


plot(0:(length(dd2)-1),dd2_cum, xlab="Degree",

ylab="Cumulative frequency", pch=3, col=3,type="b",cex.axis=1.4,cex.lab=1.6)

dev.off()

#Now I fit the degree distribution to a power law, but I get an error ifI try to set xmin. Why????


pfit <- power.law.fit(dd)

pdf("frequency_degree2-loglog-and-fit.pdf")

par( mar = c(4.5,5, 2, 1) + 0.1)

plot(k_range,dd2, xlab="Degree",

ylab="Frequency",log="xy", ylim=c(0.002,0.5), pch=3, col=3,type="b",cex.axis=1.4,cex.lab=1.6, yaxt="n")


axis(side=2, at=c( 2e-3, 3e-2, 3e-1),
labels=expression(2%*%10^3,3%*%10^2,3%*%10^1 ),cex.lab=1.4,cex.axis=1.6)


lines(2:20,1*(2:20)^(-coef(pfit)) )


dev.off()



#Now calculate graph diameter


d <- get.diameter(g)


E(g, path=d)$color <- "red"
E(g, path=d)$width <- 1

V(g)[ d ]$label.color <- "black"
V(g)[ d ]$color <- "red"




pdf("airport_graph_diameter.pdf")


plot(g, layout=layout.fruchterman.reingold,
     vertex.label.dist=0.5,vertex.label=NA, vertex.size=5)

title(main="Diameter of the random graph",
      xlab="created by igraph 0.5")
axis(1, labels=FALSE, tick=TRUE)
axis(2, labels=FALSE, tick=TRUE)

dev.off()

#Now calculate the vertex betweenness
#and find the node with the highest betweenness

ver_bet <- betweenness(g)

## max_bet <- ver_bet[which(ver_bet==max(ver_bet))]

## print("The maximumum value of the betweenness is, ")
## print (max_bet)

## node_max_bet <- which(ver_bet==max(ver_bet))-1 #consistent with thelabelling of vertices starting from 0 and not from 1


## print("The node with the maximumum value of the betweenness is, ")
## print (node_max_bet)


print("The mean vertex betweenness is, ")
print(mean(ver_bet))

#Now calculate the clustering coefficients (also called transitivity ofa graph)


clust <- transitivity(g, type="local")

#Now there is a problem: the definition of clustering coefficient maylead to a 0/0# which becomes a NaN for R. The idea is then to remove this NaN and setthem all to zero (as they should)


bad <- sapply(clust, function(x) all(is.nan(x)))
clust[bad] <- 0 #done!

print("The mean clustering coefficient is, ")
print(mean(clust))

#Now I can plot the clustering coefficients as a function of the nodedegrees


#The following figure looks awful, am I doing anything wrong?

pdf("clustering-vs-degree.pdf")
par( mar = c(4.5,5, 2, 1) + 0.1)

plot(dd,clust, xlab="Degree",

ylab="Clustering coefficient", pch=4, col="blue",type="p",cex.axis=1.4,cex.lab=1.6)




dev.off()




# calculate average nearest-neighbour degree

knn <- graph.knn(g)

#knn$knnk lists the average nearest neighbour degree for vertices havingat least one neighbour, hence

# the vector is one element shorter than dd2




pdf("nearest-neighbour-frequency.pdf")
par( mar = c(4.5,5, 2, 1) + 0.1)

plot(knn$knnk, xlab="Degree",

ylab="Frequency", pch=3, col=3,type="b",cex.axis=1.4,cex.lab=1.6)


dev.off()


pdf("nearest-neighbour-frequency-log-log.pdf")
par( mar = c(4.5,5, 2, 1) + 0.1)

plot(knn$knnk, xlab="Degree",

ylab="Frequency", pch=3, col=3,log="xy",type="b",cex.axis=1.4,cex.lab=1.6)




dev.off()


#Now get the shortest paths in the network

mean_shortest_path <- average.path.length(g)

print("The mean shortest path is, ")
print(mean_shortest_path)

h_list <- path.length.hist(g)


print ("So far so good")

[Prev in Thread]

Current Thread

[Next in Thread]

[igraph] Real Network Analysis, Lorenzo Isella <=
- Re: [igraph] Real Network Analysis, Gábor Csárdi, 2009/06/18
- Re: [igraph] Real Network Analysis, Tamas Nepusz, 2009/06/19

Prev by Date: Re: [igraph] Re: shortest path length
Next by Date: Re: [igraph] Re: shortest path length
Previous by thread: [igraph] Re: shortest path length
Next by thread: Re: [igraph] Real Network Analysis
Index(es):
- Date
- Thread