igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[igraph] Real Network Analysis


From: Lorenzo Isella
Subject: [igraph] Real Network Analysis
Date: Thu, 18 Jun 2009 14:34:02 +0200
User-agent: Thunderbird 2.0.0.21 (X11/20090409)

Dear All,
An apology for this long email, but I am planning to use extensively the
igraph library this year and beyond and I have gathered together a few
questions.
I would now apply the igraph library to study a real network, e.g. the
US airport network.
I found out that some data are freely available at

http://cxnets.googlepages.com/US_largest500_airportnetwork.txt

There a few topological quantities of interest I would like to calculate
(I hope that this may be useful to other igraph learners)

(1) The degree distribution P(k) [which I estimate using an empirical
frequency] as a function of k (node degree)
(2) Fit (1) to a power law
(3) The clustering coefficients (sometimes called transitivity) as a
function of k
(4) The distribution of the vertex betweenness
(5) The average nearest neighbour degree as a function of k
(6) The mean shortest path in the network
(7) The average number of nodes within a distance l from any given node
(as a function of l)

I have written an R code which hopefully performs these tasks (see
below), but I have a few questions

(a) Do I have any problems if there are some missing degrees in my
distribution (other than getting some warnings in a few log-log plots)?
(b) When performing (2), whatever xmin I set by hand, I get an error...why?
(c) In (3), I set equal to zero by hand the NaN returned by the igraph transitivity routine (I think it is part of the definition of the clustering coefficients when the numerator is zero).
(d) I have no idea of how to calculate the quantity in (7).
(e) Can I tune the parameters (e.g. number of bins) of the histogram
generated by path.length.hist ?

I think this really amounts to knowing the graph topology in detail. The last questions are really about the R/Python bindings and the speed penalty.

(f) I resorted to the R bindings of the igraph library, but can I perform the same tasks with the Python bindings? (g) Approximately, what is the penalty in using the R/Python bindings rather than the C library directly? A factor 2 or three is not big deal, but one order of magnitude would be.



Many thanks for any suggestions


Lorenzo





rm(list=ls())

library(Cairo)
library(igraph)

###########################################################
#   http://cxnets.googlepages.com/US_largest500_airportnetwork.txt
###########################################################

graph_input<-read.table("US_largest500_airportnetwork.txt",header=FALSE)
graph_input<-as.matrix(graph_input)

graph_input_unweighted <- graph_input[ ,1:2]

# Now I have the graph as a two-column matrix

g <- graph(t(graph_input_unweighted), directed=FALSE)

g <- simplify(g)


#save the graph in a convenient format (e.g. edgelist)

write.graph(g, "edgelist-airport", format="edgelist")


#Now plot the graph

l <- layout.fruchterman.reingold(g)
l <- layout.norm(l, -1,1, -1,1)


E(g)$color <- "grey"
E(g)$width <- 1
V(g)$label.color <- "blue"
V(g)$color  <- "SkyBlue2"



pdf("airport-network.pdf")

plot(g, layout=layout.fruchterman.reingold,
     vertex.label.dist=0.5,vertex.label=NA, vertex.size=5)


dev.off()


############################################################################################
############################################################################################
############################################################################################


#Now some useful statistics

#degree of each vertex

dd <- degree(g)


print("The mean degree is, ")
print(mean(dd))

#degree distribution

dd2 <- degree.distribution(g)

#dd2 is calculated for a node degree ranging from 0 to the max degree found in the network, hence I
#define the following

k_range <- 0:(length(dd2)-1)

#Get also the cumulative distribution

dd2_cum <- degree.distribution(g, cumulative=TRUE)

#From now on, I get some warnings in the log-log plots, but it should not be anything to worry about


pdf("frequency_degree2.pdf")
par( mar = c(4.5,5, 2, 1) + 0.1)

plot(k_range,dd2, xlab="Degree",
ylab="Frequency", pch=3, col=3, type="b",cex.axis=1.4,cex.lab=1.6)



dev.off()


pdf("frequency_degree2-loglog.pdf")

par( mar = c(4.5,5, 2, 1) + 0.1)

plot(k_range,dd2, xlab="Degree",
ylab="Frequency",log="xy", pch=3, col=3, type="b", ylim=c(2e-3,3e-1), yaxt="n",cex.axis=1.4,cex.lab=1.6)

axis(side=2, at=c( 2e-3, 3e-2, 3e-1),
labels=expression(2%*%10^3,3%*%10^2,3%*%10^1 ),cex.lab=1.4,cex.axis=1.6)

dev.off()

pdf("frequency_degree2-cumulative-loglog.pdf")

par( mar = c(4.5,5, 2, 1) + 0.1)


plot(0:(length(dd2)-1),dd2_cum, xlab="Degree",
ylab="Cumulative frequency",log="xy", pch=3, col=3, type="b",cex.axis=1.4,cex.lab=1.6)
dev.off()





pdf("frequency_degree2-cumulative.pdf")

par( mar = c(4.5,5, 2, 1) + 0.1)


plot(0:(length(dd2)-1),dd2_cum, xlab="Degree",
ylab="Cumulative frequency", pch=3, col=3, type="b",cex.axis=1.4,cex.lab=1.6)
dev.off()



#Now I fit the degree distribution to a power law, but I get an error if I try to set xmin. Why????

pfit <- power.law.fit(dd)

pdf("frequency_degree2-loglog-and-fit.pdf")

par( mar = c(4.5,5, 2, 1) + 0.1)

plot(k_range,dd2, xlab="Degree",
ylab="Frequency",log="xy", ylim=c(0.002,0.5), pch=3, col=3, type="b",cex.axis=1.4,cex.lab=1.6, yaxt="n")

axis(side=2, at=c( 2e-3, 3e-2, 3e-1),
labels=expression(2%*%10^3,3%*%10^2,3%*%10^1 ),cex.lab=1.4,cex.axis=1.6)


lines(2:20,1*(2:20)^(-coef(pfit)) )


dev.off()



#Now calculate graph diameter


d <- get.diameter(g)


E(g, path=d)$color <- "red"
E(g, path=d)$width <- 1

V(g)[ d ]$label.color <- "black"
V(g)[ d ]$color <- "red"




pdf("airport_graph_diameter.pdf")


plot(g, layout=layout.fruchterman.reingold,
     vertex.label.dist=0.5,vertex.label=NA, vertex.size=5)

title(main="Diameter of the random graph",
      xlab="created by igraph 0.5")
axis(1, labels=FALSE, tick=TRUE)
axis(2, labels=FALSE, tick=TRUE)

dev.off()

#Now calculate the vertex betweenness
#and find the node with the highest betweenness

ver_bet <- betweenness(g)

## max_bet <- ver_bet[which(ver_bet==max(ver_bet))]

## print("The maximumum value of the betweenness is, ")
## print (max_bet)

## node_max_bet <- which(ver_bet==max(ver_bet))-1 #consistent with the labelling of vertices starting from 0 and not from 1

## print("The node with the maximumum value of the betweenness is, ")
## print (node_max_bet)


print("The mean vertex betweenness is, ")
print(mean(ver_bet))

#Now calculate the clustering coefficients (also called transitivity of a graph)

clust <- transitivity(g, type="local")


#Now there is a problem: the definition of clustering coefficient may lead to a 0/0 # which becomes a NaN for R. The idea is then to remove this NaN and set them all to zero (as they should)

bad <- sapply(clust, function(x) all(is.nan(x)))
clust[bad] <- 0 #done!

print("The mean clustering coefficient is, ")
print(mean(clust))

#Now I can plot the clustering coefficients as a function of the node degrees

#The following figure looks awful, am I doing anything wrong?

pdf("clustering-vs-degree.pdf")
par( mar = c(4.5,5, 2, 1) + 0.1)

plot(dd,clust, xlab="Degree",
ylab="Clustering coefficient", pch=4, col="blue", type="p",cex.axis=1.4,cex.lab=1.6)



dev.off()




# calculate average nearest-neighbour degree

knn <- graph.knn(g)

#knn$knnk lists the average nearest neighbour degree for vertices having at least one neighbour, hence
# the vector is one element shorter than dd2




pdf("nearest-neighbour-frequency.pdf")
par( mar = c(4.5,5, 2, 1) + 0.1)

plot(knn$knnk, xlab="Degree",
ylab="Frequency", pch=3, col=3, type="b",cex.axis=1.4,cex.lab=1.6)

dev.off()


pdf("nearest-neighbour-frequency-log-log.pdf")
par( mar = c(4.5,5, 2, 1) + 0.1)

plot(knn$knnk, xlab="Degree",
ylab="Frequency", pch=3, col=3,log="xy", type="b",cex.axis=1.4,cex.lab=1.6)



dev.off()


#Now get the shortest paths in the network

mean_shortest_path <- average.path.length(g)

print("The mean shortest path is, ")
print(mean_shortest_path)

h_list <- path.length.hist(g)


print ("So far so good")







reply via email to

[Prev in Thread] Current Thread [Next in Thread]