[igraph] efficiency in reading a CSV file

From: Jason Cornelius Brunson
Subject: [igraph] efficiency in reading a CSV file
Date: Wed, 02 Jun 2010 15:21:23 -0400
User-agent: Thunderbird (Macintosh/20100228)


Disclaimer: I'm not a proficient programmer.

I've been following online guides and have hit a barrier: I have a large data set in the form of a CSV file with rows that look like this:


I want to create a network whose nodes are the cities, with an edge between each pair of cities that appear together in a row. The cities in a row are listed in column 3, separated by pipes. So, the edges above would be (Bristol,Calcutta), (Bristol,Dover), (Calcutta,Dover), and (Alexandria,Calcutta). I've only figured out how to read the data into a network using convoluted for-loops, which take prohibitively long on the full data set. Is there a common way to read files like this that's much more efficient? Below is some code that does what I want on a small scale.

I hope this is a reasonable thing to ask. Thanks for any help!


classes <- c(V1="numeric",V2="character",V3="character")

dat1 <- read.table("test.csv",header=FALSE,sep=",",colClasses=classes)
dat2 <- read.table("test2.csv",header=FALSE,sep=",",colClasses=classes)

dat <- data.frame(index=c(dat1[[1]],dat2[[1]]),class=c(dat1[[2]],dat2[[2]]),authors=c(dat1[[3]],dat2[[3]]))

vertices <- list()
edges <- list()
for (row in seq_along(dat$authors)) {
  # split column 3 of this row at the pipes
  auts <- unlist(strsplit(as.character(dat$authors[[row]]),"\\|"))
  # record any name not seen before as a vertex
  for (aut in auts) {
    if (!(aut %in% vertices)) vertices <- c(vertices,aut)
  }
  # every pair of names sharing a row becomes an edge (stored once, sorted)
  if (length(auts)>1) {
    collpairs <- combn(auts,2)
    for (i in seq_len(ncol(collpairs))) {
      collpair <- sort(collpairs[,i])
      if (!(list(collpair)%in%edges)) edges <- c(edges,list(collpair))
    }
  }
}

# flatten the list of pairs into a two-column edge list
ea <- list()
for (i in seq_along(edges)) ea <- c(ea,edges[[i]])
el <- matrix(as.character(ea),ncol=2,byrow=TRUE)
g <- igraph::graph.edgelist(el,directed=FALSE)
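
For comparison, here is a rough sketch of how the same edge list might be built without growing lists inside loops. It is only one possible approach, not a tested solution for the full data set. The `authors` vector holds made-up values standing in for column 3 (chosen to match the edges listed above, plus one repeated pair to show de-duplication), and the igraph call is the same one used above, guarded so the rest runs even if igraph is not installed.

```r
# Hypothetical stand-in for column 3 of the data (pipe-separated names).
authors <- c("Bristol|Calcutta|Dover", "Alexandria|Calcutta", "Calcutta|Bristol")

# strsplit() is vectorised, so all rows are split in one call.
# Sorting and de-duplicating within a row gives each pair a canonical order.
parts <- lapply(strsplit(authors, "|", fixed = TRUE),
                function(p) sort(unique(p)))

# Every distinct name is a vertex, including names that only appear alone.
vertices <- sort(unique(unlist(parts)))

# Only rows with at least two distinct names contribute edges.
multi <- parts[lengths(parts) > 1]

# One small two-column matrix of pairs per row.
pairs <- lapply(multi, function(p) t(combn(p, 2)))

# Stack all pairs and drop duplicate rows in a single pass.
# (If no row has two names, do.call(rbind, ...) returns NULL.)
el <- unique(do.call(rbind, pairs))

# Build the graph only if igraph is available.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph.edgelist(el, directed = FALSE)
}
```

The main difference from the loops above is that duplicates are removed once at the end with `unique()` rather than by searching the growing `edges` list for every new pair, and nothing is appended to element by element. Names that never share a row end up in `vertices` but not in `el`, so they would have to be added to the graph separately.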
