
## [igraph] efficiency in reading a CSV file

**From**: Jason Cornelius Brunson

**Subject**: [igraph] efficiency in reading a CSV file

**Date**: Wed, 02 Jun 2010 15:21:23 -0400

**User-agent**: Thunderbird 2.0.0.24 (Macintosh/20100228)

Greetings,
Disclaimer: I'm not a proficient programmer.

I've been following online guides and have hit a barrier. I have a large data set in the form of a CSV file with rows that look like this:
```
alpha,1,Alexandria
beta,5,Bristol|Calcutta|Dover
gamma,6,Alexandria|Calcutta
```

I want to create a network whose nodes are the cities, with an edge between each pair of cities that appear together in the same row (the cities in column 3 are separated by a pipe, `|`). So the edges from the rows above would be (Bristol, Calcutta), (Bristol, Dover), (Calcutta, Dover), and (Alexandria, Calcutta). So far I've only figured out how to read the data into a network using convoluted for-loops, which take prohibitively long. Is there a standard way to read files like this that is much more efficient? Below is some code that does what I want on a small scale.
I hope this is a reasonable thing to ask. Thanks for any help!
Cory
```r
# Column types for both input files
classes <- c(V1 = "numeric", V2 = "character", V3 = "character")
dat1 <- read.table("test.csv", header = FALSE, sep = ",", colClasses = classes)
dat2 <- read.table("test2.csv", header = FALSE, sep = ",", colClasses = classes)

# Stack the two files into one data frame
dat <- data.frame(index = c(dat1[[1]], dat2[[1]]),
                  class = c(dat1[[2]], dat2[[2]]),
                  authors = c(dat1[[3]], dat2[[3]]))

vertices <- list()
edges <- list()
for (row in seq_len(nrow(dat))) {
  # Split column 3 into individual cities
  auts <- unlist(strsplit(as.character(dat$authors[[row]]), "\\|"))
  # Record each city not seen before (`vertices` is not used to build `g`)
  for (aut in auts) {
    if (!(aut %in% vertices)) vertices <- c(vertices, aut)
  }
  if (length(auts) > 1) {
    # Every unordered pair of cities in this row is an edge
    collpairs <- combn(auts, 2)
    for (i in seq_len(ncol(collpairs))) {
      collpair <- sort(collpairs[, i])
      # Add the pair only if it is not already in the (growing) edge list
      if (!(list(collpair) %in% edges)) edges <- c(edges, list(collpair))
    }
  }
}
rm(row, collpair, collpairs)

# Turn the list of pairs into a two-column character matrix
el <- do.call(rbind, edges)
g <- igraph::graph.edgelist(el, directed = FALSE)
```
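Much of the running time in a loop like the one above probably comes from growing `vertices` and `edges` with `c()` and scanning them with `%in%` on every pass, which makes the work grow roughly quadratically with the number of edges. A minimal sketch that keeps the loop but preallocates one list slot per row and deduplicates only once at the end follows. The small `dat` here is a stand-in built from the sample rows, and the graph step assumes a current igraph release, where `graph_from_data_frame()` is the name (2010-era igraph called it `graph.data.frame()`):

```r
# Stand-in for the real data: only the pipe-separated column is needed
dat <- data.frame(authors = c("Alexandria", "Bristol|Calcutta|Dover",
                              "Alexandria|Calcutta"),
                  stringsAsFactors = FALSE)

# Preallocate one slot per row instead of growing vectors with c()
row_cities <- vector("list", nrow(dat))
row_edges  <- vector("list", nrow(dat))

for (row in seq_len(nrow(dat))) {
  auts <- sort(unique(strsplit(dat$authors[row], "|", fixed = TRUE)[[1]]))
  row_cities[[row]] <- auts
  if (length(auts) > 1) {
    # Each column of combn() is one pair; store the pairs as rows
    row_edges[[row]] <- t(combn(auts, 2))
  }
}

# Deduplicate once at the end rather than searching on every pass;
# rbind() skips the NULL slots left by single-city rows
vertices <- unique(unlist(row_cities))
el <- unique(do.call(rbind, row_edges))

# Build the graph only if igraph is installed; passing the vertex table
# keeps cities that never share a row with another city
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(
    as.data.frame(el, stringsAsFactors = FALSE),
    directed = FALSE,
    vertices = data.frame(name = vertices, stringsAsFactors = FALSE))
}
```

Unlike the original, this version also uses the collected `vertices`, so a city listed alone in every row it appears in still becomes an isolated node.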
