[igraph] efficiency in reading a CSV file
From: Jason Cornelius Brunson
Subject: [igraph] efficiency in reading a CSV file
Date: Wed, 02 Jun 2010 15:21:23 -0400
User-agent: Thunderbird 2.0.0.24 (Macintosh/20100228)
Greetings,
Disclaimer: I'm not a proficient programmer.
I've been following online guides and have hit a barrier: I have a
large data set in the form of a CSV file with rows that look like this:
alpha,1,Alexandria
beta,5,Bristol|Calcutta|Dover
gamma,6,Alexandria|Calcutta
I want to create a network whose nodes are the cities, with an edge
between each pair of cities that appear in the same row (the cities are
separated by pipes in column 3). So, the edges above would be
(Bristol,Calcutta), (Bristol,Dover), (Calcutta,Dover), and
(Alexandria,Calcutta). I've only figured out how to read the data into a
network using convoluted for-loops, which take prohibitively long. Is
there a common way to read files like this that's much more efficient?
Below is some code that does what I want on the small scale.
I hope this is a reasonable thing to ask. Thanks for any help!
Cory
# Read every column as character so both labels and numbers load without error.
classes <- c(V1="character",V2="character",V3="character")
dat1 <- read.table("test.csv",header=FALSE,sep=",",colClasses=classes)
dat2 <- read.table("test2.csv",header=FALSE,sep=",",colClasses=classes)
# Combine the two files into one data frame.
dat <- data.frame(index=c(dat1[[1]],dat2[[1]]),
                  class=c(dat1[[2]],dat2[[2]]),
                  authors=c(dat1[[3]],dat2[[3]]))
vertices <- list()
edges <- list()
for (row in seq_along(dat$authors)) {
  # Split column 3 on the literal pipe character.
  auts <- unlist(strsplit(as.character(dat$authors[[row]]),"\\|"))
  for (aut in auts) {
    if (!(aut %in% vertices)) vertices <- c(vertices,aut)
  }
  if (length(auts)>1) {
    # All unordered pairs of names in this row, one pair per column.
    collpairs <- combn(auts,2)
    for (i in seq_len(ncol(collpairs))) {
      collpair <- sort(collpairs[,i])
      # Add the edge only if it has not been seen before.
      if (!(list(collpair)%in%edges)) edges <- c(edges,list(collpair))
    }
  }
}
rm(row,collpair,collpairs)
# Flatten the list of pairs into a two-column edge matrix.
ea <- list()
for (i in seq_along(edges)) ea <- c(ea,edges[[i]])
el <- matrix(as.character(ea),ncol=2,byrow=TRUE)
g <- igraph::graph.edgelist(el,directed=FALSE)
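
For comparison, here is a minimal sketch of the same pair-building step without the nested loops. It is one possible approach, not a benchmarked one. It assumes the three sample rows above (inlined here so it runs without a file), that column 3 is pipe-separated, and that at least one row contains two or more names. The names `groups` and `pairs` are illustrative only. The loops above rescan the whole `edges` list for every candidate pair and grow it one element at a time; the sketch instead collects all pairs first and removes duplicates once.

```r
# Sample rows from the post, inlined so the sketch is self-contained.
lines <- c("alpha,1,Alexandria",
           "beta,5,Bristol|Calcutta|Dover",
           "gamma,6,Alexandria|Calcutta")
dat <- read.table(text = lines, header = FALSE, sep = ",",
                  colClasses = "character")

# Split column 3 on the literal pipe: one character vector per row.
groups <- strsplit(dat[[3]], "|", fixed = TRUE)

# Sort and de-duplicate within each row so (a,b) and (b,a) become the same pair.
groups <- lapply(groups, function(g) sort(unique(g)))

# combn(x, 2) needs at least two elements, so drop single-name rows.
groups <- groups[lengths(groups) > 1]

# One two-column matrix of pairs per row.
pairs <- lapply(groups, function(g) t(combn(g, 2)))

# Stack everything once, then drop duplicate edges in a single pass.
el <- unique(do.call(rbind, pairs))

# Build the graph only if igraph is installed.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_edgelist(el, directed = FALSE)
}
```

With a real file, the `text = lines` argument would be replaced by the file name, as in the read.table calls above.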