I have a large numeric matrix in R as follows:
dim(libfactor.mat)
[1] 40523 10
head(libfactor.mat)
      [,1]  [,2]  [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 20884 19184 19185   NA   NA   NA   NA   NA   NA    NA
[2,] 18330 13979 13978   NA   NA   NA   NA   NA   NA    NA
[3,]   176 17101 21455   NA   NA   NA   NA   NA   NA    NA
[4,] 15166 15165 15166   NA   NA   NA   NA   NA   NA    NA
[5,] 14190 14306  7450   NA   NA   NA   NA   NA   NA    NA
[6,]  9196  1513  4054   NA   NA   NA   NA   NA   NA    NA
[...]
What I want to do is build a graph from this and decompose it so all compounds
sharing any hits form unique clusters.
I think the easiest way to do it is to build a compound-vs-libraryID bipartite
graph where the nodes are either compounds or library IDs and a compound node
is connected with its corresponding library ID nodes. Then you can run
bipartite.projection to project it into the compound graph where two compounds
will be connected if they share at least a single library ID. Unfortunately
there is no function in igraph to create a bipartite network from the
representation you have. I have tried to come up with a solution, and it seems
pretty convoluted to me, but I'm no expert in R, so maybe there's a better or
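more elegant solution. Basically, we first construct an adjacency list with one
character vector of library IDs per row. I use lapply() over the row indices
here, because apply() would collapse the result into a matrix if every row
happened to have the same number of non-NA entries:
adj.list <- lapply(seq_len(nrow(libfactor.mat)), function(i) { x <- libfactor.mat[i, ]; paste("l", x[!is.na(x)], sep="") })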
Now we turn the adjacency list into an edge list (with vertex names):
edge.list <- lapply(seq_along(adj.list), function(x)
    as.vector(rbind(paste("n", x, sep=""), adj.list[[x]])))
edge.list <- unlist(edge.list)
Then into a data frame:
df <- data.frame(matrix(edge.list, ncol=2, byrow=TRUE))
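As a possibly simpler route to the same kind of two-column data frame, here is a minimal sketch that takes the positions of the non-NA cells directly from the matrix. The toy matrix m and the name df2 are made up for illustration; with your data you would use libfactor.mat in place of m. A side effect of this approach is that a row consisting only of NA contributes no edges at all.

```r
# Hypothetical toy stand-in for libfactor.mat: 3 compounds, up to 3 hits each
m <- matrix(c(1,  2, NA,
              2,  3, NA,
              4, NA, NA), ncol = 3, byrow = TRUE)

# One row per non-NA cell; the "row" column is the compound index
idx <- which(!is.na(m), arr.ind = TRUE)

# Edge list as a data frame: compound name, library ID name
df2 <- data.frame(
  compound = paste0("n", idx[, "row"]),
  library  = paste0("l", m[idx]),   # m[idx] picks the value of each non-NA cell
  stringsAsFactors = FALSE
)
```

df2 can then be passed to graph.data.frame() in the same way as df.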
Then we can create our graph from the data frame:
g <- graph.data.frame(df, directed=FALSE)
V(g)$type <- substr(V(g)$name, 1, 1) == "l"
projections <- bipartite.projection(g)
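projections[[1]] will then be the compound graph you need: it is the projection
onto the vertices whose type is FALSE (the "n" nodes), and two compounds are
adjacent in it whenever they share at least one library ID. Its connected
components are the clusters you are after.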
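To get from the projected graph to the actual clusters, something along these lines should work. This is only a sketch on a tiny hand-written edge list (the data frame and the names compound.graph, cl and memb are made up for illustration), and it assumes a reasonably recent igraph, where the functions are called graph_from_data_frame(), bipartite_projection() and components() instead of the older dotted names used above.

```r
library(igraph)

# Hypothetical toy edge list: n1 and n2 share library ID l2, n3 shares nothing
df <- data.frame(compound = c("n1", "n1", "n2", "n2", "n3"),
                 library  = c("l1", "l2", "l2", "l3", "l4"),
                 stringsAsFactors = FALSE)

# Undirected bipartite graph; type is TRUE for library IDs, FALSE for compounds
g <- graph_from_data_frame(df, directed = FALSE)
V(g)$type <- substr(V(g)$name, 1, 1) == "l"

# Keep only the projection onto the compound (type == FALSE) vertices
compound.graph <- bipartite_projection(g, which = "false")

# Connected components of the compound graph are the clusters
cl <- components(compound.graph)
memb <- cl$membership
names(memb) <- V(compound.graph)$name

# List of clusters, each a character vector of compound names
clusters.list <- split(names(memb), memb)
```

With 40523 compounds the projection may end up with a lot of edges. Two compounds connected in the projection are also connected through library ID nodes in the bipartite graph, so calling components(g) on the bipartite graph and keeping only the "n" vertices should give the same grouping without building the projection.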