igraph-help
From: Steve Lianoglou
Subject: [igraph] Subsetting graph by edges + building large graph troubles (R interface)
Date: Wed, 4 Mar 2009 15:51:17 -0500

Hi all,

I'm a first timer w/ igraph and am using it in R. I'm sorry for putting two messages into one, but the trouble I'm having reading in a large graph is what led me to try to subset a whole graph, so the two problems are kind of smashed together. I'll try to explain. Sorry, this is long.

I'm building a network from what is essentially an 'ncol' formatted file. It's the protein interaction network for yeast from the STRING db (http://string.embl.de). Here are the first few lines:

Q0032 YNL182C 433
Q0045 Q0060 186
Q0045 Q0085 716
Q0045 Q0105 997
Q0045 Q0110 882
Q0045 Q0115 898
Q0045 Q0120 928
Q0045 Q0130 201
Q0045 Q0140 222
Q0045 Q0250 999
...

=========
Problem 1
=========

I'd actually like to filter the graph I'm creating from the file by the edge score listed in the third column above, but I'm not having an easy time doing this correctly. Earlier today I stumbled on what I thought would be a simple way to do this:

# threshold and directed are defined elsewhere
graf <- read.graph(filepath, format='ncol', directed=directed)
good.edges <- E(graf)[weight >= threshold]
good.graf <- graph(edges, directed=directed)

Problem is I'm losing the name attributes from my vertices, and my `good.graf` is now just a list of vertex IDs. Since I'm not really comfortable with the igraph API yet, I'm having a hard time figuring out the best way to keep the correct names associated with the new IDs that are in `good.graf`.

I'm thinking this is done often, so I was wondering if anyone could offer any good suggestions.
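
For reference, a minimal sketch of the filtering described above. It assumes a current igraph release (the function names `graph_from_data_frame()`, `delete_edges()` and `delete_vertices()` are from newer versions than the one used in this thread), and the toy data and the cutoff of 500 are made up for illustration. The idea is to delete the low-scoring edges from the existing graph instead of building a new graph from edge IDs, so the vertex `name` attribute is never dropped:

```r
library(igraph)

# Toy edge list in the same three-column layout as the file excerpt above.
links <- data.frame(
  from   = c("Q0032", "Q0045", "Q0045", "Q0045"),
  to     = c("YNL182C", "Q0060", "Q0085", "Q0105"),
  weight = c(433, 186, 716, 997),
  stringsAsFactors = FALSE
)
threshold <- 500  # hypothetical cutoff, chosen only for this example

# Vertex names come from the first two columns, "weight" becomes an
# edge attribute.
graf <- graph_from_data_frame(links, directed = FALSE)

# Remove edges below the cutoff; vertices and their names are untouched.
good.graf <- delete_edges(graf, E(graf)[weight < threshold])

# Optionally drop vertices that no longer have any edges.
isolated <- V(good.graf)[degree(good.graf) == 0]
good.graf <- delete_vertices(good.graf, isolated)

V(good.graf)$name
E(good.graf)$weight
```

Whether isolated vertices should be removed depends on the analysis; the last step can simply be skipped.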


=========
Problem 2
=========

Prior to trying it this way, I was taking inspiration from the technique in the "creating graphs" section here:

http://igraph.sourceforge.net/igraphbook/igraphbook-creating.html#id2569368

I'm subsetting my `links` data to contain only the edges with scores >= threshold, so that I can use the subsetted data.frame to build the graph straight away. However, there is some weirdness happening: the first edge at the top of my `links` data.frame isn't being registered, but its weight is, so my weights are all off by one. Have a look:

links <- read.delim(pathToFile, sep=" ", header=FALSE, stringsAsFactors=FALSE, col.names=c('from', 'to', 'weight'))
## I won't filter for easy comparison with data above
## links <- subset(links, weight >= threshold)
vNames <- unique(c(links$from, links$to))
ids <- seq_along(vNames) - 1
names(ids) <- vNames
edges <- matrix(c(ids[links$from], ids[links$to]), nc=2)
g <- add.vertices(graph.empty(directed=FALSE), length(ids), name=vNames)
g <- add.edges(g, t(edges), weight=links$weight)

I will paste below, on the left, the edges in the graph. There's an extra column surrounded with [] that I'm putting there to show you the scores/weights that are associated with each edge (I'll show the R output of the scores afterwards so you can see they are the same). On the right (## commented) are the first 10 lines from the input file for comparison:

# modified head(E(g), 10) call (modifications in trailing []):
R> head(E(g), 10)              ## Top 10 lines from file
[1]  Q0045   -> Q0060  [433]   ## Q0032 YNL182C 433
[2]  Q0045   -> Q0085  [186]   ## Q0045 Q0060 186
[3]  Q0045   -> Q0105  [716]   ## Q0045 Q0085 716
[4]  Q0045   -> Q0110  [997]   ## Q0045 Q0105 997
[5]  Q0045   -> Q0115  [882]   ## Q0045 Q0110 882
[6]  Q0045   -> Q0120  [898]   ## Q0045 Q0115 898
[7]  Q0045   -> Q0130  [928]   ## Q0045 Q0120 928
[8]  Q0045   -> Q0140  [201]   ## Q0045 Q0130 201
[9]  Q0045   -> Q0250  [222]   ## Q0045 Q0140 222
[10] Q0045   -> Q0275  [999]   ## Q0045 Q0250 999

# Here is the call to get the weights with its output, just to verify
R> head(E(g)$weight, 10)
 [1] 433 186 716 997 882 898 928 201 222 999

See how my first edge in the igraph is really the second edge in my input file, BUT its weight is the one from the first edge of my input file? I'm stumped.
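
One way to check this kind of misalignment row by row is sketched below. It assumes a current igraph release (`graph_from_data_frame()` and `as_data_frame()` are newer names than those available in this thread), `check_alignment` is a hypothetical helper written only for this example, and the toy data is the first three lines of the file excerpt. It compares edge i of the graph with row i of the data.frame, ignoring endpoint order:

```r
library(igraph)

# Hypothetical helper: for each edge i of g, TRUE when it has the same
# endpoints (in either order) and the same weight as row i of links.
check_alignment <- function(g, links) {
  el <- igraph::as_data_frame(g, what = "edges")
  stopifnot(nrow(el) == nrow(links))
  # Undirected edges may be reported with endpoints swapped, so compare
  # each pair in a canonical (sorted) order.
  key <- function(a, b) paste(pmin(a, b), pmax(a, b))
  key(el$from, el$to) == key(links$from, links$to) &
    el$weight == links$weight
}

links <- data.frame(
  from   = c("Q0032", "Q0045", "Q0045"),
  to     = c("YNL182C", "Q0060", "Q0085"),
  weight = c(433, 186, 716),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(links, directed = FALSE)

aligned <- check_alignment(g, links)
which(!aligned)  # row numbers where the graph and the file disagree
```

The same helper can be pointed at a graph built with `add.vertices`/`add.edges`, as long as the vertices carry a `name` attribute and the edges a `weight` attribute.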

I tried my code on a smaller test file, and it works fine. The real data file has 623,530 edges. In the original data file, each edge is repeated once to represent the undirected nature of protein interaction networks, e.g. there is:

A B <weight>
... somewhere down the line ...
B A <weight>

My small test file did the same thing, and it worked fine. Also the problem still persists whether I specify my igraph to be directed or undirected prior to constructing.
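
Because every interaction appears as both A B and B A, an undirected graph built from the full file carries two parallel edges per interaction. A minimal sketch of collapsing them, assuming a current igraph release where `simplify()` accepts an `edge.attr.comb` argument; the toy data is made up, and keeping the first weight of each pair is only reasonable if both copies carry the same score:

```r
library(igraph)

# Toy data with every interaction listed in both directions.
links <- data.frame(
  from   = c("A", "B", "B", "C"),
  to     = c("B", "C", "A", "B"),
  weight = c(5, 7, 5, 7),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(links, directed = FALSE)
ecount(g)  # counts both copies of each interaction

# Merge parallel edges, keeping the weight of the first copy.
g.simple <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE,
                     edge.attr.comb = "first")
ecount(g.simple)
E(g.simple)$weight
```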

I feel like I'm doing something wrong, since I'm imagining this behavior would have been spotted by now if there's a bug, but I can't see where I'm making a mistake.

Any help with either issue would be greatly appreciated.

Thanks,
-steve



