[igraph] Subsetting graph by edges + building large graph troubles (R in

igraph-help

From:	Steve Lianoglou
Subject:	[igraph] Subsetting graph by edges + building large graph troubles (R interface)
Date:	Wed, 4 Mar 2009 15:51:17 -0500

Hi all,

I'm a first-timer with igraph and am using it in R. I'm sorry for putting two messages into one, but the trouble reading in a large graph is causing me to try and subset a whole graph, so they're kind of smashed together. I'll try to explain. Sorry, this is long.

I'm building a network from what is essentially an 'ncol'-formatted file. It's the protein interaction network for yeast from the STRING db (http://string.embl.de). Here are the first few lines:


Q0032 YNL182C 433
Q0045 Q0060 186
Q0045 Q0085 716
Q0045 Q0105 997
Q0045 Q0110 882
Q0045 Q0115 898
Q0045 Q0120 928
Q0045 Q0130 201
Q0045 Q0140 222
Q0045 Q0250 999
...

=========
Problem 1
=========

I'd actually like to filter the graph I'm creating from the file on the edge score listed in the third column above, but I'm not having an easy time doing this correctly. Earlier today I stumbled on what I thought would be a simple way to do this:


# threshold and directed are defined elsewhere
graf <- read.graph(filepath, format='ncol', directed=directed)
good.edges <- E(graf)[weight >= threshold]
good.graf <- graph(edges, directed=directed)

The problem is that I'm losing the name attributes from my vertices, and my `good.graf` is now just a list of vertex IDs. Since I'm not really comfortable with the igraph API, I'm having a hard time figuring out the best way to keep the correct names associated with the new IDs that are in `good.graf`.

I'm thinking this is done often, so I was wondering if anyone could offer any good suggestions.
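
One possible approach, sketched here under assumptions: instead of building a new graph from edge IDs, delete the unwanted edges from the existing graph, which leaves the vertices and their `name` attribute untouched. The sketch assumes a current igraph release, where the relevant functions are called `graph_from_data_frame` and `delete_edges`; the toy `links` data frame and the `threshold` value are made up for illustration and are not the real data.

```r
library(igraph)

# Toy edge list modelled on the first lines of the file (illustrative only)
links <- data.frame(
  from   = c("Q0032", "Q0045", "Q0045", "Q0045"),
  to     = c("YNL182C", "Q0060", "Q0085", "Q0105"),
  weight = c(433, 186, 716, 997),
  stringsAsFactors = FALSE
)
threshold <- 400  # hypothetical cutoff

# The first two columns become the edges, the third an edge attribute
graf <- graph_from_data_frame(links, directed = FALSE)

# Remove the low-scoring edges; vertices and their names are kept as they are
good.graf <- delete_edges(graf, E(graf)[weight < threshold])

V(good.graf)$name    # names are still attached to the vertices
E(good.graf)$weight  # only weights at or above the threshold remain
```

Note that vertices whose edges were all removed stay in the graph as isolated vertices; whether that is desirable depends on the analysis.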



=========
Problem 2
=========

Prior to trying it this way, I was taking inspiration from the technique in the "creating graphs" section here:


http://igraph.sourceforge.net/igraphbook/igraphbook-creating.html#id2569368

I'm subsetting my `links` file to contain only the edges with scores > threshold, so that I can use the subsetted data.frame to build the graph straight away. However, there is some weirdness happening where the first edge at the top of my `links` data.frame isn't being registered, but its weight is, and my weights are all off by one. Have a look:

links <- read.delim(pathToFile, sep=" ", header=FALSE, stringsAsFactors=FALSE, col.names=c('from', 'to', 'weight'))

## I won't filter for easy comparison with data above
## links <- subset(links, weight >= threshold)
vNames <- unique(c(links$from, links$to))
ids <- seq_along(vNames) - 1
names(ids) <- vNames
edges <- matrix(c(ids[links$from], ids[links$to]), nc=2)
g <- add.vertices(graph.empty(directed=FALSE), length(ids), name=vNames)
g <- add.edges(g, t(edges), weight=links$weight)

Below, on the left, are the edges in the graph. There's an extra column surrounded with [] that I'm putting there to show you the scores/weights that are associated with each edge (I'll show the R output of the scores afterwards so you can see they are the same). On the right (## commented) are the first 10 lines from the input file for comparison:


# modified head(E(g), 10) call (modifications in trailing []):
R> head(E(g), 10)              ## Top 10 lines from file
[1]  Q0045   -> Q0060  [433]   ## Q0032 YNL182C 433
[2]  Q0045   -> Q0085  [186]   ## Q0045 Q0060 186
[3]  Q0045   -> Q0105  [716]   ## Q0045 Q0085 716
[4]  Q0045   -> Q0110  [997]   ## Q0045 Q0105 997
[5]  Q0045   -> Q0115  [882]   ## Q0045 Q0110 882
[6]  Q0045   -> Q0120  [898]   ## Q0045 Q0115 898
[7]  Q0045   -> Q0130  [928]   ## Q0045 Q0120 928
[8]  Q0045   -> Q0140  [201]   ## Q0045 Q0130 201
[9]  Q0045   -> Q0250  [222]   ## Q0045 Q0140 222
[10] Q0045   -> Q0275  [999]   ## Q0045 Q0250 999

# Here is the call to get the weights with its output, just to verify
R> head(E(wtf$graph)$weight ,10)
 [1] 433 186 716 997 882 898 928 201 222 999

See how the first edge in the igraph graph is really the second edge in my input file, BUT its weight is the one from the first edge of my input file? I'm stumped.

I tried my code on a smaller test file, and it works fine. The real data file has 623,530 edges. In the original data file, each edge is repeated once to represent the undirected nature of protein interaction networks, e.g. there is:


A B <weight>
... somewhere down the line ...
B A <weight>

My small test file did the same thing, and it worked fine. Also, the problem still persists whether I specify my igraph graph to be directed or undirected prior to constructing it.
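
If the repeated `B A` rows turn out not to be wanted in an undirected graph, one way to collapse them before building the graph is sketched below in base R. This is only an illustration on made-up data: it assumes both copies of an edge carry the same weight, and it simply keeps the first copy encountered.

```r
# Toy data with each undirected edge listed in both directions (illustrative)
links <- data.frame(
  from   = c("A", "C", "B", "D"),
  to     = c("B", "D", "A", "C"),
  weight = c(900, 250, 900, 250),
  stringsAsFactors = FALSE
)

# Build an order-independent key: "A B" and "B A" both map to "A B"
key <- paste(pmin(links$from, links$to), pmax(links$from, links$to))

# Keep only the first row for each unordered pair
links.unique <- links[!duplicated(key), ]
links.unique
```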

I feel like I'm doing something wrong, since I imagine this behavior would have been spotted by now if it were a bug, but I can't see where I'm making a mistake.


Any help with either issue would be greatly appreciated.
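
One way to narrow this down without involving igraph at all is to check that the name-to-id lookup round-trips, i.e. that mapping the ids in `edges` back to names reproduces the rows of `links` in the same order as the weights. The following is a sketch in base R on a toy `links` data frame taken from the first lines shown earlier; it does not reproduce the problem, it only shows the kind of check that could be run on the real data.

```r
# Toy data modelled on the first lines of the file (illustrative only)
links <- data.frame(
  from   = c("Q0032", "Q0045", "Q0045"),
  to     = c("YNL182C", "Q0060", "Q0085"),
  weight = c(433, 186, 716),
  stringsAsFactors = FALSE
)

# Same id construction as in the code above (0-based ids)
vNames <- unique(c(links$from, links$to))
ids <- seq_along(vNames) - 1
names(ids) <- vNames
edges <- matrix(c(ids[links$from], ids[links$to]), ncol = 2)

# A name missing from the lookup would show up as NA here
stopifnot(!anyNA(edges))

# Map ids back to names (+1 because R vectors are 1-based)
stopifnot(identical(vNames[edges[, 1] + 1], links$from))
stopifnot(identical(vNames[edges[, 2] + 1], links$to))

# t(edges) flattens to from1, to1, from2, to2, ... so edge i lines up
# with links$weight[i]
flat <- as.vector(t(edges))
flat
```

If these checks pass on the full data frame, the lookup itself is probably not where the edges and weights get out of step, and the next place to look would be the calls that add the vertices and edges.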

Thanks,
-steve
