igraph-help
[Top][All Lists]

## [igraph] Faster way to detect duplicate graphs in a vector ptr?

 From: Timothy Rice Subject: [igraph] Faster way to detect duplicate graphs in a vector ptr? Date: Wed, 29 Jan 2014 09:29:09 +1100 User-agent: Mutt/1.5.21 (2010-09-15)

```Hello,

I have a question which may be a little similar to the backtracking matrix
question, although I don't think the solution will be the same.

TL;DR my code has a bottle neck when repetitively calling a function that
checks an igraph_vector_ptr_t for graphs identical to another given graph.

Overall, the program is intended to count the number of candidate
aggregates of two input trees. The algorithm starts with the edge
intersection, then incrementally add edges from the symmetric difference
whenever this preserves acyclicity.

My initial implementation didn't check for duplicates so the count
ended up greatly exaggerated. Now that I'm checking for duplicates, each
iteration requires re-checking the vector of previously discovered trees.
The duplicate_check function takes the vector ptr and two matrices. One of
the matrices gives the adjacencies of the new candidate tree. The other
matrix is used in a loop. It holds the adjacencies from the vector of
existing trees. The code is as follows:

igraph_bool_t duplicate_found(igraph_vector_ptr_t *N,
long int n;
for(n = 0; n < igraph_vector_ptr_size(N); ++n) {
return 1;
}
}
return 0;
}

If the function returns 0, no duplicates were found, so the graph
corresponding to adj_g can be added to the end of the vector N.

My code runs okay for small input trees, say up to 10 nodes, but as the
powerset of the symmetric difference grows something like exponentially,
things start slowing down a lot after that. When I compile with
"-pg" and run gprof, I see that the biggest bottleneck is the above
duplicate-checking function. Therefore, I'd like to squeeze a bit more
efficiency out of that function if possible.

It has occurred to me that at the same time as adding new trees to the
vector N, I could also be adding their matrices to another vector; or I
could define a structure that contains both a graph and the graph's
adjacency matrix, and make an igraph_vector_ptr_t that holds instances of
this struct. This would save recomputing *adj_N more than once per tree.

Another idea was to try using threads to check multiple trees in N
simultaneously.

Unfortunately, implementing either of these ideas would add a lot more
complexity to my code. I'd rather avoid so much overhead if I can help it.

So my question is, is there some faster way of scanning the vector of trees
for duplicates that isn't going to add crazy extra complexity to my program?
To be honest, I'm a little pessimistic that any simple solution exists, but
maybe I've overlooked something due to inexperience with C and igraph.

Cheers,

Tim Rice
--
PhD Candidate
A: Room 210, Richard Berry (Maths & Stats), Melbourne University