[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [igraph] Speed comparison of R and C
From: |
jeremy.raw |
Subject: |
RE: [igraph] Speed comparison of R and C |
Date: |
Tue, 8 Feb 2011 09:24:49 -0500 |
Dear William,
I've had some experience implementing routing algorithms based on
shortest paths using R, C and igraph.
As noted in the previous response on the igraph list, a key reason to
use R (or Python) rather than coding in C is that you get a great deal
more flexibility in exploring algorithmic variations if you use a
higher-level language. Having implemented a lot of high-performance
code over the years, I strongly recommend a strategy of starting with
small problems in a very expressive language, getting the basic
structure correct (and developing some test output that you are certain
works properly that you can use in testing later), then carefully
examining the system, language and algorithmic bottlenecks (e.g. what
dark corner does the code disappear into for long periods of time, and
where are you inadvertently using an O(n^3) or worse operation -- some
of that can be buried in the language or system you're using, but
careless implementation in C can also cause problems).
Along those lines, it's actually quite important to consider the
strengths and weaknesses of the language you're using. R has a lot of
optimization for processing vectors and matrices efficiently, and you
want to steer your algorithm toward the strengths of the language.
Igraph (which I use often) is based on some fairly efficient basic graph
structures, but is tuned overall to supporting general graph-theoretic
investigations (that is, algorithmic support and consistency trump
performance).
In light of that, and without seeing the content of the code you used to
perform your tests, I'd immediately raise the question of why you are
attaching attributes, and reading and writing them. My own inclination
(unless those attributes are going to be used within an igraph
algorithm, e.g. by providing weights for a shortest path computation --
and even there, you can send in a separate vector) would be to store a
row index value as a graph attribute and use an efficient R structure
(e.g. matrix) to hold the actual attribute values. That will make it
much quicker overall to access the attributes for reading/updating. You
can push such optimization pretty far: for example, consider
re-ordering the matrix of attributes by the edge ID once the graph has
been created so you can use the edge ID (+1 perhaps) to index directly
into the matrix of attributes. A lot of the R overhead for attribute
access is in dereferencing the attributes (that is, actually parsing the
R code that says how to locate the attribute), and I'll bet you could
probably get a 10x speedup at least by indexing directly off the edge
IDs into a native R structure.
Even using such strategies, you may still need to dip into C, but the
choice is not a binary one: For example, I wrote a set of routines for
doing convex optimization of flows between a subset of graph nodes where
the edge weight is flow dependent (this is a common operation in highway
traffic assignment modeling), and only a few elements of the interior
path building had to go into C to get acceptable performance, with the
rest of the code in R. In that case, I didn't use igraph at all, since
I had some shortest-path optimizations available that needed low-level
implementation and it was easier for me just to roll my own. But in
other projects, I've used various igraph algorithms out of the (R) box
very happily. One of the virtues of R is that the language itself is
pretty "close to the metal" for doing bulk numeric computations and with
a little care in how you set up data structures in R, it's not hard to
have the best of both worlds.
In any case, the bottom line, as always, is to really explore how to map
your algorithm onto data structures and the language's strengths so you
can attain the right balance of flexibility, adaptability,
comprehensiveness, verifiability and performance.
Enjoy!
Jeremy Raw, P.E., AICP
FHWA Office of Planning
address@hidden
(202) 366-0986
-----Original Message-----
From: address@hidden
[mailto:address@hidden On Behalf Of
William Tu
Sent: Monday, February 07, 2011 11:44 AM
To: address@hidden
Subject: [igraph] Speed comparison of R and C
Dear igraph users,
I plan to implement a routing algorithm using igraph. Performance is
critical for me and I'm thinking about using whether C or R interface.
Previous someone claims that there is about 10% overhead using R
interface. I did an experiment as follow:
1. Create a full graph
2. For each edges, attach two attributes
3. For all edges, randomly pick an edge and read/write its value
In R:
100 nodes, 4950 edges: 4 sec
200 nodes, 19900 edges: 111 sec
300 nodes, 44850 edges: 532 sec
In C:
100 nodes, 4950 edges: 0.036 sec
200 nodes, 19900 edges: 0.108 sec
300 nodes, 44850 edges: 0.227 sec
C is about 1000 times faster. However, in the mailing list, it seems
most people are using python or R, and there are few discussions and
docs about igraph in C. So I'm still hesitating. Does anyone have
similar experiences?
William Tu,
Stony Brook University
_______________________________________________
igraph-help mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/igraph-help
- [igraph] Speed comparison of R and C, William Tu, 2011/02/07
- Re: [igraph] Speed comparison of R and C, William Tu, 2011/02/08
- Re: [igraph] Speed comparison of R and C, William Tu, 2011/02/08
- Re: [igraph] Speed comparison of R and C, Tamás Nepusz, 2011/02/08
- Re: [igraph] Speed comparison of R and C, Gábor Csárdi, 2011/02/08
- RE: [igraph] Speed comparison of R and C, jeremy.raw, 2011/02/09
- [igraph] Bipartitle community detection, Liang Wang, 2011/02/09
- Re: [igraph] Bipartitle community detection, Tamás Nepusz, 2011/02/10
- RE: [igraph] Speed comparison of R and C, cheng-chun Tu, 2011/02/11