|Subject:||Re: [igraph] graph representation in memory|
|Date:||Mon, 23 Mar 2009 18:35:57 -0400|
you are completely right, it is better to use (long) int instead of
double, this was a design mistake on my part.
As for the work, it should not be very difficult. In theory, all
functions that you need to rewrite are in type_indexededgelist.c.
Plus, there are three macros in igraph.h that you need to update or
rewrite as a function: IGRAPH_FROM, IGRAPH_TO and IGRAPH_OTHER.
This is theoretically enough for the C library and probably also for
Python. For R you need a bit more, you need to write some conversion
functions and rewrite iterators.R as well.
Many C functions transform the igraph_t into a plain adjacency list,
or one containing edge ids. For better performance you would need to
rewrite these too, they are igraph_adjlist_t and igraph_adjedgelist,
plus lazy versions of these. You can rewrite these just to use the
normal operations of your new data structure.
Maybe I forgot some minor things, but I thing roughly this is it.
Of course I can help you if you decide to work on this, assuming you
need any help. :)
May I ask you what kind of data structure you have in mind? Do you
intend to put it into igraph, or the goal is just to play with it a
Btw. I am not sure what you can actually do with a graph with 100
million nodes and a general graph library. You cannot calculate too
many things and perhaps you're better off coding them without using
igraph at all. But I understand that this is a school project. :)
On Mon, Mar 23, 2009 at 6:15 PM, Mati Vait <address@hidden> wrote:
> as a school project we are investigating scalability of graph algorithms
> on very large undirected graphs. Graphs under question have hundreds of
> millions of nodes, billions of edges), but we have run into some memory
> issues with igraph because of it's way of holding graph in memory
> exhausts ram really fast. We reached this conclusion by looking at
> comments on igraph_t in include/igraph.h and testing on cluster with
> 32GB ram for each node. At the moment we are able to hold in memory a
> graph with 100 million nodes and 300 million edges which uses about 27.9
> GB of ram (after tweaking igraph_real_t and igraph_integer_t types to
> use int instead of double).
> How difficult it is to implement another way of graph representation
> instead of current one? Some loss in work time could be tolerated.
> What code must be reimplemented in order to keep algorithms running?
> Some insight and references to specific documentation would be really
> Thank you for your time,
> igraph-help mailing list
Gabor Csardi <address@hidden> UNIL DGM
igraph-help mailing list
|[Prev in Thread]||Current Thread||[Next in Thread]|