[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] iGraph noob - network size and memory

From: Tamas Nepusz
Subject: Re: [igraph] iGraph noob - network size and memory
Date: Mon, 14 Jun 2010 11:38:21 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

Hi Michael,

> Initially we'll be working on graphs with millions of nodes
> and edges, and possibly going up to hundreds of millions. The iGraph's
> description mentions working very efficiently with millions of edges.
> Does this extend to 10's or 100's of millions?
Probably yes, assuming that you have enough memory. I can definitely
generate an Erdos-Renyi random network with 10 million vertices and 200
million edges on my computer and run some basic queries on it:

>>> g= Graph.Erdos_Renyi(n=int(1e7), m=int(2e8))
>>> print g.maxdegree()

However, the graph takes up more than 8GB of memory at this stage, and
since my machine has only 4GB of physical memory, there is heavy
swapping involved. You'll need plenty of memory and of course a 64-bit
OS, but otherwise you should be fine.

> I've poked around a bit and seen some posts that suggest iGraph keeps
> its network data structure in main memory. Is that correct?
Yes, it is. It uses an indexed edge list data structure, which is
separated from the rest of the code in src/type_indexededgelist.c. The
rest of the code uses a well-defined interface to communicate with the
internal data structure, so if you need to use a different data
strcture, you only have to reimplement the functions in
src/type_indexededgelist.c and recompile igraph.

> Did something come of the suggestion to work on a branch with a
> (presumably) disk-based database for the graph data?
There is a branch on Launchpad that uses a PostgreSQL database as a
backend. It's not kept in sync with the main branch (I'm not sure if it
works now or not), it's not part of the official codebase, but it serves
as a proof of concept:


Feel free to contact the owner of the branch on Launchpad if you have
questions. The only catch is that there are some igraph functions which
explicitly construct an adjacency list representation of the graph in
memory for sake of efficiency, and this is not part of the basic graph
interface yet, so for these cases, you will end up with the whole graph
being loaded in memory anyway. I think the basic interface could be
modified so that igraph algorithms could send "hints" to the internal
data structure saying things like "hey, I'm gonna use lots of
neighbors() queries in the near future", and the internal data structure
could adapt (a database-based backend could ignore these hints while
an in-memory backend could construct the adjacency list). This requires
some modifications to the igraph core, but well, if you want to use a
database, you will need some modifications anyway.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]