igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] community detection algorithms


From: Larson, TR
Subject: Re: [igraph] community detection algorithms
Date: Tue, 23 Mar 2010 17:03:59 +0000
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Hi,
Thought I had got the glitches ironed out of label.propagation.community. However, I've come across this problem (igraph 0.5.3 on R 2.10.1):

> summary(G)
Vertices: 182
Edges: 228
Directed: FALSE
No graph attributes.
Vertex attributes: name.
Edge attributes: weight.

> which(clusters(G)$membership==0)
[1] 1 2
>  V(G)$name[clusters(G)$membership==0]
[1] "M166.0862T68" "M301.1548T68"


> which(label.propagation.community(G)==0)
[1]  1  2 90 91 92
>  V(G)$name[label.propagation.community(G)==0]
[1] "M166.0862T68"  "M301.1548T68"  "M593.2643T182" "M594.2676T182"
[5] "M595.2704T182"

vertices 1,2 belong together, and 90,91,92 belong together - but they cannot be lumped together.


As far as I understand, clusters() should decompose the graph into all is subgraphs, with no connections between any of the subgraphs. The community detection algorithm may decompose further - but it shouldn't return bigger membership groups than clusters() returns, as this would mean connecting subgraphs where there are no connections????

This only appears to happen for membership allocation 0; i.e. for everything else from membership 1:number of vertices-1, the results are as expected - i.e. there are the same or smaller membership groups for the label.propagation.output.

e.g.:

> which(clusters(G)$membership==1)
[1]   3   4   5 123 174 182
> which(label.propagation.community(G)==1)
[1]   3   4   5 123 174 182

> which(clusters(G)$membership==5)
[1] 12 13 60 61
> which(label.propagation.community(G)==5)
[1] 12 13


One solution would be to run cluster() and label.propagation.community() on each subgraph, but this is a bit messy...

any ideas would be most welcome

thnaks
Tony




Gábor Csárdi wrote:
Tony,

I would definitely pick the division with the highest modularity
score, if I were you. Think of this as an optimization problem, you
are trying to maximize the modularity. Even if an algorithm comes up
with a non-optimal modularity value a million times, that does not
mean that this is the "best" solution of the problem. The best
solution is the one with the highest modularity score.

Best,
Gabor

On Thu, Oct 29, 2009 at 4:46 PM, Larson, TR <address@hidden> wrote:
OK, I've now done a few more tests on label.propagation.community with my
data - just thought you'd be interested to know the results:

I've tried 3 methods to get a consistent membership result, with 100 runs of
each method  to evaluate consistency:

1) Using membership from the most frequent modularity score result (10000
iterations).  This gave the same membership in each of 100 runs, so the most
successful.  However, the modularity score results over 10000 iterations
were not normally distributed - there were 4 distinct frequency maxima, with
the most frequent value (i.e. the one used to calculate membership) only
having a frequency of about 0.6.

2) Using membership from the highest modularity score result (10000
iterations).  Very inconsistent - 3 or 4 different results through the 100
runs.

3) Aggregation of all memberships (no modularity taken into account). The
highest number of splits(as expected), but again inconsistent as in method
(2).

So I'm sticking with method (1) for now.

thanks
Tony



Tamas Nepusz wrote:
Hi Tony,

Walktrap and fastgreedy give the same membership results every time I run
the algorithms. However, label.propagation.community it gives slightly
different groupings each time. I have tried setting initial to a set vector
(either all 0 or 0:(length(V(g)$name)-1)) and fixed to a vector of all FALSE
values in case the algorithm results vary because of random starting values.
However, this has no effect.
Unfortunately the label propagation algorithm is inherently unstable;
even when the starting position is completely specified, there are
internal random decisions within the algorithm (e.g., the order in which
the nodes are visited during a single pass of the algorithm, or the
decision a node takes when two different labels appear in its
neighborhood with equal frequency). These random decisions are necessary
to avoid oscillations in the algorithm when it encounters bipartite or
near-bipartite structures within a graph. Section III B in the original
Phys Rev E paper of Raghavan et al (http://arxiv.org/abs/0709.2938)
deals with the problem of aggregating different results into one
consistent solution, but this is just one possible way of dealing with
the problem and it is not implemented in igraph. There are also some
modifications to the original algorithm that try to reduce the
instability of the algorithm, but these are not (yet) implemented in
igraph:

http://arxiv.org/pdf/0910.1154

If you are looking for one single consistent solution with the existing
tools provided by igraph, try running the label propagation algorithm
many times (say, 1000 times) and select a solution based on some
internal quality measure; for instance, the modularity metric (this can
readily be calculated by igraph).

Any comments would be most welcome; also any other suggestions on how to
extract related clusters from my graph structure.
Tom Gregorovic has sent a patch earlier to this mailing list which
implements a fast hierarchical community detection algorithm by Blondel
et al [1]. I'm in the process of integrating this algorithm to igraph
and it will be available soon (maybe next week) in one of the igraph
nightly builds. Also, a recent review paper of Fortunato et al gives a
fairly detailed overview of different community detection approaches on
graphs.

[1]
http://www.iop.org/EJ/article/1742-5468/2008/10/P10008/jstat8_10_p10008.html
[2] http://arxiv.org/pdf/0906.0612

--


_______________________________________________
igraph-help mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/igraph-help









reply via email to

[Prev in Thread] Current Thread [Next in Thread]