igraph-help
[Top][All Lists]

## Re: [igraph] Compare clusterings (partitions of a set)

 From: Richard Bowman Subject: Re: [igraph] Compare clusterings (partitions of a set) Date: Sat, 24 Jul 2010 09:49:08 -0700

Hi Albert,

Tamas and Stijn are correct: it is definitely not a trivial problem. I
did this in my dissertation with the "clue" cluster ensemble R package
and the cl_agreement(...) and cl_dissimilarity(...) functions, which
provide a large range of choices for comparing two partitions of a
set, which is what the membership vectors are. I chose the Jaccard and
corrected Rand similarities, but most methods give very similar
results, qualitatively. Review the literature referenced in the manual
for information about what the different measures mean, because there
are many ways to compare. I attached some code for this below.

Code:
library("clue")

x <- as.hard.partition(c(1,1,1,1,2,2,2,2))
y <- as.hard.partition(c(3,3,3,3,5,5,5,5))
z <- as.hard.partition(c(1,2,3,4,5,6,7,8))

cl_agreement(x,y,method = cRand)
cl_agreement(x,z,method = cRand)
cl_agreement(y,z,method = cRand)

The Jaccard method is slightly bugged in clue 0.3-33, but should be
fixed in versions after that.

Good luck,
-Richard

__________________________
Richard Bowman
Doctoral Fellow

-----------------------------------------------------------------
>
> Message: 1
> Date: Fri, 23 Jul 2010 17:19:50 +0100
> Subject: Re: [igraph] Compare clusterings
> To: Help for igraph users <address@hidden>
> Content-Type: text/plain; charset=us-ascii
>
>>   Hello!  I installed igraph as an R package. If i run two clustering
>>   algorithms on the same graph and i have the two membership vector
>>   then, how can i compare the two partitions? I would like to know
>>   which points changed their cluster?  Thank you  Albert
> Well, the problem is not as simple as it seems; consider the following
> two membership vectors:
>
> c(1,1,1,1,2,1,2,2,2,2)
> c(2,2,2,2,2,1,1,1,1,1)
>
> The simplest solution would be the following:
>
>> which(x != y)
> [1]  1  2  3  4  7  8  9 10
>
> But this is obviously wrong, since if we associate cluster 1 in the
> first vector with cluster 2 in the second vector and vice versa, then it
> is easy to see that only positions 5 and 6 have changed. The problem
> becomes even more complicated if there are more than two clusters and
> the number of clusters is not equal in the two vectors. For instance,
> which points would you consider to have changed their membership here?
>
> c(1,1,1,1,2,2,2,2,3,3,3,3)
> c(1,2,1,2,1,2,1,2,1,2,1,2)
>
> --
> Tamas
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 23 Jul 2010 17:34:38 +0100
> From: Stijn van Dongen <address@hidden>
> Subject: Re: [igraph] Compare clusterings
> To: Help for igraph users <address@hidden>
> Content-Type: text/plain; charset=us-ascii
>
>   Hi,
>
> On Fri, Jul 23, 2010 at 05:19:50PM +0100, Tamas Nepusz wrote:
>> >   Hello!  I installed igraph as an R package. If i run two clustering
>> >   algorithms on the same graph and i have the two membership vector
>> >   then, how can i compare the two partitions? I would like to know
>> >   which points changed their cluster?  Thank you  Albert
>> Well, the problem is not as simple as it seems; consider the following
>> two membership vectors:
>>
>> c(1,1,1,1,2,1,2,2,2,2)
>> c(2,2,2,2,2,1,1,1,1,1)
>
> <snip>
>
> Tamas is correct of course, and if two clusterings are very different
> it will be generally impossible to establish such a simple
> transformational pattern of points changing clusters.
> A less ambitious but informative and doable approach is to consider the
> contingency table
> between the two clusterings  -- simply a table containing the sizes of the
> intersections of all possible pairings of clusters.
>
> Two clusters (say c1 and d1) that have a large intersection (say larger than
> either of the two set differences) can serve as anchors between the two
> clusterings; you could say an abstract set S is represented by c1 in the first
> clustering and d1 in the second clustering, and relative to S you could make
> statements of transformations such as you suggest. But the number of anchors
> you are able to find is not clear in advance. It could be a lot or a few, but
> the scenario that not both of your clusterings are fully covered by such
> anchors is pretty realistic.
>
> I have not used igraph a lot yet, so I don't know what would be a good way
> to create such a contingency table.
>
> best,
> Stijn
>
> --
> Stijn van Dongen         >8<        -o)   O<  forename pronunciation: [Stan]
> EMBL-EBI                            /\\   Tel: +44-(0)1223-492675
> Hinxton, Cambridge, CB10 1SD, UK   _\_/   http://micans.org/stijn
>
>
>
> ------------------------------
>
> _______________________________________________
> igraph-help mailing list