[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [igraph] clustering coefficients for bipartite networks
From: |
Tamas Nepusz |
Subject: |
Re: [igraph] clustering coefficients for bipartite networks |
Date: |
Thu, 24 Feb 2011 18:06:41 +0100 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101208 Lightning/1.0b2 Thunderbird/3.1.7 |
Hi Jordi,
I haven't had time to experiment with your code too much, but I also
feel that the calculations for the node pairs are correct (the only
thing I would have done differently is cc_dot_pairs where you can save a
union operation by knowing that len(union) = len(unei) + len(vnei) -
len(intersect)). One thing that seems to be different in your code and
in the authors' (but I haven't checked it thoroughly) is that when they
calculate the second-order neighbors of the original vertex, they
exclude the vertex itself while you don't. This means that your sums and
averages will include the vertex's "similarity" (cc_dot, cc_min, cc_max)
with itself, which is always going to be 1.
--
Tamas
On 02/24/2011 11:23 AM, jordi torrents wrote:
> Tamàs, Simone and all,
>
> Sorry for top posting and for resurrecting an old thread. I've been
> working on the implementation of the different kinds of bipartite
> clustering coefficients proposed by Latapy et al (2008), starting with
> the implementation proposed by Simone and refined by Tamàs in this
> thread. My implementation (see script below) is orders of magnitude
> faster than Simone's because it avoids doing two nested loops over the
> nodes and only explores nodes which have common neighbors, as proposed
> in the paper. My implementation and Simone's yield the same results
> but both differ from the results provided in the paper. I've put
> together a test using the smallest network from the paper (arXiv
> coauthor network) to illustrate the problem. You can download the test
> network in graphml format (with node attribute "type" that identifies
> top and bottom nodes) and the script from:
>
> http://www.milnou.net/~jtorrents/bipartite_cc/coauthoring_latapy_2mode.graphml
> http://www.milnou.net/~jtorrents/bipartite_cc/test_bip_cc.py
>
> The output of the script is:
> =====================================================================
> $ python test_bip_cc.py
> Code based on the paper:
> Matthieu Latapy, Clémence Magnien and Nathalie Del Vecchio.
> Basic Notions for the Analysis of Large Two-mode Networks.
> Social Networks 30 (1), p. 31-48, 2008
>
> Loading coauthorship network from the paper. You can download it from:
> http://www.milnou.net/~jtorrents/bipartite_cc/coauthoring_latapy_2mode.graphml
>
> The authors report the results in Table 3 p. 41
> cc_dot_top = 0.29; cc_dot_bottom= 0.31
> cc_min_top = 0.56; cc_min_bottom= 0.73
> cc_max_top = 0.36; cc_max_bottom= 0.33
>
> Testing Simone's implementation of cc_dot:
> cc_dot_top = 0.47; cc_dot_bottom= 0.56; time spend = 2417 seconds
>
> Testing my implementation of cc_dot:
> cc_dot_top = 0.47; cc_dot_bottom= 0.56; time spend = 3 seconds
>
> Testing my implementation of cc_min:
> cc_min_top = 0.70; cc_min_bottom= 0.85; time spend = 3 seconds
>
> Testing my implementation of cc_max:
> cc_max_top = 0.53; cc_max_bottom= 0.57; time spend = 3 seconds
>
> The results do not match!
>
> It seems that the problem is in the computation of CC for individual
> nodes (see pp. 40-41)
> because tests for pairwise bipartite clustering coefficients work with
> simple examples:
>
> Testing functions of different kinds of bipartite clustering
> coefficients between pairs of nodes
> using 3 example graphs from figure 5 p. 40 Latapy et al (2008)
> Example 0 correct!
> Example 1 correct!
> Example 2 correct!
> ===============================================================================
>
> As you can see the results do not match with those provided in the
> paper. My coding experience is quite limited, so maybe I did some
> mistakes in the implementation. I'd appreciate if someone could revise
> the code and provide any clue on how to fix it.
>
> Thanks in advance.
> Salut!
>
> 2011/2/2 Tamas Nepusz <address@hidden>:
>> Hi,
>>
>> The ccdot attribute is probably a list at the end of your function, and
>> lists cannot be saved as GraphML attributes. If you need them later,
>> save the graph in pickle format which preserves all the attributes
>> irrespectively of their types. (But of course you cannot load the graph
>> from another program, only from Python).
>>
>> --
>> T.
>>
>> On 02/02/2011 01:00 PM, Simone Gabbriellini wrote:
>>> Hi,
>>>
>>> yes, right, I am now doing it... but a strange thing happens: I initialize
>>> all the attributes to 0, but after my function updates them, I save the
>>> network as a graphml and those attributes are missing... If I try to save
>>> the graphml file before my CCBN function, they are all there, set to 0...
>>>
>>> Am I deleting them in some way with this code?
>>>
>>> best,
>>> simo
>>>
>>> Il giorno 01/feb/2011, alle ore 21.20, Tamás Nepusz ha scritto:
>>>
>>>> Hi,
>>>>
>>>> Cache the values of g.vs(type=0) and g.vs(type=1) in advance, otherwise
>>>> you will be calculating them over and over again needlessly. (The first
>>>> innerfor loop will calculate g.vs(type=0) as many times as many vertices
>>>> of type=0 you have). Similarly, calculate the neighbor sets in advance to
>>>> avoid creating them all the time from scratch.
>>>>
>>>> Also, there's no need to convert set(unei+vei) back to a list, just use
>>>> len(set(unei) | set(vnei)).
>>>>
>>>>> I have come to this solution but I don't know if I can consider it a fast
>>>>> one:
>>>>>
>>>>> def CCBN(self):
>>>>> for u in self.g.vs(type=0):
>>>>> ccdot = []
>>>>> for v in g.vs(type=0):
>>>>> unei = g.neighbors(u)
>>>>> vnei = g.neighbors(v)
>>>>> if len(set(unei) & set(vnei)) > 0:
>>>>> ccdot.append(len(set(unei) & set(vnei)) /
>>>>> len(list(set(unei + vnei))))
>>>>> u['ccdot'] = [float(sum(ccdot)) / len(ccdot) if len(ccdot) > 0
>>>>> else 0]
>>>>> for u in g.vs(type=1):
>>>>> ccdot = []
>>>>> for v in g.vs(type=0):
>>>> The line above should probably be g.vs(type=0). Also, note that the two
>>>> for loops are almost the same, only the vs the for loop is iterating over
>>>> is different, so I would probably put the two for loops in an auxiliary
>>>> function and just call it twice, once with g.vs(type=0) and once with
>>>> g.vs(type=1).
>>>>
>>>> --
>>>> Tamas
>>>>
>>>>
>>>> _______________________________________________
>>>> igraph-help mailing list
>>>> address@hidden
>>>> http://lists.nongnu.org/mailman/listinfo/igraph-help
>>> _______________________________________________
>>> igraph-help mailing list
>>> address@hidden
>>> http://lists.nongnu.org/mailman/listinfo/igraph-help
>>>
>> _______________________________________________
>> igraph-help mailing list
>> address@hidden
>> http://lists.nongnu.org/mailman/listinfo/igraph-help
>>
> _______________________________________________
> igraph-help mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/igraph-help
>
- Re: [igraph] clustering coefficients for bipartite networks, Simone Gabbriellini, 2011/02/01
- Re: [igraph] clustering coefficients for bipartite networks, Simone Gabbriellini, 2011/02/01
- Re: [igraph] clustering coefficients for bipartite networks, Tamás Nepusz, 2011/02/01
- Re: [igraph] clustering coefficients for bipartite networks, Simone Gabbriellini, 2011/02/01
- Re: [igraph] clustering coefficients for bipartite networks, Tamás Nepusz, 2011/02/01
- Re: [igraph] clustering coefficients for bipartite networks, Simone Gabbriellini, 2011/02/02
- Re: [igraph] clustering coefficients for bipartite networks, Tamas Nepusz, 2011/02/02
- Re: [igraph] clustering coefficients for bipartite networks, jordi torrents, 2011/02/23
- Re: [igraph] clustering coefficients for bipartite networks, jordi torrents, 2011/02/24
- Re: [igraph] clustering coefficients for bipartite networks,
Tamas Nepusz <=
- Re: [igraph] clustering coefficients for bipartite networks, jordi torrents, 2011/02/24
- Re: [igraph] clustering coefficients for bipartite networks, Simone Gabbriellini, 2011/02/26