igraph-help

Re: [igraph] personalized pagerank computation issue


From: Tamas Nepusz
Subject: Re: [igraph] personalized pagerank computation issue
Date: Tue, 28 Jan 2020 15:34:46 +0100

Hi Omer,

Your comment rang a bell -- I remembered that this issue already came up back in 2014, when we first switched from ARPACK to PRPACK; see the following issue in the issue tracker:

https://github.com/igraph/igraph/issues/671

My memories of PRPACK were much fresher back then, and I wrote the following:
After reading the thread, it also became apparent that PRPACK is doing some trickery with sink nodes; the following is from @dgleich, one of the original authors of PRPACK:

"For the case that you are jumping to nodes that have no outgoing edges, what happens is you add new edges according to the teleportation set/reset set or the special set "u"."

So, all in all, your implementation probably matches igraph's ARPACK implementation of PageRank, and that's why you are seeing identical results (both your code and igraph's ARPACK implementation formulate PageRank as an eigenvector problem and solve that). In the case of PRPACK, PageRank is not an eigenvector problem any more; again, quoting @dgleich:

"PRPACK does decompose the graph into SCCs, but the primary advantage is that it frames the PageRank problem as a linear system instead of the eigenvalue problem. This has tremendous numerical advantages."
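
To make the two formulations concrete, here is a minimal pure-Python sketch. It is not PRPACK's or igraph's actual code. The toy graph, the helper names (`transition_matrix`, `pagerank_eigen`, `pagerank_linear`) and the choice to let sink nodes jump according to the personalization vector are all assumptions made for illustration. With the same sink rule, the eigenvector formulation (power iteration) and the linear system (I - d*P) x = (1 - d) v give the same scores, which suggests that a discrepancy would come from how sinks are modelled rather than from the formulation itself.

```python
def transition_matrix(out_edges, n, sink_dist):
    """P[i][j] = probability of moving from node j to node i (columns sum to 1)."""
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):
        targets = out_edges.get(j, [])
        if targets:
            for i in targets:
                P[i][j] += 1.0 / len(targets)
        else:
            # Assumption: a sink node jumps according to sink_dist.
            for i in range(n):
                P[i][j] = sink_dist[i]
    return P


def pagerank_eigen(P, v, d=0.85, iters=1000):
    """Power iteration on the matrix d*P + (1 - d) * v * ones^T."""
    n = len(v)
    x = [1.0 / n] * n
    for _ in range(iters):
        x = [d * sum(P[i][j] * x[j] for j in range(n)) + (1 - d) * v[i]
             for i in range(n)]
    return x


def pagerank_linear(P, v, d=0.85):
    """Solve (I - d*P) x = (1 - d) * v by Gauss-Jordan elimination."""
    n = len(v)
    # Augmented matrix [I - d*P | (1 - d) * v]
    A = [[(1.0 if i == j else 0.0) - d * P[i][j] for j in range(n)]
         + [(1 - d) * v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3, and node 3 is a sink.
edges = {0: [1, 2], 1: [2], 2: [3]}
v = [1.0, 0.0, 0.0, 0.0]          # personalization vector
P = transition_matrix(edges, 4, sink_dist=v)
eig = pagerank_eigen(P, v)
lin = pagerank_linear(P, v)
print(eig)
print(lin)
```

Passing a uniform `sink_dist` instead of `v` changes the scores in both functions in the same way, so the two still agree with each other.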

There was also a PR quite a while ago that attempted to introduce the possibility of specifying the personalization vector and the reset vector separately, but it did not get merged in the end:


Another issue where this question popped up -- maybe it also provides more insight:


Best,
T.


On Mon, 27 Jan 2020 at 23:17, Yalcin, Omer Faruk <address@hidden> wrote:
Thank you very much for tracking the code! Unfortunately, that doesn't work either. I am also fairly certain that allowing the node to stay at the same spot would give that node an unwarranted boost in pagerank, so it is probably undesirable.

I do have an interesting result though; when I use "arpack" instead of the default "prpack", I get the exact same results as my custom written function. In other words, when there are nodes that have no outgoing edges, "prpack" and "arpack" do the computation differently.

My problem seems to be solved (as long as there is no reason why "arpack" is wrong) but this difference between the two algorithms might be of interest to you.

Thank you very much.
Omer

From: Tamas Nepusz <address@hidden>
Sent: Monday, January 27, 2020 4:15 PM
To: Yalcin, Omer Faruk <address@hidden>
Cc: Help for igraph users <address@hidden>
Subject: Re: [igraph] personalized pagerank computation issue
 

That being said, after your question, I set the probability of navigating to other nodes from a node that has no outbound links to the personalization vector. That doesn't reproduce the igraph result either.
There's also a third option: if a node has no outbound links, stay at the same node with probability equal to 1-damping, _or_ navigate to a randomly picked node according to the personalization vector with probability equal to damping. Sorry for not being too precise here; the thing is that igraph uses an external library (PRPACK) to calculate personalized PageRank scores, and I only managed to track the code to a point where I am convinced that we are passing down two vectors to PRPACK: one is a uniform vector, and the other is the personalization vector submitted by the user. Based on this, I would assume that PRPACK uses the personalization vector when the random walk is reset, and the uniform vector for a random teleport (after all, why would PRPACK need two vectors if it used the personalization vector for both cases?). I did not manage to track it down further, though, because PRPACK contains at least six different solvers, optimized for different use cases, and I could not figure out which one it would use in your particular case. But I'm pretty sure that the discrepancy between your results and ours is due to some corner case in the handling of sink nodes.
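
The sink-handling options discussed in this thread can be compared with a small pure-Python sketch. This is only an illustration under stated assumptions, not what PRPACK or igraph does internally: the function name `personalized_pagerank`, the `sink_rule` labels and the toy graph are hypothetical, and the "stay" rule follows the wording above literally (stay with probability 1-damping, jump according to the personalization vector with probability damping).

```python
def personalized_pagerank(out_edges, n, reset, damping=0.85,
                          sink_rule="uniform", iters=500):
    """Power iteration with a selectable rule for nodes without outbound links.

    sink_rule (all three are assumptions made for this sketch):
      "uniform" - a sink teleports to a uniformly chosen node
      "reset"   - a sink teleports according to the personalization vector
      "stay"    - a sink stays put with probability 1 - damping, otherwise
                  jumps according to the personalization vector
    """
    uniform = [1.0 / n] * n
    x = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for j in range(n):
            targets = out_edges.get(j, [])
            if targets:
                # Regular node: follow a link with probability damping,
                # reset to the personalization vector otherwise.
                for i in targets:
                    nxt[i] += damping * x[j] / len(targets)
                for i in range(n):
                    nxt[i] += (1 - damping) * x[j] * reset[i]
            elif sink_rule == "stay":
                nxt[j] += (1 - damping) * x[j]
                for i in range(n):
                    nxt[i] += damping * x[j] * reset[i]
            else:
                jump = uniform if sink_rule == "uniform" else reset
                for i in range(n):
                    nxt[i] += (damping * x[j] * jump[i]
                               + (1 - damping) * x[j] * reset[i])
        x = nxt
    return x


# Hypothetical toy graph: 0 -> 1 -> 2, where node 2 has no outbound links.
toy_edges = {0: [1], 1: [2]}
toy_reset = [1.0, 0.0, 0.0]
scores = {rule: personalized_pagerank(toy_edges, 3, toy_reset, sink_rule=rule)
          for rule in ("uniform", "reset", "stay")}
for rule, values in scores.items():
    print(rule, values)
```

On this toy graph the three rules give three different score vectors, so two implementations that differ only in their treatment of sinks would not be expected to agree.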

T.
