igraph-help

Re: [igraph] Random Walk Sample

From: Scott Hale
Subject: Re: [igraph] Random Walk Sample
Date: Fri, 10 Jan 2014 13:40:12 +0900

Hi All,

This is quite late, but I thought I would share it anyway in the hope that it helps someone. I'm using the Random Jump sampling approach that Leskovec and Faloutsos outline. Pure random walk sampling is similar but more complex, because the walk can get stuck in a sink or an isolated component. There are probably further optimizations to be made, but it ran acceptably well for my use. Thank you for the earlier code and optimizations.
```
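library(igraph)

# Random Jump (RJ) sample of ncount distinct nodes from network graph
randomJumpSample <- function(graph, ncount, teleport = 0.15) {
  # Without this check the loop below never ends when ncount > vcount(graph)
  stopifnot(ncount <= vcount(graph))

  node <- sample.int(vcount(graph), 1)
  selected <- rep(NA_integer_, ncount)
  selected[[1]] <- node
  i <- 2

  while (i <= ncount) {
    neigh <- as.integer(neighbors(graph, node))
    if (length(neigh) == 0 || runif(1) <= teleport) {
      # Dead end or teleport: jump to a uniformly random node
      node <- sample.int(vcount(graph), 1)
    } else {
      # Index via sample.int(): sample(neigh, 1) would draw from 1:neigh
      # whenever neigh holds a single id
      node <- neigh[sample.int(length(neigh), 1)]
    }
    # Record the node only if it has not been selected yet
    if (!(node %in% selected)) {
      selected[[i]] <- node
      i <- i + 1
    }
  }
  return(induced.subgraph(graph, selected))
}

# Example call; mygraph is assumed to be an igraph graph that is already
# loaded and has at least 50000 nodes
gSampled <- randomJumpSample(mygraph, 50000)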

Best wishes,
Scott

On 24 Oct 2013 Tamás Nepusz wrote:
```
```
Hi Thomas,

1) instead of length(degree(g)), just use vcount(g)

2) neighbors(g, node1) can be queried outside the inner while() loop and
then stored in a temporary variable because it won't change during the lifetime
of the inner loop

3) if you are sampling from the range 1:n, use sample.int() instead of sample()

4) instead of rbinom(1, 1, p), use runif(1)<p -- it is probably faster

I think this should make things faster -- let me know if it is still too slow.

--
T.

On 24 Oct 2013, at 15:10, Thomas <address@hidden> wrote:

```
```
I'm creating a sample of nodes according to the random walk procedure
described in Section 3.3.3 of:

http://www.stat.cmu.edu/~fienberg/Stat36-835/Leskovec-sampling-kdd06.pdf

The following R code samples at least 300 nodes (it might sample the same
node more than once), but it runs really slowly. Does anyone know why it might
be so slow? Is there a better way to do this?

Thank you,

Thomas

# Random Walk Sample of nodes from network g
# Read graph g in as UNDIRECTED

A <- sample(1:length(degree(g)), 1)
oput <- c()
oput <- c(oput, A)
flag <- FALSE
count <- 1

while (count <= 300) {
  node1 <- A
  while (flag == FALSE) {
    node2 <- sample(neighbors(g, node1), 1)
    oput <- c(oput, node2)
    count <- count + 1
    node1 <- node2
    if (rbinom(1, 1, 0.15) == 1) { flag = TRUE }
  } # end of while flag loop
} # end of while count loop
```
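
A note on the quoted random walk loop: `flag` is set to TRUE the first time the walk restarts and is never reset, so if `count` is still at or below 300 at that point the outer loop spins forever without doing any work, which may be the "slowness" being reported. The following is a minimal sketch of one possible revision, not the original poster's method. It applies the four suggestions from the reply where they fit, preallocates the output instead of growing it with `c()`, and avoids `sample(x, 1)` on a neighbor vector, which draws from `1:x` when `x` is a single number. The function name `randomWalkNodes`, its parameters, and the choice to pick a fresh start node at a dead end are assumptions made for illustration.

```r
library(igraph)

# Random walk with restart on graph g (illustrative sketch).
# Returns n_steps visited vertex ids; the same id may appear more than once.
randomWalkNodes <- function(g, n_steps = 300, restart = 0.15) {
  # A graph without edges would make the loop below run forever
  stopifnot(ecount(g) > 0)

  n <- vcount(g)                 # instead of length(degree(g))
  start <- sample.int(n, 1)      # sample.int() for the range 1:n
  visited <- integer(n_steps)    # preallocated output
  visited[1] <- start
  count <- 1
  node <- start

  while (count < n_steps) {
    neigh <- as.integer(neighbors(g, node))
    if (length(neigh) == 0) {
      # Dead end (assumption): choose a new start node and continue
      start <- sample.int(n, 1)
      node <- start
      next
    }
    # Index with sample.int() so a single neighbor is handled correctly
    node <- neigh[sample.int(length(neigh), 1)]
    count <- count + 1
    visited[count] <- node
    # With probability `restart`, fly back to the start node
    if (runif(1) < restart) node <- start
  }
  visited
}
```

Calling `unique()` on the result gives the distinct sampled nodes, which could then be passed to `induced.subgraph()` as in the Random Jump function earlier in the thread.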