igraph-help

## [igraph] Possible error in documentation of KS.p for R/igraph fit_power_law (or in my understanding)

From: Dan Suthers
Subject: [igraph] Possible error in documentation of KS.p for R/igraph fit_power_law (or in my understanding)
Date: Mon, 30 Sep 2019 01:38:29 -1000

Dear igraph community,

Gábor Csárdi recommended that I post this question. It concerns inconsistency between documentation of fit_power_law and results.

The documentation for fit_power_law {igraph} says (I added underlining):

The ‘plfit’ implementation also uses the maximum likelihood principle to determine alpha for a given xmin; when xmin is not given in advance, the algorithm will attempt to find its optimal value for which the p-value of a Kolmogorov-Smirnov test between the fitted distribution and the original sample is the largest.

and

KS.p    Numeric scalar, the p-value of the Kolmogorov-Smirnov test. Small p-values (less than 0.05) indicate that the test rejected the hypothesis that the original data could have been drawn from the fitted power-law distribution.

This suggests that large KS.p means greater likelihood that the distribution could have come from the power-law distribution.
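
To make that reading concrete, here is a minimal base-R sketch that runs `ks.test` against a continuous power-law CDF with known parameters. It is not the plfit algorithm. The names `ppl`, `x_pl`, and `x_flat`, and the values `alpha = 2.5` and `xmin = 1`, are illustrative assumptions, and the continuous form is used only to keep the example short (the degree data below are discrete).

```r
set.seed(42)
alpha <- 2.5   # assumed exponent, for illustration only
xmin  <- 1     # assumed lower cutoff

# CDF of a continuous power law on [xmin, Inf): F(q) = 1 - (q / xmin)^(1 - alpha)
ppl <- function(q) ifelse(q < xmin, 0, 1 - (q / xmin)^(1 - alpha))

# Inverse-transform sampling from that power law
x_pl <- xmin * runif(2000)^(-1 / (alpha - 1))
# A sample that is clearly not a power law
x_flat <- runif(2000, min = 1, max = 100)

ks_pl   <- ks.test(x_pl, ppl)
ks_flat <- ks.test(x_flat, ppl)

# Under the documented reading, the flat sample should get the smaller p-value
c(power_law = ks_pl$p.value, flat = ks_flat$p.value)
```

Because the parameters here are fixed rather than estimated from the same sample, these p-values are not directly comparable to `KS.p`; the sketch only shows the direction of the test.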

The interpretation of p values can be confusing, especially when sometimes bigger values mean a better fit and other times smaller values mean a better fit (as is the case in the poweRlaw package). To help students understand p value interpretation, I demonstrate by testing extreme cases where the outcome should be clear. However, unlike poweRlaw, the results I get for fit_power_law do not match the documentation:

In a complete graph, each of N vertices has degree N-1; definitely not a power-law. Yet:
```
> complete_deg <- replicate(10000, 10000-1)
> fit_power_law(complete_deg, implementation="plfit")
$continuous
[1] FALSE

$alpha
[1] 1

$xmin
[1] 1

$logLik
[1] 3.550132e-313

$KS.stat
[1] 1.797693e+308

$KS.p
[1] 1
```

If the explanation of KS.p is correct, this suggests a strong fit to a power law, but a degree sequence in which every value is identical clearly should not fit one. And look at KS.stat! Is the documentation reversed, or is this result correct, in that one can fit a flat power law (alpha=1) where p(k) = c/k?
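
One way to make the "look at KS.stat" check explicit is sketched below in base R. The helper `ks_stat_is_suspect` is hypothetical (not an igraph function), and the list `complete_fit` is just the complete-graph output above copied by hand. The last line compares the printed KS.stat with `.Machine$double.xmax`; the agreement may mean the value is a sentinel or overflow rather than a measured distance, but that is a guess from the number itself, not something confirmed from the igraph source.

```r
# Hypothetical helper, not part of igraph: flags fit results whose KS.stat
# cannot be a genuine Kolmogorov-Smirnov distance.
ks_stat_is_suspect <- function(fit) {
  # A KS distance is a maximum difference between two CDFs, so it lies in [0, 1]
  !is.finite(fit$KS.stat) || fit$KS.stat < 0 || fit$KS.stat > 1
}

# Values copied by hand from the complete-graph output above
complete_fit <- list(alpha = 1, xmin = 1, logLik = 3.550132e-313,
                     KS.stat = 1.797693e+308, KS.p = 1)

ks_stat_is_suspect(complete_fit)

# The printed KS.stat agrees with the largest finite double to the digits shown
abs(complete_fit$KS.stat / .Machine$double.xmax - 1) < 1e-6
```

A result flagged this way probably should not be interpreted through KS.p at all.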

Another example is a random graph:
```
> gnm <- sample_gnm(1000, 50000)
> fit_power_law(degree(gnm), implementation="plfit")
$continuous
[1] FALSE

$alpha
[1] 30.73034

$xmin
[1] 115

$logLik
[1] -166.9513

$KS.stat
[1] 0.02861747

$KS.p
[1] 1

> min(degree(gnm))
[1] 69
> mean(degree(gnm))
[1] 100
> max(degree(gnm))
[1] 132
```

This is easier to explain: xmin shows it is fitting to the far right end of the distribution, and alpha is well within the random-like regime. So, I get it that interpreting p without looking at the other parameters is risky.
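
A quick way to see how little of the sample such a fit covers is to compute the share of observations at or above xmin. The sketch below is base R only. The helper `tail_share` is hypothetical, and `deg` is a binomial stand-in for `degree(gnm)` with mean about 100, which is an assumption made so the example runs without igraph.

```r
# Hypothetical helper (not an igraph function): share of the sample at or above xmin
tail_share <- function(x, xmin) mean(x >= xmin)

set.seed(1)
# Stand-in for degree(gnm): binomial degrees with mean about 100 (an assumption)
deg <- rbinom(1000, size = 999, prob = 100 / 999)

# Fraction of vertices the fit above xmin = 115 would actually describe
tail_share(deg, 115)
```

With the real graph, the same check would be `tail_share(degree(gnm), 115)`.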

However, looking at the other extreme, let's generate a distribution expected to follow the power law:
```
> sfp <- sample_fitness_pl(1000, 50000, 2.2)
> fit_power_law(degree(sfp), implementation="plfit")
$continuous
[1] FALSE

$alpha
[1] 2.515488

$xmin
[1] 49

$logLik
[1] -4372.05

$KS.stat
[1] 0.05708254

$KS.p
[1] 0.007706345

> min(degree(sfp))
[1] 28
> max(degree(sfp))
[1] 411
```

Here, the alpha is as expected for a scale free network, xmin includes most of the distribution, and KS distance and KS.p are small, but the documentation says that small values mean we REJECT the hypothesis that it follows a power law.

Is the documentation in error? Should it instead say that we reject the hypothesis that the data come from a random distribution (as is common in the social sciences' use of p-values)?

Any light you can shed on this is much appreciated. -- Dan

```
--
Dan Suthers

Dept. of Information and Computer Sciences
University of Hawaii at Manoa
1680 East West Road, POST 309, Honolulu, HI 96822
(808) 956-3890 office
Personal: http://www2.hawaii.edu/~suthers/
Lab: http://lilt.ics.hawaii.edu/
Department: http://www.ics.hawaii.edu/
```