igraph-help

## Re: [igraph] igraph R: fit_power_law

From: Tamas Nepusz
Subject: Re: [igraph] igraph R: fit_power_law
Date: Mon, 5 Aug 2019 16:32:12 +0200

Dear Sander,

1. The igraph documentation suggests that the bfgs function is used to estimate the power law alpha, but I think the C implementation relies on the Broyden-Fletcher-Goldfarb-Shanno optimization function of the lbfgs library instead. Is that correct?
This is the exact implementation of the BFGS optimization that we use in power law fitting:

https://github.com/ntamas/plfit/blob/master/src/lbfgs.c

As far as I know, this is a C port of the limited-memory variant of the Broyden-Fletcher-Goldfarb-Shanno method (L-BFGS), which was originally written in FORTRAN. The license notes in the source code might give you more clues.
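To make concrete what such an optimizer is asked to do, here is a minimal sketch in Python (standard library only) of a maximum likelihood fit of alpha for discrete data with a fixed lower cutoff. It is not the plfit code: the function names, the sample data and the search bounds are all made up for illustration, the Hurwitz zeta function is approximated by a partial sum plus an Euler-Maclaurin tail correction, and a simple golden-section search stands in for L-BFGS because the problem has a single parameter.

```python
import math


def hurwitz_zeta(s, q, terms=50):
    """Approximate sum_{k>=0} (k + q)^(-s) for s > 1 and q > 0.

    The first `terms` terms are summed directly; the remainder is
    estimated with an Euler-Maclaurin tail correction.
    """
    head = sum((k + q) ** -s for k in range(terms))
    a = terms + q
    tail = a ** (1 - s) / (s - 1) + 0.5 * a ** -s + s * a ** (-s - 1) / 12
    return head + tail


def neg_log_likelihood(alpha, data, xmin):
    """Negative log-likelihood of a discrete power law on x >= xmin."""
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x) for x in tail)
    return len(tail) * math.log(hurwitz_zeta(alpha, xmin)) + alpha * log_sum


def fit_alpha_discrete(data, xmin, lo=1.05, hi=8.0, tol=1e-6):
    """Minimize the negative log-likelihood over alpha in [lo, hi].

    Golden-section search is used here as a stand-in for L-BFGS;
    the bounds lo and hi are illustrative assumptions.
    """
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if neg_log_likelihood(c, data, xmin) < neg_log_likelihood(d, data, xmin):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


# Hypothetical degree sequence, invented for this example.
degrees = [1] * 60 + [2] * 18 + [3] * 9 + [4] * 5 + [5] * 3 + [7] * 2 + [10, 15, 23]
alpha_hat = fit_alpha_discrete(degrees, xmin=1)
print("estimated alpha:", round(alpha_hat, 3))
```

A one-dimensional search is enough here because the negative log-likelihood is convex in alpha; a general-purpose method such as L-BFGS does the same job using gradient information.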

1. The fit_power_law function relies on the MLE function of the stats4 package. I am curious why this was deprecated, given the availability of plfit and MLE parameters. Is this simply a memory issue?
I don't know; this is purely a matter of the R interface of igraph. The C core uses the L-BFGS method and my "plfit" library:

https://github.com/ntamas/plfit

The plfit library is an efficient implementation of the method published by Clauset, Shalizi and Newman:

Clauset A, Shalizi CR and Newman MEJ: Power-law distributions in empirical data. SIAM Review 51, 661-703 (2009).
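The core idea of that method can be sketched in a few lines of Python: for every candidate lower cutoff xmin, fit alpha by maximum likelihood on the data above the cutoff, measure the Kolmogorov-Smirnov distance between the data and the fitted model, and keep the cutoff with the smallest distance. The sketch below is an illustration only and not plfit's implementation. It assumes continuous data (so that alpha has a closed-form estimate), and the function names, the `min_tail` and `step` parameters and the synthetic data are all invented for the example.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Closed-form MLE of alpha for a continuous power law on x >= xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the sorted tail and the model CDF."""
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, model - i / n, (i + 1) / n - model)
    return d


def scan_xmin(data, min_tail=200, step=5):
    """Try every `step`-th data point as xmin and keep the one with the smallest KS distance.

    `min_tail` (hypothetical parameter) keeps at least that many points in the tail.
    """
    xs = sorted(data)
    best = None
    for start in range(0, len(xs) - min_tail, step):
        xmin = xs[start]
        tail = xs[start:]
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best


# Synthetic data: a uniform "body" on [1, 5) and a power-law tail from 5 with alpha = 2.5.
rng = random.Random(1)
body = [rng.uniform(1.0, 5.0) for _ in range(500)]
tail_part = [5.0 * (1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
xmin_hat, alpha_hat, ks_hat = scan_xmin(body + tail_part)
print("xmin:", round(xmin_hat, 2), "alpha:", round(alpha_hat, 2), "KS:", round(ks_hat, 4))
```

On data like this the scan should settle near the point where the power-law regime starts, although the selected cutoff fluctuates from sample to sample.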

1. How to interpret the p-value of the Kolmogorov-Smirnov test?
See the paper cited above for more details.

1. The igraph help file states: "Small p-values (less than 0.05) indicate that the test rejected the hypothesis that the original data could have been drawn from the fitted power-law distribution". The C implementation of the KS test in igraph uses the Hurwitz Zeta function. Shouldn't this mean that high p-values indicate a good model fit, as suggested by Clauset et al. (2009:678)?
Well, tests based on p-values are not really about whether a model is a "good fit" or a "bad fit"; a low p-value _roughly_ says that "it is very unlikely that the data could have been generated from the hypothesized distribution" (in our case, a power-law). A high p-value _roughly_ means that "the data may have come from the hypothesized distribution"; however, there could be alternative distributions that can describe the data just as well.

So, in a nutshell:

low p-value --> null hypothesis (power-law) rejected --> data is likely not a power-law
high p-value --> null hypothesis (power-law) _not_ rejected --> data could come from a power-law, or maybe from something else, we don't know, we just could not _exclude_ the power-law
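The decision rule above can be tried out on synthetic data. The Python sketch below estimates a p-value by a parametric bootstrap in the spirit of Clauset et al.: fit the model, draw synthetic samples from the fitted power law, refit each of them, and report the fraction whose KS distance is at least as large as the observed one. This is only an illustration under simplifying assumptions (continuous data, xmin fixed in advance) and is not necessarily how igraph computes its own p-value; all function names and parameters are hypothetical, and the 0.05 threshold is the one quoted from the help file.

```python
import math
import random


def fit_and_ks(data, xmin):
    """Return (alpha, KS distance) for a continuous power law fitted on x >= xmin."""
    xs = sorted(data)
    n = len(xs)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, model - i / n, (i + 1) / n - model)
    return alpha, d


def bootstrap_p_value(data, xmin, replicates=200, seed=0):
    """Fraction of synthetic power-law samples that fit at least as badly as the data."""
    rng = random.Random(seed)
    alpha, d_obs = fit_and_ks(data, xmin)
    n = len(data)
    exceed = 0
    for _ in range(replicates):
        # Inverse-transform sampling from the fitted power law.
        synthetic = [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for k in range(n)]
        d_syn = fit_and_ks(synthetic, xmin)[1]
        if d_syn >= d_obs:
            exceed += 1
    return exceed / replicates


def interpret(p, level=0.05):
    """Translate a p-value into the two outcomes described in the text."""
    if p < level:
        return "rejected: the data is unlikely to come from a power law"
    return "not rejected: a power law cannot be excluded"


rng = random.Random(42)
power_law_data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]  # alpha = 2.5
exponential_data = [1.0 + rng.expovariate(1.0) for _ in range(1000)]  # not a power law

p_power = bootstrap_p_value(power_law_data, xmin=1.0)
p_exp = bootstrap_p_value(exponential_data, xmin=1.0)
print("power-law sample:  ", p_power, "->", interpret(p_power))
print("exponential sample:", p_exp, "->", interpret(p_exp))
```

Note that even for the genuine power-law sample, a "not rejected" outcome says nothing about whether some other distribution would describe the data equally well.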

All the best,
T.