Re: Parameters for gsl_cdf_fdist_Q for p-value to compare nested models

help-gsl

From:	Peter Johansson
Subject:	Re: Parameters for gsl_cdf_fdist_Q for p-value to compare nested models
Date:	Sun, 25 Dec 2022 08:56:50 +1000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

Hi Stephan,

You might have better luck asking your question on a stats list, but here are my quick thoughts.


On 22/12/22 19:14, Stephan Lorenzen wrote:

Dear list,
I want to compare how well different nested models fit my data, but I am not sure how to choose the parameters, and the more I google the more confused I am. Since tests on my real data gave way too small p-values, I decided to do tests on random data. The p-value is supposed to tell me how much better the full model fits the data, i.e. how much signal for the additional parameters is hidden in the data. Since I use random data (there is no signal at all), I would expect a uniform distribution between 0 and 1 for the p-values if I compare full vs nested models.
I do 10000 runs with 20 random normally distributed X and Y values (using gsl_ran_gaussian), equivalent to 20 data points. I do two fits:
M1: y =         a1x + a0  -> params1=1 (a0 does not count), df1=20-1-1=18
M2: y = a2x^2 + a1x + a0  -> params2=2 (a0 does not count), df2=20-2-1=17

I then calculate both errors (sum of squared residuals) and calculate F:

F = ((err1-err2) / (df1-df2)) / (err2/df2)

and calculate the p-value using

p=gsl_cdf_fdist_Q(F, df1-df2, df2)
I would expect a uniform distribution between 0 and 1, but the distribution is skewed and shows way more small values than big ones (see attached file), stating that the full model is "better" in most cases. Obviously, there is something wrong, so I have a couple of questions:
- is it correct that constant values (a0) do not count as parameters?

No, that seems wrong. If you have two data points and a model y = a1*x + a0, the model can always be perfectly fit to the data, i.e., zero df.
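
The two-point case can be checked directly. The following is only a minimal sketch: `line_fit_sse` is a hypothetical helper (not a GSL function) that does an ordinary least-squares fit of y = a1*x + a0 in closed form and returns the sum of squared residuals. It assumes the x values are not all equal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GSL: least-squares fit of
   y = a1*x + a0, returning the sum of squared residuals.
   Assumes n >= 2 and that the x values are not all identical. */
double line_fit_sse(const double *x, const double *y, size_t n,
                    double *a0, double *a1)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    double denom, sse = 0.0;
    size_t i;

    assert(n >= 2);
    for (i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    denom = (double)n * sxx - sx * sx;   /* zero only if all x are equal */
    assert(denom != 0.0);

    *a1 = ((double)n * sxy - sx * sy) / denom;   /* slope */
    *a0 = (sy - *a1 * sx) / (double)n;           /* intercept */

    for (i = 0; i < n; i++) {
        double r = y[i] - (*a1 * x[i] + *a0);
        sse += r * r;
    }
    return sse;
}
```

With any two points that have distinct x values, the returned error is zero up to rounding, so both a0 and a1 use up a degree of freedom: n - 2 = 0.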

- is my calculation of the degrees of freedom correct?
- did I calculate F correctly?

I usually calculate it as the ratio of the squared error for one model over the squared error of the other. I'd start with the log-likelihoods and calculate the log-likelihood ratio. Possibly it can be translated to your formula, but yours looks odd.

- did I insert the right parameters in gsl_cdf_fdist_Q?
- is my assumption that I expect a uniform distribution of p-values correct?



That's correct (given that the null hypothesis is true).


Cheers,

Peter
