[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Bug-gsl] bugs in gsl_cdf_hypergeometric_P/gsl_cdf_hypergeometric_Q
From: |
Stijn van Dongen |
Subject: |
[Bug-gsl] bugs in gsl_cdf_hypergeometric_P/gsl_cdf_hypergeometric_Q |
Date: |
Sat, 9 Feb 2008 13:57:55 +0000 |
User-agent: |
Mutt/1.5.9i |
Hi,
I believe there is a bug in the functions gsl_cdf_hypergeometric_P and
gsl_cdf_hypergeometric_Q in gsl-1.10 when computing the midpoint. This can
result in severely wrong probabilities (as a result of computing the
non-optimal tail), including negative probabilities. A demonstration program
is included below.
Explanation: The relevant statement is
double midpoint = (int) (t * n1 / (n1 + n2));
The integer multiplication t * n1 may overflow. With certain
input parameters it does; this causes the wrong tail to be computed,
leading to accumulation of rounding errors, and erroneous results
including negative probabilities. Running the example program yields
midpoint should be: 3899.52, but it is: 9
midpoint should be: 3899.52, but it is: 9
hyper [ > 3511 1.000000006 ] [ <= 3511 -6.46390208e-09 ]
The midpoint is then compared with the variable k which has value 3511,
leading to a choice of the wrong tail to compute.
(the first two lines are from a patched version of cdf/hypergeometric.c that
prints out the midpoint). I believe that the correct computation of midpoint
should be:
double midpoint = (1.0 * t * n1) / (n1 + n2);
A (trivial) diff to this effect is attached.
On a side note, is there any knowledge/documentation available on the range
of parameters for which these functions are supposed to give good/reasonable
precision?
regards,
Stijn
Example program (run on a system where integers are 4 bytes):
----------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <gsl/gsl_cdf.h>
int main
( int argc
, char* argv[]
)
{ int k = 3511
; int n1 = 76200
; int bg = 13250474
; int t = 678090
; double f = gsl_cdf_hypergeometric_Q(k, n1, bg-n1, t)
; double g = gsl_cdf_hypergeometric_P(k, n1, bg-n1, t)
; fprintf(stdout, "hyper [ > %d %.10g ] [ <= %d %.10g ]\n", k, f, k, g)
; return 0
; }
----------------------------------------
--
Stijn van Dongen >8< -o) O< forename pronunciation: [Stan]
Wellcome Trust Sanger Institute, /\\ Tel: +44-(0)1223-494785
Hinxton, Cambridge, CB10 1SA, UK _\_/ http://www.sanger.ac.uk/Users/svd/
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
hypergeometric.c.diff
Description: Text document
- [Bug-gsl] bugs in gsl_cdf_hypergeometric_P/gsl_cdf_hypergeometric_Q,
Stijn van Dongen <=