help-octave
[Top][All Lists]

## Re: Study about accuracy of statistical software, incl. Octave

 From: David Thompson Subject: Re: Study about accuracy of statistical software, incl. Octave Date: Tue, 24 Mar 2009 12:57:34 -0700

```On Tue, 2009-03-24 at 13:28 -0600, Jaroslav Hajek wrote:
> On Tue, Mar 24, 2009 at 7:46 PM, Marco Caliari <address@hidden> wrote:
> > On Tue, 24 Mar 2009, Jaroslav Hajek wrote:
> >> On Tue, Mar 24, 2009 at 7:03 PM, Marco Caliari <address@hidden> wrote:
> >>>> And Marco Caliari writes:
> >>>>> The other deficiencies are much harder to fix. I will give a look.
> >>>> std() could be fixed relatively easily by calling the BLAS's
> >>>> routines (SNRM2, DNRM2, SCNRM2, DZNRM2) rather than relying on
> >>>> sqrt (sumsq (...)).  The half drop in precision is a typical
> >>>> failure mode for implementations that don't scale.  I suspect
> >>>> that fixing std() may fix corrcoef and help anova.
> >>> I already tried with a .m implementation of dnrm2, without any
> >>> improvement.
> >>> For interested people, the sample vector is
> >>> v=[10000000.2,repmat([10000000.1,10000000.3],1,500)];
> >>> whose exact mean is 10000000.2 and exact standard deviation 0.1.
> >>>
> >> What do you mean by "exact mean" here? I don't think any of the three
> >> numbers are exactly representable in IEEE floating-point. It would be
> >> better to choose numbers that have an exact representation, such as
> >> 1e7 + 0.25, 1e7 + 0.5, 1.7 + 0.75 or such. Otherwise, you can't avoid
> >> introducing errors just by writing these numbers.
> > I agree with you, Jaroslav. On the other hand, in the paper it is written
> > that R can compute the standard deviation with 15 correct digits.
> > Actually, Octave can compute the mean with 14 correct digits and with 15
> > correct digits with the patch I sent. The standard deviation with 8 correct
> > digits.
> But what is the correct answer? Just writing those numbers into Octave
> introduces an error ~ 3e-17. I *don't* expect the "correct" answer is
> 1e7 + 0.2. You can't expect Octave to guess your actual numbers.

Part of the problem *may* be that you are not computing the mean and
subtracting it before squaring -- instead relying on the fact that
variance = sum(x.**2) - sum(x).**2. (I haven't verified this is what
Octave does, but if it does, that would be a big source of error.) A
colleague of mine, Philippe Pébay, has written a short report with
update formulas for computing the variance in an accurate way without
having to first compute the mean. See
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
for more detail and a link to the report. There is also an open-source
implementation in VTK (www.vtk.org) of the on-line update formulas.

David

```