Re: Skipping NA's in statistical functions?
From: Jaroslav Hajek
Subject: Re: Skipping NA's in statistical functions?
Date: Sun, 8 Mar 2009 12:20:20 +0100

On Sun, Mar 8, 2009 at 11:12 AM, Søren Hauberg <address@hidden> wrote:
> Hi All
>
> Recently, there has been some discussion about how to treat NaN's in
> 'mean'. The thread somehow wandered off to the Octave-Forge list.
>
> The core of the problem is that people interpret NaN differently. Some
> people think that NaN is a way to show that something went wrong in a
> calculation, whereas others believe NaN's indicate missing values. I
> think it's fair to say that both opinions are true, and if we were
> Matlab users the discussion would end here.
>
> However, Octave supports the special case of NaN called NA (Not
> Available), so we can actually cope with both interpretations of NaN.
> At the moment we, however, do not do this. We have NA, but we don't
> really support it in any functions. The question is: should we?
>
> For the basic statistical functions ('mean', 'std', ...) this boils down
> to making 'sum' and 'sumsq' skip NA's. For more complicated functions,
> e.g. 'cov', some more work is required. In the thread at the
> Octave-Forge list, Jaroslav and myself proposed an implementation of
> 'cov' that optionally allows NA skipping. The implementation is
> compatible with Matlab, and follows the same strategy as R.
>
> The point is that it seems like we should be able to support NA's in
> more functions if we wish. The question is: do we want to?
>
> Søren
>
To add my 2 cents: It seems to me, after weighing all pros and cons,
that the best approach would indeed be to ignore NAs (i.e. do nothing
special with them) by default. For instance, in "cov", there are
actually two methods of skipping NAs, and it is not clear which one is
preferable. Besides, the performance penalty for the "pairs" method is
really high (3 matrix products instead of 1, plus a number of lower-level
ops).
This is also what R does, i.e. skipping NAs is optional. I don't think
Octave needs to be "more statistical" than R is.
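
To make the "two methods" concrete, here is a rough sketch in plain Python (not Octave, and not the implementation proposed in the Octave-Forge thread). It assumes the two methods are the ones R exposes through the `use` argument of its `cov`: dropping every row that contains an NA, versus using, for each pair of columns, all rows where both entries are present. The names `NA`, `cov_complete` and `cov_pairs` are made up for this illustration, and `None` merely stands in for Octave's NA. The loops are written for clarity and do not reproduce the matrix-product formulation mentioned above.

```python
import math

NA = None  # hypothetical stand-in for Octave's NA (a missing value)


def _cov(xs, ys):
    """Sample covariance of two equally long sequences (n - 1 normalization)."""
    n = len(xs)
    if n < 2:
        return math.nan  # not enough observations left
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def cov_complete(rows):
    """Drop every row containing an NA, then compute the covariance matrix."""
    kept = [r for r in rows if all(v is not NA for v in r)]
    p = len(rows[0])
    cols = [[r[j] for r in kept] for j in range(p)]
    return [[_cov(cols[i], cols[j]) for j in range(p)] for i in range(p)]


def cov_pairs(rows):
    """For each pair of columns, use only rows where both entries are present."""
    p = len(rows[0])
    out = [[math.nan] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            both = [(r[i], r[j]) for r in rows
                    if r[i] is not NA and r[j] is not NA]
            out[i][j] = _cov([a for a, _ in both], [b for _, b in both])
    return out


# Four observations of three variables, with two missing entries.
rows = [
    [1.0, 2.0, 3.0],
    [2.0, NA, 1.0],
    [3.0, 6.0, NA],
    [4.0, 8.0, 5.0],
]

print(cov_complete(rows)[0][1])  # uses rows 1 and 4 only
print(cov_pairs(rows)[0][1])     # uses rows 1, 3 and 4
```

On this data the two methods disagree: the covariance of the first two columns is 9.0 when incomplete rows are dropped and about 4.67 with the pairwise approach. That is the ambiguity I mean, so neither looks like an obvious default.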
regards
--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz