Re: Difference between NaN and NA?


From: Jaroslav Hajek
Subject: Re: Difference between NaN and NA?
Date: Fri, 9 Apr 2010 07:54:23 +0200

On Fri, Apr 9, 2010 at 7:44 AM, Søren Hauberg <address@hidden> wrote:
> Fri, 09 04 2010 at 07:20 +0200, Jaroslav Hajek wrote:
>> On Fri, Apr 9, 2010 at 7:03 AM, Søren Hauberg <address@hidden> wrote:
>> > Thu, 08 04 2010 at 13:31 +0200, Jaroslav Hajek wrote:
>> >> But why make it built-in? I think that even with the current limited
>> >> OOP capabilities, it is possible to build a class that maintains the
>> >> NA's as a separate mask, making even the elementary operations
>> >> NA-aware (i.e. so that x + NA is NA and never NaN). Overloading the
>> >> statistics functions like mean etc. would also be quite simple.
>> >
>> > The problem with the current OOP programming is that you can't really
>> > inherit from base classes. If we created a 'NA_Friendly_Matrix' (NAFM?)
>> > class, but didn't (as an example) provide an implementation of the 'sin'
>> > function, the user would experience an error when calling 'sin'. The
>> > user would be forced to manually convert to an ordinary matrix whenever
>> > calling a function that does not have a NAFM implementation (the OOP
>> > stuff doesn't seem to provide a mechanism for automatically converting
>> > to a built-in type). This would IMHO just be a pain to use.
>> >
>> > So, if we were to have such a class, I fear it would have to be
>> > implemented in C++ and be part of Octave core. But perhaps I'm just
>> > missing the obvious here...
>> >
>>
>> No, you're not, I agree this is a downside. Still, I think it would be
>> better than what we currently have.
>> Besides, it would be rather easy to overload all known mappers for the
>> class, using some sort of auto-generated m-files.
>
> For me, the most important functions for dealing with NA's are the basic
> statistical functions, such as 'mean', 'var', 'cov' and similar. We have
> implementations for dealing with NaN's for some of these functions in
> the 'statistics' package ('nanmean', ...), but we don't have all. Quite
> some time ago, you and I collaborated on creating a 'nancov' function.
> The problem that kept this function from going anywhere (besides me
> forgetting all about it...) was that there are several algorithms for
> computing the covariance of a data set while ignoring NaN's and it is
> not clear which is better. The only proper solution to this issue is to
> let the user choose which algorithm to use, and then we are back to the
> solution the R people chose.
>
> So, I guess my basic question is: is it even possible to create a class
> that deals with NaN's (or NA's) entirely behind the user's back?
>

Perhaps not. In any case I don't care, because I generally don't want
stuff operating behind my back.
Besides, my idea was to make the class so that NA's and NaN's can be
distinguished. Mathematically, NA + NaN should be an NA, not a NaN. If
you're OK with using NaN as both the missing and invalid value
indicator, you're just better off using the statistics functions or
the NaN package directly.
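
To make the mask idea concrete, here is a minimal sketch. It is written
in Python rather than as an Octave class, purely for illustration, and
the name NAVector and its methods are hypothetical, not an existing
Octave or package API. The point is only that the NA flag lives in a
separate mask, so NA + NaN comes out as NA while NaN + value stays NaN.

```python
import math


class NAVector:
    """Hypothetical vector that tracks missing values (NA) in a separate mask."""

    def __init__(self, values, na_mask=None):
        self.values = [float(v) for v in values]
        # na[i] is True when element i is missing; the stored value is then ignored.
        self.na = list(na_mask) if na_mask is not None else [False] * len(self.values)
        if len(self.na) != len(self.values):
            raise ValueError("values and na_mask must have the same length")

    def __add__(self, other):
        if not isinstance(other, NAVector):
            return NotImplemented
        if len(self.values) != len(other.values):
            raise ValueError("length mismatch")
        values = [a + b for a, b in zip(self.values, other.values)]
        # An element of the sum is NA if either operand is NA, whatever the values are.
        na = [p or q for p, q in zip(self.na, other.na)]
        return NAVector(values, na)

    def kinds(self):
        """Label each element as 'NA' (missing), 'NaN' (invalid) or 'value'."""
        labels = []
        for v, missing in zip(self.values, self.na):
            if missing:
                labels.append("NA")
            elif math.isnan(v):
                labels.append("NaN")
            else:
                labels.append("value")
        return labels

    def mean(self):
        """Mean over the non-missing elements; a NaN that is not NA still propagates."""
        present = [v for v, missing in zip(self.values, self.na) if not missing]
        return sum(present) / len(present) if present else math.nan


x = NAVector([1.0, 0.0, math.nan, 0.0], [False, True, False, True])
y = NAVector([2.0, 5.0, 1.0, math.nan])
total = x + y
z = NAVector([1.0, 0.0, 3.0], [False, True, False])
```

Here total.kinds() labels the last element as NA even though y holds a
NaN there, and z.mean() averages only the two present elements.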

> Søren
>
> P.S. Would it make sense to also provide 'namean' and similar functions
> in the 'statistics' package?
>

I don't know. Personally, I don't use NA at all, so I'm perfectly happy with nanmean.

> P.P.S. Should I just commit the 'nancov' function we wrote long ago to
> the 'statistics' package?

I think we wrote something more complicated, that handled both NaN and
NA at the same time in "the correct way", didn't we?
I suppose statistics should get just nancov, which simply skips all
NaN's, to make it consistent with the rest.
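
As a rough illustration of why the choice of algorithm matters, here is
a sketch of two plausible ways to skip NaN's when computing the
covariance of two variables. It is in Python, for illustration only;
the function names are hypothetical and neither is necessarily the
code we wrote back then. The first uses only the complete pairs
throughout; the second centres each variable on the mean of all of its
own non-NaN entries.

```python
import math


def _complete_pairs(x, y):
    """Return the (x, y) pairs in which neither entry is NaN."""
    return [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]


def nancov_pairs(x, y):
    """Sample covariance with means taken over the complete pairs only."""
    pairs = _complete_pairs(x, y)
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    return sum((a - mx) * (b - my) for a, b in pairs) / (n - 1)


def nancov_marginal_means(x, y):
    """Variant: means taken over every non-NaN entry of each variable separately."""
    pairs = _complete_pairs(x, y)
    n = len(pairs)
    if n < 2:
        return math.nan
    xs = [a for a in x if not math.isnan(a)]
    ys = [b for b in y if not math.isnan(b)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in pairs) / (n - 1)


x = [1.0, math.nan, 3.0, 4.0, 5.0]
y = [2.0, 4.0, math.nan, 7.0, 9.0]
cov_pairs = nancov_pairs(x, y)
cov_marginal = nancov_marginal_means(x, y)
```

On this small data set the two variants give different answers (7.5
versus 7.5625), which is the ambiguity Søren describes above.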


-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


