octave-maintainers

Re: NaN-toolbox much faster now


From: Alois Schlögl
Subject: Re: NaN-toolbox much faster now
Date: Tue, 17 Mar 2009 09:25:12 +0100
User-agent: Thunderbird 2.0.0.19 (X11/20090105)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jaroslav Hajek wrote:
> On Mon, Mar 16, 2009 at 1:13 PM, Alois Schlögl <address@hidden> wrote:
> 
>> Thanks for confirming the test. You asked for "opinions about skipping
>> NaN's, and the Octave NA's support". Here are some thoughts on that issue.
>>
>> Concerning the question whether NaN's and NA should be handled separately.
>> - Just because R has NA's is not necessarily a good reason why Octave
>> needs them, too. Possible advantages need to be explained.
>>
>> - In statistical and probabilistic applications, skipping both NaN and
>> NA is a reasonable approach - there is no need to distinguish NaN from NA.
>>
> 
> In fact this seems to be what R actually does. It seems that in R, the
> classification of NA/NaN is exactly the converse of Octave's, i.e.
> isna(NaN) is true, while isnan(NA) is not, and that when you tell a
> function to "skip NAs" (na.rm = TRUE), it indeed skips both NaNs and
> NAs. So the question I'm raising here is: is Octave's support of NAs
> actually a good idea, given that it is, in a good sense, actually
> incompatible with R? Of course, "fixing" isnan to work like in R would
> on the other hand break compatibility with C and Matlab and
> everything.
> 
> 
>> - In case NaN's are used for error handling, the question is how NA
>> improves the error handling. The main advantage would be that fewer
>> NaN's need to be handled, but NA's come with the additional costs of
>> added complexity and possible confusion (causing more programming
>> errors, slower development, as well as performance loss). Therefore, if
>> NA's should get special support, the benefits of this concept should be
>> made clear.
>>
>> - The benefits of the NaN-toolbox over the traditional approach are:
>> (i) functions do the right thing more often;
>> (ii) applications are less likely to fail due to NaN-related issues;
>> (iii) it's more likely that users unaware of the NaN issue get it right
>> in the first place;
>> (iv) there is no need to think about whether nanmean or mean is the
>> right function;
>> (v) of course, always using nanmean(), etc. would also do, but it's
>> nicer to write only mean(), etc.
>> Basically, the idea is to make the use of these functions easier. The
>> use of NA in addition to NaN's is detrimental to this aim. So the
>> advantage of using NA's is not clear.
>>
> 
> These are points that we've discussed previously. I mostly agree
> with them, unless functions are used in a non-statistical sense -
> and I can only imagine that for "mean". After all, they're classified
> as "statistics", so one could agree that "mean" should be understood
> to be the statistical mean.

That's how I see it, too.

> 
> Performance is another consideration. It seems that the penalty for
> removing NaNs ranges up to some 30%, which may be significant for some
> uses. So maybe the functions should provide an option to turn off the
> checking for NaNs, just for the case when data are guaranteed to be
> NaN-free.
> 
> cheers
> 


OK, in order to address that request, I've added the function
flag_implicit_skip_nan.m. If you call flag_implicit_skip_nan(0), NaN's
are no longer skipped, and the traditional behavior is reproduced.
This affects all functions that are based on sumskipnan.m.
Calling flag_implicit_skip_nan(1) turns the NaN-skipping behavior on again.
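
For anyone wondering how such a global switch can work, here is a minimal sketch. It is not the NaN-toolbox code: demo_skip_nan_flag and demo_mean are hypothetical names made up for this illustration, and the flag is simply assumed to live in a persistent variable that the averaging function consults on each call.

```octave
1;  % make this a script file so the functions below can be defined inline

function old = demo_skip_nan_flag(new)
  % Hypothetical stand-in for a global on/off switch (assumption, not the
  % toolbox implementation). Returns the current setting; with an argument,
  % it also stores the new setting for later calls.
  persistent flag;
  if isempty(flag)
    flag = true;            % default: skip NaN's
  end
  old = flag;
  if nargin > 0
    flag = (new ~= 0);
  end
end

function m = demo_mean(x, dim)
  % Mean along dimension dim that honours demo_skip_nan_flag.
  if demo_skip_nan_flag()
    ok = ~isnan(x);         % isnan is also true for NA in Octave
    x(~ok) = 0;             % missing values contribute nothing to the sum
    m = sum(x, dim) ./ sum(ok, dim);
  else
    m = sum(x, dim) ./ size(x, dim);   % traditional: NaN propagates
  end
end

x = [1 NaN 3; 4 5 6];
demo_skip_nan_flag(1);
m_skip = demo_mean(x, 1)    % second column averages only the valid 5
demo_skip_nan_flag(0);
m_plain = demo_mean(x, 1)   % second column is NaN
```

The point of the design is that callers keep writing the same function name and only the switch changes the behavior, which is what the session below exercises with the real functions.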

octave:169> x=randn(1e4);x(4,4)=NaN;
octave:170> flag_implicit_skip_nan(1); %% default: skip NaN's
octave:171> tic; t=cputime; mean(x,1); [toc,cputime()-t]
ans =

   0.26556   0.26402

octave:172> tic; t=cputime; mean(x,2); [toc,cputime()-t]
ans =

   0.36296   0.36402

octave:173> flag_implicit_skip_nan(0);  %% do not skip NaN's
octave:174> tic; t=cputime; mean(x,1); [toc,cputime()-t]
ans =

   0.20163   0.20001

octave:175> tic; t=cputime; mean(x,2); [toc,cputime()-t]
ans =

   0.24151   0.24001

octave:176> ver
----------------------------------------------------------------------
GNU Octave Version 3.1.54+
GNU Octave License: GNU General Public License
Operating System: Linux 2.6.27-11-generic #1 SMP Thu Jan 29 19:28:32 UTC 2009 x86_64
----------------------------------------------------------------------
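
To put the timings above in relation to the "up to some 30%" estimate, the relative cost of skipping NaN's can be worked out from the toc values. This is only arithmetic on the single run shown in this message, so it should be read as a rough indication rather than a benchmark; the variable names are made up for the example.

```octave
% Wall-clock times (toc) from the session above: [mean(x,1), mean(x,2)]
t_skip   = [0.26556 0.36296];   % flag_implicit_skip_nan(1)
t_noskip = [0.20163 0.24151];   % flag_implicit_skip_nan(0)

% Extra time spent when NaN's are skipped, in percent of the no-skip time
overhead = 100 * (t_skip ./ t_noskip - 1)
```

For this one run that gives roughly 32% along dimension 1 and roughly 50% along dimension 2.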


Cheers,
  Alois
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkm/XmQACgkQzSlbmAlvEIg1bACgs7GAvtqPXUd2Xt/AJuwNo6GZ
X9AAoJxy6UFWN3m6cgFroHPdqa2Y+J5k
=GSVx
-----END PGP SIGNATURE-----

