Re: [Bug-datamash] Feature request: ignore NaN


From: Assaf Gordon
Subject: Re: [Bug-datamash] Feature request: ignore NaN
Date: Fri, 21 Nov 2014 18:54:41 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

Hello Brandon,

On 11/21/2014 12:21 PM, Brandon Invergo wrote:
<...>
> Currently, if there's an NaN in a single row, the entire mean is NaN.
> If a cell is empty, datamash returns an error (invalid numeric input).
> It would be nice (well, necessary in my case) to be able to tell
> datamash to ignore those values.

I like that.

I've added an option "--narm" to ignore those values.
The code is not yet in the official repository, as I still need to add more
tests and handle edge cases.

But if you want to help me test it, I'll be happy to get feedback.
It's available here:
   git clone -b narm http://git.housegordon.org/cgit/datamash.git
Or here:
   http://files.housegordon.org/datamash/datamash-1.0.6.52-77cc.tar.gz

The new option allows you to do:

    $ printf "%s\n" 1 2 NaN 3 | ./datamash sum 1
    nan
    $ printf "%s\n" 1 2 NaN 3 | ./datamash --narm sum 1
    6

    $ printf "%s\n" 1 2 NA 3 | ./datamash sum 1
    ./datamash: invalid numeric value in line 3 field 1: 'NA'
    $ printf "%s\n" 1 2 NA 3 | ./datamash --narm sum 1
    6

    $ printf "%s\n" 1 2 NA 3 | ./datamash unique 1
    1,2,3,NA
    $ printf "%s\n" 1 2 NA 3 | ./datamash --narm unique 1
    1,2,3
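
For readers without the test build, the skip-missing-values behavior shown above can be approximated with plain awk. This is only a rough sketch of the semantics, not datamash's implementation: the function name `sum_skip_na` is made up for illustration, and it assumes a single column where "NA" and any capitalization of "nan" count as missing.

```shell
# Hypothetical helper (not part of datamash): sum column 1 of stdin,
# skipping NA/NaN values in the spirit of the --narm examples above.
sum_skip_na() {
    awk '
        # Assumption: "NA" and any capitalization of "nan" mean "missing".
        $1 == "NA" || tolower($1) == "nan" { next }
        # Everything else is added to the running total.
        { total += $1 }
        # total + 0 forces a numeric 0 when every value was skipped.
        END { print total + 0 }
    '
}

printf "%s\n" 1 2 NaN 3 | sum_skip_na
printf "%s\n" 1 2 NA 3 | sum_skip_na
```

Both pipelines print 6, matching the `--narm sum 1` output above.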

I'm not sure about empty cells.
Do you think empty cells should be treated as NA, or always trigger an error?
There could be some tricky edge-cases there...
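
To make the two choices concrete, here is a small awk sketch of both policies for a single-column input. It is purely illustrative: the function name `sum_with_empty_policy` and its `skip`/`error` argument are hypothetical, the error text is invented, and none of this reflects how datamash does or will behave.

```shell
# Hypothetical helper (not part of datamash): sum column 1 of stdin.
# Usage: sum_with_empty_policy skip|error
#   skip  - treat an empty line like a missing value and ignore it
#   error - report the empty line on stderr and exit with status 1
sum_with_empty_policy() {
    awk -v policy="$1" '
        $0 == "" {
            if (policy == "skip") next
            printf "invalid numeric value in line %d: empty\n", NR > "/dev/stderr"
            failed = 1
            exit 1
        }
        { total += $1 }
        # Print the sum only if no empty line was rejected.
        END { if (!failed) print total + 0 }
    '
}

printf "%s\n" 1 2 "" 3 | sum_with_empty_policy skip
printf "%s\n" 1 2 "" 3 | sum_with_empty_policy error || echo "rejected"
```

The first call prints 6; the second prints a diagnostic on stderr followed by "rejected". One of the tricky cases this sketch sidesteps is a multi-column file, where an empty field changes how the remaining fields are split.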

> Other than that, I'm finding datamash to be super useful.

Thanks for the kind words!

- Assaf