Re: [Bug-datamash] Feature request: ignore NaN
From: Assaf Gordon
Subject: Re: [Bug-datamash] Feature request: ignore NaN
Date: Fri, 21 Nov 2014 18:54:41 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
Hello Brandon,
On 11/21/2014 12:21 PM, Brandon Invergo wrote:
> <...>
> Currently, if there's an NaN in a single row, the entire mean is NaN.
> If a cell is empty, datamash returns an error (invalid numeric input).
> It would be nice (well, necessary in my case) to be able to tell
> datamash to ignore those values.
I like that.
I've added an option "--narm" to ignore those values.
The code is not yet in the official repository, as I need to add more tests and
handle edge-cases.
But if you want to help me test it, I'll be happy to get feedback.
It's available here:
git clone -b narm http://git.housegordon.org/cgit/datamash.git
Or here:
http://files.housegordon.org/datamash/datamash-1.0.6.52-77cc.tar.gz
The new option allows you to do:
$ printf "%s\n" 1 2 NaN 3 | ./datamash sum 1
nan
$ printf "%s\n" 1 2 NaN 3 | ./datamash --narm sum 1
6
$ printf "%s\n" 1 2 NA 3 | ./datamash sum 1
./datamash: invalid numeric value in line 3 field 1: 'NA'
$ printf "%s\n" 1 2 NA 3 | ./datamash --narm sum 1
6
$ printf "%s\n" 1 2 NA 3 | ./datamash unique 1
1,2,3,NA
$ printf "%s\n" 1 2 NA 3 | ./datamash --narm unique 1
1,2,3
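For datamash builds that predate this option, a rough stand-in for `--narm sum` can be sketched in plain awk. This is an illustration only, not how datamash implements the option. The function name `narm_sum` is hypothetical. The sketch assumes whitespace-separated input (awk's default, whereas datamash splits fields on TAB by default), and it treats only cells equal to "NA" or "NaN" (case-insensitively) as missing. Any other non-numeric cell is rejected with a message modeled on the error shown above.

```shell
# Hypothetical helper, NOT part of datamash: sum column $1 of stdin,
# skipping NA/NaN cells, roughly like 'datamash --narm sum N'.
# Assumes whitespace-separated fields (awk default), not TAB as in datamash.
narm_sum() {
  awk -v col="$1" '
    # Skip missing values (exact "na"/"nan" after lowercasing).
    tolower($col) == "na" || tolower($col) == "nan" { next }
    # Reject anything that does not look like a plain decimal number.
    $col !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/ {
      printf "invalid numeric value in line %d field %d: \047%s\047\n", NR, col, $col > "/dev/stderr"
      bad = 1
      exit 1
    }
    # Accumulate the valid numeric cells.
    { s += $col }
    # Print the total only if no invalid cell was seen.
    END { if (!bad) print s + 0 }
  '
}
```

With this helper, `printf "%s\n" 1 2 NaN 3 | narm_sum 1` prints 6, while a cell such as `foo` makes it exit with status 1.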
I'm not sure about empty cells.
Do you think empty cells should be treated as NA, or always trigger an error?
There could be some tricky edge-cases there...
> Other than that, I'm finding datamash to be super useful.
Thanks for the kind words!
- Assaf