Adding dot product operation to GNU Datamash
From: Tim Rice
Subject: Adding dot product operation to GNU Datamash
Date: Sat, 6 Aug 2022 01:30:22 +0000
Hey all,
I've been thinking about this for a while: it would be nice to have an
operation that multiplies the corresponding records of two columns and returns
the sum of those products, i.e. the dot product (or scalar product) of the two
columns.
At the moment, you can get the same result by combining GNU Datamash with
GNU Awk:
```
$ awk '{print $1 * $2}' /tmp/data.txt | datamash sum 1
```
Or you could do it all in gawk if you want:
```
$ awk '{sum += $1 * $2} END{print sum}' /tmp/data.txt
```
But I think doing it all in GNU Datamash allows a more intuitive command:
```
$ datamash -W dotprod 1:2 < /tmp/data.txt
```
A proposed implementation is attached. Please let me know if you see any
problems with it.
If this looks good, then it should be straightforward to also add a weighted
mean. That would be the dot product divided by the sum of whichever column holds
the weights. (But which column should be treated as the weights? Maybe that
needs an extra option?)
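For comparison, here is a rough awk sketch of that weighted mean. It assumes column 2 holds the weights, which is exactly the open question above, and the `weighted_mean` helper name is made up for illustration; it is not part of Datamash or of the attached patch.

```shell
# Weighted mean of column 1, using column 2 as the weights.
# Assumption: weights live in column 2; swap $1 and $2 for the other choice.
weighted_mean() {
  # dot accumulates the dot product, w accumulates the sum of the weights.
  # Nothing is printed if the weights sum to zero (or the input is empty).
  awk '{ dot += $1 * $2; w += $2 } END { if (w != 0) print dot / w }' "$@"
}

# Reads from stdin when no file is given, or pass a file such as /tmp/data.txt.
printf '1 2\n3 4\n5 6\n' | weighted_mean
```

A built-in operation would need some way to express the same choice of weight column, for example through the order of the `1:2` pair.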
~ Tim
Attachment: dotprod.diff (text document)