[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Adding range support to groupby

From: Shawn Wagner
Subject: Re: Adding range support to groupby
Date: Sat, 6 Aug 2022 06:39:04 -0700

On Sat, Aug 6, 2022 at 3:36 AM Erik Auerswald <auerswal@unix-ag.uni-kl.de> wrote:
Hi Shawn and Tim,

On 05.08.22 23:06, Tim Rice wrote:
> On Fri, Aug 05, 2022 at 02:42:12AM -0700, Shawn Wagner wrote:
>> Attached patch allows for usage like `groupby 1-4` instead of `groupby
>> 1,2,3,4`. Thoughts before I commit it?
> Cool idea. I had a couple of very minor nits like some of the column
> alignment in the tests was off. It does make it easier to eyeball tests
> when everything lines up just so. Also, c3 in the crosstab tests
> shouldn't be changed, but instead there should be a new test there.
> Then I thought that we should also test having multiple columns instead
> of just two.
> I've attached a patch which addresses those thoughts.

I think you just added tests in that diff.  I did not look at the
details, but I like the addition of those tests.  :-)

> Let's wait for Erik to also weigh in before committing it. He has a good
> eye for detail, and might notice additional things that will make it
> better.

I have read the diff and had a cursory look at existing tests (with
"git grep ... tests/").  Skimming the current documentation, I did
not find a documented restriction regarding using field ranges for
"groupby" or "crosstab" (which would just need to be removed IMHO).

I like the functionality, and the diff looks OK.  :-)

Please take the comments below into consideration as suggestions:

Since this affects "crosstab" aka "ct" too, that should be mentioned
in NEWS as well.

crosstab and groupby use the same function for getting columns; that's the only reason it's affected. Given it only works with two columns I didn't see a point to mentioning it in the documentation, as doing so might make people think it's supposed to work with more.
(Does this change affect further operations?  I did not test, neither
did I audit the code.)

I don't believe so.
I concur with Tim with keeping the 'c3' test and adding a new one
based on 'c3' that only replaces "1,2" with "1-2".

Then there should be a test that verifies that "crosstab 1-3" results
in an error since crosstab requires exactly two fields.  There are
tests with just one and four fields in tests/datamash-crosstab.pl,
I think it could be added near those tests.

Sounds good.
I would like the new "invalid field range" tests for "groupby" to
be duplicated with "crosstab".

> By the way, testing the range out on crosstab helped me notice another
> minor bug. If the output has wide columns (eg when using unique),
> crosstab's column alignment ends up all out of whack. I've added a note
> to myself to fix that sometime. It might be difficult, though.

I did not look at this, but AFAIUI GNU Datamash does not align
columns, it just separates them with one TAB character.

A different program, e.g., column(1) (in Debian GNU/Linux from
package "bsdmainutils"), can be used to visually align output

Is there really an issue here?


reply via email to

[Prev in Thread] Current Thread [Next in Thread]