From: Assaf Gordon
Subject: Re: [PATCH]: uniq: add "--group" option
Date: Thu, 21 Feb 2013 10:42:23 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120510 Icedove/10.0.4
Hello Pádraig,
Pádraig Brady wrote, On 02/20/2013 08:47 PM:
> On 02/20/2013 06:44 PM, Assaf Gordon wrote:
>> Hello,
>>
>> Attached is a suggestion for "--group" option in uniq, as discussed here:
>> http://lists.gnu.org/archive/html/coreutils/2011-03/msg00000.html
>> http://lists.gnu.org/archive/html/coreutils/2012-03/msg00052.html
>>
>> The patch adds two parameters:
>> --group=[method]       separate each unique line (whether duplicated or not)
>>                        with a marker.
>>                        method={none,separate(default),prepend,append,both}
>> --group-separator=SEP  with --group, separates group using SEP
>>                        (default: empty line)
>>
>
> --group-sep is probably overkill.
> I'd just use \n or \0 if -z specified.
>
OK.
> As for separation methods I'd just go with what we have for
> --all-repeated (but remove 'none' which wouldn't be useful with --group),
> as we've never had requests for anything else. so:
> --group={prepend, separate(default)}
>
I'd like to have at least "append" or "both", for the added convenience of
downstream analysis.
It's obviously a "nice-to-have" and not "must-have" feature, and can be
implemented in other ways, but knowing that there will always be a terminating
marker *after* a group (even the last group) makes downstream processing code
simpler.
Typical example:
$ cat INPUT | uniq --group=append | \
    awk '$0!="" { ## item in the group, collect it
         }
         $0=="" { ## end of group, do something
         }'
Without the final group marker, any downstream code will require two points of
"group processing": when a marker is found, and at EOF.
Something like:
$ cat INPUT | uniq --group=separate | \
    awk '$0!="" { ## item in the group, collect it
         }
         $0=="" { ## end of group, do something
         }
         END { ## end of last group, do something, duplicated code
         }'
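To make the difference concrete, here is a minimal runnable sketch. The helpers grouped_append and grouped_separate are hypothetical stand-ins: they print what the proposed --group=append and --group=separate are assumed to emit for the input a,a,a,b,c,c (empty line as marker) and do not call uniq. Counting the items is only an illustrative "do something".

```shell
# Hypothetical stand-ins for the proposed uniq output (assumption, not real uniq calls).
grouped_append()   { printf 'a\na\na\n\nb\n\nc\nc\n\n'; }   # marker after every group
grouped_separate() { printf 'a\na\na\n\nb\n\nc\nc\n'; }     # marker only between groups

# One point of group processing: report a group when its marker arrives.
summarize_on_marker() {
  awk '
    $0 != "" { item = $0; n++ }          # item in the group: collect it
    $0 == "" { print item, n; n = 0 }    # end of group: report and reset
  '
}

# Two points of group processing: the marker rule plus a duplicate in END.
summarize_with_end() {
  awk '
    $0 != "" { item = $0; n++ }
    $0 == "" { print item, n; n = 0 }
    END      { if (n > 0) print item, n }   # last group has no marker
  '
}

grouped_append   | summarize_on_marker   # reports all three groups
grouped_separate | summarize_on_marker   # silently drops the last group (c)
grouped_separate | summarize_with_end    # reports all three, with duplicated code
```

With a trailing marker the first script is complete as written; without one, the same script loses the final group unless the END rule is added.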
The reasoning for "both" is similar: it ensures I can put any special
initialization code in the group-marker case, and don't need to duplicate it
in a separate 'BEGIN{}' clause. (Of course, this doesn't have to be awk; it can
be perl/python/ruby/whatever does the downstream processing.)
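A small sketch of the "both" case, under the assumption that "both" means one marker before the first group, one between groups, and one after the last group. The helper grouped_both is a hypothetical stand-in that prints such output for the input a,a,a,b,c,c; it does not call uniq.

```shell
# Hypothetical stand-in for the assumed output of the proposed --group=both.
grouped_both() { printf '\na\na\na\n\nb\n\nc\nc\n\n'; }

# The marker rule both closes the previous group and initializes the next one,
# so no separate BEGIN or END rule is needed.
report_groups() {
  awk '
    $0 == "" {
      if (seen) print "group", id, "has", n, "line(s)"   # close previous group
      id++; n = 0; seen = 0                              # per-group initialization
    }
    $0 != "" { n++; seen = 1 }                           # item in the group
  '
}

grouped_both | report_groups
```

All per-group setup and teardown lives in the single marker rule, which is the convenience argued for above.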
I realize it's not a "make-or-break" feature - but if we're trying to make text
processing easier, I believe "append/both" makes it even easier.
> So on to operation...
>
>> And it behaves "as expected":
>> ===
>> $ printf "a\na\na\nb\nc\nc\n" | ./src/uniq --group-sep="--" --group=separate
>
> The above isn't that useful and could be done with sed.
>
I assume you're specifically referring to the "group-sep" part - then OK.
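For reference, a sketch of the sed alternative to a custom separator. The helper grouped_separate is a hypothetical stand-in that prints what --group=separate is assumed to emit (empty line between groups); mark_groups is an illustrative name, not part of the patch.

```shell
# Hypothetical stand-in for the assumed output of the proposed --group=separate.
grouped_separate() { printf 'a\na\na\n\nb\n\nc\nc\n'; }

# Replace each empty marker line with "--".
mark_groups() { sed 's/^$/--/'; }

grouped_separate | mark_groups
```

One caveat of this approach: if the input itself contains empty lines, they are rewritten as well, since sed cannot tell them apart from the markers.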
> Supporting -u or -d with --group wouldn't be useful either really.
> It's probably most consistent to just disallow those combinations.
>
Just to be clear on the reasoning: because with "-u" and "-d", each *line* is
implicitly a separate group, there's no apparent utility for an end-of-group
marker.
I guess it's true from a technical POV - but again, for downstream analysis
convenience it's nice to have a fixed end-of-group marker.
I could use the same downstream script (which expects end-of-group markers)
with uniq, whether I used "-d" or "-u" or nothing at all.
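A sketch of that idea, with loud assumptions: "sed G" (which appends an empty line after every input line) is used here only as a stand-in for what "-d" or "-u" combined with an end-of-group marker might print; the patch under discussion may behave differently. The names sample_input and handle_groups are illustrative.

```shell
sample_input() { printf 'a\na\na\nb\nc\nc\n'; }

# The same downstream script in every case: act only when a marker arrives.
handle_groups() {
  awk '
    $0 != "" { last = $0 }               # item in the group: remember it
    $0 == "" { print "group:", last }    # end of group: do something
  '
}

# sed G emulates an end-of-group marker after each output line (assumption).
sample_input | uniq -d | sed G | handle_groups   # groups a and c
sample_input | uniq -u | sed G | handle_groups   # group b
```

The downstream script is unchanged between the two invocations, which is the point being made.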
What do you think?
-gordon