Re: uniq - check specific fields
From: Pádraig Brady
Subject: Re: uniq - check specific fields
Date: Thu, 07 Feb 2013 17:34:25 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1
On 02/07/2013 05:13 PM, Assaf Gordon wrote:
> Hello,
> Attached is a proof-of-concept patch to add "--check-fields=N" to uniq,
> allowing uniq'ing by specific fields.
> (Trying a different approach at promoting csplit-by-field [1] :) ).
> It works just like 'check-chars' but on fields, and if not used, it does not
> affect the program flow.
> ===
> # input file, every whole line is unique
> $ cat input.txt
> A 1 z
> A 1 y
> A 2 x
> B 2 w
> B 3 w
> C 3 w
> C 4 w
> # regular uniq
> $ uniq -c input.txt
> 1 A 1 z
> 1 A 1 y
> 1 A 2 x
> 1 B 2 w
> 1 B 3 w
> 1 C 3 w
> 1 C 4 w
> # Stop after 1 field
> $ uniq -c --check-fields 1 input.txt
> 3 A 1 z
> 2 B 2 w
> 2 C 3 w
> # Stop after 2 fields
> $ uniq -c --check-fields 2 input.txt
> 2 A 1 z
> 1 A 2 x
> 1 B 2 w
> 1 B 3 w
> 1 C 3 w
> 1 C 4 w
> # Skip the first field and check 1 field (effectively, uniq on field 2)
> $ uniq -c --skip-fields 1 --check-fields 1 input.txt
> 2 A 1 z
> 2 A 2 x
> 2 B 3 w
> 1 C 4 w
> # "--field" is a convenience shortcut for skip&check fields
> $ uniq -c --field 2 input.txt
> 2 A 1 z
> 2 A 2 x
> 2 B 3 w
> 1 C 4 w
> $ uniq -c --field 3 input.txt
> 1 A 1 z
> 1 A 1 y
> 1 A 2 x
> 4 B 2 w
> ===
> What do you think?
Useful, but only a partial solution as discussed here:
http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=5832
I.e., essentially this patch has been rejected before,
and being able to specify --key to uniq, just like sort,
would be much preferred.
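
As a rough illustration of the semantics a key-based comparison could give,
here is a minimal Python sketch. It is not the coreutils implementation and
not the patch under discussion. The function name uniq_count_by_key and its
parameters are hypothetical, fields are split on runs of whitespace (real
uniq treats leading blanks as part of a field), and only a plain
"start field, end field" range is modelled, none of sort's other key options.

```python
from itertools import groupby

# Sample data copied from the input.txt shown in the quoted message.
INPUT = ["A 1 z", "A 1 y", "A 2 x", "B 2 w", "B 3 w", "C 3 w", "C 4 w"]


def uniq_count_by_key(lines, start_field, end_field=None):
    """Collapse adjacent lines whose fields start_field..end_field match.

    Fields are 1-based and the range is inclusive. When end_field is None
    the key runs to the end of the line. Returns (count, first_line) pairs,
    mirroring the shape of `uniq -c` output.
    """
    def key(line):
        fields = line.split()
        return tuple(fields[start_field - 1:end_field])

    result = []
    # groupby only merges *adjacent* equal keys, which is also how uniq works.
    for _, group in groupby(lines, key=key):
        members = list(group)
        result.append((len(members), members[0]))
    return result


if __name__ == "__main__":
    # Compare on field 2 only, like the "--field 2" example in the thread.
    for count, line in uniq_count_by_key(INPUT, 2, 2):
        print(f"{count:7d} {line}")
```

With this sketch, a key of fields 1 to 1 corresponds to the "--check-fields 1"
example, and a key of fields 2 to 2 corresponds to "--skip-fields 1
--check-fields 1". As with uniq itself, the input has to be grouped (for
example sorted on the same key) for all matching lines to be merged.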
To avoid redundant coding it's always good to
touch base with the list first on ideas,
or search the bug database.
cheers,
Pádraig