Re: uniq with sort-like "--key" support
From: Pádraig Brady
Subject: Re: uniq with sort-like "--key" support
Date: Wed, 13 Feb 2013 17:20:19 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1
On 02/13/2013 04:45 PM, Assaf Gordon wrote:
On 02/12/2013 01:31 AM, Assaf Gordon wrote:
I'd like to offer a proof-of-concept patch for adding sort-like "--key" support
for the 'uniq' program, as discussed here:
http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html
and in several other threads.
One more update with two changes:
1. Re-arranged "src/uniq_sort_common.h" so that its functions are in the same order as in
"src/sort.c", which makes "diff src/uniq_sort_common.h src/sort.c" much easier to read
(and shows that the functions were not modified at all).
2. When an explicit field separator is specified together with "-c", the counts are
reported without space padding or right alignment, and are followed by the separator.
This might be controversial, but I always needed that :) (used to wrap every "uniq -c"
with "sed 's/^ *// ; s/ /\t/'" )
==
## Existing:
$ printf "a\tx\na\tx\nb\ty\n" | uniq -c
      2 a	x
      1 b	y
## New:
$ printf "a\tx\na\tx\nb\ty\n" | ./src/uniq -t $'\t' -c
2	a	x
1	b	y
==
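For comparison, the sed wrapper mentioned above could be packaged as a small shell function. This is only a sketch: the name count_tsv is hypothetical and not part of the patch, and it assumes the usual "uniq -c" output of a space-padded count followed by one space.

```shell
# Hypothetical helper (not part of the patch): run "uniq -c", strip the
# left padding of the count, and turn the space after the count into a tab.
count_tsv() {
    tab=$(printf '\t')   # literal tab, so no sed "\t" extension is needed
    uniq -c | sed "s/^ *//; s/ /$tab/"
}

# Usage: same input as the example above.
printf 'a\tx\na\tx\nb\ty\n' | count_tsv
```

With the patched uniq, "-t $'\t' -c" is meant to make this wrapper unnecessary.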
Also, I'm wondering what exactly is the effect of the following statement
( from http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00217.html ):
"This point was addressed in IEEE Std 1003.1-2001/Cor 1-2002, item
XCU/TC1/D6/40, and it's why the current Posix spec says that the
behavior of uniq depends on LC_COLLATE."
I also wonder whether sort's keycompare functions fulfill this requirement, and whether
the current 'uniq' tests cover this situation.
If not, my changes are not backwards-compatible.
Sort's keycompare handles that.
The above only related to a perf improvement: doing a plain
byte compare rather than converting before comparison.
We may still be able to do something more efficient along
these lines when considering multibyte.
A related possibility for the non-multibyte case: I think
the -k option order doesn't matter to uniq,
so there might be perf/cache benefits to always processing
the keys in numerical rather than specified order.
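To illustrate the key-order point, here is a rough awk stand-in for key-based uniq. It is a sketch under stated assumptions, not the patch's code: the function name uniq_by_fields is hypothetical, fields are tab-separated, whole fields are compared as plain strings (no LC_COLLATE handling), and field values are assumed not to contain awk's SUBSEP character.

```shell
# Hypothetical stand-in for the proposed key support: print a line unless
# all of the listed tab-separated fields equal those of the previous line.
uniq_by_fields() {
    awk -F '\t' -v fields="$*" '
        BEGIN { n = split(fields, f, " ") }
        {
            # Build a composite key from the selected fields.
            key = ""
            for (i = 1; i <= n; i++) key = key $(f[i]) SUBSEP
            if (NR == 1 || key != prev) print
            prev = key
        }'
}

# Both invocations print the same lines: two lines are duplicates only if
# every key matches, so the order in which the keys are listed is irrelevant.
printf 'a\t1\tx\na\t2\tx\na\t3\ty\nb\t4\ty\n' | uniq_by_fields 1 3
printf 'a\t1\tx\na\t2\tx\na\t3\ty\nb\t4\ty\n' | uniq_by_fields 3 1
```

Since only equality is tested, never ordering, an implementation would be free to visit the keys in whatever order is cheapest.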
cheers,
Pádraig.