bug-coreutils

bug#7068: Feature request: uniq --field-separator="SEP" --consider-field


From: Stefan Nowak
Subject: bug#7068: Feature request: uniq --field-separator="SEP" --consider-fields="a, b, c" --ignore-fields="x, y, z"
Date: Sun, 19 Sep 2010 00:44:17 +0200

Hello developers!


CURRENT SYNTAX:

http://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html

--skip-fields=n Skip n fields on each line before checking for uniqueness. Use a null string for comparison if a line has fewer than n fields. Fields are sequences of non-space non-tab characters that are separated from each other by at least one space or tab.


--- FEATURE REQUEST #1 ---

--field-separator="SEP", -F

EXAMPLE:

Scenario: Imagine a filesystem listing. Because of its hierarchical nature, all entries are unique. I want to ignore the file path prefix (skip the leading field(s) with -f, using the separator defined by -F), consider only the basename, and see how many instances of it exist and where (print all duplicate instances with -D).

Input:
folder a<TAB>file 1
folder b<TAB>file 1
folder b<TAB>file 2
folder c<TAB>file 3

Commandline:
cat sample.txt | guniq -D -F "\t" -f 1

Output:
folder a<TAB>file 1
folder b<TAB>file 1

BENEFIT: If you can define the separator character (e.g. TAB), then your column data may contain any character other than SEP. For example, a column could then contain SPACE characters.
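
For illustration only, the requested behaviour can be approximated today with awk. This is a minimal sketch, not a proposed implementation: the function name uniq_dups_skip_tab_fields is made up for this example, and the sketch assumes that, like uniq -D, only adjacent lines should be grouped.

```shell
# Hypothetical helper: approximates the proposed `uniq -D -F "\t" -f N`.
# $1 = number of leading TAB-separated fields to skip.
uniq_dups_skip_tab_fields() {
    awk -F '\t' -v skip="$1" '
        # Print the buffered group only if it holds more than one line.
        function flush(   i) {
            if (n > 1) for (i = 1; i <= n; i++) print group[i]
            n = 0
        }
        {
            # Build the comparison key from the fields after the skipped ones.
            key = ""
            for (i = skip + 1; i <= NF; i++) key = key "\t" $i
            # A different key ends the current run of adjacent duplicates.
            if (NR > 1 && key != prev) flush()
            group[++n] = $0
            prev = key
        }
        END { flush() }
    '
}

# Sample input from the example above.
printf 'folder a\tfile 1\nfolder b\tfile 1\nfolder b\tfile 2\nfolder c\tfile 3\n' |
    uniq_dups_skip_tab_fields 1
```

With the sample input this prints the two "file 1" lines, matching the Output shown in the example.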


--- FEATURE SUGGESTION #2 ---

--consider-fields=a[,b,c,...] Build the comparison string of a line from these field(s).

--ignore-fields=x[,y,z,...] Build the comparison string of a line by excluding these field(s).


EXAMPLE:

Input:
folder a<TAB>file 1<TAB>suffixA
folder b<TAB>file 1<TAB>suffixB
folder b<TAB>file 2<TAB>suffixA
folder c<TAB>file 3<TAB>suffixA

Commandline:
cat sample.txt | guniq -D -F "\t" --consider-fields="2"
Equivalent to:
cat sample.txt | guniq -D -F "\t" --ignore-fields="1,3"

Output:
folder a<TAB>file 1<TAB>suffixA
folder b<TAB>file 1<TAB>suffixB
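
To make the intended semantics of the two options concrete, here is a rough awk sketch. The helper uniq_dups_by_fields and its "consider"/"ignore" mode argument are invented for this illustration, and it again assumes that only adjacent lines are compared, as with uniq -D.

```shell
# Hypothetical helper approximating the proposed options.
#   $1 = "consider" or "ignore"
#   $2 = comma-separated list of TAB-separated field numbers
uniq_dups_by_fields() {
    awk -F '\t' -v mode="$1" -v list="$2" '
        # Print the buffered group only if it holds more than one line.
        function flush(   i) {
            if (n > 1) for (i = 1; i <= n; i++) print group[i]
            n = 0
        }
        BEGIN {
            # Remember which field numbers were listed ("+ 0" tolerates "1, 3").
            cnt = split(list, nums, ",")
            for (i = 1; i <= cnt; i++) listed[nums[i] + 0] = 1
        }
        {
            key = ""
            for (i = 1; i <= NF; i++) {
                use = (i in listed)
                if (mode == "ignore") use = !use
                if (use) key = key "\t" $i
            }
            if (NR > 1 && key != prev) flush()
            group[++n] = $0
            prev = key
        }
        END { flush() }
    '
}

# Both calls select the same lines for three-column input:
#   uniq_dups_by_fields consider "2" < sample.txt
#   uniq_dups_by_fields ignore "1,3" < sample.txt
```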

WORKAROUND MEANWHILE: Insert a RegEx find/replace step into the pipe before uniq that moves all the data to be ignored in the comparison to the front of each line, and then skip it with --skip-fields.
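
One possible shape of that workaround for the three-column example, using awk for the rearranging step instead of a pure RegEx. This is a sketch under assumptions: the function name is made up, the byte \001 must never occur in the input, and uniq must be GNU uniq (for -D). Because uniq splits fields on blanks, the spaces inside the moved columns have to be protected and restored afterwards.

```shell
# Workaround sketch: compare only column 2 of a three-column
# TAB-separated file with stock GNU uniq.
# Assumption: the byte \001 never occurs in the input.
uniq_dups_on_column2() {
    # 1. Move the ignored columns (1 and 3) to the front and protect their spaces.
    awk -F '\t' -v OFS='\t' '{
        gsub(/ /, "\001", $1); gsub(/ /, "\001", $3)
        print $1, $3, $2
    }' |
    # 2. Skip the two leading fields and print all adjacent duplicates.
    uniq -D -f 2 |
    # 3. Restore the original column order and the protected spaces.
    awk -F '\t' -v OFS='\t' '{
        gsub("\001", " ", $1); gsub("\001", " ", $2)
        print $1, $3, $2
    }'
}

# Usage: uniq_dups_on_column2 < sample.txt
```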

BENEFIT: Of course it would be much more convenient to work with the data as-is, and have the options --consider-fields and --ignore-fields.



Regards, Stefan Nowak




