bug#7068: Feature request: uniq --field-separator="SEP" --consider-field

bug-coreutils

From:	Stefan Nowak
Subject:	bug#7068: Feature request: uniq --field-separator="SEP" --consider-fields="a, b, c" --ignore-fields="x, y, z"
Date:	Sun, 19 Sep 2010 00:44:17 +0200

Hello developers!


CURRENT SYNTAX:

http://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html

--skip-fields=n Skip n fields on each line before checking for uniqueness. Use a null string for comparison if a line has fewer than n fields. Fields are sequences of non-space non-tab characters that are separated from each other by at least one space or tab.
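
To illustrate the current behaviour quoted above, here is a small sketch using only the existing -f (--skip-fields) option. The sample data is made up for illustration; the second command uses the TAB-separated "folder" lines from the examples in this request.

```shell
# Skip the first whitespace-delimited field and compare only the rest.
# "a 1" and "b 1" compare equal after skipping, so only the first is kept.
printf 'a 1\nb 1\nc 2\n' | uniq -f 1

# A column that itself contains a space counts as two fields, so -f 1
# skips only "folder"; the remainders (" a<TAB>file 1" vs " b<TAB>file 1")
# still differ and both lines are treated as unique.
printf 'folder a\tfile 1\nfolder b\tfile 1\n' | uniq -f 1
```

The second command is the case the requests below are about: with a fixed space-or-tab definition of a field, a column containing spaces cannot be skipped as a whole.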



--- FEATURE REQUEST #1 ---

--field-separator="SEP", -F

EXAMPLE:

Scenario: Imagine a filesystem listing. Because of its hierarchical nature, all entries are unique. Now I want to ignore the file path prefix (skip the field(s) with -f, using the separator given by -F), consider only the basename, and see how many instances of it exist, and where (all duplicate instances, via -D).


Input:
folder a<TAB>file 1
folder b<TAB>file 1
folder b<TAB>file 2
folder c<TAB>file 3

Commandline:
cat sample.txt | guniq -D -F "\t" -f 1

Output:
folder a<TAB>file 1
folder b<TAB>file 1

BENEFIT: If you can define the separator character (i.e. TAB), then you have the freedom to have all other characters besides SEP within your column data, i.e. your column could then contain SPACE characters.
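
Until such an option exists, the requested behaviour can be approximated with awk. The following is only a sketch under assumptions: the helper name uniq_dups_skip is hypothetical, the separator is hard-coded to TAB, and, like uniq, it only compares adjacent lines.

```shell
# Hypothetical helper: approximates the proposed `uniq -D -F "\t" -f N`.
# Usage: uniq_dups_skip N < tab-separated-input
uniq_dups_skip() {
  awk -F '\t' -v skip="$1" '
    # Print the buffered group only if it holds more than one line (-D).
    function emit_group() {
      if (count > 1) printf "%s", group
      group = ""; count = 0
    }
    {
      # Comparison key: every field after the first `skip` fields.
      # A line with too few fields yields an empty key.
      key = ""
      for (i = skip + 1; i <= NF; i++) key = key FS $i
      # Only adjacent lines are compared, so a new key closes the group.
      if (NR > 1 && key != prev) emit_group()
      group = group $0 "\n"; count++; prev = key
    }
    END { emit_group() }
  '
}
```

With the sample input above saved as sample.txt, `uniq_dups_skip 1 < sample.txt` should print the two "file 1" lines, as in the desired output.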



--- FEATURE SUGGESTION #2 ---

--consider-fields=a[,b,c,...] Build the comparison string of a line from these field(s).
--ignore-fields=x[,y,z,...] Build the comparison string of a line by excluding these field(s).



EXAMPLE:

Input:
folder a<TAB>file 1<TAB>suffixA
folder b<TAB>file 1<TAB>suffixB
folder b<TAB>file 2<TAB>suffixA
folder c<TAB>file 3<TAB>suffixA

Commandline:
cat sample.txt | guniq -D -F "\t" --consider-fields="2"
Equivalent to:
cat sample.txt | guniq -D -F "\t" --ignore-fields="1,3"

Output:
folder a<TAB>file 1<TAB>suffixA
folder b<TAB>file 1<TAB>suffixB

WORKAROUND MEANWHILE: Pre-insert a RegEx find/replace process in the pipe before uniq, which brings all the comparison-ignored data to the front, and then use --skip-fields.

BENEFIT: Of course it would be much more convenient to work with the data as-is, and have the functions --consider-fields and --ignore-fields.
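
As a rough illustration of what --consider-fields could do, here is an awk sketch that leaves the data as-is. The helper name uniq_dups_consider is hypothetical, TAB is assumed as the separator, and only adjacent lines are compared, as uniq does.

```shell
# Hypothetical helper: approximates the proposed
# `uniq -D -F "\t" --consider-fields="a,b,c"`.
# Usage: uniq_dups_consider "2" < tab-separated-input
uniq_dups_consider() {
  awk -F '\t' -v fields="$1" '
    # Print the buffered group only if it holds more than one line (-D).
    function emit_group() {
      if (count > 1) printf "%s", group
      group = ""; count = 0
    }
    # Split the comma-separated field list once, before reading input.
    BEGIN { n = split(fields, want, ",") }
    {
      # Comparison key: only the requested fields, in the order given.
      key = ""
      for (i = 1; i <= n; i++) key = key FS $(want[i])
      # Only adjacent lines are compared, so a new key closes the group.
      if (NR > 1 && key != prev) emit_group()
      group = group $0 "\n"; count++; prev = key
    }
    END { emit_group() }
  '
}
```

With the three-column sample above saved as sample.txt, `uniq_dups_consider "2" < sample.txt` should print the two lines whose second field is "file 1". An --ignore-fields variant could build the key from all fields not in the list instead.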




Regards, Stefan Nowak
