Problem with definition of a field in 'uniq'
From: Paul E Condon
Subject: Problem with definition of a field in 'uniq'
Date: Mon, 11 Sep 2006 17:17:30 -0600
User-agent: Mutt/1.5.9i
I am reporting a behavior of uniq that, IMHO, is a bug.
I am using test files containing the following lines:
tsttmp1:
2/dl1/f01 lnk2/f01 Benvenue house, Kat in blue dress
on back porch
2/dl1/f02 lnk2/f02 Palm Springs, CA ????
2/dl1/f03 lnk2/f03 Amerivox company picnic, Palo Alto,
CA
2/dl1/f03a lnk2/f03a Amerivox company picnic, Palo Alto,
CA
2/dl1/f04 lnk2/f04 Europe but where?
2/dl1/f04a lnk2/f04 Europe but where?
2/dl1/f05 lnk2/f05 Carol and Faith trip to Spain, etc.
2/dl1/f06 lnk2/f06 Carol and Faith trip
tsttmp2:
2/dl1/f01 lnk2/f01 Benvenue house, Kat in blue dress
on back porch
2/dl1/f02 lnk2/f02 Palm Springs, CA ????
2/dl1/f03 lnk2/f03 Amerivox company picnic, Palo Alto,
CA
2/dl1/f03a lnk2/f03a Amerivox company picnic, Palo Alto,
CA
2/dl1/f04 lnk2/f04 Europe but where?
2/dl1/f04a lnk2/f04 Europe but where?
2/dl1/f05 lnk2/f05 Carol and Faith trip to Spain, etc.
2/dl1/f06 lnk2/f06 Carol and Faith trip
Note that both files contain a pair of lines having 'lnk2/f04' as the second
field. Fields in both files are separated by runs of space characters; no tabs
are used.
I use the commands:
$ uniq -f 1 -W 1 -D tsttmp1
and
$ uniq -f 1 -W 1 -D tsttmp2
In both commands, the options call for comparing _only_ field 2, so uniq should
report two duplicate lines for each file. It does not: nothing is reported for
tsttmp1, while the two duplicate lines are reported for tsttmp2.
I believe that the actual program treats a field as beginning with the first
blank after a non-blank character, so the blanks preceding a field's text count
as part of that field. This is the standard behavior for 'sort', but it is
inconsistent with 'info coreutils uniq', which states that a field begins with
the first non-blank character after a string of blanks. What prevents a report
for tsttmp1 is the differing number of leading blanks before field 2 in the two
lines.
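The effect can be sketched with a pair of two-line samples. This is only an
illustration under stated assumptions: the sample lines are hypothetical
(modeled on the 'lnk2/f04' lines, with spacing I chose), and I use just -f 1
with the POSIX -d option, since everything after field 1 is identical in each
pair. LC_ALL=C is set so that the comparison is byte-wise.

```shell
# Hypothetical samples: column-aligned, so the number of blanks before
# field 2 differs between the two lines (3 versus 2).
aligned=$(printf '%s\n' \
    '2/dl1/f04   lnk2/f04 Europe but where?' \
    '2/dl1/f04a  lnk2/f04 Europe but where?' | LC_ALL=C uniq -f 1 -d)

# Same text, but exactly one blank between fields on both lines.
single=$(printf '%s\n' \
    '2/dl1/f04 lnk2/f04 Europe but where?' \
    '2/dl1/f04a lnk2/f04 Europe but where?' | LC_ALL=C uniq -f 1 -d)

# -d prints one copy of each repeated line, so an empty result means
# uniq did not consider the two lines duplicates.
printf 'aligned: [%s]\n' "$aligned"
printf 'single:  [%s]\n' "$single"
```

If the blanks before field 2 take part in the comparison, the aligned pair
should produce nothing and the single-spaced pair should produce its first line.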
I suggest a fix for this in uniq:
1/ change the documentation to accurately describe the actual behavior.
2/ add an option, -b, to uniq that tells it to ignore leading blanks in a
field, as is available in sort.
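Until something like -b exists, one possible workaround is to squeeze runs of
spaces before the data reaches uniq. This is a sketch, not a general fix: it
assumes the separators are spaces only, it uses hypothetical sample lines, and
tr -s ' ' also squeezes repeated spaces inside the descriptive text.

```shell
# Hypothetical column-aligned lines with differing blanks before field 2.
# tr -s ' ' collapses each run of spaces to a single space, so both lines
# carry the same single leading blank when uniq compares from field 2 on.
squeezed=$(printf '%s\n' \
    '2/dl1/f04   lnk2/f04 Europe but where?' \
    '2/dl1/f04a  lnk2/f04 Europe but where?' \
    | tr -s ' ' | LC_ALL=C uniq -f 1 -d)

# -d prints one copy of the repeated line (in its squeezed form).
printf 'squeezed: [%s]\n' "$squeezed"
```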
Cheers,
--
Paul E Condon
address@hidden