bug-coreutils

Problem with definition of a field in 'uniq'


From: Paul E Condon
Subject: Problem with definition of a field in 'uniq'
Date: Mon, 11 Sep 2006 17:17:30 -0600
User-agent: Mutt/1.5.9i

I am reporting a behavior of uniq that, IMHO, is a bug.
I am using two test files containing the following lines:

tsttmp1:
2/dl1/f01             lnk2/f01              Benvenue house, Kat in blue dress on back porch
2/dl1/f02             lnk2/f02              Palm Springs, CA ????
2/dl1/f03             lnk2/f03              Amerivox company picnic, Palo Alto, CA
2/dl1/f03a            lnk2/f03a             Amerivox company picnic, Palo Alto, CA
2/dl1/f04             lnk2/f04              Europe but where?
2/dl1/f04a            lnk2/f04              Europe but where?
2/dl1/f05             lnk2/f05              Carol and Faith trip to Spain, etc.
2/dl1/f06             lnk2/f06              Carol and Faith trip

tsttmp2:
2/dl1/f01             lnk2/f01              Benvenue house, Kat in blue dress on back porch
2/dl1/f02             lnk2/f02              Palm Springs, CA ????
2/dl1/f03             lnk2/f03              Amerivox company picnic, Palo Alto, CA
2/dl1/f03a            lnk2/f03a             Amerivox company picnic, Palo Alto, CA
2/dl1/f04            lnk2/f04              Europe but where?
2/dl1/f04a            lnk2/f04              Europe but where?
2/dl1/f05             lnk2/f05              Carol and Faith trip to Spain, etc.
2/dl1/f06             lnk2/f06              Carol and Faith trip
 
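Note that both files contain a pair of lines having 'lnk2/f04' as the second
field. In both files the fields are separated by runs of space characters; no
tabs are used. The files differ only in the '2/dl1/f04' line, which is
followed by one space fewer in tsttmp2 than in tsttmp1.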

I use the commands:
$ uniq -f 1 -W 1 -D tsttmp1
and
$ uniq -f 1 -W 1 -D tsttmp2

In both commands, the options call for examining _only_ field 2, so uniq
should report two duplicate lines for each file. It does not: there is no
report of duplicates for tsttmp1, while tsttmp2 yields the expected two
duplicate lines.
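A minimal reproduction of the behavior above, assuming GNU uniq and a POSIX shell. It uses only the -f and -D options and two hypothetical files, a.txt and b.txt, which, like tsttmp1 and tsttmp2, differ only in how many spaces precede the second field.

```shell
#!/bin/sh
# a.txt: the blank runs before field 2 differ (two spaces vs one),
# like the 'lnk2/f04' lines in tsttmp1.
printf 'f04  lnk2/f04 x\nf04a lnk2/f04 x\n' > a.txt
# b.txt: the blank runs before field 2 are equal (one space each),
# like the 'lnk2/f04' lines in tsttmp2.
printf 'f04 lnk2/f04 x\nf04a lnk2/f04 x\n' > b.txt

# -f 1 skips the first field; -D prints every line of each duplicate group.
echo '--- a.txt'
uniq -f 1 -D a.txt   # no output: the remainders differ in leading blanks
echo '--- b.txt'
uniq -f 1 -D b.txt   # prints both lines
```

Aligning the second column (a.txt) is exactly what suppresses the duplicate report.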

I believe that the actual program treats a field as beginning with the first
blank after a non-blank character. This is the standard behavior for 'sort',
but it is inconsistent with 'info coreutils uniq', which states that a field
begins with the first non-blank character after a string of blanks. What
prevents a report for tsttmp1 is that its two 'lnk2/f04' lines have different
numbers of blanks before the second field.
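To make the compared text visible, the sketch below strips the first field the way this report describes uniq doing it (skip leading blanks, then the non-blank characters, and keep the blanks that follow) and shows each space as '.'. The helper name show_key and the file c.txt are hypothetical, and the sed expression is only an approximation of uniq's internal field skipping, assuming the C locale.

```shell
#!/bin/sh
# Hypothetical helper: print the text 'uniq -f 1' is assumed to compare,
# with spaces shown as '.' so the kept leading blanks are visible.
show_key() {
    # Delete leading blanks and the following non-blank first field;
    # the blanks in front of field 2 are left in place.
    sed 's/^[[:blank:]]*[^[:blank:]]*//' "$1" | tr ' ' '.'
}

printf 'f04  lnk2/f04 x\nf04a lnk2/f04 x\n' > c.txt
show_key c.txt
# ..lnk2/f04.x
# .lnk2/f04.x
```

The two keys differ only in their leading blanks, which is why the lines are not reported as duplicates.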

I suggest two fixes for uniq:
1/ change the documentation to accurately describe the actual behavior.
2/ add an option, -b, to uniq that tells it to ignore leading blanks in a
   field, as is available in sort.
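Until something like the proposed -b option exists, one possible workaround is to let awk do the comparison, since awk's default field splitting ignores leading blanks and treats any run of blanks as a single separator. This is a sketch with the hypothetical helper name dups_by_field2 and sample file sample.txt. Unlike uniq -D, it counts matches across the whole file rather than only among adjacent lines, and because it reads the file twice it needs a regular file, not a pipe.

```shell
#!/bin/sh
# Hypothetical helper: print every line whose second blank-separated field
# occurs more than once in the file, however many blanks precede it.
dups_by_field2() {
    # First pass (NR == FNR) counts each $2; second pass prints lines
    # whose $2 was seen more than once. Original lines are printed unchanged.
    awk 'NR == FNR { count[$2]++; next } count[$2] > 1' "$1" "$1"
}

# Same second field, different blank runs (like tsttmp1).
printf 'f04  lnk2/f04 x\nf04a lnk2/f04 x\nf05 lnk2/f05 y\n' > sample.txt
dups_by_field2 sample.txt
# f04  lnk2/f04 x
# f04a lnk2/f04 x
```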

Cheers,
-- 
Paul E Condon           
address@hidden
