Bug in uniq?
Ian Sue Wing
Fri, 11 Mar 2005 15:05:55 -0500
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
Yesterday I downloaded and installed a copy of CYGWIN. I am using the
uniq utility to purge duplicate line entries from a large, tab-delimited
file with several columns of data. (The file, which I have already run
through sort, is included as a .bz2 attachment. It has about 60,000 lines.)
I have examined the file visually in a text editor, and confirmed that
it has duplicate lines. I have loaded the file into Excel and calculated
that there are about 8700 duplicate lines. However, in the CYGWIN Bash
shell, the command
    uniq test_file_for_uniq > foo; diff test_file_for_uniq foo
shows no changes between the files. Examining the uniquified file 'foo'
in Excel reveals it to be identical to the original.
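For comparison, the same kind of check can be run on a tiny hand-made file with known duplicates. This is only a minimal sketch, assuming a POSIX shell with printf, uniq, wc and tr available; the file names demo.txt and demo.uniq are made up for illustration and are not part of my actual data.

```shell
# Hypothetical sanity check; demo.txt and demo.uniq are made-up names.
# Build a small, already-sorted, tab-delimited file with adjacent duplicates.
printf 'a\t1\na\t1\nb\t2\nc\t3\nc\t3\n' > demo.txt

# uniq collapses runs of adjacent identical lines.
uniq demo.txt > demo.uniq

# Count lines before and after (tr strips padding some wc versions add).
before=$(wc -l < demo.txt | tr -d ' ')
after=$(wc -l < demo.uniq | tr -d ' ')
echo "lines before: $before, after: $after"

# uniq -d prints one copy of each line that occurs more than once in a run.
dups=$(uniq -d demo.txt | wc -l | tr -d ' ')
echo "distinct repeated lines: $dups"
```

If uniq behaves as expected here, the line count drops and diff between demo.txt and demo.uniq reports differences, which is what I expected to see on the real file.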
I then fired up my trusty old MKS Toolkit and ran its implementation of
uniq. Running MKS visual diff on the original and uniquified files
identified about 8700 line differences, consistent with my earlier
Excel calculation.
Is this a bug in CYGWIN's implementation of uniq or a silly error
on my part? Last I checked, uniq was simple, straightforward to use, and
had nuclear-hardened reliability.
Ian Sue Wing                                  675 Commonwealth Ave.
Assistant Professor                           Rm. 141, Boston MA 02215
Center for Energy & Environmental Studies     Tel: (617) 353-5741
Department of Geography & Environment         Fax: (617) 353-5986
Boston University                             Web: http://people.bu.edu/isw