Bug in uniq?

From: Ian Sue Wing
Subject: Bug in uniq?
Date: Fri, 11 Mar 2005 15:05:55 -0500
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)


Yesterday I downloaded and installed a copy of CYGWIN. I am using the uniq utility to purge duplicate line entries from a large, tab-delimited file with several columns of data. (The file, which I have already run through sort, is included as a .bz2 attachment. It has about 60,000 lines.)

I have examined the file visually in a text editor and confirmed that it has duplicate lines. I have also loaded the file into Excel and calculated that there are about 8700 duplicate lines. However, in the CYGWIN Bash shell, typing

uniq test_file_for_uniq > foo; diff test_file_for_uniq foo

shows no differences between the files. Examining the uniquified file 'foo' in Excel reveals it to be identical to the original.
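
For comparison, here is a minimal sketch of the behaviour one would normally expect from uniq on sorted input. It uses a small made-up sample rather than the attached file; the names sample.txt and sample_uniq.txt and the sample data are hypothetical, chosen only for illustration.

```shell
# Hypothetical sample (NOT the attached file): tab-delimited and already sorted.
printf 'a\t1\na\t1\nb\t2\nc\t3\nc\t3\n' > sample.txt

# uniq collapses each run of identical adjacent lines into a single line.
uniq sample.txt > sample_uniq.txt

# Count how many lines were dropped by comparing line counts.
before=$(wc -l < sample.txt)
after=$(wc -l < sample_uniq.txt)
removed=$((before - after))
echo "removed $removed duplicate lines"

# uniq -d prints one copy of each line that occurred more than once in a row.
uniq -d sample.txt
```

On a file where uniq behaves as expected, the count of removed lines is nonzero whenever adjacent duplicates exist, and diff between the input and output reports those lines.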

I then fired up my trusty old MKS Toolkit and ran its implementation of uniq. Running MKS visual diff on the original and uniquified files identified about 8700 line differences, consistent with my earlier calculations.

Is this a bug in CYGWIN's implementation of uniq, or a silly error on my part? Last I checked, uniq was simple, straightforward to use, and had nuclear-hardened reliability.


Ian Sue Wing                               675 Commonwealth Ave.
Assistant Professor                        Rm. 141, Boston MA 02215
Center for Energy & Environmental Studies  Tel: (617) 353-5741
Department of Geography & Environment      Fax: (617) 353-5986
Boston University                          Web: http://people.bu.edu/isw

Attachment: test_file_for_uniq.bz2
Description: Binary data
