bug-coreutils

From: Randall Lewis
Subject: bug#7878: "sort" bug--inconsistent single-column sorting influenced by other columns?
Date: Thu, 20 Jan 2011 18:40:01 -0800

"sort" sorts inconsistently.

I'm pretty sure it has NOTHING to do with the following warning, although I 
could be totally wrong.

" *** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values. "
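The warning can be tested directly. This is a minimal sketch, assuming a POSIX shell and GNU coreutils. It recreates the two attached files from the `cat` output shown below (their contents are the only thing taken from the report) and sorts each with `LC_ALL=C`, which compares raw byte values. Under that setting both files should come out with the first column in the same order, `323 36 406 40 587`, because `|` (byte 0x7C) sorts after every digit.

```shell
#!/bin/sh
# Recreate the sample files exactly as shown by `cat` in the report.
printf '323|1\n36|2\n406|3\n40|4\n587|5\n' > test1.txt
printf '323|B1\n36|C2\n406|B3\n40|B4\n587|C5\n' > test7.txt

# Sort whole lines by byte value, keep only the first |-separated column,
# and join the keys onto one line so the two orders are easy to compare.
keys1=$(LC_ALL=C sort test1.txt | cut -d '|' -f 1 | tr '\n' ' ')
keys7=$(LC_ALL=C sort test7.txt | cut -d '|' -f 1 | tr '\n' ' ')

echo "test1: $keys1"
echo "test7: $keys7"
```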


See the attached shell script and text files.

bash-3.2$


cat test1.txt
323|1
36|2
406|3
40|4
587|5
cat test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Note that the first column is the same for both files.

sort test1.txt
323|1
36|2
40|4
406|3
587|5
sort test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
The rows come out in a different order depending on the dataset, and it is NOT a 
numeric sort. I'm not even sure it is ANY type of sort.
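One likely explanation is an assumption, because the report does not show which locale was active. In a UTF-8 locale such as glibc's `en_US.UTF-8`, collation ignores punctuation like `|` at the first comparison level. So `406|B3` is compared as `406B3` and `40|B4` as `40B4`, and the third characters, `6` and `B`, decide the order. The sketch below deletes every `|` with `tr` and then sorts bytewise with `LC_ALL=C`. Under that assumption it reproduces both orders reported above.

```shell
#!/bin/sh
printf '323|1\n36|2\n406|3\n40|4\n587|5\n' > test1.txt
printf '323|B1\n36|C2\n406|B3\n40|B4\n587|C5\n' > test7.txt

# Drop the '|' separators, then sort by byte value. This imitates a
# collation that skips punctuation when comparing lines.
stripped1=$(tr -d '|' < test1.txt | LC_ALL=C sort | tr '\n' ' ')
stripped7=$(tr -d '|' < test7.txt | LC_ALL=C sort | tr '\n' ' ')

echo "test1 without '|': $stripped1"   # 3231 362 404 4063 5875 (40|4 before 406|3)
echo "test7 without '|': $stripped7"   # 323B1 36C2 406B3 40B4 587C5 (406|B3 before 40|B4)
```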

sort -k1 test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1 test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Trying to fix the problem by focusing on the first column doesn't work.
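There is a possible reason `-k1` changes nothing. A key written as `-k1`, with no end field, runs from the start of field 1 to the end of the line. Without `-t`, fields are split at blanks, and these lines contain none, so field 1 is the whole line anyway. The sketch below assumes GNU sort and uses the C locale so the result is reproducible. It checks that `sort -k1` and plain `sort` produce identical output.

```shell
#!/bin/sh
printf '323|B1\n36|C2\n406|B3\n40|B4\n587|C5\n' > test7.txt

# -k1 means "from field 1 to the end of the line"; with no blanks in the
# input, that is the entire line, exactly what plain sort compares.
plain=$(LC_ALL=C sort test7.txt)
k1=$(LC_ALL=C sort -k1 test7.txt)

if [ "$plain" = "$k1" ]; then
    echo "sort -k1 and plain sort agree"
else
    echo "sort -k1 and plain sort differ"
fi
```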

sort -t "|" test1.txt
323|1
36|2
40|4
406|3
587|5
sort -t "|" test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -t '|' test1.txt
323|1
36|2
40|4
406|3
587|5
sort -t '|' test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -k1 -t "|" test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1 -t "|" test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -k1 -t '|' test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1 -t '|' test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Trying to fix the problem by including delimiter information doesn't work.
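Adding `-t '|'` still leaves `-k1` running to the end of the line. To compare only the first column, the key also needs an end field: `-k1,1`. This minimal sketch again assumes GNU sort, with `LC_ALL=C` for reproducibility. With `-t '|' -k1,1`, both files should produce the same first-column order, `323 36 40 406 587`. Because the keys then contain only digits, a punctuation-ignoring locale would presumably give the same order, but that part is an untested assumption.

```shell
#!/bin/sh
printf '323|1\n36|2\n406|3\n40|4\n587|5\n' > test1.txt
printf '323|B1\n36|C2\n406|B3\n40|B4\n587|C5\n' > test7.txt

# -t '|' splits fields on '|'; -k1,1 starts AND ends the key at field 1,
# so the rest of the line only matters as a tie-breaker between equal keys.
keys1=$(LC_ALL=C sort -t '|' -k1,1 test1.txt | cut -d '|' -f 1 | tr '\n' ' ')
keys7=$(LC_ALL=C sort -t '|' -k1,1 test7.txt | cut -d '|' -f 1 | tr '\n' ' ')

echo "test1: $keys1"
echo "test7: $keys7"
```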
sort -k1d test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1d test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -s test1.txt
323|1
36|2
40|4
406|3
587|5
sort -s test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -s -k1 test1.txt
323|1
36|2
40|4
406|3
587|5
sort -s -k1 test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Neither dictionary order (-d) nor a stable sort (-s) helps.
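Here is a sketch of why neither option is likely to help, based on the documented meaning of the flags. `-d` makes sort consider only blanks and alphanumeric characters, so it discards the `|` in the same way the locale appears to. `-s` only changes how lines with equal keys are ordered, and no two keys here are equal. Assuming GNU sort under `LC_ALL=C`, `-d` alone reproduces the reported test7 order even in the byte-order locale.

```shell
#!/bin/sh
printf '323|B1\n36|C2\n406|B3\n40|B4\n587|C5\n' > test7.txt

# -d skips every character that is not a blank or alphanumeric, so '|' is
# ignored and 406|B3 is compared as 406B3 against 40|B4 as 40B4.
dict7=$(LC_ALL=C sort -d test7.txt | cut -d '|' -f 1 | tr '\n' ' ')

# -s disables the last-resort whole-line comparison; with no equal keys
# it cannot change the result.
stable7=$(LC_ALL=C sort -s -t '|' -k1,1 test7.txt | cut -d '|' -f 1 | tr '\n' ' ')
plain7=$(LC_ALL=C sort -t '|' -k1,1 test7.txt | cut -d '|' -f 1 | tr '\n' ' ')

echo "sort -d:             $dict7"
echo "sort -s -t'|' -k1,1: $stable7"
echo "sort -t'|' -k1,1:    $plain7"
```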
sort -g test1.txt
36|2
40|4
323|1
406|3
587|5
sort -g test7.txt
36|C2
40|B4
323|B1
406|B3
587|C5
sort -n test1.txt
36|2
40|4
323|1
406|3
587|5
sort -n test7.txt
36|C2
40|B4
323|B1
406|B3
587|C5
Using numeric or general sorting appears to fix the problem on this numeric 
example. But why did it sort inconsistently in the first place, based on the 
other contents of the file, rather than just focusing on the first column, even 
when I told it to?
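One reading of why `-n` and `-g` seem to fix it: a numeric key is parsed only up to the first character that cannot be part of a number. The `|` and everything after it therefore never reach the comparison. The same idea can be written explicitly as a numeric sort on the first `|`-separated field. This is a sketch assuming GNU sort. Since the keys are plain integers, it should not depend on the locale.

```shell
#!/bin/sh
printf '323|1\n36|2\n406|3\n40|4\n587|5\n' > test1.txt
printf '323|B1\n36|C2\n406|B3\n40|B4\n587|C5\n' > test7.txt

# -k1,1n: numeric comparison restricted to field 1 (fields split on '|').
num1=$(sort -t '|' -k1,1n test1.txt | cut -d '|' -f 1 | tr '\n' ' ')
num7=$(sort -t '|' -k1,1n test7.txt | cut -d '|' -f 1 | tr '\n' ' ')

echo "test1: $num1"   # 36 40 323 406 587
echo "test7: $num7"   # 36 40 323 406 587
```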
sort test1.txt | join -a1 -a2 -t "\|" - test7.txt
323|1|B1
36|2|C2
40|4
406|3|B3
40|B4
587|5|C5
Combining the inconsistent sort with 'join' produces incorrect matches and 
duplicated records. This is a mess.
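`join` expects both inputs to be sorted on the join field, using the same collation that `join` itself applies. A whole-line sort does not guarantee that. In the reported test7 order, `406` comes before `40`, which is out of order once only the first field is compared. Below is a minimal sketch, assuming GNU coreutils. It sorts both files on field 1 and runs `sort` and `join` in the same (C) locale.

```shell
#!/bin/sh
printf '323|1\n36|2\n406|3\n40|4\n587|5\n' > test1.txt
printf '323|B1\n36|C2\n406|B3\n40|B4\n587|C5\n' > test7.txt

# Use one locale for every step so sort and join agree on the key order.
LC_ALL=C
export LC_ALL

# Sort each file on the join field only.
sort -t '|' -k1,1 test1.txt > sorted1.txt
sort -t '|' -k1,1 test7.txt > sorted7.txt

# Full outer join on field 1; '|' is passed as a single character.
joined=$(join -t '|' -a1 -a2 sorted1.txt sorted7.txt)
printf '%s\n' "$joined"
```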
sort test1.txt | sort -c
sort test7.txt | sort -c
Yet sort -c reports no disorder for either output, i.e. it says both are sorted correctly.
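A plausible reason `sort -c` stays silent: by default it checks whole-line order, which is the same ordering that produced the file. It does not check the first-field order that `join` relies on. Checking with the key spelled out reveals the problem. This sketch runs under `LC_ALL=C`, and GNU sort exits with status 1 on disorder. The C-locale whole-line order also puts `406|B3` before `40|B4`, as in the reported output, so it shows the same effect.

```shell
#!/bin/sh
printf '323|B1\n36|C2\n406|B3\n40|B4\n587|C5\n' > test7.txt
LC_ALL=C
export LC_ALL

sort test7.txt > whole7.txt

# Whole-line check: passes, because this is exactly how the file was sorted.
sort -c whole7.txt
whole_status=$?

# Field-1 check: key 406 followed by key 40 is out of order.
sort -c -t '|' -k1,1 whole7.txt
key_status=$?

echo "whole-line check exit status: $whole_status"
echo "field-1 check exit status:    $key_status"
```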
sort test1.txt
323|1
36|2
40|4
406|3
587|5
sort test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort test1.txt | join -a1 -a2 -j1 -t "\|" -e "0" -o "1.1,1.2,2.2" - test7.txt
See COMMENTED Cygwin output.

# $ sort test1.txt
# 323|1
# 36|2
# 406|3
# 40|4
# 587|5

# $ sort test7.txt
# 323|B1
# 36|C2
# 406|B3
# 40|B4
# 587|C5

# $ sort test1.txt | join -a1 -a2 -j1 -t "|" -e "0" -o "1.1,1.2,2.2" - test7.txt
# |B1|1
# |C22
# |B3|3
# |B44
# |C5|5


And finally, Cygwin sorts consistently across all three examples (but it does 
mess up the 'join'). ????? Sucks to be me, with a defective Cygwin, an 
unreliable sort, and work to get done. Any advice?
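One likely cause of the garbled Cygwin `join` output is an assumption, since the attachments are not reproduced here: the files have DOS-style CRLF line endings. The carriage return then stays at the end of the last field, so `join` writes something like `323|1\r|B1\r`. At each `\r` the terminal returns to column 0, so `|B1` overwrites `323` and leaves `|B1|1` on screen. The same arithmetic gives `|C22` and `|B44`. The sketch below builds CRLF copies of the two files under hypothetical names `dos1.txt` and `dos7.txt`. It strips the carriage returns with `tr` and then runs the report's `join` options in the C locale.

```shell
#!/bin/sh
# Hypothetical CRLF copies of the attachments (dos1.txt and dos7.txt are made-up names).
printf '323|1\r\n36|2\r\n406|3\r\n40|4\r\n587|5\r\n' > dos1.txt
printf '323|B1\r\n36|C2\r\n406|B3\r\n40|B4\r\n587|C5\r\n' > dos7.txt

LC_ALL=C
export LC_ALL

# Remove the carriage returns, then sort on the join field.
tr -d '\r' < dos1.txt | sort -t '|' -k1,1 > clean1.txt
tr -d '\r' < dos7.txt | sort -t '|' -k1,1 > clean7.txt

# The report's join options, with '|' given as a single-character separator.
out=$(join -a1 -a2 -j1 -t '|' -e 0 -o 1.1,1.2,2.2 clean1.txt clean7.txt)
printf '%s\n' "$out"
```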


randall lewis
research scientist

address@hidden
mobile 617-671-8294

4401 great america parkway, santa clara, ca, 95054, us




Attachment: SortBug.sh
Description: SortBug.sh

Attachment: test7.txt
Description: test7.txt

Attachment: test1.txt
Description: test1.txt

