bug#8040: join 5.97 bug
From: Eric Blake
Subject: bug#8040: join 5.97 bug
Date: Mon, 14 Feb 2011 17:18:43 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7
On 02/14/2011 04:45 PM, Batson, Brannon wrote:
>
> File a:
> 10 A
> 1 B
>
> File b:
> 1
>
> $ join b a
> <nada>
>
> $ join -v 1 d c
> 1
You didn't provide a file d or c to compare against.
>
> files a & b are both sorted lexicographically (according to 'sort', anyway).
> The problem is that the join lexicographic '<' operator disagrees with sort's.
Thanks for the report. I can't help but wonder if you've stumbled into
this:
http://www.gnu.org/software/coreutils/faq/#join-requires-sorted-input-files
At any rate, the only bug here is in your input files.
>
> Sorry if this bug has been found like a thousand times before, couldn't find
> it via 30s of googling.
Coreutils 5.97 is OLD. The latest stable release is 8.10, and it has
improved diagnostics for helping you discover sorting problems with join:
join --help reminds you that:
Important: FILE1 and FILE2 must be sorted on the join fields.
E.g., use ` sort -k 1b,1 ' if `join' has no options,
or use ` join -t '' ' if `sort' has no options.
And trying your example with LC_ALL=en_US.UTF-8 gives:
$ join b a
join: file 2 is not in sorted order
Sure enough, sort --debug reveals the culprit (a was not sorted
according to -k 1b,1):
$ sort --debug a
sort: using `en_US.UTF-8' sorting rules
10 A
____
1 B
____
$ sort --debug -k 1b,1 a
sort: using `en_US.UTF-8' sorting rules
1 B
_
____
10 A
__
____
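For completeness, here is a minimal sketch of the fix, assuming the
two files contain exactly the lines quoted in the report. The scratch
directory and the .sorted file names are made up for illustration, and
pinning LC_ALL=C is my assumption so that sort and join agree on one
collation order; any single locale used for both would do.

```shell
# Assumption: one locale for both sort and join, so they order keys the same way.
export LC_ALL=C

# Hypothetical scratch directory; contents of a and b are taken from the report.
tmpdir=$(mktemp -d)
printf '10 A\n1 B\n' > "$tmpdir/a"
printf '1\n' > "$tmpdir/b"

# sort -c only checks the order: it exits non-zero because a is not
# sorted on its first field ("10" sorts after "1").
sort -c -k 1b,1 "$tmpdir/a" 2>/dev/null && a_ok=yes || a_ok=no

# Sort both inputs on the join field, exactly as join --help suggests.
sort -k 1b,1 "$tmpdir/a" > "$tmpdir/a.sorted"
sort -k 1b,1 "$tmpdir/b" > "$tmpdir/b.sorted"

# With consistently sorted inputs, the key "1" pairs up as expected.
result=$(join "$tmpdir/b.sorted" "$tmpdir/a.sorted")
printf '%s\n' "$result"    # prints: 1 B

rm -rf "$tmpdir"
```

The point is that sort and join must be run with the same key
definition and the same locale; once they are, the join finds the match.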
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org