bug-coreutils

bug#6366: join can't join on numeric fields


From: Jim Meyering
Subject: bug#6366: join can't join on numeric fields
Date: Wed, 09 Jun 2010 08:56:07 +0200

Alex Shinn wrote:

> 2010/6/8 Pádraig Brady <address@hidden>:
>> On 07/06/10 06:19, Alex Shinn wrote:
>>>
>>> Ideally join should be able to handle files sorted in any order
>>> that sort provides, but as a bare minimum it should at least
>>> be able to join files sorted on numeric fields.
>>
>> Well if there were no aliases in the numbers, you could always
>> sort the output numerically after the join if it was important.
>
> By first sorting lexicographically, you mean?
> In the use case I had, the data was already sorted
> numerically.  So whenever I want to join two files,
> currently I have to do:
>
>   sort file1 > file1.tmp
>   sort file2 > file2.tmp
>   join file1.tmp file2.tmp | sort -n > out
>   rm -f file1.tmp file2.tmp
>
> instead of just
>
>   join -n file1 file2 > out
>
> In the small tools philosophy you want to avoid adding
> redundancy, but in this case join isn't doing the same
> thing as sort, it's just working with it better.  Not to mention
> the fact that sort is an expensive operation to have to
> perform multiple times, not just an extra O(n) filter
> to throw in the middle of a pipeline.
>
>> However if you wanted to join "01" and "1" then your patch is required.
>> Are numeric aliases common enough to warrant this? I think so.
>
> Leading zeros may not be so common, but don't forget
> "1.0" and "1" or "1e2" and "100" and "100.0", etc.
>
>> I'd use -g, --general-numeric to correspond with `sort`.
>
> Yes, that's probably better.
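A small sketch of the alias problem quoted above, for concreteness. The file names and the `normalize_keys` helper are made up for illustration, and the awk-based key rewrite is only a stand-in workaround, not the proposed patch. It assumes an awk that converts "1e2" to 100, as common implementations do.

```shell
# Hypothetical demo files in a scratch directory.
dir=$(mktemp -d)
printf '1 a\n100 b\n' > "$dir/left"
printf '01 x\n1e2 y\n' > "$dir/right"

# join compares keys as strings, so "1" vs "01" and "100" vs "1e2"
# do not match and nothing is printed.
plain=$(LC_ALL=C join "$dir/left" "$dir/right")

# normalize_keys (illustrative helper): rewrite field 1 through awk's
# numeric conversion, then sort on that field the way join expects.
normalize_keys() {
    awk '{ $1 = $1 + 0; print }' "$1" | LC_ALL=C sort -k 1,1
}

normalize_keys "$dir/left"  > "$dir/left.norm"
normalize_keys "$dir/right" > "$dir/right.norm"

# After normalization the aliases collapse to the same key text.
joined=$(LC_ALL=C join "$dir/left.norm" "$dir/right.norm")
printf '%s\n' "$joined"

rm -rf "$dir"
```

Note that this still pays for two sorts, which is the cost the quoted message objects to; it only shows which keys a numeric join would be expected to pair up.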

There may be a fly in the ointment.

When comparing floating point numbers how would join measure equality?
Should it consider 1.000000000000001e2 to be equal to 100.0?
What if the maximum precision available does not
allow us to distinguish those two values?
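As a rough illustration of the precision question, the sketch below uses awk, whose numbers are doubles, as a stand-in for whatever type join would convert keys to. The `float_eq` helper is hypothetical. A wider type such as long double would move the threshold but not remove it.

```shell
# float_eq A B (illustrative helper): print 1 if the two arguments
# compare equal after conversion to awk's double, else 0.
float_eq() {
    awk -v a="$1" -v b="$2" 'BEGIN { print ((a + 0 == b + 0) ? 1 : 0) }'
}

# Differs from 100 by about 1e-13, which a double can still represent.
near=$(float_eq 1.000000000000001e2 100.0)

# Differs from 100 by about 1e-15, below double precision at this
# magnitude, so both strings convert to the same value.
tiny=$(float_eq 1.00000000000000001e2 100.0)

printf 'near=%s tiny=%s\n' "$near" "$tiny"
```

So with doubles the value in the question is still distinguishable from 100.0, but two more digits are enough to make distinct key strings collapse into one key.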

What about -0 and 0? (with IEEE 754, they'll compare equal)
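The signed-zero case can be checked the same way. This is only a sketch using awk's double arithmetic, with variable names chosen for illustration: the two keys differ as strings but compare equal as numbers.

```shell
# Compared as strings, the way join compares keys today.
as_strings=$(awk 'BEGIN { print (("-0" == "0") ? "equal" : "different") }')

# Compared as numbers after conversion (IEEE 754: -0 == 0).
as_numbers=$(awk 'BEGIN { a = "-0" + 0; b = "0" + 0; print ((a == b) ? "equal" : "different") }')

printf 'strings: %s, numbers: %s\n' "$as_strings" "$as_numbers"
```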




