Re: join more than two files
From: Pádraig Brady
Subject: Re: join more than two files
Date: Fri, 04 May 2012 15:19:41 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0
On 05/04/2012 02:56 PM, Pádraig Brady wrote:
> On 01/13/2012 08:49 AM, address@hidden wrote:
>> Hello,
>>
>> I fairly recently discovered the joys of join, but now I wonder why it
>> is limited to two files.
>>
>> In other words, I would like to do the following:
>>
>> join file1 file2 ... fileN
>>
>> While I CAN achieve this through other methods, they are not ideal.
>> For instance, paste works with multiple files, but then I must cut out
>> the repeated key columns. The following also works, but doesn't
>> generalize to filename expansions (e.g. `join file*`):
>>
>> join file1 file2 | join - file3 | ... | join - fileN
>>
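The chained pipeline quoted above (`join file1 file2 | join - file3 | ... | join - fileN`) can be built for any list of names, including a glob such as `file*`, with a small recursive shell function. The sketch below is hypothetical: the name `join_chain` is an assumption, every input must already be sorted on the join field (as `join` requires), and it still starts one `join` process per additional file, so it inherits the process-limit concern mentioned in the reply further down.

```shell
# join_chain FILE1 FILE2 [FILE...]
# Hypothetical helper (not part of coreutils): recursively builds
#   join F1 F2 | join - F3 | ... | join - FN
# Every input must be sorted on the join field.
join_chain() {
  if [ "$#" -lt 2 ]; then
    echo "usage: join_chain FILE1 FILE2 [FILE...]" >&2
    return 1
  fi
  if [ "$#" -eq 2 ]; then
    # Last step: join the two remaining inputs ("-" means stdin).
    join "$1" "$2"
  else
    a=$1
    b=$2
    shift 2
    # Join the first two inputs and pipe the result, read as "-",
    # into a recursive call that joins it with the remaining files.
    join "$a" "$b" | join_chain - "$@"
  fi
}
```

Usage: `join_chain file*`. With join's default output format, each line holds the key followed by the remaining fields of each file in argument order.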
>> As for my use case, I am working with data files containing the
>> results of multiple systems running over the same test items. I would
>> like to compare the results of all systems for each item by putting
>> them side-by-side.
>>
>> I don't know the history of the command, so I am not aware of any
>> technical or ideological reasons why it shouldn't support more than
>> two files. Any explanation appreciated!
>
> Sorry, I thought I had replied to this.
>
> Well, it would be handy to support more than 2 files,
> but you couldn't support an arbitrary number,
> as you'd run out of file descriptors or RAM, etc.
>
> Also, it would complicate the implementation
> of join to handle multiple files, especially
> an arbitrary number of them.
>
> So, primarily for these reasons, you'd need to
> split the task into a series of two-file joins, something like:
>
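> printf '%s\n' file* |
> while IFS= read -r file1; do
>   test "$file2" || IFS= read -r file2
>   # join into a separate file: "join ... > out.tmp" would truncate
>   # out.tmp before join reads it on the second and later passes
>   join "$file1" "$file2" > out.tmp.new && mv out.tmp.new out.tmp
>   file2=out.tmp
> done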
>
> Note that, depending on the size of the data,
> out.tmp might be better placed on a RAM disk
> (as /tmp is on some modern GNU/Linux distros, for example).
>
> Note also that your piped version above, which avoids temp files,
> does not scale to many files, since it starts a separate join
> process for each file and will hit process limits etc.
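For many files, the temporary-file loop quoted above can be wrapped in a function that takes the file names as arguments (so `join_all file*` works) and creates its scratch files with `mktemp`. This is a hypothetical sketch: `join_all` is an assumed name, all inputs must be sorted on the join field, and with GNU `mktemp` the `TMPDIR` environment variable controls where the scratch files go (for example, a RAM disk). Only one `join` process and two scratch files exist at any time, and the output columns follow the argument order.

```shell
# join_all FILE1 FILE2 [FILE...]
# Hypothetical helper (not part of coreutils): joins sorted files one
# pair at a time through temporary files and writes the result to stdout.
join_all() {
  if [ "$#" -lt 2 ]; then
    echo "usage: join_all FILE1 FILE2 [FILE...]" >&2
    return 1
  fi
  # Two scratch files: $acc holds the result so far, $next the new one.
  # GNU mktemp creates them in $TMPDIR if set, else in /tmp.
  acc=$(mktemp) || return 1
  next=$(mktemp) || { rm -f "$acc"; return 1; }
  if ! join "$1" "$2" > "$acc"; then
    rm -f "$acc" "$next"
    return 1
  fi
  shift 2
  for f in "$@"; do
    # Never redirect into the file join is reading: write to $next,
    # then swap the names so $acc always holds the latest result.
    if ! join "$acc" "$f" > "$next"; then
      rm -f "$acc" "$next"
      return 1
    fi
    t=$acc
    acc=$next
    next=$t
  done
  cat "$acc"
  rm -f "$acc" "$next"
}
```

Usage: `join_all file* > combined.txt`, optionally after `export TMPDIR=...` pointing at a RAM-backed directory. Swapping the two names instead of using `mv` keeps both scratch files as the ones `mktemp` created.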
Sigh, I had previously responded:
http://lists.gnu.org/archive/html/coreutils/2012-02/msg00064.html
but the subsequent request was in a new thread.
Sorry for the noise,
Pádraig.