Re: join more than two files
From: Pádraig Brady
Subject: Re: join more than two files
Date: Fri, 04 May 2012 14:56:30 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0
On 01/13/2012 08:49 AM, address@hidden wrote:
> Hello,
>
> I fairly recently discovered the joys of join, but now I wonder why it
> is limited to two files?
>
> In other words, I would like to do the following:
>
> join file1 file2 ... fileN
>
> While I CAN achieve this through other methods, they are not ideal.
> For instance, paste works with multiple files, but then I must cut out
> the repeated key columns. The following also works, but doesn't
> generalize to filename expansions (e.g. `join file*`):
>
> join file1 file2 | join - file3 | ... | join - fileN
>
> As for my use case, I am working with data files containing the
> results of multiple systems running over the same test items. I would
> like to compare the results of all systems for each item by putting
> them side-by-side.
>
> I don't know the history of the command, so I am not aware of any
> technical or ideological reasons why it shouldn't support more than
> two files. Any explanation appreciated!
Sorry, I thought I had replied to this.
It would be handy to support more than 2 files,
but join couldn't support an arbitrary number,
as you'd run out of file descriptors or RAM etc.
It would also complicate the implementation
of join, especially for an arbitrary number of files.
Primarily for these reasons, you'd need to
split the task, something like:
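  find files |
  while read -r file1; do
    # First pass only: read a second name to seed the running result.
    test "$file2" || read -r file2
    # Write to a separate file first; redirecting straight onto
    # out.tmp would truncate it before join could read it.
    join "$file1" "$file2" > out.new && mv out.new out.tmp
    file2=out.tmp
  done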
Note that depending on the size of the data,
out.tmp might be better on a RAM disk
(which /tmp is on some modern GNU/Linux distros, for example).
Note also that your piped version above, which avoids temp files,
doesn't scale to many files: each extra file adds another
concurrent join process, so you hit process limits etc.
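If it helps, the loop can be wrapped in a shell function so that it works with filename expansion. This is only a rough sketch under some assumptions: join_all is a made-up name (it is not part of coreutils), every input must already be sorted on its first field, and the scratch files come from mktemp rather than a fixed out.tmp.

```shell
# join_all: hypothetical helper, not part of coreutils.
# Joins any number of files on their first field, two at a time.
# Assumes every input is already sorted on that field.
join_all() {
  if [ "$#" -eq 0 ]; then
    echo "usage: join_all FILE..." >&2
    return 1
  fi
  acc=$(mktemp) || return 1
  next=$(mktemp) || { rm -f "$acc"; return 1; }
  # Seed the running result with the first file.
  cat "$1" > "$acc" || { rm -f "$acc" "$next"; return 1; }
  shift
  for f in "$@"; do
    # Write to a second scratch file, then swap the two names,
    # so join never reads from and writes to the same file.
    if ! join "$acc" "$f" > "$next"; then
      rm -f "$acc" "$next"
      return 1
    fi
    tmp=$acc; acc=$next; next=$tmp
  done
  cat "$acc"
  rm -f "$acc" "$next"
}
```

Something like `join_all file* > combined` would then join the files one after another. Only one join runs at a time, so the number of files is not bounded by process limits, at the cost of rewriting the running result on each step.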
cheers,
Pádraig.