Re: parallel cat
From: Dan Kokron
Subject: Re: parallel cat
Date: Fri, 29 Jul 2011 12:41:56 -0400
Thanks to all who made suggestions. Using parallel for this task did
improve performance substantially.
Dan
On Sun, 2011-07-17 at 09:08 -0500, Ole Tange wrote:
> On Fri, Jul 15, 2011 at 8:39 PM, Dan Kokron <daniel.kokron@nasa.gov> wrote:
>
> > I have a bunch (~200) small (1K to 100K) binary files that I want to
> > 'cat' into a larger file. I usually use "cat pe* > diag", but this
> > takes considerable time on the Lustre file system we are using. I am
> > exploring using GNU parallel for this task but have run into some
> > difficulties. Basically the resulting diag file only contains one of
> > the input files.
> >
> > I've tried the following variations.
> >
> > parallel "cat {} >diag_amsua_n18_03.2011041700" ::: pe*
> > parallel cat {} ">"diag_amsua_n18_03.2011041700 ::: pe*
> > ls pe* | parallel cat {} ">"diag_amsua_n18_03.2011041700
> > ls pe* | parallel -j4 -k cat {} ">"diag_amsua_n18_03.2011041700
> > ls pe* | parallel -k cat {} ">"diag_amsua_n18_03.2011041700
> > parallel -j4 -k "cat {} >diag_amsua_n18_03.2011041700" ::: pe*
>
> You are _so_ close.
>
> parallel cat >diag_all ::: pe*
>
> It is probably more readable for UNIX users to write this (it does
> exactly the same thing):
>
> parallel cat ::: pe* >diag_all
>
> Or if you prefer the order kept:
>
> parallel -k cat ::: pe* >diag_all
>
> I have no experience with Lustre, but I would imagine that Lustre is
> slow at getting the first byte and pretty fast after that, so most of
> the time is spent waiting rather than working. If that is the case,
> then it will be OK to run a lot of cats simultaneously:
>
> parallel -j0 cat ::: pe* >diag_all
>
> These sections of the man page touch on the subject of using the
> output from GNU Parallel:
>
> EXAMPLE: Rewriting a for-loop and a while-read-loop
> EXAMPLE: Rewriting nested for-loops
> EXAMPLE: Keep order of output same as order of input
> EXAMPLE: Processing a big file using more cores
>
> If you believe it can be explained better please post your suggestion
> for discussion here.
>
>
> /Ole
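
For anyone finding this thread later, here is a rough sketch of why the
quoted ">" attempts left only one input file in the result while the
unquoted redirect works. It is not from Ole's reply: the demo file names
and contents are made up, and a serial for-loop stands in for GNU
parallel so the sketch runs without it installed. The point is only where
the redirect is evaluated, once per job or once for the whole run.

```shell
#!/bin/sh
# Hypothetical demo files; names and contents are invented for illustration.
set -e
workdir=$(mktemp -d)
cd "$workdir"
printf 'A\n' > pe001
printf 'B\n' > pe002
printf 'C\n' > pe003

# Redirect inside each job (like the quoted ">" variants): every job
# reopens the target with '>' and truncates it, so earlier output is lost.
for f in pe*; do
    sh -c "cat $f > diag_broken"
done

# Redirect applied once, outside the jobs (like 'parallel -k cat ::: pe* >diag_all'):
# the output file is opened a single time and all jobs write to that stream.
for f in pe*; do
    sh -c "cat $f"
done > diag_ok

# Compare sizes: diag_broken holds one input file, diag_ok holds all three.
wc -c diag_broken diag_ok
```

With real GNU parallel the per-job redirects also race with each other,
so which single file survives is not predictable; the serial loop above
always leaves the last one.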
--
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron@nasa.gov
Phone: (301) 614-5192
Fax: (301) 614-5304