From: Andrew McGill
Subject: Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
Date: Tue, 11 Nov 2008 16:58:02 +0200
User-agent: KMail/1.9.9

On Saturday 08 November 2008 20:05:25 Jim Meyering wrote:
> Andrew McGill <address@hidden> wrote:
> > Greetings coreutils folks,
> >
> > There are a number of interesting filesystems (glusterfs, lustre? ...
> > NFS) which could benefit from userspace utilities doing certain
> > operations in parallel.  (I have a very slow glusterfs installation that
> > makes me think that some things can be done better.)
> >
> > For example, copying a number of files is currently done in series ...
> >     cp a b c d e f g h dest/
> > but, on certain filesystems, it would be roughly twice as efficient if
> > implemented in two parallel threads, something like:
> >     cp a c e g dest/ &
> >     cp b d f h dest/
> > since the source and destination files can be stored on multiple physical
> > volumes.
> How about parallelizing it via xargs, e.g.,
>     $ echo a b c d e f g h | xargs -t -n4 --no-run-if-empty \
>       --max-procs=2 -- cp --target-directory=dest
>     cp --target-directory=dest a b c d
>     cp --target-directory=dest e f g h
> Obviously the above is tailored (-n4) to your 8-input example.
> In practice, you'd use a larger number, unless latency is
> so high as to dwarf the cost of extra "fork/exec" syscalls,
> in which case even -n1 might make sense.
I ran the command above with md5sum in place of cp, and some lines were
missing from the output.  I had optimistically hoped that would not happen!

> mv and ln also accept the --target-directory=dest option.
> > Similarly, ls -l . will readdir(), and then stat() each file in the
> > directory. On a filesystem with high latency, it would be faster to issue
> > the stat() calls asynchronously, and in parallel, and then collect the
> > results for
> If you can demonstrate a large performance gain on
> systems that many people use, then maybe...
> There is more than a little value in keeping programs
> like those in the coreutils package relatively simple,
> but if the cost(maintenance+portability burden)/benefit
> ratio is low enough, then anything is possible.
> For example, a well-encapsulated, optionally-threaded
> "stat_all_dir_entries" API might be useful in some situations.
So a relatively small change for parallel stat() in "ls" could fly.

> If getting any eventual patch into upstream coreutils is
> important to you, be sure there is some consensus on this
> list before doing a lot of work on it.
Any ideas on how to do a parallel cp / mv in a way that is not Considered 
Harmful?  Maybe prefetch_files(max_bytes,file1,...,NULL) ... aargh.
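
A rough shell approximation of that prefetch idea, as a sketch only: read the
source files in parallel and discard the data, on the assumption that the
filesystem client caches what it reads, then run an ordinary serial cp.  The
name prefetch_then_cp and the choice of 4 readers are made up for
illustration, and the max_bytes limit is not implemented.

```shell
#!/bin/bash
# Hypothetical helper, not part of coreutils.
# Usage: prefetch_then_cp DEST FILE...
prefetch_then_cp() {
    local dest=$1
    shift
    # Read every source file with up to 4 parallel cat processes and
    # throw the data away.  Assumption: this leaves the file contents
    # in the client-side cache, so the copy below avoids the latency.
    printf '%s\0' "$@" |
        xargs -0 --no-run-if-empty -n 1 --max-procs=4 cat -- > /dev/null
    # Plain serial copy, unchanged from normal cp usage.
    cp --target-directory="$dest" -- "$@"
}
```

Whether this helps at all depends on the cache being large enough to hold the
files between the two passes; on a filesystem without client-side caching it
would simply read everything twice.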

> > display.  (This could improve performance for NFS, in proportion to the
> > latency and the number of threads.)
> >
> >
> > Question:  Is there already a set of "improved" utilities that implement
> > this kind of technique?
> Not that I know of.
> > If not, would this kind of performance enhancement be
> > considered useful?
> It's impossible to say without knowing more.

On the (de?)merits of xargs for parallel processing:

What would you expect this to do:

    find -type f -print0 |
        xargs -0 -n 8 --max-procs=16 md5sum >& ~/md5sums

    sort -k2 < ~/md5sums > ~/md5sums.sorted

Compared to this?

    find -type f -print0 |
        xargs -0                     md5sum >& ~/md5sums

    sort -k2 < ~/md5sums > ~/md5sums.sorted

I was a little surprised that, on my system, the parallel run (the first
version) loses around 1 line of output per thousand (md5sum of 22 GB in mostly
small files).

Is there a correct way to run md5sum in parallel without a shared output
buffer that (I presume) eats output?  Or is losing output when haphazardly
combining output streams actually strange and unusual?
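
One possible way to sidestep the shared stream, as a sketch rather than a
tested answer: give every md5sum batch its own private output file and merge
the pieces afterwards.  The helper name parallel_md5 and the batch and process
counts are arbitrary choices for illustration; the xargs options are the GNU
ones used above.

```shell
#!/bin/bash
# Hypothetical helper, not part of coreutils.
# Usage: parallel_md5 SRC_DIR OUTPUT_FILE
parallel_md5() {
    local src=$1 out=$2 tmp
    tmp=$(mktemp -d) || return 1
    # xargs passes the scratch directory as $0 and the file names as "$@".
    # Each batch creates its own file with mktemp, so no two md5sum
    # processes ever write to the same stream.
    (cd "$src" && find . -type f -print0 |
        xargs -0 --no-run-if-empty -n 8 --max-procs=4 \
            sh -c 'md5sum "$@" > "$(mktemp "$0/part.XXXXXX")"' "$tmp")
    # Merge the per-batch files and sort by file name (field 2 onwards).
    cat "$tmp"/part.* | sort -k2 > "$out"
    rm -rf "$tmp"
}
```

For example, `parallel_md5 . ~/md5sums.sorted` would produce the sorted list
directly.  The cost is one extra shell per batch, which should be small next to
the checksumming itself when batches hold several files.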
