bug-findutils

Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?


From: James Youngman
Subject: Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
Date: Wed, 12 Nov 2008 10:22:20 +0000

[ CC ++ address@hidden ]


On Tue, Nov 11, 2008 at 2:58 PM, Andrew McGill <address@hidden> wrote:
> What would you expect this to do --:
>
>     find -type f -print0 |
>         xargs -0 -n 8 --max-procs=16 md5sum >& ~/md5sums

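Produce a race condition :)  It runs up to 16 parallel md5sum
processes, each writing to the md5sums file.  Because the file is not
opened in append mode, the writes sometimes occur at the same offset
in the output file, so one process can overwrite another's output.
To illustrate how the shell opens the file for ">" versus ">>":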

~$ strace -f -e open,fork,execve sh -c "echo hello > foo"
execve("/bin/sh", ["sh", "-c", "echo hello > foo"], [/* 39 vars */]) = 0
[...]
open("foo", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
~$ strace -f -e open,fork,execve sh -c "echo hello >> foo"
execve("/bin/sh", ["sh", "-c", "echo hello >> foo"], [/* 39 vars */]) = 0
[...]
open("foo", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3

With O_APPEND, every write goes to the current end of the file, so
this version should be race-free:

find -type f -print0 |
     xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1
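
One way to check the append-mode version is to run it over a scratch
tree and compare the number of output lines with the number of files.
This is only a sketch: the scratch directory, the file count of 200 and
the variable names are made up for illustration, and it assumes GNU
xargs and md5sum on a local filesystem.

```shell
#!/bin/sh
# Sketch: check that the append-mode pipeline records one line per file.
# The scratch directory and the file count (200) are illustrative assumptions.
workdir=$(mktemp -d)
mkdir "$workdir/data"
i=1
while [ "$i" -le 200 ]; do
    echo "contents $i" > "$workdir/data/file$i"
    i=$((i + 1))
done

out="$workdir/md5sums"
# Same pipeline as above, but run inside the scratch tree.
( cd "$workdir/data" &&
  find . -type f -print0 |
      xargs -0 -n 8 --max-procs=16 md5sum >> "$out" 2>&1 )

# Compare the number of input files with the number of lines written.
expected=$(find "$workdir/data" -type f | wc -l | tr -d ' ')
actual=$(wc -l < "$out" | tr -d ' ')
echo "files: $expected, lines written: $actual"
rm -rf "$workdir"
```

If the two numbers differ, some output was lost or garbled.  Matching
counts do not prove the absence of a race, they only fail to show one.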

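I think that writing into a pipe should be OK, since pipes are
non-seekable.  However, with pipes in this situation you still have a
problem if a process writes more than PIPE_BUF bytes at once: such a
write is not guaranteed to be atomic, so output from different
processes may be interleaved.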
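
To make the PIPE_BUF point concrete, here is a sketch that looks up the
limit with getconf and sends the parallel output through a single pipe.
The scratch directory, the file count of 64 and the use of "cat" as the
collector are assumptions for illustration, and it assumes that each
md5sum flushes its output in a single write when stdout is a pipe.

```shell
#!/bin/sh
# Sketch: look up PIPE_BUF and send the parallel md5sum output through a pipe.
# Scratch directory, file count (64) and the "cat" collector are illustrative.
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF on this system: $pipe_buf bytes"

piped=$(mktemp -d)
mkdir "$piped/data"
n=1
while [ "$n" -le 64 ]; do
    echo "payload $n" > "$piped/data/f$n"
    n=$((n + 1))
done

# Each md5sum handles 8 short names, so its output is a few hundred bytes,
# well below PIPE_BUF (assuming it is flushed in a single write).
( cd "$piped/data" &&
  find . -type f -print0 |
      xargs -0 -n 8 --max-procs=16 md5sum 2>&1 ) | cat > "$piped/md5sums"

piped_lines=$(wc -l < "$piped/md5sums" | tr -d ' ')
echo "lines collected through the pipe: $piped_lines"
rm -rf "$piped"
```

With long path names or a larger -n value, the output of a single
md5sum can exceed PIPE_BUF, and then the guarantee no longer applies.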


> Is there a correct way to do md5sums in parallel without having a shared
> output buffer which eats output (I presume) -- or is losing output when
> haphazardly combining output streams actually strange and unusual?

I hope the solution above solves your problem - and please follow up
if so.  This example is probably worth mentioning in the xargs
documentation, too.

Thanks for your comment!

James.



