From: James Youngman
Subject: Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
Date: Wed, 12 Nov 2008 10:22:20 +0000
[ CC ++ address@hidden ]
On Tue, Nov 11, 2008 at 2:58 PM, Andrew McGill <address@hidden> wrote:
> What would you expect this to do --:
>
> find -type f -print0 |
> xargs -0 -n 8 --max-procs=16 md5sum >& ~/md5sums
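Produce a race condition :) It runs up to 16 processes in parallel,
each writing to the md5sums file. Because the file is opened without
O_APPEND, the writes sometimes occur at the same offset in the output
file and overwrite each other. To illustrate, compare how the shell
opens the file for > and for >>: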
~$ strace -f -e open,fork,execve sh -c "echo hello > foo"
execve("/bin/sh", ["sh", "-c", "echo hello > foo"], [/* 39 vars */]) = 0
[...]
open("foo", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
~$ strace -f -e open,fork,execve sh -c "echo hello >> foo"
execve("/bin/sh", ["sh", "-c", "echo hello >> foo"], [/* 39 vars */]) = 0
[...]
open("foo", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
With O_APPEND, every write goes to the current end of the file, so
this version should be race-free:
find -type f -print0 |
xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1
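One way to check the result is to compare the number of lines in the
output with the number of input files. A minimal sketch, assuming GNU
find, xargs and md5sum; the scratch directory, the file names and the
variable names are made up for illustration, and file names containing
newlines would throw off the count:

```shell
# Scratch data; the directory and file names are hypothetical.
dir=$(mktemp -d)
for i in $(seq 1 200); do echo "file $i" > "$dir/f$i"; done

# Output file lives outside the scratch directory, so find does not see it.
out=$(mktemp)

# Same pipeline as above, appending to the output file.
find "$dir" -type f -print0 |
  xargs -0 -n 8 --max-procs=16 md5sum >> "$out" 2>&1

# One output line is expected per input file.
expected=$(find "$dir" -type f | wc -l)
actual=$(wc -l < "$out")
echo "expected=$expected actual=$actual"
```

If the two numbers differ, some output was lost or an error message
was mixed into the file.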
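I think that writing into a pipe should be OK, since pipes are
non-seekable. However, with pipes in this situation you still have a
problem if a process writes more than PIPE_BUF bytes in a single
write, because such a write is not guaranteed to be atomic and can be
interleaved with output from the other processes.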
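As a rough sketch of the pipe variant, assuming GNU find, xargs,
md5sum and grep: the scratch directory, file names and variable names
are made up, and the check only counts lines that look like intact
md5sum output. With short paths and -n 8, each md5sum process should
write well under PIPE_BUF bytes.

```shell
# Scratch data; the directory and file names are hypothetical.
dir=$(mktemp -d)
for i in $(seq 1 64); do echo "data $i" > "$dir/f$i"; done

# Atomicity limit for pipe writes, if getconf is available.
pipe_buf=$(getconf PIPE_BUF / 2>/dev/null || echo unknown)

# All md5sum processes share one pipe; count the lines that arrive intact
# (32 hex digits followed by two spaces, as md5sum prints in text mode).
good=$(find "$dir" -type f -print0 |
  xargs -0 -n 8 --max-procs=16 md5sum 2>&1 |
  grep -c -E '^[0-9a-f]{32}  ')
echo "PIPE_BUF=$pipe_buf intact_lines=$good"
```

With longer paths or a larger -n, a single process could exceed
PIPE_BUF and the count of intact lines could drop.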
> Is there a correct way to do md5sums in parallel without having a shared
> output buffer which eats output (I presume) -- or is losing output when
> haphazardly combining output streams actually strange and unusual?
I hope the solution above solves your problem - and please follow up
if so. This example is probably worthy of being mentioned in the
xargs documentation, too.
Thanks for your comment!
James.