Re: [PATCH] md5: accepts a new --threads option

From: Pádraig Brady
Subject: Re: [PATCH] md5: accepts a new --threads option
Date: Wed, 21 Oct 2009 11:41:16 +0100


Jim Meyering wrote:
> Pádraig Brady wrote:
>> Pádraig Brady wrote:
>>> You wouldn't want multiple threads/processes fighting over
>>> the disk head so you would do something like:
>>>   find /disk1 | xargs md5sum & find /disk2 | xargs md5sum
>>> Note if we're piping/redirecting the output of the above
>>> then we must be careful to line buffer the output from md5sum
>>> so that it's not interspersed. Hmm I wonder should
>>> we linebuffer the output from *sum by default.
>> In the attached patch, I've changed the default buffering
>> to line buffered to address the above issue. For standard
>> size files there is a 2% performance drop.
> Good catch.
> It sounds like this fixes a real (albeit obscure) bug, so this
> might deserve a NEWS item, though I admit it is borderline.

Well, it would easily be hit as soon as one runs several md5sum
processes in parallel with their output going to the same pipe or file.
So I'll add a NEWS item and a test along the lines of:

(mkdir t && cd t && seq 100 | xargs touch)
(find t t t t -type f | xargs -n100 -P4 md5sum) | sed -n '/[0-9a-f]\{32\}  /!p' |
grep . >/dev/null && fail=1
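
For anyone wanting to try this outside the test suite, here is a rough
standalone sketch of the same check. It is only an illustration, not the
test that will go in: the count_malformed helper and the mktemp scratch
directory are my own additions, and unlike the sed expression above the
helper anchors the pattern at the start of the line. It assumes a find,
an xargs that supports -P, seq and md5sum are available.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): count the lines on stdin
# that do not start with 32 hex digits followed by two spaces, i.e. lines
# that are not well-formed md5sum text-mode output.
count_malformed() {
  sed -n '/^[0-9a-f]\{32\}  /!p' | wc -l | tr -d ' '
}

# Scratch directory holding 100 empty files, as in the test above.
dir=$(mktemp -d) || exit 1
(cd "$dir" && seq 100 | xargs touch)

# Four md5sum processes write to the same pipe concurrently.
# Any interleaved output shows up as a malformed line.
bad=$(find "$dir" "$dir" "$dir" "$dir" -type f |
      xargs -n100 -P4 md5sum | count_malformed)
echo "malformed lines: $bad"

rm -rf "$dir"
```

With line-buffered output the count should be 0. With a fully buffered
md5sum a nonzero count is possible but not guaranteed on any given run,
since it depends on how the writes happen to be scheduled.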


