bug-bash

Re: gnu parallel in the bash manual


From: John Kearney
Subject: Re: gnu parallel in the bash manual
Date: Wed, 06 Mar 2013 05:32:50 +0100
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3

On 06.03.2013 01:03, Linda Walsh wrote:
>
> John Kearney wrote:
>> The example is bad anyway, as you normally don't want to parallelize disk
>> I/O, due to seek overhead and I/O bottleneck congestion. This example
>> will be slower and more likely to damage your disk than simply using mv
>> on its own, but that's another discussion.
> ---
>       That depends on how many IOPS your disk subsystem can
> handle and how much CPU work sits between each of the I/O calls.
> Generally, unless you have a really old, non-queuing disk,
> more than 1 proc will be of help.  If you have a RAID, it can go
> up with the number of data spindles (as a max, though if all are reading
> from the same area, not so much...;-))...
>
>
>       Case in point: I wanted to compare rpm versions of files
> on disk in a dir to see if there were duplicate versions, and if so,
> keep only the newest (highest-numbered) version, with the rest
> going into a per-disk recycling bin (a fall-out of sharing
> those disks to Windows and implementing undo abilities on
> the shares via samba's vfs_recycle).
>
>       I was working with directories containing thousands of files (one dir,
> after pruning, has 10,312 entries).  Sequential reading of those files
> was DOG slow.
>
>       I parallelized it (using perl): first sorting all the names,
> then breaking the list into 'N' sub-lists, running those in parallel, then
> merging the results (and comparing end-points -- e.g. the end of one list
> might have been a different version of the item at the start of the next).
> I tuned 'N' by watching CPU load versus disk (i.e. no matter how many procs
> I threw at it, it still used about 75% cpu).
>
> So I chose 9:
>
> Hot cache:
> Read 12161 rpm names.
> Use 1 procs w/12162 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
>  Cumulative      This Phase      ID
>  0.000s          0.000s          Init
>  0.000s          0.000s          start_program
>  0.038s          0.038s          starting_children
>  0.038s          0.001s          end_starting_children
>  8.653s          8.615s          endRdFrmChldrn_n_start_re_sort
>  10.733s         2.079s          afterFinalSort
> 17.94sec 3.71usr 6.21sys (55.29% cpu)
> ---------------
> Read 12161 rpm names.
> Use 9 procs w/1353 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
>  Cumulative      This Phase      ID
>  0.000s          0.000s          Init
>  0.000s          0.000s          start_program
>  0.032s          0.032s          starting_children
>  0.036s          0.004s          end_starting_children
>  1.535s          1.500s          endRdFrmChldrn_n_start_re_sort
>  3.722s          2.187s          afterFinalSort
> 10.36sec 3.31usr 4.47sys (75.09% cpu)
>
> Cold Cache:
> ============
> Read 12161 rpm names.
> Use 1 procs w/12162 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
>  Cumulative      This Phase      ID
>  0.000s          0.000s          Init
>  0.000s          0.000s          start_program
>  0.095s          0.095s          starting_children
>  0.096s          0.001s          end_starting_children
>  75.067s         74.971s         endRdFrmChldrn_n_start_re_sort
>  77.140s         2.073s          afterFinalSort
> 84.52sec 3.62usr 6.26sys (11.70% cpu)
> ----
> Read 12161 rpm names.
> Use 9 procs w/1353 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
>  Cumulative      This Phase      ID
>  0.000s          0.000s          Init
>  0.000s          0.000s          start_program
>  0.107s          0.107s          starting_children
>  0.112s          0.005s          end_starting_children
>  29.350s         29.238s         endRdFrmChldrn_n_start_re_sort
>  31.497s         2.147s          afterFinalSort
> 38.27sec 3.35usr 4.47sys (20.47% cpu)
>
> ---
> hot cache savings: 42%
> cold cache savings: 55%
>
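(For readers following along: a minimal bash sketch of the sort / split
into N lists / run in parallel / merge approach Linda describes above.
It is not her perl script; compare_rpm_versions is a hypothetical
per-chunk worker, and GNU split and sort are assumed.)

    N=9
    tmp=$(mktemp -d)

    # Sorted list of package file names, split into N roughly equal
    # chunks without breaking lines (GNU split).
    ls *.rpm | sort > "$tmp/all"
    split -n "l/$N" "$tmp/all" "$tmp/chunk."

    # One worker per chunk in the background, then merge the
    # already-sorted partial results.
    for c in "$tmp"/chunk.*; do
        compare_rpm_versions "$c" > "$c.out" &   # hypothetical worker
    done
    wait
    sort -m "$tmp"/chunk.*.out > duplicates.txt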
That's a different use case; you can't really compare mv to data processing.
Generally, trying to parallelize

mv <dir 1>/*  <dir 2>

is a bad idea unless you know what you are doing or are on some expensive
hardware, because of the sequential nature of the access model.
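For concreteness, this is roughly what "parallelizing mv" ends up looking
like in practice (directory names are placeholders), next to the plain form:

    # Eight concurrent mv processes contending for the same disk; on a
    # single spinning drive this mostly adds seek overhead.
    find dir1 -maxdepth 1 -type f -print0 | xargs -0 -P 8 -I{} mv {} dir2/

    # The plain, sequential move of the same files.
    mv dir1/* dir2/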

Your use case was a sparse access pattern, and there is normally no
performance penalty for interleaving sparse accesses.

Depending on the underlying hardware it can be very costly to interleave
sequential access streams, especially on embedded devices such as eMMC,
not to mention the sync-object overhead you may be incurring in the fs
driver and/or hardware driver.


With 13000 files in one directory you must have been paying a directory-listing
and file-open penalty. What fs was that on? I'm tempted to say one reason
parallelization helped in your example above was the fs overhead for a
directory that size. Generally I don't advise having more files in a
directory than can be contained in one extent.
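(For reference, a quick way to check both things -- which filesystem the
directory is on, and how much the listing itself costs; the path is a
placeholder:)

    stat -f -c %T /data/rpms             # report the filesystem type

    # Compare a stat-free readdir with a per-file stat to see the
    # directory-size overhead (drop caches first for a cold-cache run).
    time ls -f /data/rpms > /dev/null    # readdir only, unsorted
    time ls -l /data/rpms > /dev/null    # readdir + stat on every entry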

In general, max CPU is not the best metric to go by; compare real (wall-clock)
time instead, as the two don't always correlate.
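(Bash's time keyword makes the comparison direct: "real" is the wall-clock
time that matters, user+sys is the CPU time. The script name and -j flag
below are hypothetical stand-ins for Linda's perl tool; the numbers are
from her cold-cache runs above.)

    time ./scan.sh            # 1 proc:  real ~84.5s  (11.7% cpu)
    time ./scan.sh -j 9       # 9 procs: real ~38.3s  (20.5% cpu)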



Was that comment about the RAID from me? Anyway, it depends on whether you
have software or hardware RAID, and the type of RAID. Take for example a
decent SAS controller card with 16 fast SSD drives in a RAID 1+0
configuration: at least for reads there are no real effective limits on
parallel access. If however you have a software RAID 5/6, it might as well
be a single spindle drive from this perspective.


