[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] sort: parallel external sort implementation

From: Pádraig Brady
Subject: Re: [PATCH] sort: parallel external sort implementation
Date: Thu, 04 Mar 2010 12:06:19 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20100216 Thunderbird/3.0.2

On 04/03/10 06:01, Joey Degges wrote:

Matt (cc'd, not on list) and I have developed a patch that parallelizes sort
using pthreads. The threading approach taken here is to break up input files
into groups based on physical device and sort each of those groups in
This allows for storage and processing resources to be more fully utilized
during the sort process.

multidisk sort with 4x 50M inputs
$ time -v sort_multidisk /q/0/50M /q/1/50M /q/2/50M /q/3/50M>  /dev/null
         User time (seconds): 10.89
         System time (seconds): 0.83
         Percent of CPU this job got: 180%
         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.50
         Major (requiring I/O) page faults: 20
         Minor (reclaiming a frame) page faults: 72843
         Voluntary context switches: 1936
         Involuntary context switches: 149
         File system inputs: 393080

current sort with 4x 50M inputs
$ time -v sort_current /q/0/50M /q/1/50M /q/2/50M /q/3/50M>  /dev/null
         User time (seconds): 11.05
         System time (seconds): 0.32
         Percent of CPU this job got: 86%
         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.19
         Major (requiring I/O) page faults: 18
         Minor (reclaiming a frame) page faults: 72532
         Voluntary context switches: 915
         Involuntary context switches: 48
         File system inputs: 198936

diff --git a/src/sort.c b/src/sort.c
+      --threads=N           use no more than N threads to improve 

Threads is a bit of an implementation detail?
Perhaps it would be better to use --parallel so one
could move to multiprocess in future if desired?

Since this is a new option, it requires documentation
in docs/coreutils.texi which would include examples
of how and why you would use this option.

+   Threading approach: Break FILES up into several groups where each contains
+   only files that can be found on the same physical device (according to
+   device_cmp()). Spawn threads to execute do_sort() on each group of files in
+   parallel.
+   This allows for all concerned resources (storage devices and processors) to
+   be more fully utilized.
+static void
+sort_multidisk (char * const *files, size_t nfiles, char const *output_file,
+                unsigned long int nthreads)

Have you considered the seek characteristics of SSDs
and how they might affect things (with consideration
that mechanical disks will be replaced by SSDs quite quickly).
There still would be some benefit splitting per SSD,
but it would be worth measuring to see.

@@ -3691,7 +3996,21 @@ main (int argc, char **argv)
       IF_LINT (free (sortfiles));
-    sort (files, nfiles, outfile);
+    {
+      if (!nthreads)
+        {
+          /* The default NTHREADS is 2 ** floor (log2 (number of processors)).
+             If comparisons do not vary in cost and threads are
+             scheduled evenly, with the current algorithm there is no
+             performance advantage to using a number of threads that
+             is not a power of 2.  */
+          unsigned long int np2 = num_processors (NPROC_CURRENT) / 2;
+          for (nthreads = 1; nthreads <= np2; nthreads *= 2)
+            continue;
+        }



reply via email to

[Prev in Thread] Current Thread [Next in Thread]