bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Multi-threading in sort(or core-utils)


From: Bo Borgerson
Subject: Re: Multi-threading in sort(or core-utils)
Date: Sun, 22 Jun 2008 13:43:46 -0400
User-agent: Thunderbird 2.0.0.14 (X11/20080505)

Paul Eggert wrote:
> Bo Borgerson <address@hidden> writes:
> 
>> Does this sound like a step in the right direction for sort?  If I were
>> to clean this up and submit it would you be willing to assess its
>> viability as a portable improvement?
> 
> Yes, and yes.  And thanks!

Hi Paul,

When I isolated my parallel merge enhancement I discovered that the
improvement I was seeing was mostly the result of my not having properly
divided resources (particularly NMERGE) among children.  Aside from some
benefit to unique merges with many duplicates I wasn't able to produce
satisfactory results using this approach.

So I started from scratch on a parallel bulk sort enhancement.  Here I
was able to see some modest but reliable improvement.

The approach I took was to divide the main work among a number of
children (sorters) by using an additional child (the feeder) to read
inputs and distribute data among them.  The parent then merges sorter
output.

This approach has some pros and cons in my view:

Pros:
- It's simple.  Sorters don't need to worry about what their siblings
are doing.  They just process the data they're fed.
- It doesn't require a known amount of data.  Work is distributed among
sorters by the feeder in small interleaved chunks.
- It doesn't require regular files.  Data coming through FIFOs or from
sub-pipelines via process substitution is no problem.
- It limits increased resource consumption.  The feeder is the only
process reading from disk until/unless the sorters need temporary files.
 NMERGE and SORT_BUFFER_SIZE are divided among sorters.

Cons:
- It's limited by the feeder.  If the sorters were able to read at their
own pace I think this would scale better.
- It uses N+2 processes.  When sorters are run in parallel there are two
helper processes, one feeding input and one merging output.

I've attached the results of some performance testing I did both on my
laptop (dual core) and on a server (4x dual core, hyper-threaded == 16
"processors" visible).  I included two graphs of the server results
which I thought were interesting.  Each line represents a level of
concurrency.  One graph shows time in seconds on the Y axis while the
other shows percentage of single-process time.  In both cases lower
lines indicate better performance.  As you can see, even on the 16
processor machine performance peaks at a low concurrency.

I included the script I used for testing in case anyone else is
interested and has a machine they're willing to run hot for a few hours.

The attached patch is also available for fetch from
git://repo.or.cz/coreutils/bo.git as branch 'sort'.

I haven't included any tests or documentation in the patch yet.  I was
hoping to first get a sense of whether you and other more experienced
coreutils developers consider this alternate approach to be worth pursuing.

Thanks,

Bo

PNG image

PNG image

>From e31f3f11a2d06079182ae7892e3af280dc4044cc Mon Sep 17 00:00:00 2001
From: Bo Borgerson <address@hidden>
Date: Wed, 18 Jun 2008 09:59:46 -0400
Subject: [PATCH] sort: Add new option --concurrency=N.

* src/sort.c (xfopen): Take an additional argument, FD. If FILE is NULL
then fdopen FD instead.  If FD is -1, use STDOUT as before.
(specify_concurrency): Process the --concurrency=N option value.
(check): Use new xfopen calling convention.  Pass -1 for FD.
(mergefps): Use new xfopen calling convention.  Pass the FILE's FD for input
and SORTER_OUTPUT_FD for output.
(sort): If MAX_CONCURRENCY allows, try to fork off SORTER children.
Fork off a final child (the FEEDER) to read inputs and distribute among
SORTER children.  Merge SORTER output in the parent.
---
 src/sort.c |  288 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 263 insertions(+), 25 deletions(-)

diff --git a/src/sort.c b/src/sort.c
index 2039dab..18a8882 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -101,6 +101,11 @@ enum
 
 enum
   {
+    /* The number of times we should try to fork a child to help with
+       a large sort.  We can always sort everything ourselves if need
+       be so this number can be small. */
+    MAX_FORK_TRIES_SORT = 2,
+
     /* The number of times we should try to fork a compression process
        (we retry if the fork call fails).  We don't _need_ to compress
        temp files, this is just to reduce disk access, so this number
@@ -223,6 +228,10 @@ static struct month monthtab[] =
   {"SEP", 9}
 };
 
+/* How much data a parallel sort feeder will give to each sorter
+   at a time. */
+#define FEEDER_BUF_SIZE 65536
+
 /* During the merge phase, the number of files to merge at once. */
 #define NMERGE_DEFAULT 16
 
@@ -285,10 +294,19 @@ static struct keyfield *keylist;
 /* Program used to (de)compress temp files.  Must accept -d.  */
 static char const *compress_program;
 
+/* Maximum number of sorters that may be run in parallel.
+   This can be modified on the command line with the --concurrency
+   option. */
+static unsigned int max_concurrency = 1;
+
 /* Maximum number of files to merge in one go.  If more than this
    number are present, temp files will be used. */
 static unsigned int nmerge = NMERGE_DEFAULT;
 
+/* If multiple sorters are run in parallel, then they will send
+   their output through a pipe to the parent who will merge. */
+static int sorter_output_fd = -1;
+
 static void sortlines_temp (struct line *, size_t, struct line *);
 
 /* Report MESSAGE for FILE, then clean up and exit.
@@ -357,6 +375,9 @@ Other options:\n\
   -C, --check=quiet, --check=silent  like -c, but do not report first bad 
line\n\
       --compress-program=PROG  compress temporaries with PROG;\n\
                               decompress them with PROG -d\n\
+"), stdout);
+      fputs (_("\
+      --concurrency=N       run up to N sorters in parallel\n\
       --files0-from=F       read input from the files specified by\n\
                             NUL-terminated names in file F\n\
 "), stdout);
@@ -413,6 +434,7 @@ enum
 {
   CHECK_OPTION = CHAR_MAX + 1,
   COMPRESS_PROGRAM_OPTION,
+  CONCURRENCY_OPTION,
   FILES0_FROM_OPTION,
   NMERGE_OPTION,
   RANDOM_SOURCE_OPTION,
@@ -426,6 +448,7 @@ static struct option const long_options[] =
   {"ignore-leading-blanks", no_argument, NULL, 'b'},
   {"check", optional_argument, NULL, CHECK_OPTION},
   {"compress-program", required_argument, NULL, COMPRESS_PROGRAM_OPTION},
+  {"concurrency", required_argument, NULL, CONCURRENCY_OPTION},
   {"dictionary-order", no_argument, NULL, 'd'},
   {"ignore-case", no_argument, NULL, 'f'},
   {"files0-from", required_argument, NULL, FILES0_FROM_OPTION},
@@ -516,6 +539,7 @@ struct sortfile
 {
   char const *name;
   pid_t pid;     /* If compressed, the pid of compressor, else zero */
+  int fd;       /* For communication between instances of sort. */
 };
 
 /* A table where we store compression process states.  We clean up all
@@ -755,12 +779,17 @@ create_temp_file (int *pfd)
    opening an ordinary FILE.  */
 
 static FILE *
-xfopen (const char *file, const char *how)
+xfopen (const char *file, const char *how, int fd)
 {
   FILE *fp;
 
   if (!file)
-    fp = stdout;
+    {
+      if (0 <= fd)
+       fp = fdopen (fd, how);
+      else
+       fp = stdout;
+    }
   else if (STREQ (file, "-") && *how == 'r')
     {
       have_read_stdin = true;
@@ -1068,6 +1097,43 @@ inittables (void)
 #endif
 }
 
+/* Specify how many sorters may be run in parallel.
+   This may be set on the command-line with the
+   --concurrency option. */
+static void
+specify_concurrency (int oi, char c, char const *s)
+{
+  uintmax_t n;
+  unsigned int max_max_concurrency = (unsigned int) MAX_NMERGE;
+  enum strtol_error e = xstrtoumax (s, NULL, 10, &n, NULL);
+
+  if (e == LONGINT_OK)
+    {
+      max_concurrency = n;
+      if (max_concurrency != n)
+       e = LONGINT_OVERFLOW;
+      else
+       {
+         if (max_concurrency < 1)
+           {
+             error (0, 0, _("invalid --%s argument %s"),
+                    long_options[oi].name, quote(s));
+             error (SORT_FAILURE, 0,
+                    _("minimum --%s argument is %s"),
+                    long_options[oi].name, quote("1"));
+           }
+         else if (max_max_concurrency < max_concurrency)
+           {
+             e = LONGINT_OVERFLOW;
+           }
+         else
+           return;
+       }
+    }
+
+  xstrtol_fatal (e, oi, c, long_options, s);
+}
+
 /* Specify how many inputs may be merged at once.
    This may be set on the command-line with the
    --batch-size option. */
@@ -2015,7 +2081,7 @@ compare (const struct line *a, const struct line *b)
 static bool
 check (char const *file_name, char checkonly)
 {
-  FILE *fp = xfopen (file_name, "r");
+  FILE *fp = xfopen (file_name, "r", -1);
   struct buffer buf;           /* Input buffer. */
   struct line temp;            /* Copy of previous line. */
   size_t alloc = 0;
@@ -2134,7 +2200,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t 
nfiles,
     {
       fps[i] = (files[i].pid
                ? open_temp (files[i].name, files[i].pid)
-               : xfopen (files[i].name, "r"));
+               : xfopen (files[i].name, "r", files[i].fd));
       initbuf (&buffer[i], sizeof (struct line),
               MAX (merge_buffer_size, sort_size / nfiles));
       if (fillbuf (&buffer[i], fps[i], files[i].name))
@@ -2161,7 +2227,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t 
nfiles,
     }
 
   if (! ofp)
-    ofp = xfopen (output_file, "w");
+    ofp = xfopen (output_file, "w", sorter_output_fd);
 
   /* Set up the ord table according to comparisons among input lines.
      Since this only reorders two items if one is strictly greater than
@@ -2547,26 +2613,176 @@ merge (struct sortfile *files, size_t ntemps, size_t 
nfiles,
 /* Sort NFILES FILES onto OUTPUT_FILE. */
 
 static void
-sort (char * const *files, size_t nfiles, char const *output_file)
+sort (char * const *incoming_files, size_t nfiles, char const *output_file)
 {
+  size_t i;
   struct buffer buf;
   size_t ntemps = 0;
   bool output_file_created = false;
+  char **files = (char **)incoming_files;
+  char *null_files[1] = { NULL };
+  size_t n_sorters = 1;
+  int sorter_input_fd = -1;
+  char *feeder_fp_string = "sort process";
+  bool is_sort_feeder = false;
+  size_t feeder_rotator = -1;
+  FILE **feeder_fps = NULL;
 
   buf.alloc = 0;
 
+  /* Each sorter needs to be able to merge temp files
+     if it creates any.  Therefore we can't chop NMERGE
+     up among more than NMERGE / 2 sorters. */
+  if (1 < max_concurrency)
+    n_sorters = MIN (max_concurrency, nmerge / 2);
+
+  /* If we can run multiple sorters in parallel, then we'll fork off
+     N_SORTERS children to do the work, plus one child to distribute
+     input among them.  We'll then merge sorter output ourselves. */
+  if (1 < n_sorters)
+    {
+      pid_t pid = 1; /* Primed for initial entry into the loop below. */
+      int pipe_to_fds[2];
+      int pipe_from_fds[2];
+      int *pipe_to_sorter = xnmalloc (n_sorters, sizeof *pipe_to_sorter);
+      int *pipe_from_sorter = xnmalloc (n_sorters, sizeof *pipe_from_sorter);
+
+      /* Fork off N_SORTERS children.  Only continue to fork if we're
+        the parent and we haven't failed a fork yet. */
+      for (i = 0; i < n_sorters && 0 < pid; i++)
+       {
+         if (pipe (pipe_from_fds) < 0)
+           error (SORT_FAILURE, errno, _("pipe failed"));
+
+         pid = pipe_fork (pipe_to_fds, MAX_FORK_TRIES_SORT);
+
+         if (0 < pid)
+           {
+             /* Parent. */
+             close(pipe_to_fds[0]);
+             close(pipe_from_fds[1]);
+             pipe_to_sorter[i] = pipe_to_fds[1];
+             pipe_from_sorter[i] = pipe_from_fds[0];
+             register_proc (pid);
+           }
+         else if (0 == pid)
+           {
+             size_t j;
+
+             /* Child. */
+             nprocs = 0;
+
+             /* Divide NMERGE here, but not SORT_SIZE, since
+                the result of SORT_BUFFER_SIZE takes this into
+                account and is divided below. */
+             nmerge /= n_sorters;
+
+             for (j = 0; j < i; j++)
+               close (pipe_to_sorter[j]);
+
+             close (pipe_to_fds[1]);
+             close (pipe_from_fds[0]);
+             sorter_input_fd = pipe_to_fds[0];
+             sorter_output_fd = pipe_from_fds[1];
+
+             nfiles = 1;
+             files = null_files;
+             output_file = NULL;
+           }
+         else
+           {
+             size_t j;
+
+             /* Failed to fork.  Now we'll just run single-process. */
+             n_sorters = 1;
+             for (j = 0; j < i; j++)
+               {
+                 close (pipe_to_sorter[j]);
+                 close (pipe_from_sorter[j]);
+               }
+
+           }
+
+       }
+
+      /* Only try to fork off a feeder if we're the parent and
+         we didn't fail a fork above. */
+      if (0 < pid)
+       {
+
+         /* We haven't created any temp files yet, so don't need
+            to worry about a critical section here. */
+         pid = fork ();
+         if (0 < pid)
+           {
+             /* Parent. */
+             struct sortfile *mfiles = xnmalloc (n_sorters, sizeof *mfiles);
+
+             nmerge = n_sorters;
+
+             register_proc (pid);
+             for (i = 0; i < n_sorters; i++)
+               {
+                 close (pipe_to_sorter[i]);
+                 mfiles[i].name = NULL;
+                 mfiles[i].pid = 0;
+                 mfiles[i].fd = pipe_from_sorter[i];
+               }
+             mergefps (mfiles, 0, n_sorters, NULL, output_file);
+             free (mfiles);
+             exit (EXIT_SUCCESS);
+           }
+         else if (0 == pid)
+           {
+             /* Child. */
+             is_sort_feeder = true;
+             feeder_fps = xnmalloc (n_sorters, sizeof *feeder_fps);
+             nprocs = 0;
+
+             for (i = 0; i < n_sorters; i++)
+               {
+                 close (pipe_from_sorter[i]);
+                 feeder_fps[i] = xfopen (NULL, "w", pipe_to_sorter[i]);
+               }
+           }
+         else
+           {
+             size_t j;
+
+             /* Failed to fork.  Now we'll just run single-process. */
+             n_sorters = 1;
+             for (j = 0; j < i; j++)
+               {
+                 close (pipe_to_sorter[j]);
+                 close (pipe_from_sorter[j]);
+               }
+
+           }
+
+
+       }
+
+      free (pipe_to_sorter);
+      free (pipe_from_sorter);
+    }
+
   while (nfiles)
     {
       char const *temp_output;
       char const *file = *files;
-      FILE *fp = xfopen (file, "r");
+      FILE *fp = xfopen (file, "r", sorter_input_fd);
       FILE *tfp;
       size_t bytes_per_line = (2 * sizeof (struct line)
                               - sizeof (struct line) / 2);
 
       if (! buf.alloc)
-       initbuf (&buf, bytes_per_line,
-                sort_buffer_size (&fp, 1, files, nfiles, bytes_per_line));
+       {
+         size_t size = sort_buffer_size (&fp, 1, files, nfiles,
+                                         bytes_per_line);
+         initbuf (&buf, bytes_per_line, is_sort_feeder
+                  ? FEEDER_BUF_SIZE
+                  : MAX (size / n_sorters, MIN_SORT_SIZE));
+       }
       buf.eof = false;
       files++;
       nfiles--;
@@ -2589,32 +2805,40 @@ sort (char * const *files, size_t nfiles, char const 
*output_file)
 
          line = buffer_linelim (&buf);
          linebase = line - buf.nlines;
-         if (1 < buf.nlines)
+         if (1 < buf.nlines && !is_sort_feeder)
            sortlines (line, buf.nlines, linebase);
-         if (buf.eof && !nfiles && !ntemps && !buf.left)
-           {
-             xfclose (fp, file);
-             tfp = xfopen (output_file, "w");
-             temp_output = output_file;
-             output_file_created = true;
-           }
-         else
+
+         if (!is_sort_feeder)
            {
-             ++ntemps;
-             temp_output = create_temp (&tfp, NULL);
+             if (buf.eof && !nfiles && !ntemps && !buf.left)
+               {
+                 xfclose (fp, file);
+                 tfp = xfopen (output_file, "w", sorter_output_fd);
+                 temp_output = output_file;
+                 output_file_created = true;
+               }
+             else
+               {
+                 ++ntemps;
+                 temp_output = create_temp (&tfp, NULL);
+               }
            }
 
+         feeder_rotator = (feeder_rotator + 1) % n_sorters;
          do
            {
              line--;
-             write_bytes (line->text, line->length, tfp, temp_output);
-             if (unique)
+             write_bytes (line->text, line->length,
+                          (is_sort_feeder ? feeder_fps[feeder_rotator] : tfp),
+                          (is_sort_feeder ? feeder_fp_string : temp_output));
+             if (unique && !is_sort_feeder)
                while (linebase < line && compare (line, line - 1) == 0)
                  line--;
            }
          while (linebase < line);
 
-         xfclose (tfp, temp_output);
+         if (!is_sort_feeder)
+           xfclose (tfp, temp_output);
 
          /* Free up some resources every once in a while.  */
          if (MAX_PROCS_BEFORE_REAP < nprocs)
@@ -2625,11 +2849,16 @@ sort (char * const *files, size_t nfiles, char const 
*output_file)
        }
       xfclose (fp, file);
     }
+  if (is_sort_feeder)
+    {
+      for (i = 0; i < n_sorters; i++)
+       xfclose (feeder_fps[i], feeder_fp_string);
+    }
 
  finish:
   free (buf.buf);
 
-  if (! output_file_created)
+  if (! output_file_created && ! is_sort_feeder)
     {
       size_t i;
       struct tempnode *node = temphead;
@@ -2638,11 +2867,13 @@ sort (char * const *files, size_t nfiles, char const 
*output_file)
        {
          tempfiles[i].name = node->name;
          tempfiles[i].pid = node->pid;
+         tempfiles[i].fd = -1;
          node = node->next;
        }
       merge (tempfiles, ntemps, ntemps, output_file);
       free (tempfiles);
     }
+  free (feeder_fps);
 }
 
 /* Insert a malloc'd copy of key KEY_ARG at the end of the key list.  */
@@ -3045,6 +3276,10 @@ main (int argc, char **argv)
          compress_program = optarg;
          break;
 
+       case CONCURRENCY_OPTION:
+         specify_concurrency (oi, c, optarg);
+         break;
+
        case FILES0_FROM_OPTION:
          files_from = optarg;
          break;
@@ -3342,7 +3577,10 @@ main (int argc, char **argv)
       size_t i;
 
       for (i = 0; i < nfiles; ++i)
-       sortfiles[i].name = files[i];
+       {
+         sortfiles[i].name = files[i];
+         sortfiles[i].fd = -1;
+       }
 
       merge (sortfiles, 0, nfiles, outfile);
       IF_LINT (free (sortfiles));
-- 
1.5.4.3

   100000|   0.1906228|   0.1556060|   0.1382010|   0.1847012| 81.63| 72.50| 
96.89
   200000|   0.4451538|   0.3412880|   0.2957615|   0.4051538| 76.67| 66.44| 
91.01
   300000|   0.7198182|   0.5199583|   0.4379207|   0.6067063| 72.23| 60.84| 
84.29
   400000|   1.0106702|   0.8450417|   0.7137242|   0.9182343| 83.61| 70.62| 
90.85
   500000|   1.3078968|   1.0394412|   0.8798115|   1.1439638| 79.47| 67.27| 
87.47
   600000|   1.6130032|   1.2450147|   1.0409660|   1.3031688| 77.19| 64.54| 
80.79
   700000|   1.9271835|   1.4166605|   1.1685928|   1.4753900| 73.51| 60.64| 
76.56
   800000|   2.2438350|   1.6335482|   1.3515583|   1.7054205| 72.80| 60.23| 
76.00
   900000|   2.5710715|   1.8231898|   1.5206178|   1.8906265| 70.91| 59.14| 
73.53
  1000000|   2.9012027|   2.0286207|   1.6971475|   2.0910478| 69.92| 58.50| 
72.08
  1100000|   3.2458153|   2.2611912|   1.8628848|   2.3280478| 69.66| 57.39| 
71.72
  1200000|   3.5856758|   2.5007925|   2.0568455|   2.5737242| 69.74| 57.36| 
71.78
  1300000|   3.9337555|   2.6653608|   2.2503940|   2.7952358| 67.76| 57.21| 
71.06
  1400000|   4.2854873|   2.8936715|   2.4100730|   3.4820888| 67.52| 56.24| 
81.25
  1500000|   4.6423165|   3.1215818|   2.6142138|   3.2383150| 67.24| 56.31| 
69.76
  1600000|   4.9924363|   3.3084982|   2.7967617|   3.6504693| 66.27| 56.02| 
73.12
  1700000|   5.3540507|   3.5569800|   3.0313347|   4.0734178| 66.44| 56.62| 
76.08
  1800000|   5.7141558|   3.7599663|   3.2499802|   4.4037373| 65.80| 56.88| 
77.07
  1900000|   6.0782415|   3.9790547|   3.6707142|   4.3455380| 65.46| 60.39| 
71.49
  2000000|   6.4462162|   4.2210610|   3.8272147|   4.5338680| 65.48| 59.37| 
70.33
  2100000|   6.8812700|   4.6267833|   4.3364473|   5.4376530| 67.24| 63.02| 
79.02
  2200000|   7.1887588|   4.6739705|   4.4296760|   5.4261320| 65.02| 61.62| 
75.48
  2300000|   7.5610612|   4.9202452|   4.4772447|   5.3311318| 65.07| 59.21| 
70.51
  2400000|   7.9319965|   5.0495007|   4.5420917|   5.7509852| 63.66| 57.26| 
72.50
  2500000|   8.3089555|   5.3271048|   4.7332642|   6.0868267| 64.11| 56.97| 
73.26
  2600000|   8.7012622|   5.6526515|   6.2418323|   5.9347538| 64.96| 71.73| 
68.21
  2700000|   9.0890400|   6.4151675|   5.3501668|   6.5119353| 70.58| 58.86| 
71.65
  2800000|   9.4808677|   6.2532420|   5.6631027|   7.8091037| 65.96| 59.73| 
82.37
  2900000|   9.8743502|   7.0634255|   6.2118825|   8.8201260| 71.53| 62.91| 
89.32
  3000000|  10.3256523|   6.9511538|   6.6591052|   8.3594027| 67.32| 64.49| 
80.96
  3100000|  10.6624638|   9.1044838|   6.9426505|   9.0756973| 85.39| 65.11| 
85.12
  3200000|  11.0391202|   9.3681565|   6.5995335|   8.3042878| 84.86| 59.78| 
75.23
  3300000|  11.4409238|   8.9037127|   7.9782872|   9.8895037| 77.82| 69.73| 
86.44
  3400000|  11.8337323|   9.3826530|   7.9003570|   8.8188125| 79.29| 66.76| 
74.52
  3500000|  12.2619288|   9.9854242|   8.4004758|  10.8277235| 81.43| 68.51| 
88.30
  3600000|  12.6350608|   9.2284535|   8.7684477|   9.6883167| 73.04| 69.40| 
76.68
  3700000|  13.0290287|  10.0219423|   7.9458920|   9.2172728| 76.92| 60.99| 
70.74
  3800000|  13.4175375|  10.2652730|   9.3283597|  10.5036505| 76.51| 69.52| 
78.28
  3900000|  13.8761202|  10.3337800|   9.0785883|  10.5145488| 74.47| 65.43| 
75.77
  4000000|  14.2307890|  10.2822863|   9.4879767|  10.2550468| 72.25| 66.67| 
72.06
  4100000|  14.6372533|  10.5736817|  10.1029532|  10.3891662| 72.24| 69.02| 
70.98
  4200000|  15.0336058|  10.5337068|  10.5134845|  10.2025398| 70.07| 69.93| 
67.86
  4300000|  15.5160577|  10.6842507|  10.9422165|  11.3655050| 68.86| 70.52| 
73.25
  4400000|  15.8729192|  11.0660497|  10.0416582|  11.1661598| 69.72| 63.26| 
70.35
  4500000|  16.2857555|  12.2741648|  10.9625747|  11.5246120| 75.37| 67.31| 
70.76
  4600000|  16.6928295|  12.8736073|  10.8035407|  12.2006385| 77.12| 64.72| 
73.09
  4700000|  17.1521597|  13.8311882|  11.0123413|  12.2692403| 80.64| 64.20| 
71.53
  4800000|  17.5410083|  13.0844618|  11.5400537|  12.0654205| 74.59| 65.79| 
68.78
  4900000|  17.9435752|  13.6814610|  11.5714498|  12.6910732| 76.25| 64.49| 
70.73
  5000000|  18.3648375|  14.0036917|  12.2294732|  12.8078838| 76.25| 66.59| 
69.74
  5100000|  18.9270780|  12.8158653|  12.4638828|  14.0577827| 67.71| 65.85| 
74.27
  5200000|  19.2467488|  13.3429303|  11.8614190|  13.7861637| 69.33| 61.63| 
71.63
  5300000|  19.6421107|  13.2796332|  13.6513180|  14.5850862| 67.61| 69.50| 
74.25
  5400000|  20.0669643|  13.4907910|  13.1331893|  14.4872580| 67.23| 65.45| 
72.19
  5500000|  20.5011755|  14.5732598|  13.1180265|  15.1439817| 71.08| 63.99| 
73.87
  5600000|  20.9285125|  14.1670923|  13.5733598|  14.3198320| 67.69| 64.86| 
68.42
  5700000|  21.3498558|  14.4627755|  13.6241375|  15.2364483| 67.74| 63.81| 
71.37
  5800000|  21.7924077|  15.7756390|  14.5123528|  15.1231967| 72.39| 66.59| 
69.40
  5900000|  22.1947355|  15.6677667|  14.3490183|  15.8053218| 70.59| 64.65| 
71.21
  6000000|  22.6459672|  16.2716808|  14.4820557|  16.1525207| 71.85| 63.95| 
71.33
  6100000|  23.0276338|  16.9281797|  14.4261415|  18.8581180| 73.51| 62.65| 
81.89
  6200000|  23.4669493|  17.9332017|  14.5637997|  19.5159758| 76.42| 62.06| 
83.16
  6300000|  23.8996508|  17.7366787|  15.0269500|  19.9112725| 74.21| 62.88| 
83.31
  6400000|  24.3140250|  18.0436733|  15.3132077|  19.5136237| 74.21| 62.98| 
80.26
  6500000|  24.7543227|  18.7219157|  16.0110135|  20.5100225| 75.63| 64.68| 
82.85
  6600000|  25.1870445|  18.8805880|  16.7374748|  20.8851000| 74.96| 66.45| 
82.92
  6700000|  25.6177127|  19.1286965|  16.2888668|  19.8220487| 74.67| 63.58| 
77.38
  6800000|  26.0713058|  18.7273907|  16.1651245|  21.2867268| 71.83| 62.00| 
81.65
  6900000|  26.4950808|  19.4449252|  17.0182128|  20.7438385| 73.39| 64.23| 
78.29
  7000000|  26.9275547|  19.8568485|  17.1617438|  21.3584727| 73.74| 63.73| 
79.32
   100000|   0.3978673|   0.2998532|   0.3078590|   0.3411495| 75.37| 77.38| 
85.74
   200000|   0.8789767|   0.6368077|   0.6576302|   0.7353088| 72.45| 74.82| 
83.66
   300000|   1.3983128|   1.0064648|   1.0500970|   1.1425463| 71.98| 75.10| 
81.71
   400000|   1.9359040|   1.3721943|   1.4085553|   1.5700445| 70.88| 72.76| 
81.10
   500000|   2.4800897|   1.7563010|   1.8060878|   1.9932325| 70.82| 72.82| 
80.37
   600000|   3.0490373|   2.3926592|   2.4304380|   2.6741178| 78.47| 79.71| 
87.70
   700000|   3.6197758|   2.7747943|   2.8082768|   3.0912290| 76.66| 77.58| 
85.40
   800000|   4.3521243|   3.1423017|   3.2139710|   3.7783640| 72.20| 73.85| 
86.82
   900000|   5.0628238|   4.1079845|   3.6200043|   3.9675730| 81.14| 71.50| 
78.37
  1000000|   5.6434655|   4.3169872|   4.6071705|   5.3096482| 76.50| 81.64| 
94.08
  1100000|   6.2047003|   4.4521780|   5.4187478|   5.0962573| 71.75| 87.33| 
82.14
  1200000|   6.7984438|   4.9726417|   5.3031598|   5.6738918| 73.14| 78.01| 
83.46
  1300000|   7.3482207|   5.3462308|   5.4965155|   6.1964967| 72.76| 74.80| 
84.33
  1400000|   8.0316267|   5.7622462|   5.9439435|   6.5836235| 71.74| 74.01| 
81.97
  1500000|   9.1656737|   6.2962348|   6.5649298|   7.1456588| 68.69| 71.63| 
77.96
  1600000|  10.3919693|   6.7178642|   7.1005757|   7.7116248| 64.64| 68.33| 
74.21
  1700000|  11.6117077|   7.8524675|   7.9899193|   9.7739287| 67.63| 68.81| 
84.17
  1800000|  11.4018983|   8.0053415|   8.5621322|   9.5183545| 70.21| 75.09| 
83.48
  1900000|  12.1405887|   8.4415043|   9.2024107|  10.0406660| 69.53| 75.80| 
82.70
  2000000|  12.3381950|  10.2043183|   9.8370225|  10.1819245| 82.71| 79.73| 
82.52
  2100000|  12.8672953|  10.2638367|  10.3439703|  10.5861912| 79.77| 80.39| 
82.27
  2200000|  13.4833542|  10.9447213|  10.8687502|  11.0462952| 81.17| 80.61| 
81.93
  2300000|  14.2852452|  11.1499217|  10.9930007|  14.1792202| 78.05| 76.95| 
99.26
  2400000|  14.8447325|  11.3651677|  11.3474408|  14.3109407| 76.56| 76.44| 
96.40
  2500000|  15.4321858|  11.7137833|  11.8522885|  14.3483687| 75.90| 76.80| 
92.98
  2600000|  16.1901970|  12.3081988|  12.1368495|  14.4641163| 76.02| 74.96| 
89.34
  2700000|  16.7516397|  12.4508170|  12.5707602|  14.9853693| 74.33| 75.04| 
89.46
  2800000|  17.4807155|  13.1052888|  16.7829440|  16.6672353| 74.97| 96.01| 
95.35
  2900000|  18.0867305|  13.4141880|  16.4978302|  17.1204150| 74.17| 91.22| 
94.66
  3000000|  19.0745635|  13.8309870|  16.1753772|  17.0255425| 72.51| 84.80| 
89.26
  3100000|  19.7385772|  14.7141372|  17.9776735|  17.6391917| 74.55| 91.08| 
89.36
  3200000|  20.7537483|  14.9303838|  17.8117857|  18.0187665| 71.94| 85.82| 
86.82
  3300000|  20.9705468|  15.2268568|  18.0852695|  17.8424908| 72.61| 86.24| 
85.08
  3400000|  21.8229760|  16.3108150|  18.4166485|  19.7988522| 74.74| 84.39| 
90.72
  3500000|  22.9095148|  16.8159358|  19.0675020|  19.8558135| 73.40| 83.23| 
86.67
  3600000|  23.5924702|  17.4861533|  19.5627023|  20.1191955| 74.12| 82.92| 
85.28
  3700000|  24.7362342|  17.9497067|  19.1692668|  20.6357860| 72.56| 77.49| 
83.42
  3800000|  26.5868427|  18.4400995|  20.6527830|  21.1587232| 69.36| 77.68| 
79.58
#!/usr/bin/perl -w

use strict;
use Time::HiRes qw(gettimeofday tv_interval);

my ($prog, $max_conc) = @ARGV;

$max_conc ||= 4;

$prog or die "usage: $0 SORT_PROGRAM [MAX_CONCURRENCY]\n";

my @points = map{$_*100_000}1..100;

# Number of samples (excluding outlayers) for each point.
my $n = 10;

my @conc = 1..$max_conc;

my @rot = 0..($max_conc - 1);

my @times = ();

my $t0;
for my $i (@points) {
  @times = map { [] } 1..$max_conc;

  # Make fresh data for each sample.  Rotate who gets first crack
  # at the data.
  for (0..$n+3) {
    system "cat /dev/urandom | base64 | head -$i | cut -da -f1 > bench_in";
    for my $j (0..($max_conc - 1)) {
      $t0 = [gettimeofday()];
      system "$prog --concurrency=$conc[$rot[$j]] bench_in > /dev/null";
      push @{$times[$rot[$j]]}, tv_interval($t0);
    }
    push @rot, shift @rot; # rotate who goes first
  }

  # Take the average time for each concurrency level.
  for my $j (0..($max_conc - 1)){
    my $total = 0;
    # Drop two outlayers on each end
    for my $val ((sort {$a<=>$b} @{$times[$j]})[2..$n-3]) {
      $total += $val;
    }
    $times[$j] = $total / ($n - 4);
  }

  # Output
  # 1. Number of records
  # 2. Average time for each concurrency (including single process)
  # 3. Percentage of single-process time each concurrency level took
  print join("|",
    sprintf ('%9d', $i),
    +(map { sprintf ('%12.7f', $_) address@hidden($max_conc - 1)]),
    +(map { sprintf ('%6.2f', $_/$times[0]*100) address@hidden($max_conc - 1)]),
  ),"\n";
}

reply via email to

[Prev in Thread] Current Thread [Next in Thread]