bug-coreutils

Re: Parallelizing sort


From: Jim Meyering
Subject: Re: Parallelizing sort
Date: Thu, 12 Feb 2009 11:14:59 +0100

Glen Lenker <address@hidden> wrote:
> On Wed, Feb 11, 2009 at 10:12:42PM -0800, Nima Nikzad wrote:
...
> > Specifically, we are looking at how we
> > can better utilize modern multi-core processors when sorting and merging
> > files.  My group is first taking a look at how to parallelize the external
> > sort phase of sort's operation.
...
> > Does anyone have any feedback on threading technologies
> > that would be appropriate for this project or does anyone think OpenMP is a
> > poor choice?
...
> As mentioned above, my group is primarily focused on parallelizing the
> "in-memory" sort, but at the moment we are still considering working to
> parallelize other aspects of the current sort as well. At the moment we
> are considering using the Gnulib thread module as our threading library.
>
> Jim: We heard from Paul Eggert today that you have looked into how sort
> would benefit from parallelization. I am particularly interested in your
> approach. If you don't mind, perhaps you could start the discussion on
> this.

Hello Nima and Glen,

I have a bias toward using pthreads.
If pthreads support is unavailable, then it's fine to fall back
on the existing single-threaded code... at least initially.

For an example of code that uses pthreads, git comes to mind,
with its parallelized delta-finding code, which has proven
to be quite portable.

You've probably already done this, but at a high level, starting
with all input already in memory, I imagined a function by which
each thread would identify the lines representing its portion:

    static void
    dice (unsigned int n_threads, unsigned int k, char eolchar,
          char const *buf, size_t buf_len,
          char const **b, size_t *b_len)

("dice" as in "slice and dice", not the dotted cubes)

Looking at the results from a global perspective,
if B[k] and B_LEN[k] describe the buffer of the K'th thread, then
B[0] == BUF, B[k] + B_LEN[k] == B[k+1] for k < N_THREADS - 1,
each B[k] begins at the start of a line, and
B[k][B_LEN[k] - 1] is eolchar, except possibly for the final slice,
which may end at the end of BUF without a trailing eolchar.
Of course, some (or even all!) of the slices may be empty.

The code to extract the K'th portion of a file might be
useful in its own right.  Consider a tool that lets you
efficiently extract the K'th (of N) portion of a regular file
(given an optional single-byte record delimiter) and write it
to standard output.  Then you could perform a parallel sort
using only a shell with process substitution, such as bash or zsh:

Given an input file, INPUT, you might do this on a quad-core system:

    sort -m \
        <(sort <(dice -k1,4 INPUT)) \
        <(sort <(dice -k2,4 INPUT)) \
        <(sort <(dice -k3,4 INPUT)) \
        <(sort <(dice -k4,4 INPUT)) \
      > out

Of course, this will work best when INPUT is already in memory,
say on a memory-backed file system, or on an SSD, where there is
no penalty for "seeking".

If nothing else, it would give you another easily-computed data point
when comparing parallelized-sort implementations.

On Wed, Feb 11, 2009 at 10:12:42PM -0800, Nima Nikzad wrote:
> Does anyone have any feedback on threading technologies
> that would be appropriate for this project or does anyone think OpenMP is a
> poor choice?

Starting with OpenMP is fine, especially if that's
what you're most familiar with, but I suspect that GNU sort
will settle on a minimalistic approach, even if that means
writing more low-level code.



