bug-coreutils

Re: [PATCH] sort: parallel external sort implementation


From: Pádraig Brady
Subject: Re: [PATCH] sort: parallel external sort implementation
Date: Fri, 05 Mar 2010 01:13:12 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100216 Thunderbird/3.0.2

On 05/03/10 00:39, Joey Degges wrote:


2010/3/4 Pádraig Brady <address@hidden>
    Have you considered the seek characteristics of SSDs
    and how they might affect things (with consideration
    that mechanical disks will be replaced by SSDs quite quickly).
    There still would be some benefit splitting per SSD,
    but it would be worth measuring to see.


I will post some results from testing with various flash keys, but I do
not have any proper SSDs to play with. The extreme case here would be
sorting from multiple ramdisks, in which case there is likely to be no
improvement whatsoever --

Right. In general it's worth posting the results for
counter cases like this to help with decisions.

supposing the underlying "do_sort" function
can process a single file in parallel. In this worst case it might be
useful to expose a "--no-multidisk" flag allowing the user to disable
this feature (or a "--multidisk" flag to enable it).

I'm not fond of options for this because if the user
needed to make that decision, then they could nearly
as easily and more generally do:

sort -m <(find /flash/ -type f | xargs -P2 -n1 sort) \
        <(find /mech/  -type f | xargs -n1 sort)
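
A fuller sketch of that approach, not part of the original message: `sort -m` only merges inputs that are already sorted, so this version first writes one sorted run per file, merges the runs for each device, and only then merges across devices. The helper names `sort_tree` and `sort_devices`, the temporary-directory layout, and the job counts (2 and 1, mirroring the `-P2` above) are assumptions made for illustration. It uses temporary files instead of process substitution so that it runs under plain `sh`, and it assumes each tree holds at least one regular file and that `find -print0`, `xargs -0 -P` and `mktemp -d` are available, as on GNU and BSD systems.

```shell
#!/bin/sh
# Hypothetical helper: sort every regular file under directory $1 using up to
# $2 concurrent sort processes, then merge the sorted runs to stdout.
# The body is a subshell so its variables do not leak into the caller.
sort_tree() (
    dir=$1
    jobs=$2
    runs=$(mktemp -d) || exit 1
    # One sorted run per input file. Each sort writes to its own temp file,
    # so parallel jobs cannot interleave their output.
    find "$dir" -type f -print0 |
        xargs -0 -P "$jobs" -n 1 \
            sh -c 'sort -o "$(mktemp "$0/run.XXXXXX")" "$1"' "$runs"
    # Every run is sorted, so a merge is enough to combine them.
    sort -m "$runs"/run.*
    rm -rf "$runs"
)

# Hypothetical helper: process two directory trees (for example, one per
# device) concurrently, then merge the two sorted results to stdout.
sort_devices() (
    out=$(mktemp -d) || exit 1
    sort_tree "$1" 2 > "$out/a" &   # more parallelism for the first tree
    sort_tree "$2" 1 > "$out/b" &   # one sort at a time for the second tree
    wait
    sort -m "$out/a" "$out/b"
    rm -rf "$out"
)
```

With these definitions loaded, `sort_devices /flash /mech > sorted.txt` would play the role of the one-liner above.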

        +          unsigned long int np2 = num_processors (NPROC_CURRENT) / 2;

    You probably want NPROC_CURRENT_OVERRIDABLE?

Would we want to use the OpenMP environment variable to affect the
number of pthreads that are used? A more generic PARALLEL variable might
be better suited.

That would be a better name, but it's nonstandard.
OMP_NUM_THREADS can be used to configure all OpenMP programs,
and the coreutils nproc command also honors it.
So for consistency at least, _OVERRIDABLE might be best.
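
As a rough illustration of what "overridable" means here, a script could pick its worker count the same way: take OMP_NUM_THREADS when it holds a positive integer, otherwise fall back to the number of online processors. This is a simplified shell analogue, not gnulib's actual parsing. The function name `worker_count` is hypothetical, and the fallback assumes `getconf _NPROCESSORS_ONLN` is supported on the system.

```shell
#!/bin/sh
# Hypothetical helper: print the number of workers a script should start.
# A positive integer in OMP_NUM_THREADS takes precedence; anything else
# (unset, empty, non-numeric, or zero) falls back to the online CPU count.
worker_count() {
    case ${OMP_NUM_THREADS:-} in
        ''|*[!0-9]*)
            ;;  # unset, empty, or not a plain decimal number: ignore it
        *)
            if [ "$OMP_NUM_THREADS" -gt 0 ]; then
                echo "$OMP_NUM_THREADS"
                return 0
            fi
            ;;
    esac
    # Assumption: getconf knows _NPROCESSORS_ONLN (true on glibc and macOS).
    getconf _NPROCESSORS_ONLN
}
```

A caller might then write `xargs -P "$(worker_count)" ...`, so that a user who has already set OMP_NUM_THREADS for other tools gets the same limit here.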

cheers,
Pádraig.



