[Emacs-bug-tracker] bug#7182: closed (sort -R slow)

From:	GNU bug Tracking System
Subject:	[Emacs-bug-tracker] bug#7182: closed (sort -R slow)
Date:	Sun, 07 Aug 2011 20:44:02 +0000

Your message dated Sun, 07 Aug 2011 22:42:52 +0200
with message-id <address@hidden>
and subject line Re: bug#7182: sort -R slow
has caused the GNU bug report #7182,
regarding sort -R slow
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
7182: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=7182
GNU Bug Tracking System
Contact address@hidden with problems

--- Begin Message ---
Subject:	sort -R slow
Date:	Sat, 9 Oct 2010 14:52:41 +0200

I recently needed to randomize some lines, so I tried using 'sort -R'.
I was astonished at how slow it was, so I timed some competing
strategies. GNU sort is two orders of magnitude slower than unsort and
more than one order of magnitude slower than perl:

$ time unsort file
real    0m1.388s

$ unsort --version
unsort 1.1.2

$ time perl -e 'print sort { rand() <=> rand() } <>' file
real    0m6.621s

$ time sort -R file
real    4m8.403s

$ sort --version
sort (GNU coreutils) 8.5

What is even scarier: sort without -R is faster than sort -R:

$ time sort file
real    0m53.553s

I would expect sort -R to be faster than sort and faster than Perl if
not as fast as unsort.


/Ole

--- End Message ---

--- Begin Message ---
Subject:	Re: bug#7182: sort -R slow
Date:	Sun, 07 Aug 2011 22:42:52 +0200

Davide Brini wrote:
> On Sat, 9 Oct 2010 14:52:41 +0200 Ole Tange <address@hidden> wrote:
>
>> I recently needed to randomize some lines, so I tried using 'sort -R'.
>> I was astonished at how slow it was, so I timed some competing
>> strategies. GNU sort is two orders of magnitude slower than unsort and
>> more than one order of magnitude slower than perl:
>>
>> $ time unsort file
>> real    0m1.388s
>>
>> $ unsort --version
>> unsort 1.1.2
>>
>> $ time perl -e 'print sort { rand() <=> rand() } <>' file
>> real    0m6.621s
>>
>> $ time sort -R file
>> real    4m8.403s
>>
>> $ sort --version
>> sort (GNU coreutils) 8.5
>>
>> What is even scarier: sort without -R is faster than sort -R:
>>
>> $ time sort file
>> real    0m53.553s
>>
>> I would expect sort -R to be faster than sort and faster than Perl if
>> not as fast as unsort.
>
> On my system, locale settings seem to impact the runtime significantly:
>
> $ wc -l bigfile
> 1000000 bigfile
>
> $ time LC_ALL=en_US.utf8 sort -R bigfile > /dev/null
>
> real  1m29.302s
> user  1m21.009s
> sys   0m0.155s
>
> $ time LC_ALL=C sort -R bigfile > /dev/null
>
> real  0m38.881s
> user  0m35.276s
> sys   0m0.118s
>
>
> However, shuf is much faster, and seems mostly unaffected by the locale
> used:
>
> $ time shuf bigfile > /dev/null
>
> real  0m1.044s
> user  0m0.833s
> sys   0m0.042s

Thanks for the report.
I think sort -R will often be slower than shuf, and that is by design:
to compute the hash it sorts on, sort -R has to access each byte of
each line one more time. The exception is when the input is larger than
available memory, where sort can fall back on temporary files.
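
As a rough illustration of the difference described above, here is a
minimal Python sketch. It is not the coreutils implementation. It assumes
sort -R can be modeled as "sort the lines by a salted hash of their
contents" and shuf as a plain in-memory shuffle. The function names and
the use of MD5 from hashlib are assumptions made for illustration only.

```python
import hashlib
import os
import random


def sort_by_keyed_hash(lines, salt=None):
    """Model of the sort -R idea: order lines by a salted hash.

    Every byte of every line is read once more to compute the hash,
    and the hashes then go through an ordinary comparison sort.
    """
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per run (assumption)

    def key(line):
        # MD5 is only a stand-in hash for this sketch.
        return hashlib.md5(salt + line.encode("utf-8")).digest()

    return sorted(lines, key=key)


def shuffle_lines(lines, rng=None):
    """Model of the shuf idea: permute the lines without hashing them."""
    rng = rng or random.Random()
    out = list(lines)  # copy so the caller's list is left untouched
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    sample = ["a", "b", "a", "c", "b", "a"]
    print(sort_by_keyed_hash(sample))
    print(shuffle_lines(sample))
```

One visible consequence of the hash approach: identical lines get
identical keys, so they come out next to each other, whereas a shuffle
can separate them.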

The info documentation for sort -R does refer to "shuf".

Any suggestions for improvements are welcome.
I'm closing this.

You're welcome to reopen or file a new report.

--- End Message ---
