bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: sort --random-sort


From: Frederik Eaton
Subject: Re: sort --random-sort
Date: Sat, 26 Nov 2005 07:21:47 +0000
User-agent: mutt-ng/devel-r472 (Debian)

Actually, here is a new version which should hopefully fix some
indentation problems.

Frederik

On Sat, Nov 26, 2005 at 07:03:31AM +0000, Frederik Eaton wrote:
> Hi all,
> 
> This is my second (or third? or fourth?) attempt at a patch to sort to
> introduce shuffling behavior. However, I encourage people to take a
> good look at it because it's much more "polished" than the others. 
> I've included everything I think should be included - documentation,
> ChangeLog entries, etc., so hopefully there won't have to be many more
> changes.
> 
> I'm sorry it's taken me so long to return to this - I've moved
> countries, been occupied with grad school, etc.
> 
> There are two patches:
> 
> - shred-isaac-split.patch
>     splits ISAAC code out of shred.c into new files isaac.c, isaac.h
> 
> - sort-with-isaac.patch
>     adds --sort-random etc. to sort.c, using ISAAC RNG.
> 
> They should be applied in this order.
> 
> Notes:
> 
> - The randomization is "cryptographically secure" due to use of ISAAC,
> as has been requested by Paul Eggert. Earlier versions used a separate
> random number implementation but it was pointed out that shred.c
> already contained the necessary code (based on a library from Bob
> Jenkins called ISAAC), so the main change in this version has been to
> separate out our version of ISAAC from shred.c and use it instead. 
> Ultimately I hope this can be moved to glibc, so that making things
> cryptographically secure doesn't require any extra effort, but that's
> a separate issue.
> 
> - The second patch changes a constant ISAAC_LOG in isaac.h from 8 to
> 3. The code explicitly says that this can be done, so, going by that
> I'm assuming that the change is safe. Theoretically this could make
> "shred" slower but I haven't noticed any effect, if there is one it is
> surely less than 1%. The reason for the change is that the code I
> added to 'sort' has to copy the ISAAC state twice with every
> comparison, so when the ISAAC state is too large it slows things down
> a bit. The change speeds "sort -R" up by a factor of 14 on a 1e5-line
> test file.
> 
> $ wc -l /usr/share/dict/words
> 96274 /usr/share/dict/words
> $ time ./inst-log-3/bin/sort -R < /usr/share/dict/words >/dev/null
> ./inst-log-3/bin/sort -R < /usr/share/dict/words > /dev/null  1.00s user 
> 0.01s system 99% cpu 1.016 total
> $ time ./inst-log-8/bin/sort -R < /usr/share/dict/words >/dev/null
> ./inst-log-8/bin/sort -R < /usr/share/dict/words > /dev/null  13.98s user 
> 0.01s system 99% cpu 14.010 total
> 
> Now the speed of random-sort is comparable to normal sort, it is only
> about 50% slower (in my earlier patch it was about 5 times slower).
> 
> $ time ./inst-log-3/bin/sort < /usr/share/dict/words >/dev/null
> ./inst-log-3/bin/sort < /usr/share/dict/words > /dev/null  0.68s user 0.01s 
> system 99% cpu 0.695 total
> 
> - I recall some argument about randomness vs. pseudorandomness. I do
> not mention pseudorandomness in the documentation, because I maintain
> that it is not part of the interface. It is not mentioned in the
> "shred" documentation either (only the word "random" is used), as I
> believe is proper.
> 
> - I'm using "--random-sort" as the long option name, rather than
> "--random" as in the previous versions, to be consistent with
> "--numeric-sort", "--month-sort", etc.
> 
> Thanks to everyone who participated in the discussions regarding this
> feature, and who critiqued the earlier versions.
> 
> Frederik

Attachment: shred-isaac-split.patch
Description: Text document

Attachment: sort-with-isaac.patch
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]