[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
sort --random-sort
From: |
Frederik Eaton |
Subject: |
sort --random-sort |
Date: |
Sat, 26 Nov 2005 07:03:31 +0000 |
User-agent: |
mutt-ng/devel-r472 (Debian) |
Hi all,
This is my second (or third? or fourth?) attempt at a patch to sort to
introduce shuffling behavior. However, I encourage people to take a
good look at it because it's much more "polished" than the others.
I've included everything I think should be included - documentation,
ChangeLog entries, etc., so hopefully there won't have to be many more
changes.
I'm sorry it's taken me so long to return to this - I've moved
countries, been occupied with grad school, etc.
There are two patches:
- shred-isaac-split.patch
splits ISAAC code out of shred.c into new files isaac.c, isaac.h
- sort-with-isaac.patch
adds --sort-random etc. to sort.c, using ISAAC RNG.
They should be applied in this order.
Notes:
- The randomization is "cryptographically secure" due to use of ISAAC,
as has been requested by Paul Eggert. Earlier versions used a separate
random number implementation but it was pointed out that shred.c
already contained the necessary code (based on a library from Bob
Jenkins called ISAAC), so the main change in this version has been to
separate out our version of ISAAC from shred.c and use it instead.
Ultimately I hope this can be moved to glibc, so that making things
cryptographically secure doesn't require any extra effort, but that's
a separate issue.
- The second patch changes a constant ISAAC_LOG in isaac.h from 8 to
3. The code explicitly says that this can be done, so, going by that
I'm assuming that the change is safe. Theoretically this could make
"shred" slower but I haven't noticed any effect, if there is one it is
surely less than 1%. The reason for the change is that the code I
added to 'sort' has to copy the ISAAC state twice with every
comparison, so when the ISAAC state is too large it slows things down
a bit. The change speeds "sort -R" up by a factor of 14 on a 1e5-line
test file.
$ wc -l /usr/share/dict/words
96274 /usr/share/dict/words
$ time ./inst-log-3/bin/sort -R < /usr/share/dict/words >/dev/null
./inst-log-3/bin/sort -R < /usr/share/dict/words > /dev/null 1.00s user 0.01s
system 99% cpu 1.016 total
$ time ./inst-log-8/bin/sort -R < /usr/share/dict/words >/dev/null
./inst-log-8/bin/sort -R < /usr/share/dict/words > /dev/null 13.98s user 0.01s
system 99% cpu 14.010 total
Now the speed of random-sort is comparable to normal sort, it is only
about 50% slower (in my earlier patch it was about 5 times slower).
$ time ./inst-log-3/bin/sort < /usr/share/dict/words >/dev/null
./inst-log-3/bin/sort < /usr/share/dict/words > /dev/null 0.68s user 0.01s
system 99% cpu 0.695 total
- I recall some argument about randomness vs. pseudorandomness. I do
not mention pseudorandomness in the documentation, because I maintain
that it is not part of the interface. It is not mentioned in the
"shred" documentation either (only the word "random" is used), as I
believe is proper.
- I'm using "--random-sort" as the long option name, rather than
"--random" as in the previous versions, to be consistent with
"--numeric-sort", "--month-sort", etc.
Thanks to everyone who participated in the discussions regarding this
feature, and who critiqued the earlier versions.
Frederik
shred-isaac-split.patch
Description: Text document
sort-with-isaac.patch
Description: Text document
- sort --random-sort,
Frederik Eaton <=
- Re: sort --random-sort, Frederik Eaton, 2005/11/26
- Re: sort --random-sort, Jim Meyering, 2005/11/26
- Re: sort --random-sort, Andreas Schwab, 2005/11/26
- Re: sort --random-sort, Frederik Eaton, 2005/11/26
- Re: sort --random-sort, Jim Meyering, 2005/11/29
- Re: sort --random-sort, Paul Eggert, 2005/11/29
- Re: sort --random-sort, Frederik Eaton, 2005/11/30
- Re: sort --random-sort, Jim Meyering, 2005/11/30
- Re: sort --random-sort, Frederik Eaton, 2005/11/30
- Re: sort --random-sort, Paul Eggert, 2005/11/28