coreutils doc improvements for sort -R

bug-coreutils

From:	Paul Eggert
Subject:	coreutils doc improvements for sort -R
Date:	Mon, 12 Dec 2005 15:22:08 -0800
User-agent:	Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

I installed this.  The most controversial part here, perhaps, is to
document how reproducible the results are across different platforms
when you use sort -R --seed=STRING.  The current documentation says
you can't rely on the results, which is true for the current
implementation (in theory, anyway: if people stick to common
architectures the results should be reproducible, right?), but perhaps
we'd like to fix and/or document this better.

One other issue is whether to document the internal limitations of
"sort -R".  Currently it uses an internal random state of 8192 bits,
which in practice is overkill but in theory is only enough to generate
a random permutation of a few hundred distinct keys: after that, the
permutation won't be completely random.  Also, I can't imagine people
specifying --seed=STRING where STRING contains 8192 bits of
information, at least not if we don't document what's going on....
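
The limit described above follows from a counting argument: a generator with k bits of state can reach at most 2^k distinct outputs, so it cannot produce every ordering of n keys once n! exceeds 2^k. A minimal sketch of that arithmetic follows; the function name is made up for illustration and is not part of coreutils.

```python
def max_uniform_permutation_size(state_bits):
    """Return the largest n such that n! <= 2**state_bits.

    Beyond this n there are more permutations than generator states,
    so some orderings can never be produced.
    """
    limit = 1 << state_bits      # number of distinct internal states
    n = 1
    factorial = 1                # invariant: factorial == n!
    while factorial * (n + 1) <= limit:
        n += 1
        factorial *= n
    return n


if __name__ == "__main__":
    # 8192 is the state size mentioned in the message above.
    print(max_uniform_permutation_size(8192))
```

Exact integer arithmetic is used so that the bound does not depend on floating-point rounding.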

If we don't document this issue, I suspect we'll get email from
pedants every now and then saying we aren't really sorting at random,
so it is probably worth addressing it.  Any volunteers?


2005-12-12  Paul Eggert  <address@hidden>

        * doc/coreutils.texi (sort invocation): Clarify explanation of
        --random-sort, and use a simpler example.

Index: doc/coreutils.texi
===================================================================
RCS file: /fetish/cu/doc/coreutils.texi,v
retrieving revision 1.299
retrieving revision 1.300
diff -p -u -r1.299 -r1.300
--- doc/coreutils.texi  10 Dec 2005 08:10:20 -0000      1.299
+++ doc/coreutils.texi  12 Dec 2005 22:42:16 -0000      1.300
@@ -3401,9 +3401,10 @@ appear earlier in the output instead of 
 @opindex -R
 @opindex --random-sort
 @cindex random sort
-
-Sort by random hash, i.e. perform a shuffle. This is done by hashing
-the input keys and sorting based on the results.
+Sort by hashing the input keys and then sorting the hash values.  This
+is much like a random shuffle of the inputs, except that keys with the
+same value sort together.  Normally the hash function is chosen at
+random, but this can be overridden with the @option{--seed} option.
 
 @end table
 
@@ -3538,10 +3539,16 @@ This option can be useful in conjunction
 reliably handle arbitrary file names (even those containing blanks
 or other special characters).
 
-@item --seed @var{tempdir}
+@item --seed=@var{string}
 @opindex --seed
-@cindex specify seed for random hash
-Specify a seed for the @option{--random-sort} option.
+@cindex seed for random hash
+Use data from @var{string} to choose the hash function used by the
+@option{--random-sort} option.  This option can be used to reproduce
+results of earlier invocations of @command{sort} with
+@option{--random-sort}.  However, results are not necessarily
+reproducible across different @command{sort} implementations (e.g.,
address@hidden on little-endian versus big-endian architectures, or
+from one version of @command{sort} to the next).
 
 @end table
 
@@ -3716,7 +3723,7 @@ playlist in which albums are shuffled bu
 played in order.
 
 @example
-find . -maxdepth 2 -type f | sort -t / -k2,2R -k3,3
+ls */* | sort -t / -k 1,1R -k 2,2
 @end example
 
 @end itemize
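
The revised text in the patch describes sorting by a hash of each key, so that equal keys end up together and a seed makes the order repeatable. The sketch below illustrates that idea only. It assumes SHA-256 salted with the seed as a stand-in hash; the hash that sort actually uses is not stated here, and all function names are hypothetical.

```python
import hashlib


def hash_key(key, seed):
    """Map a key to a repeatable pseudo-random sort value.

    Assumption: SHA-256 over seed + key stands in for whatever hash
    function sort really selects.
    """
    return hashlib.sha256(seed.encode() + b"\0" + key.encode()).digest()


def random_sort(lines, seed):
    """Order whole lines by their hash; identical lines stay adjacent."""
    return sorted(lines, key=lambda line: hash_key(line, seed))


def shuffle_albums(paths, seed):
    """Order 'album/track' paths: albums by hash, tracks in normal order.

    This mirrors the two-key example in the patch (first field random,
    second field ordinary).
    """
    def sort_key(path):
        album, _, track = path.partition("/")
        return (hash_key(album, seed), track)
    return sorted(paths, key=sort_key)


if __name__ == "__main__":
    print(random_sort(["b", "a", "b", "c", "a"], "example-seed"))
    print(shuffle_albums(["x/2", "y/1", "x/1", "y/2"], "example-seed"))
```

Because the order depends only on the seed and the key contents, rerunning with the same seed gives the same result regardless of input order, while a different seed generally gives a different one.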
