coreutils doc improvements for sort -R
From: Paul Eggert
Subject: coreutils doc improvements for sort -R
Date: Mon, 12 Dec 2005 15:22:08 -0800
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

I installed this. Perhaps the most controversial part is documenting
how reproducible the results are across different platforms when you
use sort -R --seed=STRING. The current documentation says you can't
rely on the results, which is true of the current implementation (in
theory, anyway: if people stick to common architectures the results
should be reproducible, right?), but perhaps we'd like to fix this
and/or document it better.

One other issue is whether to document the internal limitations of
"sort -R". Currently it uses an internal random state of 8192 bits,
which in practice is overkill but in theory is only enough to generate
a random permutation of a few hundred distinct keys: beyond that, the
state cannot reach every possible ordering, so the permutation won't
be completely random. Also, I can't imagine people specifying
--seed=STRING where STRING contains 8192 bits of information, at least
not if we don't document what's going on....
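
To put a rough number on that limit: a uniformly random permutation of
n distinct keys needs at least log2(n!) bits of state, so the cutoff is
the largest n with n! <= 2^8192. The Python sketch below just does that
arithmetic. The function name is made up for illustration, and it says
nothing about how sort actually consumes its state.

```python
def max_fully_random_keys(state_bits):
    """Largest n such that n! <= 2**state_bits.

    With state_bits of random state there are at most 2**state_bits
    distinct outcomes, so only n! <= 2**state_bits permutations can
    all be reachable.  Illustrative arithmetic only.
    """
    limit = 1 << state_bits
    n, factorial = 1, 1
    # Grow n while (n + 1)! still fits within the number of states.
    while factorial * (n + 1) <= limit:
        n += 1
        factorial *= n
    return n


if __name__ == "__main__":
    # 8192 is the state size quoted in the message above.
    print(max_fully_random_keys(8192))
```

For 8192 bits this comes out somewhat under a thousand keys.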

If we don't document this issue, I suspect we'll get email from
pedants every now and then saying we aren't really sorting at random,
so it is probably worth addressing. Any volunteers?

2005-12-12 Paul Eggert <address@hidden>
* doc/coreutils.texi (sort invocation): Clarify explanation of
--random-sort, and use a simpler example.
Index: doc/coreutils.texi
===================================================================
RCS file: /fetish/cu/doc/coreutils.texi,v
retrieving revision 1.299
retrieving revision 1.300
diff -p -u -r1.299 -r1.300
--- doc/coreutils.texi 10 Dec 2005 08:10:20 -0000 1.299
+++ doc/coreutils.texi 12 Dec 2005 22:42:16 -0000 1.300
@@ -3401,9 +3401,10 @@ appear earlier in the output instead of
@opindex -R
@opindex --random-sort
@cindex random sort
-
-Sort by random hash, i.e. perform a shuffle. This is done by hashing
-the input keys and sorting based on the results.
+Sort by hashing the input keys and then sorting the hash values. This
+is much like a random shuffle of the inputs, except that keys with the
+same value sort together. Normally the hash function is chosen at
+random, but this can be overridden with the @option{--seed} option.
@end table
@@ -3538,10 +3539,16 @@ This option can be useful in conjunction
reliably handle arbitrary file names (even those containing blanks
or other special characters).
address@hidden --seed @var{tempdir}
address@hidden address@hidden
@opindex --seed
address@hidden specify seed for random hash
-Specify a seed for the @option{--random-sort} option.
address@hidden seed for random hash
+Use data from @var{string} to choose the hash function used by the
address@hidden option. This option can be used to reproduce
+results of earlier invocations of @command{sort} with
address@hidden However, results are not necessarily
+reproducible across different @command{sort} implementations (e.g.,
address@hidden on little-endian versus big-endian architectures, or
+from one version of @command{sort} to the next).
@end table
@@ -3716,7 +3723,7 @@ playlist in which albums are shuffled bu
played in order.
@example
-find . -maxdepth 2 -type f | sort -t / -k2,2R -k3,3
+ls */* | sort -t / -k 1,1R -k 2,2
@end example
@end itemize
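
The behavior that the revised --random-sort and --seed text describes
(hash each key, sort by the hash values, let the seed pick the hash
function) can be mimicked in a few lines of Python. This is only a
sketch of the idea: random_sort is a hypothetical name, whole lines
stand in for keys, and hashlib's keyed BLAKE2 is an assumed stand-in,
not the hash that sort itself uses.

```python
import hashlib
import os


def random_sort(lines, seed=None):
    """Order lines by a keyed hash of each line.

    Equal lines get equal hashes, so they end up next to each other,
    unlike in a true shuffle.  Hypothetical helper for illustration.
    """
    if seed is None:
        # No seed: choose the hash function at random for this run.
        key = os.urandom(16)
    else:
        # Derive a fixed-size key from the seed string, so the same
        # seed always selects the same hash function.
        key = hashlib.sha256(seed.encode()).digest()[:16]

    def digest(line):
        return hashlib.blake2b(line.encode(), key=key).digest()

    return sorted(lines, key=digest)


if __name__ == "__main__":
    data = ["b", "a", "b", "c", "a"]
    print(random_sort(data, seed="example"))
    print(random_sort(data))
```

Within this one Python implementation the same seed gives the same
order on every run. That guarantee does not carry over to a different
implementation or hash, which is the caveat the new --seed text makes.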