bug-coreutils

Re: [PATCH] Makes sort create random order


From: Paul Jarc
Subject: Re: [PATCH] Makes sort create random order
Date: Thu, 02 Sep 2004 10:52:44 -0400
User-agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3 (gnu/linux)

Paul Eggert <address@hidden> wrote:
> Thomas Habets <address@hidden> writes:
>
>>>   sort: Add an ordering option -R that causes 'sort' to sort according
>>>     to a random permutation of the correct sort order.
>>
>> This means that two different files, that happen to sort to the same output,
>> should give the same output when randomized with the same SEED. Is that
>> right? [*]
>
> Sort of, but not quite.

I couldn't find the "not quite" part of your explanation.

>> Is there a good reason for wanting this?
>
> By "this" do you mean "a fairly-formal definition", or "this
> particular definition of random sorting"?  [...]  If the latter,
> then because we want sort -R to have the usual properties that
> people expect from "sort", e.g., "sort -rR" should output in the
> reverse order of "sort -R".

Nit: they shouldn't expect that unless they also specify a seed.  But
sort -R can still provide this just by permuting the original input
order, rather than the correct sort order.  If we have a file A, and
we do:
$ sort -R A > B
$ sort -R --seed=deadbeef A > A1
$ sort -R --seed=deadbeef A > A2
$ sort -R --seed=deadbeef B > B1
$ sort -R --seed=deadbeef B > B2

Then we should expect that A1 and A2 have the same contents, and that
B1 and B2 have the same contents.  But the TODO requirement would also
ensure that A1/A2 have the same contents as B1/B2.  Is that really
needed?
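
The difference between the two definitions can be sketched in Python.
This is only an illustration under stated assumptions, not the proposed
patch: the function names are hypothetical, a seeded random.Random
stands in for whatever generator sort would use, and B is built as a
fixed rotation of A to stand in for the output of an unseeded
"sort -R A".

```python
import random

def shuffle_input_order(lines, seed):
    """Hypothetical model: permute the lines in the order they were given."""
    out = list(lines)
    random.Random(seed).shuffle(out)
    return out

def shuffle_sorted_order(lines, seed):
    """Hypothetical model of the TODO wording: sort first, then permute."""
    out = sorted(lines)
    random.Random(seed).shuffle(out)
    return out

A = ["pear", "apple", "fig", "kiwi", "plum"]
B = A[1:] + A[:1]   # some other permutation of A (assumption: a rotation)

seed = 0xdeadbeef
A1 = shuffle_input_order(A, seed)
A2 = shuffle_input_order(A, seed)
B1 = shuffle_input_order(B, seed)
B2 = shuffle_input_order(B, seed)

# Same file and same seed give the same output under either model.
print(A1 == A2, B1 == B2)

# Permuting the input order does not make A's output match B's ...
print(A1 == B1)

# ... whereas permuting the sorted order does, which is the extra
# guarantee the TODO requirement adds.
print(shuffle_sorted_order(A, seed) == shuffle_sorted_order(B, seed))
```

In either model, reversing the seeded output gives the "sort -rR"
property quoted above; only the second model makes the result
independent of the input order.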

I'm also not sure that clustering lines with equivalent sort keys is
desirable.
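
To make "clustering" concrete, here is a rough Python sketch of a
random order that keeps lines with equal keys adjacent.  It is an
assumption about what such behavior would look like, not code from the
patch; clustered_random_order and first_field are hypothetical names.

```python
import random
from itertools import groupby

def clustered_random_order(lines, key, seed):
    """Shuffle whole groups of equal-key lines, keeping each group together."""
    ordered = sorted(lines, key=key)                        # bring equal keys together
    groups = [list(g) for _, g in groupby(ordered, key=key)]
    random.Random(seed).shuffle(groups)                     # randomize the group order only
    return [line for group in groups for line in group]

def first_field(line):
    """Sort key for the example: the first whitespace-separated field."""
    return line.split()[0]

lines = ["b 2", "a 1", "b 1", "c 9", "a 7"]
result = clustered_random_order(lines, first_field, seed=42)
keys = [first_field(line) for line in result]
print(result)
```

With this approach the two "a" lines and the two "b" lines always come
out next to each other, so the output is not a uniformly random
permutation of the lines.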

>>>     if you sort a permutation of the same input file
>>>     with the same --random-seed=SEED option twice, you'll get the same
>>>     output. [**]
>>
>> Here however it does not explicitly say what I said above about two different
>> files.
>
> If two files sort to the same output, then they're permutations of
> each other.  So [**] implies [*].  (The converse does not hold.  See
> what I mean about the logic being tricky here?...)

No, I think the implication runs only one way: [*] implies [**].  [*] is
the more general case, placing a requirement on all permutations of the
same input; [**] is the special case where the two files are the same
permutation of the same input.


paul



