bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new coreutil? shuffle - randomize file contents


From: Frederik Eaton
Subject: Re: new coreutil? shuffle - randomize file contents
Date: Thu, 2 Jun 2005 21:16:13 -0700
User-agent: Mutt/1.5.9i

> >Phil> Is it that the app must guarantee all lines of a
> >Phil> non-seekable stdin must have an equal chance of any sort order?
> >
> >See my comment to James above. I think one need not make this
> >guarantee, since only a tiny fraction of possible sort orders will be
> >able to be tried by the user. However, I think it would be true for my
> >proposal (and the one that James suggested to Davis) if one took the
> >random number generator or hash function to be variable or unspecified
> >during analysis of the algorithm.
> 
> Thinking further on this, I don't think it matters to the guts of sort
> whether the ultimate random key is based on position hash or PRNG.

No, it doesn't ... but I don't understand why you bring it up in this
context.

> >One possibility for an efficient random permutation of a large file is
> >a program which scans the file once and records the location of each
> >line in an index, then applies a uniformly distributed random
> >permutation to this line index by stepping through it in order and
> >swapping the current entry with an entry chosen at random from the set
> >of later entries, and finally steps through the index again and
> >dereferences each line entry and prints out the corresponding line in
> >the original file.
> 
> That approach is fine on seekable files, but the user may wish to
> shuffle from stdin. sort already knows how to do this.
> 
> Here's a concrete example of Paul's suggestion as I understand it:

I understand Paul's suggestion. I was throwing out the other algorithm
since Davis was looking for something more efficient.

Regards,

Frederik




reply via email to

[Prev in Thread] Current Thread [Next in Thread]