[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: new coreutil? shuffle - randomize file contents
From: |
Frederik Eaton |
Subject: |
Re: new coreutil? shuffle - randomize file contents |
Date: |
Thu, 2 Jun 2005 21:16:13 -0700 |
User-agent: |
Mutt/1.5.9i |
> >Phil> Is it that the app must guarantee all lines of a
> >Phil> non-seekable stdin must have an equal chance of any sort order?
> >
> >See my comment to James above. I think one need not make this
> >guarantee, since only a tiny fraction of possible sort orders will be
> >able to be tried by the user. However, I think it would be true for my
> >proposal (and the one that James suggested to Davis) if one took the
> >random number generator or hash function to be variable or unspecified
> >during analysis of the algorithm.
>
> Thinking further on this, I don't think it matters to the guts of sort
> whether the ultimate random key is based on position hash or PRNG.
No, it doesn't ... but I don't understand why you bring it up in this
context.
> >One possibility for an efficient random permutation of a large file is
> >a program which scans the file once and records the location of each
> >line in an index, then applies a uniformly distributed random
> >permutation to this line index by stepping through it in order and
> >swapping the current entry with an entry chosen at random from the set
> >of later entries, and finally steps through the index again and
> >dereferences each line entry and prints out the corresponding line in
> >the original file.
>
> That approach is fine on seekable files, but the user may wish to
> shuffle from stdin. sort already knows how to do this.
>
> Here's a concrete example of Paul's suggestion as I understand it:
I understand Paul's suggestion. I was throwing out the other algorithm
since Davis was looking for something more efficient.
Regards,
Frederik
- Re: new coreutil? shuffle - randomize file contents, Frederik Eaton, 2005/06/01
- Re: new coreutil? shuffle - randomize file contents, James Youngman, 2005/06/02
- Re: new coreutil? shuffle - randomize file contents, Philip Rowlands, 2005/06/02
- Re: new coreutil? shuffle - randomize file contents, Frederik Eaton, 2005/06/02
- Re: new coreutil? shuffle - randomize file contents, Philip Rowlands, 2005/06/02
- Re: new coreutil? shuffle - randomize file contents,
Frederik Eaton <=
- Re: new coreutil? shuffle - randomize file contents, Paul Eggert, 2005/06/03
- Re: new coreutil? shuffle - randomize file contents, Frederik Eaton, 2005/06/03
- Re: new coreutil? shuffle - randomize file contents, Paul Eggert, 2005/06/03
- Re: new coreutil? shuffle - randomize file contents, Frederik Eaton, 2005/06/03
- Re: new coreutil? shuffle - randomize file contents, Paul Eggert, 2005/06/04
- Re: new coreutil? shuffle - randomize file contents, David Feuer, 2005/06/04
- Re: new coreutil? shuffle - randomize file contents, Frederik Eaton, 2005/06/04
- Re: new coreutil? shuffle - randomize file contents, James Youngman, 2005/06/03
- Re: new coreutil? shuffle - randomize file contents, Davis Houlton, 2005/06/03
- Re: new coreutil? shuffle - randomize file contents, Frederik Eaton, 2005/06/04