bug#72445: shuf with both input-range and head-count biased
From: Daniel Carpenter
Subject: bug#72445: shuf with both input-range and head-count biased
Date: Sat, 3 Aug 2024 10:19:09 +0200
The --input-range and --head-count options allow me to use shuf to
efficiently simulate a die roll, but there is a clear bias when I combine
them, for example:
$ for i in {1..10000}; do shuf --input-range=1-6 --head-count=1; done |
sort | uniq --count
1730 1
1411 2
1882 3
1809 4
1520 5
1648 6
Using seq instead of input-range does not appear biased:
$ for i in {1..10000}; do seq 6 | shuf --head-count=1; done |
sort | uniq --count
1652 1
1696 2
1674 3
1638 4
1713 5
1627 6
Piping the full shuffle through head instead of using --head-count does not
appear biased either:
$ for i in {1..10000}; do shuf --input-range=1-6 | head --lines=1; done |
sort | uniq --count
1639 1
1674 2
1655 3
1669 4
1688 5
1675 6
It seems that combining both options somehow affects the distribution. I
assume shuf takes an optimized code path in that case, since it doesn't
need to permute the entire input range, and that the bias comes from there.
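
To put a number on "clear bias", here is a minimal sketch of a chi-square
goodness-of-fit check over the `uniq --count` output. The function name
chi_square is my own, not part of coreutils, and it assumes every face shows
up at least once, so that the number of input lines equals the number of
possible outcomes.

```shell
# chi_square: hypothetical helper, not part of coreutils.
# Reads "COUNT VALUE" lines (as printed by `uniq --count`) on stdin and
# prints the chi-square statistic against a uniform distribution.
# Assumes every outcome appears at least once in the input.
chi_square() {
  awk '{ obs[NR] = $1; total += $1 }
       END {
         expected = total / NR
         for (i = 1; i <= NR; i++)
           chi2 += (obs[i] - expected)^2 / expected
         printf "%.2f\n", chi2
       }'
}

# Counts from the --input-range=1-6 --head-count=1 run above.
chi_square <<'EOF'
1730 1
1411 2
1882 3
1809 4
1520 5
1648 6
EOF
# prints 94.72
```

With 6 outcomes there are 5 degrees of freedom, where the 5% critical value
is about 11.07. The combined-options run gives 94.72, far above that, while
the seq and head runs give 3.40 and 0.89, which are consistent with a fair
die.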