Algorithmics / random stuff

From Helpful
Jump to navigation Jump to search


Random choice

Random subset

Shuffling

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

You'ld think this isn't very hard, but the first thing you'll probably think of, the random-swaps approach, is demonstraby not uniform shuffling. In that all possible outputs do not appear equally. This is okay for some uses, but not for others. Since the better variants is just as fast, at least be aware of it.


Random swaps

for i in 1..length
  swap two random items

Pro:

  • simple code to write

Con:

  • not a uniform shuffling - roughly because his demonstrably isn't enough swaps. The number at which it becomes decent depends on the length (apparently n lg n?(verify))

Sort by random

Attach a random number to each entry, then sort by that

Pro:

  • uniform shuffling (assuming no duplicate values in what we generate)

Con

  • complexity is that of the sort, often O(n lg n)(verify)


Fisher-Yates / Knuth shuffle

There is an original pen-and-paper version which strikes through numbers. The modern computer variant puts them on the end of the list instead - easier bookkeeping.


The modern variant comes down to:

for i in 0..length-1
  swap(i, rand_inclusive(0,i))

This can be done from both directions, this is equivalent.


Pro:

  • uniform
  • linear time

Con:

  • Assuming you're using a PRNG, there is a sometimes-viable attack that could predict the sorted array
related to seeding, and the amount of possible orderings versus the repeat interval.
this only really matters when security or gambling are involved

Notes:

  • common bug: random range needs to be inclusive of the one you're at. This can be thrown off by the implementation of your random function (so check its docs) and even the way you use the value (e.g. consider that int(rand()*length) effectively hides a floor(). And floating point details.)
  • common bug: choosing from entire array - is actually the naive random-swaps variant, and works worse (can be explained semi-intuitively)


Notes

Bonus: physical shuffling https://www.youtube.com/watch?v=AxJubaijQbI