Sampling without replacement
updated:
Efraimidis-Spirakis algorithm
Reference:
Weighted random sampling with a reservoir - ScienceDirect
Algorithm
- Input: weight and the number of samples to draw
- Output: samples without duplicates
- Operation:
- Draw a uniform random variable for each item
- Compute a random variable
- Find the indices of the largest values.
- Return the indices
Implementation tips
Since can be arbitrarily large and cause overfloating, it would be preferred to transform by
The logarithmic transformation preserves the order . Thus, instead of using , we can use to find what indices to return.
Code