# reservoir sampling

(algorithm)

**Definition:**
Randomly select k items from a stream of items of unknown length. Save the first k items in an *array* of size k. For each item j, j > k, choose a random integer M from 1 to j (inclusive). If M ≤ k, replace item M of the array with item j.

**Generalization** (I am a kind of ...)

*randomized algorithm*.

*Note:
Each possible selection of k items has an equal probability of occurring. *

*
** This algorithm is an in-memory variant of Knuth's Algorithm 3.4.2R [Knuth97, 2:144]. He credits Alan G. Waterman. This algorithm is suggested in exercise 10.*

Author: PEB

## Implementation

(Perl).
## More information

Sahil's explanation of correctness and extension to distributed reservoir sampling.

Vitter's algorithms X, Y, and Z use far fewer random numbers by choosing how many items to skip, rather than deciding whether or not to skip each item.

**Jeffrey Scott Vitter**, *Random Sampling with a Reservoir*, ACM Transactions on Mathematical Software (TOMS), 11(1):37-57, March 1985.

Available on-line, for instance at http://www.cs.umd.edu/~samir/498/vitter.pdf.

Go to the
Dictionary of Algorithms and Data
Structures home page.

If you have suggestions, corrections, or comments, please get in touch
with Paul Black.

Entry modified 8 October 2010.

HTML page formatted Mon Nov 18 10:44:10 2013.

Cite this as:

Paul E. Black, "reservoir sampling", in
*Dictionary of Algorithms and Data Structures* [online], Vreda Pieterse and Paul E. Black, eds. 8 October 2010. (accessed TODAY)
Available from: http://www.nist.gov/dads/HTML/reservoirSampling.html