Reservoir Sampling (online)

Given a array A read in as a data stream (one by one) and a sample size k , return a sample of size k in the data stream.

Example

Input: A = [3, 5, 11, 7], k = 2
Output: When read in A = [3, 5], samples are [3, 5]. Next when read in 11, samples could be [3, 5] or [3, 11] or [5, 11].

Solution

Idea:

# Assumption: there are at least k elements in the stream
def online_random_sample (A, k) :
    # set up default values
    sample = A[:k]

    for i in range(k, len(A)) :
        temp = random.randrange(k+1)
        if temp < k:
            sample[temp] = A[i]

    return sample

Last updated