Reservoir Sampling (online)
Given a array A
read in as a data stream (one by one) and a sample size k
, return a sample of size k
in the data stream.
Example
Input: A = [3, 5, 11, 7], k = 2
Output: When read in A = [3, 5], samples are [3, 5]. Next when read in 11, samples could be [3, 5] or [3, 11] or [5, 11].
Solution
Idea:
Assume we already choose k samples from n data, now we read in the n+1 data. The new data has probability to be sampled. And the existing samples have the same probability to be removed.
Time complexity:
Space complexity:
# Assumption: there are at least k elements in the stream
def online_random_sample (A, k) :
# set up default values
sample = A[:k]
for i in range(k, len(A)) :
temp = random.randrange(k+1)
if temp < k:
sample[temp] = A[i]
return sample
Last updated
Was this helpful?