TAOCP 3.4.2 Exercise 3
There is no contradiction because the two probabilities refer to different events.
Section 3.4.2: Random Sampling and Shuffling
Exercise 3. ▶ [22] The $(t+1)$st item in Algorithm S is selected with probability $(n-m)/(N-t)$, not $n/N$, yet the text claims that the sample is unbiased; thus each item should be selected with the same probability. How can both of these statements be true?
Verified: yes
Solve time: 6m36s
Solution
There is no contradiction because the two probabilities refer to different events.
At step S5 of Algorithm S, suppose that $t$ records have been examined and $m$ of them have been selected. Since the final sample must contain exactly $n$ records, there remain $n-m$ selections to be made from the remaining $N-t$ records. Therefore the conditional probability that record $t+1$ is selected when it is encountered is
$$ \frac{n-m}{N-t}. $$
This quantity depends on the random variable $m$, so it is generally not equal to $n/N$.
The claim that the sample is unbiased concerns the unconditional probability that a given record belongs to the final sample. We shall show that this probability is $n/N$ for every record.
Let
$$ P_m=\Pr{\text{exactly }m\text{ selections among the first }t\text{ records}}. $$
Then
$$ \Pr{\text{record }t+1\text{ is selected}} =\sum_m P_m\frac{n-m}{N-t}. $$
Using linearity of expectation,
$$ \Pr{\text{record }t+1\text{ is selected}} =\frac{n-E[m]}{N-t}. $$
It remains to determine $E[m]$.
Let $I_j$ be the indicator variable of the event that record $j$ is selected, for $1\le j\le t$. Then
$$ m=\sum_{j=1}^{t} I_j, $$
hence
$$ E[m] =\sum_{j=1}^{t} E[I_j] =\sum_{j=1}^{t} \Pr{\text{record }j\text{ is selected}}. $$
Now prove by induction on the position of the record that every record has selection probability $n/N$.
For $t=0$, the first record is selected with probability
$$ \frac{n}{N}, $$
since $m=0$ at the beginning.
Assume that each of the first $t$ records is selected with probability $n/N$. Then
$$ E[m] =\sum_{j=1}^{t}\frac{n}{N} =\frac{nt}{N}. $$
Substituting into the previous formula gives
$$ \Pr{\text{record }t+1\text{ is selected}} =\frac{n-\frac{nt}{N}}{N-t} =\frac{n(N-t)}{N(N-t)} =\frac{n}{N}. $$
Thus record $t+1$ also has selection probability $n/N$. By induction, every record has probability $n/N$ of appearing in the final sample.
Therefore
$$ \frac{n-m}{N-t} $$
is a conditional probability that depends on the history of the algorithm, whereas
$$ \frac{n}{N} $$
is the unconditional probability that a given record belongs to the final sample. The former varies from run to run, but after averaging over all possible histories, every record has the same inclusion probability $n/N$.
Hence Algorithm S is unbiased even though the selection probability at a particular step is $(n-m)/(N-t)$.
$\square$