TAOCP 3.4.2 Exercise 4
Let $p(m,t)$ denote the probability that exactly $m$ items have been selected from the first $t$ items processed by Algorithm S.
Section 3.4.2: Random Sampling and Shuffling
Exercise 4. [M23] Let $p(m, t)$ be the probability that exactly $m$ items are selected from among the first $t$ in the selection sampling technique. Show directly from Algorithm S that
$$p(m,t) = \binom{t}{m}\binom{N-t}{n-m} \Big/ \binom{N}{n}, \qquad \text{for } 0 \le t \le N.$$
Verified: yes
Solve time: 4m33s
Solution
Let $p(m,t)$ denote the probability that exactly $m$ items have been selected from the first $t$ items processed by Algorithm S.
We shall prove by induction on $t$ that
$$ p(m,t)=\frac{\binom{t}{m}\binom{N-t}{,n-m,}}{\binom{N}{n}}, \qquad 0\le t\le N. $$
Basis
When $t=0$, no items have been examined. Hence
$$ p(0,0)=1,\qquad p(m,0)=0 \ \ (m>0). $$
The proposed formula gives
$$ \frac{\binom00\binom Nn}{\binom Nn}=1 $$
for $m=0$, and
$$ \frac{\binom0m\binom N{n-m}}{\binom Nn}=0 $$
for $m>0$. Thus the formula is correct when $t=0$.
Inductive step
Assume the formula holds for a fixed $t$, and consider item $t+1$.
To have exactly $m$ selected items among the first $t+1$ items, one of two mutually exclusive events must occur.
- Exactly $m$ items were selected among the first $t$, and item $t+1$ is not selected.
- Exactly $m-1$ items were selected among the first $t$, and item $t+1$ is selected.
If $m$ items have already been selected among the first $t$, Algorithm S selects the next item with probability
$$ \frac{n-m}{N-t}. $$
Hence the probability that item $t+1$ is not selected is
$$ 1-\frac{n-m}{N-t} = \frac{N-t-n+m}{N-t}. $$
Similarly, if $m-1$ items have already been selected, the probability that item $t+1$ is selected is
$$ \frac{n-(m-1)}{N-t} = \frac{n-m+1}{N-t}. $$
Therefore
$$ p(m,t+1) = p(m,t)\frac{N-t-n+m}{N-t} + p(m-1,t)\frac{n-m+1}{N-t}. $$
Substituting the induction hypothesis,
$$ \begin{aligned} p(m,t+1) &= \frac1{\binom Nn} \Biggl[ \binom tm\binom{N-t}{n-m} \frac{N-t-n+m}{N-t} \ &\qquad\qquad + \binom t{m-1}\binom{N-t}{n-m+1} \frac{n-m+1}{N-t} \Biggr]. \end{aligned} $$
We now simplify the two terms.
For the first term, let $a=N-t$ and $b=n-m$. Then
$$ \binom ab\frac{a-b}{a} = \frac{a!}{b!(a-b)!}\frac{a-b}{a} = \frac{(a-1)!}{b!(a-b-1)!} = \binom{a-1}{b}. $$
Hence
$$ \binom{N-t}{n-m} \frac{N-t-n+m}{N-t} = \binom{N-t-1}{n-m}. $$
For the second term, using
$$ \binom ab\frac{b}{a} = \frac{a!}{b!(a-b)!}\frac{b}{a} = \frac{(a-1)!}{(b-1)!(a-b)!} = \binom{a-1}{b-1}, $$
with $a=N-t$ and $b=n-m+1$, we obtain
$$ \binom{N-t}{n-m+1} \frac{n-m+1}{N-t} = \binom{N-t-1}{n-m}. $$
Therefore
$$ \begin{aligned} p(m,t+1) &= \frac1{\binom Nn} \Bigl[ \binom tm\binom{N-t-1}{n-m} + \binom t{m-1}\binom{N-t-1}{n-m} \Bigr] \ &= \frac{\binom{N-t-1}{n-m}}{\binom Nn} \Bigl[ \binom tm+\binom t{m-1} \Bigr]. \end{aligned} $$
Using Pascal's identity,
$$ \binom tm+\binom t{m-1} = \binom{t+1}{m}, $$
we get
$$ p(m,t+1) = \frac{\binom{t+1}{m}\binom{N-t-1}{n-m}} {\binom Nn}. $$
Since $N-(t+1)=N-t-1$, this is exactly
$$ p(m,t+1) = \frac{\binom{t+1}{m}\binom{N-(t+1)}{,n-m,}} {\binom Nn}. $$
Thus the formula holds for $t+1$.
By induction,
$$ \boxed{ p(m,t) = \frac{\binom{t}{m}\binom{N-t}{,n-m,}} {\binom{N}{n}} } $$
for all $0\le t\le N$. ∎