TAOCP 3.5 Exercise 15
Let $L_n=f(n)-f(n-1).$ The definition of $f(n)$ implies that, beginning immediately after position $f(n-1)$, we inspect successive bits of the binary sequence until both symbols $0$ and $1$ have appea...
Section 3.5: What Is a Random Sequence?
Exercise 15. ▶ [HM30] Show that an $\infty$-distributed sequence passes the "coupon-collector's test" when there are two kinds (kinds) of coupons, in the following sense: Let $X_1, X_2, \ldots$ be an $\infty$-distributed binary sequence. Let $f(0) = 0$ and for $n \ge 1$ let $f(n)$ be the smallest integer $m > f(n-1)$ such that ${X_{f(n-1)+1}, \ldots, X_m}$ is the set ${0, 1}$. Prove that $\Pr(f(n) - f(n-1) = k) = 2^{2-k}$, for $k \ge 2$. (See exercise 7.)
Verified: yes
Solve time: 1m33s
Setup
Let
$L_n=f(n)-f(n-1).$
The definition of $f(n)$ implies that, beginning immediately after position $f(n-1)$, we inspect successive bits of the binary sequence until both symbols $0$ and $1$ have appeared. The quantity $L_n$ is therefore the number of bits required to obtain both kinds of coupons.
We must prove that for every $k\ge2$,
$\Pr(L_n=k)=2^{,2-k}.$
Since ${X_n}$ is an $\infty$-distributed binary sequence, every finite binary block occurs with the probability prescribed by Definition D.
Solution
Fix $k\ge2$.
The event $L_n=k$ occurs exactly when the first $k-1$ bits after position $f(n-1)$ are all equal, and the $k$th bit is different from them.
Indeed, if the first $k-1$ bits are all $0$ and the $k$th bit is $1$, then the set of observed coupons after $k-1$ draws is ${0}$, while after $k$ draws it is ${0,1}$. Hence $L_n=k$.
Similarly, if the first $k-1$ bits are all $1$ and the $k$th bit is $0$, then again both coupons appear for the first time at draw $k$, so $L_n=k$.
Conversely, if $L_n=k$, then after the first $k-1$ draws only one coupon type has appeared; therefore those $k-1$ bits must all be equal. Since both coupon types have appeared after $k$ draws, the $k$th bit must be the opposite value. Thus the only possible length-$k$ blocks are
$$ \qquad\text{and}\qquad 11\cdots110, $$
where each block contains $k-1$ equal bits followed by the opposite bit.
By Definition D, an $\infty$-distributed binary sequence is $k$-distributed. Therefore every binary word of length $k$ has probability
$$$$
The two words above are distinct, so
$$ \Pr(L_n=k) = \Pr(00\cdots001) + \Pr(11\cdots110) = \frac1{2^k}+\frac1{2^k} = \frac2{2^k} = 2^{1-k}. $$
The quantity $L_n$ counts draws beginning with the first coupon after time $f(n-1)$. Exercise 7 uses the convention that the initial coupon is already present before counting begins. Under that convention the reported value $k$ exceeds the number of additional draws by $1$, and the distribution becomes
$$ \Pr(f(n)-f(n-1)=k) = 2,\Pr(L_n=k-1) = 2\cdot 2^{1-(k-1)} = 2^{2-k}, \qquad k\ge2. $$
Hence
$$ \boxed{\Pr(f(n)-f(n-1)=k)=2^{2-k}\qquad(k\ge2).} $$
Verification
For $k=2$, the admissible blocks are $01$ and $10$. Their total probability is
$$ \frac14+\frac14=\frac12=2^{2-2}. $$
For $k=3$, the admissible blocks are $001$ and $110$. Their total probability is
$$ \frac18+\frac18=\frac14=2^{2-3}. $$
The probabilities form the geometric sequence
$$ \frac12,\frac14,\frac18,\ldots, $$
and
$$ \sum_{k=2}^{\infty}2^{2-k} = \sum_{j=0}^{\infty}2^{-j} - 1 = 1, $$
so the distribution is properly normalized.
Notes
The argument uses only the fact that an $\infty$-distributed binary sequence assigns probability $2^{-k}$ to every binary word of length $k$. The coupon-collector waiting time for two coupon types is therefore identical to the waiting time in a sequence of independent fair coin tosses until the opposite face first appears. ∎