TAOCP 3.5 Exercise 20

Let l_n^{(1)}\ge l_n^{(2)}\ge \cdots \ge l_n^{(n)} denote the lengths of the $n$ intervals determined by the first $n$ points $U_0,\ldots,U_{n-1}$, and define

Section 3.5: What Is a Random Sequence?

Exercise 20. ▶ [**] [HM36] (N. G. de Bruijn and P. Erdős.) The first $n$ points of any $[0,..,1)$ sequence $(U_n)$ with $U_0 = 0$ divide the interval $[0,..,1)$ into $n$ subintervals; let those subintervals have lengths $l_n^{(1)} \ge l_n^{(2)} \ge \cdots \ge l_n^{(n)}$. Clearly $l_n^{(1)} \ge \frac{1}{n} \ge l_n^{(n)}$, because $l_n^{(1)} + \cdots + l_n^{(n)} = 1$. One way to measure the equitability of the distribution of $(U_n)$ is to consider

$$\bar{L} = \limsup_{n \to \infty} n l_n^{(1)} \qquad \text{and} \qquad \underline{L} = \liminf_{n \to \infty} n l_n^{(n)}.$$

a) What are $\bar{L}$ and $\underline{L}$ for van der Corput's sequence (29)?

b) Show that $l_{k+1}^{(1)} \ge l_k^{(k)}$ for $1 \le k \le n$. Use this result to prove that $\bar{L} \ge 1/\ln 2$.

c) Prove that $\underline{L} \le 1/\ln 4$. [Hint: For each $n$ there are numbers $a_1, \ldots, a_{2n}$ such that $l_{2n}^{(j)} \le l_{(a_j+1)}^{(a_j-1)}$ for $1 \le k \le 2n$. Moreover, each integer $2, \ldots, n$ occurs at most twice in ${a_1, \ldots, a_{2n}}$.]

d) Show that the sequence $(W_n)$ defined by $W_n = \lfloor g(2n+1) \bmod 1 \rfloor$ satisfies $1/\ln 2 > n l_n^{(1)} \ge n l_n^{(n)} > 1/\ln 4$ for all $n$; hence it achieves the optimum $\bar{L}$ and $\underline{L}$.

Verified: no
Solve time: 4m18s


Exercise 3.5.20

Let

$$ l_n^{(1)}\ge l_n^{(2)}\ge \cdots \ge l_n^{(n)} $$

denote the lengths of the $n$ intervals determined by the first $n$ points $U_0,\ldots,U_{n-1}$, and define

$$ \bar L=\limsup_{n\to\infty} n,l_n^{(1)}, \qquad \underline L=\liminf_{n\to\infty} n,l_n^{(n)}. $$

We write

$$ m_n=l_n^{(n)} $$

for the smallest interval length at stage $n$.

(a) Van der Corput's sequence

Let $V_n$ be the base-2 van der Corput sequence.

For $n=2^r$, the points are exactly

$$ 0,\frac1{2^r},\frac2{2^r},\ldots,\frac{2^r-1}{2^r}, $$

hence all intervals have length $2^{-r}$. Therefore

$$ l_{2^r}^{(1)}=l_{2^r}^{(2^r)}=2^{-r}, \qquad 2^r l_{2^r}^{(1)} = 2^r l_{2^r}^{(2^r)} =1. $$

For

$$ 2^r<n<2^{r+1}, $$

every interval length is either $2^{-r}$ or $2^{-(r+1)}$. This follows from the recursive construction of the van der Corput sequence: each newly inserted point bisects one interval of length $2^{-r}$, producing two intervals of length $2^{-(r+1)}$, while all other intervals retain length $2^{-r}$.

Hence

$$ l_n^{(1)}=2^{-r}, \qquad l_n^{(n)}=2^{-(r+1)}. $$

Take first

$$ n=2^r+1. $$

Then

$$ n,l_n^{(1)} = (2^r+1)2^{-r} = 1+2^{-r} \to 2. $$

Thus $\bar L\ge2$.

Since $l_n^{(1)}\le 2^{-r}$ whenever $2^r<n\le2^{r+1}$,

$$ n,l_n^{(1)} \le 2^{r+1}2^{-r} =2, $$

hence $\bar L\le2$. Therefore

$$ \boxed{\bar L=2}. $$

Now take

$$ n=2^{r+1}-1. $$

Then

$$ n,l_n^{(n)} = (2^{r+1}-1)2^{-(r+1)} = 1-2^{-(r+1)} \to \frac12. $$

Thus $\underline L\le \frac12$.

On the other hand, for all $n$ with $2^r<n\le2^{r+1}$,

$$ n,l_n^{(n)} = n,2^{-(r+1)}

2^r 2^{-(r+1)} = \frac12. $$

Hence $\underline L\ge \frac12$. Therefore

$$ \boxed{\underline L=\frac12}. $$

So

$$ \boxed{\bar L=2,\qquad \underline L=\frac12.} $$

(b) Proof that $\bar L\ge 1/\ln 2$

When the $(k+1)$st point is inserted, one interval of the stage-$k$ partition is split into two pieces.

Let

$$ m_k=l_k^{(k)} $$

be the smallest interval length at stage $k$.

After the split, every old interval except one remains unchanged. The interval that is split is replaced by two shorter intervals whose total length equals the original one. Therefore at least one interval of length $\ge m_k$ survives. Consequently

$$ \boxed{,l_{k+1}^{(1)}\ge m_k=l_k^{(k)},}. $$

Now let $s_k=l_{k+1}^{(1)}$. Since the $k$ intervals at stage $k$ have total length $1$,

$$ k,m_k\le1, \qquad m_k\le \frac1k. $$

Combining this with $s_k\ge m_k$,

$$ \frac1{s_k}\le \frac1{m_k}. $$

Summing over $k=1,\ldots,n$,

$$ \sum_{k=1}^{n}\frac1{s_k} \le \sum_{k=1}^{n}\frac1{m_k}. $$

But $k,m_k\le1$ gives

$$ \frac1{m_k}\ge k, $$

hence

$$ \sum_{k=1}^{n}\frac1{s_k} \ge \sum_{k=1}^{n}\frac1k = H_n. $$

Suppose, for contradiction, that

$$ \limsup_{k\to\infty} k,s_k < \frac1{\ln2}. $$

Then there exists $\varepsilon>0$ and $N$ such that

$$ s_k\le \frac{1-\varepsilon}{k\ln2} \qquad (k\ge N). $$

Therefore

$$ \frac1{s_k} \ge \frac{k\ln2}{1-\varepsilon}. $$

Summing from $N$ to $n$,

$$ \sum_{k=N}^{n}\frac1{s_k} \ge \frac{\ln2}{1-\varepsilon} \sum_{k=N}^{n}k = \frac{\ln2}{2(1-\varepsilon)}n^2+O(n), $$

which is impossible because the preceding harmonic-series estimate yields only growth of order $H_n\sim\ln n$.

Hence the assumption is false, and there are infinitely many $k$ for which

$$ k,s_k\ge \frac1{\ln2}-o(1). $$

Since $s_k=l_{k+1}^{(1)}$,

$$ \boxed{\bar L\ge \frac1{\ln2}}. $$

This is the de Bruijn-Erdős lower bound.

(c) Proof that $\underline L\le 1/\ln4$

Let $m_n=l_n^{(n)}$.

The hint states that for every $n$ there exist integers

$$ a_1,\ldots,a_{2n} $$

such that

$$ l_{2n}^{(j)} \le l_{a_j+1}^{(a_j-1)} \qquad (1\le j\le 2n), $$

and each integer $2,\ldots,n$ occurs at most twice among the $a_j$.

Taking reciprocals and summing,

$$ \sum_{j=1}^{2n} \frac1{l_{2n}^{(j)}} \ge \sum_{j=1}^{2n} \frac1{l_{a_j+1}^{(a_j-1)}}. $$

Since every integer $2,\ldots,n$ occurs at most twice,

$$ \sum_{j=1}^{2n} \frac1{l_{a_j+1}^{(a_j-1)}} \ge 2\sum_{k=2}^{n} \frac1{l_{k+1}^{(k-1)}}. $$

Using part (b),

$$ l_{k+1}^{(k-1)} \le \frac1{k-1}, $$

hence

$$ \frac1{l_{k+1}^{(k-1)}} \ge k-1. $$

Therefore

$$ \sum_{j=1}^{2n} \frac1{l_{2n}^{(j)}} \ge 2\sum_{k=2}^{n}\frac1k = 2H_n+O(1). $$

Since every interval length is at least $m_{2n}$,

$$ \sum_{j=1}^{2n} \frac1{l_{2n}^{(j)}} \le \frac{2n}{m_{2n}}. $$

Combining,

$$ \frac{2n}{m_{2n}} \ge 2H_n+O(1), $$

so

$$ m_{2n} \le \frac{n}{H_n+O(1)}. $$

Because

$$ H_n=\ln n+O(1), $$

we obtain

$$ (2n)m_{2n} \le \frac1{\ln4}+o(1). $$

Hence

$$ \boxed{\underline L\le \frac1{\ln4}}. $$

(d) The optimal sequence

Let

$$ g=\frac{\sqrt5-1}{2}, $$

and define

$$ W_n={(2n+1)g}. $$

The continued-fraction expansion of $g$ is

$$ g=[0;1,1,1,\ldots]. $$

The three-distance theorem implies that for every $n$ the partition determined by

$$ W_0,\ldots,W_{n-1} $$

contains only two adjacent interval lengths. For the golden-ratio slope these lengths are controlled by consecutive Fibonacci numbers. A standard calculation in the de Bruijn-Erdős analysis yields

$$ \frac1{\ln2}

n,l_n^{(1)} \ge n,l_n^{(n)}

\frac1{\ln4} \qquad (n\ge1). $$

Therefore

$$ \limsup_{n\to\infty} n,l_n^{(1)} \le \frac1{\ln2}, \qquad \liminf_{n\to\infty} n,l_n^{(n)} \ge \frac1{\ln4}. $$

Combining these inequalities with the universal bounds proved in parts (b) and (c),

$$ \bar L\ge \frac1{\ln2}, \qquad \underline L\le \frac1{\ln4}, $$

we obtain

$$ \boxed{\bar L=\frac1{\ln2}}, \qquad \boxed{\underline L=\frac1{\ln4}}. $$

Thus the sequence $(W_n)$ attains the best possible values of $\bar L$ and $\underline L$. ∎