TAOCP 4.2.2 Exercise 12

Assume, without loss of generality, that $e_u \ge e_v+p$.

Section 4.2.2: Accuracy of Floating Point Arithmetic

Exercise 12. [M24] Carry out the proof of Theorem B and (46) when $|e_u - e_v| \ge p$.

Verified: yes
Solve time: 4m13s


Solution

Assume, without loss of generality, that $e_u \ge e_v+p$. Write the normalized floating-point numbers in Knuth's notation as

$$ u=.u_1u_2\cdots u_p\times b^{e_u}, \qquad v=.v_1v_2\cdots v_p\times b^{e_v}, $$

with $u_1\neq0$ and $v_1\neq0$.

Since $e_u-e_v\ge p$,

$$ v=.0\cdots0,v_1v_2\cdots v_p\times b^{e_u}, $$

where at least $p-1$ zeros appear after the radix point before the first nonzero digit of $v$.

Hence

$$ u+v = . u_1u_2\cdots u_p d_{p+1}d_{p+2}\cdots \times b^{e_u}, $$

where the first $p$ digits are exactly those of $u$. Indeed, the leading nonzero digit of $v$ occurs no earlier than the $(p+1)$st position of the significand relative to exponent $e_u$, so no carry can propagate into the first $p$ digits.

Now apply Algorithm 4.2.1N. The digit in position $p+1$ determines the rounding. Since

$$ 0\le d_{p+1}\le b-1, $$

rounding can change the retained $p$-digit significand by at most one unit in the last place. Therefore

$$ u\oplus v=u+\beta b^{,e_u-p}, \qquad \beta\in{-1,0,1}, $$

with the sign and value of $\beta$ determined by the rounding rule. This is exactly the statement asserted in Theorem B for the case $e_u-e_v\ge p$.

The case $e_v-e_u\ge p$ is symmetric.

To obtain (46), note that

$$ u+v = u\left(1+\frac vu\right), $$

and

$$ 0<\frac vu < b^{e_v-e_u+1} \le b^{1-p}. $$

Hence

$$ u+v=u(1+\theta), \qquad |\theta|<b^{1-p}. $$

Since Theorem B gives

$$ u\oplus v=u+\beta b^{e_u-p}, \qquad |\beta|\le1, $$

we have

$$ u\oplus v-u=\beta b^{e_u-p}. $$

Because $u\ge b^{e_u-1}$,

$$ \left|\frac{(u\oplus v)-u}{u}\right| \le \frac{b^{e_u-p}}{b^{e_u-1}} = b^{1-p}. $$

Therefore

$$ u\oplus v = u(1+\delta), \qquad |\delta|\le b^{1-p}, $$

which is the form of (46) in the case $|e_u-e_v|\ge p$.

Thus both Theorem B and equation (46) follow when the exponents differ by at least $p$. ∎