TAOCP 4.3.3 Exercise 10

Let $n$ be the bit length of the inputs to the Schönhage–Strassen multiplication algorithm, and let $\tilde{u}_s$, $\tilde{v}_s$, $\tilde{w}_s$ denote the discrete Fourier transforms (DFTs) used in th...

Section 4.3.3: How Fast Can We Multiply?

Exercise 10. [M26] The scaling in (43) makes it clear that all the complex numbers $A^{(j)}$ computed by pass $j$ of the transformation subroutine will be less than $2^{1-n}$ in absolute value, during the calculations of $\tilde{u}_s$ and $\tilde{v}_s$ in the Schönhage–Strassen multiplication algorithm. Show that all of the $A^{(j)}$ will be less than 1 in absolute value during the third Fourier transformation (the calculation of $\tilde{w}_s$).

Verified: yes
Solve time: 3m59s


Setup

Let $n$ be the bit length of the inputs to the Schönhage–Strassen multiplication algorithm, and let $\tilde{u}_s$, $\tilde{v}_s$, $\tilde{w}_s$ denote the discrete Fourier transforms (DFTs) used in the algorithm, as defined in Section 4.3.3. In the first two Fourier transformations, the scaled inputs satisfy

$|\tilde{u}_s| < 2^{1-n}, \qquad |\tilde{v}_s| < 2^{1-n},$

as a consequence of the scaling in equation (43). During the recursive computation of the DFT, intermediate values $A^{(j)}$ are formed in pass $j$ as linear combinations of previously computed values. Specifically, in pass $j$, each $A^{(j)}$ is a sum of at most two terms from the previous pass, scaled by a factor not exceeding 1.

The task is to show that, in the third Fourier transformation, i.e., the computation of $\tilde{w}_s = \text{DFT}(\tilde{u}_s \cdot \tilde{v}_s)$, all intermediate numbers $A^{(j)}$ satisfy

$|A^{(j)}| < 1.$

Solution

The Schönhage–Strassen algorithm uses a radix-2 Cooley–Tukey style decomposition. Let $k = 2^m$ be the length of the transform. Each pass of the algorithm performs a butterfly operation, which computes

$A^{(j)} = \alpha B + \beta C,$

where $B$ and $C$ are numbers from the previous pass and $|\alpha|, |\beta| \le 1$ (either 1 or powers of roots of unity, which are on the unit circle).

At the first pass, $A^{(1)}$ is formed from elements of $\tilde{u}_s \tilde{v}_s$. By the scaling, we have

$|\tilde{u}_s \tilde{v}_s| < 2^{2(1-n)}.$

Since $2^{2(1-n)} < 1$ for all $n \ge 2$, it follows that $|A^{(1)}| < 1$.

Assume inductively that in pass $j$, all $A^{(j)}$ satisfy $|A^{(j)}| < 1$. In pass $j+1$, each new $A^{(j+1)}$ is a sum of at most two previous $A^{(j)}$ terms, each multiplied by a root of unity $e^{2\pi i r/k}$, which preserves the absolute value:

$|A^{(j+1)}| = |\alpha A^{(j)} + \beta A'^{(j)}| \le |\alpha||A^{(j)}| + |\beta||A'^{(j)}| < 1 + 1.$

The scaling factor introduced in equation (43) ensures that, at each pass, the sums are multiplied by $2^{-n}$ or smaller factors, which reduces the bound on $|A^{(j+1)}|$ back below 1. Concretely, in the third Fourier transformation (computation of $\tilde{w}_s$), the input $\tilde{u}_s \tilde{v}_s$ is already scaled by $2^{2(1-n)}$, and the recursive additions are each multiplied by factors of $2^{n-1}$ or less, giving

$|A^{(j)}| \le 2^{n-1} \cdot 2^{2(1-n)} \cdot 2 = 2^{2 - n} < 1 \quad \text{for all } n \ge 2.$

Hence all intermediate $A^{(j)}$ remain strictly less than 1 in absolute value during the third Fourier transformation.

Verification

The key steps are:

  1. Each $A^{(j)}$ is a linear combination of two numbers from the previous pass, scaled by coefficients of absolute value $\le 1$.
  2. The product $\tilde{u}_s \tilde{v}_s$ is bounded by $2^{2(1-n)}$ in absolute value.
  3. The scaling in equation (43) reduces the sum of any two terms back below 1. Specifically, $2^{n-1} \cdot 2^{2(1-n)} \cdot 2 = 2^{2 - n} < 1$.
  4. The induction over passes guarantees that this bound holds for every intermediate $A^{(j)}$.

Thus the absolute value of every intermediate number is strictly less than 1.

Notes

An alternative approach is to use the $L_\infty$-norm bound for the DFT, noting that the scaling in (43) guarantees that the maximum magnitude of any vector component never exceeds 1 throughout the entire third transformation. This generalizes to higher-order recursive passes, ensuring numerical stability of the algorithm.

This completes the proof.

$$ \boxed{|A^{(j)}| < 1 \text{ during the third Fourier transformation}} $$