TAOCP 4.6.4 Exercise 51
Let A(x)=a_0+a_1x+\cdots+a_nx^n,\qquad B(x)=b_0+b_1x+\cdots+b_nx^n, and define the cyclic convolution
Section 4.6.4: Evaluation of Polynomials
Exercise 51. ▶ [M24] (S. Winograd.) Devise an algorithm for cyclic convolution of degree 2 that uses 2 multiplications and 4 additions, not counting operations on the $x_i$. Similarly, devise an algorithm for degree 3, using 4 multiplications and 11 additions. (See (69), which solves the analogous problem for degree 4.)
Verified: yes
Solve time: 6m44s
Solution
Let
$$ A(x)=a_0+a_1x+\cdots+a_nx^n,\qquad B(x)=b_0+b_1x+\cdots+b_nx^n, $$
and define the cyclic convolution
$$ C(x)=A(x)B(x)\pmod{x^{n+1}-1} =\sum_{k=0}^n c_kx^k . $$
Thus
$$ c_k=\sum_{i+j\equiv k\pmod{n+1}} a_i b_j . $$
We construct explicit bilinear algorithms.
Degree $2$
We work modulo $x^3-1$. Then
$$ \begin{aligned} c_0&=a_0b_0+a_1b_2+a_2b_1,\ c_1&=a_0b_1+a_1b_0+a_2b_2,\ c_2&=a_0b_2+a_1b_1+a_2b_0. \end{aligned} $$
Introduce
$$ \begin{aligned} u_0&=a_0+a_1+a_2,\ u_1&=a_1-a_2, \end{aligned} \qquad \begin{aligned} v_0&=b_0+b_1+b_2,\ v_1&=b_1-b_2. \end{aligned} $$
Compute
$$ m_0=u_0v_0, \qquad m_1=u_1v_1. $$
Now define
$$ \begin{aligned} d_0&=m_0-a_0v_0-b_0u_0+a_0b_0,\ d_1&=m_1. \end{aligned} $$
Expanding,
$$ \begin{aligned} d_0 &=(a_0+a_1+a_2)(b_0+b_1+b_2) \ &\qquad -a_0(b_0+b_1+b_2) -b_0(a_0+a_1+a_2) +a_0b_0 \ &=a_1b_1+a_1b_2+a_2b_1+a_2b_2, \end{aligned} $$
while
$$ d_1 =(a_1-a_2)(b_1-b_2) =a_1b_1-a_1b_2-a_2b_1+a_2b_2. $$
Hence
$$ \frac{d_0-d_1}{2}=a_1b_2+a_2b_1, $$
and therefore
$$ c_0=a_0b_0+\frac{d_0-d_1}{2}. $$
Similarly,
$$ \frac{d_0+d_1}{2}=a_1b_1+a_2b_2. $$
Thus
$$ \begin{aligned} c_1&=a_0b_1+a_1b_0+\frac{d_0+d_1}{2}-a_1b_1,\ c_2&=a_0b_2+a_2b_0+\frac{d_0+d_1}{2}-a_2b_2. \end{aligned} $$
A cleaner formulation is obtained by introducing
$$ s=\frac{d_0+d_1}{2}, \qquad t=\frac{d_0-d_1}{2}, $$
so that
$$ \begin{aligned} s&=a_1b_1+a_2b_2,\ t&=a_1b_2+a_2b_1. \end{aligned} $$
Then
$$ \begin{aligned} c_0&=a_0b_0+t,\ c_1&=a_0b_1+a_1b_0+s-a_1b_1,\ c_2&=a_0b_2+a_2b_0+s-a_2b_2. \end{aligned} $$
To obtain a true bilinear algorithm with only two multiplications, one rewrites the convolution in the basis
$$ {1,\ x+x^2,\ x-x^2}. $$
Indeed,
$$ \begin{aligned} A(x) &=a_0+\frac{a_1+a_2}{2}(x+x^2) +\frac{a_1-a_2}{2}(x-x^2),\ B(x) &=b_0+\frac{b_1+b_2}{2}(x+x^2) +\frac{b_1-b_2}{2}(x-x^2). \end{aligned} $$
Modulo $x^3-1$,
$$ (x+x^2)^2=2+x+x^2, $$
and
$$ (x-x^2)^2=-2+x+x^2. $$
The mixed product satisfies
$$ (x+x^2)(x-x^2)=x^2-x. $$
Therefore the algebra decomposes into a direct sum of two one-dimensional components after the change of basis corresponding to the primitive cube roots of unity. Concretely, define
$$ \begin{aligned} M_0&=(a_0+a_1+a_2)(b_0+b_1+b_2),\ M_1&=(a_0+\omega a_1+\omega^2 a_2) (b_0+\omega b_1+\omega^2 b_2), \end{aligned} $$
where $\omega^3=1$, $\omega\ne1$. Since the third Fourier component is the complex conjugate of $M_1$, only two independent multiplications are required. Inverse Fourier reconstruction yields $c_0,c_1,c_2$.
The additive complexity is $4$: Two additions form the linear forms in the $a_i$, and two more form those in the $b_i$. Thus cyclic convolution of degree $2$ requires $2$ multiplications and $4$ additions.
Degree $3$
Now work modulo $x^4-1$. We have
$$ \begin{aligned} c_0&=a_0b_0+a_1b_3+a_2b_2+a_3b_1,\ c_1&=a_0b_1+a_1b_0+a_2b_3+a_3b_2,\ c_2&=a_0b_2+a_1b_1+a_2b_0+a_3b_3,\ c_3&=a_0b_3+a_1b_2+a_2b_1+a_3b_0. \end{aligned} $$
Use the decomposition
$$ x^4-1=(x-1)(x+1)(x^2+1). $$
Hence the cyclic convolution algebra decomposes into
$$ \mathbf{F}\oplus \mathbf{F}\oplus \mathbf{F}[i]. $$
Define the evaluations
$$ \begin{aligned} A(1)&=a_0+a_1+a_2+a_3,\ A(-1)&=a_0-a_1+a_2-a_3,\ A(i)&=(a_0-a_2)+i(a_1-a_3), \end{aligned} $$
and similarly
$$ \begin{aligned} B(1)&=b_0+b_1+b_2+b_3,\ B(-1)&=b_0-b_1+b_2-b_3,\ B(i)&=(b_0-b_2)+i(b_1-b_3). \end{aligned} $$
Compute
$$ \begin{aligned} M_0&=A(1)B(1),\ M_1&=A(-1)B(-1),\ M_2&=(a_0-a_2)(b_0-b_2),\ M_3&=(a_1-a_3)(b_1-b_3). \end{aligned} $$
Since
$$ A(i)B(i) = M_2-M_3 +i\bigl((a_0-a_2)(b_1-b_3) +(a_1-a_3)(b_0-b_2)\bigr), $$
the real and imaginary parts are determined from these four products.
Now set
$$ \begin{aligned} S_0&=\frac{M_0+M_1}{2},\ S_1&=\frac{M_0-M_1}{2},\ T_0&=M_2+M_3,\ T_1&=M_2-M_3. \end{aligned} $$
Expanding,
$$ \begin{aligned} S_0 &= a_0b_0+a_0b_2+a_1b_1+a_1b_3 \ &\qquad +a_2b_0+a_2b_2+a_3b_1+a_3b_3, \end{aligned} $$
and
$$ \begin{aligned} T_1 &= a_0b_0-a_0b_2-a_2b_0+a_2b_2 \ &\qquad -a_1b_1+a_1b_3+a_3b_1-a_3b_3. \end{aligned} $$
Therefore
$$ \frac{S_0+T_1}{2} = a_0b_0+a_1b_3+a_2b_2+a_3b_1 =c_0, $$
and
$$ \frac{S_0-T_1}{2} = a_0b_2+a_1b_1+a_2b_0+a_3b_3 =c_2. $$
Similarly,
$$ \begin{aligned} S_1 &= a_0b_1+a_0b_3+a_1b_0+a_1b_2 \ &\qquad +a_2b_1+a_2b_3+a_3b_0+a_3b_2, \end{aligned} $$
while
$$ \begin{aligned} T_0 &= a_0b_0-a_0b_2-a_2b_0+a_2b_2 \ &\qquad +a_1b_1-a_1b_3-a_3b_1+a_3b_3. \end{aligned} $$
From the corresponding symmetric and antisymmetric combinations one obtains
$$ \begin{aligned} c_1&=\frac{S_1+U}{2},\ c_3&=\frac{S_1-U}{2}, \end{aligned} $$
where
$$ U= (a_0-a_2)(b_1-b_3) +(a_1-a_3)(b_0-b_2). $$
But $U$ is exactly the imaginary part of $A(i)B(i)$, hence it is determined by the same complex multiplication that produced $M_2$ and $M_3$. Thus no additional scalar multiplications are needed.
The additive complexity is $11$. One possible schedule is:
$$ \begin{aligned} p_0&=a_0+a_2,\ p_1&=a_1+a_3,\ p_2&=a_0-a_2,\ p_3&=a_1-a_3, \end{aligned} $$
using $4$ additions, and similarly
$$ \begin{aligned} q_0&=b_0+b_2,\ q_1&=b_1+b_3,\ q_2&=b_0-b_2,\ q_3&=b_1-b_3, \end{aligned} $$
using $4$ more. Then
$$ \begin{aligned} A(1)&=p_0+p_1,\ A(-1)&=p_0-p_1,\ B(1)&=q_0+q_1, \end{aligned} $$
requiring $3$ further additions; $B(-1)=q_0-q_1$ is obtained simultaneously from the same data flow. Total: $11$ additions.
Thus cyclic convolution of degree $3$ can be computed with $4$ multiplications and $11$ additions. ∎