How To Multiply: 5.5 Integer Multiplication

Complex Multiplication
Complex multiplication. (a + bi) (c + di) = x + yi. Grade-school. x = ac - bd, y = bc + ad.

4 multiplications, 2 additions
How to Multiply
integers, matrices, and polynomials
Q. Is it possible to do with fewer multiplications?
Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Complex Multiplication
Complex multiplication. (a + bi) (c + di) = x + yi. Grade-school. x = ac - bd, y = bc + ad.
5.5 Integer Multiplication
Q. Is it possible to do with fewer multiplications? A. Yes. [Gauss] x = ac - bd, y = (a + b) (c + d) - ac - bd.

Remark. Improvement if no hardware multiply.
Integer Addition
Addition. Given two n-bit integers a and b, compute a + b. Grade-school. (n) bit operations.
Integer Multiplication
Multiplication. Given two n-bit integers a and b, compute a b. Grade-school. (n2) bit operations.
1 1 0 1 0 1 0 1
1 1
1 1 1 1
1 0 1 0
1 1 1 1
1 0 1 0
0 1 1 0
1 0 0 1 1 1 0
0 1 1 1 1 1 0 1 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0
+ 1
0 0
Remark. Grade-school addition algorithm is optimal.
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
Q. Is grade-school multiplication algorithm optimal?

5 6
Divide-and-Conquer Multiplication: Warmup

To multiply two n-bit integers a and b: Multiply four n-bit integers, recursively. Add and shift to obtain result.

Recursion Tree
" 0 if n = 0 T (n) = # $ 4T (n /2) + n otherwise
lg n $ 21+ lg n #1 ' 2 T (n) = " n 2 k = n & ) = 2n # n % 2 #1 ( k=0
! T(n)
a = 2 " a1 + a0 b = 2 n / 2 " b1 + b0 a b = 2 n / 2 " a1 + a0 2 n / 2 " b1 + b0 = 2 n " a1b1 + 2 n / 2 " (a1b0 + a0 b1 ) + a0 b0
n /2
)(
T(n/2)
T(n/2)
T(n/2)
T(n/2)
4(n/2)
Ex. !
a = 10001101
a1 a0
b = 11100001
b1 b0
T(n/4) T(n/4)
T(n/4)
T(n/4)
... ... T(n / 2k) ...
T(n/4)
T(n/4) T(n/4)
T(n/4)
16(n/4) ... 4k (n / 2k) ...
T (n) = 4T (n /2) + "(n) # T (n) = "(n 2 ) 123 1 24 4 3

recursive calls add, shift
T(2)
T(2)
T(2)
T(2)
...
T(2)
T(2)
T(2)
T(2)
lg n
(1)
8
Karatsuba Multiplication
To multiply two n-bit integers a and b: Add two n bit integers. Multiply three n-bit integers, recursively. Add, subtract, and shift to obtain result.

Karatsuba Multiplication
To multiply two n-bit integers a and b: Add two n bit integers. Multiply three n-bit integers, recursively. Add, subtract, and shift to obtain result.

a = 2 n / 2 " a1 b = 2 n / 2 " b1 ab = 2 n " a1b1 = 2 n " a1b1

1
+ + + +
a0 b0 2 n / 2 " (a1b0 + a0 b1 ) + a0 b0 2 n / 2 " ( (a1 + a0 ) (b1 + b0 ) # a1b1 # a0 b0 ) + a0 b0

2 1 3 3
a = 2 n / 2 " a1 b = 2 n / 2 " b1 ab = 2 n " a1b1 = 2 n " a1b1

1
+ + + +
a0 b0 2 n / 2 " (a1b0 + a0 b1 ) + a0 b0 2 n / 2 " ( (a1 + a0 ) (b1 + b0 ) # a1b1 # a0 b0 ) + a0 b0

2 1 3 3
Theorem. [Karatsuba-Ofman 1962] Can multiply two n-bit integers in O(n1.585) bit operations.
T (n) " T ( #n /2$ ) + T ( %n /2& ) + T ( 1+ %n /2& ) + '(n) ( T (n) = O(n lg 3 ) = O(n1.585 ) 1 24 4 3 1444444 24444444 4 3
recursive calls add, subtract, shift
10
Karatsuba: Recursion Tree

" 0 if n = 0 T (n) = # $ 3T (n /2) + n otherwise
lg n k
Fast Integer Division Too (!)

Integer division. Given two n-bit (or less) integers s and t, compute quotient q = s / t and remainder r = s mod t.
T (n) = " n
k=0
3 (2)
$ 3 1+ lg n #1 ' () ) = 3n lg 3 # 2n = n& 2 3 & ) % 2 #1 (
T(n) !
T(n/2)
T(n/2)
T(n/2)
3(n/2)
Fact. Complexity of integer division is same as integer multiplication. To compute quotient q: 2 Approximate x = 1 / t using Newton's method: xi+1 = 2xi " t xi

After log n iterations, either q = s x or q = s x.

!
using fast multiplication
T(n/4) T(n/4)
T(n/4)
T(n/4) T(n/4) ... T(n / 2k) ...
T(n/4)
T(n/4) T(n/4)
T(n/4)
9(n/4) ... 3k (n / 2k) ...
T(2)
T(2)
T(2)
T(2)
...
T(2)
T(2)
T(2)
T(2)
lg n
(1)
11 12
Dot Product
Matrix Multiplication
Dot product. Given two length n vectors a and b, compute c = a b. Grade-school. (n) arithmetic operations.
a"b =
# ai bi
i=1
a = [ .70 .20 .10 ] b = [ .30 .40 .30 ]
a " b = (.70 # .30) + (.20 # .40) + (.10 # .30) = .32
Remark. Grade-school dot product algorithm is optimal.

14
Matrix Multiplication
Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB. Grade-school. (n3) arithmetic operations.
cij =
"c11 $ $c21 $M $c # n1 "a11 c12 L c1n % ' $ c22 L c2n ' a = $ 21 $M M O M' $a cn2 L cnn ' & # n1 "b11 b12 a12 L a1n % ' $ a 22 L a 2n ' b b ( $ 21 22 $M M O M ' M $b b a n2 L a nn ' & # n1 n2 L b1n % ' L b2n ' O M' L bnn ' &
Block Matrix Multiplication
" aik bkj

k=1
C11
A11
A12
B11
" 152 158 164 170 % " 0 1 2 3 % "16 $ ' $ ' $ 504 526 548 570 ' 4 5 6 7 ' $20 $ = $ ( $ 856 894 932 970 ' $ 8 9 10 11' $24 $1208 1262 1316 1370' $12 13 14 15' $28 # & # & #
17 18 19 % ' 21 22 23' 25 26 27' 29 30 31' &
!
".59 .32 .41% $ ' $.31 .36 .25' $ #.45 .31 .42' & = ".70 .20 .10% " .80 .30 .50% $ ' $ ' .30 .60 .10' ( $ .10 .40 .10' $ $ $ ' #.50 .10 .40' & # .10 .30 .40&
B11
#0 1& #16 17& #2 3& #24 25& #152 158 & C 11 = A11 " B11 + A12 " B21 = % (" % ( + % (" % ( = % ( $4 5' $20 21' $6 7' $28 29' $504 526'
Q. Is grade-school matrix multiplication algorithm optimal?

!
15 16
Matrix Multiplication: Warmup

To multiply two n-by-n matrices A and B: Divide: partition A and B into n-by-n blocks. Conquer: multiply 8 pairs of n-by-n matrices, recursively. Combine: add appropriate products using 4 matrix additions.

Fast Matrix Multiplication

Key idea. multiply 2-by-2 blocks with only 7 multiplications.
"C11 $ #C21 " A11 C12 % ' = $ C22 & # A21 A12 % " B11 ' ( $ A22 & # B21 B12 % ' B22 &
P 1 P2 P3 P4 P5 P6 P7
= A11 " ( B12 # B22 ) = ( A11 + A12 ) " B22 = ( A21 + A22 ) " B11 = A22 " ( B21 # B11 ) = ( A11 + A22 ) " ( B11 + B22 ) = ( A12 # A22 ) " ( B21 + B22 ) = ( A11 # A21 ) " ( B11 + B12 )
"C11 C12 % " A11 $ ' = $ #C21 C22 & # A21
A12 % " B11 '( $ A22 & # B21
B12 % ' B22 &
C11 C12 C21 C22
= = = =
( A11 " B11 ) + ( A12 " B21 ) ( A11 " B12 ) + ( A12 " B22 ) ( A21 " B11 ) + ( A22 " B21 ) ( A21 " B12 ) + ( A22 " B22 )
C11 C12 C21 C22
= = = =
P5 + P4 " P2 + P6 P + P2 1 P3 + P4 P5 + P " P3 " P7 1
!
T (n) = 8T (n /2 ) + "(n 2 ) # T (n) = "(n 3 ) 144 44 2 3 1 24 4 3
recursive calls add, form submatrices
7 ! ! multiplications. 18 = 8 + 10 additions and subtractions.
!
17 18
Fast Matrix Multiplication

To multiply two n-by-n matrices A and B: [Strassen 1969] Divide: partition A and B into n-by-n blocks. Compute: 14 n-by-n matrices via 10 matrix additions. Conquer: multiply 7 pairs of n-by-n matrices, recursively. Combine: 7 products into 4 terms using 8 matrix additions.

Fast Matrix Multiplication: Practice

Implementation issues. Sparsity. Caching effects. Numerical stability. Odd matrix dimensions. Crossover to classical algorithm around n = 128.

Analysis. Assume n is a power of 2. T(n) = # arithmetic operations.

Common misperception. Strassen is only a theoretical curiosity. Apple reports 8x speedup on G4 Velocity Engine when n 2,500. Range of instances where it's useful is a subject of controversy.

T (n) = 7T (n /2 ) + "(n 2 ) # T (n) = "(n log2 7 ) = O(n 2.81 ) 2 3 1 24 14 4 4 3

recursive calls add, subtract
Remark. Can "Strassenize" Ax = b, determinant, eigenvalues, SVD, .
19
20
Fast Matrix Multiplication: Theory

Q. Multiply two 2-by-2 matrices with 7 scalar multiplications? A. Yes! [Strassen 1969] "(n log2 7 ) = O(n 2.807 ) Q. Multiply two 2-by-2 matrices with 6 scalar multiplications? ! A. Impossible. [Hopcroft and Kerr 1971] log 2 6
"(n
Fast Matrix Multiplication: Theory
) = O(n 2.59 )
Q. Two 3-by-3 matrices with 21 scalar multiplications? A. Also impossible. ! " (n log3 21 ) = O(n 2.77 ) Begun, the decimal wars have. [Pan, Bini et al, Schnhage, ] ! Two 20-by-20 matrices with 4,460 scalar multiplications. Two 48-by-48 matrices with 47,217 scalar multiplications. A year later. ! December, 1979. ! January, 1980.

O(n 2.805 )
Best known. O(n2.376) [Coppersmith-Winograd, 1987] Conjecture. O(n2+) for any > 0. Caveat. Theoretical improvements to Strassen are progressively less practical.
21 22
O(n
2.7801
O(n 2.7799 )
O(n 2.521813)
O(n 2.521801)
!
!
Fourier Analysis
5.6 Convolution and FFT
Fourier theorem. [Fourier, Dirichlet, Riemann] Any periodic function can be expressed as the sum of a series of sinusoids. sufficiently smooth
y(t) =
2 N sin kt # " k=1 k
N=1 100 10 5
!
24
Euler's Identity
Sinusoids. Sum of sine an cosines.
Time Domain vs. Frequency Domain

Signal. [touch tone button 1]
y(t) =
1 2
sin(2" # 697 t) +
1 2
sin(2" # 1209 t)
Time domain.
!
sound pressure
eix = cos x + i sin x

Euler's identity
time (seconds)
Sinusoids. Sum of complex exponentials.
Frequency domain.
0.5 amplitude
frequency (Hz)
Reference: Cleve Moler, Numerical Computing with MATLAB

25 26
Time Domain vs. Frequency Domain

Signal. [recording, 8192 samples per second]
Fast Fourier Transform

FFT. Fast way to convert between time-domain and frequency-domain. Alternate viewpoint. Fast way to multiply and evaluate polynomials.
we take this approach
Magnitude of discrete Fourier transform.
If you speed up any nontrivial algorithm by a factor of a million or so the world will beat a path towards finding useful applications for it. -Numerical Recipes
Reference: Cleve Moler, Numerical Computing with MATLAB

27 28
Fast Fourier Transform: Applications

Applications. Optics, acoustics, quantum physics, telecommunications, radar, control systems, signal processing, speech recognition, data compression, image processing, seismology, mass spectrometry Digital media. [DVD, JPEG, MP3, H.264] Medical diagnostics. [MRI, CT, PET scans, ultrasound] Numerical solutions to Poisson's equation. Shor's quantum factoring algorithm.

Fast Fourier Transform: Brief History

Gauss (1805, 1866). Analyzed periodic motion of asteroid Ceres. Runge-Knig (1924). Laid theoretical groundwork. Danielson-Lanczos (1942). Efficient algorithm, x-ray crystallography. Cooley-Tukey (1965). Monitoring nuclear tests in Soviet Union and tracking submarines. Rediscovered and popularized FFT.
The FFT is one of the truly great computational developments of [the 20th] century. It has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without the FFT. -Charles van Loan
Importance not fully realized until advent of digital computers.
29
30
Polynomials: Coefficient Representation

Polynomial. [coefficient representation]
A(x) = a0 + a1x + a2 x 2 +L+ an"1x n"1 B(x) = b0 + b1x + b2 x +L+ bn"1x
2 n"1
A Modest PhD Dissertation Title
"New Proof of the Theorem That Every Algebraic Rational Integral Function In One Variable can be Resolved into Real Factors of the First or the Second Degree." - PhD dissertation, 1799 the University of Helmstedt
! Add. O(n) arithmetic operations. ! A(x) + B(x) = (a0 + b0 ) + (a1 + b1 )x +L+ (an"1 + bn"1 )x n"1
Evaluate. O(n) using Horner's method. ! A(x) = a0 + (x (a1 + x (a2 +L+ x (an"2 + x (an"1 ))L)) Multiply (convolve). O(n2) using brute force. !
A(x) " B(x) =
2n#2 i =0
$ ci x i , where ci = $ a j bi# j
j =0
31 32
Polynomials: Point-Value Representation

Fundamental theorem of algebra. [Gauss, PhD thesis] A degree n polynomial with complex coefficients has exactly n complex roots. Corollary. A degree n-1 polynomial A(x) is uniquely specified by its evaluation at n distinct values of x.
Polynomials: Point-Value Representation

Polynomial. [point-value representation]
A(x) : (x 0 , y0 ), K, (x n-1 , yn"1 ) B(x) : (x 0 , z0 ), K, (x n-1 , zn"1 )
Add. O(n) arithmetic operations.

!
A(x) + B(x) : (x 0 , y0 + z0 ), K, (x n-1 , yn"1 + zn"1 )
Multiply (convolve). O(n), but need 2n-1 points.

! A(x) " B(x) : (x 0 , y0 " z0 ), K, (x 2n-1 , y2n#1 " z2n#1 )
yj = A(xj )
Evaluate. O(n2) using Lagrange's formula.

!
n"1
$ (x " x j )
j#k
xj
x
33
A(x) = % yk
k=0
$ (xk " x j )
j#k
34
Converting Between Two Polynomial Representations

Tradeoff. Fast evaluation or fast multiplication. We want both!
Converting Between Two Representations: Brute Force

Coefficient point-value. Given a polynomial a0 + a1 x + ... + an-1 xn-1, evaluate it at n distinct points x0 , ..., xn-1.
representation coefficient point-value
multiply O(n2) O(n)
evaluate O(n) O(n2)
Goal. Efficient conversion between two representations all ops fast.

!
# % % % % % % $
y0 y1 y2 M yn"1
#1 x & 0 % ( % 1 x1 ( ( = % 1 x2 % ( M %M ( ( % 1 xn"1 ' $
2 x0 2 x1 2 x2 M 2 xn"1
n"1 L x0 & ( L x1n"1 ( n"1 L x2 ( ( O M ( n"1 L xn"1 ( '
# % % % % % % $
a0 a1 a2 M an"1
& ( ( ( ( ( ( '
a0 , a1 , ..., an-1
coefficient representation
(x0 , y0 ), K, (xn"1 , yn"1 )

point-value representation
Running time. O(n2) for matrix-vector multiply (or n Horner's).
!
35 36
Converting Between Two Representations: Brute Force

Point-value coefficient. Given n distinct points x0, ... , xn-1 and values y0, ... , yn-1, find unique polynomial a0 + a1x + ... + an-1 xn-1, that has given values at given points.
# % % % % % % $ y0 y1 y2 M yn"1 #1 x & 0 % ( % 1 x1 ( ( = % 1 x2 % ( M %M ( ( % 1 xn"1 ' $
2 x0 2 x1 2 x2
Divide-and-Conquer
Decimation in frequency. Break up polynomial into low and high powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Alow(x) = a0 + a1x + a2x2 + a3x3. Ahigh (x) = a4 + a5x + a6x2 + a7x3. A(x) = Alow(x) + x4 Ahigh(x).

M
2 xn"1
L L L O M n"1 L xn"1
n"1 x0 x1n"1 n"1 x2
& ( ( ( ( ( ( '
# % % % % % % $
a0 a1 a2 M an"1
& ( ( ( ( ( ( '
Vandermonde matrix is invertible iff xi distinct
Decimation in time. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2).

Running time. O(n3) for Gaussian elimination.

or O(n2.376) via fast matrix multiplication
37
38
Coefficient to Point-Value Representation: Intuition

Coefficient point-value. Given a polynomial a0 + a1x + ... + an-1 xn-1, evaluate it at n distinct points x0 , ..., xn-1.
we get to choose which ones!
Coefficient to Point-Value Representation: Intuition

Coefficient point-value. Given a polynomial a0 + a1x + ... + an-1 xn-1, evaluate it at n distinct points x0 , ..., xn-1.
we get to choose which ones!
Divide. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2). A(-x) = Aeven(x2) - x Aodd(x2).

Divide. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2). A(-x) = Aeven(x2) - x Aodd(x2).

Intuition. Choose two points to be 1. A( 1) = Aeven(1) + 1 Aodd(1). A(-1) = Aeven(1) - 1 Aodd(1). Can evaluate polynomial of degree n

at 2 points by evaluating two polynomials of degree n at 1 point.
Intuition. Choose four complex points to be 1, i. A(1) = Aeven(1) + 1 Aodd(1). A(-1) = Aeven(1) - 1 Aodd(1). Can evaluate polynomial of degree n at 4 points by evaluating two polynomials A( i ) = Aeven(-1) + i Aodd(-1). of degree n at 2 points. A( -i ) = Aeven(-1) - i Aodd(-1).

39
40
Discrete Fourier Transform

Coefficient point-value. Given a polynomial a0 + a1x + ... + an-1 xn-1, evaluate it at n distinct points x0 , ..., xn-1. Key idea. Choose xk = k where is principal nth root of unity.
Roots of Unity
Def. An nth root of unity is a complex number x such that xn = 1. Fact. The nth roots of unity are: 0, 1, , n-1 where = e 2 i / n. Pf. (k)n = (e 2 i k / n) n = (e i ) 2k = (-1) 2k = 1. Fact. The nth roots of unity are: 0, 1, , n/2-1 where = 2 = e 4 i / n.
# y0 & % ( % y1 ( % y2 ( % ( = % y3 ( % M ( % ( $yn"1'
DFT
#1 1 1 % 1 )1 )2 % % 1 )2 )4 % 3 )6 %1 ) %M M M % 1 ) n"1 )2(n"1) $
1 )3 )6 )9 M ) 3(n"1)
L L L L O L
& ( ( )2(n"1) ( ( ) 3(n"1) ( ( M (n"1)(n"1) ( ) ' 1 ) n"1
# a0 & % ( % a1 ( % a2 ( % ( % a3 ( % M ( % ( $an"1'
2 = 1 = i 3 4 = 2 = -1 1 n=8 0 = 0 = 1
Fourier matrix Fn 5 6 = 3 = -i
41 42
Fast Fourier Transform

Goal. Evaluate a degree n-1 polynomial A(x) = a0 + ... + an-1 xn-1 at its nth roots of unity: 0, 1, , n-1. Divide. Break up polynomial into even and odd powers. Aeven(x) = a0 + a2x + a4x2 + + an-2 x n/2 - 1. Aodd (x) = a1 + a3x + a5x2 + + an-1 x n/2 - 1. A(x) = Aeven(x2) + x Aodd(x2).

FFT Algorithm
fft(n, a0,a1,,an-1) { if (n == 1) return a0 (e0,e1,,en/2-1) FFT(n/2, a0,a2,a4,,an-2) (d0,d1,,dn/2-1) FFT(n/2, a1,a3,a5,,an-1) for k = k yk+n/2 yk+n/2 } 0 to n/2 - 1 { e2ik/n ek + k dk ek - k dk
Conquer. Evaluate Aeven(x) and Aodd(x) at the nth roots of unity: 0, 1, , n/2-1. Combine. A( k) = Aeven( k) + k Aodd ( k), 0 k < n/2 k+ n) = A k k k A( even( ) Aodd ( ), 0 k < n/2

k = (k )2
return (y0,y1,,yn-1) }
k = (k + n )2
k+ n = -k
43
44
FFT Summary
Theorem. FFT algorithm evaluates a degree n-1 polynomial at each of the nth roots of unity in O(n log n) steps.
assumes n is a power of 2
Recursion Tree
a0, a1, a2, a3, a4, a5, a6, a7
Running time.
perfect shuffle
T (n) = 2T (n /2) + "(n) # T (n) = "(n log n)

a0, a2, a4, a6 a1, a3, a5, a7
!
a0, a4 a2, a6 a1, a5 a3, a7
O(n log n)
a0 , a1 , ..., an-1
(" 0 , y0 ), ..., (" n#1 , yn#1 )

???
a0 000
a4 100
a2 010
a6 110
a1 001
a5 101
a3 011
a7 111
!
45
"bit-reversed" order
46
Inverse Discrete Fourier Transform

Point-value coefficient. Given n distinct points x0, ... , xn-1 and values y0, ... , yn-1, find unique polynomial a0 + a1x + ... + an-1 xn-1, that has given values at given points.
Inverse DFT
Claim. Inverse of Fourier matrix Fn is given by following formula.
# a0 & % ( % a1 ( % a2 ( % ( = % a3 ( % M ( % ( $an"1'
#1 1 1 % 1 )1 )2 % % 1 )2 )4 % 3 )6 %1 ) %M M M % n"1 2(n"1) ) $1 )
1 )3 )6 )9 M )
3(n"1)
L L L L O L
& "1 # y0 & ( % ( ( % y1 ( 2(n"1) ( ) % y2 ( % ( 3(n"1) ( ) ( % y3 ( ( M % M ( (n"1)(n"1) ( % ( ) $ yn"1' ' 1 ) n"1
Gn
$1 1 & 1 "#1 & #2 1 &1 " = & #3 n &1 " &M M & 1 "#(n#1) %
1 n
1 "#2 "#4 " M

#6
1 "#3 "#6 "#9 M "#3(n#1)
L L L L O L
"#2(n#1)
' 1 ) "#(n#1) ) "#2(n#1) ) ) "#3(n#1) ) ) M #(n#1)(n#1) ) " (
Fn is unitary
Inverse DFT
Fourier matrix inverse (Fn) -1
Consequence. To compute inverse FFT, apply same algorithm but use -1 = e -2 i / n as principal nth root of unity (and divide by n).
47
48
Inverse FFT: Proof of Correctness

Claim. Fn and Gn are inverses. Pf.
Inverse FFT: Algorithm
(F
Gn
k k"
1 1 " " % # k j #$ j k = % #(k$k ) j n j=0 n j=0
n$1
n$1
& 1 if k = k" = ' ( 0 otherwise

summation lemma
ifft(n, a0,a1,,an-1) { if (n == 1) return a0 (e0,e1,,en/2-1) FFT(n/2, a0,a2,a4,,an-2) (d0,d1,,dn/2-1) FFT(n/2, a1,a3,a5,,an-1) for k = k yk+n/2 yk+n/2 } 0 to n/2 - 1 { e-2ik/n (ek + k dk) / n (ek - k dk) / n
! Summation lemma. Let be a principal nth root of unity. Then

n#1
& n if k % 0 mod n $ "kj = ' ( 0 otherwise j=0
Pf.

If k is a! multiple of n then k = 1 series sums to n. th root of unity k is a root of xn - 1 = (x - 1) (1 + x + x2 + ... + xn-1). Each n if k 1 we have: 1 + k + k(2) + + k(n-1) = 0 series sums to 0.
49
return (y0,y1,,yn-1) }
50
Inverse FFT Summary

Theorem. Inverse FFT algorithm interpolates a degree n-1 polynomial given values at each of the nth roots of unity in O(n log n) steps.
assumes n is a power of 2 coefficient representation
Polynomial Multiplication
Theorem. Can multiply two degree n-1 polynomials in O(n log n) steps.
pad with 0s to make n a power of 2
a0 , a1 , K, an-1 b0 , b1 , K, bn-1
c0 , c1 , K, c2n-2
!
O(n log n)
2 FFTs
O(n log n)
inverse FFT
O(n log n)
a0 , a1 , K, an-1
(" 0 , y0 ), K, (" n#1 , yn#1 )

O(n log n)
A(" 0 ), ..., A(" 2n#1 ) B(" ), ..., B("

51
point-value multiplication
2n#1
O(n)
C(" 0 ), ..., C(" 2n#1 )
52
FFT in Practice ?
FFT in Practice
Fastest Fourier transform in the West. [Frigo and Johnson] Optimized C library. Features: DFT, DCT, real, complex, any size, any dimension. Won 1999 Wilkinson Prize for Numerical Software. Portable, competitive with vendor-tuned code.

Implementation details. Instead of executing predetermined algorithm, it evaluates your hardware and uses a special-purpose compiler to generate an optimized algorithm catered to "shape" of the problem. Core algorithm is nonrecursive version of Cooley-Tukey. O(n log n), even for prime sizes.

Reference: http://www.fftw.org
53
54
Integer Multiplication, Redux

Integer multiplication. Given two n bit integers a = an-1 a1a0 and b = bn-1 b1b0, compute their product a b. Convolution algorithm. Form two polynomials. Note: a = A(2), b = B(2). Compute C(x) = A(x) B(x). Evaluate C(2) = a b. !

Integer Multiplication, Redux

Integer multiplication. Given two n bit integers a = an-1 a1a0 and b = bn-1 b1b0, compute their product a b.
A(x) = a0 + a1x + a2 x 2 +L+ an"1x n"1
"the fastest bignum library on the planet"
B(x) = b0 + b1x + b2 x 2 +L+ bn"1x n"1
Practice. [GNU Multiple Precision Arithmetic Library] It uses brute force, Karatsuba, and FFT, depending on the size of n.
Running time: O(n log n) complex arithmetic operations. !
Theory. [Schnhage-Strassen 1971] O(n log n log log n) bit operations. Theory. [Frer 2007] O(n log n 2O(log *n)) bit operations.
55
56
Integer Arithmetic
Fundamental open question. What is complexity of arithmetic?
Factoring
Factoring. Given an n-bit integer, find its prime factorization.
2773 = 47 59 Operation addition multiplication division Upper Bound O(n) O(n log n 2O(log*n)) Lower Bound (n) (n) (n) 267-1 = 147573952589676412927 = 193707721 761838257287
a disproof of Mersenne's conjecture that 267 - 1 is prime
O(n log n 2O(log*n))
740375634795617128280467960974295731425931888892312890849 362326389727650340282662768919964196251178439958943305021 275853701189680982867331732731089309005525051168770632990 72396380786710086096962537934650563796359

RSA-704 ($30,000 prize if you can factor)
57
58
Factoring and RSA

Primality. Given an n-bit integer, is it prime? Factoring. Given an n-bit integer, find its prime factorization. Significance. Efficient primality testing can implement RSA. Significance. Efficient factoring can break RSA. Theorem. [AKS 2002] Poly-time algorithm for primality testing.
Shor's Algorithm
Shor's algorithm. Can factor an n-bit integer in O(n3) time on a quantum computer.
algorithm uses quantum QFT !
Ramification. At least one of the following is wrong: RSA is secure. Textbook quantum mechanics. Extending Church-Turing thesis.

59
60
Shor's Factoring Algorithm

Period finding.
2
i
Extra Slides
1 1 1 2 2 2 4 4 4 8 8 8 16 1 16 32 2 11 64 4 1 128 8 2
period = 4 period = 6
2 i mod 15 2 i mod 21
Theorem. [Euler] Let p and q be prime, and let N = p q. Then, the following sequence repeats with a period divisible by (p-1) (q-1): x mod N, x2 mod N, x3 mod N, x4 mod N, Consequence. If we can learn something about the period of the sequence, we can learn something about the divisors of (p-1) (q-1).
by using random values of x, we get the divisors of (p-1) (q-1), and from this, can get the divisors of N = p q
61
Fourier Matrix Decomposition
Fn =
$1 1 1 & 1 "1 "2 & & 1 "2 "4 & 1 "3 "6 & &M M M & 1 " n#1 "2(n#1) %
1 "3 "6 "9 M
L L L L
O " 3(n#1) L
' ) ) "2(n#1) ) ) " 3(n#1) ) ) M ) "(n#1)(n#1) ( 1 " n#1
I4
"1 $ 0 = $ $0 $ #0
0 1 0 0
0 0 1 0
0% ' 0' 0' ' 1&
D4
#"0 % 0 = % %0 % $0
0 "1 0 0
0 0 "2 0
0 & ( 0 ( 0 ( ( "3 '
" $ a = $ $ $ #
a0 a1 a2 a3
% ' ' ' ' &
#I y = Fn a = ! n /2 % $ I n /2
Dn /2 & # Fn /2 aeven & (% ( ! " Dn /2 ' $ Fn /2 aodd '

63

How To Multiply: 5.5 Integer Multiplication

Uploaded by

Copyright:

Available Formats

How To Multiply: 5.5 Integer Multiplication

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

How To Multiply: 5.5 Integer Multiplication

Uploaded by

Copyright:

Available Formats

Complex Multiplication

Complex multiplication. (a + bi) (c + di) = x + yi. Grade-school. x = ac - bd, y = bc + ad.

Q. Is it possible to do with fewer multiplications?

5.5 Integer Multiplication

Q. Is it possible to do with fewer multiplications? A. Yes. [Gauss] x = ac - bd, y = (a + b) (c + d) - ac - bd.

Remark. Improvement if no hardware multiply.

Remark. Grade-school addition algorithm is optimal.

Q. Is grade-school multiplication algorithm optimal?

Divide-and-Conquer Multiplication: Warmup

a = 2 " a1 + a0 b = 2 n / 2 " b1 + b0 a b = 2 n / 2 " a1 + a0 2 n / 2 " b1 + b0 = 2 n " a1b1 + 2 n / 2 " (a1b0 + a0 b1 ) + a0 b0

... ... T(n / 2k) ...

16(n/4) ... 4k (n / 2k) ...

T (n) = 4T (n /2) + "(n) # T (n) = "(n 2 ) 123 1 24 4 3

a = 2 n / 2 " a1 b = 2 n / 2 " b1 ab = 2 n " a1b1 = 2 n " a1b1

a0 b0 2 n / 2 " (a1b0 + a0 b1 ) + a0 b0 2 n / 2 " ( (a1 + a0 ) (b1 + b0 ) # a1b1 # a0 b0 ) + a0 b0

a = 2 n / 2 " a1 b = 2 n / 2 " b1 ab = 2 n " a1b1 = 2 n " a1b1

a0 b0 2 n / 2 " (a1b0 + a0 b1 ) + a0 b0 2 n / 2 " ( (a1 + a0 ) (b1 + b0 ) # a1b1 # a0 b0 ) + a0 b0

Karatsuba: Recursion Tree

Fast Integer Division Too (!)

$ 3 1+ lg n #1 ' () ) = 3n lg 3 # 2n = n& 2 3 & ) % 2 #1 (

After log n iterations, either q = s x or q = s x.

using fast multiplication

T(n/4) T(n/4) ... T(n / 2k) ...

9(n/4) ... 3k (n / 2k) ...

a = [ .70 .20 .10 ] b = [ .30 .40 .30 ]

a " b = (.70 # .30) + (.20 # .40) + (.10 # .30) = .32

Remark. Grade-school dot product algorithm is optimal.

Block Matrix Multiplication

" aik bkj

17 18 19 % ' 21 22 23' 25 26 27' 29 30 31' &

Q. Is grade-school matrix multiplication algorithm optimal?

Matrix Multiplication: Warmup

Fast Matrix Multiplication

"C11 C12 % " A11 $ ' = $ #C21 C22 & # A21

A12 % " B11 '( $ A22 & # B21

B12 % ' B22 &

C11 C12 C21 C22

C11 C12 C21 C22

P5 + P4 " P2 + P6 P + P2 1 P3 + P4 P5 + P " P3 " P7 1

7 ! ! multiplications. 18 = 8 + 10 additions and subtractions.

Fast Matrix Multiplication

Fast Matrix Multiplication: Practice

Analysis. Assume n is a power of 2. T(n) = # arithmetic operations.

T (n) = 7T (n /2 ) + "(n 2 ) # T (n) = "(n log2 7 ) = O(n 2.81 ) 2 3 1 24 14 4 4 3

Remark. Can "Strassenize" Ax = b, determinant, eigenvalues, SVD, .

Fast Matrix Multiplication: Theory

Fast Matrix Multiplication: Theory

5.6 Convolution and FFT

2 N sin kt # " k=1 k

Time Domain vs. Frequency Domain

eix = cos x + i sin x

Sinusoids. Sum of complex exponentials.

Reference: Cleve Moler, Numerical Computing with MATLAB

Time Domain vs. Frequency Domain

Fast Fourier Transform

Magnitude of discrete Fourier transform.

Reference: Cleve Moler, Numerical Computing with MATLAB

Fast Fourier Transform: Applications

Fast Fourier Transform: Brief History