How To Multiply: 5.5 Integer Multiplication

Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

Complex Multiplication

Complex multiplication. (a + bi) (c + di) = x + yi. Grade-school. x = ac - bd, y = bc + ad.


4 multiplications, 2 additions

How to Multiply
integers, matrices, and polynomials

Q. Is it possible to do with fewer multiplications?

Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Complex Multiplication
Complex multiplication. (a + bi) (c + di) = x + yi. Grade-school. x = ac - bd, y = bc + ad.
4 multiplications, 2 additions

5.5 Integer Multiplication

Q. Is it possible to do with fewer multiplications? A. Yes. [Gauss] x = ac - bd, y = (a + b) (c + d) - ac - bd.


3 multiplications, 5 additions

Remark. Improvement if no hardware multiply.

Integer Addition
Addition. Given two n-bit integers a and b, compute a + b. Grade-school. (n) bit operations.

Integer Multiplication
Multiplication. Given two n-bit integers a and b, compute a b. Grade-school. (n2) bit operations.
1 1 0 1 0 1 0 1

1 1

1 1 1 1

1 0 1 0

1 1 1 1

1 0 1 0

0 1 1 0

1 0 0 1 1 1 0

0 1 1 1 1 1 0 1 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0

+ 1

0 0

Remark. Grade-school addition algorithm is optimal.

1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1

Q. Is grade-school multiplication algorithm optimal?


5 6

Divide-and-Conquer Multiplication: Warmup


To multiply two n-bit integers a and b: Multiply four n-bit integers, recursively. Add and shift to obtain result.

Recursion Tree
" 0 if n = 0 T (n) = # $ 4T (n /2) + n otherwise
lg n $ 21+ lg n #1 ' 2 T (n) = " n 2 k = n & ) = 2n # n % 2 #1 ( k=0

! T(n)

a = 2 " a1 + a0 b = 2 n / 2 " b1 + b0 a b = 2 n / 2 " a1 + a0 2 n / 2 " b1 + b0 = 2 n " a1b1 + 2 n / 2 " (a1b0 + a0 b1 ) + a0 b0

n /2

)(

T(n/2)

T(n/2)

T(n/2)

T(n/2)

4(n/2)

Ex. !

a = 10001101
a1 a0

b = 11100001
b1 b0

T(n/4) T(n/4)

T(n/4)

T(n/4)

... ... T(n / 2k) ...

T(n/4)

T(n/4) T(n/4)

T(n/4)

16(n/4) ... 4k (n / 2k) ...

T (n) = 4T (n /2) + "(n) # T (n) = "(n 2 ) 123 1 24 4 3


recursive calls add, shift

T(2)

T(2)

T(2)

T(2)

...

T(2)

T(2)

T(2)

T(2)

lg n

(1)
8

Karatsuba Multiplication
To multiply two n-bit integers a and b: Add two n bit integers. Multiply three n-bit integers, recursively. Add, subtract, and shift to obtain result.

Karatsuba Multiplication
To multiply two n-bit integers a and b: Add two n bit integers. Multiply three n-bit integers, recursively. Add, subtract, and shift to obtain result.

a = 2 n / 2 " a1 b = 2 n / 2 " b1 ab = 2 n " a1b1 = 2 n " a1b1


1

+ + + +

a0 b0 2 n / 2 " (a1b0 + a0 b1 ) + a0 b0 2 n / 2 " ( (a1 + a0 ) (b1 + b0 ) # a1b1 # a0 b0 ) + a0 b0


2 1 3 3

a = 2 n / 2 " a1 b = 2 n / 2 " b1 ab = 2 n " a1b1 = 2 n " a1b1


1

+ + + +

a0 b0 2 n / 2 " (a1b0 + a0 b1 ) + a0 b0 2 n / 2 " ( (a1 + a0 ) (b1 + b0 ) # a1b1 # a0 b0 ) + a0 b0


2 1 3 3

Theorem. [Karatsuba-Ofman 1962] Can multiply two n-bit integers in O(n1.585) bit operations.
T (n) " T ( #n /2$ ) + T ( %n /2& ) + T ( 1+ %n /2& ) + '(n) ( T (n) = O(n lg 3 ) = O(n1.585 ) 1 24 4 3 1444444 24444444 4 3
recursive calls add, subtract, shift

10

Karatsuba: Recursion Tree


" 0 if n = 0 T (n) = # $ 3T (n /2) + n otherwise
lg n k

Fast Integer Division Too (!)


Integer division. Given two n-bit (or less) integers s and t, compute quotient q = s / t and remainder r = s mod t.

T (n) = " n
k=0

3 (2)

$ 3 1+ lg n #1 ' () ) = 3n lg 3 # 2n = n& 2 3 & ) % 2 #1 (

T(n) !

T(n/2)

T(n/2)

T(n/2)

3(n/2)

Fact. Complexity of integer division is same as integer multiplication. To compute quotient q: 2 Approximate x = 1 / t using Newton's method: xi+1 = 2xi " t xi

After log n iterations, either q = s x or q = s x.


!

using fast multiplication

T(n/4) T(n/4)

T(n/4)

T(n/4) T(n/4) ... T(n / 2k) ...

T(n/4)

T(n/4) T(n/4)

T(n/4)

9(n/4) ... 3k (n / 2k) ...

T(2)

T(2)

T(2)

T(2)

...

T(2)

T(2)

T(2)

T(2)

lg n

(1)
11 12

Dot Product

Matrix Multiplication

Dot product. Given two length n vectors a and b, compute c = a b. Grade-school. (n) arithmetic operations.
a"b =

# ai bi
i=1

a = [ .70 .20 .10 ] b = [ .30 .40 .30 ]

a " b = (.70 # .30) + (.20 # .40) + (.10 # .30) = .32

Remark. Grade-school dot product algorithm is optimal.


14

Matrix Multiplication
Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB. Grade-school. (n3) arithmetic operations.
cij =
"c11 $ $c21 $M $c # n1 "a11 c12 L c1n % ' $ c22 L c2n ' a = $ 21 $M M O M' $a cn2 L cnn ' & # n1 "b11 b12 a12 L a1n % ' $ a 22 L a 2n ' b b ( $ 21 22 $M M O M ' M $b b a n2 L a nn ' & # n1 n2 L b1n % ' L b2n ' O M' L bnn ' &

Block Matrix Multiplication

" aik bkj


k=1

C11

A11

A12

B11

" 152 158 164 170 % " 0 1 2 3 % "16 $ ' $ ' $ 504 526 548 570 ' 4 5 6 7 ' $20 $ = $ ( $ 856 894 932 970 ' $ 8 9 10 11' $24 $1208 1262 1316 1370' $12 13 14 15' $28 # & # & #

17 18 19 % ' 21 22 23' 25 26 27' 29 30 31' &

!
".59 .32 .41% $ ' $.31 .36 .25' $ #.45 .31 .42' & = ".70 .20 .10% " .80 .30 .50% $ ' $ ' .30 .60 .10' ( $ .10 .40 .10' $ $ $ ' #.50 .10 .40' & # .10 .30 .40&

B11

#0 1& #16 17& #2 3& #24 25& #152 158 & C 11 = A11 " B11 + A12 " B21 = % (" % ( + % (" % ( = % ( $4 5' $20 21' $6 7' $28 29' $504 526'

Q. Is grade-school matrix multiplication algorithm optimal?


!
15 16

Matrix Multiplication: Warmup


To multiply two n-by-n matrices A and B: Divide: partition A and B into n-by-n blocks. Conquer: multiply 8 pairs of n-by-n matrices, recursively. Combine: add appropriate products using 4 matrix additions.

Fast Matrix Multiplication


Key idea. multiply 2-by-2 blocks with only 7 multiplications.
"C11 $ #C21 " A11 C12 % ' = $ C22 & # A21 A12 % " B11 ' ( $ A22 & # B21 B12 % ' B22 &

P 1 P2 P3 P4 P5 P6 P7

= A11 " ( B12 # B22 ) = ( A11 + A12 ) " B22 = ( A21 + A22 ) " B11 = A22 " ( B21 # B11 ) = ( A11 + A22 ) " ( B11 + B22 ) = ( A12 # A22 ) " ( B21 + B22 ) = ( A11 # A21 ) " ( B11 + B12 )

"C11 C12 % " A11 $ ' = $ #C21 C22 & # A21

A12 % " B11 '( $ A22 & # B21

B12 % ' B22 &

C11 C12 C21 C22

= = = =

( A11 " B11 ) + ( A12 " B21 ) ( A11 " B12 ) + ( A12 " B22 ) ( A21 " B11 ) + ( A22 " B21 ) ( A21 " B12 ) + ( A22 " B22 )

C11 C12 C21 C22

= = = =

P5 + P4 " P2 + P6 P + P2 1 P3 + P4 P5 + P " P3 " P7 1

!
T (n) = 8T (n /2 ) + "(n 2 ) # T (n) = "(n 3 ) 144 44 2 3 1 24 4 3
recursive calls add, form submatrices

7 ! ! multiplications. 18 = 8 + 10 additions and subtractions.

!
17 18

Fast Matrix Multiplication


To multiply two n-by-n matrices A and B: [Strassen 1969] Divide: partition A and B into n-by-n blocks. Compute: 14 n-by-n matrices via 10 matrix additions. Conquer: multiply 7 pairs of n-by-n matrices, recursively. Combine: 7 products into 4 terms using 8 matrix additions.

Fast Matrix Multiplication: Practice


Implementation issues. Sparsity. Caching effects. Numerical stability. Odd matrix dimensions. Crossover to classical algorithm around n = 128.

Analysis. Assume n is a power of 2. T(n) = # arithmetic operations.


Common misperception. Strassen is only a theoretical curiosity. Apple reports 8x speedup on G4 Velocity Engine when n 2,500. Range of instances where it's useful is a subject of controversy.

T (n) = 7T (n /2 ) + "(n 2 ) # T (n) = "(n log2 7 ) = O(n 2.81 ) 2 3 1 24 14 4 4 3


recursive calls add, subtract

Remark. Can "Strassenize" Ax = b, determinant, eigenvalues, SVD, .

19

20

Fast Matrix Multiplication: Theory


Q. Multiply two 2-by-2 matrices with 7 scalar multiplications? A. Yes! [Strassen 1969] "(n log2 7 ) = O(n 2.807 ) Q. Multiply two 2-by-2 matrices with 6 scalar multiplications? ! A. Impossible. [Hopcroft and Kerr 1971] log 2 6
"(n

Fast Matrix Multiplication: Theory

) = O(n 2.59 )

Q. Two 3-by-3 matrices with 21 scalar multiplications? A. Also impossible. ! " (n log3 21 ) = O(n 2.77 ) Begun, the decimal wars have. [Pan, Bini et al, Schnhage, ] ! Two 20-by-20 matrices with 4,460 scalar multiplications. Two 48-by-48 matrices with 47,217 scalar multiplications. A year later. ! December, 1979. ! January, 1980.

O(n 2.805 )

Best known. O(n2.376) [Coppersmith-Winograd, 1987] Conjecture. O(n2+) for any > 0. Caveat. Theoretical improvements to Strassen are progressively less practical.
21 22

O(n

2.7801

O(n 2.7799 )

O(n 2.521813)
O(n 2.521801)

!
!

Fourier Analysis

5.6 Convolution and FFT

Fourier theorem. [Fourier, Dirichlet, Riemann] Any periodic function can be expressed as the sum of a series of sinusoids. sufficiently smooth

y(t) =

2 N sin kt # " k=1 k

N=1 100 10 5

!
24

Euler's Identity
Sinusoids. Sum of sine an cosines.

Time Domain vs. Frequency Domain


Signal. [touch tone button 1]
y(t) =
1 2

sin(2" # 697 t) +

1 2

sin(2" # 1209 t)

Time domain.

!
sound pressure

eix = cos x + i sin x


Euler's identity

time (seconds)

Sinusoids. Sum of complex exponentials.

Frequency domain.
0.5 amplitude

frequency (Hz)

Reference: Cleve Moler, Numerical Computing with MATLAB


25 26

Time Domain vs. Frequency Domain


Signal. [recording, 8192 samples per second]

Fast Fourier Transform


FFT. Fast way to convert between time-domain and frequency-domain. Alternate viewpoint. Fast way to multiply and evaluate polynomials.
we take this approach

Magnitude of discrete Fourier transform.

If you speed up any nontrivial algorithm by a factor of a million or so the world will beat a path towards finding useful applications for it. -Numerical Recipes

Reference: Cleve Moler, Numerical Computing with MATLAB


27 28

Fast Fourier Transform: Applications


Applications. Optics, acoustics, quantum physics, telecommunications, radar, control systems, signal processing, speech recognition, data compression, image processing, seismology, mass spectrometry Digital media. [DVD, JPEG, MP3, H.264] Medical diagnostics. [MRI, CT, PET scans, ultrasound] Numerical solutions to Poisson's equation. Shor's quantum factoring algorithm.

Fast Fourier Transform: Brief History


Gauss (1805, 1866). Analyzed periodic motion of asteroid Ceres. Runge-Knig (1924). Laid theoretical groundwork. Danielson-Lanczos (1942). Efficient algorithm, x-ray crystallography. Cooley-Tukey (1965). Monitoring nuclear tests in Soviet Union and tracking submarines. Rediscovered and popularized FFT.

The FFT is one of the truly great computational developments of [the 20th] century. It has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without the FFT. -Charles van Loan

Importance not fully realized until advent of digital computers.

29

30

Polynomials: Coefficient Representation


Polynomial. [coefficient representation]
A(x) = a0 + a1x + a2 x 2 +L+ an"1x n"1 B(x) = b0 + b1x + b2 x +L+ bn"1x
2 n"1

A Modest PhD Dissertation Title

"New Proof of the Theorem That Every Algebraic Rational Integral Function In One Variable can be Resolved into Real Factors of the First or the Second Degree." - PhD dissertation, 1799 the University of Helmstedt

! Add. O(n) arithmetic operations. ! A(x) + B(x) = (a0 + b0 ) + (a1 + b1 )x +L+ (an"1 + bn"1 )x n"1

Evaluate. O(n) using Horner's method. ! A(x) = a0 + (x (a1 + x (a2 +L+ x (an"2 + x (an"1 ))L)) Multiply (convolve). O(n2) using brute force. !
A(x) " B(x) =
2n#2 i =0

$ ci x i , where ci = $ a j bi# j
j =0
31 32

Polynomials: Point-Value Representation


Fundamental theorem of algebra. [Gauss, PhD thesis] A degree n polynomial with complex coefficients has exactly n complex roots. Corollary. A degree n-1 polynomial A(x) is uniquely specified by its evaluation at n distinct values of x.

Polynomials: Point-Value Representation


Polynomial. [point-value representation]
A(x) : (x 0 , y0 ), K, (x n-1 , yn"1 ) B(x) : (x 0 , z0 ), K, (x n-1 , zn"1 )

Add. O(n) arithmetic operations.


!

A(x) + B(x) : (x 0 , y0 + z0 ), K, (x n-1 , yn"1 + zn"1 )

Multiply (convolve). O(n), but need 2n-1 points.


! A(x) " B(x) : (x 0 , y0 " z0 ), K, (x 2n-1 , y2n#1 " z2n#1 )
yj = A(xj )

Evaluate. O(n2) using Lagrange's formula.


!
n"1

$ (x " x j )
j#k

xj

x
33

A(x) = % yk
k=0

$ (xk " x j )
j#k
34

Converting Between Two Polynomial Representations


Tradeoff. Fast evaluation or fast multiplication. We want both!

Converting Between Two Representations: Brute Force


Coefficient point-value. Given a polynomial a0 + a1 x + ... + an-1 xn-1, evaluate it at n distinct points x0 , ..., xn-1.

representation coefficient point-value

multiply O(n2) O(n)

evaluate O(n) O(n2)

Goal. Efficient conversion between two representations all ops fast.


!

# % % % % % % $

y0 y1 y2 M yn"1

#1 x & 0 % ( % 1 x1 ( ( = % 1 x2 % ( M %M ( ( % 1 xn"1 ' $

2 x0 2 x1 2 x2 M 2 xn"1

n"1 L x0 & ( L x1n"1 ( n"1 L x2 ( ( O M ( n"1 L xn"1 ( '

# % % % % % % $

a0 a1 a2 M an"1

& ( ( ( ( ( ( '

a0 , a1 , ..., an-1
coefficient representation

(x0 , y0 ), K, (xn"1 , yn"1 )


point-value representation

Running time. O(n2) for matrix-vector multiply (or n Horner's).

!
35 36

Converting Between Two Representations: Brute Force


Point-value coefficient. Given n distinct points x0, ... , xn-1 and values y0, ... , yn-1, find unique polynomial a0 + a1x + ... + an-1 xn-1, that has given values at given points.
# % % % % % % $ y0 y1 y2 M yn"1 #1 x & 0 % ( % 1 x1 ( ( = % 1 x2 % ( M %M ( ( % 1 xn"1 ' $
2 x0 2 x1 2 x2

Divide-and-Conquer
Decimation in frequency. Break up polynomial into low and high powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Alow(x) = a0 + a1x + a2x2 + a3x3. Ahigh (x) = a4 + a5x + a6x2 + a7x3. A(x) = Alow(x) + x4 Ahigh(x).

M
2 xn"1

L L L O M n"1 L xn"1

n"1 x0 x1n"1 n"1 x2

& ( ( ( ( ( ( '

# % % % % % % $

a0 a1 a2 M an"1

& ( ( ( ( ( ( '

Vandermonde matrix is invertible iff xi distinct

Decimation in time. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2).

Running time. O(n3) for Gaussian elimination.


or O(n2.376) via fast matrix multiplication

37

38

Coefficient to Point-Value Representation: Intuition


Coefficient point-value. Given a polynomial a0 + a1x + ... + an-1 xn-1, evaluate it at n distinct points x0 , ..., xn-1.
we get to choose which ones!

Coefficient to Point-Value Representation: Intuition


Coefficient point-value. Given a polynomial a0 + a1x + ... + an-1 xn-1, evaluate it at n distinct points x0 , ..., xn-1.
we get to choose which ones!

Divide. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2). A(-x) = Aeven(x2) - x Aodd(x2).

Divide. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2). A(-x) = Aeven(x2) - x Aodd(x2).

Intuition. Choose two points to be 1. A( 1) = Aeven(1) + 1 Aodd(1). A(-1) = Aeven(1) - 1 Aodd(1). Can evaluate polynomial of degree n

at 2 points by evaluating two polynomials of degree n at 1 point.

Intuition. Choose four complex points to be 1, i. A(1) = Aeven(1) + 1 Aodd(1). A(-1) = Aeven(1) - 1 Aodd(1). Can evaluate polynomial of degree n at 4 points by evaluating two polynomials A( i ) = Aeven(-1) + i Aodd(-1). of degree n at 2 points. A( -i ) = Aeven(-1) - i Aodd(-1).

39

40

Discrete Fourier Transform


Coefficient point-value. Given a polynomial a0 + a1x + ... + an-1 xn-1, evaluate it at n distinct points x0 , ..., xn-1. Key idea. Choose xk = k where is principal nth root of unity.

Roots of Unity
Def. An nth root of unity is a complex number x such that xn = 1. Fact. The nth roots of unity are: 0, 1, , n-1 where = e 2 i / n. Pf. (k)n = (e 2 i k / n) n = (e i ) 2k = (-1) 2k = 1. Fact. The nth roots of unity are: 0, 1, , n/2-1 where = 2 = e 4 i / n.

# y0 & % ( % y1 ( % y2 ( % ( = % y3 ( % M ( % ( $yn"1'
DFT

#1 1 1 % 1 )1 )2 % % 1 )2 )4 % 3 )6 %1 ) %M M M % 1 ) n"1 )2(n"1) $

1 )3 )6 )9 M ) 3(n"1)

L L L L O L

& ( ( )2(n"1) ( ( ) 3(n"1) ( ( M (n"1)(n"1) ( ) ' 1 ) n"1

# a0 & % ( % a1 ( % a2 ( % ( % a3 ( % M ( % ( $an"1'

2 = 1 = i 3 4 = 2 = -1 1 n=8 0 = 0 = 1

Fourier matrix Fn 5 6 = 3 = -i
41 42

Fast Fourier Transform


Goal. Evaluate a degree n-1 polynomial A(x) = a0 + ... + an-1 xn-1 at its nth roots of unity: 0, 1, , n-1. Divide. Break up polynomial into even and odd powers. Aeven(x) = a0 + a2x + a4x2 + + an-2 x n/2 - 1. Aodd (x) = a1 + a3x + a5x2 + + an-1 x n/2 - 1. A(x) = Aeven(x2) + x Aodd(x2).

FFT Algorithm

fft(n, a0,a1,,an-1) { if (n == 1) return a0 (e0,e1,,en/2-1) FFT(n/2, a0,a2,a4,,an-2) (d0,d1,,dn/2-1) FFT(n/2, a1,a3,a5,,an-1) for k = k yk+n/2 yk+n/2 } 0 to n/2 - 1 { e2ik/n ek + k dk ek - k dk

Conquer. Evaluate Aeven(x) and Aodd(x) at the nth roots of unity: 0, 1, , n/2-1. Combine. A( k) = Aeven( k) + k Aodd ( k), 0 k < n/2 k+ n) = A k k k A( even( ) Aodd ( ), 0 k < n/2

k = (k )2

return (y0,y1,,yn-1) }

k = (k + n )2

k+ n = -k

43

44

FFT Summary
Theorem. FFT algorithm evaluates a degree n-1 polynomial at each of the nth roots of unity in O(n log n) steps.
assumes n is a power of 2

Recursion Tree

a0, a1, a2, a3, a4, a5, a6, a7

Running time.

perfect shuffle

T (n) = 2T (n /2) + "(n) # T (n) = "(n log n)


a0, a2, a4, a6 a1, a3, a5, a7

!
a0, a4 a2, a6 a1, a5 a3, a7

O(n log n)

a0 , a1 , ..., an-1
coefficient representation

(" 0 , y0 ), ..., (" n#1 , yn#1 )


???
point-value representation

a0 000

a4 100

a2 010

a6 110

a1 001

a5 101

a3 011

a7 111

!
45

"bit-reversed" order
46

Inverse Discrete Fourier Transform


Point-value coefficient. Given n distinct points x0, ... , xn-1 and values y0, ... , yn-1, find unique polynomial a0 + a1x + ... + an-1 xn-1, that has given values at given points.

Inverse DFT
Claim. Inverse of Fourier matrix Fn is given by following formula.

# a0 & % ( % a1 ( % a2 ( % ( = % a3 ( % M ( % ( $an"1'

#1 1 1 % 1 )1 )2 % % 1 )2 )4 % 3 )6 %1 ) %M M M % n"1 2(n"1) ) $1 )

1 )3 )6 )9 M )
3(n"1)

L L L L O L

& "1 # y0 & ( % ( ( % y1 ( 2(n"1) ( ) % y2 ( % ( 3(n"1) ( ) ( % y3 ( ( M % M ( (n"1)(n"1) ( % ( ) $ yn"1' ' 1 ) n"1

Gn

$1 1 & 1 "#1 & #2 1 &1 " = & #3 n &1 " &M M & 1 "#(n#1) %
1 n

1 "#2 "#4 " M


#6

1 "#3 "#6 "#9 M "#3(n#1)

L L L L O L

"#2(n#1)

' 1 ) "#(n#1) ) "#2(n#1) ) ) "#3(n#1) ) ) M #(n#1)(n#1) ) " (

Fn is unitary

Inverse DFT

Fourier matrix inverse (Fn) -1

Consequence. To compute inverse FFT, apply same algorithm but use -1 = e -2 i / n as principal nth root of unity (and divide by n).

47

48

Inverse FFT: Proof of Correctness


Claim. Fn and Gn are inverses. Pf.

Inverse FFT: Algorithm

(F

Gn

k k"

1 1 " " % # k j #$ j k = % #(k$k ) j n j=0 n j=0

n$1

n$1

& 1 if k = k" = ' ( 0 otherwise


summation lemma

ifft(n, a0,a1,,an-1) { if (n == 1) return a0 (e0,e1,,en/2-1) FFT(n/2, a0,a2,a4,,an-2) (d0,d1,,dn/2-1) FFT(n/2, a1,a3,a5,,an-1) for k = k yk+n/2 yk+n/2 } 0 to n/2 - 1 { e-2ik/n (ek + k dk) / n (ek - k dk) / n

! Summation lemma. Let be a principal nth root of unity. Then


n#1

& n if k % 0 mod n $ "kj = ' ( 0 otherwise j=0

Pf.

If k is a! multiple of n then k = 1 series sums to n. th root of unity k is a root of xn - 1 = (x - 1) (1 + x + x2 + ... + xn-1). Each n if k 1 we have: 1 + k + k(2) + + k(n-1) = 0 series sums to 0.
49

return (y0,y1,,yn-1) }

50

Inverse FFT Summary


Theorem. Inverse FFT algorithm interpolates a degree n-1 polynomial given values at each of the nth roots of unity in O(n log n) steps.
assumes n is a power of 2 coefficient representation

Polynomial Multiplication
Theorem. Can multiply two degree n-1 polynomials in O(n log n) steps.
pad with 0s to make n a power of 2

coefficient representation

a0 , a1 , K, an-1 b0 , b1 , K, bn-1

c0 , c1 , K, c2n-2

!
O(n log n)

2 FFTs

O(n log n)

inverse FFT

O(n log n)

a0 , a1 , K, an-1
coefficient representation

(" 0 , y0 ), K, (" n#1 , yn#1 )


O(n log n)
point-value representation

A(" 0 ), ..., A(" 2n#1 ) B(" ), ..., B("


51

point-value multiplication

2n#1

O(n)

C(" 0 ), ..., C(" 2n#1 )

52

FFT in Practice ?

FFT in Practice
Fastest Fourier transform in the West. [Frigo and Johnson] Optimized C library. Features: DFT, DCT, real, complex, any size, any dimension. Won 1999 Wilkinson Prize for Numerical Software. Portable, competitive with vendor-tuned code.

Implementation details. Instead of executing predetermined algorithm, it evaluates your hardware and uses a special-purpose compiler to generate an optimized algorithm catered to "shape" of the problem. Core algorithm is nonrecursive version of Cooley-Tukey. O(n log n), even for prime sizes.

Reference: http://www.fftw.org

53

54

Integer Multiplication, Redux


Integer multiplication. Given two n bit integers a = an-1 a1a0 and b = bn-1 b1b0, compute their product a b. Convolution algorithm. Form two polynomials. Note: a = A(2), b = B(2). Compute C(x) = A(x) B(x). Evaluate C(2) = a b. !

Integer Multiplication, Redux


Integer multiplication. Given two n bit integers a = an-1 a1a0 and b = bn-1 b1b0, compute their product a b.

A(x) = a0 + a1x + a2 x 2 +L+ an"1x n"1

"the fastest bignum library on the planet"

B(x) = b0 + b1x + b2 x 2 +L+ bn"1x n"1

Practice. [GNU Multiple Precision Arithmetic Library] It uses brute force, Karatsuba, and FFT, depending on the size of n.

Running time: O(n log n) complex arithmetic operations. !

Theory. [Schnhage-Strassen 1971] O(n log n log log n) bit operations. Theory. [Frer 2007] O(n log n 2O(log *n)) bit operations.

55

56

Integer Arithmetic
Fundamental open question. What is complexity of arithmetic?

Factoring
Factoring. Given an n-bit integer, find its prime factorization.

2773 = 47 59 Operation addition multiplication division Upper Bound O(n) O(n log n 2O(log*n)) Lower Bound (n) (n) (n) 267-1 = 147573952589676412927 = 193707721 761838257287
a disproof of Mersenne's conjecture that 267 - 1 is prime

O(n log n 2O(log*n))

740375634795617128280467960974295731425931888892312890849 362326389727650340282662768919964196251178439958943305021 275853701189680982867331732731089309005525051168770632990 72396380786710086096962537934650563796359


RSA-704 ($30,000 prize if you can factor)

57

58

Factoring and RSA


Primality. Given an n-bit integer, is it prime? Factoring. Given an n-bit integer, find its prime factorization. Significance. Efficient primality testing can implement RSA. Significance. Efficient factoring can break RSA. Theorem. [AKS 2002] Poly-time algorithm for primality testing.

Shor's Algorithm
Shor's algorithm. Can factor an n-bit integer in O(n3) time on a quantum computer.
algorithm uses quantum QFT !

Ramification. At least one of the following is wrong: RSA is secure. Textbook quantum mechanics. Extending Church-Turing thesis.

59

60

Shor's Factoring Algorithm


Period finding.
2
i

Extra Slides
1 1 1 2 2 2 4 4 4 8 8 8 16 1 16 32 2 11 64 4 1 128 8 2
period = 4 period = 6

2 i mod 15 2 i mod 21

Theorem. [Euler] Let p and q be prime, and let N = p q. Then, the following sequence repeats with a period divisible by (p-1) (q-1): x mod N, x2 mod N, x3 mod N, x4 mod N, Consequence. If we can learn something about the period of the sequence, we can learn something about the divisors of (p-1) (q-1).
by using random values of x, we get the divisors of (p-1) (q-1), and from this, can get the divisors of N = p q

61

Fourier Matrix Decomposition

Fn =

$1 1 1 & 1 "1 "2 & & 1 "2 "4 & 1 "3 "6 & &M M M & 1 " n#1 "2(n#1) %

1 "3 "6 "9 M

L L L L

O " 3(n#1) L

' ) ) "2(n#1) ) ) " 3(n#1) ) ) M ) "(n#1)(n#1) ( 1 " n#1

I4

"1 $ 0 = $ $0 $ #0

0 1 0 0

0 0 1 0

0% ' 0' 0' ' 1&

D4

#"0 % 0 = % %0 % $0

0 "1 0 0

0 0 "2 0

0 & ( 0 ( 0 ( ( "3 '

" $ a = $ $ $ #

a0 a1 a2 a3

% ' ' ' ' &

#I y = Fn a = ! n /2 % $ I n /2

Dn /2 & # Fn /2 aeven & (% ( ! " Dn /2 ' $ Fn /2 aodd '


63

You might also like