How To Multiply: 5.5 Integer Multiplication
How To Multiply: 5.5 Integer Multiplication
How To Multiply: 5.5 Integer Multiplication
How to Multiply
integers, matrices, and polynomials
Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Complex Multiplication
Complex multiplication. (a + bi) (c + di) = x + yi. Grade-school. x = ac - bd, y = bc + ad.
4 multiplications, 2 additions
Integer Addition
Addition. Given two n-bit integers a and b, compute a + b. Grade-school. (n) bit operations.
Integer Multiplication
Multiplication. Given two n-bit integers a and b, compute a b. Grade-school. (n2) bit operations.
1 1 0 1 0 1 0 1
1 1
1 1 1 1
1 0 1 0
1 1 1 1
1 0 1 0
0 1 1 0
1 0 0 1 1 1 0
0 1 1 1 1 1 0 1 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0
+ 1
0 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
Recursion Tree
" 0 if n = 0 T (n) = # $ 4T (n /2) + n otherwise
lg n $ 21+ lg n #1 ' 2 T (n) = " n 2 k = n & ) = 2n # n % 2 #1 ( k=0
! T(n)
n /2
)(
T(n/2)
T(n/2)
T(n/2)
T(n/2)
4(n/2)
Ex. !
a = 10001101
a1 a0
b = 11100001
b1 b0
T(n/4) T(n/4)
T(n/4)
T(n/4)
T(n/4)
T(n/4) T(n/4)
T(n/4)
T(2)
T(2)
T(2)
T(2)
...
T(2)
T(2)
T(2)
T(2)
lg n
(1)
8
Karatsuba Multiplication
To multiply two n-bit integers a and b: Add two n bit integers. Multiply three n-bit integers, recursively. Add, subtract, and shift to obtain result.
Karatsuba Multiplication
To multiply two n-bit integers a and b: Add two n bit integers. Multiply three n-bit integers, recursively. Add, subtract, and shift to obtain result.
+ + + +
+ + + +
Theorem. [Karatsuba-Ofman 1962] Can multiply two n-bit integers in O(n1.585) bit operations.
T (n) " T ( #n /2$ ) + T ( %n /2& ) + T ( 1+ %n /2& ) + '(n) ( T (n) = O(n lg 3 ) = O(n1.585 ) 1 24 4 3 1444444 24444444 4 3
recursive calls add, subtract, shift
10
T (n) = " n
k=0
3 (2)
T(n) !
T(n/2)
T(n/2)
T(n/2)
3(n/2)
Fact. Complexity of integer division is same as integer multiplication. To compute quotient q: 2 Approximate x = 1 / t using Newton's method: xi+1 = 2xi " t xi
T(n/4) T(n/4)
T(n/4)
T(n/4)
T(n/4) T(n/4)
T(n/4)
T(2)
T(2)
T(2)
T(2)
...
T(2)
T(2)
T(2)
T(2)
lg n
(1)
11 12
Dot Product
Matrix Multiplication
Dot product. Given two length n vectors a and b, compute c = a b. Grade-school. (n) arithmetic operations.
a"b =
# ai bi
i=1
Matrix Multiplication
Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB. Grade-school. (n3) arithmetic operations.
cij =
"c11 $ $c21 $M $c # n1 "a11 c12 L c1n % ' $ c22 L c2n ' a = $ 21 $M M O M' $a cn2 L cnn ' & # n1 "b11 b12 a12 L a1n % ' $ a 22 L a 2n ' b b ( $ 21 22 $M M O M ' M $b b a n2 L a nn ' & # n1 n2 L b1n % ' L b2n ' O M' L bnn ' &
C11
A11
A12
B11
" 152 158 164 170 % " 0 1 2 3 % "16 $ ' $ ' $ 504 526 548 570 ' 4 5 6 7 ' $20 $ = $ ( $ 856 894 932 970 ' $ 8 9 10 11' $24 $1208 1262 1316 1370' $12 13 14 15' $28 # & # & #
!
".59 .32 .41% $ ' $.31 .36 .25' $ #.45 .31 .42' & = ".70 .20 .10% " .80 .30 .50% $ ' $ ' .30 .60 .10' ( $ .10 .40 .10' $ $ $ ' #.50 .10 .40' & # .10 .30 .40&
B11
#0 1& #16 17& #2 3& #24 25& #152 158 & C 11 = A11 " B11 + A12 " B21 = % (" % ( + % (" % ( = % ( $4 5' $20 21' $6 7' $28 29' $504 526'
P 1 P2 P3 P4 P5 P6 P7
= A11 " ( B12 # B22 ) = ( A11 + A12 ) " B22 = ( A21 + A22 ) " B11 = A22 " ( B21 # B11 ) = ( A11 + A22 ) " ( B11 + B22 ) = ( A12 # A22 ) " ( B21 + B22 ) = ( A11 # A21 ) " ( B11 + B12 )
= = = =
( A11 " B11 ) + ( A12 " B21 ) ( A11 " B12 ) + ( A12 " B22 ) ( A21 " B11 ) + ( A22 " B21 ) ( A21 " B12 ) + ( A22 " B22 )
= = = =
!
T (n) = 8T (n /2 ) + "(n 2 ) # T (n) = "(n 3 ) 144 44 2 3 1 24 4 3
recursive calls add, form submatrices
!
17 18
Common misperception. Strassen is only a theoretical curiosity. Apple reports 8x speedup on G4 Velocity Engine when n 2,500. Range of instances where it's useful is a subject of controversy.
19
20
) = O(n 2.59 )
Q. Two 3-by-3 matrices with 21 scalar multiplications? A. Also impossible. ! " (n log3 21 ) = O(n 2.77 ) Begun, the decimal wars have. [Pan, Bini et al, Schnhage, ] ! Two 20-by-20 matrices with 4,460 scalar multiplications. Two 48-by-48 matrices with 47,217 scalar multiplications. A year later. ! December, 1979. ! January, 1980.
O(n 2.805 )
Best known. O(n2.376) [Coppersmith-Winograd, 1987] Conjecture. O(n2+) for any > 0. Caveat. Theoretical improvements to Strassen are progressively less practical.
21 22
O(n
2.7801
O(n 2.7799 )
O(n 2.521813)
O(n 2.521801)
!
!
Fourier Analysis
Fourier theorem. [Fourier, Dirichlet, Riemann] Any periodic function can be expressed as the sum of a series of sinusoids. sufficiently smooth
y(t) =
N=1 100 10 5
!
24
Euler's Identity
Sinusoids. Sum of sine an cosines.
sin(2" # 697 t) +
1 2
sin(2" # 1209 t)
Time domain.
!
sound pressure
time (seconds)
Frequency domain.
0.5 amplitude
frequency (Hz)
If you speed up any nontrivial algorithm by a factor of a million or so the world will beat a path towards finding useful applications for it. -Numerical Recipes
The FFT is one of the truly great computational developments of [the 20th] century. It has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without the FFT. -Charles van Loan
29
30
"New Proof of the Theorem That Every Algebraic Rational Integral Function In One Variable can be Resolved into Real Factors of the First or the Second Degree." - PhD dissertation, 1799 the University of Helmstedt
! Add. O(n) arithmetic operations. ! A(x) + B(x) = (a0 + b0 ) + (a1 + b1 )x +L+ (an"1 + bn"1 )x n"1
Evaluate. O(n) using Horner's method. ! A(x) = a0 + (x (a1 + x (a2 +L+ x (an"2 + x (an"1 ))L)) Multiply (convolve). O(n2) using brute force. !
A(x) " B(x) =
2n#2 i =0
$ ci x i , where ci = $ a j bi# j
j =0
31 32
$ (x " x j )
j#k
xj
x
33
A(x) = % yk
k=0
$ (xk " x j )
j#k
34
# % % % % % % $
y0 y1 y2 M yn"1
2 x0 2 x1 2 x2 M 2 xn"1
# % % % % % % $
a0 a1 a2 M an"1
& ( ( ( ( ( ( '
a0 , a1 , ..., an-1
coefficient representation
!
35 36
Divide-and-Conquer
Decimation in frequency. Break up polynomial into low and high powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Alow(x) = a0 + a1x + a2x2 + a3x3. Ahigh (x) = a4 + a5x + a6x2 + a7x3. A(x) = Alow(x) + x4 Ahigh(x).
M
2 xn"1
L L L O M n"1 L xn"1
& ( ( ( ( ( ( '
# % % % % % % $
a0 a1 a2 M an"1
& ( ( ( ( ( ( '
Decimation in time. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2).
37
38
Divide. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2). A(-x) = Aeven(x2) - x Aodd(x2).
Divide. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2). A(-x) = Aeven(x2) - x Aodd(x2).
Intuition. Choose two points to be 1. A( 1) = Aeven(1) + 1 Aodd(1). A(-1) = Aeven(1) - 1 Aodd(1). Can evaluate polynomial of degree n
Intuition. Choose four complex points to be 1, i. A(1) = Aeven(1) + 1 Aodd(1). A(-1) = Aeven(1) - 1 Aodd(1). Can evaluate polynomial of degree n at 4 points by evaluating two polynomials A( i ) = Aeven(-1) + i Aodd(-1). of degree n at 2 points. A( -i ) = Aeven(-1) - i Aodd(-1).
39
40
Roots of Unity
Def. An nth root of unity is a complex number x such that xn = 1. Fact. The nth roots of unity are: 0, 1, , n-1 where = e 2 i / n. Pf. (k)n = (e 2 i k / n) n = (e i ) 2k = (-1) 2k = 1. Fact. The nth roots of unity are: 0, 1, , n/2-1 where = 2 = e 4 i / n.
# y0 & % ( % y1 ( % y2 ( % ( = % y3 ( % M ( % ( $yn"1'
DFT
#1 1 1 % 1 )1 )2 % % 1 )2 )4 % 3 )6 %1 ) %M M M % 1 ) n"1 )2(n"1) $
1 )3 )6 )9 M ) 3(n"1)
L L L L O L
# a0 & % ( % a1 ( % a2 ( % ( % a3 ( % M ( % ( $an"1'
2 = 1 = i 3 4 = 2 = -1 1 n=8 0 = 0 = 1
Fourier matrix Fn 5 6 = 3 = -i
41 42
FFT Algorithm
fft(n, a0,a1,,an-1) { if (n == 1) return a0 (e0,e1,,en/2-1) FFT(n/2, a0,a2,a4,,an-2) (d0,d1,,dn/2-1) FFT(n/2, a1,a3,a5,,an-1) for k = k yk+n/2 yk+n/2 } 0 to n/2 - 1 { e2ik/n ek + k dk ek - k dk
Conquer. Evaluate Aeven(x) and Aodd(x) at the nth roots of unity: 0, 1, , n/2-1. Combine. A( k) = Aeven( k) + k Aodd ( k), 0 k < n/2 k+ n) = A k k k A( even( ) Aodd ( ), 0 k < n/2
k = (k )2
return (y0,y1,,yn-1) }
k = (k + n )2
k+ n = -k
43
44
FFT Summary
Theorem. FFT algorithm evaluates a degree n-1 polynomial at each of the nth roots of unity in O(n log n) steps.
assumes n is a power of 2
Recursion Tree
Running time.
perfect shuffle
!
a0, a4 a2, a6 a1, a5 a3, a7
O(n log n)
a0 , a1 , ..., an-1
coefficient representation
a0 000
a4 100
a2 010
a6 110
a1 001
a5 101
a3 011
a7 111
!
45
"bit-reversed" order
46
Inverse DFT
Claim. Inverse of Fourier matrix Fn is given by following formula.
# a0 & % ( % a1 ( % a2 ( % ( = % a3 ( % M ( % ( $an"1'
#1 1 1 % 1 )1 )2 % % 1 )2 )4 % 3 )6 %1 ) %M M M % n"1 2(n"1) ) $1 )
1 )3 )6 )9 M )
3(n"1)
L L L L O L
Gn
$1 1 & 1 "#1 & #2 1 &1 " = & #3 n &1 " &M M & 1 "#(n#1) %
1 n
L L L L O L
"#2(n#1)
Fn is unitary
Inverse DFT
Consequence. To compute inverse FFT, apply same algorithm but use -1 = e -2 i / n as principal nth root of unity (and divide by n).
47
48
(F
Gn
k k"
n$1
n$1
ifft(n, a0,a1,,an-1) { if (n == 1) return a0 (e0,e1,,en/2-1) FFT(n/2, a0,a2,a4,,an-2) (d0,d1,,dn/2-1) FFT(n/2, a1,a3,a5,,an-1) for k = k yk+n/2 yk+n/2 } 0 to n/2 - 1 { e-2ik/n (ek + k dk) / n (ek - k dk) / n
Pf.
If k is a! multiple of n then k = 1 series sums to n. th root of unity k is a root of xn - 1 = (x - 1) (1 + x + x2 + ... + xn-1). Each n if k 1 we have: 1 + k + k(2) + + k(n-1) = 0 series sums to 0.
49
return (y0,y1,,yn-1) }
50
Polynomial Multiplication
Theorem. Can multiply two degree n-1 polynomials in O(n log n) steps.
pad with 0s to make n a power of 2
coefficient representation
a0 , a1 , K, an-1 b0 , b1 , K, bn-1
c0 , c1 , K, c2n-2
!
O(n log n)
2 FFTs
O(n log n)
inverse FFT
O(n log n)
a0 , a1 , K, an-1
coefficient representation
point-value multiplication
2n#1
O(n)
52
FFT in Practice ?
FFT in Practice
Fastest Fourier transform in the West. [Frigo and Johnson] Optimized C library. Features: DFT, DCT, real, complex, any size, any dimension. Won 1999 Wilkinson Prize for Numerical Software. Portable, competitive with vendor-tuned code.
Implementation details. Instead of executing predetermined algorithm, it evaluates your hardware and uses a special-purpose compiler to generate an optimized algorithm catered to "shape" of the problem. Core algorithm is nonrecursive version of Cooley-Tukey. O(n log n), even for prime sizes.
Reference: http://www.fftw.org
53
54
Practice. [GNU Multiple Precision Arithmetic Library] It uses brute force, Karatsuba, and FFT, depending on the size of n.
Theory. [Schnhage-Strassen 1971] O(n log n log log n) bit operations. Theory. [Frer 2007] O(n log n 2O(log *n)) bit operations.
55
56
Integer Arithmetic
Fundamental open question. What is complexity of arithmetic?
Factoring
Factoring. Given an n-bit integer, find its prime factorization.
2773 = 47 59 Operation addition multiplication division Upper Bound O(n) O(n log n 2O(log*n)) Lower Bound (n) (n) (n) 267-1 = 147573952589676412927 = 193707721 761838257287
a disproof of Mersenne's conjecture that 267 - 1 is prime
57
58
Shor's Algorithm
Shor's algorithm. Can factor an n-bit integer in O(n3) time on a quantum computer.
algorithm uses quantum QFT !
Ramification. At least one of the following is wrong: RSA is secure. Textbook quantum mechanics. Extending Church-Turing thesis.
59
60
Extra Slides
1 1 1 2 2 2 4 4 4 8 8 8 16 1 16 32 2 11 64 4 1 128 8 2
period = 4 period = 6
2 i mod 15 2 i mod 21
Theorem. [Euler] Let p and q be prime, and let N = p q. Then, the following sequence repeats with a period divisible by (p-1) (q-1): x mod N, x2 mod N, x3 mod N, x4 mod N, Consequence. If we can learn something about the period of the sequence, we can learn something about the divisors of (p-1) (q-1).
by using random values of x, we get the divisors of (p-1) (q-1), and from this, can get the divisors of N = p q
61
Fn =
$1 1 1 & 1 "1 "2 & & 1 "2 "4 & 1 "3 "6 & &M M M & 1 " n#1 "2(n#1) %
L L L L
O " 3(n#1) L
I4
"1 $ 0 = $ $0 $ #0
0 1 0 0
0 0 1 0
D4
#"0 % 0 = % %0 % $0
0 "1 0 0
0 0 "2 0
" $ a = $ $ $ #
a0 a1 a2 a3
#I y = Fn a = ! n /2 % $ I n /2