Hmmorgan

Download as pdf or txt
Download as pdf or txt
You are on page 1of 26

Numerical Solutions to the KdV Equation

Hannah Morgan

Abstract
Implicit difference schemes for nonlinear PDEs, such as the Korteweg-de Vries
(KdV) equation, require large systems of equations to be solved at each timestep,
leading to long computation times. A group of finite difference schemes for KdV are
given here, and a parallel algorithm is presented and implemented for one such scheme.
A computational model is compared to experimental results, and the performance and
scalability of the algorithm are discussed.

1 Introduction
The Korteweg-de Vries (KdV) equation, given by (1), is a nonlinear PDE first introduced in
[1] in 1895 to model low amplitude water waves in shallow, narrow channels like canals.

ut + 6uux + uxxx = 0 (1)

It has since been applied to many other problems in physics and engineering including plasma
physics. KdV has been extensively studied both theoretically and numerically. Many finite
difference schemes have been derived for the equation including an explicit one by Zabusky
and Kruskal in [2], who discovered the existence of solitons, and an implicit one from Goda
in [3]. Spectral methods (e.g. [4] and [5]) and finite element methods (e.g. [6], [7], and [8])
have also been used to solve KdV. The accuracy of a given method is simple to measure
since KdV has a smooth analytical solution given by (2).
"√ #
c c
u(t, x) = sech2 (x − ct − a) (2)
2 2
With the rise of parallel numerical analysis, some work has been done to effectively im-
plement certain methods to solve KdV. For example, an intrinsically parallel finite difference
scheme was developed in [9] and a pseudospectral method was implemented in parallel in
[10]. In this paper, we discuss implementation techniques of a finite difference scheme for
KdV and use the block inverse algorithm, a parallel numerical algorithm given in [11], in an
attempt to improve the computation time needed to solve KdV.

2 A finite difference scheme


In this section, we will build a finite difference scheme to solve the KdV equation. To find
u(t, x), we will first find an approximation for the nonlinear advection equation ut +f (u)x = 0
with f (u) = 3u2 and then add the dispersive term uxxx . We will implement our scheme in
Octave.

1
2.1 Formulation
To obtain a finite difference method, we first write the first order Taylor series where xi+1 =
xi + h and xi−1 = xi − h:

y(xi−1 ) = y(xi ) − hy 0 (xi )


y(xi+1 ) = y(xi ) + hy 0 (xi )

Using the equations above, we can write a forward time and backward space scheme:
ui+1,j − ui,j
ut =
∆t
f (ui,j ) − f (ui,j−1 )
f (u)x =
∆x
Now we can write a finite difference method for the nonlinear advection equation
ut + f (u)x = 0 and, given some initial data u(0, j) = g for j = 0, . . . , n, solve it using:

∆t
ui+1,j = ui,j − (f (ui,j ) − f (ui,j−1 ))
∆x
Now we will write the third order Taylor series where xi−2 = xi − 2h and xi−3 = xi − 3h:

9h2 00 27h3 000


y(xi−3 ) = y(xi ) − 3hy 0 (xi ) + y (xi ) − y (xi ) (3)
2 6
4h2 00 8h3 000
y(xi−2 ) = y(xi ) − 2hy 0 (xi ) + y (xi ) − y (xi ) (4)
2 6
h2 h3
y(xi−1 ) = y(xi ) − hy 0 (xi ) + y 00 (xi ) − y 000 (xi ) (5)
2 6
y(xi ) = y(xi ) (6)

Using the equations above, we want to take a linear combination a·(3)+b·(4)+c·(5)+d·(6)


to get an approximation for y 000 (xi ). Adding the equations above and combining like terms,
we get:

ay(xi−3 ) + by(xi−2 ) + cy(xi−1 ) + dy(xi ) = (a + b + c + d)y(xi )


h2 h3
+ (−3a − 2b − c)hy 0 (xi ) + (9a + 4b + c) y 00 (xi ) + (−27a − 8b − c) y 000 (xi )
2 6
Using the coefficients of the left hand side above, we get a system of four equations and four
unknowns:

a+b+c+d=0
−3a − 2b − c = 0
9a + 4b + c = 0
−27a − 8b − c = 6

2
Solving, we find a = −1, b = 3, c = −3, d = 1. Finally, we get:
−y(xi−3 ) + 3y(xi−2 ) − 3y(xi−1 ) + y(xi )
y 000 (xi ) =
h3
Now we have a scheme for uxxx :
ui,j − 3ui,j−1 + 3ui,j−2 − ui,j−3
uxxx =
∆x3
Using Taylor series to rewrite each term, we can write a formulation for the KdV equation
ut + f (u)x + uxxx = 0 as:
∆t ∆t(ui,j − 3ui,j−1 + 3ui,j−2 − ui,j−3 )
ui+1,j = ui,j − (f (ui,j ) − f (ui,j−1 )) −
∆x ∆x3

2.2 Implementation
We will implement our method different ways in Octave. The first implementation uses
the built-in function filter to compute finite differences. Then we will write a matrix
formulation of the equation above and solve the problem using difference matrices in Octave.
Finally, we will increase the efficiency our code by implementing sparse matrices.

2.2.1 Using filter


We will implement our method in Octave using the built-in function filter to compute the
finite difference formula above. The command y = filter(b, a, x) returns the solution
to the difference equation:
N
X M
X
a(k + 1)y(n − k) = b(k + 1)x(n − k)
k=0 k=0

with N = length(a) - 1 and M = length(b) - 1. We write the backward difference method to


compute f (u)x in Octave using the code:
a = [1];
b = [1 -1];
f_x = filter(b, a, f);
Checking with the summation expression above, we are setting f 0 (n) = f (n) − f (n − 1), as
desired. Similarly, we write the difference method to compute uxxx as:
c = [1];
d = [1 -3 3 -1];
u_xxx = filter(d, c, u);
Again checking that we are using filter correctly, we are computing:

u000 (n) = u(n) − 3u(n − 1) + 3u(n − 2) − u(n − 3)

Finally, we can solve the KdV equation using the time-stepping scheme:

3
Figure 1: Resulting plot of KdV solution with Gaussian initial data using filter with
dt = 0.05, dx = 0.9, and nts = 40.

u = u - (dt/dx)*filter(b, a, f) - (dt/(dx*dx*dx))*filter(d, c, u)

A complete Octave code to solve KdV using filter and resulting plot is given below.

% Finite difference time-stepping for 1-D KdV equation


% u_t + (3u^2)_x + u_xxx = 0

% Parameters
dx=.9;
dt=.05;
nts=40;
a=[1];
b=[1 -1];
c=[1];
d=[1 -3 3 -1];

r=1:dx:100;
% Gaussian initial data:
u0=1*exp(-(.05*(r-50)).^2);
u=u0;

% Computing finite differences:


for k=1:nts
u=u-(dt/dx)*filter(b, a, 3*(u .* u))-(dt/(dx*dx*dx))*filter(d, c, u);
end
plot(r,u0,r,u+1)

2.2.2 Using difference matrices


Let us again consider a first order backward difference and write y 0 = yi −yhi−1 . If y is
represented as a column vector, we can also write: y 0 = h1 Dy, where D is the difference

4
matrix given by:  
1 0 0 ··· 0
−1 1 0 · · · 0
 
 0 −1 1 · · · 0
D= 
 .. .. . . . . .. 
 . . . . .
0 0 · · · −1 1
f (ui,j )−f (ui,j−1 )
Then to calculate f (u)x = ∆x
, we can write:
 
  f (ui,1 )
1 0 0 ··· 0  ..
.

−1 1 0 ··· 0 
  
1 
 0 −1 1 · · · 0  f (u i,j−1 )

f (u)x =
  
∆x  .. ..   f (ui,j ) 
 
.. . . ..
 . . . . .  .. 
0 0 · · · −1 1
 . 
f (ui,n )
Similarly, we can use a matrix formulation to write y 000 = 1
h3
D3 y. Then for uxxx we have:

ui,1

  . 
1 0 0 0 ··· 0   .. 
−3 1 0 0 ··· 0  ui,j−3 
 

1  3 −3 1 0 ··· 0 ui,j−2 
 
uxxx =

∆x3 −1 3 −3 1 · · · 0  ui,j−1 
  
 .. . . . . . . . . ..   u 
 . . . . . .  .i,j 

0 0 · · · 3 −3 1  .. 
ui,n
Altogether, we have a time-stepping scheme:
       
ui+1,1 ui,1 f (ui,1 ) ui,1
 ..   ..   ..   .. 
 .   .  ∆t  .  ∆t 3 
. 
 ui+1,j  =  ui,j  − D  f (ui,j )  − D u
       
i,j 
 .   .  ∆x  .  ∆x3

 . 
 ..   ..   ..   .. 
ui+1,n ui,n f (ui,n ) ui,n
In our code, however, u is a row vector, so we take the transpose of the matrices defined
above to compute finite differences. For the first difference, then, we have:
 
1 −1 0 · · · 0
0 1 −1 · · · 0 
1   
f (u)x = f (ui,1 ) · · · f (ui,n )  ... ... . . . . . . ... 
 
∆x  
0 0 · · · 1 −1
0 0 ··· 0 1
In Octave, we populate the first order difference matrix fod and the third order tod and
solve KdV using the time stepping scheme:

5
u = u - ((dt/dx)*(3*(u .* u))*fod) - ((dt/(dx*dx*dx))*u*tod)

A partial code is given below.

% Creating a first order difference matrix fod


fod=eye(nr);
for k=1:(nr-1)
fod(k, k+1)=-1;
end

% Creating a third order difference matrix tod


tod=eye(nr);
for k=1:(nr-1)
tod(k, k+1)=-3;
end
for k=1:(nr-2)
tod(k, k+2)=3;
end
for k=1:(nr-3)
tod(k, k+3)=-1;
end

% Computing finite differences:


for k=1:nts
u=u-((dt/dx)*(3*(u .* u))*fod)-((dt/(dx*dx*dx))*u*tod);
end

2.2.3 Using sparse matrices


The implementation given above is not efficient. Instead, we will implement sparse matrices
and compute finite differences using sparse linear algebra in Octave. We can partially solve
the efficiency problem by converting the matrices defined above using the function sparse
before computing finite differences:

fod = sparse(fod);
tod = sparse(tod);

However, we need not construct these full matrices fod and tod. Instead, we will initially
create the sparse difference matrices we need to compute finite differences. The command
S = sparse(i, j, s) uses the vectors i, j and s (all with same length) to generate a sparse
matrix so that:
S(i(k), j(k)) = s(k)
where k = 1, . . . , length(i). A program for creating the sparse difference matrices tod and
fod is given below with identical output as filter above.

% Creating a sparse first order difference matrix fod

6
kc=0;
for k=1:nr
kc=kc+1;
rindx(kc)=k;
cindx(kc)=k;
val(kc)=1;
end
for k=1:(nr-1)
kc=kc+1;
rindx(kc)=k;
cindx(kc)=k+1;
val(kc)=-1;
end
fod=sparse(rindx, cindx, val);

% Creating a sparse third order difference matrix tod


kc=0
for k=1:nr
kc=kc+1;
rindx2(kc)=k;
cindx2(kc)=k;
val2(kc)=1;
end
for k=1:(nr-1)
kc=kc+1;
rindx2(kc)=k;
cindx2(kc)=k+1;
val2(kc)=-3;
end
for k=1:(nr-2)
kc=kc+1;
rindx2(kc)=k;
cindx2(kc)=k+2;
val2(kc)=3;
end
for k=1:(nr-3)
kc=kc+1;
rindx2(kc)=k;
cindx2(kc)=k+3;
val2(kc)=-1;
end
tod=sparse(rindx2, cindx2, val2);

7
2.3 Results
We use the analytical solution (2) to KdV as initial data with varying parameters:

% Soliton initial data:


c=1;
u0=(c/2)*sech((sqrt(c)/2)*r).^2;

Because the number of time steps will vary depending on our choice of dt, we modify the
parameters of our code:

% Parameters:
dx=.01;
dt=.01;
T=2; % total time
nts=T/dt;

We calculate the maximum relative error of our computed value for u using the code:

max(abs(u-exact)./exact)

where exact is the analytical solution of KdV after a time T .


Because our direct formulation includes division by ∆x3 , we cannot decrease this parameter
by much. Table 1 shows the calculated maximum relative and absolute error and Figure 3
shows our calculated solution.

dx relative error absolute error


1 412.98 0.039905
0.9 1242.2 0.054803
0.8 6892.0 0.067737
0.7 1.1862e+05 0.16609
0.6 1.3426e+07 1.0894
0.5 Inf Inf

Table 1: Maximum relative and absolute error for KdV with c = 0.5, T = 1, and dt = 0.01.

3 A parallel numerical solution


In this section, we will use a well known algorithm to implement a difference scheme for KdV
in parallel. We will discuss the effectiveness of the algorithm for our chosen scheme after
comparing a computational model with experimental results.

8
Figure 2: KdV after T = 1 with dt = 0.01 and dx = 1.

3.1 Overview of well studied methods


The authors of [12] compiled and tested a variety of numerical methods to solve KdV. We
revisit the methods and certain results here. Note that the superscript j represents the time
variable and the space variable is denoted by the subscript i so that uji = u(i∆x, j∆t).

1. Explicit scheme (Zabusky and Kruskal)


Zabusky and Kruskal developed the explicit leapfrog finite difference scheme:
∆t  j  
uj+1
i = uj−1
i −2 ui+1 + uji + uji−1 uji+1 − uji−1
∆x
∆t  j j j j

− u − 2u + 2u − u i−2 .
(∆x)3 i+2 i+1 i−1

2. Hopscotch Method
The Hopscotch Method is a mixed implicit/explicit scheme given by:

∆t  j 
uj+1
i = uji − 3 j
fi+1 − fi−1
∆x
∆t  j j j j

− u − 2ui+1 + 2ui−1 − ui−2 (7)
2(∆x)3 i+2

∆t  j+1 
uj+1
i = uji −3 f j+1
− fi−1
∆x i+1
∆t 
j+1 j+1 j+1 j+1

− u − 2ui+1 + 2ui−1 − ui−2 (8)
2(∆x)3 i+2
u2
where f = 2
and (7) is used to calculate (i + j) even, (8) for (i + j) odd.

3. Implicit scheme (Goda)


An implicit finite difference scheme to solve KdV from Goda is given by:

9
   
1 j+1 j 1 j+1

j j

j+1

j j
u − ui+1 + u u + ui+1 − ui−1 ui + ui−1
∆t i ∆x i+1 i
 
1 j+1 j+1 j+1 j+1
+ u − 2ui+1 + 2ui−1 − ui−2 = 0
2(∆x)3 i+2

where the quasi-pentagonal system of equations shown below must be solved at each
time step.
 
x x x x x
x x x x x
 
x x x x x 
 

 x x x x x 

 
 

 x x x x x
x x x x x
x x x x x

4. The Proposed Scheme


The following scheme is based on the inverse scattering transform:
   
1 j+1 j 1 j+1 j+1 j+1 j+1 j j j j
u − ui = u − 3ui + 3ui+1 − ui+2 + ui−2 − 3ui−1 + 3ui − ui+1
∆t i 2(∆x)3 i−1
 
3  j 2  j+1 2
− ui − ui
2∆x
 
1 j+1

j+1 j+1 j+1

j

j j j
− u u + ui+1 + ui+2 − ui−1 ui + ui−1 + ui−2
2∆x i+1 i

5. M. Kruskal
Kruskal suggested the following scheme for ut + uxxx = 0:
j+1 j+1 j+1 j+1
uj+1
i − uji ui−1 − 3ui + 3ui+1 − ui+2
=
∆t 2(∆x)3
uji−2 − 3uji−1 + 3uji − uji+1
+
2(∆x)3

10
and used the following to solve KdV:
uj+1 − uji
 
i 3θ j+1 2
 j+1 2
 j
2 j
2
+ ui+1 − ui−1 + ui+1 − ui−1
∆t 4∆x
 
1 − θ j+1 j+1 j+1
 j j j

+ u ui+1 − ui−1 − ui ui+1 − ui−1
2∆x i
uj+1 j+1
i−1 − 3ui + 3uj+1 j+1
i+1 − ui+2
+
2(∆x)3
uj − 3uji−1 + 3uji − uji+1
+ i−2 = 0.
2(∆x)3

2
Experimentally, it was found that θ = 3
gave the best results.
6. Split Step Fourier Method by Tappert
Note that for this and the next method we will use the superscript m to represent the
time variable and the subscript n for the space variable, since both use the imaginary
number, i. A Fourier Method for KdV can be derived by making the change of variables
Y = (x + p) πp , where p is half the length of the interval of calculation, to normalize the
spatial period to [0, 2π] and discretizing by N points so that ∆Y = 2π N
. KdV, then,
becomes:

π π3
ut + 6 uuY + 3 uY Y Y = 0.
p p
The transformation into discrete Fourier space for k = − N2 , . . . , N2 − 1 is given by:

N −1
1 X
û(k, t) = F u = √ u(j∆Y, t)e−2πijk/N
N j=1

and the inverse transform for k = − N2 , . . . , N2 − 1:

1 X
u(j∆Y, t) = F −1 û = √ û(k, t)e2πijk/N .
N k
The Slip Step Fourier Method uses a finite difference method to solve ut + 6 πp uuY
followed by the fast Fourier transform to advance the solution using the linear term
uY Y Y , described below.

"
∆t π 2 2 2 2
ũm+1
n = um
n − um+1
n+1 − 8 um+1
n−1 − um+1
n+2 + um+1
n−2
8∆Y p
#
2 2 2 2
+ 8 um
n+1 − 8 um
n−1 − um
n+2 + um
n−2

11
followed by the fast Fourier transform:
3 π 3 /p3 ∆t
u(Yj , t + ∆t) = F −1 eik

F ũ(Yj , t) .

7. A Pseduospectral Method by Fornberg and Whitham


This pseudospectral method uses the fast Fourier transform is used to evaluate uY =
F −1 (ikF u) and uY Y Y = F −1 (−ik 3 F u). Combined with a leap-frog scheme, the ap-
proximation becomes:

u(Y, t + ∆t) − u(Y, t − ∆t)


6π π3
+ 2i ∆tu(Y, t)F −1 (kF u) − 2i∆t 3 F −1 (k 3 F u) = 0.
p p
  3 3
 
Fornberg and Whitham substituted −2iF −1 sin ∆t πp3k F u for the last term above.

∆x ∆t L∞ time
Zabusky 0.1739 0.002 0.00469 28
Hopscotch 0.2 0.003 0.00472 23
Goda 0.1 0.002 0.00492 244
Proposed 0.16 0.125 0.00173 7
Kruskal 0.08 0.04 0.00453 24
Tappert 0.3125 0.004 0.00494 63
Fornberg 0.625 0.0096 0.00113 12

Figure 3: Computation time in seconds for tolerance < 0.005 after T = 1.0 and wave
amplitude 1

∆x ∆t L∞ time
Zabusky 0.08 0.0019 0.00930 591
Hopscotch 0.1 0.0005 0.00994 272
Goda 0.04 0.00025 0.001282 3568
Proposed 0.1 0.1 0.00332 23
Kruskal 0.04 0.011 0.00952 163
Tappert 0.156 0.002 0.00943 271
Fornberg 0.3125 0.0042 0.00474 40

Figure 4: Computation time in seconds for tolerance < 0.01 after T = 1.0 and wave amplitude
2

Various tests were performed to analyze the goodness of the methods described above.
A selection of them are included here in Figures 3, 4, and 5.

12
∆x ∆t L∞ time
Zabusky 0.12 0.00066 0.00165 349
Hopscotch 0.13 0.001 0.00142 283
Goda 0.1 0.0005 0.00165 2764
Proposed 0.1 0.14 0.00148 19
Kruskal 0.08 0.015 0.00145 106
Tappert 0.15625 0.005 0.00186 322
Fornberg 0.625 0.0148 0.00107 24

Figure 5: Computation time in seconds for tolerance < 0.002 after T = 3.0 with two inter-
acting solitons as initial conditions with amplitudes 0.5 and 1

3.2 The block inverse algorithm


This section describes an algorithm that will allow us to solve the finite difference problem
for the KdV equation in parallel.
Let A be an banded n × n matrix with bandwidth 2w − 1. A can be factored and written
as A = LU where L is a lower triangular matrix and U is an upper triangular matrix, both
with bandwidth w. We will solve the system Ax = f in two steps by writing LU x = f . First
we will solve Ly = f for y, then U x = y for x, using the block inverse algorithm for both.
The steps are described in detail below.
Let L be a lower triangular matrix with bandwidth w. Pick integers k and s so that
n = ks and w ≤ s. We can write L as a block matrix with k s × s blocks on the diagonal
and k − 1 below the diagonal, shown below.

L1

R1 L2
L= = =
.. ..
. .

Rk−1 Lk s

Assume that each Li is invertible and let D be the n × n block diagonal matrix where

Di = L−1
i , i = 1, . . . , k.

Now, instead of the original system, we will solve DLx = Df . To do so, we multiply through
by D and write f as a block vector, shown below.

13
I D1 f1 b1

G1 I D2 f2 b2
DL = Df = =
... ... ... .. ..
. .

Gk−1 I Dk fk bk

Above, I is the s × s identity matrix and


Li+1 Gi = Ri , i = 1, . . . , k − 1 (9)
Li bi = fi , i = 1, . . . , k. (10)
Since Li is lower triangular for all i, solving (9) and (10) is a matter of forward substitution.
Note that the first s − w columns of Ri are zero for all i and that the same must be true
of Gi . We can simplify the computation of (9) by only solving for nonzero entries. Let G bi be
the rightmost w columns of Gi and R bi be the rightmost w columns of Ri . Then (9) becomes:

Li+1 G
bi = R
bi , i = 1, . . . , k − 1 (11)

We further decompose G bi by letting Mi be the top s − w rows of G


bi , and Hi the bottom
w rows of Gbi , shown below. We can write bi as ui and vi , and xi as yi and zi in a similar
fashion.

Mi s−w ui yi
Gi = bi =
G bi = xi =

Hi w vi zi
w

Now we have simple equations to the system DLx = b. The first block of x is straightforward:
y1 = u1 , z1 = v1
and the subsequent blocks are given by:
zi+1 = vi+1 − Hi zi , i = 1, . . . , k − 1 (12)
yi+1 = ui+1 − Mi zi , i = 1, . . . , k − 1. (13)
The zi ’s in equation (12) above will be computed sequentially, but equation (13) gives us an
opportunity to compute the yi ’s in parallel. Note that there is a possibility for parallelism
in (12) using the parallel prefix algorithm as suggested in [13] if w is large enough.
Conversely, let U be an upper triangular matrix with bandwidth w. Let k and s be as
before, so that n = ks and w ≤ s and write U as a block matrix shown below.

14
w

U1 R1

U2 ..
.
U= = =
... R
k−1

Uk s

Assume each Ui is invertible and D is the block diagonal matrix where

Di = Ui−1 , i = 1, . . . , k.

Now, instead of U x = f , we will solve DU x = Df following the same steps as before, shown
below.

I G1 D1 f1 b1
... D2 f2
I b2
DU = Df = =
.. .. .. ..
. Gk−1 . . .

I Dk fk bk

Again, I is the identity matrix and

Ui Gi = Ri , i = 1, . . . , k − 1 (14)
Ui bi = fi , i = 1, . . . , k. (15)

Here, since Ui is upper triangular for all i, we will use backwards substitution to solve 14
and 15.
Note that now the last s − w columns of Ri are zero for all i and that the same must be
true of Gi . We will again decompose the problem and only solve for nonzero entries of Gi
(G
bi ). Let Mi the top w rows of G bi , and Hi the bottom s − w rows of G
bi , shown below. We
can write bi as ui and vi , and xi as yi and zi similarly, shown below.

Mi w ui yi

Gi = G = bi = v xi =
s−w zi
bi
Hi i

15
Now we have simple equations to the system DU x = b backwards. The last block of x is
straightforward:
y k = uk , zk = vk
and the subsequent blocks are given by:
yi = ui − Mi yi+1 , i = 1, . . . , k − 1 (16)
zi = vi − Hi yi+1 , i = 1, . . . , k − 1. (17)
This time, the yi ’s in equation (16) above will be computed sequentially, but equation (17)
lets us compute the zi ’s in parallel.

3.3 Implementation
An implicit finite difference scheme to solve KdV from Goda proposed in [3] is given by:
   
1 j+1 j 1 j+1 j j
 j+1 j j

u − ui+1 + u u + ui+1 − ui−1 ui + ui−1
∆t i ∆x i+1 i
 
1 j+1 j+1 j+1 j+1
+ u − 2ui+1 + 2ui−1 − ui−2 = 0
2(∆x)3 i+2
which can be written as the pentadiagonal system:

1 c d uj+1
1 uj1
b 1 c d uj+1
2 uj2
a b 1 c
a b 1
=

d
c
a b 1 uj+1
n ujn

where
∆t
a=−
2(∆x)3
∆t ∆t j
ui + uji−1

b= 3

(∆x) ∆x
∆t ∆t j j

c=− + u + u
(∆x)3 ∆x i i+1

∆t
d= .
2(∆x)3

16
The algorithm described above was implemented in C++ using MPI where A was set to
be the finite difference matrix given by Goda described before. We chose Goda’s scheme
because [12] found it to have prohibitively large computation times in some cases but the
scheme has truncation error O(∆t) + O((∆x)2 ) and is unconditionally stable.
Large matrices were implemented as 3 × m arrays, where m is the number of elements
in the matrix and each row in the array contains the row, column, and value of an element
in the matrix. Parallel factorization of banded matrices has been studied in [14], [15], and
[16], but here A was factored by one processor using a modified Dolittle’s algorithm for a
pentadiagonal matrix. The problem was then solved as described above using the block
inverse algorithm with k processors. The details of the algorithm for Ly = f are presented
in Algorithm 1 below and solving U x = y follows similarly.

Algorithm 1 Block inverse algorithm to solve Lx = f


for i = 0, . . . , k − 1 do
Pi initializes fi
for t = 1, . . . , T do
P0 initializes A
P0 factors A into L and U
for i = 1, . . . , k − 1 do
P0 sends Li , Ri−1 to Pi
Pi solves for Gi−1 where Li Gi−1 = Ri−1
for i = 0, . . . , k − 1 do
Pi solves for bi where Li bi = fi
P0 computes for z0 where z0 = v0
P0 sends z0 to P1
for i = 1, . . . , k − 2 do
Pi receives zi−1 from Pi−1
Pi computes zi where zi = vi − Hi zi−1
Pi sends zi to Pi+1
Pk−1 receives zk−2 from Pk−2
Pk−1 computes zk−1
for i = 0, . . . , k − 1 do
Pi computes yi where y1 = ui − Mi zi−1
Pi resets fi for next timestep

3.4 Performance Analysis


3.4.1 Computational model
We can count the amount of storage used and the number of operations done in our program
to develop a computational model of the algorithm. Let n be the number of grid points
used in the finite difference scheme so that A is an n × n matrix. We run our program on k
processors so that the block matrix size is s = n/k. The block inverse algorithm uses 10k + 8

17
Figure 6: Total storage used for n = 100, 000.

integers and (29s + 3 + n)k + 33n − 36 + (k − 1)s doubles so that the total number of bytes
used is:
4[10k + 8] + 8[(29s + 3 + n)k + 33n − 36 + (k − 1)s].
For a fixed sized problem, then, the amount of storage scales linearly with k, shown in
Figure 6. Let T1 , T2 , T3 , T4 , T5 , Tsend , Trec be the amount of time it takes to perform
an assignment, multiplication, division, addition, subtraction, MPI send, and MPI receive,
respectively. We break each timestep into a series of “tasks” that we time separately and the
amount of computation for each is given below. P0 and Pk−1 are handled separately from
the other processors since they behave differently.

P0 Pi Pk−1

(46n + 18sk − 47)T1


+[43n + 12s(k − 1) − 55]T2
+(8n − 11)T3
(18s)Trec (18s − 9)Trec
+(46n + 18sk − 47)T4
+(46n + 18sk − 47)T5
+(46n + 18sk − 47)Tsend

18
(18s)T1 (18s)T1
+(6s)T2 +(6s)T2
+(3s)T3 +(3s)T3
+(3s)T4 +(3s)T4
+(66s)T5 +(66s)T5

(6s)T1 (6s)T1 (6s)T1


+(2s)T2 +(2s)T2 +(2s)T2
+(s)T3 +(s)T3 +(s)T3
+(s)T4 +(s)T4 +(s)T4
+(2s + 6)T5 +(2s + 6)T5 +(2s + 6s)T5

(15i + 3)T1 [15(k − 1) + 3]T1


+(9i)T2 +[9(k − 1)]T2
(3)T1 +(9i)T4 +[9(k − 1)]T4
+(3)Tsend +(3i)T5 +[3(k − 1)]T5
+(3i)Trec +[3(k − 1)]Trec
+[3(i + 1)]Tsend +[3(k − 1)]Tsend

(5s − 15)T1 (5s − 15)T1


+(3s − 9)T2 +(3s − 9)T2
(s − 3)T1
+(3s − 9)T4 +(3s − 9)T4
+(s − 3)T5 +(s − 3)T5

(2s)T1 (2s)T1 (2s)T1


+(3)T4 +(3)T4 +(3)T4
+(3)T5 +(3)T5 +(3)T5

19
(15s − 6)T1 (15s − 6)T1
+(6s − 9)T2 +(6s − 9)T2
+(3s)T3 +(3s)T3
+(3)T4 +(3)T4
+(9s − 18)T5 +(9s − 18)T5

(6s)T1 (6s)T1 (6s)T1


+(2s)T2 +(2s)T2 +(2s)T2
+(s)T3 +(s)T3 +(s)T3
+(s)T4 +(s)T4 +(s)T4
+(3s − 3)T5 +(3s − 3)T5 +(3s − 3)T5

[15(k − 1) + 3]T1 [15(k − i − 1) + 3]T1


+[9(k − 1)]T2 +[9(k − i − 1)]T2
+[9(k − 1)]T4 +[9(k − i − 1)]T4 (3)T1
+[3(k − 1)]T5 +[3(k − i − 1)]T5 +(3)Tsend
+[3(k − 1)]Trec +[3(k − i − 1)]Trec
+[3(k − 1)]Tsend +[3(k − i)]Tsend

(5s − 15)T1 (5s − 15)T1


+(3s − 9)T2 +(3s − 9)T2
(s − 3)T1
+(3s − 9)T4 +(3s − 9)T4
+(s − 3)T5 +(s − 3)T5

(3s)T1
(2s)T1 (2s)T1
+(k − 1)T2
+(s − 3)T4 +(s − 3)T4
+(s − 3)T4
+(s)Tsend +(s)Tsend
+[s(k − 1)]Trec

20
Figure 7: Total computation time on k = 4, 8, 10, 16 processors.

3.4.2 Discussion of results


The program was tested with the initial soliton u(x, 0) = 21 sech2 12 x − 10 centered at


x = 20 on the domain [0, 40). The problem size was fixed with n = 100, 000 grid points
so that ∆x = 0.0004 and was run for ten time steps with ∆t = 0.1. It was tested on
k = 4, 8, 10, and 16 processors so that each processor solved an s × s system where s =
25, 000, 12, 500, 10, 000, and 6, 250, respectively. We can see in figure 7 that for this problem,
increasing processor count and using the block inverse algorithm is not a particularly effective
way to speed up the computation. We hypothesize why in the following discussion.
Figure 8 shows in detail the block inverse algorithm on four and eight processors during
one timestep. The computation has been broken down into subtasks, each denoted by a

Figure 8: Computation time for one timestep on four processors (left) and eight (right).

21
Figure 9: Total computation time on k = 4, 8, 10, 16 processors.

different color. For each task, the size of the bar is the length of computation. A description
of the computation is given below.

1. During the “setup” of the problem (dark blue in Figure 8), P0 factors A and collects
and distributes blocks to P1 , . . . , Pk−1 .

2. All processors except for P0 solve for Gi using equation (11). P0 starts the next step.

3. All processors solve for bi using equation (10).

4. Processor Pi waits for zi−1 from its neighbor Pi−1 , computes zi by (12) and sends it to
Pi+1 . P0 immediately computes zi and sends it to P1 .

5. All processors solve for yi using (13).

6. The “setup” task resets the array used to store fi so that code can be repeated.

7. P0 , . . . , Pk−2 solve for Gi using (14). Pk−1 starts the next step.

8. All processors solve for bi using (15).

9. Processor Pi waits for zi+1 from Pi+1 , computes zi using (17) and sends it to Pi−1 . Pk−1
immediately computes zi and sends it to Pk−2 .

10. All processors solve for yi by equation (16).

Figure 8 shows that most of the computation time is used factoring and distributing
pieces of the matrix A. After each timestep, Goda’s scheme updates the difference matrix
A, so factoring at each timestep is unavoidable. This computation cannot be reduced by

22
Figure 10: Forward block inverse algorithm on k = 4, 8, 10, 16 processors.

Figure 11: Computation time for problem size n = 4, 000 (left) and n = 100, 000 (right).

increasing the number of processors, but instead increases the length of this task by adding
additional sends, shown in Figure 9.
If we restrict ourselves to solving Ly = f , the forward substitution part of our program
and steps 2 through 5 above, we see that the block inverse algorithm works well in practice.
Figure 10 shows clearly that speedup is gained by adding processors for a fixed problem size.
It is also worth mentioning that the size of the problem influences the effectiveness of
the algorithm. We ran the program again on a smaller grid size n = 4, 000 and found that
the algorithm performs better on larger problems. Figure 11 shows the computation time
for the forward part of the algorithm for both problem sizes. Each task can be done in less
time using more processors (due to the fact that the block matrix size s is smaller), except
for the ones involving communication. By increasing the problem size, large communication
costs using more processors can be mitigated by a larger computation.
Finally, Figure 12 shows the computation time over all ten timesteps on P1 and P2 when
k = 4. Here we can see that not much variance occurs between processors or over timesteps.

23
Figure 12: Computation time over 10 timesteps.

4 Conclusion
In this paper, an attempt was made at parallelizing a finite difference scheme for the KdV
equation. The block inverse algorithm that was used does not appear to be a good candidate
for this problem. While we have not accomplished our original goal, this study has exposed
some of the difficulties with solving nonlinear PDEs in parallel that should be investigated
further. We have shown, however, that the algorithm would work well with a linear PDE
such as the linearlized KdV.

24
References
[1] Korteweg, Diederik Johannes, and Gustav De Vries. “Xli. on the change of form of long
waves advancing in a rectangular canal, and on a new type of long stationary waves.” The
London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 39.240
(1895): 422-443.

[2] Zabusky, Norman J., and Martin D. Kruskal. “Interaction of solitons in a collisionless
plasma and the recurrence of initial states.” Phys. Rev. Lett 15.6 (1965): 240-243.

[3] Goda, Katuhiko. “On stability of some finite difference schemes for the Korteweg-de Vries
equation.” Journal of the Physical Society of Japan 39.1 (1975): 229-236.

[4] Abe, Kanji, and Osamu Inoue. “Fourier expansion solution of the Korteweg-de Vries
equation.” Journal of Computational Physics 34.2 (1980): 202-210.

[5] Fornberg, Bengt, and G. B. Whitham. “A numerical and theoretical study of certain
nonlinear wave phenomena.” Philosophical Transactions of the Royal Society of London
A: Mathematical, Physical and Engineering Sciences 289.1361 (1978): 373-404.

[6] Arnold, Douglas N., and Ragnar Winther. “A superconvergent finite element method for
the Korteweg-de Vries equation.” Mathematics of Computation 38.157 (1982): 23-36.

[7] Bona, Jerry L., Vassilios A. Dougalis, and Ohannes A. Karakashian. “Fully discrete
Galerkin methods for the Korteweg-de Vries equation.” Computers & Mathematics with
Applications 12.7 (1986): 859-884.

[8] Winther, Ragnar. “A conservative finite element method for the Korteweg-de Vries equa-
tion.” Mathematics of Computation (1980): 23-43.

[9] Qu, Fu-li, and Wen-qia Wang. “Alternating segment explicit-implicit scheme for nonlinear
third-order KdV equation.” Applied Mathematics and Mechanics 28 (2007): 973-980.

[10] Guo, Jinhua, and Thiab R. Taha. “Parallel implementation of the split-step and the
pseudospectral methods for solving higher KdV equation.” Mathematics and Computers
in Simulation 62.1 (2003): 41-51.

[11] Schendel, Udo. Introduction to numerical methods for parallel computers. Prentice Hall
PTR, 1984.

[12] Taha, Thiab R., and Mark J. Ablowitz. “Analytical and numerical aspects of certain
nonelinear evolution equations III. Numerical, Korteweg-de Vries equation.” Journal of
Computational Phycis 55.2 (1984): 231-253.

[13] Blelloch, Guy E. “Prefix sums and their applications.” (1990).

[14] Quintana-Ort, Gregorio, et al. “SuperMatrix for the Factorization of Band Matrices
FLAME Working Note 27.” (2007).

25
[15] Gupta, Anshul, et al. “The design, implementation, and evaluation of a symmetric
banded linear solver for distributed-memory parallel computers.” ACM Transactions on
Mathematical Software (TOMS) 24.1 (1998): 74-101.

[16] Hajj, Ibrahim N., and Stig Skelboe. “A multilevel parallel solver for block tridiagonal
and banded linear systems.” Parallel Computing 15.1 (1990): 21-45.

[17] Hammack, Joseph L., and Harvey Segur. “The Korteweg-de Vries equation and water
waves. Part 2. Comparison with experiments.” Journal of Fluid mechanics 65.02 (1974):
289-314.

[18] Hammack, Joseph L., and Harvey Segur. “The Korteweg-de Vries equation and water
waves. Part 3. Oscillatory waves.” Journal of Fluid Mechanics 84.02 (1978): 337-358.

[19] Bona, J., et al. “Conservative, discontinuous Galerkin-methods for the generalized Ko-
rtewegde Vries equation.” Mathematics of Computation 82.283 (2013): 1401-1432.

[20] Bona, J. L., et al. “Conservative, high-order numerical schemes for the generalized
Korteweg-de Vries equation.” Philosophical Transactions of the Royal Society of London.
Series A: Physical and Engineering Sciences 351.1695 (1995): 107-164.

[21] Dutykh, Denys, Marx Chhay, and Francesco Fedele. “Geometric numerical schemes for
the KdV equation.” Computational Mathematics and Mathematical Physics 53.2 (2013):
221-236.

[22] Craig, Walter, and Catherine Sulem. “Numerical simulation of gravity waves.” Journal
of Computational Physics 108.1 (1993): 73-83.

[23] Kolebaje, Olusola Tosin, and Oluwole Emmanuel Oyewande. “Numerical solution of the
Korteweg-de Vries equation by finite difference and adomian decomposition method.”
International Journal of Basic and Applied Sciences 1.3 (2012): 321-335.

[24] Scott, L. Ridgway, Terry Clark, and Babak Bagheri. Scientific parallel computing. Vol.
146. Princeton: Princeton University Press, 2005.

[25] Scott, L. Ridgway. “Tsunami Simulation.” (2013).

26

You might also like