Seismic Source Inversion Using Discontinuous Galerkin Methods A Bayesian Approach

Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

Seismic Source Inversion Using

Discontinuous Galerkin Methods


A Bayesian Approach
Huan Yu
Juan Pablo Madrigal Cianci
December 12, 2016

Abstract
Source inversion is a set of techniques that geophysicist and mathe-
matician use in order to recover information about an earthquake based
on some measures. We are interested in recovering three particular pa-
rameters of our earthquake: the location, the frequency of the wave and
its amplitude. In this project we record data from a one dimensional
wave and compute the cost functional, which is a discretization of the
sum of the L2 errors between the recorded and a simulated data, which
depends on our set of parameters . The forward wave equation is solved
using discontinuous Galerkin (dG) methods for waves with a point source.
A Bayesian inversion is performed in order to account for the errors in the
measure. Different MCMC samplers are utilized throughout the project
with very satisfactory results.

1 Introduction
Seismic inversion involves the set of methods which seismologists use to infer
properties of an earthquake through physical measurements. In practice, geol-
ogist use seismographs to record physical information about the earthquakee.
This data are usually the time series of ground displacements, velocities and
accelerations, recorded by an array of receivers (seismographs) on the surface of
the ground and in observation wells. The idea behind this problem is to obtain
a set of parameters such that, when inputed into a forward model ( a solver for
the wave equation), we obtain very similar results to those meassured. More
formally, we are trying to find a set of parameters such that, given our mea-
sured data at different positions and different times , dr (tm ) and our forward
model uh (xr , tm ), this set of parameters minimizes the following cost functional
Nr X
X Tm
c() = |dr (tm ) uh (xr , tm ; )|2 , (1)
r=1 m=0

1
where Nr and Tm are the total number of receiver and final time of the measure
respectively. Note that this cost functional is the discrete version of the sum of
the L2 norm of the error of the measures. A schematic respresentation is show
in figure 1.

Figure 1: Schematic representation for a two dimensional seismic inversion prob-


lem. Note that the rectangle denotes our computational domain.

There are two approaches to find , a deterministic approach and a Bayesian


approach. The deterministic approach uses minimization algorithms (steepest
descent, conjugate gradient, BFGS,. . . ). This approach has the disadvantage
that it ignores the randomness in the measurements, which is something that
does not happen for real world problems: any measuring device, even the most
advanced ones, will have some measuring error associated to it. Other disad-
vantages of using the traditional optimization algorithms is the fact that they
are very sensitive to the choice of initial point, and they might get stuck at a
local minimum instead of a global minimum. On the contrary, the Bayesian
approach does take this uncertainty into account, however, it might require a
large number of iterations, which, depending on the problem and the type of
solver being used, could become really expensive to compute.

Similar work to this project has been done by Long, Motammed, and Tem-
pone (2015), however, they use a finite difference scheme for the forward model.
We use a discontinuous Galerkin (dG) scheme in order to solve for the forward
model. Discontinuous Galerkin are energy-based PDE solvers first introduced
by the late seventies in order to solve a Neutron transport equation. The bor-
row features from finite elements and finite volumes, and the main idea behind
them is to divide the computational domain into k non-overlapping cells and
approximate the solution to the PDE by a polynomial of degree N on each of

2
these cells. Application of dG method to point source wave equations has been
done by Field, Hesthaven, and Lau (2009). An excellent reference for these type
of methods in general is given by Hesthaven and Warbuton (2009).

2 Problem Setup
Given the difficulty of finding real world data, we resorted to simulate our
earthquake by solving a one dimensional acoustic wave equation with speed one
and reflective boundary conditions. The problem is given by
utt = uxx + 0 (x x0 )s(t; w, h), (2)
t [0, 1], x [5, 5]
2
)(t1)2
s(t; w, h) = 2(h2 )(t 1)e(w (3)
Where we chose our parameters to be x0 = 0, w = 6, h = 6. This parameters
correspond to the frequency (width), the height, and the location of the source.
After one second, the wave look like the following:

Figure 2: We can see what the one dimensional seismic wave looks like after one
second. Notice the symmetry with respect to the origin of the source (x = 0.)

Note that this problem can be solved analytically. The analytical solution
of this equation using our chosen parameters is given by
2 2
u(x, t) = 36(x t 2)e36(xt2) + 36(x + t + 2)e36(x+t+2) , (4)
Thus, we have a closed form of our recorded data and as such, we can just
generate it as we please. Moreover, we know what our parameters are and as

3
such, we can use them to verify the validity of the results.

Experimentally, we consider N recorders on our computational domain D =


[5, 5]. Each recorder will read the displacement at that point for different
periods of time between 0 and 1 seconds. As mentioned before, it is a key as-
sumption that our measuring devices have some error (white noise) associated
with them. We assumed that each measurement has an error that follows a
N (0, 0.2) Distribution. After simulating our (noisy) data, we proceeded to im-
plement a FORTRAN dG solving routine into R (see appendix A) in order to
compute the cost functional (equation (1)). A derivation for the dG method
for a wave equation with a point source is done by Field, Hesthaven and Lau
(2009). Having this,we then proceeded to perform the Bayesian inversion us-
ing different choices of priors, number of iterations, bur in periods etc. Dur-
ing the experiments, 4 sampling algorithms were used, Metropolis-Hastings and
Adaptive Metropolis-Hastings (coded by ourselves) and Hit-and-Run Metropolis
(HARM) and Adaptive-HARM, obtained from the Bayesian inference package
LaplacesDemon (see appendix B).

3 Numerical Aspects
3.1 Constructing the Posterior
The main idea behind Bayesian inversion is to use Bayes theorem in order to
obtain a probability density for the parameters given the data, y. Formally,
Bayes theorem states that

P (|y) P (y|)p(), (5)

where P () is the probability density function. P (y|) is called the likelihood and
p() is called the prior. The idea is then to propose a likelihood function and
a posterior distribution. Note that this posterior distribution captures some in-
formation of what the vector of parameters might be, based on previous/expert
information. Following an approach similar to Long, Motammed and Tempone
(2015), we use equation (1) in order to construct our proposal distribution:
t 1
" NR N
#
1X X |u(tm , xr ; ) dr (tm )|2
p(|dr (tm )) exp p(), (6)
2 r=1 m=0 2

where is the standard deviation of our parameters. Note then that the like-
lihood becomes a normal distribution centered at the cost functional c() with
standard deviation . Based on this, we cal also define the log-cost functional,
L():
NR N t 1
1X X |u(tm , xr ; ) dr (tm )|2
L() := log(p(|dr (tm ))) = log(p()) + c,
2 r=1 m=0 2
(7)

4
where c is a constant. It is evident then that maximizing the likelihood is an
equivalent problem to minimizing our log-cost functional. Thus, we can obtain
a probability density function for the values of our parameters.

4 Results
Since we generated recorded data and applied some random structure to account
for the uncertainty of the measuring devices, we know the true values of our
parameters. Then we want to use four different MCMC algorithms which include
Metropolis Hasting, Adaptive Metropolis Hastings, Hit and Run Metropolis
Hastings and Adaptive Hit and Run Metropolis Hastings to study our posteriors
and compares the results with true values. An excellent source to learn more
about this methods is given by Geyer (2005), Chen and Schmeister (1992) and
Rudolf and Ullrich (2015.). As mentioned before, we have four parameters
which consist of location x0 , width w, height h and standard deviation for the
random structure. Based on our prior knowledge, we tried uniform distribution,
Gaussian distribution, and inverse gamma distribution with different means and
standard deviations for the priors. Finally we decide to use following priors
which give us relatively good results to compare different experiments.

x0 N (1, 1); w N (3, 5); h N (7, 5); N (0.5, 1) (8)

Note that the code is made such that the algorithm yields only acceptable
values of x0 , for it would not make sense for the earthquake to start outside the
computational domain.

5
Posterior of x0 Posterior of w Posterior of he Posterior of Sd

4000
4000

3000

5000
Frequency

Frequency

Frequency

Frequency
2000
2000

2000
0 1000
0

0
1.0 0.0 1.0 5.5 6.0 6.5 5.2 5.8 6.4 0.18 0.20 0.22

True value = red line True value = red line True value = red line True value = red line

Chain values of x0 Chain values of w Chain values of he Posterior of Sd

6.6
1.0

6.6

0.22
6.2
6.2
0.0

0.20
5.8
5.8

5.4
1.0

5.4

0.18
0 20000 0 20000 0 20000 0 20000

True value = red line True value = red line True value = red line True value = red line

ACF for x0 ACF for w ACF for h ACF for Sd


0.8

0.8
0.8

0.8
0.4
ACF

ACF

ACF

ACF
0.4
0.4

0.4
0.4 0.0

0.0
0.0

0.0

0 20000 0 20000 0 20000 0 20000

Lag Lag Lag Lag

Figure 3: Posterior plots based on Metropolis Hastings

To begin with, we construct a Gaussian proposal distribution and generate


50000 iterations using a burn-in of 500 in order to reduce the effect the starting
point. From Figure 3, we note that posterior distributions for four parameter
present an approximately normal shape. Yet posterior of parameter location x0
is more disperse. Trace plots of width w, height h and standard deviation of
random structure show that the mixing is sound, but trace plot of parameter
location indicates a cyclic pattern which means the convergence is not good.
ACF plots for w, h and suggest very week autocorrelation which is good
for the model. However, ACF for x0 shows a periodical decay as lag increases
which means strong correlations between those iterations. From above analysis,
we can see that Metropolis Hastings method has not given us an ideal result.
Then we repeat the experiment with the same priors as before, we generate
50000 iterations for the adaptive part with 5000 burn-in and 100 thinning. We
construct a gaussian proposal distribution with a covariance matrix which is
based on 5000 iterations training set. This is our adaptive metropolis Hastings.
From Figure 4, we can tell that the histograms of all parameters behave good
normal shapes and the means of posteriors are very close to true values which
are presented in red lines in graph. And the mixing for all trace plots is also

6
considered to be good. Yet we need to mention that for parameter trace
plot there are few iterations with high standard deviation values. ACF plots
still indicate very week autocorrelation between iterations for all parameters.
Comparing above analysis with the results from Metropolis Hasting, what it
boils down to that the adaptive Metropolis Hasting gives us a better prediction
for parameters.

Posterior of x0 Posterior of w Posterior of he Posterior of Sd


80000

40000

40000

50000
Frequency

Frequency

Frequency

Frequency
40000

20000

20000

20000
0

0
2 0 1 2 5.5 6.0 6.5 7.0 5.5 6.0 6.5 0.18 0.20 0.22

True value = red line True value = red line True value = red line True value = red line

Chain values of x0 Chain values of w Chain values of he Posterior of Sd


2

6.5
6.5
1

0.21
6.0
0

6.0
1

0.19
5.5
5.5
2

0e+00 3e+05 0e+00 3e+05 0e+00 3e+05 0e+00 3e+05

True value = red line True value = red line True value = red line True value = red line

ACF for x0 ACF for w ACF for h ACF for Sd


0.8
0.8
0.8

0.8
ACF

ACF

ACF

ACF

0.4
0.4
0.4

0.4

0.0
0.0
0.0

0.0

0 20000 50000 0 20000 50000 0 20000 50000 0 20000 50000

Lag Lag Lag Lag

Figure 4: Posterior plots based on Adaptive Metropolis Hastings

Now We ran other two models using the R package LaplacesDevil using
the same priors (experimentally, these priors gave the best results). For Hit
and Run Metropolis Hastings, we generate 10000 iterations to take a look at
the posterior distributions of parameter x0 , w and h. From Figure 5, we note
that the mixing for x0 is not good. Histogram of x0 is also not very normal.
The mixing for other two parameters are relatively better and histograms show
normal shapes. But ACF plots for all parameters show highly auto correlation.
Therefore, we can conclude that this method do not give us a better result
compared to Metropolis Hasting.

7
Figure 5: Posterior plots based on Hit and Run Metropolis Hastings

We consider adaptive Hit and Run Metropolis Hasting method. We first gen-
erate 50000 iterations using same priors as before. A covariance matrix proposal
is created with two step training data set 1000 and then 10000. From Figure 6,
we notice that the mixing is good. The posteriors are normal distributed. Yet
there are very strong correlations between iterations.

8
x0 x0 x0

0.3

0.0 0.6
1
2

0.0
0 20000 40000 3 1 1 2 3 0 10 20 30 40

w w w
9

0.6

0.0 0.6
6

0.0
3

0 20000 40000 3 4 5 6 7 8 9 0 10 20 30 40

h h h
0.6

0.0 0.6
6

0.0
3

0 20000 40000 3 4 5 6 7 8 0 10 20 30 40

Figure 6: Posterior plots based on Adaptive Hit and Run Metropolis Hastings

Finally and based on previous experiments we chose the set of priors:

x0 N (0, 0.5); h N (5, 5); w N (7, 5), (2, 2). (9)

The obtained results with 500,000 iterations of MCMC, a burn in period of 5000
and no thinning are given as follows:
A table that summarizes the results by giving the means is presented below.
In all cases, the true values were included in a 90% confidence interval centered
around the average (not shown), which is satisfactory. From table 1, we can see
that for each parameter the posterior means are closet to the true means using
adaptive Metropolis Hasting, regardless of the choice of the prior. However,
using a Gamma distribution for and a N (0, 0.5) as the prior for the location
of the source, provides much better results. This choice of prior was made by a
combination of what was suggested in class and by observing the results given
in Figure 4. Based on graph analysis and posterior means, we conclude that the
adaptive metropolis hasting gives us the best estimate for parameters among
those four methods.

9
Figure 7: As we can see, we can benefit from previous experiments in the choice
of our priors.

10
Param MH AMH AMH (As in Figure 7) HARM AHARM True Value
X0 0.0450 -0.0002 -0.0070799 0.0052 0.0041 0.0000
w 5.9810 6.0010 5.9999 5.9023 5.9528 6.0000
h 5.9760 5.9890 5.9929 5.8342 5.8601 6.0000
Sd 0.2038 0.2011 1.9989 0.2034 0.1972 0.2000

Table 1: Summary statistic for posteriors for different MCMC methods

5 Discussion
5.1 Conclusions
We can see from the results that the Bayesin inversion worked for this problem.A
Bayesian approach provides accurate results and involves use of prior/expert
information, as well as capturing the randomness of the model, something not
possible with a deterministic approach. We consider that the experimental set
up that gave the best results was the adaptive Metropolis Hastings (coded by
ourselves), however, running it takes a lot of time (around 10 hours on an Intel
i7 3.5 GHz). As mentioned before, this is one of the downfalls of this method.
Alternatively, we could have made the code run faster by reducing the order
of the polynomial or the number of elements in the dG method, however, we
wouldve had to sacrifice some accuracy. The code was run using 21 elements
(dividing our interval into 21 equally spaced subintervals) using a polynomial
approximation of degree 21.

11
2
10
nx=11
nx=21
0 nx=31
10

10 -2

10 -4
Error

-6
10

-8
10

10 -10

-12
10

10 -14
5 10 15 20 25 30 35 40
Polynomial Degree

Figure 8: Error for different number of elements and different polynomial degrees

Conversely, if we wanted a more accurate result then we would have had to


increase the computational time.

Additionally, we propose choosing the initial point as well as the priors based
on first running a deterministic solver for a small number of iterations.

5.2 Future Work


In the future we would like to include more complex forms for the forward
model, like the elastic wave equation. Moreover, we would also like to implement
the wave equation in two and three dimensions, as well as try the Bayesian
inversion with more parameters. This will lead to a bigger computational cost,
so implementing the dG code in parallel as well as improving the numerical
efficacy of the method would be of utmost importance.

12
References
Long, Quan, Mohammad Motamed, and Ral Tempone. Fast Bayesian op-
timal experimental design for seismic source inversion. Computer Methods
in Applied Mechanics and Engineering 291 (2015): 123-145.
Villagran, Alejandro, et al. Computational methods for parameter esti-
mation in climate models. Bayesian Analysis 3.4 (2008): 823-850.

Hesthaven, Jan S., and Tim Warburton. Nodal discontinuous Galerkin


methods: algorithms, analysis, and applications. Springer Science & Busi-
ness Media, 2007.
Field, Scott E., Jan S. Hesthaven, and Stephen R. Lau.Discontinuous
Galerkin method for computing gravitational waveforms from extreme
mass ratio binaries. Classical and Quantum Gravity 26.16 (2009): 165010.
Rudolf, D. and Ullrich, M. Comparison of hit-and-run, slice sampling and
random walk Metropolis. 2015. ArXiv.
Geyer, C. Markov Chain Monte Carlo Lecture Notes, 2005. University of
Minnesota.
Chen MH. Schmeiser, B. Performance of the Gibbs, Hit-and-Run and
Metropolis Samplers. Purdue University Technical Report # 92-18. Pur-
due University. 1992.
Hall, B. Hall, M, Satisticat, LLC. LaplacesDemon Package Documenta-
tion.. CRAN. 2016.

13
A Calling Fortran subroutines from R.
Although R is a very powerful statistical language, it is very useful to be able to
implement subroutines from other languages, like FORTRAN or C++. In our
case, we had a wave equation solver written in FORTRAN, and translating the
code into R would not have been feasible both for the time constraint associated
to this project, as well the fact that the computations would have taken far more
time if implemented in R instead of Fortran. Thus, we created a function wrap.

The main idea is to create an *.so file in Unix (or *.DLL for Windows) such
that we can call our subroutine from R. Note that there are various tutorials
on-line, however, we were not able to find any that included both the use of
different subroutines, as well as packages like BLASS or LAPACK. Lets move all
our FORTRAN files into our working directory.

We start by creating a MAKEVAR file with no extenssion and the following lines.
This is done in order to use textttBLASS and LAPACK.
PKG LIBS=$(LAPACK LIBS) $(BLAS LIBS) $(FLIBS)

Now, we need our main FORTRAN file to be a subroutine that does not print
anything and that does not have any variable declared as output. That is, we
want the header of our FORTRAN function to be of the form:
subroutine costfun(p,q,nx,x0,w,he,l2e)

use doubleprecision
use PDEMODULE
implicit none

real (kind=dp),intent(out) :: l2e


integer, intent(in) :: p,q,nx
real (kind=dp) :: hx
real (kind=dp),intent (in) :: x0,w,he
REAL(KIND = dp), ALLOCATABLE, DIMENSION(:) :: r int,w int,x int,f int
REAL(KIND = dp), ALLOCATABLE, DIMENSION(:,:) :: P int,Pr int
REAL(KIND = dp), DIMENSION(0:p,1:nx) ::u,ue
REAL(KIND = dp), DIMENSION(0:nx) :: x
REAL(KIND = dp), DIMENSION(1:nx) :: xh
integer :: nint,nrec,nsamp,i,k
.
.
.
.
call urec(p,nx,nrec,nsamp,x0,w,he,ue)
call unum(p,q,nx,x0,w,he,nrec,nsamp,u)

Having this, we then go to R and use the command:


setwd("/Downloads/src")
system(R CMD SHLIB o mysubroutine to be imported.so aux subroutine1.
f90 aux subroutine2.f90 .... llapack lblas)
#Checks that the .so file was created

14
system("ls")
dyn.load("costfun.so")

clearly, we must make sure to check that the code compiles before doing this.
After running the code, we then proceed to cchek that our function loaded:
#Checks that the function did load
.Fortran("costfun",p=as.integer(21),q=as.integer(21),nx=as.integer(11),
x0=as.numeric(0),w=as.numeric(6),he=as.numeric(6),l2e=as.numeric(1)
)

And then we proceed to create a function using the R command .Fortran:


#Checks that the function did load
myfunction < function(x0in,win,hein) {
dyn.load("costfun.so")
retvals < .Fortran("costfun",p=as.integer(21),q=as.integer(21),nx=as.
integer(11),x0=as.numeric(x0in),w=as.numeric(win),he=as.numeric(
hein),l2e=as.numeric(1))
return(retvals$l2e)
}

Note that it is important to declare the inputs of the function with the same
type of variable as it is declared in the Fortran subroutine. Thus, we can use R
to call our Fortran subroutine by
answer<myfunction(arg1,arg2,...)

15
B About the R package LaplacesDemon
Laplaces Demmon is a Bayesian Inference R package. According to the package
documentation, , LaplacesDemon is to provide a complete and self-contained
Bayesian environment within R. For example, this package includes dozens
of MCMC algorithms, Laplace Approximation, iterative quadrature, variational
Bayes, parallelization, big data, PMC etc.
One main characteristic of this program is the fact that the posterior distribu-
tion can be programmed directly in R (a big difference with say, JAGS, that has
its own language to program the posterior). This is a natural advange, given
the fact that our posterior distribution came from the Fortran solver imple-
mented in A. The package is very useful (even though it must be programmed
in a certain way) and it is a nice source of different MCMC algorithms.

Moreover, given that the code also includes the option for high performance
computing it is nice to have it in order to implement the future work described
in 5.2.

16

You might also like