Accelerating geostatistical seismic inversion using TensorFlow: A heterogeneous distributed deep learning framework
A R T I C L E  I N F O

Keywords:
Seismic inversion
Geostatistics
TensorFlow
GPU

A B S T R A C T

Geostatistical seismic inversion is one of the emerging technologies in reservoir characterization and reservoir uncertainty quantification. However, the challenge of intensive computation often restricts its application in practical studies. To circumvent this computational limitation, in this work we present a distributed parallel approach using TensorFlow to accelerate geostatistical seismic inversion. The approach provides a general parallel scheme to efficiently take advantage of all the available computing resources, i.e. CPUs and GPUs: the computational workflow is expressed and organized as a Data Flow Graph, and the graph can be divided into several sub-graphs which are then mapped to multiple computing devices to concurrently evaluate their operations. The high-level interface and the automatic differentiation provided by TensorFlow also make it much easier for users to implement their algorithms in an efficient parallel manner, and allow deploying programs on any computing platform almost without alteration. The proposed method was validated on a 3D seismic dataset consisting of 600 × 600 × 200 grid nodes. The results indicate that the method is feasible for practical applications and that the computational time can be largely reduced by using multiple GPUs.
∗ Corresponding author. E-mail address: mliu4@uwyo.edu (M. Liu).
1 Mingliang Liu developed the methodology, implemented the code, and applied it to a real dataset.
2 Dario Grana supervised the research project and contributed to the preparation of the manuscript.
https://doi.org/10.1016/j.cageo.2018.12.007
practice. Therefore, it is necessary to accelerate the inversion method to make it computationally efficient for practical applications. With the recent advent of high-speed multi-core CPUs and GPUs, specialized parallel algorithms for geostatistical seismic inversion have been developed for high-performance parallel computing platforms. Vargas et al. (2007) proposed to parallelize the sequential simulation algorithm by subdividing the study area into small spatial domains, and then simulating them independently on different processors. Since the simulation of each node does not consider the currently simulated nodes, the result is not exactly the same as that of the original sequential algorithm. Alternatively, Nunes and Almeida (2010) proposed a parallelization scheme that uses a functional decomposition to split the simulation algorithm into single sub-tasks, which are then executed in parallel using multiple threads on multi-core CPUs. Tahmasebi et al. (2012) exploited graphics processing units (GPUs) to reduce the execution time using the Compute Unified Device Architecture (CUDA). Huang et al. (2013) improved the computational performance of the Direct Sampling method for multiple-point geostatistics simulations with the use of GPUs. Ferreirinha et al. (2015) introduced a generic strategy based on OpenCL to efficiently use multiple computational devices (CPUs and GPUs) for stochastic seismic inversion. Similar parallel algorithms have been presented by Mariethoz (2010) and Peredo et al. (2014).

In this paper, we propose a distributed parallel method to accelerate the geostatistical seismic inversion using TensorFlow, an open-source heterogeneous distributed deep learning framework developed by Google and first released to the public at the end of 2015. Although TensorFlow was designed with the aim of speeding up deep learning by providing a simple-to-use and computationally efficient infrastructure, its generic architecture and extensibility make it applicable to any numerical problem that can be expressed as a Data Flow Graph. One of the advantages of TensorFlow is that it efficiently and easily exploits heterogeneous distributed computational devices, i.e. CPUs, GPUs and TPUs (Tensor Processing Units, developed by Google specifically to speed up tensor calculations). It provides an extremely efficient and user-friendly platform for compute-intensive applications. With such a platform, programs can be easily deployed to a scalable computing environment with a uniform Application Programming Interface (API). In TensorFlow, the underlying computing devices are transparent to users, and computational tasks are scheduled by the dedicated distributed execution engine. In contrast with MPI and CUDA, users no longer need to explicitly handle data transmission, memory management and message passing between devices: the algorithms are expressed in terms of a Data Flow Graph and the underlying operations are completely implicit. Other frameworks, e.g. PyTorch and MXNet, could also be applied as alternatives to TensorFlow.

Seismic inversion, geostatistical simulations and uncertainty quantification in reservoir modeling generally require a large computational effort due to the size of seismic datasets and the dimension of the model space. The proposed reservoir characterization workflow requires the generation of a large number of geostatistical realizations of the reservoir model and the optimization of the models to match the measured seismic data. The simulations are obtained using geostatistical algorithms and the optimization is performed using gradient-based methods, but could also be obtained using stochastic perturbation algorithms. The implementation using stochastic optimization methods generally allows avoiding local minima of the objective function; however, the proposed gradient-based inversion method using multiple initial models allows exploring the model space with a smaller computational effort thanks to the parallelization approach. Indeed, with the advent of TensorFlow, it becomes simple and flexible to accelerate the geostatistical seismic inversion with three parallel strategies. First, geostatistical simulation and seismic inversion typically involve many large-scale matrix operations that can be massively parallelized, and thus computed on GPUs; secondly, as multiple reservoir realizations are used in the geostatistical inversion, we can utilize multiple CPUs or GPUs to simulate and invert the models simultaneously (model parallelism); thirdly, 3D seismic measurements generally include enormous volumes of data (tens or hundreds of gigabytes), which might be beyond the capacity of a single computing device. In this case, we can subdivide the 3D seismic data into several small sub-volumes and process them across multiple CPUs or GPUs (data parallelism). The new contribution is in the parallelization of the seismic inversion problem and the computational advantage of the proposed workflow.

One of the advantages of TensorFlow is its fast auto-differentiation ability, which allows computing the gradients and Jacobians of any variables without analytically deriving the necessary partial derivatives, a task that is usually time-consuming and error-prone. Therefore, any gradient-based method can benefit from this feature. Combining auto-differentiation with the set of first-rate optimizers implemented in TensorFlow, model parameters can be automatically and iteratively updated according to the selected objective function. It is also possible to apply other user-defined optimizers or optimizers from third-party libraries. Overall, with the advantages of distributed computation and auto-differentiation, TensorFlow allows us to implement the geostatistical seismic inversion algorithm in parallel in a simple and accessible way. At present, TensorFlow provides a high-level API for building and executing a Data Flow Graph in various languages, including Python, C++, Java and Go, among which the Python API is the most complete and easiest one to use due to its multi-platform compatibility, readable syntax, and easy integration with other scientific computing libraries implemented in C, C++ or Fortran. The backend and core modules of TensorFlow are written in C/C++; therefore, independently of the API used to define the Data Flow Graph of the geostatistical seismic inversion, the code will be converted to C/C++ and then compiled to machine code transparently, hence boosting the performance.

2. Overview of TensorFlow

TensorFlow is currently the most widely used deep learning framework. It was initially designed to simplify the construction of deep neural networks and speed up the learning process in a heterogeneous distributed computational environment, and then became a more generic library for numerical computation, simplifying large-scale numerical optimization problems, e.g. inverse problems and the solution of partial differential equations. Different from traditional numerical libraries, TensorFlow adopts the Data Flow Graph, a common programming model in the fields of cloud computing and machine learning, to express and organize the computational workflow, and then maps the mathematical operations in the graph to different computational devices (CPUs, GPUs, and TPUs). As illustrated in Fig. 1, the design of TensorFlow is highly modular: users build the Data Flow Graph through the front end, e.g. Python, C++, Java or Go, and the core and the distributed execution engine are responsible for dispatching the computing operations in the Data Flow Graph to the underlying computing devices. Corresponding to the three layers in Fig. 1, there are three important roles in a TensorFlow program: client, master and worker.

Fig. 1. Architecture of TensorFlow.
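To make the mapping between the Data Flow Graph and the underlying devices concrete, the following is a minimal sketch using the TensorFlow 1.x Python API that was current at the time of writing; the matrix sizes, device strings and log directory are arbitrary illustrations, not taken from the paper.

    import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.device)

    graph = tf.Graph()
    with graph.as_default():
        # Operations can be pinned to devices explicitly; otherwise the
        # execution engine places them automatically
        with tf.device('/cpu:0'):
            A = tf.random_normal([2000, 2000], name='A')
            B = tf.random_normal([2000, 2000], name='B')
        with tf.device('/gpu:0'):
            C = tf.matmul(A, B, name='C')  # large matrix product on the GPU

    # Write the graph to disk so it can be inspected in TensorBoard
    tf.summary.FileWriter('./logs', graph).close()

    # allow_soft_placement falls back to the CPU if no GPU is available
    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=True)
    with tf.Session(graph=graph, config=config) as sess:
        result = sess.run(C)  # only the sub-graph needed for C is evaluated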
As a simple example, consider the linear regression model

Y = W ∗ X + b     (1)

where X and Y represent the input and output respectively, and W and b represent the weights and bias that we want to estimate. The matrix multiplication in the Data Flow Graph corresponds to a single node that connects two inputs (W, X) and an output (the result of W∗X); similarly, the add operation corresponds to a node with two inputs (W∗X, b) and an output (Y). To better understand, debug, and optimize programs, TensorFlow also provides a suite of visualization tools, namely TensorBoard, to visualize the Data Flow Graph and plot quantitative metrics about the execution of the graph.

Fig. 3. Data Flow Graph of the linear regression model described by Equation (1).

The mechanism of the Data Flow Graph brings several advantages to TensorFlow. First, constructing a Data Flow Graph is like building blocks, which makes the model very flexible, as it can be modified and extended by adding or removing components in the graph. Secondly, this mechanism is inherently parallel and distributed, and works well in large-scale numerical problems: it is easy for TensorFlow to identify operations that can execute in parallel, and we can also explicitly subdivide the Data Flow Graph into several sub-graphs to execute them on different devices simultaneously. Thirdly, the Data Flow Graph is a language-independent representation of models. One can easily build the model in Python or another language, and then deploy it to a variety of computing platforms. This "code once and run everywhere" nature provides high scalability across computing machines.

Once the Data Flow Graph is built and launched in a session, the TensorFlow core and distributed execution engine map the operations in the graph that are actually needed onto the underlying computing devices to fetch the variables of interest. Thanks to the built-in feature of auto-differentiation, the variables can be updated automatically and iteratively by minimizing the pre-defined objective function through the backpropagation algorithm (Rumelhart et al., 1986), which is the workhorse of learning in neural networks.

3. Geostatistical seismic inversion

The estimation of elastic properties from seismic data is a common inverse problem in geophysics, which can be written in the following general form

d = f(m) + e     (2)

where d represents the seismic data, m represents the set of elastic attributes, f represents the seismic forward model mapping m into d, and e is the measurement error. The goal of seismic inversion is to estimate the unknown model variables m from d assuming that the forward model f is a good approximation of the physical relations between model properties and data.

Machine learning and inverse theory methods are tools for parameter estimation (Kappler et al., 2005). Indeed, we can apply the framework and optimization methods of machine learning to solve inverse problems. In this work, we aim to introduce a state-of-the-art heterogeneous distributed computing framework for the acceleration of seismic inversion. To evaluate the computational performance, we focus on a geostatistical post-stack inversion application to compare the execution times on different computing environments. The inversion can be extended to pre-stack inversion, full waveform inversion, reservoir history matching and any other model-based inverse problems.

In post-stack seismic inversion, the model parameter of interest is acoustic impedance and the forward operator is generally a convolutional model of the form

d = w ∗ r + e = w ∗ g(Ip) + e     (3)

where w is the wavelet estimated from well logs and seismic data, and r represents the reflection coefficients calculated from the acoustic impedance Ip through the function g.
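As an illustration, a possible TensorFlow 1.x sketch of this forward operator is given below; the function name forward_model, the [traces, samples] array layout, and the use of the normal-incidence reflectivity formula r = (Ip₂ − Ip₁)/(Ip₂ + Ip₁) for g(Ip) are illustrative assumptions, not the authors' code.

    import tensorflow as tf  # TensorFlow 1.x API

    def forward_model(ip, wavelet):
        """Synthetic seismic d = w * g(Ip) of Equation (3): reflectivity
        from acoustic impedance, then convolution with the wavelet.
        ip:      [n_traces, n_samples] acoustic impedance
        wavelet: [n_lags] wavelet estimated from the well tie"""
        # Normal-incidence reflection coefficients g(Ip) along time
        r = (ip[:, 1:] - ip[:, :-1]) / (ip[:, 1:] + ip[:, :-1])
        # tf.nn.conv1d expects [batch, width, channels] and computes a
        # correlation, so the wavelet is reversed for a true convolution
        r = tf.expand_dims(r, axis=-1)
        w = tf.reshape(wavelet[::-1], [-1, 1, 1])
        d = tf.nn.conv1d(r, w, stride=1, padding='SAME')
        return tf.squeeze(d, axis=-1)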
In the paradigm of machine learning, Equation (3) can be modeled as a two-layer neural network, including an input layer (the model variable Ip) and an output layer (the seismic response d), without hidden layers (Fig. 4). Different from typical neural networks, in this model the input variables (Ip) are the unknown variables to be estimated, rather than the weights (the wavelet w).

Fig. 4. Two-layer neural network corresponding to the convolutional forward model in seismic inversion.

More specifically, the geostatistical seismic inversion is a mathematical approach consisting of three steps: simulation, inversion, and uncertainty evaluation. In geostatistical simulation, an ensemble of initial reservoir models is built from the prior geological knowledge and spatial correlation parameters through a chosen geostatistical simulation algorithm, e.g. Fast Fourier transform with moving average (FFT-MA; Le Ravalec et al., 2000), probability field simulation (PFS; Srivastava, 1992; Froidevaux, 1993), sequential Gaussian simulation (SGS; Doyen, 2007) or multiple-point statistics (MPS; Mariethoz and Caers, 2014). These prior reservoir simulations are then updated to assimilate seismic data and other geophysical measurements to provide a better description of the true reservoir behavior. Finally, we collect all the inversion results of the reservoir realizations to evaluate the associated model uncertainty, which can make decision-making in exploration and production more robust.

3.1. FFT-moving average and probability field simulation

In this paper, we use probability field simulation (Srivastava, 1992; Froidevaux, 1993), a fast conditional simulation method that is computationally efficient for large grids with millions of nodes, to generate the prior reservoir models. Unlike sequential simulation methods, which construct the local probability density function (PDF) at every grid cell to incorporate previously simulated samples, in probability field simulation the local PDFs are estimated through kriging analysis of well data. Once the local PDF is obtained, we use a correlated probability field to draw samples from the PDF to simulate reservoir models that account for spatial correlation (Doyen, 2007). In particular, a reservoir simulation is the sum of the local mean and a spatially correlated Gaussian noise field scaled by the local standard deviation.

The prior acoustic impedance (AI) models are only conditioned on well log data, and do not honor seismic data. To provide a more accurate description of the subsurface model, we should update the models according to the misfit between the synthetic and measured seismic data. To make the solution more stable, we adopt the Tikhonov regularization method to control the complexity of the models. Therefore, the objective function is

J(m) = ||d_syn − d_true||₂² + λ||m − m_prior||₂²     (6)

where d_syn and d_true represent the synthetic seismic and true seismic respectively, and m_prior represents the prior model. The first term in the objective function represents the data misfit, and the second term represents the model misfit, namely the regularization term. The parameter λ is a hyper-parameter that controls the balance between model misfit and data misfit. With the advantage of the auto-differentiation underlying TensorFlow, it is simple to add any regularization term, such as L1 regularization or a spatial smoothing constraint, to the objective function.

The objective function is also called the loss function, and it measures the quality of the model parameters based on how well the synthetic seismic is consistent with the real seismic data. The best estimate of the model parameters m is defined as the model that minimizes the loss function

m = arg min_m J(m)     (7)

Many optimization algorithms have been developed to find the optimal model: gradient descent (steepest descent) is the most common one and is extensively applied in the field of inverse problems. It minimizes the loss function by updating the model parameters in the direction opposite to the gradient, as described in Equation (8). By iteratively performing the gradient descent until the desired misfit is reached, we can obtain the optimal solution:

m^(l+1) = m^l − ηg^l     (8)

where l is the iteration index, m^l is the vector of model parameters at the l-th iterate, g^l = ∇_m J(m^l) denotes the gradient of the loss function with respect to the model parameters evaluated at the l-th iterate, and η is the learning rate.

In practice, the learning rate is the most important hyper-parameter: it controls the step size with which the model parameters are updated according to the loss function. If the learning rate is too small, gradient descent can be very slow and take a long time to converge; however, gradient descent with a large learning rate might overshoot the minimum and even diverge.
To reduce the sensitivity to the choice of the learning rate, we use the Adam algorithm, which adapts the step size using bias-corrected estimates of the first and second moments of the gradient:

m^(l+1) = m^l − η û^l / (√v̂^l + ε)     (9)

with

û^l = u^l / (1 − β₁^l)     (10)

v̂^l = v^l / (1 − β₂^l)     (11)

u^l = β₁u^(l−1) + (1 − β₁)g^l     (12)

v^l = β₂v^(l−1) + (1 − β₂)g^l ∘ g^l     (13)

where u^l and v^l are referred to as the first moment (the mean) and the second moment (the uncentered variance) of the gradient respectively; û^l and v̂^l are the corresponding bias-corrected first and second moment estimates, which prevent the raw moments from being biased towards zero during the initial updates; the symbol ∘ denotes the Schur product; the hyper-parameters β₁ and β₂ control the exponential decay rates for the moment estimates, defaulting to 0.9 and 0.999 respectively; and ε is equal to 10⁻⁸.
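With the built-in optimizers, the manual update of the previous sketch reduces to a single graph node: tf.train.AdamOptimizer implements the update of Equations (9)–(13) with the default decay rates above. A sketch under the same assumptions as before:

    # Replaces the tf.gradients/tf.assign pair of the earlier sketch;
    # defaults are beta1=0.9, beta2=0.999, epsilon=1e-8
    update = tf.train.AdamOptimizer(learning_rate=eta).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(n_iter):
            _, J = sess.run([update, loss])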
Overall, the workflow of the geostatistical seismic inversion in this paper is illustrated in Fig. 5. We first generate a set of prior acoustic impedance models through the FFT-MA and probability field simulation algorithms, and then convolve the wavelet estimated from the seismic well tie with the reflectivity coefficients computed from these acoustic impedance models to generate the synthetic seismic responses. According to the misfit between the synthetic and the observed seismic data, we update the initial ensemble of reservoir models iteratively until convergence through the backpropagation algorithm.

Fig. 5. Workflow of the geostatistical seismic inversion.
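The unconditional simulation step can be sketched with a small spectral routine. The following numpy function is a minimal illustration of the FFT-MA idea (filtering white noise with the square root of the covariance spectrum) under the exponential covariance of Equation (14); it is not the authors' implementation, and the grid parameters in the usage line are placeholders.

    import numpy as np

    def fftma_field(nx, ny, nz, dx, dy, dz, a, b, c, rng):
        """Unconditional standard Gaussian field with the exponential
        covariance of Equation (14), simulated by filtering white noise
        with the square root of the covariance spectrum (FFT-MA idea)."""
        # Separation distances on the periodic simulation grid
        x = np.minimum(np.arange(nx), nx - np.arange(nx)) * dx
        y = np.minimum(np.arange(ny), ny - np.arange(ny)) * dy
        z = np.minimum(np.arange(nz), nz - np.arange(nz)) * dz
        X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
        cov = np.exp(-np.sqrt((X / a)**2 + (Y / b)**2 + (Z / c)**2))
        # Square root of the power spectrum (the FFT of the covariance)
        spec = np.sqrt(np.abs(np.fft.fftn(cov)))
        noise = rng.standard_normal((nx, ny, nz))
        field = np.real(np.fft.ifftn(spec * np.fft.fftn(noise)))
        return field / field.std()  # normalize to unit variance

    # e.g. a 100 x 100 x 50 grid with 25 m x 25 m x 1 ms cells and the
    # ranges used in the application (a = 1250 m, b = 1250 m, c = 5 ms)
    field = fftma_field(100, 100, 50, 25.0, 25.0, 1.0,
                        1250.0, 1250.0, 5.0, np.random.default_rng(0))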
directions, and the time sample interval is 1 ms. For the geostatistical inversion, we generate 1000 realizations of acoustic impedance models using the probability field simulation method. We assume that the spatial correlation is stationary and characterized by an exponential autocorrelation function (Equation (14)) with a correlation range of 1250 m in both the inline and crossline directions (a = 1250 m, b = 1250 m), and 5 ms in the vertical direction (c = 5 ms):

r(x, y, z) = exp[−(x²/a² + y²/b² + z²/c²)^(1/2)]     (14)

Fig. 8. Data Flow Graph of the geostatistical seismic inversion with multiple GPUs in data parallelism.

Fig. 9. Performance of different optimization methods.
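A possible sketch of the data-parallel scheme of Fig. 8 with the TensorFlow 1.x API is shown below, reusing the hypothetical forward_model and hyper-parameters introduced earlier; the number of GPUs, the single-trace ensemble layout and the observed trace d_obs are assumptions for illustration.

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x API

    n_gpus = 4
    # Hypothetical ensemble: 1000 single-trace prior realizations, and one
    # observed trace that every realization must match
    ip_prior_ensemble = np.random.lognormal(
        8.0, 0.1, (1000, 200)).astype(np.float32)
    d_obs = np.random.randn(199).astype(np.float32)

    updates, losses = [], []
    # One replica of the inversion sub-graph per GPU, each operating on
    # its own chunk of the ensemble (data parallelism)
    for i, chunk in enumerate(np.array_split(ip_prior_ensemble, n_gpus)):
        with tf.device('/gpu:%d' % i):
            ip = tf.Variable(chunk, name='ip_%d' % i)
            d_syn = forward_model(ip, wavelet)
            loss = (tf.reduce_sum(tf.square(d_syn - d_obs))
                    + lam * tf.reduce_sum(tf.square(ip - chunk)))
            losses.append(loss)
            updates.append(tf.train.AdamOptimizer(eta).minimize(loss))

    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(n_iter):
            # all chunks are updated concurrently in a single graph run
            sess.run([updates, losses])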
Fig. 10. Inversion results at Well A (a) and Well B (b). Left: inverted Ip models (the black curve represents the actual Ip, the dashed black curve represents the prior mean, the light gray curves represent 1000 posterior Ip realizations and the red curve represents the posterior mean); Right: synthetic seismic response (the dashed black curves represent the real seismic trace and the red curves represent the synthetic seismic response of the posterior realizations). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 13. Inline section of the inverted Ip models across Well A in the scheme of data parallelism. (a) the mean of 1000 inverted realizations; (b) the standard deviation of 1000 inverted realizations; (c) and (d) two randomly selected prior realizations; (e) and (f) the updated realizations corresponding to the two prior realizations above.
Table 2. The absolute computational time with different numbers of GPUs (1, 2, 3 and 4).