0% found this document useful (0 votes)

104 views

Accelerating Inversion Deep Learning

Uploaded by

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

104 views

Accelerating Inversion Deep Learning

Uploaded by

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

Computers and Geosciences 124 (2019) 37–45

Contents lists available at ScienceDirect

Computers and Geosciences

journal homepage: www.elsevier.com/locate/cageo

Accelerating geostatistical seismic inversion using TensorFlow: A T

heterogeneous distributed deep learning framework
Mingliang Liu∗,1, Dario Grana2
Department of Geology and Geophysics, University of Wyoming, 1000 E. University Ave, Laramie, WY, 82071, USA

A R T I C LE I N FO A B S T R A C T

Keywords: Geostatistical seismic inversion is one of emerging technologies in reservoir characterization and reservoir un-
Seismic inversion certainty quantification. However, the challenge of intensive computation often restricts its application in
Geostatistics practical studies. To circumvent the computational limitation, in this work, we present a distributed parallel
TensorFlow approach using TensorFlow to accelerate the geostatistical seismic inversion. The approach provides a general
GPU
parallel scheme to efficiently take advantage of all the available computing resources, i.e. CPUs and GPUs: the
computational workflow is expressed and organized as a Data Flow Graph, and the graph can be divided into
several sub-graphs which are then mapped to multiple computing devices to concurrently evaluate the opera-
tions in them. The high-level interface and the feature of automatic differentiation provided by TensorFlow also
makes it much easy for users to implement their algorithms in an efficient parallel manner, and allows em-
ploying programs on any computing platform almost without alteration. The proposed method was validated on
a 3D seismic dataset consisting of 600 × 600 × 200 grid nodes. The results indicate that it is feasible to the
practical application and the computational time can be largely reduced by using multiple GPUs.

1. Introduction available measurements. Alternatively, stochastic inversion schemes

allow generating multiple plausible highly-detailed realizations con-
The objective of seismic reservoir characterization is to estimate ditioning on the measurements and investigating the associated model
elastic and petrophysical properties of interest from seismic and well uncertainties through the empirical covariance matrix. Stochastic in-
log data (Doyen, 2007). In the last decades, a variety of seismic inversion algorithms include simulated annealing (Sen and Stoffa, 1991),
version methods have been proposed, including deterministic and sto- genetic algorithms (Mallick, 1995), gradual deformation (Hu, 2000),
chastic algorithms (Tarantola, 2005; Aster et al., 2011). In deterministic probability perturbation method (Caers and Hoffman, 2006; Grana
approaches to seismic inversion, the aim is to find a set of reservoir et al., 2012), and ensemble-based methods (Gineste and Eidsvik, 2017;
properties that minimize the synthetic-to-seismic misfit by using nu- Thurin et al., 2017; Liu and Grana, 2018).
merical optimization techniques, such as gradient-based methods. The Geostatistical seismic inversion is an inversion method in which
deterministic approaches are generally easy to implement and compu- multiple prior reservoir models are simulated using geostatistical al-
tationally fast, but provide only a single estimate of subsurface para- gorithms, and then updated to honor geophysical observations by de-
meters that is unrealistically smooth and might exaggerate the reservoir terministic or stochastic optimization methods (Bortoli, 1992; Haas and
connectivity. However, the misfit functions of geophysical inverse Dubrule, 1994; González et al., 2007; Bosch et al., 2010; Azevedo et al.,
problems are not necessarily monotonically convex, and thus determi- 2015; Azevedo and Soares, 2017). The posterior reservoir realizations
nistic solutions might fall into local optimums and fail to describe the can be used not only for model uncertainty quantification, but also as
subsurface reservoir accurately. In addition, due to the limited band- starting models in history matching to be further refined by assimilating
width of measured data (in particular, the absence of low frequencies) production data or/and time-lapse seismic. However, due to the size of
and the low signal-to-noise of seismic data, solutions of seismic inverse seismic data and the large number of reservoir realizations in the
problems are inherently non-unique, which implies that theoretically geostatistical seismic inversion, this method becomes computationally
there are an infinite number of reservoir realizations that honor the prohibitive for real datasets, which largely limits its application in

∗
Corresponding author.
E-mail address: mliu4@uwyo.edu (M. Liu).
1
Mingliang Liu developed the methodology, implemented the code, and applied it to a real dataset.
2
Dario Grana supervised the research project and contributed to the preparation of the manuscript.

https://doi.org/10.1016/j.cageo.2018.12.007
Received 29 April 2018; Received in revised form 1 October 2018; Accepted 20 December 2018
Available online 21 December 2018
0098-3004/ © 2018 Elsevier Ltd. All rights reserved.
M. Liu, D. Grana Computers and Geosciences 124 (2019) 37–45

practice. Therefore, it is necessary to accelerate the inversion method to parallelism); thirdly, 3D seismic measurements generally include en-
make it computationally efficient for practical applications. With the ormous volumes of data (tens or hundreds of gigabytes), which might
recent advent of high-speed multi-core CPUs and GPUs, specialized be beyond the capacity of a single computing device. In this case, we
parallel algorithms for geostatistical seismic inversion have been de- can subdivide the 3D seismic data into several small sub-volumes and
veloped for the high-performance parallel computing platforms. Vargas process them across multiple CPUs or GPUs (data parallelism). The new
et al. (2007) proposed to parallelize the sequential simulation algorithm contribution is in the parallelization of the seismic inversion problem
by subdividing the study area into small spatial domains, and then in- and the computational advantage of the proposed workflow.
dependently simulating on different processors. Since the simulation of One of the advantages of TensorFlow is its fast auto-differentiation
each node does not consider the currently simulated nodes, the result is ability that allows computing the gradients and Jacobians of any vari-
not exactly the same as the original sequential algorithm. Alternatively, ables without analytically deriving the necessary partial derivatives
Nunes and Almeida (2010) proposed a parallelization scheme by using that is usually time-consuming and error prone. Therefore, any gra-
a functional decomposition to split the simulation algorithm into single dient-based method can benefit from this feature. Combining auto-dif-
sub-tasks, and then implement parallelly using multi-threads with ferentiation with the set of first-rate optimizers implemented in
multi-cores CPUs. Tahmasebi et al. (2012) exploited the graphics TensorFlow, model parameters can be automatically and iteratively
computational units (GPU) to reduce the execution time using the updated according to the selected objective function. It is also possible
Compute Unified Device Architecture (CUDA). Huang et al. (2013) to apply other user-defined or optimizers from third-party libraries.
improved the computational performance of Direct Sampling method Overall, with the advantages of distributed computation and auto-
for multiple-point geostatistics simulations with the use of GPUs. differentiation, TensorFlow allows us to implement the geostatistical
Ferreirinha et al. (2015) introduced a generic strategy based on OpenCL seismic inversion algorithm in parallel in a simple and accessible way.
to efficiently use multiple computational devices (CPUs and GPUs) for At present, TensorFlow provides a high-level API for building and ex-
stochastic seismic inversion. Similar parallel algorithms have been ecuting a Data Flow Graph in various languages, including Python, C+
presented by Mariethoz (2010) and Peredo et al. (2014). +, Java and Go, among which the Python API is the most complete and
In this paper, we propose a distributed parallel method to accelerate easiest one to use due to the advantages of multi-platform compat-
the geostatistiscal seismic inversion using TensorFlow, an open-source ibility, readable syntax, and easy integration with other scientific
heterogeneous distributed deep learning framework developed by computing libraries implemented via C, C++ or Fortran. The backend
Google and first released to the public at the end of 2015. Although and core modules of TensorFlow are written in C/C++, therefore,
TensorFlow is designed with the hopes of speeding up deep learning by independently from the API used for Data Flow Graph in the geosta-
providing a simple-to-use and computationally efficient infrastructure, tistical seismic inversion, the code will be converted to C/C++ and
its generic architecture and extensibility make it applicable to any then compiled to machine code transparently, hence boosting the per-
numerical problems that can be expressed as a Data Flow Graph. One of formance.
the advantages of TensorFlow is that it efficiently and easily exploits
heterogeneous distributed computational devices, i.e. CPU, GPU and 2. Overview of TensorFlow
TPU (Tensor Processing Unit developed by Google to specifically speed
up tensor calculations). It provides an extremely efficient and user- TensorFlow is currently the most widely used deep learning fra-
friendly platform for compute-intensive applications. With such plat- mework. It was initially designed to simplify the construction of deep
form programs can be easily deployed to a scalable computing en- neural networks and speed up the learning process with a hetero-
vironment with a uniform Application Programming Interface (API). In geneous distributed computational environment, and then became a
TensorFlow, underlying computing devices are transparent to users, more generic library for numerical computation, making easy large-
and computational tasks are scheduled by the dedicated distributed scale numerical optimization problems, i.e. inverse problems and so-
execution engine. In contrast with MPI and CUDA, users no longer need lutions of partial differential equations. Different with the traditional
to explicitly handle data transmission, memory management and mes- numerical libraries, TensorFlow adopts Data Flow Graph, which is a
sage passing between devices: the algorithms are expressed in terms of common programming model in the fields of cloud computing and
Data Flow Graph and the underlying operations are completely implicit. machine learning, to express and organize the computation workflow,
Other frameworks, i.e. PyTorch and MXNet, could also be applied, al- and then map the mathematical operations in the graph to different
ternatively to TensorFlow. computational devices (CPUs, GPUs, and TPUs). As illustrated in Fig. 1,
Seismic inversion, geostatistical simulations and uncertainty quan- the design of TensorFlow is highly modular: users build the Data Flow
tification in reservoir modeling generally require a large computational Graph by the front end, i.e. Python, C++, Java and Go, and the core
effort due to the size of seismic datasets and the dimension of the model and distributed execution engine is responsible for dispatching different
space. The proposed reservoir characterization workflow requires the computing operations in the Data Flow Graph to the underlying com-
generation of a large number of geostatistical realizations of the re- puting devices. Corresponding to the three layers in Fig. 1, there are
servoir model and the optimization of the models to match the mea- three important roles in a TensorFlow program, client, master and
sured seismic data. The simulations are obtained using geostatistical
algorithms and the optimization is performed using gradient-based
methods but could also be obtained using stochastic perturbation al-
gorithms. The implementation using stochastic optimization methods
generally allows avoiding local minima of the objective function;
however, the proposed gradient-based inversion method using multiple
initial models allows exploring the model space with a smaller com-
putational effort thanks to the parallelization approach. Indeed, with
the advent of TensorFlow, it becomes simple and flexible to accelerate
the geostatistical seismic inversion in three parallel strategies. First,
geostatistical simulation and seismic inversion typically involve many
large-scale matrix operations that can be massively parallelized, and
thus computed on GPUs; secondly, as multiple reservoir realizations are
used in the geostatistical inversion, we can utilize multiple CPUs or
GPUs to simulate and invert the models simultaneously (model Fig. 1. Architecture of TensorFlow.

38
M. Liu, D. Grana Computers and Geosciences 124 (2019) 37–45

Fig. 2. Execution mechanism of TensorFlow program: (a) Standalone mode and

(b) distributed mode (Abadi et al., 2015).

worker. As illustrated in Fig. 2, the client builds the computation graph

and creates a session to send the defined graph to the master. Then, the
master prunes and partitions the graph into multiple pieces so as to
delivers them to worker services for execution. In the standalone mode,
client, master and worker are located in the same machine, while in the
distributed mode, the three roles can be located in different machines
and the communication between them is through gPRC, a high per-
formance, open-source universal remote procedure call framework
(Abadi et al., 2015). This architecture provides a uniform API to make
low level modules and devices transparent to users, which not only
frees us from the cumbersome and challenging tasks of parallel pro-
gramming, but also make it possible to migrate the application from
one computing platform to others almost without modification. For all
these strengths, both the development and maintenance costs of ap-
plications could be greatly reduced.
In TensorFlow, the Data Flow Graph models a program as a directed
graph in which nodes represent mathematical operations and edges
represent the multidimensional data arrays (tensors) that flow between
the nodes (Martin and Estrin, 1967; Yourdon and Constantine, 1979).
For example, Fig. 3 depicts the Data Flow Graph that represents a
simple linear regression model

Y=W∗X+b (1) Fig. 3. Data Flow Graph of the linear regression model described as Equation
(1).
where X and Y represent the input and output respectively; W and b
represent the weights and bias that we want to estimate. The matrix 3. Geostatistical seismic inversion
multiplication in the Data Flow Graph corresponds to a single node that
connects two inputs (W, X ) and an output (the result of W*X ); simi- The estimation of elastic properties from seismic data is a common
larly, the add operation corresponds to a node with two inputs inverse problem in geophysics, which can be written in the following
(W*X , b ) and an output (Y ). To better understand, debug, and optimize general form
programs, TensorFlow also provides a suite of visualization tools,
d = f (m) + e (2)
namely TensorBoard, to visualize the Data Flow Graph and plot quan-
titative metrics about the execution of the graph. where d represents the seismic data, m represents the set of elastic
The mechanism of Data Flow Graph brings several advantages to attributes, f represents the seismic forward model mapping m into d ;
TensorFlow. First, constructing Data Flow Graph is like building blocks, and e is the measurement error. The goal of seismic inversion is to
which makes the model very flexible to be modified and extended by estimate the unknown model variables m from d assuming that the
adding or removing components in the graph. Secondly, this me- forward model f is a good approximation of the physical relations
chanism is inherently parallel and distributed, and can work well in between model properties and data.
large-scale numerical problems: it is easy for TensorFlow to identify Machine learning and inverse theory methods are tools for para-
operations that can execute in parallel, and we can also explicitly meter estimation (Kappler et al., 2005). Indeed, we can apply the fra-
subdivide the Data Flow Graph into several sub-graphs to execute them mework and optimization methods of machine learning to solve inverse
on different devices simultaneously. Thirdly, the Data Flow Graph is a problems. In this work, we aim to introduce a state-of-the-art hetero-
language-independent representation of models. One can easily build geneous distributed computing framework for the acceleration of
the model in Python or other languages, and then deploy it to a variety seismic inversion. To evaluate the computational performance, we
of computing platforms. This “code one and run everywhere” nature focus on a geostatistical post-stack inversion application to compare the
provides a high scalability across computing machines. execution times on different computing environments. The inversion
Once the Data Flow Graph is built and gets started in a session, the can be extended to pre-stack inversion, full waveform inversion, re-
TensorFlow core and distributed execution engine will map operations servoir history matching and any other model-based inverse problems.
in the graph that are actually needed to the underlying computing In post-stack seismic inversion, the model parameter of interest is
devices to fetch the variables of interest. Thanks to the built-in feature acoustic impedance and the forward operator is generally a convolu-
of auto-differentiation, the variables can be updated automatically and tional model of the form
iteratively by minimizing the pre-defined objective function through
the backpropagation algorithm (Rumelhart et al., 1986) which is the d = w ∗ r + e = w ∗ g (Ip) + e (3)
workhorse of learning in neural networks.
where w is the wavelet estimated from well logs and seismic, r re-
presents the reflection coefficients calculated from the acoustic

39
M. Liu, D. Grana Computers and Geosciences 124 (2019) 37–45

z (u) = m (u) + σ (u) ε (u) (4)

where z (u) is the simulated value at location u ; m (u) is a deterministic
function that represents the local mean; σ (u) is the local standard de-
viation; and ε (u) is the spatially correlated Gaussian noise field.
The simulation of Gaussian noise field is obtained by the FFT-
moving average (FFT-MA) method. A spatially correlated Gaussian
noise field is simulated by convolving a Gaussian white noise with the
spatial correlation structure of the random field, namely moving
average (Equation (5)). To speed up this procedure, we apply the FFT to
perform the convolution in the frequency domain.
ε (u) = n (u) ∗ G (u) (5)
where n (u) is the Gaussian white noise and G (u) represents the spatial
correlation structure of the random field.
We can apply the same simulation procedure many times using
different noise fields as input to generate multiple plausible reservoir
realizations honoring the well data and spatial correlation. In this
geostatistical post-stack seismic inversion, we use probability field si-
mulation in conjunction with FFT-MA to build a set of acoustic im-
pedance models as prior for the following seismic inversion.

3.2. Seismic inversion

The prior AI models are only conditional to well log data, but not
honor seismic data. To provide a more accurate description of the
Fig. 4. Two-layer neural network corresponding to the convolution forward subsurface model, we should update the models according to the misfit
model in seismic inversion. between the synthetic and measured seismic data. To make the solution
more stable, we adopt the Tikhonov regularization method to control
impedance Ip through the function g . In the paradigm of machine the complexity of the models. Therefore, the objective function is
learning, Equation (3) can be modeled as a two-layer neural network,
J (m) = ||d syn − d 2true ||2 + λ||m − m2true ||2 (6)
including an input layer (model variable including Ip ) and output layer
(seismic response d ) without hidden layers (Fig. 4). Different from ty- where d syn and dtrue represent the synthetic seismic and true seismic
pical neural networks, in this model, the input variables (Ip ) are un- respectively; mprior represents the prior model. The first term in the
know variables to be estimated rather than the weights (wavelet w ). objective function represents the data misfit, and the second term re-
More specifically, the geostatistical seismic inversion is a mathe- presents the model misfit, namely regularization term. The parameter λ
matical approach consisting of three steps: simulation, inversion, and is a hyper-parameter that controls the balance between model misfit
uncertainty evaluation. In geostatistical simulation, an ensemble of and data misfit. With the advantage of auto-differentiation underlying
initial reservoir models is built from the prior geological knowledge and TensorFlow, it is simple to add any regularization term, such as L1
spatial correlation parameters through a chosen geostatistics simulation regularization and constraint of space smoothing, to the objective
algorithm, i.e. Fast Fourier transform with moving average (FFT-MA; Le function.
Ravalec et al., 2000) probability field simulation (PFS; Srivastava, The objective function is also called loss function and it measures
1992; Froidevaux, 1993), sequential Gaussian simulation (SGS; Doyen, the quality of model parameters based on how well the synthetic
2007) and multiple-point statistics (MPS; Mariethoz and Caers, 2014). seismic consist with the real seismic data. The best estimate of model
These prior reservoir simulations are then updated to assimilate seismic parameters m is defined to be the model that minimizes the loss
data and other geophysical measurements to provide a better descrip- function
tion of the true reservoir behavior. Finally, we collect all inversion re-
m = arg min J (m)
sults of the reservoir realizations to evaluate the associated model un- m (7)
certainty, which could make the decision-making in exploration and
Many optimization algorithms have been developed to find the
production more robust.
optimal model: gradient descent (steepest descent) is the most common
one and it extensively applied in the field of inverse problems. It
3.1. FFT-moving average and probability field simulation minimizes the loss function by updating the model parameters in the
direction opposite to the gradient, as described in Equation (8). By
In this paper, we use probability field simulation (Srivastava, 1992; iteratively performing the gradient descent until the desired misfit is
Froidevaux, 1993), which is a fast-conditional simulation method and reached, we can obtain the optimal solution.
computationally efficient for large grids with millions of nodes, to
ml + 1 = ml − ηg l (8)
generate the prior reservoir models. Unlike constructing the local
probability density distribution (PDF) at every grid cell to incorporate where l is the iteration index, ml
is the model parameters at the l th
previously simulated samples in sequential simulation methods, the iterate, g l = ∇m J (ml) denotes the gradient of loss function with regard
local PDF in probability field simulation are estimated through kriging to model parameters evaluated at the l th iterate, and η is the learning
analysis of well data. Once the local PDF is obtained, we use a corre- rate.
lated probability field to draw samples from the PDF to simulate re- In practice, the learning rate is the most important hyper-parameter
servoir models that account for spatial correlation (Doyen, 2007). In that controls the step size to guide us how to optimize the model
particular, a reservoir simulation is the sum of the local mean and a parameters according to the loss function. If the learning rate is too
spatially correlated Gaussian noise field scaled by the local standard small, gradient descent can be very slow and take more time to con-
deviation verge; however, gradient descent with a large learning rate might

40
M. Liu, D. Grana Computers and Geosciences 124 (2019) 37–45

overshoot the minimum and fail to converge or even diverge. To deal

with the aforementioned challenges, we apply a more efficient algo-
rithm, Adaptive Moment Estimation (Adam - Kingma and Ba, 2014), to
perform the optimization. Adam combines the advantages of two re-
cently popular optimizers proposed in machine learning, namely Ada-
Grad (Duchi et al., 2011), RMSProp (Tieleman and Hinton, 2012) and
AdaDelta (Zeiler, 2012), and the numerical experiments show that
Adam is computationally efficient for optimization problems with high-
dimensional parameter space, because it only uses the first and second
moments of the gradients without Hessian matrix to compute adaptive
learning rates for each model parameter (Kingma and Ba, 2014). Like
gradient descent, the model parameters are updated iteratively by the
following equation
η
ml + 1 = ml − uˆ l
vˆ l +ε (9)

with
ul
uˆ l =
1 − β1 (10)

vl
vˆ l =
1 − β2 (11)

ul = β1 ul − 1 + (1 − β1) g l (12)

v l = β2 v l − 1 + (1 − β2)(g l∘g l) (13)

where and are referred to as the first moment (the mean) and the
ul vl
second moment (the uncentered variance) of the gradient respectively;
ûl and v̂ l are the corresponding biased-corrected first and second mo-
ment estimates to avoid the raw moments biased towards zero during
the initial updating; the symbol ∘ denotes the Schur product; the hyper-
parameters β1 and β2 control the exponential decay rates for the mo-
ment estimates, defaulting to 0.9 and 0.999 respectively; ε is equal to
10−8.
Overall, the workflow of geostatistical seismic inversion in this
paper is as illustrated in Fig. 5. We first generate a set of prior acoustic
impedance models through FFT-MA and probability field simulation
algorithms, and then convolve the wavelet estimated from seismic well
tie with the reflectivity coefficients computed from these acoustic im-
pedance models to generate the synthetic seismic responses. According
Fig. 5. Workflow of the geostatistical seismic inversion.
to the misfit between the synthetic and the observed seismic data, we
update the initial ensemble of reservoir models iteratively until con-
vergence through the backpropagation algorithm.

4. Implementation with TensorFlow

In this section, we focus on the implementation of the geostatistical

seismic inversion introduced in Section 3 in a parallel manner under the
framework of TensorFlow.

4.1. Implementation with a single GPU

The simplest parallel scheme is to perform the inversion using a

single GPU. In this scheme, reservoir realizations are updated sequen-
tially: the inversion of one realization starts running until the previous
one is totally completed. Fig. 6 shows the corresponding Data Flow
Graph of one specific realization exported from TensorBoard. In the
graph, each node, i.e. Simulation, Reflectivities, Synthetic, Misfit and
Optimization, includes a set of operations, but for the sake of concise- Fig. 6. Data Flow Graph of the geostatistical seismic inversion with single GPU.
ness we use the technique of name scope to hide the unnecessary de-
tails. We can dive into those nodes of interest in by clicking to expand
defined in Equation (6) would be evaluated automatically. In this for-
them.
ward stage, the partial derivatives of any operations would also be
Once the Data Flow Graph is built, we can start the program by
automatically computed using the chain rule and implicitly added to
feeding the seismic data, wavelet and geostatistical parameters into the
the graph as new operation nodes. In the backward stage, the
graph. Then, the necessary operation nodes to compute the error

41
M. Liu, D. Grana Computers and Geosciences 124 (2019) 37–45

directions, and the time sample interval is 1 ms. For the geostatistical
inversion, we generate 1000 realizations of acoustic impedance models
using the probability ﬁeld simulation method. We assume that the
spatial correlation is stationary and characterized by an exponential
autocorrelation function (Equation (14)) with correlation range of
1250 m in both inline and crossline directions (a = 1250m , b = 1250m ),
and 5 ms in the vertical direction (c = 5ms ).

x2 y2 z2
r (x , y, z ) = exp[−( + 2 + 2 )1/2]
a2 b c (14)

To analyze the computational performance of the proposed parallel

schemes in Section 4, we adopt Google Cloud Platform (GCP) to build
Fig. 7. Data Flow Graph of the geostatistical seismic inversion with multiple the computing environments. The CPU used in the following experi-
GPUs in model parallelism.
ments is Intel's Xeon Scalable Processor Skylake and the GPU is NVIDIA
Tesla K80. The detailed specifications can be queried from the official
predefined optimizer would update the acoustic impedance to minimize websites of Intel (https://ark.intel.com/products/codename/37572/
the loss function using the back-propagation algorithm. By running the Skylake) and Nvidia (https://www.nvidia.com/en-us/data-center/
forward and backward step in an alternation manner, we can update tesla-k80). For optimal CPU performance, we compiled TensorFlow
the reservoir models iteratively until their seismic response con- with the Intel-optimized Math Kernel Library (MKL) and Math Kernel
vergence to the measured seismic data. Library for Deep Neural Networks (MKL-DNN). However, when
building a cluster on GCP, it is generally not very clear about the spe-
4.2. Implementation with multiple GPUs cific hardware configuration, for example, whether the nodes are on the
same rack. More importantly, we cannot configure the network band-
To further reduce the inversion time, we can deploy the TensorFlow width, which is the typical bottleneck in the distributed computing. For
program on a GPU cluster in which each node is equipped with one or these reasons, we only evaluate the performance of multi-core CPU and
more GPUs. In this scheme, there are two parallel strategies, namely multi- GPU on a single machine in this paper.
model and data parallelism, to perform the inversion of multiple re-
servoir realizations concurrently using multiple GPUs. 5.1. Quality of the inversion results
In the sense of model parallelism, each reservoir realization would
be divided into a sub-graph and mapped to a specific GPU for inversion. Fig. 9 compares the performance of different optimizers, including
For example, as illustrated in Fig. 7, eight reservoir realizations are Adam, AdaDelta, AdaGrad, RMSProp and stochastic gradient descent
evenly allocated to 4 GPUs. In this case, 4 reservoir realizations would (SGD). We can figure out that the Adam Optimizer has a great ad-
be updated simultaneously at a time, and it would take two-round to vantage in the convergence rate and accuracy over other optimizers: the
accomplish the inversion of all reservoir realizations. In fact, operations Adam Optimizer takes much less iterations to converge asymptotically
in each GPU is identical, and we thus can scale up the inversion to tens toward a smaller misfit between the synthetic and real seismic data,
or even hundreds of reservoir realizations across more machines. which means that the proposed optimization method can obtain more
When the size of seismic data is beyond the capacity of a single accurate reservoir models that are consistent with the seismic data in
machine, we can also adopt the scheme of data parallelism. In this less computational time.
scheme, we split the seismic data into several small chunks that are then To evaluate the feasibility of the proposed workflow, we first per-
concurrently inversed on multiple GPUs. As illustrated in Fig. 8, a formed the inversion at Well A and B. Fig. 10 shows 1000 inverted
seismic data is divided into 4 sub-datasets and processed in 4 different acoustic impedance realizations as well as their mean and synthetic
GPUs concurrently. It is easily extensible and scalable to more com- seismic response. As we can see, the mean inverted results and the
puting devices and larger seismic data size. However, one limitation of synthetic seismic data have a good agreement with the actual well logs
data parallelism is that transition artefacts might exist between the sub- and the true seismic traces respectively. Moreover, the model un-
datasets, such as discontinuity at the boundaries of the sub-volumes. certainty can be evaluated by the 1000 posterior realizations. It in-
dicates that the proposed workflow is feasible, and therefore, we then
5. Results applied it to the entire seismic volume.

In this study, we use a 3D seismic dataset consisting of

600 × 600 × 200 grid nodes as the benchmark to evaluate the feasi-
bility and the computational performance of the proposed schemes. The
trace spacing of the seismic survey is 25 m in both inline and crossline

Fig. 8. Data Flow Graph of the geostatistical seismic inversion with multiple
GPUs in data parallelism. Fig. 9. Performance of diﬀerent optimization methods.

42
M. Liu, D. Grana Computers and Geosciences 124 (2019) 37–45

Fig. 10. Inversion results at Well A (a) and Well B (b). Left: inverted Ip models
(the black curve represents the actual Ip , the dash black cure represents the
prior mean, the light gray curves represent 1000 posterior Ip realizations and
the red curve represents the posterior mean); Right: synthetic seismic response
(the dash black curves represent the real seismic trace and the red curves re-
present the synthetic seismic response of the posterior realizations). (For in-
terpretation of the references to colour in this ﬁgure legend, the reader is re- Fig. 13. Inline section of the inverted Ip models cross Well A in the scheme of
ferred to the web version of this article.) data parallelism. (a) the mean of 1000 inverted realizations; (b) the standard
deviation of 1000 inverted realizations; (c) and (d) two randomly selected prior
realizations; (e) and (f) the updated realizations corresponding to the two prior
realizations above.

An inline section cross Well A of the seismic dataset is shown in

Fig. 11. The inversion results in the scheme of model parallelism and
data parallelism are shown in Fig. 12 and Fig. 13, respectively, in-
cluding the mean and standard deviation of the inverted 1000 reali-
zations, and 2 speciﬁc realizations randomly selected from the en-
semble. In the scheme of data parallelism, the 600 × 600 × 200
reservoir model is evenly divided into 8 sub-volumes with size of
300 × 300 × 100. From the results we can observe that all of the set of
posterior realizations are conditioning on the seismic data but with
small-scale heterogeneities, while the average model is relatively
smooth and theoretically corresponding to the deterministic inversion
result. It is also worth noting that both the inverted models from model
parallelism and data parallelism can recover the major characteristic of
Fig. 11. Inline section of seismic data cross Well A.
the subsurface, but as mentioned in the preceding section, there exists
obvious transition artefacts at the boundaries of the sub-volumes.

5.2. Performance of multicore CPU

We ﬁrst test the performance using diﬀerent number of CPU cores in

terms of speed-up ratios. In TensorFlow, a multicore CPU is regarded as
a whole device, but multithreading will be turned on by default to make
full use of the computing resource. In this experiment, the corre-
sponding Data Flow Graph is as illustrated in Fig. 6, and the speed-up
ratio is defined as the ratio of the execution time using multi-core CPU
to the execution time taken by the single-core CPU. Table 1 lists the
absolute computational time with 1, 2, 4, 8 and 16 CPU cores, and their
speed-up ratios are shown in Fig. 14. As we can see, the execution time
is reduced with the increasing number of CPU cores. However, the
improvement of computational efficiency is not linear to the number of
CPU cores. By analyzing the execution time of each operation using
Timeline, a tool built in TensorFlow, the major cause for this problem is
that all the CPU cores share the same host memory with limited
bandwidth, therefore, much time is spent on data reading and writing.
Fig. 12. Inline section of the inverted Ip models cross Well A in the scheme of
model parallelism. (a) the mean of 1000 inverted realizations; (b) the standard Table 1
deviation of 1000 inverted realizations; (c) and (d) two randomly selected prior The absolute computational time with different number of CPU cores.
realizations; (e) and (f) the updated realizations corresponding to the two prior Number of CPU cores 1 2 4 8 16
realizations above.
Computational time (hours) 647.78 483.33 263.33 144.31 100.90

43
M. Liu, D. Grana Computers and Geosciences 124 (2019) 37–45

Table 2
The absolute computational time with diﬀerent number of GPUs.
Number of GPUs 1 2 3 4

Computational time (hours) 23.61 12.92 9.17 7.36

Fig. 14. Speed-ups with multicore CPU.

5.3. Performance of multiple GPUs

Fig. 16. Speed-ups with different grid size.
We then exploit multiple GPUs to further accelerate the inversion
algorithm and analysis their performance. The video memory of
NVIDIA Tesla K80 GPU used in this experiment is 12 GB, which is en- numerical problems involving large matrix operations. In theory, the
ough to handle the 600 × 600 × 200 test model, so we are able to algorithm based on GPU has an increasing speed-up compared to that
adopt the scheme of model parallelism to execute the same computa- based on CPU with the size of matrix increasing. Fig. 16 shows the
tional graph on different GPUs. The corresponding Data Flow Graph is speed-ups of a single GPU for different grid size compared to a single-
as Fig. 7 illustrated. However, this is often not case in practical studies core CPU. As expected, GPU typically has a better improvement on the
of reservoir characterization, and thus have to be split into several small larger size of seismic data.
chunks to be processed sequentially.
Fig. 15 shows the computational performance with different 6. Summary and future work
number of GPUs: a single GPU is over 20 times faster than a single-core
CPU and the speed-up of 4 GPUs is about 70. It can conclude from the This work investigates the acceleration of geostatistical seismic in-
absolute computational time in Table 2 that the geostatistical inversion version using TensorFlow in a heterogeneous distributed manner,
of 1000 realizations can be reduced from a couple of weeks to one day which allows taking advantage of GPUs and the cloud computing
using a single GPU or even several hours using 4 GPUs, which would platform. To efficiently exploit multiple computing devices, i.e. CPUs
make the geostatistical seismic inversion method more available to and GPUs and process seismic data with large size, we proposed two
practical applications. In addition, because each GPU card has its own multi-device parallelization schemes, namely model and data paralle-
device memory with a larger bandwidth than the host memory, the lism, to perform the inversion on multiple devices simultaneously. As
speed-up is approximately linear to the number of GPUs, which is not shown in the numerical experiments, the GPU-based distributed system
the case in the scheme of multiple-core CPU. can offer a large computational speed-up with nearly two orders of
magnitude over the CPU in the test case.
5.4. GPU performance with different grid sizes Although this study focuses on the geostatistical seismic inversion
problem, the generic architecture and extensibility of TensorFlow make
In high performance computation, GPU is typically used to it applicable to many gradient-based numeric optimization problems in
geophysics, such as full waveform inversion and seismic history
matching. Our future work will focus on the development of a general
parallel library based on TensorFlow for geophysical inverse problems.

Computer code availability

Source code of the proposed parallel geostatistical seismic inversion

approach is available from the ﬁrst author and can be downloaded from
https://github.com/theanswer003/SeisFlow.
Name of Code: SeisFlow
Developer: Mingliang Liu
Contact address: 1000 E. University Ave. Laramie, WY 82071, USA
Telephone number: 307-343-6599
E-mail: mliu4@uwyo.edu
Year ﬁrst available: 2018
Hardware: GPU recommended
Software: Python 3.6; TensorFlow, version r1.0 or later
Program language: Python
Fig. 15. Speed-ups with multiple GPUs. Program size: about 30 kb

44
M. Liu, D. Grana Computers and Geosciences 124 (2019) 37–45

Acknowledgements multiple-point statistical simulation. Comput. Geosci. 57, 13–23.

Kappler, K., Kuzma, H.A., Rector, J.W., 2005. A comparison of standard inversion, neural
networks and support vector machines. In: 2005 SEG Annual Meeting. Society of
Authors acknowledge the School of Energy Resources, and the Exploration Geophysicists.
Department of Geology and Geophysics of University of Wyoming for Kingma, D.P., Ba, J., 2014. Adam: a method for stochastic optimization. arXiv preprint
the support. We also thank the editor and anonymous reviewers for arXiv. 1412.6980.
Le Ravalec, M., 2005. Inverse Stochastic Modelling of Flow in Porous Media, Application
their critical review of the paper and constructive comments. to Reservoir Characterization. Editions Technip.
Liu, M.L., Grana, D., 2018. Stochastic nonlinear inversion of seismic data for the esti-
References mation of petroelastic properties using the Ensemble Smoother and data re-para-
meterization. Geophysics 83 (3), 1–60.
Mallick, S., 1995. Model-based inversion of amplitude-variations-with-offset data using a
Abadi, M., Agarwal, A., Barham, P., et al., 2015. TensorFlow: large-scale machine genetic algorithm. Geophysics 60 (4), 939–954.
learning on heterogeneous systems. arXiv preprint arXiv. 1605.08695. Mariethoz, G., Caers, J., 2014. Multiple-point Geostatistics: Stochastic Modeling with
Aster, R.C., Borchers, B., Thurber, C.H., 2011. Parameter Estimation and Inverse Training Images. John Wiley & Sons.
Problems. Academic Press. Mariethoz, G., 2010. A general parallelization strategy for random path based geosta-
Azevedo, L., Nunes, R., Soares, A., Mundin, E.C., Neto, G.S., 2015. Integration of well data tistical simulation methods. Comput. Geosci. 36 (7), 953–958.
into geostatistical seismic amplitude variation with angle inversion for facies esti- Martin, D., Estrin, G., 1967. Models of computations and systems—evaluation of vertex
mation. Geophysics 80 (6), M113–M128. probabilities in graph models of computations. J. ACM 14 (2), 281–299.
Azevedo, L., Soares, A., 2017. Geostatistical Methods for Reservoir Geophysics. Springer. Nunes, R., Almeida, J., 2010. Parallelization of sequential Gaussian, indicator and direct
Bortoli, L.J., 1992. Constraining Reservoir Models with Seismic Information. Master’s simulation algorithms. Comput. Geosci. 36, 1042–1052.
thesis. Stanford University. Peredo, O., Ortiz, J.M., Herrero, J.R., Samaniego, C., 2014. Tuning and hybrid paralle-
Bosch, M., Mukerji, T., Gonzalez, E.F., 2010. Seismic inversion for reservoir properties lization of a genetic-based multi-point statistics simulation code. Parallel Comput. 40
combining statistical rock physics and geostatistics: a review. Geophysics 75 (5), (5–6), 144–158.
75A165–75A176. Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-
Caers, J., Hoffman, T., 2006. The probability perturbation method: a new look at propagating errors. Nature 323 (6088), 533.
Bayesian inverse modeling. Math. Geol. 38 (1), 81–100. Sen, M.K., Stoffa, P.L., 1991. Nonlinear one-dimensional seismic waveform inversion
Doyen, P.M., 2007. Seismic Reservoir Characterization: an Earth Modelling Perspective. using simulated annealing. Geophysics 56 (10), 1624–1638.
EAGE. Srivastava, R.M., 1992. Reservoir characterization with probability field simulation. In:
Duchi, J., Hazan, E., Singer, Y., 2011. Adaptive subgradient methods for online learning SPE Annual Technical Conference and Exhibition, pp. 4–7 (October,
and stochastic optimization. J. Mach. Learn. Res. 12 (Jul), 2121–2159. Washington, D.C).
Ferreirinha, T., Nunes, R., Azevedo, L., Soares, A., Pratas, F., Tomás, P., Roma, N., 2015. Tahmasebi, P., Sahimi, M., Mariethoz, G., Hezarkhani, A., 2012. Accelerating geostatis-
Acceleration of stochastic seismic inversion in OpenCL-based heterogeneous plat- tical simulations using graphics processing units (GPU). Comput. Geosci. 46, 51–59.
forms. Comput. Geosci. 78, 26–36. Tarantola, A., 2005. Inverse Problem Theory. SIAM.
Froidevaux, R., 1993. Probability field simulation. Geostat. Troia 92, 73–83. Thurin, J., Brossier, R., Métivier, L., 2017. June. Ensemble-based uncertainty estimation
Gineste, M., Eidsvik, J., 2017. June. Seismic waveform inversion using the ensemble in full waveform inversion. In: 79th EAGE Conference and Exhibition 2017.
Kalman smoother. In: 79th EAGE Conference and Exhibition 2017. Tieleman, T., Hinton, G., 2012. Lecture 6.5-rmsprop: divide the gradient by a running
González, E.F., Mukerji, T., Mavko, G., 2007. Seismic inversion combining rock physics average of its recent magnitude. COURSERA: Neural networks for machine learning 4
and multiple-point geostatistics. Geophysics 73 (1), R11–R21. (2), 26–31.
Grana, D., Mukerji, T., Dvorkin, J., Mavko, G., 2012. Stochastic inversion of facies from Vargas, H., Caetano, H., Filipe, M., 2007. Parallelization of sequential simulation pro-
seismic data based on sequential simulations and probability perturbation method. cedures. In: Proceedings of Petroleum Geostatistics. European Association of
Geophysics 77 (4), M53–M72. Geoscientists & Engineers, Cascais, Portugal.
Haas, A., Dubrule, O., 1994. Geostatistical inversion-a sequential method of stochastic Yourdon, E., Constantine, L.L., 1979. Structured Design: Fundamentals of a Discipline of
reservoir modelling constrained by seismic data. First Break 12 (11), 561–569. Computer Program and Systems Design. Prentice-Hall, Inc.
Hu, L.Y., 2000. Gradual deformation and iterative calibration of Gaussian-related sto- Zeiler, M.D., 2012. ADADELTA: an adaptive learning rate method. arXiv preprint arXiv.
chastic models. Math. Geol. 32 (1), 87–108. 1212.5701.
Huang, T., Li, X., Zhang, T., Lu, D.T., 2013. GPU-accelerated Direct Sampling method for