Data Analytics in
Reservoir Engineering
Sathish Sankaran
Sebastien Matringe
Mohamed Sidahmed
Luigi Saputelli
Xian-Huan Wen
Andrei Popa
Serkan Dursun
All rights reserved. No portion of this book may be reproduced in any form or by
any means, including electronic storage and retrieval systems, except by explicit,
prior written permission of the publisher except for brief passages excerpted for
review and critical purposes.
Disclaimer
This book was prepared by members of the Society of Petroleum Engineers and
their well-qualified colleagues from material published in the recognized technical
literature and from their own individual experience and expertise. While the material
presented is believed to be based on sound technical knowledge, neither the Society
of Petroleum Engineers nor any of the authors or editors herein provide a warranty
either expressed or implied in its application. Correspondingly, the discussion of
materials, methods, or techniques that may be covered by letters patents implies no
freedom to use such materials, methods, or techniques without permission through
appropriate licensing. Nothing described within this book should be construed to
lessen the need to apply sound engineering judgment nor to carefully apply accepted
engineering practices in the design, implementation, or application of the techniques
described herein.
http://store.spe.org
service@spe.org
1.972.952.9393
Preface
About the Authors
1. Introduction
1.1. Objectives
1.2. Organization of this Book
1.3. Background
1.4. What Is Data Analytics?
1.5. What Is New in Data Analytics?
1.6. What Value Can Data Analytics Create for the Oil and Gas Industry?
1.7. What Are the Challenges?
2. Data-Driven Modeling Methodology
2.1. Modeling Strategies
2.2. Model Development
2.3. Enabling Technologies
2.4. Uncertainty Quantification and Mitigation
3. Decision Making with Data-Driven Models
3.1. Value Creation
3.2. Organizational Model
3.3. Execution
4. Reservoir Engineering Applications
4.1. Fluid Pressure/Volume/Temperature
4.2. Core Analysis
4.3. Reserves and Production Forecasting
4.3.1. Resource and Reserves Calculations
4.3.2. Production Forecasting
4.4. Reservoir Surveillance and Management
4.4.1. Reservoir Surveillance
4.4.2. Reservoir Management
4.5. Enhanced Oil Recovery and Improved Oil Recovery
4.5.1. Screening Tools for EOR/IOR
4.5.2. Waterflood Management
4.5.3. Steamflood Management
4.6. Reservoir Simulation
4.6.1. Proxy Modeling
4.6.2. Reduced-Order Modeling
4.6.3. Reduced-Physics Model
4.6.4. Predictive Uncertainty Analysis
4.6.5. Data-Driven Physics-Based Predictive Modeling
Why Is Data Analytics Relevant Now for the Oil and Gas Industry? The confluence
of several factors such as sensor explosion, advances in cloud and hardware technol-
ogy, and innovations in data science algorithms, in addition to the recent downturn
in the oil and gas industry alongside the success of data analytics in other industries,
has contributed to the crossroads where we are with respect to applying data analyt-
ics in reservoir engineering work processes.
Over the past few years, several successful case studies (Mehta 2016; Sankaran et al. 2017) have demonstrated the benefits of applying data analytics to transform the traditional reservoir model into a data-driven decision-support system. The key questions that remain are determining the right work processes that lend themselves to data-driven insights, redesigning those processes effectively in the new paradigm, and adopting the appropriate business model to complement them. Most oil and gas companies have already embarked on this journey and are at varying levels of maturity on this trajectory.
The authors would like to thank those who peer-reviewed our book and provided us with feedback prior to publication: Eduardo Gildin, Hector Klie, Shahab Mohaghegh, and Suryansh Purwar.
Sathish Sankaran is EVP of Engineering and Technology at Xecta Digital Labs. Prior
to that he served as Engineering Manager of Advanced Analytics and Emerging Tech-
nology for Anadarko Petroleum Corporation. His work focuses on modeling and
optimizing hydrocarbon production from reservoir to process plant, with emphasis
on blending physics and data-driven methods.
Xian-Huan Wen is a Chevron Fellow and the Chapter Manager of Reservoir Simulation
Research and Optimization in Chevron Technology Center. Wen holds a PhD degree
in Water Resources Engineering from the Royal Institute of Technology, Sweden and a
PhD degree in Civil Engineering from the Technical University of Valencia, Spain. He has
authored or coauthored more than 80 papers and holds two US patents.
1. Introduction
Data analytics has been broadly applied to a variety of areas within the oil and gas
industry including applications in geoscience, drilling, completions, reservoir, pro-
duction, facility, and operations. However, we limit our discussion here primarily to
reservoir engineering applications.
1.2. Organization of this Book. Section 1 outlines the main objectives of this book and introduces data analytics. Successful applications in other industries are provided as a reference to illustrate the digital transformation that is driven by data and other new technologies.
Section 2 provides an introduction to data-driven modeling methods and the various stages in the modeling life cycle.
Section 3 illustrates the nontechnical elements needed to assimilate the results of data-driven methods into decision making for businesses. The emphasis here is on business processes and human factors required to enable the successful adoption of data analytics methods.
Section 4 discusses a number of published practical applications of data analytics in reservoir engineering, spanning fluid analysis, core analysis, production surveillance, reserves and production forecasting, reservoir surveillance and management, EOR and IOR, reservoir simulation, and unconventionals. We discuss several reservoir modeling approaches as a spectrum of possibilities between full-physics and data-driven methods that incorporate data analytics to various degrees.
Section 5 discusses future trends that address needed advancements in areas related to data, applications, and people for successful adoption of data analytics methods in the long term.
the oil and gas industry (Mehta 2016). There seems to be a rapid increase in uptake
and sense of urgency by several companies to implement data analytic solutions. The
major driving factors are the need to improve well performance to reduce costs and
major breakthroughs in digital technology that enable commercial viability. As a
result, the digital-oilfield market is projected to reach USD 28 billion by 2023 (Market
Research Engine 2018), to which data analytics is a major contributor.
Companies have formed new data science organizations and recruited a new
breed of technical professionals to solve complex oil and gas problems. This trend is
primarily driven by current market conditions that drive companies to become more
efficient, in the footsteps of other industries.
The confluence of the following factors has led to the crossroads where we are
with respect to applying data analytics in reservoir engineering work processes:
• Development and wide availability of inexpensive sensors that have accelerated
subsurface and surface data collection (i.e., velocity, variety, and volume of data)
• Advancements in modern data storage and management (including cloud
infrastructure)
• Breakthroughs in hardware technology that address massively parallel computa-
tions [with central processing units (CPUs) and graphical processing units (GPUs)]
• Innovations in data science algorithms that leverage modern hardware and
availability of large volumes of data to improve accuracy
• Recent industry downturn and success of unconventional reservoirs (shale and
tight oil/gas reservoirs, called “unconventionals”) in the oil and gas industry
that have renewed focus on operational efficiency
• Proven success of data analytics methods to transform other industries
Business operations are becoming increasingly digital in many industries. Other
industries (e.g., banking, healthcare, insurance, power utilities) are not simply creat-
ing a digital strategy, they are digitizing their business strategy.
Banking. The finance industry has been an early adopter of big data and data analytics to drive revenue and reduce risk in areas such as high-frequency trading, pretrade decision-support analytics, sentiment measurement, robo-advisory services, anti-money-laundering, know-your-customer compliance, credit risk reporting, and fraud mitigation (Hand 2007; Srivastava and Gopalkrishnan 2015; Cerchiello and Giudici 2016).
Healthcare. Healthcare analytics refers to actionable insights derived from the analysis of data collected from claims, pharmaceutical R&D, clinical data, and patient behavior, with the goal of reducing rising costs and providing better benefits across the board in areas such as evidence-based medicine, predictive computational phenotyping, drug screening, and patient similarity (Raghupathi and Raghupathi 2014; Archenaa and Mary Anita 2015; Sarwar et al. 2017; Muhammad et al. 2017; North American CRO Council 2017).
Insurance. With data analytics, insurers have gained access to huge volumes of data that can be converted into customer insights, resulting in improved profitability and overall performance in areas such as personalized customer service, fraud prediction, and accident likelihood (Wuthrich and Buser 2017; North American CRO Council 2017).
Power Utilities. Aging infrastructure and demand challenges have forced the
power utility industry to leverage data analytics to reduce cost and improve reli-
ability, with (for example) smart meters, smart grid technology, and power-outage
predictions (Guille and Zech 2016; Goldstein 2018).
Common goals among the industries embracing big data and data analytics include enhanced customer experience, cost reduction, improved operational efficiency, and value optimization. Data-driven insights are increasingly driving decision making across these businesses. Other typical benefits reported include cross-functional agility, cost resilience, speed, and innovation.
However, there are significant challenges to adopting these technologies in the oil and gas industry. Addressing them begins with a proper understanding of what data analytics is and what it can do for the enterprise user.
1.6. What Value Can Data Analytics Create for the Oil and Gas Industry? Companies now realize that data constitute a vital commodity and that the value of data can be realized through the power of data analytics (Saputelli 2016). Leveraging hidden insights from mining data can help the oil and gas industry make faster and better decisions that reduce operational costs, improve efficiency, and increase production and reservoir recovery. Data analytics can thus play an important role in reducing the risks inherent in the development of subsurface resources. These analytic advantages can deliver production gains of 6 to 8% (Bertocco and Padmanabhan 2014).
While data analytics has broad applications in reservoir engineering, the vast
number of wells and pace of operations in unconventionals allow data to play a
critical role in the decisions that create value.
Field data-collection programs (such as fluid, log, and core sampling) are augmented with data-driven models to interpolate across the entire field. Not only does this reduce operating costs, but it also offers a fit-for-purpose alternative where physical models are complex or cumbersome (Rollins and Herrin 2015).
Automated mapping programs built on big data solutions enable companies to calculate unconventional oil and gas reserves across vast geological areas and large numbers of wells in a fraction of the time required by traditional manual methods (Jacobs 2016). These tools also allow companies to identify refracturing candidates and compare completion designs with offset operators across lease lines.
Continuous data-driven learning allows new wells to be brought online with bet-
ter performance and reduced cycle time by optimizing drilling, targeting, well spac-
ing, and completions parameters.
In conventional reservoirs, modern data-driven methods provide a pragmatic
method to handle vast amounts of sensor data that provide an early-warning system
for subsurface issues (Wilson 2015; Sankaran et al. 2017). This form of predictive
analytics can help companies assess their portfolio efficiently for better business
decisions, such as acquisitions, divestitures, and field development.
1.7. What Are the Challenges? While the application of data analytics to reservoir engineering problems has grown explosively, there is still a lack of common understanding and established guidelines for building robust and sustainable solutions.
Business understanding is fundamental to a successful data analytics project. Proper background in the underlying physical (and nonphysical) phenomena helps in assessing the situation and in validating assumptions, constraints, risks, and contingencies. It also helps in setting proper data analytics project goals.
Data Issues. Data-driven discovery often requires validation and interpretation based on a fundamental understanding of the data sources, data quality, and the analysis process (Hayashi 2013). Data quality remains a key challenge for a number of companies, stemming from ad hoc data management practices, a lack of data standards, and multiple versions of the truth.
Subsurface data are fundamentally laden with uncertainty owing to sparse sam-
ples (e.g., fluid, logs, core) collected in the field. While temporal data from sensors
(e.g., in a well) might be ubiquitous, the data might be geospatially sparse (e.g., in a
reservoir). In some cases, the data format might not be conducive for real-time data
analytics. In addition, proper data selection for modeling and analysis is often poorly
understood.
Standards to format or store the data are often proprietary or nonexistent, leading
to challenges in integration and quality control. Data quality is often in question,
forcing each engineer to personally investigate data sets and check the data quality, a
task that is often repeated multiple times in the same company over the years.
Data integration and repurposing data with contextual information are often not easy tasks. Several companies have now embarked on creating in-house data foundations to address these issues and enable big data analytics.
Model Selection. One of the big challenges is knowing which modeling techniques will work best for a given problem. Sometimes this is addressed through exploratory data analysis and by trying a variety of methods through trial and error, as in the sketch below.
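As a minimal illustration of this trial-and-error approach, the sketch below compares a few candidate regression models with k-fold cross validation using the open-source scikit-learn library; the synthetic data set, the candidate list, and the scoring metric are illustrative assumptions rather than a recommended recipe.

# Compare several candidate regression models with k-fold cross validation.
# The data set, candidate models, and scoring metric are illustrative only.
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                   # placeholder features
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "svr": make_pipeline(StandardScaler(), SVR(C=10.0)),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name:14s} mean R2 = {scores.mean():.3f} +/- {scores.std():.3f}")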
Summary
• Other industries have reported success in adopting data analytics, with bene-
fits such as reducing costs, improving decision quality, gaining business agility,
and driving operational efficiency improvements.
• Advances in hardware technologies, cloud computing, data management, and
new algorithms and the explosion of data have earmarked the new age of data
analytics over the past few years.
• Several recent applications of data analytics in reservoir engineering have
emerged, with potential to reduce operational costs, improve efficiency, and
increase production and reservoir recovery.
• There are key challenges to the successful application of data analytics in res-
ervoir engineering and its adoption—namely, data issues, choice of modeling
methods, shortage of skills, and proper balance between physical understand-
ing and data-driven methods, among others.
extracting insights from the past, predicting future performance, and recommending
actions for optimal decisions on the basis of possible outcomes. Techniques used in
data-driven models can include computational intelligence, statistics, pattern recog-
nition, business intelligence, data mining, machine learning, and AI (Solomatine and
Ostfeld 2008).
The process of discovering insightful, interesting, and novel patterns—as well as
descriptive, understandable, and predictive models—from large-scale data is at the
core of data-driven models (Zaki and Wagner 2014). The main goal is to extract
important patterns to gain insights from historical (or training) data using supervised
or unsupervised learning methods. In supervised learning, a functional relationship
is “learned” between a parameter of interest and several dependent variables on the
basis of training data or representative cases. The learned model can then be used to
predict the outcome given a different set of inputs. In unsupervised learning, asso-
ciations (or patterns between inputs) are learned using techniques such as cluster
analysis, multidimensional scaling, principal-component analysis, independent com-
ponent analysis, and self-organizing maps. These methods provide a useful way for
classifying or organizing data, as well as understanding the variation and grouping
structure of a set of unlabeled data.
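The distinction between the two learning modes can be made concrete with a small, hedged sketch on synthetic data: a supervised regressor learns a functional relationship from labeled examples, while unsupervised techniques (here principal-component analysis and k-means clustering) uncover structure without labels. All data and model choices below are illustrative.

# Contrast supervised and unsupervised learning on a small synthetic data set.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor   # supervised
from sklearn.decomposition import PCA                     # unsupervised
from sklearn.cluster import KMeans                        # unsupervised

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))                                   # e.g., log-derived attributes
y = 2.0 * X[:, 0] - X[:, 2] + rng.normal(scale=0.2, size=300)   # labeled target

# Supervised: learn a functional relationship from labeled training data,
# then predict the target for unseen inputs.
model = GradientBoostingRegressor().fit(X[:250], y[:250])
print("predicted:", model.predict(X[250:255]))

# Unsupervised: find structure (principal directions, groupings) without labels.
scores = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(scores)
print("cluster sizes:", np.bincount(labels))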
• Data-driven models
• Hybrid models
Analogs and Scaled Models. A scaled model is generally a physical representa-
tion of the reservoir (rock and fluid) that maintains accurate relationships between
important aspects of the model, although absolute values of the original properties
need not be preserved. This enables demonstrating the physical phenomenon rea-
sonably in miniature. Typical examples of these methods include fluid laboratory
experiments in pressure/volume/temperature (PVT) cells to determine fluid proper-
ties and coreflood experiments to determine rock properties or rock/fluid interaction
parameters.
Analogs represent information about one system (the target reservoir or field) using another system (the source or analog reservoir or field). In cases where the information needed to model the reservoir mathematically is not available, analogs can be used to predict outcomes such as reservoir recovery or production profiles.
Full-Physics Models. These are first-principles-based models that require a fun-
damental understanding of the underlying phenomena with the processes involved
so that they can be represented mathematically in terms of physical equations and
numerically as reservoir simulators. Although it is impossible to model every detailed underlying physical mechanism, these are often referred to as full-physics models and are solved using numerical methods. As a part of this process, several physical
parameters (often high dimensional) are needed to adequately characterize the sys-
tem, which are obtained through laboratory or field tests, through empirical correla-
tions, or through calibration with field observations. When the underlying processes
are complex, formulating these physics-based models is often cumbersome, labo-
rious, expensive, or infeasible with limited resources for practical purposes. For
example, fluid flow through multistage hydraulically fractured horizontal wells in
unconventional reservoirs encompasses modeling of fluid flow in a network of rock
matrix and fractures (natural and induced), with coupled multiphysics processes
including geomechanical effects, water blocking, stress-dependent rock properties
(permeability, porosity), Darcy or non-Darcy flow, adsorption/desorption, and mul-
tiphase effects.
Reduced-Physics Models. When the full-physics models are cumbersome to
build (and calibrate) or not fast enough for the intended modeling objective, or
when all the multiphysics are not well-understood, a reduced-physics model can be
built. Typically, this involves simplification of the physical process through some
assumptions or modeling only portions of the physics such as material-balance
models, streamline simulation, neglecting pressure-dependent variability of prop-
erties, and INSIM and capacitance/resistance models (CRMs) (Sayarpour 2008;
Chen et al. 2013; Cao et al. 2014; Cao et al. 2015; Holanda et al. 2015; Holanda et
al. 2018; Guo and Reynolds 2019). If the dominant physics is captured, these mod-
els (CRMs) can still be used under a variety of conditions with reasonable accuracy
and are much faster. Reduced-physics methods are well-suited for production fore-
casting and reservoir surveillance methods that require frequent computations that
are based on the latest information.
Reduced-Complexity Models. While full-physics models are more explanatory,
they are often computationally intensive and high dimensional (i.e., they have
2.3. Enabling Technologies. Generally, upstream and subsurface processes are complex. The presence of such levels of complexity sometimes makes it impractical to build a single global model that adequately captures the system/process behavior.
A workaround is segmentation of relatively rich training data into several subsets
and building separate specialized models on each segment. The models serve as local
or expert models, and this type of modular model development gives rise to more-
accurate representation of complex systems in the form of a committee machine
(Sidahmed and Bailey 2016).
To ensure continuous improvement, new and enhanced modeling algorithms should be considered. Emerging techniques should be piloted, and the best approach should be selected by developing and assessing multiple competing models.
High-performance and in-memory analytics have been shown to shorten data-driven-modeling cycle times and to enable smart workflows on big data (Holdaway 2014). The emergence of new frameworks such as deep structural and hierarchical learning has transformed the way complex physical systems such as the subsurface are modeled. Deep learning (DL) is a class of machine learning techniques that exploit hierarchical layers of nonlinear transformations for supervised or unsupervised learning.
Summary
• Reservoir modeling methods fall on a spectrum of modeling strategies ranging
from full-physics to data-driven methods, including reduced-physics, reduced-
complexity, and hybrid models.
• The modeling approach is determined primarily by its purpose, data availabil-
ity, speed, accuracy, and the interpretability requirement.
Note that the decision-making process is often cyclical. As new data become avail-
able, they expand the range and scale of analysis. For example, when an exploration
well is drilled, it provides a large quantity of information (e.g., logs, fluids, pressure)
to analyze and understand the nature of the reservoir. Further, when a subsequent
appraisal well is drilled, it provides different perspectives on reservoir continuity,
fluid gradients in the reservoir, delineation limits, and other factors.
3.1. Value Creation. Creating value starts with identifying the most valuable applications, namely those that lend themselves naturally to data analytics.
In unconventionals, the vast number of wells, insufficient understanding of under-
lying complex physics, and rapid pace of operations allow data to play a critical role
in the decisions that create value. Analytic capabilities allow operators to collect and
analyze subsurface data for the following:
3.3. Execution. Another key ingredient for effective decision making using analytics
is to develop the right capabilities and talent to make the most of the data. Hiring
and retaining strong analytic talent is challenging for the oil and gas industry. This is
often a scarce resource, and the talent profile in demand is not typically found within
oil and gas information-technology (IT) functions.
Further, the data foundation that supports high throughput volumes and modern
analytics engines requires new IT skills to manage cloud and open-source architec-
tures. Companies with strong data analytics capabilities balance domain knowledge
(i.e., geoscience, engineering) with analytic skills and are focused on solving specific
problems and identifying new opportunities.
Finally, change management of stakeholders is an essential but often overlooked
component in this value chain. Changes in the decision-making process must be
properly adopted by personnel at all levels for it to transform the business.
Building well-oiled execution machinery, in which decisions move faster, aided by appropriate data-driven and traditional analysis, takes time and investment. It therefore requires sustained focus from top management.
Summary
• The decision-making process (e.g., development, operational, commercial)
must be completely aligned with the data analytics value chain to realize the
full benefits.
• Data analytics might not be suited for all types of problems, especially when
the data are limited or of poor quality, and if the results are not easily inter-
pretable or transparent for decision making.
• An effective approach to ensure data analytics is useful for decision making
is to selectively identify key value-adding business processes supported by
the right organizational structure and focus on execution strategy to achieve
desired outcomes.
Fig. 4.2 shows results from Valkó and McCain (2003), who analyzed worldwide oil samples (1,743 sample records) and reported the absolute average relative error (AARE ≈ 12 to 45%) in bubblepoint-pressure estimates from several popular empirical correlations. In addition, some correlations worked better for certain fluid properties and fluid types than for others. This motivated data-driven estimation of PVT properties to extract key features and reduce prediction errors.
As data-driven modeling techniques evolved, several authors have proposed the
use of advanced machine learning techniques to address this problem; methods
include
• Artificial neural networks (Elsharkawy 1998; Abdel-Aal 2002; Osman and
Al-Marhoun 2005; Alimadadi et al. 2011; Alarfaj et al. 2012; Alakbari 2016;
Adeeyo 2016; Moussa et al. 2017; Ramirez et al. 2017; Arief et al. 2017)
• Support vector machines (El-Sebakhy et al. 2007; Anifowose et al. 2011)
• Nonparametric regression (McCain et al. 1998; Valkó and McCain 2003)
• Kriging and radial basis functions (Møller et al. 2018)
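As a hedged sketch of how any of the supervised techniques listed above might be applied, the example below trains a random-forest estimator of bubblepoint pressure from the classical correlating variables (solution GOR, gas specific gravity, stock-tank API gravity, and reservoir temperature). The file name and column names are hypothetical, and this is not a reproduction of any of the published models cited above.

# Train a data-driven bubblepoint-pressure estimator from a field PVT table.
# Feature choice mirrors the variables used by classical correlations;
# the file name and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

pvt = pd.read_csv("pvt_reports.csv")                    # one row per fluid sample
features = ["rs_scf_stb", "gas_sg", "api_gravity", "temp_f"]
X, y = pvt[features], pvt["pb_psia"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=400, random_state=0).fit(X_train, y_train)

aare = mean_absolute_percentage_error(y_test, model.predict(X_test))
print(f"hold-out AARE = {100 * aare:.1f}%")

As noted in the Summary below, such a model is only trustworthy within the ranges of GOR, gravity, and temperature represented in its training data.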
Nonparametric regression differs from parametric regression in that the shape of the functional relationship between the response (dependent) and the predictor (independent) variables is not specified in advance but is instead derived from the data.
Fig. 4.3—(a) Calculated bubblepoint pressures compared with measured bubblepoint pressures for 1,745 data records; (b) AARE for calculated bubblepoint pressures compared with other correlations (Valkó and McCain 2003).
Summary
• Data-driven modeling techniques extend the practice of estimating fluid properties through empirical correlations by providing a systematic way of deriving relationships from data.
• Supervised learning techniques can be used when sufficient field-specific PVT
data are available.
• Published data-driven models for estimating PVT properties should not be
used without checking on the assumptions and population statistics of under-
lying data used for training (e.g., ranges of GOR, API gravity).
4.2. Core Analysis. Core data represent an important input to any reservoir model. Facies, porosity, permeability, relative permeability, and capillary pressure are among the most common parameters extracted from routine or standard core analysis. Because of the relatively high cost of core collection and analysis, data sets are usually sparse. A common practice is to establish a correlation between core and log data that can then be applied more generally across the field. Neural networks have been used with great success to that end for more than 20 years (Mohaghegh et al. 1995), but the industry has recently seen a renewed interest in applying newer machine-learning algorithms to improve the correlations and extend the approach to more-challenging geological environments.
Mohaghegh et al. (1995) published the first application of artificial neural net-
works (ANNs) to correlate petrophysical logs and core properties. Fig. 4.4 presents
the crossplots between permeability values determined from core analysis and the corresponding bulk density, gamma ray, and deep induction properties determined from wireline logs. Although one would be hard-pressed to establish a correlation visually, an ANN applied to the problem was able to identify relevant patterns in the data set and provided a robust correlation that could be used as an artificial petrophysical log of permeability. The machine-learning-based permeability was compared with the core permeability on blind-test samples and showed an excellent correlation (R² = 0.963). Fig. 4.5 shows the comparison between the core permeability and the ANN-based model.
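A minimal sketch in the spirit of the log-to-core correlation workflow described above is shown below: a small neural network maps bulk density, gamma ray, and deep induction readings to the logarithm of core permeability. The file and column names are hypothetical, and the network architecture is an illustrative choice rather than the one used in the cited studies.

# Correlate core permeability with wireline-log responses using a small ANN.
# Column names and the data file are hypothetical.
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

data = pd.read_csv("core_log_depth_matched.csv")        # core plugs matched to log depths
logs = ["bulk_density", "gamma_ray", "deep_induction"]
X = data[logs].values
y = np.log10(data["core_perm_md"].values)               # permeability spans decades

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(20, 10), max_iter=5000, random_state=0))
ann.fit(X_train, y_train)

print("blind-test R2 (log10 k):", round(r2_score(y_test, ann.predict(X_test)), 3))
perm_pred_md = 10 ** ann.predict(X_test)                # back-transform to millidarcies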
Several authors have investigated the use of different machine learning methods for this problem and have documented their performance. Al-Anazi and Gates (2010) and Shahab et al. (2016), for example, reported the strong performance of support-vector-machine-based algorithms over different types of neural networks for this problem.
The approach of predicting core data from log curves has also been extended
to other data sets. For example, Negara et al. (2016) published a workflow using
support vector regression to correlate total organic carbon (TOC) obtained by core
measurements to a suite of well log data (e.g., gamma ray, acoustic, resistivity, bulk
density, and elemental spectroscopy).
Capillary pressure represents another type of data typically derived from core measurements that has been modeled with machine learning. Mohammadmoradi and Kantzas (2018), for example, presented a study in which they used an ANN to establish a correlation able to predict contact angle from the concentrations of calcite, clay, quartz, and total organic carbon. Their work focused on understanding the wettability of unconventional reservoirs, where imbibition might play a significant role in fracturing-fluid uptake in the reservoir rock.
In a variety of petrophysical analysis efforts, machine learning algorithms are
used to accelerate the manual interpretive work performed by experts. Many authors
have documented workflows where large amounts of data were collected and a
small fraction was analyzed manually by experts to establish a training data set.
A machine-learning algorithm is then calibrated on the training set and used on
the rest of the data to complete the analysis. Sidahmed et al. (2017) used a deep
Summary
• Data-driven modeling techniques can be useful in estimating rock properties
and identifying facies, when relevant training data are available.
• Upon proper calibration, a common application is to predict (infrequently
available) core properties from (more commonly available) log data.
4.3. Reserves and Production Forecasting. The economic success of reservoir exploita-
tion relies heavily on resource estimations and production forecasts. The quality of
these forecasts often defines the success or failure of many projects (Gupta et al. 2016).
To provide reliable estimates of future production rates, all available geological and
engineering data should be integrated. To account for the uncertainties surrounding
these estimations, the evaluation of oil and gas resources, expected production, and
reserves has been transitioning from deterministic to probabilistic. Probabilistic meth-
ods offer the advantage of capturing the variability of geological or engineering factors
and help quantify the uncertainty ranges associated with the estimates.
Fig. 4.6—Performance of machine learning algorithms for categorical and numerical parameters (Perez-Valiente et al. 2014).
Fig. 4.7—Comparison of EUR predicted from a response-surface model and that from a semianalytical model.
As an example, a method has been proposed to assess estimated ultimate recovery (EUR) for several wells in a Permian liquid-rich shale reservoir (Guo et al. 2017). Each realization of the initial ensemble was calibrated iteratively using a distributed Gauss-Newton (DGN) method. The responses generated during the iterations were added to a training data set, which was then used to train an ensemble of support vector regression (SVR) models. The sensitivity matrix for each realization was estimated analytically from the SVR models and used by the DGN method to generate improved search points and accelerate convergence. The integration of SVR into the DGN method allowed 65% of the simulation runs to be saved compared with the traditional DGN without SVR. This increased efficiency comes from the use of machine learning methods that continuously integrate the simulated results from the previous iterations. This is an example of how data analytics methods can be used in support of numerical simulation to provide faster EUR forecasts and uncertainty ranges.
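The general idea of letting a trained proxy screen candidate points so that fewer full simulations are needed can be sketched as below. The run_simulator function is a hypothetical stand-in for a reservoir simulator returning a data-misfit value, and the loop is a simplified illustration, not the distributed Gauss-Newton implementation of the cited work.

# Sketch of using an SVR proxy to reduce the number of full simulation runs
# during calibration. run_simulator() is a hypothetical stand-in.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def run_simulator(x):
    # placeholder objective: misfit between simulated and observed response
    return float(np.sum((x - 0.3) ** 2))

rng = np.random.default_rng(0)
dim, training_x, training_y = 4, [], []

# Initial ensemble: run the simulator and store (input, response) pairs.
for x in rng.uniform(0.0, 1.0, size=(20, dim)):
    training_x.append(x)
    training_y.append(run_simulator(x))

for iteration in range(5):
    # Retrain the proxy on all responses accumulated so far.
    proxy = make_pipeline(StandardScaler(), SVR(C=100.0, epsilon=1e-3))
    proxy.fit(np.array(training_x), np.array(training_y))

    # Screen many cheap candidates with the proxy; simulate only the best one.
    candidates = rng.uniform(0.0, 1.0, size=(500, dim))
    best = candidates[np.argmin(proxy.predict(candidates))]
    training_x.append(best)
    training_y.append(run_simulator(best))
    print(f"iteration {iteration}: misfit = {training_y[-1]:.4f}")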
and fluid properties, and the impact of reservoir exploitation decisions including
drainage architecture and recovery methods. In addition, production measurements
and analog reservoir response have the power to enhance the confidence of these
estimations.
Traditional analytical reservoir engineering methods include material balance and production rate analysis. Most of these methods are derived from first-principles models, simplified using empirical observations, and calibrated to direct measurements of production rate and pressure data. The two most common analytical models used for production forecasting are decline curves and type curves.
Some of the oldest and most frequently used data-driven methods for produc-
tion forecasting are those related to decline curves, including harmonic, hyperbolic,
and (the most common) exponential (Arps 1945). The popularity of these methods
derives from their simplicity: A few parameters are sufficient to calibrate the model
and provide a forecast of the future well production behavior.
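A minimal sketch of this calibration step, assuming a hyperbolic Arps decline q(t) = qi / (1 + b*Di*t)^(1/b) and a synthetic monthly rate series, is shown below; the parameter bounds and units are illustrative.

# Fit a hyperbolic Arps decline curve to monthly rate data and extrapolate.
# The rate series is synthetic.
import numpy as np
from scipy.optimize import curve_fit

def arps_hyperbolic(t, qi, di, b):
    return qi / (1.0 + b * di * t) ** (1.0 / b)

t_hist = np.arange(0, 36)                                  # months on production
q_hist = arps_hyperbolic(t_hist, 1200.0, 0.15, 0.8)        # "observed" rates, B/D
q_hist = q_hist * (1.0 + 0.05 * np.random.default_rng(0).normal(size=t_hist.size))

# Calibrate the three Arps parameters to the observed rates.
(qi, di, b), _ = curve_fit(arps_hyperbolic, t_hist, q_hist,
                           p0=(1000.0, 0.1, 0.5),
                           bounds=([1.0, 1e-4, 0.01], [1e5, 5.0, 2.0]))

t_fcst = np.arange(36, 120)
q_fcst = arps_hyperbolic(t_fcst, qi, di, b)
print(f"qi={qi:.0f} B/D, Di={di:.3f} 1/month, b={b:.2f}, "
      f"rate at 10 years = {q_fcst[-1]:.0f} B/D")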
Decline curves have been used in many instances for production forecasting at the well and the reservoir level. They can be used to determine expected remaining recovery and are quite useful for describing the typical behavior of wells by determining a representative decline curve. Decline curve analysis (DCA) is the most common method used to estimate reserves and resources.
However, the Arps method is not directly applicable to unconventional reservoirs
and would lead to significant overestimation of reserves. Newer methods such as
power-law exponential decline (Ilk et al. 2008), stretched-exponential decline (Valkó
2009), Duong (2010), logistic growth model (Clark et al. 2011), and others (Artus
et al. 2019) have been proposed in the form of empirical equations with a few fit
parameters to describe observed decline behavior (Ali and Sheng 2015).
The use of decline models, however, is limited to estimating production behavior under known operating conditions and is inappropriate for optimizing reservoir management strategy in terms of well location, wellhead pressure control, or number of wells.
Type curves are powerful graphical representations of the theoretical solutions
to transient and pseudosteady-state flow equations. Reservoir engineers are usually
challenged to find a match between historical reservoir performance (e.g., rates and
pressure) and a theoretical type curve (Agarwal et al. 1970; Fetkovich 1980; Carter
1985; Palacio and Blasingame 1993). Type curves are usually represented in terms of
dimensionless variables, including dimensionless pressure, rate, cumulative produc-
tion, time, radius, and wellbore storage.
Matching becomes more of an art than a science because real data can be noisy and might contain outliers. In addition, the actual reservoir architecture might not exactly fit the assumptions made for the available models.
In this sense, type-curve analysis becomes a key area where machine learning and AI can be used to augment engineers' knowledge. Pattern recognition and case-based reasoning (CBR) have been used to derive the parameters of type curves (Saputelli et al. 2015).
Fig. 4.8—Comparison of estimation errors among various experimental design and response-surface methods (Yeten et al. 2005).
Learning from physics-based models offers two key advantages. First, a broad and high-population data set can be generated to help train the surrogate models. Second, the surrogates can be made to properly learn the physical relationships at play. In addition, once trained, these models can be interrogated in a matter of seconds as opposed to hours for reservoir simulation models (Mohaghegh et al. 2006). Surrogate models can be applied to coupling with the surface model in integrated asset models, to real-time optimization, to real-time decision making, and to analysis under uncertain conditions.
4.3.2.3. Reduced-Physics Models. Fluid flow models can be based on first prin-
ciples (e.g., conservation of momentum, mass, and energy), empiricism, or a com-
bination of both. First principles can be combined with constitutive equations to
generate models that are valid over a wide range of operating conditions. However,
they might be cumbersome to develop and manipulate. Empirical models, on the
other hand, can be easy to develop, but might not be accurate outside the range of
data used for their calibration. Reduced-physics models (described in Section 2.1) combine first principles with empirical constitutive equations (e.g., Darcy's law, the ideal-gas law). These models are often easier to develop and manipulate than full first-principles models and maintain their applicability outside the range of data used for their calibration.
Engineers, for example, use reservoir simulation to identify the location and size of unswept regions, to quantify the degree of communication between injectors and producers, and to estimate the recovery efficiency in a region of the reservoir. These insights allow engineers to propose changes to reservoir management strategies designed to optimize reservoir performance.
These models rely on a multivariate reservoir model to represent the variation in
well and reservoir behavior in time and space, and they often combine data-driven
and physics-based elements. Several multivariate reservoir modeling techniques
have been published. Here we discuss a recent model that has gained signifi-
cant traction over the past decade or so: namely, CRMs. CRMs get their name
from an analogy between fluid flow in porous media and current flow in an elec-
trical system (Bruce 1943). The derivation of CRM is based on enforcing mass
balance on the drainage volumes of producing wells and can account for the
influence of nearby injectors and the changes in the well operating conditions.
These models solve equations similar to those that are used in reservoir simulation,
but instead of using the reservoir pressure, the models directly estimate the well
rates. This approach transforms the set of second-order partial-differential equa-
tions usually solved in reservoir simulators into a first-order ordinary-differential
equation with an analytical solution. This simple method is fast enough to be used
in reservoirs with high well counts and has been applied to water- and CO2-
injection problems (Albertoni and Lake 2003; Yousef 2006; Sayarpour 2009a,
2009b; Weber 2009; Salazar-Bustamante et al. 2012; Holanda et al. 2015; Glad-
kov et al. 2017).
In effect, the CRM is an extension of the exponential decline curve model that accounts for changes in operating conditions and for the influence of injectors. The pressure effects are estimated using the CRM equation, while the saturation effects are usually modeled using empirical fractional flow models. The model predicts well performance based on the well's historical decline and on its response to nearby injection, which is quantified through connectivity factors with nearby injectors and the parameters of an empirical fractional flow model. Fig. 4.9 compares the performance of CRMs against other traditional reservoir engineering methods such as the empirical power-law fractional flow model (EPLFFM) and a Buckley-Leverett-based fractional flow model (BLBFFM).
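A minimal single-tank CRM sketch is given below: total production responds to total injection with a time constant tau and a gain (connectivity) f, assuming constant producer bottomhole pressure and uniform timesteps. The injection and production series are synthetic, and the formulation is a simplified illustration of the CRMT described in the cited literature rather than a field-ready implementation.

# Minimal single-tank capacitance/resistance model (CRMT).
import numpy as np
from scipy.optimize import curve_fit

def crmt(inj, q0, tau, f, dt=1.0):
    """Semi-analytical CRMT solution stepped over uniform time intervals."""
    decay = np.exp(-dt / tau)
    q = np.empty_like(inj)
    q_prev = q0
    for k, i_k in enumerate(inj):
        q_prev = q_prev * decay + f * i_k * (1.0 - decay)
        q[k] = q_prev
    return q

rng = np.random.default_rng(0)
inj = 800.0 + 200.0 * np.sin(np.arange(120) / 10.0)             # field injection, RB/D
obs = crmt(inj, q0=600.0, tau=12.0, f=0.7) * (1 + 0.03 * rng.normal(size=120))

# Calibrate q0, tau, and the injector/producer connectivity f to history.
(q0, tau, f), _ = curve_fit(lambda i, q0, tau, f: crmt(i, q0, tau, f),
                            inj, obs, p0=(500.0, 5.0, 0.5),
                            bounds=([1.0, 0.1, 0.0], [5e3, 100.0, 1.0]))
print(f"tau = {tau:.1f} days, connectivity f = {f:.2f}")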
Fig. 4.9—Overall performance match with different fractional flow models, Reinecke Reservoir (Sayarpour et al. 2009a).
varying surface conditions (i.e., changes in tubing pressure control). The method is also more practical than analytical or semi-numerical model-based approaches that cannot be scaled to every well in the field.
Summary
• Data analytics has revived and overcome the limitations of traditional fore-
casting methods (e.g., DCA, analogs, type curves) that might not be readily
applied outside of their strong assumptions.
• Reduced-physics or hybrid models offer the advantage of being extremely fast
to run and calibrate compared to numerical methods, while still retaining the
accuracy and ability for moderate extrapolation.
• Purely data-driven models should be used with care and are best restricted to forecasting within the stationary ranges of their training population statistics.
The reduction in sensor costs and the introduction of new measurement devices
have increased the volume, frequency, and diversity of data that are being collected.
Data analytics has helped reservoir surveillance in two general directions: the auto-
mation of routine analysis and the assimilation of complex data sets.
A significant amount of time has historically been spent by engineers locating the data required for analysis. Routine engineering analyses, such as pressure-transient analysis (PTA) or decline curve matching, are also time-consuming tasks that are subject to the bias of the individual performing the interpretation. Automating such work offers the dual advantage of ensuring consistency in the analysis and allowing engineers to spend more time on problems that require deeper analysis.
the reservoir model constrains how quickly new data can be integrated. Analytical
models such as material balance or decline curves are fast enough to be updated
almost instantaneously, but they rely on a simplified representation of the problem
which often limits their applicability. More-general methods such as reservoir sim-
ulation can require months of work for an updated history match to be completed.
For decades, this posed a challenge for large mature waterfloods. Today, several
alternative data-driven models have been developed and used successfully, ranging
from reduced-physics models such as streamline-based methods (Thiele and Baty-
cky 2006), CRMs (Sayarpour 2008), or tracer-based approaches (Shahvali et al.
2012) all the way to fully data-driven models such as the neural networks used by
Nikravesh et al. (1996). The next section will cover this topic in more detail.
In addition to being used to accelerate data assimilation efforts, data analytics
methods have been leveraged to integrate complex data sets. Reservoir surveillance
usually includes measurements as varied as seismic and microseismic surveys, well
logs including image logs, cores, fiber-optic data, and pressure and production data
(e.g., Raterman et al. 2017). Data-driven models offer a flexible way to account for
eclectic sources of information that do not necessarily have a systematic framework
for integration into standard physics-based models. Such an approach is used heav-
ily in unconventional-reservoir modeling. Methods such as multivariate regression
have been used to predict the performance of unconventional wells from varied
data sets that could not be integrated into conventional physics-based models (Cie-
zobka et al. 2018; Burton et al. 2019). These models are used by unconvention-
al-field operators for business planning and development decisions.
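A hedged sketch of such a multivariate regression workflow is shown below, relating a well-performance target (here a hypothetical 12-month cumulative oil column) to completion and geology attributes; the file name, column names, and model choice are illustrative assumptions.

# Multivariate regression relating unconventional well performance to varied
# completion and geology attributes. File and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor

wells = pd.read_csv("well_header_completion_geology.csv")
features = ["lateral_length_ft", "proppant_lb_per_ft", "fluid_bbl_per_ft",
            "stage_spacing_ft", "toc_wt_pct", "porosity_pct", "well_spacing_ft"]
target = "cum_oil_12mo_bbl"

X, y = wells[features], wells[target]
model = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R2:", scores.round(2))

model.fit(X, y)
importance = pd.Series(model.feature_importances_, index=features).sort_values()
print(importance)           # which attributes drive predicted performance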
As new methods emerge for processing and analyzing real-time data, new data
types (besides rate, pressure, and temperature) are also emerging that capture dif-
ferent physics (such as fiber-optic distributed temperature and acoustic sensing and
tracer methods).
Fig. 4.12—Predicted (solid lines) vs. blind-tested (dots) well interventions (Sharma et al. 2017).
and export facilities. This contrasts with traditionally isolated modeling, which assumes that each individual model has fixed boundary conditions. For example, a reservoir simulation model might assume a constant bottomhole pressure constraint for its wells. Integrated asset modeling (IAM) will instead pass information between a reservoir and a wellbore model at each timestep so that the output of one component becomes the input for the next one, forming a fully coupled system.
Surface network simulation technology was introduced by Startzman et al. (1977)
and has gained traction in the last few decades to improve the accuracy and precision
of production forecasts. In general, the success of IAM lies in integrating existing mod-
els, so that each component can be maintained as usual by the discipline specialists.
A key challenge for IAM has been to deliver acceptable run times and ensure
stability. To tackle these issues, data-driven prediction and forecasting models (e.g.,
surrogates and/or proxies) have been used in the IAM context. These data analytics
approaches are used to replicate the full physics at higher speeds.
Another practical challenge for IAM is to break down barriers and promote collaboration within operating companies, because the teams responsible for the various components of the IAM often have different cycle times for updating and managing their models. The ownership of the combined entities that make up the IAM is often not well-established.
Various data-driven simplifications of the subsurface response have been used in the context of IAM to drive complex asset management decision making. A DCA-based algorithm has been used to rapidly determine the rig drilling schedule and, more specifically, the investment timing for large offshore gas fields with the objective of sustaining nominated gas volumes (Aulia and Ibrahim 2018). Applications have been published in many types of fields. In coalbed methane, for example,
Shields et al. (2015) have successfully incorporated a predictive model in the form
of pressure- and time-dependent type curves into a hydraulic model of the surface
network to deliver an integrated production model.
Reservoir Management and the Digital Oil Field. A self-learning reservoir man-
agement strategy can be achieved by combining parametric fluid flow modeling,
model predictive control, and economic (net present value) optimization for data-
rich instrumented fields (Saputelli and Nikolaou 2003). Several industry case studies
have demonstrated the value of digital-oilfield technologies for successful reservoir
management (Adeyemi et al. 2008; Sankaran et al. 2009).
Sankaran et al. (2011) show how a digital-oilfield effort was used on the Agbami
Field in Nigeria to deliver significant reservoir management benefits. Seven case stud-
ies are presented that highlight the impact of the digital-oilfield approach taken—e.g.,
improved zonal-crossflow management, workover risk mitigation, and tighter con-
formance control. The work effort created millions of US dollars of incremental
value and allowed the operating company to get closer to management by exception
by automating routine tasks.
Summary
• Data analytics can improve reservoir surveillance in two general areas—
automation of routine analysis and assimilation of complex data sets.
• Common applications of data analytics in reservoir management include
identification of field development opportunities and automation of reservoir
optimization.
4.5.1. Screening Tools for EOR/IOR. Selecting the best EOR/IOR method is a
complex and time-consuming process involving significant amounts of data and a
considerable number of simulation runs. Ultimately, an economic model should be
coupled with the final production performance to assess the viability of the recovery
technique.
When selecting the recovery method, the fundamental parameters included in
any reservoir model are the rock, fluid, and formation properties because these are
unique to each reservoir. The selected recovery process that would return the highest
economic outcome determines the design parameters. Because the application of an
EOR/IOR process usually follows the primary-recovery mechanism, a large amount
of data and information has already been captured regarding the reservoir charac-
terization and is available for use during screening.
Many approaches have been taken for IOR/EOR screening; however, this section
will present only those that used intelligent data-driven analytics. The conventional
methods are often driven by field analogies, pilot projects on a portion of the field,
or prior operational experience on similar reservoirs. This approach can pose challenges, such as a lack of objective rules to define a reservoir type or the project completion time, and can be biased by expert opinion.
Expert-system-based approaches to EOR screening have been proposed (Guerillot 1988; Zerafat et al. 2011) that use an inference engine with an underlying knowledge-base system. Such approaches can consider only technical criteria because the economic criteria can differ among geographical areas and companies. Other dimensionality-reduction techniques based on clustering and rule-extraction algorithms (Alvarado et al. 2002) have also been proposed for screening EOR/IOR potential and have been applied to mature fields in Venezuela.
More recently, a screening toolbox (Parada and Ertekin 2012) consisting of proxy
models that implement a multilayer cascade feed-forward back-propagation ANN
algorithm has been proposed for a diverse range of reservoir fluids and rock prop-
erties. The field development plan is featured in this tool by different well patterns,
well spacing, and well operating conditions. The screening tool predicts oil produc-
tion rate, cumulative oil production, and estimated production time for different
sets of inputs, which facilitates comparison of various production strategies such as
waterflooding, steam injection, and miscible injection of CO2 and N2. Drilling and
completion techniques, well pattern, well spacing, and the recovery mechanism were
used as design parameters. The ANN tool is able to recognize the strong correlation between the displacement mechanism and the reservoir characteristics, enabling it to effectively forecast hydrocarbon production for different reservoirs. Blind tests showed that the ANN-based screening tool is able to predict the expected reservoir performance within a few percent error.
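A simplified sketch of an ANN-based screening proxy of this kind is shown below: it learns cumulative oil production from reservoir, fluid, and design parameters using a library of simulated cases. The file and column names are hypothetical, and this is not a reproduction of the toolbox described above.

# Sketch of an ANN-based EOR/IOR screening proxy trained on simulated cases.
# File and column names are hypothetical.
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

cases = pd.read_csv("simulated_eor_cases.csv")
numeric = ["perm_md", "porosity", "depth_ft", "oil_api", "viscosity_cp",
           "well_spacing_ft"]
categorical = ["recovery_process"]            # waterflood, steam, CO2, N2, ...
X, y = cases[numeric + categorical], cases["cum_oil_mmstb"]

prep = ColumnTransformer([("num", StandardScaler(), numeric),
                          ("cat", OneHotEncoder(handle_unknown="ignore"), categorical)])
proxy = Pipeline([("prep", prep),
                  ("ann", MLPRegressor(hidden_layer_sizes=(30, 15), max_iter=5000,
                                       random_state=0))])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
proxy.fit(X_tr, y_tr)
print("blind-test score:", round(proxy.score(X_te, y_te), 3))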
Other approaches have been reported with applications to field case studies such
as Bayesian classification and feature selection (Afra and Tarrahi 2015), probabilis-
tic principal-component analysis and Bayesian clustering (Siena et al. 2015; Tarrahi
et al. 2015), neural networks (Surguchev and Li 2000; Okpere and Njoku 2014;
Yalgin et al. 2018), and genetic algorithms (Armacanqui et al. 2017).
4.5.2. Waterflood Management. Waterflooding is the oldest and the most com-
mon IOR method used in the industry and is usually implemented following primary
recovery. Waterflooding is designed to compensate for the pressure depletion in the
reservoir and to displace incremental hydrocarbons. Waterfloods usually involve
many producers and injectors organized in different patterns depending on the res-
ervoir characteristics.
Waterflood optimization usually aims at maximizing the oil recovery per barrel
of injected water under specified reservoir and surface constraints (Sudaryanto and
Yortsos 2001). Given the amount of data usually available in waterflooding projects,
an excellent opportunity for data-driven optimization techniques is often presented.
Different methods are used for reservoir management, well placement, water shut-
off, rate optimization, and performance forecasting, to name only a few objectives
(Das et al. 2009; Lerlertpakdee et al. 2014).
Data analytics methods have been successfully applied to fieldwide waterflood
management in numerous reservoirs around the world. Models were trained to
forecast recovery and optimize the water injection and production targets. The
application of data-driven techniques is of specific interest when the reservoir is
too complex to be accurately modeled. This often occurs in complex geological
settings but is also often a result of the number of wells involved or the extensive
history of the field. Classical waterflooding optimization methods, such as basic
pattern calculations or advanced reservoir simulation modeling, can be impractical
for such projects.
A data-driven approach (Nikravesh et al. 1996) that takes advantage of the historical data from a large waterflood field was used to construct several neural networks, which correlate individual-well performance with the well's own history and the injection/production conditions of the surrounding wells or pattern. The intelligent system consists of an ensemble of neural networks with different functions. Specialized neural networks accurately predict wellhead pressure as a function of injection rate, and vice versa, for all active injectors. The primary neural networks, meanwhile, are trained to history-match oil and water production on a well-by-well basis and to predict future production on a quarterly or biannual basis.
The global optimization allows for designing the water injection policies that lead
to the minimum injected water and the highest oil recovery. The distinctive element
of this data-driven approach involves the division of the waterflooding field into
regions of similarly behaving wells, thus accounting for different reservoir prop-
erties and heterogeneities, and it captures the relationship between injection and
production within each region. In addition to injection/production optimization, the
system is also used for water-breakthrough time prediction, as well as infill drilling
performance.
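A simplified, hedged sketch in the spirit of this approach is shown below: a single network predicts a producer's next-quarter oil rate from its own recent rates and the injection rates of surrounding injectors. The data file, column names, and train/test split are hypothetical, and the real system described above uses an ensemble of specialized networks rather than one model.

# Sketch of a waterflood surveillance model: predict a producer's next-quarter
# oil rate from its own history and surrounding injection. Data are hypothetical.
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

history = pd.read_csv("pattern_quarterly_history.csv")      # one row per producer-quarter
features = ["oil_rate_t0", "oil_rate_t1", "oil_rate_t2",     # producer's recent rates
            "inj_rate_north", "inj_rate_south",              # surrounding injectors
            "inj_rate_east", "inj_rate_west"]
target = "oil_rate_next_quarter"

train = history[history["quarter"] < "2015Q1"]
test = history[history["quarter"] >= "2015Q1"]

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(25,), max_iter=5000, random_state=0))
model.fit(train[features], train[target])
print("held-out R2:", round(model.score(test[features], test[target]), 3))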
The capacitance-resistance model (CRM) has also been combined with decline-curve analysis (DCA) to model a mature naturally fractured carbonate reservoir under gas injection (Salazar-Bustamante et al. 2012). This combined approach addresses the shortcoming of applying the CRM alone when primary depletion is followed by weak injection: the DCA captures the contribution of primary depletion, while the CRM captures the injection component of the field performance. The advantage of this approach is that it is
data driven, relying entirely on the production/injection history. The capability of
this DCA-CRM model was demonstrated on a deep carbonate naturally fractured
reservoir under hydrocarbon gas and nitrogen injection; high reliability for short-
term production predictions was demonstrated, while allowing fast workflows and
interpretations.
Improvements of the classical CRM have been proposed (Holanda et al. 2015),
using a linear system of state-space (SS) equations to define the relationships
between inputs, outputs, and states that describe the dynamics of the system.
As such, the SS-CRM is a multi-input/multioutput matrix representation that pro-
vides more insights into reservoir behavior than analyzing only well-by-well per-
formance. The authors introduce three CRM representations and contrast their
performance—namely, integrated, producer based, and injector/producer based
(CRMIP). The work demonstrated that the highest accuracy and performance was
observed in the CRMIP, which was able to better capture the heterogeneous areas
and channel-like deposits.
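For reference, a commonly used continuous form of the CRM material balance (as reviewed, e.g., by Holanda et al. 2018) relates each producer's rate to injection support, pressure depletion, and bottomhole-pressure changes. The notation below is generic rather than the exact formulation of any single paper cited here:

$$\tau_j \frac{dq_j(t)}{dt} + q_j(t) = \sum_i f_{ij}\, I_i(t) - \tau_j J_j \frac{dp_{wf,j}(t)}{dt},$$

where $q_j$ is the production rate of producer $j$, $I_i$ the rate of injector $i$, $f_{ij}$ the interwell connectivity (gain), $\tau_j$ the producer time constant, $J_j$ the productivity index, and $p_{wf,j}$ the flowing bottomhole pressure. The gains and time constants are estimated by fitting this equation to the production/injection history, typically by constrained nonlinear regression.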
CRM models are also used as a predictive model for waterflood performance diag-
nostics and optimization (Kansao et al. 2017). In this work, the CRM was generated
to develop a forecast by matching historical production and injection data, followed
by uncertainty analysis and optimization of a heterogeneous reservoir undergoing a
large waterflood development. The study demonstrated how the CRM was used to
identify water injection changes that led to increased oil production, while maintain-
ing or reducing the water cut.
CRM has also been compared with streamline-based methods that provide an
effective means to assess flow patterns and well allocation factors (Ballin et al. 2012).
That study concluded that neither method was sufficient by itself, and the best strat-
egy was to integrate them. The estimates of allocation factors from CRM were influ-
enced by data quality and quantity over the fit interval, while the streamline-based
method had intrinsic model uncertainty.
An application in the San Joaquin Valley demonstrated that the approach delivered improved well selection for cyclic steam injection and an optimized injected steam volume. An incremental-production response of 44% was achieved, yielding a 77% profitability increase (Sarma et al. 2017a).
A different, yet related data-driven approach was used to mine the cyclic steam
performance of the Cat Canyon Field. Historical steam injection volumes, rates,
quality, injection duration, and corresponding production response were collected
from more than 600 wells operated in the field. A data mining approach was used to
discover patterns of injection/production performance. Using this pattern information, a neural network was trained to predict the expected production outcome from the wells, thus delivering a ranking of the best candidates to be placed on steam on any given day. The improved well ranking and selection, combined with an optimized implementation schedule, delivered significant incremental value to the field.
Summary
• Data analytics has been used in the context of analog selection to provide
more-robust predictions of the expected reservoir performance.
• Data analytics methods have been developed to optimize various aspects of mature waterfloods, such as new well targets, water shutoff, and continuous optimization of target rates for production and injection wells.
• Steamflooding has also benefited from analytics where neural networks have
been trained on historical data to help forecast the performance of new proj-
ects. Another approach consists of training neural networks on the basis of
simulation results to provide a fast and accurate prediction engine for steam-
flood optimization.
4.6. Reservoir Simulation. Reservoir simulation is one of the most popular meth-
ods for making predictions of reservoir performance under different development
and operation strategies. The results of performance prediction serve as the basis
for many capital-intensive investment or reservoir management decisions. Reservoir simulation is often computationally intensive because of the large number of cells required to represent the whole reservoir and/or the complex physics involved, such as multiphase/multicomponent flow and coupling with the surface network or with geomechanics. It is common for a full reservoir simulation run to take many hours to days, even with the most advanced parallel solvers running on multiple cores in a high-performance computing (HPC) environment.
On the other hand, the inputs of reservoir simulation including reservoir static
and dynamic properties are mostly uncertain because of limited sampling or error in
the measurement. This results in uncertainty in performance predictions. It has been
well-recognized that important reservoir management decisions need to account for
these uncertainties to better manage the risk. Quantifying uncertainties in perfor-
mance prediction often requires performing a large number (hundreds to thousands)
of simulations, which adds additional burden on reservoir simulation. Similarly,
model calibration (history matching) and reservoir development optimization also
often require many simulation runs by using different combinations of model-pa-
rameter values and development/well operation strategies.
Data analytics has been widely used to reduce these burdens by cutting computation time and cost. The two main areas in wide use or under active research are proxy modeling and reduced-order modeling.
This section discusses the data analytics methods applied to model generated data
including proxy modeling, reduced-order modeling, and ensemble variance analy-
sis, followed by introducing the latest attempts in the development of data-driven
physics-constrained predictive modeling.
4.6.1. Proxy Modeling. Proxy models are typically built from a set of simulation runs selected through experimental design (ED), also known as design of experiments (DoE), in which the minimum number of simulation runs is selected to obtain maximum information over the uncertainty space (Montgomery 2012). For more details about ED or DoE in petroleum engineering applications, readers can refer to Friedmann et al. (2003) or Yeten et al. (2005).
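As a simple illustration of how such a design can be generated, the sketch below draws a space-filling Latin hypercube sample over a hypothetical three-parameter uncertainty space; the parameter names, ranges, and run count are invented for illustration.

```python
# Minimal sketch (illustrative only): Latin hypercube design over a
# hypothetical three-parameter uncertainty space.
from scipy.stats import qmc

# Invented uncertain parameters and ranges:
# permeability multiplier, aquifer strength, residual oil saturation
lower = [0.5, 0.0, 0.15]
upper = [2.0, 1.0, 0.35]

sampler = qmc.LatinHypercube(d=3, seed=42)
unit_samples = sampler.random(n=20)                 # 20 runs in the unit cube
design = qmc.scale(unit_samples, lower, upper)      # scale to parameter ranges

for run_id, (kmult, aquifer, sor) in enumerate(design, start=1):
    print(f"run {run_id:02d}: k_mult={kmult:.2f}, aquifer={aquifer:.2f}, Sor={sor:.2f}")
```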
Several types of proxies are used in practice. Some are based on simple analytical
functions, while others use numerical approximation that cannot necessarily be rep-
resented explicitly by a simple function.
An analytical proxy approximates the relationship between input factors and
output response by an analytical function, such as a polynomial. The proxy is
constructed by fitting the function to the data points using regression techniques
(i.e., least squares). It determines the coefficients of the analytical function by min-
imizing the sum of the squares of the errors between the data points and the fitted
function values. Each data point represents a simulation run selected from the design
matrix. If the number of data points equals the number of unknown coefficients of
the analytical function, the proxy will traverse all the data points. If this is the case,
the proxy is data exact. However, because the proxy is only an approximation, more
data points than the number of coefficients should be used to reduce error. In other
words, the least-squares method should be applied to solve overdetermined prob-
lems. Hence, analytical proxies are often not data exact.
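As a minimal illustration of the least-squares construction described above, the sketch below fits a quadratic polynomial proxy to an overdetermined set of synthetic responses; the two input factors and the quadratic form are illustrative assumptions.

```python
# Minimal sketch (illustrative only): quadratic polynomial proxy fitted by
# least squares to an overdetermined set of synthetic responses.
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30)     # 30 design points, 2 factors
y = 2.0 + 1.5 * x1 - 0.8 * x2 + 0.6 * x1 * x2 + 0.3 * x1**2 + 0.05 * rng.standard_normal(30)

# A full quadratic model has 6 coefficients; with 30 > 6 data points the fit is
# overdetermined, so the resulting proxy is not data exact.
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print("fitted coefficients:", np.round(coeffs, 3))
```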
A numerical proxy approximates the relationship between input factors and out-
put response by attempting to connect all the data points using a surface that is
generated from a numerical algorithm and cannot be represented by an analytical
function. Such a proxy is called data exact if it traverses all the data points. Three
commonly used numerical proxies are Kriging, splines, and neural networks.
Kriging predicts the value of a function at a sampling point by calculating the
weighted average of the function values of the data points. Kriging assumes that all
the points are spatially correlated with each other. The correlation is described by a variogram model, γ(h), which is a function of the Euclidean distance (h) between two points. The larger the distance, the more dissimilar the two points are. The weights
for computing the weighted average are obtained from the covariance between the
sampling point and each data point and between all the data points themselves.
A larger weight is assigned to a data point if it is closer to the sampling point. The
weights are computed such that the squared error of the estimated function value of
the sampling point is the smallest. Kriging is data exact. More details can be found
in any geostatistics textbook, such as Journel and Huijbregts (1978) or Deutsch and
Journel (1992).
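Because kriging is closely related to Gaussian-process regression with a chosen covariance (variogram) model, a kriging-style proxy can be sketched with an off-the-shelf Gaussian-process implementation, as below. The design points are synthetic, and the kernel and its length scale stand in for a fitted variogram.

```python
# Minimal sketch (illustrative only): kriging-style proxy via Gaussian-process
# regression; with a near-zero noise term the predictor honors the design points.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(15, 2))              # 15 design points, 2 factors
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2           # synthetic simulator response

proxy = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-10)
proxy.fit(X, y)

mean, std = proxy.predict(np.array([[0.4, 0.7]]), return_std=True)
print(f"proxy estimate at (0.4, 0.7): {mean[0]:.3f} +/- {std[0]:.3f}")
```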
A spline function is defined by piecewise polynomials and has a high level of
smoothness at the knots where polynomials connect. Each knot refers to a data point
whose function value is known. More details on spline-based proxies can be found
in Li and Friedmann (2005).
ANNs are machine learning methods that mimic the operations of the cortical
systems of animals. An ANN model consists of many interconnected nodes, like
the neurons inside a brain. Each node accepts inputs from other nodes and gener-
ates outputs that are based on the inputs it receives and/or the information stored
internally. These nodes are grouped in multiple layers: an input and an output layer,
and one or more hidden layers in the middle. The input layer is analogous to a sense
organ, such as eye, ear, or nose, which receives inputs from outside. The hidden lay-
ers process the inputs to produce the corresponding response that is then reported
by the output layer. Like a human being, an ANN needs to be trained to produce
proper output response on the basis of given inputs. The training is done using the
data points where the relationship between input factors and output responses is
known. ANNs are often not data exact. More theoretical and algorithmic details can
be found in Reed and Marks (1999).
Other popular machine learning methods include random forest, support vector
machine, or gradient boost methods. Details of these machine learning algorithms
can be found in Mishra and Datta-Gupta (2017).
(Figure: example proxy response surface showing OPC as a function of GOR and K.)
Cross Validation. One of the most important practices when building proxies is
to implement a verification process using blind tests. Blind tests are sample points
where a simulation model is run, but the response is not used in the creation of the
proxy. Instead, the proxy is used to estimate the response at the sample location,
which is compared to the calculated value. Blind tests are critical to validate a proxy
but are sometimes deceiving because a successful blind test only guarantees a robust
proxy in a local part of the domain. Some common methods for cross validation are
• Exclusion of some experiments during the calibration of the training, so that the
experiments can later be used as blind tests. Such practice should be used with
designs that offer some redundancy because it might deteriorate the sampling.
• Addition of some experiments that can be chosen using ED or Monte Carlo
sampling.
• Leave-one-out cross validation, where every run is used as a test point for the
others. For each run performed, a proxy is built using the other runs of the
design. The proxy is then used to estimate the response, which is compared to
the actual response from the run. The algorithm loops through the runs, so that
all runs are used alternatively as test and calibration points. This method has
been extended to consider more than one experiment at a time, which is known
as k-fold cross validation.
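As an illustration, the sketch below runs leave-one-out and k-fold cross validation of a simple neural-network proxy on synthetic design data; the proxy architecture and data are stand-ins, not taken from any study cited here.

```python
# Minimal sketch (illustrative only): leave-one-out and k-fold cross validation
# of a small neural-network proxy on synthetic design data.
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.uniform(size=(40, 3))                        # 40 simulation runs, 3 factors
y = X[:, 0] ** 2 + 0.5 * X[:, 1] - 0.2 * X[:, 2]

proxy = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0))

loo_err = -cross_val_score(proxy, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_absolute_error")
kfold_err = -cross_val_score(proxy, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0),
                             scoring="neg_mean_absolute_error")
print("leave-one-out mean abs error:", round(loo_err.mean(), 4))
print("5-fold mean abs error:       ", round(kfold_err.mean(), 4))
```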
Fig. 4.14 represents a graphical example of a cross validation. The ED points are
shown in black, and the validation points are shown in red.
(Fig. 4.14: proxy response surface of OPC over GOR and K, and crossplot of OPC from the proxy vs. OPC from the simulator (actual).)
Fig. 4.15—Probability density function (PDF) and cumulative distribution function (CDF) for
primary depletion of channelized reservoirs undergoing bottomdrive (Friedmann et al. 2003)
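Distributions such as the one in Fig. 4.15 are typically obtained by Monte Carlo sampling of the uncertain inputs through a fitted proxy. The sketch below illustrates the idea; the proxy function and input distributions are invented for illustration.

```python
# Minimal sketch (illustrative only): Monte Carlo sampling through a fitted
# proxy to build a probability distribution (S-curve) of recovery.
import numpy as np

def proxy_recovery(perm_mult, aquifer, sor):
    """Hypothetical stand-in for a fitted proxy of recovery (% OOIP)."""
    return 12.0 + 8.0 * np.log(perm_mult) + 10.0 * aquifer - 25.0 * (sor - 0.25)

rng = np.random.default_rng(4)
n = 10_000
perm_mult = rng.lognormal(mean=0.0, sigma=0.3, size=n)   # invented input distributions
aquifer = rng.uniform(0.0, 1.0, size=n)
sor = rng.normal(0.25, 0.03, size=n)

recovery = proxy_recovery(perm_mult, aquifer, sor)
p10, p50, p90 = np.percentile(recovery, [10, 50, 90])
print(f"Recovery (% OOIP): P10={p10:.1f}, P50={p50:.1f}, P90={p90:.1f}")

# Empirical CDF (the S-curve) from the sorted samples:
cdf_x = np.sort(recovery)
cdf_y = np.arange(1, n + 1) / n
```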
Iterative Response Surface. The common process of building a proxy from a single set of samples might not be sufficient to produce reliable and accurate proxies, particularly when the problem is nonlinear. One way to improve this is through an iterative
process where the new sampling points are iteratively added to update proxies. Some
optimization algorithm is used to optimize the locations of the most informative
points between iterations (Castellini et al. 2010; Wantawin et al. 2017).
To initialize the algorithm, any appropriate ED can be used as the initial sampling points, such as Plackett-Burman, D-optimal, Latin hypercube, or Hammersley sequence. After the initial proxy is built, the following proxy properties can be
considered when selecting the new sampling points: function value, scalar gradi-
ent, bending energy, curvature, and distance from existing points. A combined score
can be computed for each proposed new sampling, and the best samplings can be
selected. The procedure stops when the number of iterations requested by the user is
complete or when the changes to the response surface are below a certain threshold.
For example, Fig. 4.16 shows three iterations of the proxy-building process. A spline-based proxy is initially built using nine samples (shown as the red points in the top row). On the basis of the initial proxy, an additional 10 samples (blue points) are selected based on multiple selection criteria. New simulations are performed, and the proxy is updated with the new samples. This process is repeated until a stable proxy is constructed or a maximum number of iterations is reached.
Fig. 4.16—Iterative proxy generation; first three iterations (rows) showing the sample locations (red for current and blue for next sample locations) and proxy shape (right).
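A hedged sketch of the iterative idea follows: candidate points are scored by a combination of proxy uncertainty and distance to existing samples, and the highest-scoring candidate is simulated and added to the design. The scoring rule and the one-dimensional toy problem are illustrative, not the specific criteria of Castellini et al. (2010) or Wantawin et al. (2017).

```python
# Minimal sketch (illustrative only): iterative proxy refinement by adaptive
# sampling; candidates are scored by proxy uncertainty plus distance to samples.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def simulator(x):
    """Stand-in for an expensive reservoir simulation (1D nonlinear response)."""
    return np.sin(6 * x) + 0.5 * x

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(5, 1))            # initial design (e.g., from an ED)
y = simulator(X).ravel()

for iteration in range(3):
    proxy = GaussianProcessRegressor().fit(X, y)
    candidates = np.linspace(0, 1, 200).reshape(-1, 1)
    _, std = proxy.predict(candidates, return_std=True)
    dist = np.min(np.abs(candidates - X.ravel()), axis=1)    # distance to existing points
    score = std + 0.5 * dist                                  # combined (illustrative) score
    x_new = candidates[[np.argmax(score)]]
    X = np.vstack([X, x_new])
    y = np.append(y, simulator(x_new).ravel())
    print(f"iteration {iteration + 1}: added sample at x = {x_new.ravel()[0]:.3f}")
```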
4.6.2. Reduced-Order Modeling. Like the proxy models described in the preceding section, reduced-order models are surrogate models that can be implemented in place of the traditional reservoir simulator for computationally intensive applications such as production optimization and history matching. Reduced-order models apply fast but approximate numerical solutions that remain consistent with the underlying governing equations, which is the main difference from the proxy modeling described in the preceding section.
Reduced-order modeling procedures, which have been applied in many applica-
tion areas including reservoir simulation, represent a promising means for construct-
ing efficient surrogate models. Many of these techniques entail the projection of
the full-order (high-fidelity) numerical description onto a low-dimensional subspace,
which reduces the number of unknowns that must be computed at each timestep.
We can classify existing approaches applied within the context of reservoir simula-
tion as grid-based methods, system-based methods, and snapshot-based methods
(He 2013).
In grid-based methods, the dimension of the problem is reduced by constructing a coarser grid and computing properties for this grid. The original problem is then solved on the coarser grid. Examples are upscaling and multiscale methods. System-based methods are derived from system control theory. By introducing a basis matrix and
a constraint reduction matrix, both generated from the full-dimensional system, a
full-dimensional state matrix can be reduced to a much-lower-dimensional linear
time-invariant system, which can be solved much more efficiently (Bond and Daniel
2008).
We will focus on the description of snapshot-based methods that are more com-
monly used in the reservoir simulation community. Unlike system-based methods,
which derive the basis and reduction matrices from the system matrices, snap-
shot-based methods derive the basis matrices from snapshots, which are the states at
each timestep of training simulations. Most methods in this category are based on
proper orthogonal decomposition (POD) or its variants.
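The core construction in snapshot-based reduced-order modeling is a POD basis obtained from the singular value decomposition of a snapshot matrix. The minimal sketch below illustrates this step on synthetic snapshots; the state dimension, snapshot count, and energy criterion are illustrative assumptions.

```python
# Minimal sketch (illustrative only): proper orthogonal decomposition (POD)
# basis computed from a matrix of snapshots via the SVD.
import numpy as np

rng = np.random.default_rng(6)
n_cells, n_snapshots = 2000, 60
# Columns are snapshots: the state (e.g., cell pressures) at each timestep of
# the training simulations. Synthetic snapshots with three dominant modes.
modes = rng.standard_normal((n_cells, 3))
weights = rng.standard_normal((3, n_snapshots))
S = modes @ weights + 0.01 * rng.standard_normal((n_cells, n_snapshots))

U, sigma, _ = np.linalg.svd(S, full_matrices=False)
energy = np.cumsum(sigma**2) / np.sum(sigma**2)
k = int(np.searchsorted(energy, 0.999)) + 1    # keep modes holding 99.9% of the energy
Phi = U[:, :k]                                  # POD basis: n_cells x k

# A full state x is approximated as x ~ Phi @ z, with only k unknowns z = Phi.T @ x.
x = S[:, 0]
z = Phi.T @ x
print(f"kept {k} modes; reconstruction error = {np.linalg.norm(x - Phi @ z):.3e}")
```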
(Fig. 4.17 panels: (a) Initial, (b) HF200, (c) HF50, (d) HF50+TPWL150; P1 water production rate, bbl/d, vs. time, days, showing ensemble and true responses.)
Fig. 4.17—Production profiles and history match for a producer water rate using high fidelity
(HF) ensembles and TPWL (He et al. 2011a).
4.6.3. Reduced-Physics Model. Another type of proxy model is the reduced-physics model, which accelerates flow simulations by simplifying the physics. Streamline methods (Batycky et al. 1997; Datta-Gupta and King 2007)
fall into this category. Streamline methods decouple the flow and transport equa-
tions and then solve the transport equations as a series of 1D problems along each
streamline. This simplification can lead to substantial speedups relative to traditional
simulation for some problems.
Streamline methods have been applied for a wide range of problems including
production optimization (Samier et al. 2002; Thiele and Batycky 2003; Tanaka
et al. 2017) and history matching (Milliken et al. 2001; Wen et al. 2003). These
approaches approximate many key effects, and though they have been widely used
for waterflooding applications, they are not commonly applied for compositional
problems. In addition, the overall speedup using streamline methods is still limited
because of the need to solve the full-order equations at some timesteps.
More recently, the concept of diffusive time of flight (DTOF) has been extended to calculate the propagation of the pressure front in the reservoir for black-oil (Xie et al. 2012; Zhang et al. 2016; Lino et al. 2017) and compositional simulation (Lino et al. 2017). The approach
consists of two decoupled steps—calculation of the DTOF using the fast-march-
ing method and fully implicit simulation using DTOF as a spatial coordinate (Lino
et al. 2017a, 2017b). The computational efficiency is achieved by reducing the 3D flow
equations into 1D equations using the DTOF as spatial coordinate, leading to orders
of magnitude faster computation over full 3D simulation. Computational-time savings
also increase significantly with grid refinement and for high-resolution models.
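For reference, in these methods the DTOF τ(x) is obtained from an Eikonal equation that arises from the high-frequency (asymptotic) limit of the diffusivity equation, and the 3D pressure equation is then rewritten in 1D along τ. In simplified single-phase, constant-property notation, following the general form used in the fast-marching literature cited above:

$$\left|\nabla \tau(\mathbf{x})\right| = \sqrt{\frac{\phi(\mathbf{x})\,\mu\,c_t}{k(\mathbf{x})}}, \qquad \frac{\partial p}{\partial t} = \frac{1}{w(\tau)}\frac{\partial}{\partial \tau}\left( w(\tau)\,\frac{\partial p}{\partial \tau} \right), \qquad w(\tau) = \frac{dV_p}{d\tau},$$

where $w(\tau)$ is the derivative of the pore volume enclosed within the τ contour, so that the 1D equation carries the 3D heterogeneity through τ and w(τ).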
4.6.4. Predictive Uncertainty Analysis. Predictive uncertainty analysis is a data analytics method based on model-generated data, designed to support rapid decision making under uncertainty. Typical use cases are
• Design of an optimal pilot program or surveillance plan to maximize the value of information before the actual data are collected.
• Rapid updating of performance predictions, without following the traditional model calibration process, to accelerate subsequent reservoir management decisions.
In this approach, a series of simulation runs is performed using a sampling strategy in the parameter and/or operating space (such as DoE/ED), and the input and response parameters are recorded in a database.
The development of oil and gas reservoirs is associated with substantial risk
because the subsurface condition is highly uncertain. Data acquisition programs
such as surveillance and pilots are routinely conducted in the hope of minimizing
subsurface uncertainties and improving decision quality. However, these programs
themselves involve significant capital investment. Therefore, before any data acqui-
sition program is implemented, it is crucial to be able to evaluate the effectiveness
and quantify the value of the program so that different designs can be compared,
and the best investment decision can be made. The major challenge of estimating the
effectiveness of data acquisition programs is that the data are unknown at the time
of the analysis.
As surveillance data are obtained from the field, the cumulative distribution functions (CDFs) of the key metrics need to be updated accordingly. This is normally
accomplished by a two-step approach as shown on the left side of Fig. 4.18. First,
the data are assimilated through history matching to calibrate the model parameter
uncertainties to obtain their posterior distributions. Then, a probabilistic forecast is
performed on the basis of the posterior distributions of the parameters to update the
S-curve of the key metrics.
Fig. 4.18—Traditional model-driven approach (left) vs. direct forecasting approach (right) for
prediction and update of prediction using observation data.
This two-step approach can be very time-consuming. This is because the relations
between the objectives and the model parameters, and the relations between the data
and the model parameters, are often highly nonlinear. In addition, the potentially
large number of model parameters makes it very hard for any history-matching
algorithms to accurately capture the posterior distribution of the model with a small
number of simulation runs. Because of these challenges, updating the CDF with new
data using the traditional approach can take weeks or months.
On the other hand, many of the field development decisions are time critical and
there might not be enough time/resources to calibrate the model or to perform any
simulations after data come in. In those cases, there is a need for rapid interpretation
of the incoming data and updating of the S-curve without going through a full-
blown history-matching and probabilistic-forecast process.
In recent studies, an approach called direct forecast or predictive uncertainty analysis (also called data-space inversion) (Scheidt et al. 2015; Satija and Caers 2015; Sun et al. 2017; He et al. 2015a, 2015b, 2017a, 2017b, 2018) has been proposed.
Fig. 4.18 shows the concept of the direct forecast method on the right, which can be
considered as a data analytics or machine learning approach using model-generated
data.
In direct forecast, the statistical relationship between the proposed measurement
data and the business objective is established based on simulation model responses
before the data acquisition. This direct relationship can then be used to rapidly
update the prediction of the objective once the data become available. These pro-
cesses are illustrated in Fig. 4.19. Data analytics algorithms can be used to address
these challenges (He et al. 2018).
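A hedged sketch of the basic idea is shown below: an ensemble of prior simulation runs provides paired samples of the proposed measurement d and the objective h, a statistical relationship is learned between them, and the forecast of h is updated directly once d is observed. Here a simple linear-Gaussian regression stands in for the more elaborate statistical techniques used in the cited studies, and all numbers are invented.

```python
# Minimal sketch (illustrative only): direct forecasting / data-space inversion
# idea with a simple linear-Gaussian relationship learned from a prior ensemble.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(7)
n_runs = 300

# Prior ensemble: a hidden model parameter m drives both the measurable data d
# (e.g., pilot response after 1 year) and the objective h (e.g., 10-year recovery).
m = rng.normal(size=n_runs)
d = 0.8 * m + 0.2 * rng.normal(size=n_runs)          # simulated measurement per run
h = 5.0 + 2.0 * m + 0.3 * rng.normal(size=n_runs)    # simulated objective per run

# Learn the statistical relationship h = f(d) from the ensemble, before data exist.
model = BayesianRidge().fit(d.reshape(-1, 1), h)

# Once the field measurement arrives, update the forecast of h directly.
d_obs = 0.5
h_mean, h_std = model.predict([[d_obs]], return_std=True)
print(f"prior forecast:   mean={h.mean():.2f}, std={h.std():.2f}")
print(f"updated forecast: mean={h_mean[0]:.2f}, std={h_std[0]:.2f}")
```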
(Figure: schematic contrasting data-driven and model-driven forecasts of a probabilistic outcome: each can only be as good as the data or the model can reveal, and both are subject to measurement error, missing data, model error, nonuniqueness, and uncertainty; data-driven physics-based models aim to combine the two.)
4.6.5. Data-Driven Physics-Based Predictive Modeling. Purely data-driven modeling approaches are covered in other sections (Section 3 and Section 5.6.1) of this book. Here, we focus on approaches that combine conventional physics-based reservoir engineering models with big-data and data-driven technologies (right side of Fig. 4.20). In this approach,
an ensemble of models (simple but sufficient to account for the important phys-
ics) is constructed on the basis of available data and known physics. A forecast is
obtained directly after models are calibrated to the observed data by means of a
simple and fast classification or data assimilation method. The predictive power
of purely data-driven methods is often questionable outside the space covered by
actual data. The hope is that, with this physics-constrained hybrid approach, we can simultaneously obtain the high performance of data-driven methods and the better predictability of physics-based models outside the data range. In the following
subsections, we describe several emerging technologies that aim to achieve this goal.
(Figure: schematic of a physics-based data-driven well model in which each 'ring' of drainage volume around the well in 3D is modeled as a single cell, yielding a 1D series of cells with pore volumes Vp1 through Vp5 and connections T0 through T4.)
Summary
• Data analytics can be used in conjunction with numerical simulation to reduce
computation time and cost.
• Proxy modeling, reduced-order modeling, reduced-physics modeling, and
data/physics modeling are some of the modern data analytic approaches used
with reservoir simulation.
In conventional reservoirs, the physics governing fluid flow are relatively well understood and documented. The modeling approach taken for a study is thus chosen on the basis of the data, time, and resources available.
The physics at play in unconventional reservoirs remains an area of active
research. The complexity of the physical phenomena involved, combined with the
scale and pace of development, has pushed the industry to adopt the use of data-
driven models to answer reservoir management questions. Reservoir simulation and
simple analytical methods remain in use, but a new type of model has recently gained
popularity to answer questions related to well spacing and targeting or optimal com-
pletion design and practices.
These new data analytics efforts usually gather data on many wells and try to
determine the key factors driving economic performance (LaFollette and Holcomb
2011, 2012; Portis et al. 2013). Data-driven models are being used at different stages
of the life cycle of unconventional reservoirs. First-pass models are often created to
support transactional decisions during acquisition or divestiture projects to assess
and rank the potential value of various basins, specific land positions, or entire
companies. Some models are built early in the development of a play to support
the appraisal efforts by quantifying uncertainties and methodically reducing them
through data acquisition or field pilot efforts organized through ED. Today, most
operators involved in unconventional reservoirs maintain evergreen models that
help optimize their development efforts. These models help answer questions related
to drilling and completion designs and practices or well targeting and spacing.
This section presents a general approach followed by applications for develop-
ing data-driven models for unconventional reservoirs (Courtier et al. 2016; Wicker
et al. 2016, 2017; Burton et al. 2019). Published references will be provided when
possible, but several companies using data analytics to model unconventional reser-
voirs are reluctant to publish details concerning the structure or the impact of their
work because of the proprietary nature of the data sets and the relative novelty
of the methods, which are still evolving. The remainder of this section describes
sequentially the various steps usually taken to develop a data-driven model for an
unconventional asset: data collection, attribute modeling, data reduction, machine
learning, and model distribution.
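As an illustration of the machine learning step of such a workflow, the sketch below trains a random forest (one of the algorithms noted in the summary of this section) on a hypothetical per-well attribute table; the column names, ranges, and synthetic target are invented for illustration.

```python
# Minimal sketch (illustrative only): per-well attribute table fed to a
# random-forest model of well performance; all names and values are invented.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n_wells = 400
wells = pd.DataFrame({
    "lateral_length_ft": rng.uniform(4000, 12000, n_wells),
    "proppant_lb_per_ft": rng.uniform(1000, 3000, n_wells),
    "fluid_bbl_per_ft": rng.uniform(20, 60, n_wells),
    "porosity": rng.uniform(0.04, 0.10, n_wells),
    "well_spacing_ft": rng.uniform(400, 1500, n_wells),
})
# Synthetic 12-month cumulative oil (arbitrary units) as a stand-in target.
wells["cum_oil_12mo"] = (
    15 * wells["lateral_length_ft"] / 1000
    + 0.02 * wells["proppant_lb_per_ft"]
    + 800 * wells["porosity"]
    - 0.05 * (1500 - wells["well_spacing_ft"])
    + rng.normal(0, 5, n_wells)
)

X = wells.drop(columns="cum_oil_12mo")
y = wells["cum_oil_12mo"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)
print("holdout R^2:", round(model.score(X_test, y_test), 3))
print(pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False))
```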
Geologic and geophysical data are used to define the stratigraphic and structural framework and to populate the model with petrophysical properties. Seismic surveys, petrophysical logs, and core
data are integrated to estimate key rock properties such as bulk volume, rock types
and facies, mineral volumes, porosity, and permeability. Additional properties are
often modeled that were not usually estimated in conventional reservoirs. Wicker
et al. (2016) report aggregating factors such as brittleness, sweetness, and curvature
that would typically not be used in a conventional modeling workflow. The natural
fracture system is typically characterized, either through volume-average properties
or through a discrete fracture network. Key geomechanical properties are estimated
along with the regional stress field. A variety of other seismic attributes are also rou-
tinely proposed as potential indicators of well performance. The resulting geologic
model usually presents itself as a series of 2D maps or as a 3D geocellular model or
property volume (Courtier et al. 2016; Wicker et al. 2016, 2017).
4.7.1.3. Drilling and Completion Data. Information about the well design and con-
dition is often included in the analysis. Drilling and completion design as well as oper-
ational execution parameters are aggregated on a well-by-well or stage-by-stage basis.
On the drilling side, the well trajectory is usually processed to provide insights
into the completed well length, orientation of the well in azimuth and inclination
(Wicker et al. 2017), and tortuosity. Measurements taken during drilling operations,
such as measurement while drilling or mud logs, are sometimes included.
The completion data set is usually carefully compiled. When building a model
from public data sources, the only information that is consistently available might
be limited to the total volume of fluid and proppant injected during the fracturing
process. When available, a much richer data set is compiled that includes the type of fluid and proppant used, the number of stages, and the type and number of perforations. The type of completion (e.g., plug and perforate, sliding sleeves,
or other) is also often included. Pump curve data are typically processed to yield
key information describing the fracture design and the reservoir response (Temizel
et al. 2015). A set of attributes is built that typically includes the leakoff, breakdown,
propagation, instantaneous shut-in, and closure pressures, along with the average
and maximum treating rates and the timing of the injection.
4.7.1.4. Lift Design and Production Operations. Various artificial lift methods
are used in unconventional plays. The methods depend on a variety of factors, includ-
ing depth, pressure gradient, temperature, expected water cut, and GOR. Including
the artificial lift system design and its operational parameters can be very critical to
help the model explain differences between wells with different lift conditions. The
data can be included in several ways. When the lift systems across all wells in a study
are standardized, feeding the operational parameters of the lift methods might be
sufficient. When a more diverse set of lift systems exist, it might be beneficial to esti-
mate the flowing bottomhole pressures of each well and feed that information to the
model. This allows the analytics model to better quantify the influence of changing
lift conditions with a relatively restricted set of input parameters.
Precomputing the flowing bottomhole pressures for wells has the added benefit
of accounting for different flowback strategies. In some basins, different flowback
strategies have an influence on the recovery of the wells. This can easily be accounted
for by feeding the model with the flowing-bottomhole-pressure measurements or
estimates, which will account for the choke settings, temperature, pressures, fluid
properties, and other phenomena.
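As a simple illustration, a first-order estimate of flowing bottomhole pressure can be built from wellhead pressure plus the pressure exerted by the fluid column, as sketched below. The single average-gradient approximation and the numbers are illustrative only; multiphase wellbore models are normally used in practice.

```python
# Minimal sketch (illustrative only): first-order flowing-BHP estimate from
# wellhead pressure plus the hydrostatic column and friction of the wellbore.
def estimate_fbhp(whp_psi, tvd_ft, fluid_gradient_psi_per_ft=0.35, friction_psi=50.0):
    """Rough estimate: flowing BHP ~ WHP + hydrostatic head + friction losses."""
    return whp_psi + fluid_gradient_psi_per_ft * tvd_ft + friction_psi

wells = [
    {"name": "A-1", "whp_psi": 450.0, "tvd_ft": 9500.0},
    {"name": "A-2", "whp_psi": 300.0, "tvd_ft": 10200.0},
]
for w in wells:
    fbhp = estimate_fbhp(w["whp_psi"], w["tvd_ft"])
    print(f"{w['name']}: estimated flowing BHP = {fbhp:.0f} psi")
```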
4.7.2.2. Scenario Analysis and Optimization. Data analytics models for uncon-
ventional reservoirs are often used by asset team members for scenario analysis.
The team tests various completion designs, well spacing or targeting strategies, or
flowback or artificial lift approaches and assesses their potential merit in terms of
recovery and economic performance (Burton et al. 2019). Sometimes the optimal
scenarios are identified more rigorously by coupling the analytics models to an opti-
mization engine. Wicker et al. (2017) have reported using Monte Carlo analysis on
their multivariate analysis model to better understand the individual influence of key
design parameters.
Although the use of data analytics to model unconventional reservoirs is rela-
tively widespread among operators, the exact impact of these efforts is rarely publi-
cized. The benefits are usually categorized either in work efficiency gains or in direct
improvements to the development strategy. For example, an operating company
(Devon) was reported (Jacobs 2016) to have dramatically improved their analysis
efficiency through data-driven models. An analysis that previously took 1 week for 50 wells was accelerated and expanded to cover the lower 48 states in less than 10 minutes. Laredo has reported that their model rep-
resented a fundamental driver for value creation within field development planning
and quantified the potential effect of their work by showing that higher or lower
production attributes could lead to a production range of 75 to 130% around their
type curve.
Summary
• Today, because of the scale and pace of unconventional developments, data
analytics is used by most operators to help forecast and optimize new uncon-
ventional-field wells and drilling units.
• Algorithms that have reportedly been used successfully include multivariate
regression and random forest.
5.1. Data. According to a recent report by Forrester, less than 0.5% of all data col-
lected is ever analyzed. Just a 10% increase in data accessibility will result in more
than USD 65 million additional net income for a typical Fortune 1000 company. Oil
and gas companies have started to invest heavily in digitization, data acquisition,
and data ownership to transform the business and realize the value proposition as
seen in other industries. This heavy investment in big data and analytics is expected
5.2. Field Automation. For intelligent fields, there is a growing need for tools that
enable engineers to “see” farther into the reservoir in order to adjust the flow control
strategy as required. With sensors becoming less costly, more widespread, and more
connected, oil and gas fields are becoming more instrumented. Downhole sensors in
wells and flowlines are becoming more common, to measure pressure, temperature,
multiphase flow rates, and fluid density. Novel measurement methods such as fiber
optics (distributed acoustic sensing and distributed temperature sensing) have made
possible measurements that offer higher spatial resolution and data frequency than
traditional sensors (Smolem and van der Spek 2003).
From a reservoir management perspective, the greatest value can be achieved
when implementing intelligent flow control systems coupled with surveillance tools.
A robust surveillance strategy is fundamentally enabled by resilient data acquisi-
tion systems and active flow control systems that facilitate optimization of recovery
efficiency.
Standardized end devices and segregation of the automation network will provide a sound platform for the internet of things (IoT) to flourish in the oil field. Recent advancements in IoT devices and edge analytics will allow engineers
to collect more data and run their algorithms in real time near the devices to opti-
mize operations. Further, the intersection of cloud and mobility is creating new
opportunities for engineers to collaborate and access relevant data from virtually
anywhere and at any time. An increasing number of reservoir surveillance and
simulation systems are exploring cloud-based mobile applications, which adds new
5.4. People. One of the severe impediments to reaping the benefits of data analytics
in reservoir engineering applications is the shortage of practitioners with hybrid skill
sets—combining domain expertise, computing skills, and data analytics. Companies
and universities are starting to realize the need for this new breed of engineers by
establishing new innovation centers.
Companies will also have to design new career development paths for these res-
ervoir engineers so that these roles are seen as an integral part of doing business
and as fundamentally transforming the business. Upskilling current employees and democratizing data are essential to alleviate the impact of any skills gap. Current
employees can be trained internally or externally. There is a wealth of training pro-
grams available that can provide them with data science skills to supplement the
business experience. However, such programs often lack industry-specific examples, business context, application idiosyncrasies, and detailed know-how. Larger oil and gas com-
panies can also set up analytics centers of excellence to ensure that more-traditional
engineers receive some training in the use of data. Industry forums (such as SPE) can
also play an important role in improving the awareness and raising the skills level
through training and certification programs.
Different levels of expertise might be needed to successfully establish the benefits
of data-driven methods. Basic awareness programs will provide a foundation for
managers and decision makers. Individuals who will be responsible for analysis of
data analytic products will need to be skilled in basic techniques, and understand
their strengths and weaknesses to review work done by others. Practitioners tasked
with developing data products will benefit from advanced training with hands-on
experience.
Academic institutions are starting to cater to these new industry skills through
certificate programs, specialized courses, and cross-disciplinary curriculum combining
data science and scientific computing. Massive open online courses with practically unlimited participation, which have become popular in recent years, have focused on data science skills. A few digital petroleum engineering programs have been established in select schools in response to these new industry demands.
Gartner estimates that over 40% of data science tasks will be automated by 2020,
resulting in increased productivity and broader usage of data and analytics by citizen
data scientists. This will further increase the reach of new technologies across the enter-
prise as well as help overcome the skills gap. Automation will simplify tasks that are
repetitive and manually intensive. Increasingly, oil and gas companies have renewed
their interests in implementing digital operations platforms that cater to these needs.
So, what can automation reasonably achieve in the next few years?
• Data—Companies that value data as an asset are pursuing integration of data
sources to provide clean and high-quality data for data-driven methods. Auto-
mated pipelines for data engineering and data processing will limit tedious
manual efforts and allow management by exception. However, this does not
eliminate the need of practitioners for exploratory data analysis, assessing
data needs, and making sense of the data.
• Models—Another key cognitive task related to model selection and search for
architectures is also being addressed. Leading technology companies and open
source initiatives in data analytics have been investing heavily in developing
automated machine learning and neural architecture search frameworks. They
aim to provide a suite of machine learning tools that will allow easy training of
high performance deep neural networks (Hinton et al. 2012), without requir-
ing specialized knowledge of deep learning or AI. Other major efforts are in
the area of interpretability through explainable AI that can help deliver mean-
ingful insights for decision-making purposes. Some of the biggest advances are
yet to come through discovering more-robust training methods.
The authors are optimistic about what the future holds for the role of data ana-
lytics in reservoir engineering and the oil and gas industry in general. Just 10 years
ago, only a handful of colleges in the US offered big data or analytics degree pro-
grams. Today, more than 100 schools have data-related undergraduate and gradu-
ate degrees, as well as certificates for working professionals or graduate students
wanting to augment other degrees. The popular demand for these courses portrays
a healthy pipeline of job candidates that will be realized over time. The confluence
of these new hybrid engineers with modern skill sets, increasing data availability,
and demonstration of more successful applications of data analytics will make these
methods part of routine business processes instead of optional methods.
6. References
Aanonsen, S. I., Naevdal, G., Oliver, D. S. et al. 2009. The Ensemble Kalman Filter in
Reservoir Engineering—A Review. SPE J. 14 (3): 393–412.
Abbeel, P., Quigley, M., and Ng, A. 2006. Using Inaccurate Models in Reinforcement Learning. Presented at the 23rd International Conference on Machine Learning, Pittsburgh, Pennsylvania, USA, 25–29 June.
Abdel-Aal, R. E. 2002. Abductive Networks: A New Modeling Tool for the Oil and Gas Industry. Presented at the SPE Asia Pacific Oil and Gas Conference and Exhibition, Melbourne, Australia, 8–10 October. SPE-77882-MS. https://doi.org/10.2118/77882-MS.
Adeeyo, Y. A. 2016. Artificial Neural Network Modeling of Bubblepoint Pressure and Formation Volume Factor at Bubblepoint Pressure of Nigerian Crude Oil. Presented at the SPE Nigeria Annual International Conference and Exhibition, Lagos, Nigeria, 2–4 August. SPE-184378-MS. https://doi.org/10.2118/184378-MS.
Amudo, C., Xie, J., Pivarnik, A. et al. 2014. Application of Design of Experiment
Workflow to the Economics Evaluation of an Unconventional Resource Play. Pre-
sented at the SPE Hydrocarbon Economics and Evaluation Symposium, Houston,
Texas, 19–20 May. SPE-169834-MS. https://doi.org/10.2118/169834-MS.
Anifowose, F., Ewenla, A., and Eludiora, S. I. 2011. Prediction of Oil and Gas Res-
ervoir Properties Using Support Vector Machines. Presented at the International
Petroleum Technology Conference, Bangkok, Thailand, 15–17 November. IPTC-
14514-MS. https://doi.org/10.2523/IPTC-14514-MS.
Archenaa, J. and Mary Anita, E. A. 2015. A Survey of Big Data Analytics in Health-
care and Government. Procedia Comput Sci. 50 (2015): 408–413.
Arief, I. H., Forest, T., and Meisingset, K. K. 2017. Estimating Fluid Properties Using
Surrogate Models and Fluid Database. Presented at the SPE Europec, London,
UK, 3–6 June. SPE-185937-MS. https://doi.org/10.2118/185937-MS.
Armacanqui T., J. S., Eyzaguirre G., L. F., Prudencio B., G. et al. 2017. Improve-
ments in EOR Screening, Laboratory Flood Tests and Model Description to
Effectively Fast Track EOR Projects. Presented at the Abu Dhabi International
Petroleum Exhibition and Conference, Abu Dhabi, UAE, 13–16 November. SPE-
188926-MS. https://doi.org/10.2118/188926-MS.
Arps, J. J. 1945. Analysis of Decline Curves. In Transactions of the Society of Petro-
leum Engineers, Vol. 160, Number 1, 228–247. Richardson, Texas: SPE. SPE-
945228-G. https://doi.org/10.2118/945228-G.
Artus, V., Houze, O., and Chen, C.-C. 2019. Flow Regime-Based Decline Curve for
Unconventional Reservoirs: Generalization to Anomalous Diffusion and Power
Law Behavior. Presented at the Unconventional Resources Technology Confer-
ence, Denver, Colorado, USA, 22–24 July. URTEC-2019-293-MS. https://doi.org/
10.15530/urtec-2019-293.
Aulia, A. and Ibrahim, M. I. 2018. DCA-Based Application for Integrated Asset Man-
agement. Presented at the Offshore Technology Conference Asia, Kuala Lumpur,
Malaysia, 20–23 March. OTC-28308-MS. https://doi.org/10.4043/28308-MS.
Ballin, P. R., Shirzadi, S., and Ziegel, E. 2012. Waterflood Management Based on
Well Allocation Factors for Improved Sweep Efficiency: Model Based or Data
Based? Presented at the SPE Western Regional Meeting, Bakersfield, California,
USA, 21–23 March. SPE-153912-MS. https://doi.org/10.2118/153912-MS.
Bandyopadhyay, P. 2011. Improved Estimation of Bubble Point Pressure of Crude
Oils: Modeling by Regression Analysis. Presented at the SPE Annual Technical
Conference and Exhibition, Denver, Colorado, USA, 30 October–2 November.
SPE-152371-STU. https://doi.org/10.2118/152371-STU.
Batycky, R. P., Blunt, M. J., and Thiele, M. R. 1997. A 3D Field-Scale Streamline-Based Reservoir Simulator. SPE Res Eng 12 (4): 246–254.
Bertocco, R. and Padmanabhan, V. 2014. Big Data Analytics in Oil and Gas. Bain,
26 March 2014, https://www.bain.com/insights/big-data-analytics-in-oil-and-gas
(accessed 18 April 2019).
Bestagini, P., Lipari, V., and Tubaro, S. 2017. A Machine Learning Approach to
Facies Classification Using Well Logs. Presented at the 2017 SEG International
Exposition and Annual Meeting, Houston, Texas, 24–29 September. SEG-2017-
17729805. https://doi.org/10.1190/segam2017-17729805.1.
Bhark, E. and Dehghani, K. 2014. Assisted History Matching Benchmarking:
Design of Experiments-Based Techniques. Presented at the SPE Annual Technical
the SPE Western Regional Meeting, Garden Grove, California, USA, 27–30 April.
SPE-174055-MS. https://doi.org/10.2118/174055-MS.
He, J., Xie, J., Sarma, P. et al. 2015b. Model-Based A Priori Evaluation of Surveillance
Programs Effectiveness Using Proxies. Presented at the SPE Reservoir Simulation
Symposium, Houston, Texas, USA, 23–25 February. SPE-173229-MS. https://doi.
org/10.2118/173229-MS.
He, J., Xie, J., Wen, X.H. et al. 2016a. An Alternative Proxy for History Matching
Using Proxy-For-Data Approach and Reduced Order Modeling. J Petrol Sci Eng
146: 392–399.
He, J., Xie, J., Sarma, P. et al. 2016b. Proxy-Based Workflow for a Priori Evaluation
of Data-Acquisition Programs. SPE J. 21 (4): 1400–1412.
He, J., Sarma, P., Bhark, E. et al. 2017a. Quantifying Value of Information Using
Ensemble Variance Analysis. Presented at the SPE Reservoir Simulation Confer-
ence, Montgomery, Texas, USA, 20–22 February. SPE-182609-MS. https://doi.
org/10.2118/182609-MS.
He, J., Tanaka, S., Wen, X.H. et al. 2017b. Rapid S-Curve Update Using Ensemble
Variance Analysis with Model Validation. Presented at the SPE Western Regional
Meeting, Bakersfield, California, 23–27 April. SPE-185630-MS. https://doi.
org/10.2118/185630-MS.
He, J., Sarma, P., Bhark, E. et al. 2018. Quantifying Expected Uncertainty Reduc-
tion and Value of Information Using Ensemble-Variance Analysis. SPE J. 23 (2):
428–448. SPE-182609-PA. https://doi.org/ 10.2118/182609-PA.
Hinton, G., Deng, L., Yu, D. et al. 2012. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine 29 (6): 82–97.
Hoeink, T. and Zambrano, C. 2017. Shale Discrimination with Machine Learning
Methods. Presented at the 51st U.S. Rock Mechanics/Geomechanics Symposium,
San Francisco, California, USA, 25–28 June. ARMA-2017-0769.
Holanda, R. W. de, Gildin, E., and Jensen, J. L. 2015. Improved Waterflood Analysis Using the Capacitance-Resistance Model Within a Control Systems Framework.
Presented at the SPE Latin American and Caribbean Petroleum Engineering
Conference, Quito, Ecuador, 18–20 November. SPE-177106-MS. https://doi.
org/10.2118/177106-MS.
Holanda, R. W. de, Gildin, E., and Valkó, P. P. 2017. Combining Physics, Statistics and Heuristics in the Decline Curve Analysis of Large Datasets in Unconventional Reservoirs. SPE-185589-MS. https://doi.org/10.2118/185589-MS.
Holanda, R.W., Gildin, E., Jensen, J.L. et al. 2018. A State-of-the-Art Literature
Review on Capacitance Resistance Models for Reservoir Characterization and
Performance Forecasting. Energies 11 (12): 3368.
Holdaway, K. 2014. Harness Oil and Gas Big Data with Analytics: Optimize Explo-
ration and Production with Data-Driven Models, first edition. Hoboken, New
Jersey: Wiley & Sons.
Ilk, D., Rushing, J. A., Perego, A. D., and Blasingame, T. A. 2008. Exponential vs. Hyperbolic Decline in Tight Gas Sands: Understanding the Origin and Implications for Reserve Estimates Using Arps' Decline Curves. Presented at the SPE Annual Technical Conference and Exhibition, Denver, Colorado, USA, 21–24 September. SPE-116731-MS. https://doi.org/10.2118/116731-MS.
Jacobs, T. 2016. Devon Energy Rises to the Top as a Data-Driven Producer. J Pet Tech-
nol 68 (10): 28–29. SPE-1016-0028-JPT. https://doi.org/10.2118/1016-0028-JPT.
Jafarizadeh, B. and Bratvold, R. B. 2009. Strategic Decision Making in the Digital Oil
Field. Presented at the SPE Digital Energy Conference and Exhibition, Houston,
Texas, USA, 7–8 April. SPE-123213-MS. https://doi.org/10.2118/123213-MS.
Javadi, F. and Mohaghegh, S. D. 2015. Understanding the Impact of Rock Properties
and Completion Parameters on Estimated Ultimate Recovery in Shale. Presented
at the SPE Eastern Regional Meeting, Morgantown, West Virginia, USA, 13–15
October. SPE-177318-MS. https://doi.org/10.2118/177318-MS.
Journel, A. G. and Huijbregts, C. J. 1978. Mining Geostatistics, 7th edition. London:
Academic Press.
Kalantari Dahaghi, A. and Mohaghegh, S. D. 2009. Top-Down Intelligent Reservoir
Modeling of New Albany Shale. Presented at the SPE Eastern Regional Meeting,
Charleston, West Virginia, USA. 23–25 September. SPE-125859-MS. https://doi.org/
10.2118/125859-MS.
Kansao, R., Yrigoyen, A., Haris, Z. et al. 2017. Waterflood Performance Diagno-
sis and Optimization Using Data-Driven Predictive Analytical Techniques from
Capacitance Resistance Models (CRM). Presented at the 79th EAGE Annual Con-
ference and Exhibition, Paris, France 12 June. SPE-185813-MS. https://doi.org/
10.2118/185813-MS.
Kaushik, A., Kumar, V., Mishra, A. et al. 2017. Data Driven Analysis for Rapid and
Credible Decision Making: Heavy Oil Case Study. Presented at the Abu Dhabi
International Petroleum Exhibition & Conference, Abu Dhabi, UAE, 13–16
November. SPE-188635-MS. https://doi.org/10.2118/188635-MS.
Klie, H. 2013. Unlocking Fast Reservoir Predictions via Nonintrusive Reduced-
Order Models. Presented at the SPE Reservoir Simulation Symposium, The Wood-
lands, Texas, USA, 18–20 February. SPE-163584-MS. https://doi.org/10.2118/
163584-MS.
Klie, H. 2015. Physics-Based and Data-Driven Surrogates for Production Forecast-
ing. Presented at the SPE Reservoir Simulation Symposium, Houston, Texas,
USA, 23–25 February. SPE-173206-MS. https://doi.org/10.2118/173206-MS.
LaFollette, R., Holcomb, W. 2011. Practical Data Mining: Lessons Learned from the
Barnett Shale of North Texas. Presented at the SPE Hydraulic Fracturing Technol-
ogy Conference, 24–26 January, The Woodlands, Texas, USA. SPE-140524-MS.
https://doi.org/10.2118/140524-MS.
LaFollette, R., Holcomb, W. 2012. Practical Data Mining: Analysis of Barnett Shale
Production Results with Emphasis on Well Completion and Fracture Stimulation.
Presented at the SPE Hydraulic Fracturing Technology Conference, 6–8 February,
The Woodlands, Texas, USA. SPE-152531-MS. https://doi.org/10.2118/152531-MS.
Landa, J.L. and Güyagüler, B. 2003. A Methodology for History Matching and the
Assessment of Uncertainties Associated with Flow Prediction. Presented at the
SPE Annual Technical Conference & Exhibition, Denver, Colorado, 5–8 October.
SPE-84465-MS. https://doi.org/10.2118/84465-MS.
Lasater, J. A. 1958. Bubble Point Pressure Correlation. J Pet Technol 10 (5): 65–67.
SPE-957-G. https://doi.org/ 10.2118/957-G.
Lee, J., Rollins, J. B., and Spivey, J. P. 2003. Pressure Transient Testing, SPE Textbook Series, Vol. 9. Richardson, Texas: Society of Petroleum Engineers. ISBN: 978-1-55563-099-7.
B., Warren, M. 2017. Sampling a Stimulated Rock Volume: An Eagle Ford Exam-
ple. Presented at the SPE/AAPG/SEG Unconventional Resources Technology
Conference, Austin, Texas, USA, 24–26 July. URTEC-2670034-MS. https://doi.
org/10.15530/ URTEC-2017-2670034.
Reagan, M.T., Najm, H.N., Debusschere, B.J. et al. 2004. Spectral Stochastic Uncer-
tainty Quantification in Chemical Systems. Combust Theor Model 8 (3): 607–632.
Reed, R. D. and Marks, R. J. 1999. Neural Smithing: Supervised Learning in Feed-
forward Artificial Neural Networks, first edition. Cambridge, Massachusetts: The
MIT Press.
Ren, G., He, H., Wang, Z., Younis, R. M., and Wen, X.-H. 2019. Implementation of Physics-Based Data-Driven Models With a Commercial Simulator. Presented at the SPE Reservoir Simulation Conference, Galveston, Texas, USA, 10–11 April. SPE-193855-MS. https://doi.org/10.2118/193855-MS.
Rewienski, M. and White, J. 2003. A Trajectory Piecewise-Linear Approach to Model Order Reduction and Fast Simulation of Non-Linear Circuits and Micromachined Devices. IEEE T Comput Aid D 22 (2): 155–170.
Richardson, J., Yu, W., and Weijermars, R. 2016. Benchmarking Recovery Factors of Individual Wells Using a Probabilistic Model of Original Gas in Place to Pinpoint the Good, Bad and Ugly Producers. Presented at the SPE/AAPG/SEG Unconventional Resources Technology Conference, San Antonio, Texas, USA, 1–3 August. URTEC-2457581-MS. https://doi.org/10.15530/URTEC-2016-2457581.
Rollins, B. and Herrin, M. 2015. Finding the Key Drivers of Oil Production through
SAS Data Integration and Analysis. Presented at the Unconventional Resources
Technology Conference, 20–22 July, San Antonio, Texas, USA. URTEC-
2150079-MS. https://doi.org/10.15530/URTEC-2015-2150079.
Rousset, M., Huang, C. K., Klie, H. et al. 2014. Reduced-Order Modeling for Ther-
mal Recovery Processes. Comp Geo 18 (3–4): 401–415.
Saeedi, A., Camarda, K. V., and Liang, J.-T. 2006. Using Neural Networks for Candidate Selection and Well Performance Prediction in Water-Shutoff Treatments Using Polymer Gels—A Field-Case Study. Presented at the SPE Asia Pacific Oil and Gas Conference and Exhibition, Adelaide, Australia, 11–13 September. SPE-101028-MS. https://doi.org/10.2118/101028-MS.
Salazar-Bustamante, M., Gonzalez-Gomez, H., Matringe, S. F., and Castineira, D. 2012. Combining Decline-Curve Analysis and Capacitance/Resistance Models To Understand and Predict the Behavior of a Mature Naturally Fractured Carbonate Reservoir Under Gas Injection. SPE-153252-MS. https://doi.org/10.2118/153252-MS.
Sankaran, S., Lugo, J. T., Awasthi, A. et al. 2009. The Promise and Challenges of
Digital Oilfield Solutions: Lessons Learned from Global Implementations and
Future Directions. Presented at the SPE Digital Energy Conference and Exhi-
bition, Houston, Texas, USA, 7–8 April. SPE-122855-MS. https://doi.org/
10.2118/122855-MS.
Sankaran, S., Olise, M. O., Meinert, D., and Awasthi, A. 2011. Realizing Value from
Implementing i-field (TM) in Agbami—A Deepwater Greenfield in an Offshore
Nigeria Development. SPE Econ & Mgmt 3 (1): 31–44. SPE-127691-PA. https://
doi.org/10.2118/127691-PA.
Sankaran, S., Wright, D., Gamblin, H. et al. 2017. Creating Value by Implementing
an Integrated Production Surveillance and Optimization System—An Operator’s
Temizel, C., Salehian, M., Cinar, M. et al. 2018. A Theoretical and Practical Comparison of Capacitance-Resistance Modeling With Application to Mature Fields.
Presented at the SPE Kingdom of Saudi Arabia Annual Technical Symposium and
Exhibition, Dammam, Saudi Arabia, 23–26 April. SPE-192413-MS. https://doi.org/
10.2118/192413-MS.
Thiele, M. R. and Batycky, R. P. 2003. Water Injection Optimization Using a Stream-
line-Based Workflow. Presented at the SPE Annual Technical Conference and
Exhibition, Denver, Colorado, 5–8 October. SPE- 84080-MS. https://doi.org/
10.2118/84080-MS.
Thiele, M. R. and Batycky, R. P. 2006. Using Streamline-Derived Injection Effi-
ciencies for Improved Waterflood Management. SPEREE 9 (2). SPE-84080-PA.
https://doi.org/10.2118/84080-PA.
Tian, C. and Horne, R. N. 2015. Applying Machine Learning Techniques to Interpret
Flow Rate, Pressure and Temperature Data From Permanent Downhole Gauges.
Presented at the SPE Western Regional Meeting, Garden Grove, California, USA,
27–30 April. SPE-174034-MS. https://doi.org/10.2118/174034-MS.
Valkó, P.P. 2009. Assigning Value to Stimulation in the Barnett Shale—A Simultaneous
Analysis of 7000 plus Production Histories and Well Completion Records. Presented
at the 2009 SPE hydraulic Fracturing Technology Conference, The Woodlands, TX,
USA, 19–21 January. SPE-119369-MS. https://doi.org/10.2118/119369-MS.
Valkó, P. P. and McCain, W. D. 2003. Reservoir Oil Bubble Point Pressures Revisited;
Solution Gas–Oil Ratios and Surface Gas Specific Gravities. J Petrol Sci Eng 37
(3): 153–169.
Valle Tamayo, G. A., Romero Consuegra, F., Mendoza Vargas, L. F., and Osorio
Gonzalez, D. A. 2017. Empirical PVT Correlations Applied for Colombian Crude
Oils: A New Approach. Presented at the SPE Latin America and Caribbean
Petroleum Engineering Conference, 17–19 May, Buenos Aires, Argentina. SPE-
185565-MS. https://doi.org/10.2118/185565-MS.
Van Den Bosch, R. H. and Paiva, A. 2012. Benchmarking Unconventional Well
Performance Predictions. Presented at the SPE/EAGE European Unconventional
Resources Conference and Exhibition, Vienna, Austria, 20–22 March. SPE-
152489-MS. https://doi.org/10.2118/152489-MS.
van Doren, J. F. M., Markovinovic, R., and Jansen, J. D. 2006. Reduced-Order Opti-
mal Control of Water Flooding Using Proper Orthogonal Decomposition. Com-
putat Geosci 10 (1): 137–158.
Vasquez, M. and Beggs, H. D. 1980. Correlations for Fluid Physical Property Prediction. SPE-6719-PA. https://doi.org/10.2118/6719-PA.
Villarroel, G., Crosta, D., and Romero, C. 2017. Integration of Analytical Tools to
Obtain Reliable Production Forecasts for Quick Decision-Making. Presented
at the SPE Europec featured at 79th EAGE Conference and Exhibition, Paris,
France, 12–15 June. SPE-185818-MS. https://doi.org/10.2118/185818-MS.
Wantawin, M., Yu, W., Dachanuwattana, S. et al. 2017. An Iterative Response-Surface
Methodology by Use of High-Degree-Polynomial Proxy Models for Integrated
History Matching and Probabilistic Forecasting Applied to Shale-Gas Reservoirs.
SPE J. 22 (6): 2012–2031.
Weber, D. 2009. The Use of Capacitance-Resistance Models to Optimize Injection
Allocation and Well Location in Water Floods. PhD dissertation, University of
Texas, Austin, TX, USA.
Wen, X.-H., Deutsch, C. V., and Cullick, A. S. 2003. Inversion of Dynamic Production Data for Permeability: Fast Streamline-Based Computation of Sensitivity Coefficients of Fractional Flow. J Hydrol 281 (4): 296–312.
Wen, X.-H. and Chen, W. H. 2005. Real Time Reservoir Updating Using Ensemble Kal-
man Filter. Presented at the SPE Reservoir Simulation Symposium, The Woodlands,
Texas, 31 January–2 February. SPE-92991-MS. https://doi.org/10.2118/92991-MS.
Wen, X.-H. and Chen, W. H. 2007. Some Practical Issues on Real Time Reservoir Updating Using Ensemble Kalman Filter. SPE J. 12 (2): 156–166. SPE-111571-PA.
https://doi.org/10.2118/111571-PA.
Whitson, C. H. and Brulé, M. R. 2000. Phase Behavior, Vol. 20. Richardson, Texas:
Society of Petroleum Engineers.
Wicker, J., Courtier, J., and Curth, P. 2016. Multivariate Analytics of Seismic Inver-
sion Products to Predict Horizontal Production in the Wolfcamp Formation of the
Midland Basin. Presented at the SPE/AAPG/SEG Unconventional Resources Tech-
nology Conference, San Antonio, Texas, USA, 1–3 August. URTEC-2449798-MS.
https://doi.org/10.15530/URTEC-2016-2449798.
Wicker, J., Courtier, J., Gray, D., Jeffers, T., and Trowbridge, S. 2017. Improving Well
Designs and Completion Strategies Utilizing Multivariate Analysis. Presented
at the SPE/AAPG/SEG Unconventional Resources Technology Conference, Aus-
tin, Texas, USA, 24–26 July. URTEC-2693211-MS. https://doi.org/10.15530/
URTEC-2017-2693211.
Wilson, A. 2015. Creating Value With Permanent Downhole Gauges in Tight Gas
Appraisal Wells. J Pet Technol 67 (2): 112–115. SPE-0215-0112-JPT. https://doi.org/
10.2118/0215-0112-JPT.
Wilson, A. 2017. Drill and Learn: A Decision-Making Work Flow To Quantify Value
of Learning. J Pet Technol 69 (4): 95–96. SPE-0417-0095-JPT. https://doi.org/
10.2118/0417-0095-JPT.
Wuthrich, M.V. and Buser, C. 2017. Data Analytics for Non-Life Insurance Pricing.
Research Paper No. 16–68, Swiss Finance Institute, Geneva, Switzerland (Octo-
ber 2017).
Xie, J., Gupta, N., King, M. J. et al. 2012. Depth of Investigation and Depletion
Behavior in Unconventional Reservoirs Using Fast Marching Methods. Presented
at the SPE Europec/EAGE Annual Conference, Copenhagen, Denmark, 4–7 June.
SPE-154532-MS. https://doi.org/10.2118/154532-MS.
Xiu, D. and Karniadakis, G. E. 2002. Modeling Uncertainty in Steady State Diffusion
Problems via Generalized Polynomial Chaos. Comput Method Appl M 191 (43):
4927–4948.
Yalgin, G., Zarepakzad, N., Artun, E. et al. 2018. Design and Development of Data-Driven Screening Tools for Enhanced Oil Recovery Processes. Presented at the SPE Western Regional Meeting, Garden Grove, California, USA, 22–26 April. SPE-190028-MS. https://doi.org/10.2118/190028-MS.
Yang, T., Basquet, R., Callejon, A. et al. 2014. Shale PVT Estimation Based on
Readily Available Field Data. Presented at the SPE/AAPG/SEG Unconventional
Resources Technology Conference, Denver, Colorado, 25–27 August. URTEC-
1884129-MS. https://doi.org/10.15530/URTEC-2014-1884129.
Yang, T., Arief, I. H., Niemann, M., Houbiers, M. 2019a. Reservoir Fluid Data Acqui-
sition Using Advanced Mud Logging Gas in Shale Reservoirs. Presented at the
Business Objectives. Every model is built for a defined purpose or objective, which should primarily drive the modeling approach. For example, when well placement optimization is the primary objective, a material balance model might not be an appropriate modeling strategy.
The business objectives could be classified as follows:
Predictive—Aimed at predicting future outcomes or what would happen. For example, predict the EUR of a well from the first few months of observed production (a prediction problem), or predict whether water breakthrough will occur within the next month (a classification problem).
Explanative—Aimed at understanding why something happens by using plausible mechanisms to match outcome data in a well-defined manner. For example, explain why water breaks through prematurely in producer wells when water is injected into certain injection wells.
Illustrative—Aimed at showing a mechanism or idea clearly to understand how it happens. For example, show that the water-blocking effect is observed in flow through hydraulically fractured horizontal wells.
System Analysis. Proper understanding and delineation of the system (or the sub-
systems) being modeled are essential initial steps in the model scoping process. If
the underlying physical mechanisms can be modeled explicitly with known model
parameters in a timely manner, then this might obviate the need for data-driven mod-
els. Therefore, it is important to understand the limitations or the driving factors for
a data-driven model in the first place. The acceptance of a model is thus guided by
its “usefulness” rather than the “truth.” Having domain knowledge or partial under-
standing of the physics can also be helpful in designing features that can assist or
speed up the machine learning model.
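For instance, partial physical understanding can be encoded directly as engineered features. The short Python sketch below is illustrative only; the column names and values are assumptions, not from the original text. It derives a productivity-index-like feature from hypothetical rate and pressure data, which a machine learning model can often exploit more readily than the raw measurements.

```python
import pandas as pd

# Hypothetical well data; the column names and values are illustrative assumptions.
df = pd.DataFrame({
    "oil_rate_stb_d": [850.0, 790.0, 765.0, 720.0],
    "reservoir_pressure_psi": [4200.0, 4150.0, 4100.0, 4060.0],
    "flowing_bhp_psi": [2900.0, 2880.0, 2860.0, 2850.0],
})

# Physics-guided features: pressure drawdown and a productivity-index-like ratio.
# Encoding these known relationships explicitly can help a machine learning model
# learn faster than it would from the raw pressures and rates alone.
df["drawdown_psi"] = df["reservoir_pressure_psi"] - df["flowing_bhp_psi"]
df["pi_stb_d_per_psi"] = df["oil_rate_stb_d"] / df["drawdown_psi"]

print(df[["drawdown_psi", "pi_stb_d_per_psi"]])
```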
Data Analysis. The next phase of the process dives into a preliminary analysis of the data set to better understand the problem at hand. It is important to ensure reproducibility of the entire analysis in data analytics projects; therefore, original versions of the raw data must be kept separate from the cleaned-up data set (and its versions).
Exploratory Data Analysis. After the data have been gathered for the study, an ini-
tial exploratory data analysis starts, which is typically aimed at understanding the
data set as much as possible before modeling it. The objective of this phase is for the
modelers to familiarize themselves with each variable, understand the relationships
between the variables, visualize the distributions, and identify potential gaps or out-
liers. During this phase, a series of descriptive statistics are calculated, and a series
of displays are generated to visualize the content of the data. This step is critical for both descriptive and diagnostic analytics. Following this initial
investigation, the data set is transformed in several ways before the application of
the machine learning algorithms (Mishra and Datta-Gupta 2017).
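As a minimal illustration of this phase, the following Python sketch computes descriptive statistics, distributions, and pairwise correlations on a synthetic stand-in data set; the column names and values are assumptions made solely for demonstration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for a well-level data set; column names are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "lateral_length_ft": rng.uniform(4000, 10000, 200),
    "proppant_lb_per_ft": rng.uniform(1000, 3000, 200),
    "porosity_frac": rng.normal(0.08, 0.01, 200),
})
df["eur_mboe"] = 0.05 * df["lateral_length_ft"] + 0.1 * df["proppant_lb_per_ft"] \
    + rng.normal(0, 50, 200)

# Descriptive statistics: count, mean, spread, and quartiles for every variable.
print(df.describe())

# Distributions of each variable, to spot skewness, gaps, and outliers.
df.hist(bins=30, figsize=(10, 6))
plt.tight_layout()
plt.show()

# Pairwise linear correlations between the variables.
print(df.corr())
```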
Understanding the data available for modeling starts with defining the type of
data—namely, structured or unstructured. Structured data are defined by columns
and data instances with well-defined data types, whether they are numerical or cat-
egorical. Unstructured data are mostly in the form of text, image, sound, or other
such formats.
Data Availability. When limited data are available or the process often takes excursions outside the historical training data range, (full or reduced) physics-based models are generally more reliable and accurate. On the other hand, when substantial
relevant data are available, and the process is conducive to data-driven modeling,
machine learning models can be quite useful and often outperform physical models.
The choice of machine learning models is also dependent on the amount of data
available for training purposes.
The size of training and validation data sets is quite important in the choice of
modeling approach and meeting desired modeling accuracy levels. For example, if
two different algorithms are used that have very close accuracy levels, it might not
be possible to detect the difference with small training data sets. For validation data
sets, a popular heuristic is to use 30% of the available data for testing purposes. In general, the availability of labeled data sets for classification applications in reservoir engineering is limited, even though such labels are a necessity for supervised learning algorithms.
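A minimal sketch of the 30% hold-out heuristic, using scikit-learn on synthetic data, is shown below; the feature matrix and target are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a small reservoir engineering data set
# (e.g., completion parameters as features, EUR as target); purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, -0.3, 0.0, 2.0]) + rng.normal(scale=0.1, size=200)

# Hold out 30% of the data for testing, following the heuristic mentioned above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
print(X_train.shape, X_test.shape)  # (140, 5) (60, 5)
```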
and the clocks are adjusted to daylight saving time). Readings can be periodic or
aperiodic (e.g., the process temperature might be recorded every 15 seconds while an
analyzer reading might be acquired every 120 minutes and a sample might be taken
once or twice a week, usually in the morning shift).
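Readings acquired at different frequencies usually need to be aligned on a common time base before modeling. The sketch below shows one possible way to do this with pandas; the tag names, values, and the hourly resampling interval are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical high-frequency temperature tag (every 15 seconds) and a sparse
# analyzer reading (every 120 minutes); names and values are assumptions.
idx_fast = pd.date_range("2020-01-01", periods=8, freq="15s")
temp = pd.Series([80.1, 80.2, 80.2, 80.3, 80.1, 80.0, 80.2, 80.3], index=idx_fast)

idx_slow = pd.date_range("2020-01-01", periods=2, freq="120min")
analyzer = pd.Series([0.92, 0.95], index=idx_slow)

# Align both tags on an hourly basis: average the fast tag and forward-fill the slow one.
hourly = pd.DataFrame({
    "temp_avg": temp.resample("1h").mean(),
    "analyzer": analyzer.resample("1h").ffill(),
})
print(hourly)
```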
The user will need to understand the tag naming conventions to ensure that the correct measurements are retrieved (Sidahmed et al. 2014, 2015). There might be
a choice of process historian that is queried to recover the data; a mirror database
might be preferred rather than the active primary historian, to avoid overloading the
primary application. The user will also need to understand the units of measure of
the recorded data. This ensures that the range of data (interpolation/extrapolation)
is chosen appropriately for the model purpose. This is a key step to ensure model
fidelity.
Stationary data (steady state)—Derived from process instrumentation and subject
to electronic and process noise, drift, and other forms of calibration error, stationary
data do not depend on the time that the reading is acquired because they are not
changing with time.
Fixed data—These are attributes of the system rather than of the operation of the
system (e.g., landing depth of a well, maximum flow coefficient of a valve).
Categorical data—The above data types are mostly numeric in nature, but certain
data types can be divided into groups, commonly called categorical data. For exam-
ple, a well type can be divided into vertical well, horizontal well, or slanted well; an
aquifer can be classified as infinite acting, strong, weak, or no aquifer.
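Categorical variables such as these are typically encoded numerically before modeling. A minimal sketch using one-hot encoding is shown below; the labels are hypothetical and simply follow the examples in the text.

```python
import pandas as pd

# Hypothetical categorical attributes; the labels follow the examples in the text.
df = pd.DataFrame({
    "well_type": ["vertical", "horizontal", "slanted", "horizontal"],
    "aquifer": ["strong", "weak", "no aquifer", "infinite acting"],
})

# One-hot encode the categories so they can be used by numeric learning algorithms.
encoded = pd.get_dummies(df, columns=["well_type", "aquifer"])
print(encoded)
```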
Data redundancy—The question of redundant information arises when identi-
fying the tags to retrieve. Redundancy takes several forms (Sidahmed et al. 2015):
• Temporal redundancy—much as with stationary data, the data can be improved by acquiring a higher frequency or number of measurements from a single measurement tag.
• Point measurement redundancy—the designer intended that a single process
measurement be acquired by more than one instrument, usually for the pur-
pose of ensuring either that the measurement was acquired or that a more reli-
able value was recorded/acted upon through some voting system (e.g., a trip
system requiring two out of three measurements to exceed a threshold, rather
than having a single gauge provide a reading above the threshold value).
• Model-based redundancy—a relationship known to be true can be exploited
to provide a value or a confirmation of a value (e.g., a system known to con-
tain a pure component can provide an estimate of a state variable such as tem-
perature from a measurement of another state such as pressure). This can be
extended to multiple measurements and multiple relationships (e.g., a whole
process mass balance and a process simulation can collaborate to estimate the
error in every gauge measurement).
Data often need to be curated using a number of methods—e.g., missing value imputation, defining limits of a variable, preprocessing (scaling), fea-
ture engineering, outlier removal, variable transformation, and cross correlation.
Data cleansing—Most data-driven algorithms are sensitive to outliers and missing
data. It is therefore important to prepare the data set accordingly. A first step in the
analysis is often to identify entries that are clearly erroneous. This is often a straightforward statistical exercise that detects values that fall outside of a physically plausible range.
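A minimal sketch of such range-based cleansing and simple imputation is shown below; the column names, physical limits, and interpolation strategy are illustrative assumptions and would be chosen case by case in practice.

```python
import numpy as np
import pandas as pd

# Hypothetical daily production data with a spurious negative rate and a gap.
df = pd.DataFrame({
    "oil_rate_stb_d": [550.0, 545.0, -999.0, 530.0, np.nan, 520.0],
    "water_cut_frac": [0.10, 0.11, 0.12, 1.70, 0.13, 0.14],
})

# Flag values outside an assumed physically plausible range and treat them as missing.
limits = {"oil_rate_stb_d": (0.0, 50_000.0), "water_cut_frac": (0.0, 1.0)}
for col, (lo, hi) in limits.items():
    df.loc[(df[col] < lo) | (df[col] > hi), col] = np.nan

# Simple imputation by linear interpolation; other strategies (median fill,
# model-based imputation) may be preferable depending on the variable.
df_clean = df.interpolate(limit_direction="both")
print(df_clean)
```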
Learning Model Selection. A variety of machine learning models have been applied
to several reservoir engineering applications. The simplest model possible is a mul-
tivariate linear regression. Although the model itself is linear, the preliminary data
transformation allows for a model that is effectively nonlinear. It is also possible to
model the problem in log space, which provides a multiplicative rather than an addi-
tive model. A preprocessing algorithm known as alternating conditional expectation
is sometimes used to identify potential data transformation that leads to a superior
fit. More-advanced regression algorithms such as multivariate adaptive regressive
splines [a nonparametric regression technique that automatically models nonlineari-
ties and interactions between variables (Friedman 1991)] have also gained popular-
ity over the past few years.
More-advanced models are also routinely used. Neural networks have long been a preferred choice because of their well-known universal approximation capability. For the most part, the oil industry has so far been using simple feed-forward
back-propagation algorithms for these applications. These models perform well, but
iterations on the network architecture are sometimes necessary, which makes these
models less attractive than simpler algorithms.
Random forest is an algorithm that has gained tremendous popularity recently
because of its ease of use. Random forest is essentially a model that combines the
learning of an ensemble of decision trees, each decision tree providing a different
classification of the data set with an associated estimate for each class. The resulting
algorithm is a robust nonlinear interpolation algorithm, but the model is very poor
at extrapolation in its standard form. Other algorithms such as auto-regressive inte-
grated moving average (ARIMA), finite impulse response (FIR), Box-Jenkins (BJ),
support vector machines, Bayesian belief networks, boosted trees, or convolutional
networks have also been used, although less frequently. Note that ARIMA, FIR, and
BJ methods are applied for time series modeling.
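As a hedged illustration of two of these choices, the sketch below fits a multivariate linear regression in log space (yielding a multiplicative model) and a random forest to the same synthetic data set; the data-generating relationship and parameters are assumptions made purely for demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic multiplicative relationship (loosely mimicking EUR vs. completion
# parameters); purely illustrative, not field data.
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 10.0, size=(500, 3))
y = 2.0 * X[:, 0] ** 0.8 * X[:, 1] ** 0.3 * np.exp(rng.normal(scale=0.1, size=500))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Multivariate linear regression in log space yields a multiplicative model.
lin = LinearRegression().fit(np.log(X_tr), np.log(y_tr))
pred_lin = np.exp(lin.predict(np.log(X_te)))

# Random forest: a robust nonlinear interpolator, but weak at extrapolation.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred_rf = rf.predict(X_te)

print("log-linear R2:   ", round(r2_score(y_te, pred_lin), 3))
print("random forest R2:", round(r2_score(y_te, pred_rf), 3))
```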
Model Calibration. Machine learning models have the capacity to fit the data set
extremely accurately. If applied without control, they can learn the data set completely and simply regurgitate the training values perfectly. Such overfitted models have
little predictive accuracy. The most common way to guarantee that machine learning
algorithms learn the key trends in the data sets rather than simply memorize the
answer is to withhold a portion of the data for validation purposes. Fig. A-1 illus-
trates the typical behavior of model complexity vs. error, where it is desirable to select
the optimal model complexity to avoid underfitting or overfitting.
Fig. A-1—Model error vs. number of model parameters. Bias error decreases and variance error increases with model complexity; the total model error is minimized at an optimal model complexity between the underfitting and overfitting regimes.
Therefore, the data set is split into training, development, and test sets. As training
progresses, the modeling error for the training set and the development set should
decrease. When the development set error starts to increase, the model enters a phase
of memorization and the training should be stopped.
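One possible realization of this train/development/test workflow is sketched below, using a gradient-boosted regressor as a stand-in model and selecting the number of boosting stages at which the development-set error is lowest; the data and model settings are illustrative assumptions, not a prescribed recipe.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a reservoir engineering regression problem.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 6))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=1000)

# Split into training, development (validation), and test sets (60/20/20 here).
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0
)

# Train a deliberately large model, then monitor the development-set error per stage.
model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, random_state=0)
model.fit(X_train, y_train)

dev_errors = [mean_squared_error(y_dev, pred) for pred in model.staged_predict(X_dev)]
best_stage = int(np.argmin(dev_errors)) + 1  # stop where the development error is lowest

print("boosting stages at minimum development error:", best_stage)
print("test-set MSE at that stage:",
      mean_squared_error(y_test, list(model.staged_predict(X_test))[best_stage - 1]))
```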