
SPE-173445-MS

The Data Reservoir: How Big Data Technologies Advance Data Management and Analytics in E&P
M.R. Brulé, IBM Software Group

Copyright 2015, Society of Petroleum Engineers

This paper was prepared for presentation at the SPE Digital Energy Conference and Exhibition held in The Woodlands, Texas, USA, 3–5 March 2015.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract
Big Data Analytics has steadily gained momentum in the upstream E&P industry. Much of the attention
has been on advancing data-driven methods including empirical statistical and stochastic approaches, and
especially artificial neural networks (ANN). The focus has been on the particular analytics method used
rather than on the management, governance, and refinement of the data used in models. Studies conducted
through the SPE and by global E&P companies have confirmed that data management is a major problem
in the oil & gas industry. They have clearly established that over half of an engineer's or geoscientist's
time is spent just finding and assembling data before multidisciplinary analysis even begins (Brulé
et al. 2009). Because Big Data Analytics encompasses the four V's of data: Volume, Velocity, Variety,
and Veracity, the complexity of managing the data has increased substantially and will become even more
of a deterrent to performing analytics. The strategy for collecting, streaming, storing, transporting,
cleansing, and securing the data has become just as important as the analytic methods. Promising Big Data
management and governance concepts continue to evolve. Among the newest is the “Data Lake,” a
massively scalable “landing zone” for semistructured and unstructured data of any type, format, or
schema, implemented through Hadoop, other NoSQL, and SQL technologies. This paper explores the
Data Lake for E&P and how its implementation and refinement into an E&P Data Reservoir can be
achieved by combining Big Data technologies with industry data standards and other petrotechnical technologies.
Introduction – General Data Reservoir Concepts
The Data Reservoir has been generalized across industries, and the resulting high-level architecture of a
data reservoir has been developed (Chessell et al. 2014). A few key concepts are summarized here. The
Big Data trend has led organizations to consider a data lake solution. A data lake is a set of one or more
data repositories that have been created to support data discovery, reporting, analytics, and ad hoc
investigation (Chessell et al. 2014). The data lake contains data from many different sources. People in
the organization are free to add and access data from the data lake. Without data management, cleansing,
integration, and governance, a data lake can become a data swamp. Data swamps cannot be guaranteed
to be reliable in quality, lineage, and security. Here we use the term data reservoir to refer to a data lake
that is reliable and useful because it has been built with management and governance, including
traceability, cleansing, and integration capabilities.

Figure 1—Data Reservoir Conceptual Architecture Supporting Three Types of Analytics

Data Reservoir for E&P


Because of the limitations of traditional information technologies, only a small fraction of the data
acquired by sensors, logs, and other sources is actually analyzed, resulting in substantial operational and
economic loss to the industry. Only a fraction of engineering and operations data fit neatly into the
conventional databases widely used for data management in E&P. Several alternative industry data
standards (e.g., Energistics, ISA, ISO, MIMOSA) have been progressing to address related problems
in separate domains, but they require new storage and analytics approaches (Brulé 2010; Crawford &
Morneau 2010).
A new architectural paradigm for building Big Data solutions, called data-centric computing, is
evolving. A data-centric architecture implies that analytics be "brought to the data" to minimize data
movement or, more often in E&P, that data persistence be avoided to minimize latency (Brulé 2013).
Figure 1 shows the conceptual architecture supporting three types of analytics.
The data reservoir must accommodate a wide range of use cases ranging from traditional reporting to
predictive analytics. These use cases can be characterized by the three analytics patterns shown in Figure
2 and Table 1.
The first is typified by relational databases in integrated or federated hub-and-spoke configurations;
Big Data scalability for several business-oriented E&P domains is achieved through MPP data warehouses
storing structured data (Brulé 2009). The second use-case pattern is based on MPP streaming analytics
for data-in-motion and includes structured, semistructured, and unstructured data (Brulé 2013); a minimal
sketch follows. The third analytics pattern also applies to all types of data, but for data-at-rest in the
data reservoir.
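To make the data-in-motion pattern concrete, here is a minimal sketch, assuming a simple rolling-statistic score applied to each reading as it arrives and before anything is persisted. The feed, window length, and threshold are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the second use-case pattern (analytics on data-in-motion):
# a sliding-window statistic computed on readings as they arrive.
from collections import deque

def sliding_mean(stream, window=60):
    """Yield (reading, rolling mean) for each incoming reading."""
    buf = deque(maxlen=window)
    for reading in stream:
        buf.append(reading)
        yield reading, sum(buf) / len(buf)

# Usage: compare each live reading against its recent average and flag drift.
live_feed = iter([101.2, 101.4, 150.9, 101.3])   # stand-in for a sensor feed
for value, mean in sliding_mean(live_feed, window=3):
    if abs(value - mean) > 10:                   # illustrative threshold
        print(f"flag: {value} deviates from rolling mean {mean:.1f}")
```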
Figure 2—Three Basic E&P Use-Case Patterns

Table 1—Three Use-Case Patterns Based on Underlying Big Data Technology

Another key requirement is the need to support multiple industry standards that are not based on a
relational data model. A Data Reservoir must be capable of handling any Oil & Gas XML schema, making
it well suited for housing PRODML, WITSML, or any other semistructured and unstructured data. Such a
repository can also handle WITSML data from drilling and completions, as well as any other form of data,
be it relational, time-series, textual, or video. The approach must provide an E&P operating company the
capabilities for multiple solutions, including production and inflow monitoring, gas-lift monitoring and
optimization, well-integrity monitoring, injection monitoring and profiling, ESP monitoring and
optimization, real-time fracturing monitoring, and several others.
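As an illustration of this schema-on-read landing, the following hedged sketch parses a WITSML-style XML fragment and stores it as a document-store record, keeping the raw payload alongside a few extracted index keys. The fragment and element names are simplified assumptions; real WITSML objects carry XML namespaces and many more elements.

```python
# Sketch: land a WITSML-style XML document in a schema-less repository,
# keeping the raw payload and extracting a few index fields for lookup.
import json
import xml.etree.ElementTree as ET

WITSML_DOC = """<logs><log uidWell="W-001" uidWellbore="WB-01">
  <nameWell>Example Well</nameWell>
  <indexType>measured depth</indexType>
</log></logs>"""

def land_document(xml_text):
    """Return a document-store record: raw payload plus extracted index keys."""
    log = ET.fromstring(xml_text).find("log")
    return {
        "well_uid": log.get("uidWell"),
        "wellbore_uid": log.get("uidWellbore"),
        "well_name": log.findtext("nameWell"),
        "raw_xml": xml_text,   # schema-on-read: keep the full payload
    }

print(json.dumps(land_document(WITSML_DOC), indent=2))
```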
In the following sections we review two examples of high-value use cases.

Real-Time Surveillance, Analysis, and Optimization for Drilling and Production with DTS
Drilling and production surveillance, analysis, and optimization now include vibrational and acoustic data
measured at high frequency through new sensing technologies such as distributed acoustic sensing (DAS),
distributed temperature sensing (DTS), and downhole pressure surveying with permanent downhole
gauges. Such high-density measurements can be recorded every tenth of a foot, even along a horizontal well.
Early symptom detection is aimed at real-time evaluation of downhole conditions. Predictive analytics
capabilities are emerging that can help drillers avoid unsafe and costly incidents such as stuck pipe,
stick-slip, and other problems well in advance of their occurrence. Stream computing does not provide the
models; it provides a new computing infrastructure that allows complex models to be run on the rig
or in the field, where the data are being generated, without concern for data scalability, model complexity,
bandwidth, footprint, and other previous barriers to real-time drilling and production optimization and
automation.
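As a sketch of how such edge scoring might look, the following monitor flags severe stick-slip using a common severity proxy: the ratio of surface-RPM standard deviation to mean RPM over a short window. This is an illustrative stand-in, not the paper's model; the window length and severity limit are assumptions.

```python
# Sketch: data-centric (at the rig) stick-slip screening on surface RPM.
import statistics
from collections import deque

class StickSlipMonitor:
    """Flags severe torsional oscillation from a stream of surface RPM samples."""

    def __init__(self, window=30, severity_limit=0.25):
        self.rpm = deque(maxlen=window)
        self.severity_limit = severity_limit  # illustrative assumption

    def update(self, surface_rpm):
        """Ingest one RPM sample; return True when oscillation looks severe."""
        self.rpm.append(surface_rpm)
        if len(self.rpm) < self.rpm.maxlen:
            return False                      # not enough history yet
        mean = statistics.fmean(self.rpm)
        severity = statistics.pstdev(self.rpm) / mean if mean else 0.0
        return severity > self.severity_limit
```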
Problem:
• Lost production, high OpEx, increased downtime
• Many remote and decentralized sensor units are difficult to manage and track
• Large amounts of sensor data must be preprocessed and managed at multiple remote sites
• Data must be moved from remote sites to a central location, often over poor communications connections
• Much fiber-optics data is generated at great expense but not well used
• The inventory of light boxes is hard to manage
• Data cannot be easily fed to applications
• Operators are not sure what changes to make when operational situations become complex
• Many terabytes of multidisciplinary data, e.g., structured and semistructured production data and unstructured geologic data, must be analyzed and correlated
• Traditional tagged, flat-file historians are inherently limited as a surveillance and automation platform
• Right-time production optimization and reservoir management strategies need to be integrated to achieve Closed-Loop Reservoir Management
Top-level requirements for API-enabled DTS include:
• New light boxes would link to the repository through a PRODML transport mechanism (a transport sketch follows this list)
• The repository would feed various oilfield data-logging applications, as well as surveillance and analysis tools
• These tools would receive time-series historian and other data directly from dedicated interfaces
• A portal-based application can be used to administer the repository, accommodating standard administrative users as well as power users
• The application must be capable of managing fiber-optic inventory
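A minimal sketch of the first requirement, assuming a plain HTTP binding: the light box posts an XML measurement document to a repository endpoint. The URL, payload shape, and element names are placeholders; a real deployment would follow the PRODML DTS object schemas and their standard transport.

```python
# Sketch: a light box (DTS interrogator) posting an XML measurement
# document to the repository over HTTP. Endpoint and payload are placeholders.
import urllib.request

DTS_PAYLOAD = b"""<dtsMeasurement uid="dts-0001">
  <installedSystemRef>fiber-A</installedSystemRef>
</dtsMeasurement>"""

def post_measurement(payload, url="http://repository.example/prodml/dts"):
    """POST one measurement document; return the HTTP status code."""
    req = urllib.request.Request(
        url, data=payload,
        headers={"Content-Type": "application/xml"}, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status   # 2xx expected on a successful landing
```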
The solution approach shown in Figure 3 combines fast analytics and deep analytics on a fit-for-purpose Big Data Analytics platform.

Figure 3—DTS Fiber Optics Setup

• Use real-time streaming analytics, data-centric and in the field, to identify problems and filter erroneous data from fiber-optics sensors before the data enter control loops or the historian (a despiking sketch follows this list)
• Use BigInsights Hadoop as the storage and analytics platform that performs the scheduled and ad hoc analytics; speed of execution is increased while costs are reduced
• Refine models and push updated models back to the field at the speed of the operation, not in a siloed and disconnected fashion
• Improvements in a single well's yield will pay the entire IT project's budget for 10+ years
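A minimal despiking sketch for the first bullet above, assuming a rolling-median filter is an acceptable cleansing step: obviously erroneous readings are replaced before they reach control loops or the historian. The window size and tolerance are illustrative assumptions.

```python
# Sketch: despike fiber-optics readings in motion with a rolling median,
# so spurious values never reach the control loop or the historian.
from collections import deque
from statistics import median

def despike(stream, window=11, tolerance=5.0):
    """Yield readings, replacing outliers with the rolling median."""
    buf = deque(maxlen=window)
    for x in stream:
        buf.append(x)
        m = median(buf)
        yield m if abs(x - m) > tolerance else x

# Usage: the spike at 250.0 is replaced by the local median.
print(list(despike([100.1, 100.3, 250.0, 100.2], window=5, tolerance=5.0)))
```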
The Big Data Analytics platform with a Data Reservoir can progress over a roadmap toward production
automation. In a real-time analytics environment, fiber-optics data can be scored with complex models
that are then used to tune the production process. Benefits include:
• More effective use of fiber-optics data (DTS, DAS, DPS, DSS, etc.) for production optimization
• Detection of behind-pipe leaks and compositional gradients with speed and accuracy beyond conventional methods (a screening sketch follows this list)
• Monitoring of fracturing profiles, injection profiles, flood fronts, etc. to determine the production health of the oilfield

Figure 4—Pipeline Optimization Architecture, Combining Flow and Corrosion Characteristics

Pipeline Optimization Use Case


Production optimization facilitated with Big Data & Analytics embodies new methods for operations
improvement that augment the traditional multidomain physics-based models that have been in use in the
industry for decades. Oil & gas companies are currently saddled with unprecedented capital commitments
and are looking for ways to increase production from existing assets while reducing operating expenses
and maintaining environmental and safety standards.
The oil & gas industry has led in theory-based modeling & simulation, with the latest generation
focused on Closed-Loop Reservoir Management (CLRM) and Integrated Asset Modeling (IAM). The
modern information-oriented architecture for the upstream E&P industry combines theoretical CLRM and
IAM with empirical methods in data mining and predictive analytics. Figure 4 presents an illustrative case.
Process upsets can occur if a pipeline gas condenses liquids that cause erratic flow and slugging.
Theory-based equations of state can be used to avoid the pressure, temperature, and composition
conditions at which the gas enters the two-phase region. At the same time, empirical methods of data
mining can be used to avoid the conditions under which pipeline corrosion accelerates, potentially leading
to a catastrophic pipeline failure. The two approaches in concert (physics and empirics) can provide a
game-changing approach to optimize operations and avoid events leading to safety incidents (Brulé & Fair
2014).
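The following sketch illustrates the two approaches in concert as a simple operating-point gate. Both models are hypothetical stand-ins: the linear dew-point boundary stands in for an equation-of-state flash, and the linear corrosion surrogate stands in for a data-mined model; all coefficients and limits are invented for illustration.

```python
# Sketch: a candidate operating point must pass both a theory-based phase
# check and an empirically learned corrosion limit before it is accepted.

def inside_two_phase_region(pressure_psia, temp_f):
    """Stand-in for an equation-of-state flash: a fitted dew-point boundary."""
    dew_point_temp = 0.05 * pressure_psia + 20.0   # hypothetical fit
    return temp_f < dew_point_temp                  # below dew point: liquids drop out

def corrosion_rate_mpy(velocity_fps, co2_frac):
    """Stand-in for a data-mined corrosion model (e.g., a trained ANN)."""
    return 1.2 * velocity_fps + 80.0 * co2_frac     # hypothetical surrogate

def acceptable(pressure_psia, temp_f, velocity_fps, co2_frac, max_mpy=5.0):
    """Physics and empirics in concert: both constraints must hold."""
    return (not inside_two_phase_region(pressure_psia, temp_f)
            and corrosion_rate_mpy(velocity_fps, co2_frac) <= max_mpy)

print(acceptable(1000.0, 90.0, 3.0, 0.01))  # True under these made-up numbers
```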
Solution Approach
Create an Engineering Data Warehouse (relational or Hadoop) from various source systems, as a
complement to the field data historian, to bring data together and support a combined theoretical-modeling
and empirical-analytics environment. The solution approach shown in Figure 4 involves the following steps:
• Predictive analytics beyond surveillance: Score real-time data against a model running in Streams to detect anomalies outside expected conditions. These conditions follow physics-based models and go beyond merely monitoring primary variables such as weight on bit and rate of penetration.
• 'Modeling plus Mining' strategy: Augment the industry's traditional physics-based modeling and simulation methods (Integrated Asset Modeling) with empirical data mining and AI predictive analytics based on statistical and stochastic methods. The physics-based model can be used to determine phase behavior and how to optimize flow characteristics (e.g., OLGA). Simultaneously, an ANN model can be used to predict how to slow corrosion rates (a training sketch follows this list).

• Improve performance and scalability with MPP technologies across all three analytics types (Figures 1 and 2).
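As a sketch of the empirical half of 'Modeling plus Mining', the following fits a small ANN (scikit-learn's MLPRegressor) to synthetic operating data and predicts a corrosion rate. The features, data, and units are assumptions for illustration only; a real model would be trained on historical pipeline measurements.

```python
# Sketch: train a small ANN on (flow velocity, CO2 fraction) -> corrosion rate.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic training set: velocity in ft/s, CO2 mole fraction.
X = rng.uniform([1.0, 0.0], [15.0, 0.05], size=(500, 2))
y = 1.2 * X[:, 0] + 80.0 * X[:, 1] + rng.normal(0, 0.2, 500)  # synthetic rates

ann = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
ann.fit(X, y)
print("predicted corrosion rate (mpy):", ann.predict([[8.0, 0.02]])[0])
```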
Conclusion
The Data Reservoir is a modern Big Data approach that handles data of any volume and variety. Its
reference architecture assigns the proper Big Data technologies for traditional BI and reporting, real-time
analytical processing, and discovery analytics for structured, semistructured, and unstructured data. It
supports both physics-based modeling and empirical approaches such as ANN. The underlying Big Data
approach removes the limitations of relational databases in managing different data standards. Reference
architectures exist for deploying the Data Reservoir off-premises, in hybrid cloud, and with the Internet
of Things platform. The Data Reservoir solves a much wider range of E&P industry problems
than is possible with traditional information technologies.

References
Balto, A. et al. 2012. The Use of Real-Time Distributed-Temperature and Downhole-Pressure
Surveying to Quantify Skin and Zone Coverage in Horizontal-Well Stimulation. SPE 155723
presented at the SPE Annual Technical Conference and Exhibition held in San Antonio, Texas,
USA, 8-10 October.
Brulé, M., Charalambous, Y., Crawley, C., and Crawford, M. 2009. Reducing the Data Commute
Heightens E&P Productivity. JPT, September.
Brulé, M., Fair, W. 2014. Fusion of Physics and Empirics Boosts Predictive Analytics. Presented at
the SPE Data Driven Analytics Workshop, Galveston, Texas. 19-20 November.
Brulé, M. 2013-1. Enhanced SCADA Access and Big Data Lead to New Analytics & Optimization
Capabilities, Remote, 16 December. http://www.remotemagazine.com/main/articles/enhanced-scada-access-and-big-data-lead-to-new-analytics-optimization-capabilities/
Brulé, M.R. 2013-2. Big Data in E&P: Real-Time Adaptive Analytics and Data-Flow Architecture,
SPE 163721 presented at the SPE Digital Energy Conference and Exhibition held in The
Woodlands, Texas, USA, 5–7 March.
Brulé, M. 2010. Fitting the data together. Digital Energy J., March-April.
Brulé, M. 2009. Using Massively Parallel Processing Databases. Digital Energy J., September-October.
Chessell, M., Scheepers, F., Nguyen, N., van Kessel, R., van der Starre, R. 2014. Governing and
Managing Big Data for Analytics and Decision Makers, IBM Redbook, 26 August. http://www.redbooks.ibm.com/abstracts/redp5120.html?Open
Crawford, M. and Morneau, R. 2010. Accelerating Progress toward Achieving Digital Oilfield Workflow
Efficiencies. Paper SPE 134107 presented at the SPE Annual Technical Conference and Exhibition
held in Florence, Italy, 19-22 September.
Nipper, A. 2013. Message Oriented Middleware - The Future of SCADA, Remote, 16 December.
http://www.remotemagazine.com/main/message-oriented-middleware-the-future-of-scada/
