SPE-173445-MS
This paper was prepared for presentation at the SPE Digital Energy Conference and Exhibition held in The Woodlands, Texas, USA, 3–5 March 2015.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract
Big Data Analytics has steadily gained momentum in the upstream E&P industry. Much of the attention
has been on advancing data-driven methods including empirical statistical and stochastic approaches, and
especially artificial neural networks (ANN). The focus has been on the particular analytics method used
rather than on the management, governance, and refinement of the data used in models. Studies conducted
through the SPE and by global E&P companies have validated that data management is a major problem
in the oil & gas industry. They have clearly established that over half of engineers' and geoscientists'
time is spent simply finding and assembling data before multidisciplinary analysis even begins (Brulé
et al. 2009). Because Big Data Analytics encompasses the four V’s of data: Volume, Velocity, Variety,
and Veracity, the complexity of managing the data has increased substantially and will become even more
of a deterrent to performing analytics. The strategy for collecting, streaming, storing, transporting,
cleansing, and securing the data has become just as important as the analytic methods. Promising Big Data
management and governance concepts continue to evolve. Among the newest is the “Data Lake,” a
massively scalable “landing zone” for semistructured and unstructured data of any type, format, or
schema, implemented through Hadoop, other NoSQL, and SQL technologies. This paper will explore the
Data Lake for E&P and how its implementation and refinement into an E&P Data Reservoir can be
achieved by combining Big Data and industry data standards and other petrotechnical technologies.
Introduction – General Data Reservoir Concepts
The Data Reservoir has been generalized across industries, and the resulting high-level architecture of a
data reservoir has been developed (Chessell et al. 2014). A few key concepts are summarized here. The
Big Data trend has led organizations to consider a data lake solution. A data lake is a set of one or more
data repositories that have been created to support data discovery, reporting, analytics, and ad hoc
investigation (Chessell et al. 2014). The data lake contains data from many different sources. People in
the organization are free to add and access data from the data lake. Without data management, cleansing,
integration, and governance, a data lake can become a data swamp. Data swamps cannot be guaranteed
to be reliable in quality, lineage, and security. Here we use the term data reservoir to refer to a data lake
that is reliable and useful because it has been built with management and governance, including
traceability, cleansing, and integration capabilities.
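The distinction between a data lake and a data reservoir can be made concrete with a small sketch. The class and field names below are purely illustrative, not part of any product API: the idea is simply that nothing lands in the reservoir without lineage metadata (source, arrival time, content checksum) being recorded, so later cleansing and auditing remain possible.

```python
import hashlib
import json
import time

class DataReservoirCatalog:
    """Toy catalog that records lineage metadata for every landed payload."""

    def __init__(self):
        self.entries = []

    def register(self, source, payload):
        """Record where a raw payload came from, when, and its checksum."""
        entry = {
            "source": source,
            "arrived_at": time.time(),
            "sha256": hashlib.sha256(payload).hexdigest(),
            "bytes": len(payload),
        }
        self.entries.append(entry)
        return entry

# Hypothetical usage: a DTS trace arriving from a field historian.
catalog = DataReservoirCatalog()
raw = json.dumps({"well": "A-12", "dts_trace": [88.1, 88.4, 91.0]}).encode()
meta = catalog.register("field-historian", raw)
```

A real implementation would persist the catalog in a governed metadata store rather than in memory, but the principle is the same: traceability is enforced at ingest, not bolted on afterward.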
be it relational, time-series, textual, or video. The approach must provide an E&P operating company with
the capabilities for multiple solutions including production and inflow monitoring, gas-lift monitoring and
optimization, well-integrity monitoring, injection monitoring and profiling, ESP monitoring and optimi-
zation, real-time fracturing monitoring, and several others.
In the following section we review a few high-value use cases.
• Use real-time streaming analytics, data-centric and in the field, to identify problems and filter
erroneous data from fiber-optic sensors before the data enter control loops or the historian
• Use BigInsights Hadoop as the storage and analytics platform that performs the scheduled and
ad hoc analytics, increasing speed of execution while reducing costs
• Refine models and push updated models back to the field at the speed of the operation, not in a
siloed and disconnected fashion
• Improvements in a single well's yield can pay for the entire IT project's budget for 10+ years
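As a toy sketch of the first item above, edge-side screening of fiber-optic readings might combine a range check with a spike check against a rolling median before any value is allowed to reach a control loop or the historian. The thresholds, window size, and data here are hypothetical choices, not values from any field deployment.

```python
from collections import deque
from statistics import median

def stream_filter(readings, window=5, max_jump=5.0, lo=-20.0, hi=350.0):
    """Yield (value, ok) pairs; ok=False marks a suspect reading.

    A reading is rejected if it falls outside [lo, hi] or deviates from
    the rolling median of recent accepted readings by more than max_jump.
    """
    recent = deque(maxlen=window)
    for value in readings:
        ok = lo <= value <= hi
        if ok and recent:
            ok = abs(value - median(recent)) <= max_jump
        if ok:
            recent.append(value)
        yield value, ok

# Hypothetical temperature stream; 999.0 is a sensor glitch.
raw = [88.0, 88.2, 88.1, 999.0, 88.3]
clean = [v for v, ok in stream_filter(raw) if ok]   # glitch removed
```

In practice this logic would run on streaming infrastructure in the field, with rejected values routed to the Data Reservoir for later diagnosis rather than silently dropped.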
The Big Data Analytics platform with a Data Reservoir can progress along a roadmap toward production
automation. By setting up a real-time analytics environment, fiber-optic data can be scored with complex
models that can be used to tune the production process.
• More effective use of fiber-optic data (DTS, DAS, DPS, DSS, etc.) for production optimization
• Detect behind-pipe leaks and compositional gradients with speed and accuracy beyond conventional
methods
• Monitor fracturing profiles, injection profiles, flood fronts, etc. to determine the production health
of the oilfield
• Improve performance and scalability with massively scalable MPP technologies across all three
analytics types (Figures 1 and 2).
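The "scoring" idea behind the list above can be illustrated with a deliberately simple example: comparing a measured DTS trace against an expected baseline profile and flagging depth points whose deviation exceeds a threshold as candidate anomalies, such as a possible behind-pipe leak. The baseline, threshold, and data are all hypothetical; a production model would be far richer.

```python
def score_trace(trace, baseline, threshold=2.0):
    """Return indices where the trace deviates from baseline by more
    than threshold (candidate anomaly locations along the fiber)."""
    return [i for i, (t, b) in enumerate(zip(trace, baseline))
            if abs(t - b) > threshold]

# Illustrative DTS temperatures (degC) at five depth points.
baseline = [90.0, 91.0, 92.0, 93.0, 94.0]
trace    = [90.1, 91.2, 97.5, 93.1, 94.0]   # warm spot at index 2
flags = score_trace(trace, baseline)        # → [2]
```

Running models like this continuously against streaming fiber-optic data, and refining them in the Data Reservoir as new history accumulates, is the loop that enables the production-automation roadmap described above.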
Conclusion
The Data Reservoir is a modern Big Data approach that handles data of any volume and variety. Its
reference architecture assigns the proper Big Data technologies for traditional BI and reporting, real-time
analytical processing, and discovery analytics for structured, semistructured, and unstructured data. It
supports both physics-based modeling and empirical approaches such as ANN. The underlying Big Data
approach removes the limitations of relational databases in managing different data standards. Reference
architectures exist for deploying the Data Reservoir in off-premises and hybrid Cloud environments and
with Internet of Things platforms. The Data Reservoir solves a much wider range of E&P industry problems
than is possible with traditional information technologies.
References
Balto, A. et al. 2012. The Use of Real-Time Distributed-Temperature and Downhole-Pressure
Surveying to Quantify Skin and Zone Coverage in Horizontal-Well Stimulation. SPE 155723
presented at the SPE Annual Technical Conference and Exhibition held in San Antonio, Texas,
USA, 8-10 October.
Brulé, M., Charalambous, Y., Crawley, C., and Crawford, M. 2009. Reducing the Data Commute
Heightens E&P Productivity. JPT, September.
Brulé, M., Fair, W. 2014. Fusion of Physics and Empirics Boosts Predictive Analytics. Presented at
the SPE Data Driven Analytics Workshop, Galveston, Texas. 19-20 November.
Brulé, M. 2013-1. Enhanced SCADA Access and Big Data Lead to New Analytics & Optimization
Capabilities, Remote, 16 December. http://www.remotemagazine.com/main/articles/enhanced-
scada-access-and-big-data-lead-to-new-analytics-optimization-capabilities/
Brulé, M.R. 2013-2. Big Data in E&P: Real-Time Adaptive Analytics and Data-Flow Architecture,
SPE 163721 presented at the SPE Digital Energy Conference and Exhibition held in The
Woodlands, Texas, USA, 5–7 March.
Brulé, M. 2010. Fitting the data together. Digital Energy J., March-April.
Brulé, M. 2009. Using Massively Parallel Processing Databases. Digital Energy J., September-
October.
Chessell, M., Scheepers, F., Nguyen, N., van Kessel, R., van der Starre, R. 2014. Governing and
Managing Big Data for Analytics and Decision Makers, IBM Redbook, 26 August. http://
www.redbooks.ibm.com/abstracts/redp5120.html?Open
Crawford, M., Morneau, R. 2010. Accelerating Progress toward Achieving Digital Oilfield Workflow
Efficiencies. Paper SPE 134107 presented at the SPE Annual Technical Conference and Exhibition
held in Florence, Italy, 19-22 September.
Nipper, A. 2013. Message Oriented Middleware - The Future of SCADA, Remote, 16 December.
http://www.remotemagazine.com/main/message-oriented-middleware-the-future-of-scada/