Application of Python and Data Analytics in Oil and Gas
Jaiyesh Chahar | jaiyesh0002@gmail.com
Reservoir Engineer(Data Analyst) at Dicelytics Pvt. Ltd. (Dice Technologies LLC)
List of Contents in this Notebook:
1. Basics of Python with oil and gas examples
2. Data Structures: Lists, Dictionaries, Tuples, Sets
3. Numpy
4. Pandas: With Volve Field Production Data
5. Matplotlib
6. Interactive Plotting: Pressure Profile in reservoir by varying Parameters
7. Oil and Gas Mini Projects: Vogel's IPR, Material Balance(Gas, Oil)
Useful Links-
1. Contact me at: https://www.linkedin.com/in/jaiyesh-chahar-9b3642107/
2. For more of my Projects: https://github.com/jaiyesh
3. Playlist of python for oil and gas by Petroleum from Scratch: https://www.youtube.com/watch?v=UjdPncyGkIs&list=PLLwtZopJNyqYGXEYmt0zezAEuS616rACw
4. Petroleum from Scratch: https://www.linkedin.com/company/petroleum-from-scratch/?viewAsMember=true
Welcome Everyone!
I am very excited to take you through this exciting journey of applying Python and Data Analytics in the O&G Industry.
My aim is to take you to a better, more confident and comfortable place as far as Python for Oil and Gas is concerned.
We will start with the basics of Python and then move on to use cases in Industry.
Here goes.
Python from Scratch
In [1]: # is used for commenting out the statement(block, inline comment)
#Starting with print function
print('Hello Guys')
#Escape Sequence \n helps change line.
print('Hello \nPDEU')
Hello Guys
Hello
PDEU
Single values Data Types
1. Integer eg. - 7
2. Floats eg. - 7.0
3. Booleans: Either True or False
4. Strings: a sequence of characters that can also contain spaces and numbers, written inside quotation marks, eg. - "6"
Mathematical Operations
In [2]: # Addition
print(4+5)
print(4.0+5)
#Subtraction
print(5-1)
print(5.0-1)
#Multiplication
print(2*3)
#Division
print(625/10)
9
9.0
4
4.0
6
62.5
In [3]: ## Exponent: using **
print(5**2)
25
In [4]: ## Integer Division using //
print(26//5)
5
In [5]: ## To get the remainder using %
print(26%5)
1
Strings
In [6]: # surrounded by either single quotation marks, or double quotation marks
6 : integer
6.0 : Float
"6" : String
In [7]: print('Hey There's You')
File "<ipython-input-7-4b79f5295f0b>", line 1
print('Hey There's You')
^
SyntaxError: invalid syntax
In [8]: print('Hey There\'s You')
Hey There's You
In [9]: print('Petroleum'+'Engineering')
PetroleumEngineering
In [10]: print('Spam'*3)
SpamSpamSpam
In [11]: print(4*3)
print(4*'3')
12
3333
In [12]: type(2)
Out[12]: int
In [13]: type(4*'3')
Out[13]: str
Variables: Storing a value under a name; that name is known as a Variable
'=' sign is used for initialising
In [14]: #This is called initialization.
#The value from RHS gets stored into the variable at LHS.
x = 6
y = 7
In [15]: print(x*2 - y*3)
-9
In [16]: # Single Values are Stored in Variables.
# Variable names cannot start with a number; they may contain only letters, digits and underscores.
# Spaces are not allowed in a name, but Underscore is allowed.
#Ex - 1a is invalid
#Ex - a1 is valid
#Ex- a_1 and _1_a both are valid.
Input
In [17]: porosity = input('Enter the Formation porosity: ')
Enter the Formation porosity: 0.5
In [18]: porosity
Out[18]: '0.5'
In [19]: type(porosity)
Out[19]: str
In [20]: porosity = float(input('Enter the Formation porosity: '))
Enter the Formation porosity: 0.6
In [21]: type(porosity)
Out[21]: float
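The conversion above fails with a ValueError if the user types something that is not a number, and it accepts a porosity outside the range 0 to 1. A minimal sketch of a checked conversion follows; the helper name parse_fraction is hypothetical and not part of the notebook.

```python
def parse_fraction(text):
    """Convert user text to a float and check that it lies between 0 and 1."""
    value = float(text)  # raises ValueError for non-numeric text such as 'abc'
    if not 0.0 <= value <= 1.0:
        raise ValueError(f'expected a fraction between 0 and 1, got {value}')
    return value


# In the notebook this would wrap input(), for example:
# porosity = parse_fraction(input('Enter the Formation porosity: '))
porosity_checked = parse_fraction('0.6')
print(porosity_checked, type(porosity_checked))
```

Keeping the check in one small function means the same rule can be reused for any other fraction, such as a saturation.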
In [23]: permeability = float(input('Enter the formation\'s permeability(md): '))
Enter the formation's permeability(md): 35
In [24]: print(f'Formation porosity is {porosity} and permeability is {permeability}')
Formation porosity is 0.6 and permeability is 35.0
In [25]: print('Formation porosity is {} and permeability is {}'.format(porosity,permeability))
Formation porosity is 0.6 and permeability is 35.0
In [26]: print('Formation porosity is', porosity, 'and permeability is', permeability )
Formation porosity is 0.6 and permeability is 35.0
Booleans and Comparison
In [27]: print(2!= 3)
True
In [28]: print(2 == 3 )
False
In [29]: print(2 = 3 )
File "<ipython-input-29-2981d28b369a>", line 1
print(2 = 3 )
^
SyntaxError: keyword can't be an expression
In [30]: print(5>3 or 3>5 )
True
In [31]: print(5>3 and 3>5 )
False
If-else: To run a block of code only if a certain condition holds true
In [32]: Reservoir_Pressure = float(input('Enter the reservoir pressure(psi): '))
Hydrostatic_pressure = float(input('Enter the Hydrostatic pressure of mud(psi): '))
Fracture_pressure = float(input('Enter the Fracture pressure of rock(psi): '))
Enter the reservoir pressure(psi): 1000
Enter the Hydrostatic pressure of mud(psi): 1200
Enter the Fracture pressure of rock(psi): 1300
In [33]: if Hydrostatic_pressure > Reservoir_Pressure and Hydrostatic_pressure < Fracture_pressure:
    print('Safe Zone')
elif Hydrostatic_pressure > Reservoir_Pressure and Hydrostatic_pressure >= Fracture_pressure:
    print('Risk of formation Fracture')
else:
    print('Risk of kick')
Safe Zone
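The same check can be wrapped in a function so that it can be run for several mud pressures without typing the inputs again. This is only a sketch: the function name classify_mud_window is hypothetical, and treating a hydrostatic pressure exactly equal to either limit as unsafe is an assumption made here.

```python
def classify_mud_window(reservoir_psi, hydrostatic_psi, fracture_psi):
    """Compare the mud hydrostatic pressure with the reservoir and fracture pressures."""
    if hydrostatic_psi <= reservoir_psi:
        # Mud column cannot hold back the formation fluids
        return 'Risk of kick'
    if hydrostatic_psi >= fracture_psi:
        # Mud column is heavy enough to break the rock
        return 'Risk of formation Fracture'
    return 'Safe Zone'


# Reservoir 1000 psi and fracture 1300 psi are the values entered above
for mud_psi in (900, 1200, 1400):
    print(mud_psi, classify_mud_window(1000, mud_psi, 1300))
```

Returning the label instead of printing it lets the caller decide what to do with the result.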
While Loops : To repeat a block of code again and again, as long as the condition holds true
The code in the body of a while loop is executed repeatedly. Each repetition is called an iteration.
In [35]: i = 1
while i<=5:
    print(i)
    i = i+1
print('Finished')
1
2
3
4
5
Finished
In [36]: ## Can be used to stop iteration after a specific input
In [37]: Password = input('Enter the Password: ')
Enter the Password: we
In [38]: while Password != 'abcd':
    print('Wrong Password Enter Again')
    Password = input('Enter the Password: ')
print('Access Granted')
Wrong Password Enter Again
Enter the Password: ww
Wrong Password Enter Again
Enter the Password: dcc
Wrong Password Enter Again
Enter the Password: abcd
Access Granted
break: for breaking the while loop prematurely
We can break an infinite loop if some condition is satisfied
In [39]: i = 0
while True: #while True is an easy way to make an infinite loop
    print(i)
    i = i+1
    if i > 5:
        print('Breaking')
        break
print('Finished')
0
1
2
3
4
5
Breaking
Finished
Continue : to jump back to the top of the while loop, rather than stopping it.
Stops the current iteration and continues with the next one.
In [40]: i = 0
while i <= 5:
    i = i+1
    if i ==3:
        print('Skipping 3')
        continue
    print(i)
1
2
Skipping 3
4
5
6
Multi Value Data Types and Sequences
Lists
Used to store items
Square Brackets are used []
Mutable: Can Change length, elements, elements values
In [41]: porosity = [0.3,0.41,0.51,0.1]
In [42]: #indexing
porosity[3]
Out[42]: 0.1
In [43]: # empty lists are often created first and populated later in the program
empty = []
i = 5
while i < 10:
    empty.append(i)
    i = i+1
empty
Out[43]: [5, 6, 7, 8, 9]
In [44]: #Strings are also like a list of characters, so indexing operators are also used on strings.
String = 'Petroleum'
String[1]
Out[44]: 'e'
In [45]: #List operations
#items at a certain index can be reassigned:
a = [7,2,9,8,10]
a[1] = 999
a
Out[45]: [7, 999, 9, 8, 10]
In [46]: #addition(concatenating lists)
a =[1,2,3]
b =[4,5,6]
c = a+b
c
Out[46]: [1, 2, 3, 4, 5, 6]
In [47]: #Multiply
a*3
Out[47]: [1, 2, 3, 1, 2, 3, 1, 2, 3]
In [48]: # in operator
1 in a
Out[48]: True
In [49]: #not in operator
4 not in a
Out[49]: True
In [50]: #List Functions
#append: adding an item to the end of an existing list
a = [1,2,3]
a.append(4)
a
Out[50]: [1, 2, 3, 4]
In [51]: #insert: Like append but we can insert a new item at any position in the list.
a = [1,2,3,4,5,6,7]
a.insert(4,'PETROLEUM')
a
Out[51]: [1, 2, 3, 4, 'PETROLEUM', 5, 6, 7]
In [52]: #Slicing. Just like the name says.
#It helps cut a slice (sub-part) off a List.
superset = [1,2,3,4,5,6,7,8,9]
#Slicing syntax - listname[start:stop:step] start is included. stop is not
subset1 = superset[0:5:2]
print(subset1)
subset2 = superset[:]
print(subset2) #Skipping a part also works for first and end indices.
subset3 = superset[:-1] #starting to ending -1
print(subset3)
subset4 = superset[:] #everything
print(subset4)
[1, 3, 5]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
In [53]: #Reversing the list, taking every second element
reverse_set = superset[-1::-2]
#Start at the last index (-1) and move towards index 0 in steps of -2.
print(reverse_set)
[9, 7, 5, 3, 1]
In [54]: #Same can be applied to strings
name = 'Petroleum_Engineering'
print(name[0])
print(name[-1])
print(name[0:5])
print(name[::2])
print(name[-1::-1])
P
g
Petro
PtoemEgneig
gnireenignE_muelorteP
Tuples
A tuple is a collection which is ordered and unchangeable
Parentheses are used ()
Immutable
In [55]: a = (1,2,3,4,'Hello')
In [56]: #can be accessed with index
a[4]
Out[56]: 'Hello'
In [57]: a[4] = 7
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-57-cb3ce5dc8467> in <module>
----> 1 a[4] = 7
TypeError: 'tuple' object does not support item assignment
Dictionaries
Helps store data with labels
Syntax => { key : value }
Values are accessed by key, not by position (insertion order is preserved since Python 3.7)
Can be directly converted to DataFrames (tables)
In [58]: rock_properties = {'poro' : 0.25, 'perm' : 150 , 'lithology' : 'Limestone'}
print(rock_properties)
{'poro': 0.25, 'perm': 150, 'lithology': 'Limestone'}
In [59]: rock_properties['poro']
Out[59]: 0.25
In [60]: rock_properties['lithology'] = 'Shale'
In [61]: rock_properties
Out[61]: {'poro': 0.25, 'perm': 150, 'lithology': 'Shale'}
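A few more dictionary operations that come up often are sketched here: reading a key that may be missing, looping over all key-value pairs, and adding a new key. The 'saturation' and 'depth_ft' keys and the depth value are made up for illustration.

```python
rock_properties = {'poro': 0.25, 'perm': 150, 'lithology': 'Limestone'}

# .get returns a default instead of raising KeyError when the key is missing
saturation = rock_properties.get('saturation', 'not measured')
print(saturation)

# .items() gives (key, value) pairs
for key, value in rock_properties.items():
    print(key, '->', value)

# Assigning to a new key adds it to the dictionary
rock_properties['depth_ft'] = 8500  # hypothetical value
print(list(rock_properties.keys()))
```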
Sets
Curly braces are used just like dictionaries
Unordered: Can't be indexed
Can't contain duplicate values
Membership tests (in) are faster than with a List
In [62]: a = {1,2,3,4,5,6,7,1,2,2}
In [63]: a
Out[63]: {1, 2, 3, 4, 5, 6, 7}
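Sets are handy for finding the distinct values in a log and for comparing two groups. The sketch below uses a made-up lithology log; the names lithology_log and carbonates are illustrative only.

```python
lithology_log = ['Shale', 'Sandstone', 'Shale', 'Limestone', 'Sandstone']  # made-up sample

# Converting a list to a set drops the duplicates
unique_lithologies = set(lithology_log)
print(len(unique_lithologies))

# Membership test
print('Shale' in unique_lithologies)

# Set algebra: & is intersection, | is union, - is difference
carbonates = {'Limestone', 'Dolomite'}
print(unique_lithologies & carbonates)
print(unique_lithologies | carbonates)
print(unique_lithologies - carbonates)
```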
Summary of Data Structures
1. Dictionary - Key:Value, mutable
2. Lists - Mutable; empty lists are often created first and populated later in the program
3. Set - Uniqueness of Elements
4. Tuples - Data cannot be changed
for loops
The tool with which we can utilize the power of computers
A command can be repeated thousands of times in a fraction of a second.
Iterations are always performed on iterables (objects that can hand out their items one at a time).
Examples of iterables - lists, strings, tuples, dictionaries, sets etc
In [64]: words = ['Hello', 'people','of', 'PDEU']
In [65]: for i in words:
    print(i)
Hello
people
of
PDEU
In [66]: Specific_gravity = [0.2,0.3,0.4,0.87,0.9,1]
for i in Specific_gravity:
    api = (141.5/i) - 131.5
    print('API gravity corresponding to Specific Gravity', i, 'is', api)
API gravity corresponding to Specific Gravity 0.2 is 576.0
API gravity corresponding to Specific Gravity 0.3 is 340.1666666666667
API gravity corresponding to Specific Gravity 0.4 is 222.25
API gravity corresponding to Specific Gravity 0.87 is 31.143678160919535
API gravity corresponding to Specific Gravity 0.9 is 25.72222222222223
API gravity corresponding to Specific Gravity 1 is 10.0
Functions
Instead of writing the same code again and again for different values, we can write a function once and call it whenever we want to do the calculation.
In [67]: def Function():
    print('Use of Function')
In [68]: #Calling Function
Function()
Use of Function
In [69]: #Returning from a function
def add(x,y):
    return x+y
In [70]: add(2,3)
Out[70]: 5
In [71]: #Once we return from a function, it stops being executed; any code written after the return will never be executed
def f(x,y,z):
    return x/y +z
    print('Hello')
In [72]: f(4,2,4)
Out[72]: 6.0
In [73]: def api(x):
    api = 141.5/x - 131.5
    print('The api gravity is',round(api))
In [74]: api(0.9)
The api gravity is 26
Lambda Function
Single line function
In [75]: api_lambda = lambda x : 141.5/x - 131.5
In [76]: api_lambda(0.9)
Out[76]: 25.72222222222223
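The api function above prints its result, so the value cannot be reused in a later calculation. A possible variant that returns the value, together with the inverse conversion, is sketched here. The function names and the check for non-positive input are additions for illustration; the formula is the same one used above.

```python
def api_gravity(specific_gravity):
    """API gravity from specific gravity: 141.5/SG - 131.5."""
    if specific_gravity <= 0:
        raise ValueError('specific gravity must be positive')
    return 141.5 / specific_gravity - 131.5


def specific_gravity_from_api(api):
    """Inverse of api_gravity: SG = 141.5 / (API + 131.5)."""
    return 141.5 / (api + 131.5)


api_09 = api_gravity(0.9)
print(api_09)
# Converting back should recover the starting value of 0.9 (up to rounding error)
print(specific_gravity_from_api(api_09))
```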
List Comprehensions
Quickly creating lists whose contents obey a simple rule
In [77]: cubes = [i**3 for i in range(10)]
cubes
Out[77]: [0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
In [78]: even_square = [i**2 for i in range(10) if i%2 ==0]
even_square
Out[78]: [0, 4, 16, 36, 64]
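List comprehensions can replace the for loop used earlier for the API gravity calculation. The sketch below reuses the same specific gravity list; the cutoff of 30 degrees API in the filter is an arbitrary value chosen for illustration, not an industry definition.

```python
specific_gravity = [0.2, 0.3, 0.4, 0.87, 0.9, 1]

# One API gravity value per specific gravity, in the same order
api_values = [141.5 / sg - 131.5 for sg in specific_gravity]
print(api_values)

# Keep only the specific gravities whose API gravity is above the cutoff
cutoff_api = 30  # arbitrary illustrative cutoff
above_cutoff = [sg for sg in specific_gravity if 141.5 / sg - 131.5 > cutoff_api]
print(above_cutoff)
```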
End of Day 1 session
Numpy
1. NumPy stands for Numerical Python. It is a Linear Algebra Library for Python.
2. NumPy is also incredibly fast, as it has bindings to C libraries. So, NumPy operations help in computational efficiency.
3. NumPy is famous for its object - the NumPy array, which stores a collection of numbers (just like a list) in an object that can be treated and manipulated just like a single number.
In [281]: #importing numpy
import numpy as np
Numpy Arrays
In [282]: #Creating Numpy array from a python list
a=[1,2,3,4,5]
arr = np.array(a)
arr
Out[282]: array([1, 2, 3, 4, 5])
In [283]: type(arr)
Out[283]: numpy.ndarray
In [284]: type(a)
Out[284]: list
In [214]: a=[1,2,3,4,5]
b=[4,5,6,7,8]
arra = np.array(a)
arrb = np.array(b)
print(a+b) #Concatenation of lists, not addition
print(arra+arrb)#addition of elements of array
[1, 2, 3, 4, 5, 4, 5, 6, 7, 8]
[ 5 7 9 11 13]
Lists do not support element-wise arithmetic directly, while arrays do.
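As an oil and gas flavoured illustration of array arithmetic, the sketch below computes mud hydrostatic pressure at several depths in one expression, using the common field-unit relation P (psi) = 0.052 x mud weight (ppg) x depth (ft). The mud weight and depths are assumed values for illustration.

```python
import numpy as np

mud_weight_ppg = 10.0                  # assumed mud weight, pounds per gallon
depth_ft = np.arange(0, 10001, 2500)   # 0, 2500, 5000, 7500, 10000 ft

# One expression computes the pressure at every depth; no loop is needed
hydrostatic_psi = 0.052 * mud_weight_ppg * depth_ft
print(hydrostatic_psi)

# The same expression on a plain list fails, because float * list is not defined
try:
    0.052 * mud_weight_ppg * [0, 2500, 5000]
except TypeError as err:
    print('list arithmetic failed:', err)
```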
In [285]: #Multidimensional Arrays(Matrix): Passing lists of list in np.array
My_Matrix = [[1,2,3],[4,5,6],[7,8,9]]
m = np.array(My_Matrix)
m
Out[285]: array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [286]: #Shape attribute to find the shape of array
m.shape
Out[286]: (3, 3)
In [217]: #Using Built in Methods for generating array
Pressures = np.arange(0,5500,500) #Making array of pressure from 0 to 5000 psi with a step size of 500 psi
Pressures
Out[217]: array([ 0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000])
In [218]: #Linspace: Evenly spaced numbers over a specified interval
saturations = np.linspace(0,1,100)
#Both start and stop values are included as the first and last values of the array.
#Creates saturation array with 100 values: from 0 to 1
saturations
Out[218]: array([0. , 0.01010101, 0.02020202, 0.03030303, 0.04040404,
0.05050505, 0.06060606, 0.07070707, 0.08080808, 0.09090909,
0.1010101 , 0.11111111, 0.12121212, 0.13131313, 0.14141414,
0.15151515, 0.16161616, 0.17171717, 0.18181818, 0.19191919,
0.2020202 , 0.21212121, 0.22222222, 0.23232323, 0.24242424,
0.25252525, 0.26262626, 0.27272727, 0.28282828, 0.29292929,
0.3030303 , 0.31313131, 0.32323232, 0.33333333, 0.34343434,
0.35353535, 0.36363636, 0.37373737, 0.38383838, 0.39393939,
0.4040404 , 0.41414141, 0.42424242, 0.43434343, 0.44444444,
0.45454545, 0.46464646, 0.47474747, 0.48484848, 0.49494949,
0.50505051, 0.51515152, 0.52525253, 0.53535354, 0.54545455,
0.55555556, 0.56565657, 0.57575758, 0.58585859, 0.5959596 ,
0.60606061, 0.61616162, 0.62626263, 0.63636364, 0.64646465,
0.65656566, 0.66666667, 0.67676768, 0.68686869, 0.6969697 ,
0.70707071, 0.71717172, 0.72727273, 0.73737374, 0.74747475,
0.75757576, 0.76767677, 0.77777778, 0.78787879, 0.7979798 ,
0.80808081, 0.81818182, 0.82828283, 0.83838384, 0.84848485,
0.85858586, 0.86868687, 0.87878788, 0.88888889, 0.8989899 ,
0.90909091, 0.91919192, 0.92929293, 0.93939394, 0.94949495,
0.95959596, 0.96969697, 0.97979798, 0.98989899, 1. ])
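The spacing in the output above is 1/99 rather than 0.01, because 100 points have only 99 gaps between them. The sketch below shows one way to check the spacing and to get a round step instead; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values
saturations, step = np.linspace(0, 1, 100, retstep=True)
print(len(saturations), saturations[0], saturations[-1])
print(step)  # equals 1/99

# Asking for 101 points gives 100 gaps, so the spacing becomes 0.01
saturations_101 = np.linspace(0, 1, 101)
print(saturations_101[:3])
```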
In [287]: #np.zeros
z = np.zeros(3)
z
Out[287]: array([0., 0., 0.])
In [288]: zm = np.zeros((3,3))
zm
Out[288]: array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
In [289]: #np.ones
o = np.ones(3)
o
Out[289]: array([1., 1., 1.])
In [290]: mo = np.ones((3,2))
mo
Out[290]: array([[1., 1.],
[1., 1.],
[1., 1.]])
In [291]: #eye: Creates an identity matrix
e = np.eye(3)
e
Out[291]: array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
In [297]: #Random: Numpy also has lots of ways to create random number arrays:
#Rand : Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1)
rand = np.random.rand(2) #shape input
rand
Out[297]: array([0.79735451, 0.15014681])
In [298]: np.random.rand(5,5)
Out[298]: array([[0.99594475, 0.8458881 , 0.08822451, 0.16634309, 0.54486434],
[0.54400554, 0.44326835, 0.77324264, 0.51943594, 0.33831159],
[0.44580354, 0.57897933, 0.83165121, 0.44414091, 0.27359894],
[0.08399844, 0.18347876, 0.68942538, 0.99359697, 0.14182709],
[0.37589459, 0.12811173, 0.17572611, 0.11711842, 0.7414471 ]])
In [299]: #randn : Return a sample (or samples) from the "standard normal" distribution (mean = 0, standard deviation = 1). Unlike rand which is uniform.
np.random.randn(2)
Out[299]: array([ 1.44380265, -0.44596587])
In [300]: np.random.randn(5,5)
Out[300]: array([[ 0.04106163, -1.02717883, 0.7577701 , 0.87903755, -1.43185131],
[ 0.91525872, 0.86849532, -0.22200836, -1.71855818, -1.38550428],
[-0.46713366, -0.41038961, 0.57663796, -0.09966134, 0.27420721],
[-1.76228958, 0.10605052, -1.6727492 , -0.4981573 , -0.55207081],
[ 0.15349532, 0.20697246, 1.15074355, 0.69342804, 0.2246914 ]])
In [306]: #randint : Return random integers from low (inclusive) to high (exclusive).
a = np.random.randint(1,100,10)
a
Out[306]: array([ 4, 28, 29, 8, 64, 62, 38, 46, 51, 85])
In [307]: np.random.randint(1,100,10)
Out[307]: array([67, 97, 43, 19, 78, 94, 44, 67, 46, 8])
In [308]: #np.random.normal : Return random values sampled from a 'normal' distribution
#s = np.random.normal(mean, std, no. of points)
poro = np.abs(np.random.normal(0.25,0.01,20))
poro
Out[308]: array([0.24891562, 0.24185109, 0.24929802, 0.24452465, 0.23943791,
0.23800566, 0.24806901, 0.25782996, 0.26587456, 0.24491005,
0.25495622, 0.24106408, 0.25900647, 0.24896522, 0.23589043,
0.2357999 , 0.25466743, 0.23189635, 0.25698376, 0.23932708])
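Random arrays change on every run, which makes results hard to reproduce. One option is a seeded generator from np.random.default_rng (available in NumPy 1.17 and later). The sketch below is a minimal illustration; the seed value 42 is arbitrary.

```python
import numpy as np

# A generator created with a fixed seed produces the same numbers on every run
rng = np.random.default_rng(seed=42)
poro = rng.normal(loc=0.25, scale=0.01, size=20)  # mean, std, number of points

# A second generator with the same seed repeats the sequence exactly
rng_again = np.random.default_rng(seed=42)
poro_again = rng_again.normal(loc=0.25, scale=0.01, size=20)

print(np.array_equal(poro, poro_again))
```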
In [231]: #import matplotlib.pyplot as plt
#poro = np.random.normal(0.25,0.01,20)
#count, bins, ignored = plt.hist(poro, 30, density=True)
#plt.plot(bins, 1/(0.01 * np.sqrt(2 * np.pi)) *
# np.exp( - (bins - 0.25)**2 / (2 * 0.01**2) ),
# linewidth=2, color='r')
#plt.show()
In [309]: #Operations on Arrays
a =np.array([1,2,3,4])
b =np.array([4,5,6,7])
a+b
Out[309]: array([ 5, 7, 9, 11])
In [310]: a*b
Out[310]: array([ 4, 10, 18, 28])
In [311]: a/b
Out[311]: array([0.25 , 0.4 , 0.5 , 0.57142857])
In [312]: a**b
Out[312]: array([ 1, 32, 729, 16384], dtype=int32)
In [313]: #Dot Product
a.dot(b)
Out[313]: 60
In [314]: #len function
len(a)
Out[314]: 4
In [315]: z = np.array([a,a**b])
z
Out[315]: array([[ 1, 2, 3, 4],
[ 1, 32, 729, 16384]])
In [316]: len(z) #number of rows
Out[316]: 2
In [317]: z.T #Transpose
Out[317]: array([[ 1, 1],
[ 2, 32],
[ 3, 729],
[ 4, 16384]])
In [318]: len(z.T)
Out[318]: 4
In [319]: #dtype : to see the datatype of elements in array
z.dtype
Out[319]: dtype('int32')
In [320]: #astype: To cast to a specific datatype
z.astype(float)
Out[320]: array([[1.0000e+00, 2.0000e+00, 3.0000e+00, 4.0000e+00],
[1.0000e+00, 3.2000e+01, 7.2900e+02, 1.6384e+04]])
In [325]: #Maths Functions for min,max,mean,std values
poro = np.abs(np.random.normal(0.25,0.01,2000))
poro
Out[325]: array([0.23618817, 0.25324195, 0.24200402, ..., 0.24714725, 0.24170127,
0.24741424])
In [326]: poro.max()
Out[326]: 0.2943737318970742
In [327]: poro.min()
Out[327]: 0.21559888678906225
In [328]: poro.mean()
Out[328]: 0.24987956068441375
In [329]: poro.std()
Out[329]: 0.010131068219927063
In [330]: #Numpy Universal Array Functions - Numpy comes with many universal array functions
#which are essentially just mathematical operations you can use to perform the operation across the array
arr = np.random.randint(1,100,10)
arr
Out[330]: array([ 4, 43, 70, 49, 64, 26, 21, 12, 72, 93])
In [331]: #Taking Square Roots
np.sqrt(arr)
Out[331]: array([2. , 6.55743852, 8.36660027, 7. , 8. ,
5.09901951, 4.58257569, 3.46410162, 8.48528137, 9.64365076])
In [332]: #Calculating exponential (e^)
np.exp(arr)
Out[332]: array([5.45981500e+01, 4.72783947e+18, 2.51543867e+30, 1.90734657e+21,
6.23514908e+27, 1.95729609e+11, 1.31881573e+09, 1.62754791e+05,
1.85867175e+31, 2.45124554e+40])
In [333]: np.sin(arr)
Out[333]: array([-0.7568025 , -0.83177474, 0.77389068, -0.95375265, 0.92002604,
0.76255845, 0.83665564, -0.53657292, 0.25382336, -0.94828214])
In [334]: np.log(arr)
Out[334]: array([1.38629436, 3.76120012, 4.24849524, 3.8918203 , 4.15888308,
3.25809654, 3.04452244, 2.48490665, 4.27666612, 4.53259949])
Pandas
Pandas is like the MS Excel of Python, but more powerful. This library helps us import | create | work with data in the form of tables.
The tables are called DataFrames.
1. We can directly convert a Dictionary into a DataFrame.
2. We can import excel-sheets or CSV files (most popular) into DF.
3. We can manipulate and use these tables in a user-friendly way.
In [335]: #Converting Dictionary into DataFrame
#Step 1: Import Pandas with an alias 'pd'
import pandas as pd
#Step 2: Create your dictionary
Rock_Properties = {'phi': [0.2,0.40,0.30,0.25,0.270],
'perm': [100,20,150,130,145],
'lith': ['sandstone','shale','limestone','limestone','sandstone']}
#Step 3: Create your Table.
rock_table = pd.DataFrame(Rock_Properties)
#Step 4: Print your table.
rock_table
Out[335]:
phi perm lith
0 0.20 100 sandstone
1 0.40 20 shale
2 0.30 150 limestone
3 0.25 130 limestone
4 0.27 145 sandstone
In [336]: #Adding New Column
rock_table['Saturation'] = [0.14,0.25,0.45,0.37,0.28]
rock_table
Out[336]:
phi perm lith Saturation
0 0.20 100 sandstone 0.14
1 0.40 20 shale 0.25
2 0.30 150 limestone 0.45
3 0.25 130 limestone 0.37
4 0.27 145 sandstone 0.28
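Two common next steps on such a table are filtering rows by a condition and computing a new column from an existing one. The sketch below rebuilds the same table so that it runs on its own; the cutoffs and the assumption that 'perm' is in millidarcy are illustrative.

```python
import pandas as pd

rock_table = pd.DataFrame({
    'phi': [0.2, 0.40, 0.30, 0.25, 0.270],
    'perm': [100, 20, 150, 130, 145],
    'lith': ['sandstone', 'shale', 'limestone', 'limestone', 'sandstone'],
})

# Rows where both conditions hold; each condition needs its own parentheses
good_rock = rock_table[(rock_table['phi'] > 0.25) & (rock_table['perm'] > 100)]
print(good_rock)

# New column computed from an existing one (assumes perm is in millidarcy)
rock_table['perm_darcy'] = rock_table['perm'] / 1000
print(rock_table)
```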
In [337]: #Importing from csv or excel files
In [338]: volve = pd.read_csv('vpd.csv')
#Similarly an excel file can be read by-
#df = pd.read_excel('path/filename.xlsx')
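Dates read from a CSV file arrive as plain text unless they are converted. Since vpd.csv is not bundled here, the sketch below reads a three-row stand-in from an in-memory string (the rows are copied from the well 15/9-F-12 output shown later in this notebook) and converts the DATEPRD column to real dates.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file; pd.read_csv accepts file-like objects
sample_csv = io.StringIO(
    'DATEPRD,NPD_WELL_BORE_NAME,BORE_OIL_VOL\n'
    '12-Feb-08,15/9-F-12,285.0\n'
    '13-Feb-08,15/9-F-12,1870.0\n'
    '14-Feb-08,15/9-F-12,3124.0\n'
)
sample = pd.read_csv(sample_csv)

# Convert the text dates (day-month abbreviation-two digit year) to datetime64
sample['DATEPRD'] = pd.to_datetime(sample['DATEPRD'], format='%d-%b-%y')
print(sample.dtypes)
print(sample)
```

With a datetime column, sorting, resampling and plotting against time all work as expected.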
In [343]: volve.head(10)
Out[343]:
DATEPRD NPD_WELL_BORE_CODE NPD_WELL_BORE_NAME ON_STREAM_HRS AVG_DOWNHOLE_PRE
0 07-Apr-14 7405 15/9-F-1 C 0.0
1 08-Apr-14 7405 15/9-F-1 C 0.0
2 09-Apr-14 7405 15/9-F-1 C 0.0
3 10-Apr-14 7405 15/9-F-1 C 0.0
4 11-Apr-14 7405 15/9-F-1 C 0.0
5 12-Apr-14 7405 15/9-F-1 C 0.0
6 13-Apr-14 7405 15/9-F-1 C 0.0
7 14-Apr-14 7405 15/9-F-1 C 0.0
8 15-Apr-14 7405 15/9-F-1 C 0.0
9 16-Apr-14 7405 15/9-F-1 C 0.0
In [354]: volve.describe()
Out[354]:
NPD_WELL_BORE_CODE ON_STREAM_HRS AVG_DOWNHOLE_PRESSURE AVG_DOWNHOLE_TEMPE
count 15634.000000 15349.000000 8980.000000
mean 5908.581745 19.994172 181.803870
std 649.231622 8.369911 109.712365
min 5351.000000 0.000000 0.000000
25% 5599.000000 24.000000 0.000000
50% 5693.000000 24.000000 232.897000
75% 5769.000000 24.000000 255.401250
max 7405.000000 25.000000 397.589000
In [346]: #shape
volve.shape
Out[346]: (15634, 19)
In [347]: #columns to get output of columns name
volve.columns
Out[347]: Index(['DATEPRD', 'NPD_WELL_BORE_CODE', 'NPD_WELL_BORE_NAME', 'ON_STREAM_HRS',
'AVG_DOWNHOLE_PRESSURE', 'AVG_DOWNHOLE_TEMPERATURE', 'AVG_DP_TUBING',
'AVG_ANNULUS_PRESS', 'AVG_CHOKE_SIZE_P', 'AVG_CHOKE_UOM', 'AVG_WHP_P',
'AVG_WHT_P', 'DP_CHOKE_SIZE', 'BORE_OIL_VOL', 'BORE_GAS_VOL',
'BORE_WAT_VOL', 'BORE_WI_VOL', 'FLOW_KIND', 'WELL_TYPE'],
dtype='object')
In [349]: print(volve['NPD_WELL_BORE_NAME'].value_counts())
15/9-F-4 3327
15/9-F-5 3306
15/9-F-14 3056
15/9-F-12 3056
15/9-F-11 1165
15/9-F-15 D 978
15/9-F-1 C 746
Name: NPD_WELL_BORE_NAME, dtype: int64
In [350]: volve.groupby(['NPD_WELL_BORE_NAME']).agg({'NPD_WELL_BORE_NAME':'count'})
Out[350]:
NPD_WELL_BORE_NAME
NPD_WELL_BORE_NAME
15/9-F-1 C 746
15/9-F-11 1165
15/9-F-12 3056
15/9-F-14 3056
15/9-F-15 D 978
15/9-F-4 3327
15/9-F-5 3306
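groupby can compute several statistics per group in one call. Because the Volve file is not bundled here, the sketch below uses the small rock properties table from earlier in this section; the output column names samples, mean_phi and max_perm are made up for illustration.

```python
import pandas as pd

rock_table = pd.DataFrame({
    'phi': [0.2, 0.40, 0.30, 0.25, 0.270],
    'perm': [100, 20, 150, 130, 145],
    'lith': ['sandstone', 'shale', 'limestone', 'limestone', 'sandstone'],
})

# Named aggregation: new_column=(source_column, function)
by_lith = rock_table.groupby('lith').agg(
    samples=('phi', 'size'),
    mean_phi=('phi', 'mean'),
    max_perm=('perm', 'max'),
)
print(by_lith)
```

The same pattern applies to the production data, for example grouping by NPD_WELL_BORE_NAME and summing BORE_OIL_VOL.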
In [352]: #Conditional Dataframe Slicing
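pf12 = volve['NPD_WELL_BORE_NAME'] == '15/9-F-12' #Gives a Boolean Series (True where the well name matches)
volve_pf12 = volve[pf12].copy() #copy, so that later in-place edits do not raise SettingWithCopyWarning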
In [353]: volve_pf12.head()
Out[353]:
DATEPRD NPD_WELL_BORE_CODE NPD_WELL_BORE_NAME ON_STREAM_HRS AVG_DOWNHOLE_
1911 12-Feb-08 5599 15/9-F-12 11.50
1912 13-Feb-08 5599 15/9-F-12 24.00
1913 14-Feb-08 5599 15/9-F-12 22.50
1914 15-Feb-08 5599 15/9-F-12 23.15
1915 16-Feb-08 5599 15/9-F-12 24.00
In [265]: #Statistical description of Dataset
volve_pf12.describe()
Out[265]:
NPD_WELL_BORE_CODE ON_STREAM_HRS AVG_DOWNHOLE_PRESSURE AVG_DOWNHOLE_TEMPE
count 3056.0 3056.000000 3050.000000
mean 5599.0 21.336489 80.729069
std 0.0 6.889030 120.086898
min 5599.0 0.000000 0.000000
25% 5599.0 24.000000 0.000000
50% 5599.0 24.000000 0.000000
75% 5599.0 24.000000 239.423000
max 5599.0 25.000000 317.701000
In [355]: #info: Information of datatypes and count of null values
volve_pf12.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3056 entries, 1911 to 4966
Data columns (total 19 columns):
DATEPRD 3056 non-null object
NPD_WELL_BORE_CODE 3056 non-null int64
NPD_WELL_BORE_NAME 3056 non-null object
ON_STREAM_HRS 3056 non-null float64
AVG_DOWNHOLE_PRESSURE 3050 non-null float64
AVG_DOWNHOLE_TEMPERATURE 3050 non-null float64
AVG_DP_TUBING 3050 non-null float64
AVG_ANNULUS_PRESS 3043 non-null float64
AVG_CHOKE_SIZE_P 3012 non-null float64
AVG_CHOKE_UOM 3056 non-null object
AVG_WHP_P 3056 non-null float64
AVG_WHT_P 3056 non-null float64
DP_CHOKE_SIZE 3056 non-null float64
BORE_OIL_VOL 3056 non-null float64
BORE_GAS_VOL 3056 non-null float64
BORE_WAT_VOL 3056 non-null float64
BORE_WI_VOL 0 non-null float64
FLOW_KIND 3056 non-null object
WELL_TYPE 3056 non-null object
dtypes: float64(13), int64(1), object(5)
memory usage: 477.5+ KB
In [267]: import seaborn as sns
sns.heatmap(volve_pf12.isnull())
Out[267]: <matplotlib.axes._subplots.AxesSubplot at 0x27e4bb9bcc8>
In [356]: #Dropping Columns
volve_pf12.drop(['NPD_WELL_BORE_CODE','BORE_WI_VOL','NPD_WELL_BORE_NAME'],axis = 1, inplace = True)
volve_pf12
Out[356]:
DATEPRD ON_STREAM_HRS AVG_DOWNHOLE_PRESSURE AVG_DOWNHOLE_TEMPERATURE
1911 12-Feb-08 11.50 308.056 104.418
1912 13-Feb-08 24.00 303.034 105.403
1913 14-Feb-08 22.50 295.586 105.775
1914 15-Feb-08 23.15 297.663 105.752
1915 16-Feb-08 24.00 295.936 105.811
... ... ... ...
4962 13-Sep-16 0.00 0.000
4963 14-Sep-16 0.00 0.000
4964 15-Sep-16 0.00 0.000
4965 16-Sep-16 0.00 0.000
4966 17-Sep-16 0.00 0.000
3056 rows × 16 columns
In [357]: volve_pf12.set_index('DATEPRD',inplace = True)
In [358]: volve_pf12.head()
Out[358]:
ON_STREAM_HRS AVG_DOWNHOLE_PRESSURE AVG_DOWNHOLE_TEMPERATURE
DATEPRD
12-Feb-08 11.50 308.056 104.418
13-Feb-08 24.00 303.034 105.403
14-Feb-08 22.50 295.586 105.775
15-Feb-08 23.15 297.663 105.752
16-Feb-08 24.00 295.936 105.811
In [360]: volve_pf12['AVG_DOWNHOLE_PRESSURE']
Out[360]: DATEPRD
12-Feb-08 308.056
13-Feb-08 303.034
14-Feb-08 295.586
15-Feb-08 297.663
16-Feb-08 295.936
...
13-Sep-16 0.000
14-Sep-16 0.000
15-Sep-16 0.000
16-Sep-16 0.000
17-Sep-16 0.000
Name: AVG_DOWNHOLE_PRESSURE, Length: 3056, dtype: float64
In [361]: volve_pf12[['AVG_DOWNHOLE_PRESSURE']]
Out[361]:
AVG_DOWNHOLE_PRESSURE
DATEPRD
12-Feb-08 308.056
13-Feb-08 303.034
14-Feb-08 295.586
15-Feb-08 297.663
16-Feb-08 295.936
... ...
13-Sep-16 0.000
14-Sep-16 0.000
15-Sep-16 0.000
16-Sep-16 0.000
17-Sep-16 0.000
3056 rows × 1 columns
In [362]: a =volve_pf12[['AVG_DOWNHOLE_PRESSURE','BORE_OIL_VOL']]
In [363]: a
Out[363]:
AVG_DOWNHOLE_PRESSURE BORE_OIL_VOL
DATEPRD
12-Feb-08 308.056 285.0
13-Feb-08 303.034 1870.0
14-Feb-08 295.586 3124.0
15-Feb-08 297.663 2608.0
16-Feb-08 295.936 3052.0
... ... ...
13-Sep-16 0.000 0.0
14-Sep-16 0.000 0.0
15-Sep-16 0.000 0.0
16-Sep-16 0.000 0.0
17-Sep-16 0.000 0.0
3056 rows × 2 columns
In [364]: #index number
volve_pf12.iloc[2]
Out[364]: ON_STREAM_HRS 22.5
AVG_DOWNHOLE_PRESSURE 295.586
AVG_DOWNHOLE_TEMPERATURE 105.775
AVG_DP_TUBING 181.868
AVG_ANNULUS_PRESS 12.66
AVG_CHOKE_SIZE_P 31.25
AVG_CHOKE_UOM %
AVG_WHP_P 113.718
AVG_WHT_P 72.738
DP_CHOKE_SIZE 80.12
BORE_OIL_VOL 3124
BORE_GAS_VOL 509955
BORE_WAT_VOL 1
FLOW_KIND production
WELL_TYPE OP
Name: 14-Feb-08, dtype: object
In [365]: #index name
volve_pf12.loc['14-Feb-08']
Out[365]: ON_STREAM_HRS 22.5
AVG_DOWNHOLE_PRESSURE 295.586
AVG_DOWNHOLE_TEMPERATURE 105.775
AVG_DP_TUBING 181.868
AVG_ANNULUS_PRESS 12.66
AVG_CHOKE_SIZE_P 31.25
AVG_CHOKE_UOM %
AVG_WHP_P 113.718
AVG_WHT_P 72.738
DP_CHOKE_SIZE 80.12
BORE_OIL_VOL 3124
BORE_GAS_VOL 509955
BORE_WAT_VOL 1
FLOW_KIND production
WELL_TYPE OP
Name: 14-Feb-08, dtype: object
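The difference between iloc (position) and loc (label) is easier to see on a small table. The sketch below rebuilds five rows of oil volume from the output above and uses real dates as the index, which also allows slicing by date range. Converting the index to datetime is an extra step not done in the cells above.

```python
import pandas as pd

oil = pd.DataFrame(
    {'BORE_OIL_VOL': [285.0, 1870.0, 3124.0, 2608.0, 3052.0]},
    index=pd.to_datetime(
        ['12-Feb-08', '13-Feb-08', '14-Feb-08', '15-Feb-08', '16-Feb-08'],
        format='%d-%b-%y',
    ),
)

# iloc: third row by position (counting starts at 0)
by_position = oil.iloc[2]

# loc: the same row by its index label
by_label = oil.loc[pd.Timestamp('2008-02-14')]

# Label slices with loc include both end points
window = oil.loc['2008-02-13':'2008-02-15']

print(by_position['BORE_OIL_VOL'], by_label['BORE_OIL_VOL'])
print(window)
```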
In [276]: #Plotting the values with value of date on x axis
#inbuilt plot function of pandas
volve_pf12.plot(figsize = (12,10),subplots = True)
Out[276]: array([<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4BCC6A88>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D541F48>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4BCDCEC8>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D5B0308>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D5DEEC8>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D61D0C8>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D650FC8>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D8E9F88>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D8F3B08>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D92CB88>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D991F08>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D9CBE88>],
dtype=object)
Visualization
Library = MatplotLib
In [366]: #Step 1: Import the library(s).
import matplotlib.pyplot as plt
#Step 2: create numpy arrays for x and y.
x = np.linspace(-10,10)
y = x**2
#Step 3: Plot now.
plt.plot(x,y)
Out[366]: [<matplotlib.lines.Line2D at 0x27e4e647108>]
In [368]: #Customization
plt.style.use('dark_background')
plt.figure(figsize=(6,4)) #6x4 inch canvas.
# plt.style.use('default')
# 1. generate the plot. Add a label.
plt.plot(x,y,label='It is a parabola')
#2. Set x axis label
plt.xlabel('This is X-Axis')
#3. Set y axis label.
plt.ylabel('This is Y-Axis')
#4. Set the title.
plt.title('TITLE here.')
#5. set the grid.
plt.grid(True)
#6. display the label in a legend.
plt.legend(loc='best')
Out[368]: <matplotlib.legend.Legend at 0x27e4e9c9ac8>
In [279]: print(plt.style.available)
['bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark-palette', 'seaborn-dark', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'seaborn', 'Solarize_Light2', 'tableau-colorblind10', '_classic_test']
In [369]: plt.figure(figsize=(16,9))
plt.plot(pd.to_datetime(volve_pf12.index),volve_pf12['BORE_OIL_VOL'])
plt.xlabel('Time')
plt.ylabel('Oil Production')
plt.title('Oil Production vs Time')
Out[369]: Text(0.5, 1.0, 'Oil Production vs Time')
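The same kind of plot can also be written with Matplotlib's object-oriented interface, where labels and titles are set on an Axes object instead of through plt. The sketch below uses five oil volume values copied from the output earlier in this section. The 'Agg' backend line is only there so the sketch runs without a display; inside a notebook it would normally be left out.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.to_datetime(
    ['12-Feb-08', '13-Feb-08', '14-Feb-08', '15-Feb-08', '16-Feb-08'],
    format='%d-%b-%y',
)
oil_volume = [285.0, 1870.0, 3124.0, 2608.0, 3052.0]

# plt.subplots returns the figure and one Axes to draw on
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(dates, oil_volume, marker='o', label='15/9-F-12')
ax.set_xlabel('Time')
ax.set_ylabel('Oil Production')
ax.set_title('Oil Production vs Time')
ax.grid(True)
ax.legend(loc='best')
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Working with explicit fig and ax objects becomes more convenient once a figure holds several subplots.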
End of Day-2 Session