Data Processing Using Python

Powerful Data Structure and Software Ecosystem

ZHANG Li/Dazhuang
Nanjing University
Department of Computer Science and Technology
Department of University Basic Computer Teaching

WHY DICTIONARY?
Why Dictionary?
Use Python to build a simple employee information table that includes names and salaries, then use the table to query Niuyun's salary.

File
# Filename: info.py
names = ['Wangdachui', 'Niuyun', 'Linling', 'Tianqi']
salaries = [3000, 2000, 4500, 8000]
# Find Niuyun's position in names, then read the salary at the same position
print(salaries[names.index('Niuyun')])

Output:
2000

It would be more natural to query the salary by name directly, as in salaries['Niuyun']. A dictionary makes this possible.
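A minimal sketch of the same table as a dictionary, assuming the names and salaries from info.py. The key is the employee name, so the salary is read directly by name instead of through a parallel-list index lookup.

```python
# The employee table from info.py, stored as a dictionary (name -> salary)
info = {'Wangdachui': 3000, 'Niuyun': 2000, 'Linling': 4500, 'Tianqi': 8000}

# The salary is found directly by name
print(info['Niuyun'])  # 2000

# The list-based version from info.py gives the same answer
names = ['Wangdachui', 'Niuyun', 'Linling', 'Tianqi']
salaries = [3000, 2000, 4500, 8000]
print(salaries[names.index('Niuyun')] == info['Niuyun'])  # True
```

Besides being shorter, the dictionary lookup does not scan a list: names.index() searches element by element, while a dict finds the key through its hash.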
Dictionary
• What is a dictionary (dict)?
A mapping type that associates each key with a value
– key
– value
– key-value pair
Create a Dictionary
• Create a dictionary
− directly, with a literal in braces
− with dict()

Source
>>> aInfo = {'Wangdachui': 3000, 'Niuyun': 2000, 'Linling': 4500, 'Tianqi': 8000}
>>> info = [('Wangdachui', 3000), ('Niuyun', 2000), ('Linling', 4500), ('Tianqi', 8000)]
>>> bInfo = dict(info)
>>> cInfo = dict([['Wangdachui', 3000], ['Niuyun', 2000], ['Linling', 4500], ['Tianqi', 8000]])
>>> dInfo = dict(Wangdachui=3000, Niuyun=2000, Linling=4500, Tianqi=8000)
>>> aInfo
{'Wangdachui': 3000, 'Niuyun': 2000, 'Linling': 4500, 'Tianqi': 8000}
>>> cInfo['Niuyun']
2000
Create a Dictionary
How can the default salary be set to 3000?

Source
>>> aDict = {}.fromkeys(('Wangdachui', 'Niuyun', 'Linling', 'Tianqi'), 3000)
>>> aDict
{'Wangdachui': 3000, 'Niuyun': 3000, 'Linling': 3000, 'Tianqi': 3000}
>>> sorted(aDict)
['Linling', 'Niuyun', 'Tianqi', 'Wangdachui']
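fromkeys() stores the same default object under every key. That is harmless for an immutable value such as 3000, but with a mutable default (a list, here) all keys share one object. The sketch below assumes a hypothetical 'overtime' tag to show the effect, and uses a dict comprehension to give each key its own list.

```python
names = ('Wangdachui', 'Niuyun', 'Linling', 'Tianqi')

# An immutable default such as 3000 is safe to share between keys
salary = dict.fromkeys(names, 3000)

# A mutable default is shared: every key refers to the SAME list object
shared = dict.fromkeys(names, [])
shared['Niuyun'].append('overtime')   # hypothetical tag, for illustration only

# A dict comprehension creates a separate list for each key
separate = {name: [] for name in names}
separate['Niuyun'].append('overtime')

print(shared['Tianqi'])    # ['overtime'] (changed through a different key)
print(separate['Tianqi'])  # []
```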
Generate a Dictionary
How can an employee information dictionary be generated from the name list and the salary list?

Source
>>> names = ['Wangdachui', 'Niuyun', 'Linling', 'Tianqi']
>>> salaries = [3000, 2000, 4500, 8000]
>>> dict(zip(names, salaries))
{'Wangdachui': 3000, 'Niuyun': 2000, 'Linling': 4500, 'Tianqi': 8000}
Generate a Dictionary
How can a dictionary that maps company codes to stock prices be generated from the data below?

pList = [('AXP', 'American Express Company', '78.51'),
         ('BA', 'The Boeing Company', '184.76'),
         ('CAT', 'Caterpillar Inc.', '96.39'),
         ('CSCO', 'Cisco Systems,Inc.', '33.71'),
         ('CVX', 'Chevron Corporation', '106.09')]

Target result:
{'AXP': '78.51', 'BA': '184.76', 'CAT': '96.39', 'CSCO': '33.71', 'CVX': '106.09'}
Generate a Dictionary
File
# Filename: createdict.py
pList = [('AXP', 'American Express Company', '78.51'),
         ('BA', 'The Boeing Company', '184.76'),
         ('CAT', 'Caterpillar Inc.', '96.39'),
         ('CSCO', 'Cisco Systems,Inc.', '33.71'),
         ('CVX', 'Chevron Corporation', '106.09')]
aList = []   # company codes
bList = []   # stock prices
for i in range(len(pList)):
    aStr = pList[i][0]
    bStr = pList[i][2]
    aList.append(aStr)
    bList.append(bStr)
aDict = dict(zip(aList, bList))
print(aDict)

Output:
{'AXP': '78.51', 'BA': '184.76', 'CAT': '96.39', 'CSCO': '33.71', 'CVX': '106.09'}
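The loop in createdict.py builds two lists and zips them. A more compact alternative, sketched here with the same pList data, unpacks each (code, name, price) tuple in a dict comprehension and keeps only code and price.

```python
pList = [('AXP', 'American Express Company', '78.51'),
         ('BA', 'The Boeing Company', '184.76'),
         ('CAT', 'Caterpillar Inc.', '96.39'),
         ('CSCO', 'Cisco Systems,Inc.', '33.71'),
         ('CVX', 'Chevron Corporation', '106.09')]

# Unpack each tuple; the company name is ignored, code -> price is kept
aDict = {code: price for code, _name, price in pList}
print(aDict)
```

This also removes the hard-coded loop bound, so it works for a list of any length.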
USING DICTIONARY
Basic Operation of Dictionary
Source
>>> aInfo = {'Wangdachui': 3000, 'Niuyun': 2000, 'Linling': 4500, 'Tianqi': 8000}
>>> aInfo['Niuyun']            # search by key
2000
>>> aInfo['Niuyun'] = 9999     # update
>>> aInfo
{'Wangdachui': 3000, 'Niuyun': 9999, 'Linling': 4500, 'Tianqi': 8000}
>>> aInfo['Fuyun'] = 1000      # insert
>>> aInfo
{'Wangdachui': 3000, 'Niuyun': 9999, 'Linling': 4500, 'Tianqi': 8000, 'Fuyun': 1000}
>>> 'Mayun' in aInfo           # membership test
False
>>> del aInfo['Fuyun']         # delete
>>> aInfo
{'Wangdachui': 3000, 'Niuyun': 9999, 'Linling': 4500, 'Tianqi': 8000}
Built-in Functions of Dictionary
dict()   len()   hash()

Source
>>> names = ['Wangdachui', 'Niuyun', 'Linling', 'Tianqi']
>>> salaries = [3000, 2000, 4500, 8000]
>>> aInfo = dict(zip(names, salaries))
>>> aInfo
{'Wangdachui': 3000, 'Niuyun': 2000, 'Linling': 4500, 'Tianqi': 8000}
>>> len(aInfo)
4
Built-in Functions of Dictionary
Source
>>> hash('Wangdachui')     # strings are hashable; the value differs between interpreter sessions
7716305958664889313
>>> testList = [1, 2, 3]
>>> hash(testList)         # lists are mutable, so they are unhashable and cannot be dict keys
Traceback (most recent call last):
  File "<pyshell#127>", line 1, in <module>
    hash(testList)
TypeError: unhashable type: 'list'
Dictionary Methods
Given the information dictionary {'Wangdachui': 3000, 'Niuyun': 2000, 'Linling': 4500, 'Tianqi': 8000}, how can the employee names and salaries be output separately?
Source
>>> aInfo = {'Wangdachui': 3000, 'Niuyun': 2000, 'Linling': 4500, 'Tianqi': 8000}
>>> aInfo.keys()
dict_keys(['Wangdachui', 'Niuyun', 'Linling', 'Tianqi'])
>>> aInfo.values()
dict_values([3000, 2000, 4500, 8000])
>>> for k, v in aInfo.items():
        print(k, v)

Wangdachui 3000
Niuyun 2000
Linling 4500
Tianqi 8000
Dictionary Methods
There are two dictionaries: the first holds the original information, and the second contains some new members and updates. How can the information be merged and updated?

Source
>>> aInfo = {'Wangdachui': 3000, 'Niuyun': 2000, 'Linling': 4500}
>>> bInfo = {'Wangdachui': 4000, 'Niuyun': 9999, 'Wangzi': 6000}
>>> aInfo.update(bInfo)
>>> aInfo
{'Wangdachui': 4000, 'Niuyun': 9999, 'Linling': 4500, 'Wangzi': 6000}
Dictionary Methods
What is the difference between these two kinds of lookup?

Source
>>> stock = {'AXP': 78.51, 'BA': 184.76}
>>> stock['AAA']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'AAA'

Source
>>> stock = {'AXP': 78.51, 'BA': 184.76}
>>> print(stock.get('AAA'))
None

Indexing with [] raises KeyError for a missing key, whereas get() returns None instead.
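A sketch of the other lookup-related methods from the method list, using the same stock dictionary. The default values (0.0, and 106.09 taken from the CVX price in the earlier stock data) are chosen only for illustration.

```python
stock = {'AXP': 78.51, 'BA': 184.76}

# get() accepts a second argument to return instead of None
price = stock.get('AAA', 0.0)
print(price)  # 0.0

# Indexing with [] raises KeyError, which can be caught explicitly
try:
    stock['AAA']
except KeyError as err:
    print('missing key:', err)   # missing key: 'AAA'

# setdefault(): insert the default if the key is absent, then return the stored value
cvx = stock.setdefault('CVX', 106.09)   # absent: inserted and returned
axp = stock.setdefault('AXP', 0.0)      # present: existing value returned, dict unchanged

# pop() removes a key and returns its value; with a default it does not raise
removed = stock.pop('AAA', None)
print(stock)  # {'AXP': 78.51, 'BA': 184.76, 'CVX': 106.09}
```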
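Dictionary Methods
• Delete a dictionary

Source (rebinding the name)
>>> aStock = {'AXP': 78.51, 'BA': 184.76}
>>> bStock = aStock
>>> aStock = {}
>>> bStock
{'AXP': 78.51, 'BA': 184.76}

Source (clear())
>>> aStock = {'AXP': 78.51, 'BA': 184.76}
>>> bStock = aStock
>>> aStock.clear()
>>> aStock
{}
>>> bStock
{}

Rebinding aStock to {} leaves the original dictionary, still referenced by bStock, untouched. clear() empties the shared dictionary in place, so bStock sees the change too.

Dictionary methods:
clear()  copy()  fromkeys()  get()  items()
keys()   pop()   setdefault()  update()  values()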
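copy() from the method list is the usual way to keep an independent dictionary before an in-place change such as clear(). The sketch below contrasts an alias with a copy; note that copy() is shallow, so nested mutable values would still be shared.

```python
aStock = {'AXP': 78.51, 'BA': 184.76}
bStock = aStock          # alias: both names refer to one dict
cStock = aStock.copy()   # shallow copy: a separate dict with the same items

aStock.clear()           # empties the dict in place

print(bStock)            # {} (the alias sees the change)
print(cStock)            # {'AXP': 78.51, 'BA': 184.76}
print(bStock is aStock, cStock is aStock)  # True False
```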
Case Study
• JSON format
− JavaScript Object Notation
− A lightweight data-exchange format
After decoding, JSON data becomes a dictionary:
>>> x = {"name": "Niuyun", "address": {"city": "Beijing", "street": "Chaoyang Road"}}
>>> x['address']['street']
'Chaoyang Road'

• Keyword query with a search engine
Baidu: http://www.baidu.com/s?wd=%s
Google: http://www.googlestable.com/search/?q=%us
Bing China: http://cn.bing.com/search?q=%us
Bing USA: http://www.bing.com/search?q=%us
>>> import requests
>>> kw = {'q': 'Python dict'}
>>> r = requests.get('http://cn.bing.com/search', params=kw)   # the dict becomes the query string
>>> r.url
>>> print(r.text)
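The slide writes the decoded result as a literal. A minimal sketch of the actual decoding step with the standard json module is below; the JSON text is a hypothetical sample matching the slide's dictionary. The last lines use urllib.parse.urlencode to show, without a network request, how a params dict such as kw turns into a query string (requests does comparable encoding, but this is not claimed to match it byte for byte).

```python
import json
from urllib.parse import urlencode

# Hypothetical JSON text, as it might arrive from a web service
text = '{"name": "Niuyun", "address": {"city": "Beijing", "street": "Chaoyang Road"}}'

x = json.loads(text)              # decode: JSON object -> Python dict
print(type(x))                    # <class 'dict'>
print(x['address']['street'])     # Chaoyang Road

# Encoding goes the other way: dict -> JSON text
print(json.dumps(x['address']))   # {"city": "Beijing", "street": "Chaoyang Road"}

# A dict of parameters becomes a URL query string
kw = {'q': 'Python dict'}
print('http://cn.bing.com/search?' + urlencode(kw))  # http://cn.bing.com/search?q=Python+dict
```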
Variable-Length Keyword Parameter (dict)
Parameter types in Python functions:
• Positional or keyword parameter
• Positional-only parameter
• Variable-length positional parameter
• Variable-length keyword parameter
• Parameter with default value

Source
>>> def func(args1, *argst, **argsd):
        print(args1)
        print(argst)
        print(argsd)

>>> func('Hello,', 'Wangdachui', 'Niuyun', 'Linling', a1=1, a2=2, a3=3)
Hello,
('Wangdachui', 'Niuyun', 'Linling')
{'a1': 1, 'a2': 2, 'a3': 3}
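The call can also be made in the opposite direction: * spreads a sequence into positional arguments and ** spreads a dict into keyword arguments. This sketch returns the parts instead of printing them, and adds a hypothetical salary_line() helper to show a parameter with a default value.

```python
def func(args1, *argst, **argsd):
    # Return the three parts so they can be inspected
    return args1, argst, argsd

staff = ['Wangdachui', 'Niuyun', 'Linling']
options = {'a1': 1, 'a2': 2, 'a3': 3}

first, rest, extras = func('Hello,', *staff, **options)
print(first)   # Hello,
print(rest)    # ('Wangdachui', 'Niuyun', 'Linling')
print(extras)  # {'a1': 1, 'a2': 2, 'a3': 3}

# Hypothetical helper: salary has a default value of 3000
def salary_line(name, salary=3000):
    return f'{name}: {salary}'

print(salary_line('Niuyun'))        # Niuyun: 3000
print(salary_line('Tianqi', 8000))  # Tianqi: 8000
```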
SET
Set
How can duplicate values be removed from the information table?

Source
>>> names = ['Wangdachui', 'Niuyun', 'Wangzi', 'Wangdachui', 'Linling', 'Niuyun']
>>> namesSet = set(names)
>>> namesSet
{'Wangzi', 'Wangdachui', 'Niuyun', 'Linling'}
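set() removes duplicates but does not keep the original order. When order matters, one common alternative (sketched here, relying on dicts keeping insertion order, which Python guarantees from 3.7) is to pass the list through dict.fromkeys().

```python
names = ['Wangdachui', 'Niuyun', 'Wangzi', 'Wangdachui', 'Linling', 'Niuyun']

unique_any_order = set(names)                 # no duplicates, order not guaranteed
unique_in_order = list(dict.fromkeys(names))  # no duplicates, first-seen order kept

print(unique_in_order)  # ['Wangdachui', 'Niuyun', 'Wangzi', 'Linling']
```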
Set
• What is a set?
An unordered collection of elements with no duplicates
– Mutable set (set)
– Immutable set (frozenset)
Create a Set
Source
>>> aSet = set('hello')
>>> aSet
{'h', 'e', 'l', 'o'}
>>> fSet = frozenset('hello')
>>> fSet
frozenset({'h', 'e', 'l', 'o'})
>>> type(aSet)
<class 'set'>
>>> type(fSet)
<class 'frozenset'>
Comparison between Sets
Source
>>> aSet = set('sunrise')
>>> bSet = set('sunset')
>>> 'u' in aSet
True
>>> aSet == bSet
False
>>> aSet < bSet
False
>>> set('sun') < aSet
True

Standard type operators
Mathematics    Python
∈              in
∉              not in
=              ==
≠              !=
⊂              <
⊆              <=
⊃              >
⊇              >=
Relational Operation
Source
>>> aSet = set('sunrise')
>>> bSet = set('sunset')
>>> aSet & bSet
{'u', 's', 'e', 'n'}
>>> aSet | bSet
{'e', 'i', 'n', 's', 'r', 'u', 't'}
>>> aSet - bSet
{'i', 'r'}
>>> aSet ^ bSet
{'i', 'r', 't'}
>>> aSet -= set('sun')
>>> aSet
{'e', 'i', 'r'}

Set type operators
Mathematics    Python
∩              &
∪              |
- or \         -
Δ              ^

Compound assignment operators: &=  |=  -=  ^=
Built-in Function for Set
• Methods can do the same work as the operators
− For all sets:
s.issubset(t), s.issuperset(t), s.union(t), s.intersection(t), s.difference(t), s.symmetric_difference(t), s.copy()

Source
>>> aSet = set('sunrise')
>>> bSet = set('sunset')
>>> aSet.issubset(bSet)
False
>>> aSet.intersection(bSet)
{'u', 's', 'e', 'n'}
>>> aSet.difference(bSet)
{'i', 'r'}
>>> cSet = aSet.copy()
>>> cSet
{'s', 'r', 'e', 'i', 'u', 'n'}
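A quick check, sketched with the slide's two sets, that each operator matches its method. One practical difference: the methods accept any iterable (a string here), while the binary operators require set operands on both sides.

```python
aSet = set('sunrise')
bSet = set('sunset')

# (operator result, method result) pairs; each pair should be equal
pairs = {
    'union': (aSet | bSet, aSet.union(bSet)),
    'intersection': (aSet & bSet, aSet.intersection(bSet)),
    'difference': (aSet - bSet, aSet.difference(bSet)),
    'symmetric_difference': (aSet ^ bSet, aSet.symmetric_difference(bSet)),
    'subset': (aSet <= bSet, aSet.issubset(bSet)),
}
for name, (by_operator, by_method) in pairs.items():
    print(name, by_operator == by_method)   # True for every pair

# Methods accept any iterable; the | operator does not
print(sorted(aSet.union('xyz')))
try:
    aSet | 'xyz'
except TypeError:
    print('| needs a set on both sides')
```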
Built-in Function for Set
• Methods can do the same work as the operators
− For mutable sets only:
update(t), intersection_update(t), difference_update(t), symmetric_difference_update(t), add(obj), remove(obj), discard(obj), pop(), clear()

Source
>>> aSet = set('sunrise')
>>> aSet.add('!')
>>> aSet
{'!', 'e', 'i', 'n', 's', 'r', 'u'}
>>> aSet.remove('!')
>>> aSet
{'e', 'i', 'n', 's', 'r', 'u'}
>>> aSet.update('Yeah')
>>> aSet
{'a', 'e', 'i', 'h', 'n', 's', 'r', 'u', 'Y'}
>>> aSet.clear()
>>> aSet
set()
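A sketch of how remove(), discard() and pop() differ, plus one reason the immutable frozenset exists: it is hashable, so it can serve as a dictionary key where a set cannot. The 'Project A' team mapping is hypothetical.

```python
aSet = set('sunrise')

aSet.discard('!')      # no error: discard() ignores a missing element
try:
    aSet.remove('!')   # remove() raises KeyError for a missing element
except KeyError:
    print("remove('!') raised KeyError")

elem = aSet.pop()      # removes and returns an arbitrary element
print(elem in set('sunrise'), elem in aSet)  # True False

# A frozenset is hashable, so it can be a dict key (hypothetical mapping)
teams = {frozenset({'Niuyun', 'Linling'}): 'Project A'}
print(teams[frozenset({'Linling', 'Niuyun'})])  # Project A: element order is irrelevant
```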
SCIPY LIBRARY
SciPy
Features
• A Python-based software ecosystem
• Open source
• Serves mathematics, science and engineering
Common Data Types in Python
Numeric   String   List   Tuple   Dict   Set
Other Data Structures
• Data structures in SciPy, built by modifying the original Python data structures:
– ndarray (N-dimensional array)
– Series (a variable-length dictionary)
– DataFrame
NumPy
Features
• Powerful ndarray object and ufunc (universal function) functions
• Ingenious functions
• Suitable for scientific computation such as linear algebra and random number handling
• Flexible, general-purpose multi-dimensional data structure
• Easy to connect with databases
Source
>>> import numpy as np
>>> xArray = np.ones((3,4))
SciPy
Features
• The key package for scientific computation in Python. It is built on NumPy and offers a richer set of functions and methods; where both provide a function of the same name, SciPy's version is often more full-featured.
• Operates efficiently on NumPy arrays, so NumPy and SciPy work well together.
• A toolbox for different fields of scientific computation, with modules for interpolation, integration, optimization, image processing and more.
Source
>>> import numpy as np
>>> from scipy import linalg
>>> arr = np.array([[1,2],[3,4]])
>>> linalg.det(arr)
-2.0
Matplotlib
Features
• Built on NumPy
• A 2D plotting library that quickly generates many kinds of graphs
• The pyplot module provides a MATLAB-like interface
pandas
Features
• Built on NumPy and works alongside SciPy
• Efficient Series and DataFrame data structures
• Powerful Python library for scalable data processing
• Efficient slicing of large datasets
• Optimized library functions to read and write many file types, such as CSV and HDF5
Source
>>> df[2 : 5]      # df is an existing DataFrame; rows at positions 2 to 4
>>> df.head(4)     # first 4 rows
>>> df.tail(3)     # last 3 rows
NDARRAY
Array in Python
Format
• Use data structures such as lists and tuples
− One-dimensional array: list = [1,2,3,4]
− Two-dimensional array: list = [[1,2,3],[4,5,6],[7,8,9]]
• The array module
− Create an array with array(), e.g. array.array("B", range(5))
− Provides methods including append, insert and read
Ndarray
[Figure: a 5 × 5 ndarray holding the values 0, 10, 20, …, 240]
• What is ndarray?
N-dimensional array
– The basic data type in NumPy
– All elements are of the same type
– Also known as array
– Reduces memory cost and improves computational efficiency
– Powerful functions
Basic Concepts of Ndarray
[Figure: the same 5 × 5 ndarray; axis 0 runs down the rows, axis 1 runs across the columns]
• Ndarray attributes
– Dimensions are called axes; the number of axes is the rank.
– Basic attributes
• ndarray.ndim (rank)
• ndarray.shape (dimensions)
• ndarray.size (total number of elements)
• ndarray.dtype (element type)
• ndarray.itemsize (size of each element in bytes)
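A sketch that builds the figure's array and prints each attribute. The values 0, 10, …, 240 are an assumed reading of the figure. The exact dtype (and hence itemsize) is platform-dependent, so only their relationship is relied on.

```python
import numpy as np

# Assumed reading of the figure: a 5 x 5 array holding 0, 10, 20, ..., 240
arr = np.arange(0, 250, 10).reshape(5, 5)

print(arr.ndim)      # 2: the rank, i.e. the number of axes
print(arr.shape)     # (5, 5)
print(arr.size)      # 25 elements in total
print(arr.dtype)     # platform-dependent integer type, e.g. int64
print(arr.itemsize)  # bytes per element, equal to arr.dtype.itemsize

# axis 0 runs down the rows, axis 1 runs across the columns
print(arr.sum(axis=0))  # [500 550 600 650 700]: one total per column
print(arr.sum(axis=1))  # [ 100  350  600  850 1100]: one total per row
```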
Creation of Ndarray
Source
>>> import numpy as np
>>> aArray = np.array([1,2,3])
>>> aArray
array([1, 2, 3])
>>> bArray = np.array([(1,2,3),(4,5,6)])
>>> bArray
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.arange(1,5,0.5)
array([ 1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])
>>> np.random.random((2,2))     # random values; your output will differ
array([[ 0.79777004,  0.1468679 ],
       [ 0.95838379,  0.86106278]])
>>> np.linspace(1, 2, 10, endpoint=False)
array([ 1. ,  1.1,  1.2,  1.3,  1.4,  1.5,  1.6,  1.7,  1.8,  1.9])

Ndarray creation functions:
arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like
Creation of Ndarray
Source
>>> np.ones([2,3])
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>> np.zeros((2,2))
array([[ 0.,  0.],
       [ 0.,  0.]])
>>> np.fromfunction(lambda i,j:(i+1)*(j+1), (9,9))
array([[  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [  2.,   4.,   6.,   8.,  10.,  12.,  14.,  16.,  18.],
       [  3.,   6.,   9.,  12.,  15.,  18.,  21.,  24.,  27.],
       [  4.,   8.,  12.,  16.,  20.,  24.,  28.,  32.,  36.],
       [  5.,  10.,  15.,  20.,  25.,  30.,  35.,  40.,  45.],
       [  6.,  12.,  18.,  24.,  30.,  36.,  42.,  48.,  54.],
       [  7.,  14.,  21.,  28.,  35.,  42.,  49.,  56.,  63.],
       [  8.,  16.,  24.,  32.,  40.,  48.,  56.,  64.,  72.],
       [  9.,  18.,  27.,  36.,  45.,  54.,  63.,  72.,  81.]])
Ndarray Operations
Source
>>> aArray = np.array([(1,2,3),(4,5,6)])
>>> aArray
array([[1, 2, 3],
       [4, 5, 6]])
>>> print(aArray[1])           # row 1
[4 5 6]
>>> print(aArray[0:2])         # rows 0 and 1
[[1 2 3]
 [4 5 6]]
>>> print(aArray[:,[0,1]])     # all rows, columns 0 and 1
[[1 2]
 [4 5]]
>>> print(aArray[1,[0,1]])     # row 1, columns 0 and 1
[4 5]
>>> for row in aArray:
        print(row)

[1 2 3]
[4 5 6]
Ndarray Operations
Source
>>> aArray = np.array([(1,2,3),(4,5,6)])
>>> aArray.shape
(2, 3)
>>> bArray = aArray.reshape(3,2)    # returns a reshaped array; aArray is unchanged
>>> bArray
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> aArray
array([[1, 2, 3],
       [4, 5, 6]])

Source
>>> aArray.resize(3,2)    # changes aArray in place; NumPy raises ValueError if another array (such as bArray) still references its data
>>> aArray
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> bArray = np.array([1,3,7])
>>> cArray = np.array([3,5,8])
>>> np.vstack((bArray, cArray))     # stack vertically (as rows)
array([[1, 3, 7],
       [3, 5, 8]])
>>> np.hstack((bArray, cArray))     # stack horizontally
array([1, 3, 7, 3, 5, 8])
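reshape() leaves aArray's shape unchanged, but for a contiguous array it typically returns a view that shares aArray's data, so writing through the view changes the original. The sketch shows this, an explicit copy() to avoid it, and the np.resize() function, which returns a new array and repeats the data to fill a larger shape.

```python
import numpy as np

aArray = np.array([(1, 2, 3), (4, 5, 6)])
bArray = aArray.reshape(3, 2)   # for this contiguous array, reshape() returns a view

print(np.shares_memory(aArray, bArray))  # True: same data buffer
bArray[0, 0] = 100                       # writing through the view...
print(aArray[0, 0])                      # 100 ...also changes aArray

cArray = aArray.reshape(3, 2).copy()     # an independent copy
cArray[0, 0] = -1
print(aArray[0, 0])                      # still 100

# np.resize() (the function) returns a new array, repeating data to fill the shape
print(np.resize(np.array([1, 3, 7]), (2, 4)))
# [[1 3 7 1]
#  [3 7 1 3]]
```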
Ndarray Calculation
Use the basic operators (+, -, *, /, >); they work element by element.
Source
>>> aArray = np.array([(5,5,5),(5,5,5)])
>>> bArray = np.array([(2,2,2),(2,2,2)])
>>> cArray = aArray * bArray
>>> cArray
array([[10, 10, 10],
       [10, 10, 10]])
>>> aArray += bArray
>>> aArray
array([[7, 7, 7],
       [7, 7, 7]])
>>> a = np.array([1,2,3])
>>> b = np.array([[1,2,3],[4,5,6]])
>>> a + b          # a is broadcast across each row of b
array([[2, 4, 6],
       [5, 7, 9]])
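The last example relies on broadcasting: NumPy compares shapes from the trailing dimension backwards, and a dimension of size 1 (or a missing one) is stretched to match. The sketch shows a row and a column being broadcast, and a shape pair that cannot be broadcast.

```python
import numpy as np

a = np.array([1, 2, 3])               # shape (3,)
b = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
print((a + b).shape)                  # (2, 3): a is repeated for both rows

col = np.array([[10], [20]])          # shape (2, 1)
print(b + col)                        # col is repeated across the 3 columns
# [[11 12 13]
#  [24 25 26]]

try:
    b + np.array([1, 2])              # shapes (2, 3) and (2,): trailing sizes 3 and 2 clash
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print('cannot broadcast:', err)
```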
Ndarray Calculation
Source
>>> aArray = np.array([(1,2,3),(4,5,6)])
>>> aArray.sum()
21
>>> aArray.sum(axis = 0)
array([5, 7, 9])
>>> aArray.sum(axis = 1)
array([ 6, 15])
>>> aArray.min()      # returns the value
1
>>> aArray.argmax()   # returns the index
5
>>> aArray.mean()
3.5
>>> aArray.var()
2.9166666666666665
>>> aArray.std()
1.707825127659933

Basic array statistics methods:
sum  mean  std  var  min  max  argmin  argmax  cumsum  cumprod
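argmax() returned 5 because, without an axis, NumPy treats the array as flattened. This sketch converts that flat index back to a (row, column) position with np.unravel_index() and shows cumsum() and mean() with and without an axis.

```python
import numpy as np

aArray = np.array([(1, 2, 3), (4, 5, 6)])

flat_index = aArray.argmax()                           # 5: index in the flattened array
row, col = np.unravel_index(flat_index, aArray.shape)  # (1, 2): index in the 2-D array
print(flat_index, row, col)

print(aArray.cumsum())        # [ 1  3  6 10 15 21]: running total over the flattened array
print(aArray.cumsum(axis=1))  # running total along each row
print(aArray.mean(axis=0))    # column means: [2.5 3.5 4.5]
```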
Specific Application: Linear Algebra
Source
>>> import numpy as np
>>> x = np.array([[1,2], [3,4]])
>>> r1 = np.linalg.det(x)
>>> print(r1)          # floating-point rounding; the last digits may vary
-2.0000000000000004
>>> r2 = np.linalg.inv(x)
>>> print(r2)
[[-2.   1. ]
 [ 1.5 -0.5]]
>>> r3 = np.dot(x, x)
>>> print(r3)
[[ 7 10]
 [15 22]]

Common functions
dot            Matrix product (inner product for 1-D arrays)
linalg.det     Determinant
linalg.inv     Inverse matrix
linalg.solve   Solve a system of linear equations
linalg.eig     Eigenvalues and eigenvectors
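linalg.solve and linalg.eig appear in the table but not in the example. A sketch using the slide's matrix x as the coefficient matrix, with a right-hand side [5, 6] chosen only for illustration. The checks use np.allclose because the results are floating-point.

```python
import numpy as np

# Solve   x + 2y = 5
#        3x + 4y = 6   (right-hand side chosen for illustration)
A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])

solution = np.linalg.solve(A, b)
print(solution)                               # approximately [-4.   4.5]
print(np.allclose(A @ solution, b))           # True: the solution satisfies the system

# The inverse gives the same answer, though solve() is usually preferred numerically
print(np.allclose(np.linalg.inv(A) @ b, solution))  # True

values, vectors = np.linalg.eig(A)
# Each column of vectors is an eigenvector: A v = lambda v
print(np.allclose(A @ vectors, vectors * values))   # True
```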
ufunc() in Ndarray
• A ufunc (universal function) operates on each element of an array. Many NumPy ufuncs are implemented in C, so they run fast.
Commonly used NumPy functions (not all of them are ufuncs):
add, all, any, arange, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, exp, floor, …

File
# Filename: math_numpy.py
import time
import math
import numpy as np

# Element-by-element loop with the math module
x = np.arange(0, 100, 0.01)
t_m1 = time.process_time()
for i, t in enumerate(x):
    x[i] = math.pow((math.sin(t)), 2)
t_m2 = time.process_time()

# The same computation with NumPy ufuncs on the whole array
y = np.arange(0, 100, 0.01)
t_n1 = time.process_time()
y = np.power(np.sin(y), 2)
t_n2 = time.process_time()

print('Running time of math:', t_m2 - t_m1)
print('Running time of numpy:', t_n2 - t_n1)
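Speed only matters if both versions compute the same thing. This sketch checks that the loop and the ufunc version agree, and shows that ufuncs such as np.add are objects with extra methods like reduce() and accumulate().

```python
import math
import numpy as np

x = np.arange(0, 100, 0.01)

# Element-by-element with the math module
loop_result = np.array([math.sin(t) ** 2 for t in x])

# Whole-array computation with NumPy ufuncs
ufunc_result = np.power(np.sin(x), 2)

print(np.allclose(loop_result, ufunc_result))  # True

# ufunc methods: reduce() folds, accumulate() keeps intermediate results
print(np.add.reduce(np.array([1, 2, 3, 4])))      # 10
print(np.add.accumulate(np.array([1, 2, 3, 4])))  # [ 1  3  6 10]
```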
SERIES
Series
• Basic features
− An object similar to a one-dimensional array
− Consists of data and an index
Source
>>> from pandas import Series
>>> aSer = Series([1,2.0,'a'])
>>> aSer
0      1
1    2.0
2      a
dtype: object
Self-defined Index of Series
Source
>>> import pandas as pd
>>> bSer = pd.Series(['apple','peach','lemon'], index = [1,2,3])
>>> bSer
1    apple
2    peach
3    lemon
dtype: object
>>> bSer.index        # pandas 2.x displays Index([1, 2, 3], dtype='int64')
Int64Index([1, 2, 3], dtype='int64')
>>> bSer.values
array(['apple', 'peach', 'lemon'], dtype=object)
Basic Operation of Series
Source
>>> aSer = pd.Series([3,5,7],index = ['a','b','c'])
>>> aSer['b']
5
>>> aSer * 2
a     6
b    10
c    14
dtype: int64
>>> import numpy as np
>>> np.exp(aSer)
a      20.085537
b     148.413159
c    1096.633158
dtype: float64
Data Alignment of Series
Source
>>> data = {'AXP':'86.40','CSCO':'122.64','BA':'99.44'}
>>> sindex = ['AXP','CSCO','BA','AAPL']
>>> aSer = pd.Series(data, index = sindex)
>>> aSer
AXP      86.40
CSCO    122.64
BA       99.44
AAPL       NaN
dtype: object
>>> pd.isnull(aSer)
AXP     False
CSCO    False
BA      False
AAPL     True
dtype: bool
Data Alignment of Series
• Important feature
− During computation, data with different indexes is aligned by label
Source
>>> aSer = pd.Series(data, index = sindex)
>>> aSer
AXP      86.40
CSCO    122.64
BA       99.44
AAPL       NaN
dtype: object
>>> bSer = {'AXP':'86.40','CSCO':'122.64','CVX':'23.78'}
>>> cSer = pd.Series(bSer)
>>> aSer + cSer       # the values are strings here, so + concatenates them
AAPL             NaN
AXP       86.4086.40
BA               NaN
CSCO    122.64122.64
CVX              NaN
dtype: object
Data Alignment of Series
• Important feature
− During computation, data with different indexes is aligned by label
Source
>>> data = {'AXP':86.40,'CSCO':122.64,'BA':99.44}
>>> aSer = pd.Series(data, index = sindex)
>>> aSer
AXP      86.40
CSCO    122.64
BA       99.44
AAPL       NaN
dtype: float64
>>> bSer = {'AXP':86.40,'CSCO':130.64,'CVX':23.78}
>>> cSer = pd.Series(bSer)
>>> (aSer+cSer)/2
AAPL       NaN
AXP      86.40
BA         NaN
CSCO    126.64
CVX        NaN
dtype: float64
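Two follow-ups to the alignment slides, sketched with their data: astype(float) converts the string prices to numbers so + adds instead of concatenating, and Series.add() with fill_value treats a label that is missing on one side as that value instead of producing NaN. Where both sides are missing (AAPL here), the result stays NaN.

```python
import pandas as pd

data = {'AXP': '86.40', 'CSCO': '122.64', 'BA': '99.44'}   # prices stored as strings
sindex = ['AXP', 'CSCO', 'BA', 'AAPL']
aSer = pd.Series(data, index=sindex).astype(float)          # strings -> float; NaN stays NaN

cSer = pd.Series({'AXP': 86.40, 'CSCO': 130.64, 'CVX': 23.78})

plain = aSer + cSer                      # labels missing on either side give NaN
filled = aSer.add(cSer, fill_value=0)    # a label missing on one side counts as 0
print(plain)
print(filled)
```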
DATAFRAME
DataFrame
• Basic features
− A table-like data structure
− Has an ordered set of columns, whose labels work like an index
− Can be considered a set of Series sharing the same index
Source
>>> data = {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]}
>>> frame = pd.DataFrame(data)
>>> frame
         name   pay
0  Wangdachui  4000
1     Linling  5000
2      Niuyun  6000
Index and Value of DataFrame
Source
>>> data = np.array([('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)])
>>> frame = pd.DataFrame(data, index = range(1, 4), columns = ['name', 'pay'])
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
>>> frame.index
RangeIndex(start=1, stop=4, step=1)
>>> frame.columns
Index(['name', 'pay'], dtype='object')
>>> frame.values
array([['Wangdachui', '4000'],
       ['Linling', '5000'],
       ['Niuyun', '6000']], dtype=object)

Note: np.array() stores every element of this mixed data as a string, so the pay values here are '4000', '5000' and '6000' rather than integers.
Basic Operation of DataFrame
• Querying a row or a column of a DataFrame object returns a Series
frame (built from the dictionary data):
         name   pay
0  Wangdachui  4000
1     Linling  5000
2      Niuyun  6000
Source
>>> frame['name']
0    Wangdachui
1       Linling
2        Niuyun
Name: name, dtype: object
Source
>>> frame.pay
0    4000
1    5000
2    6000
Name: pay, dtype: int64
>>> frame.iloc[ : 2, 1]     # rows at positions 0 and 1, column at position 1
0    4000
1    5000
Name: pay, dtype: int64
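iloc selects by position, while loc selects by label. The difference shows once the index is not 0, 1, 2, so this sketch rebuilds the dictionary-based frame with the labels 1 to 3 (an assumption for illustration). Note that a loc slice includes its end label, unlike an iloc slice.

```python
import pandas as pd

data = {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]}
frame = pd.DataFrame(data, index=range(1, 4))    # row labels 1, 2, 3

print(frame.iloc[0, 1])                 # 4000: first row, second column, by position
print(frame.loc[1, 'pay'])              # 4000: row label 1, column label 'pay'
print(frame.iloc[:2, 1].tolist())       # [4000, 5000]: positions 0 and 1 (end excluded)
print(frame.loc[1:2, 'pay'].tolist())   # [4000, 5000]: labels 1 to 2 (end included)
print(frame.loc[3, 'name'])             # Niuyun
```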
Basic Operation of DataFrame
• Modifying and deleting columns of a DataFrame object
Source
>>> frame['name'] = 'admin'
>>> frame
    name   pay
0  admin  4000
1  admin  5000
2  admin  6000
Source
>>> del frame['pay']
>>> frame
    name
0  admin
1  admin
2  admin
Statistics with DataFrame
• Find the lowest salary, and the members earning 5000 or more, in the DataFrame object
         name   pay
0  Wangdachui  4000
1     Linling  5000
2      Niuyun  6000
Source
>>> frame.pay.min()
4000
Source
>>> frame[frame.pay >= 5000]
      name   pay
1  Linling  5000
2   Niuyun  6000
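If frame is built from np.array() as on the Index and Value slide, the pay column holds strings, and comparisons become lexicographic: '10000' >= '5000' is False because '1' sorts before '5'. A sketch, assuming that np.array-based frame, that converts the column with astype(int) before filtering and uses idxmin() to find who earns least.

```python
import numpy as np
import pandas as pd

data = np.array([('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)])
frame = pd.DataFrame(data, index=range(1, 4), columns=['name', 'pay'])

raw_pay = frame.loc[1, 'pay']
print(repr(raw_pay))         # '4000': np.array() turned every element into text
print('10000' >= '5000')     # False: string comparison goes character by character

frame['pay'] = frame['pay'].astype(int)   # convert to integers before comparing

print(frame[frame.pay >= 5000])           # numeric comparison
print(frame.pay.min())                    # 4000
lowest = frame.loc[frame.pay.idxmin(), 'name']
print(lowest)                             # Wangdachui: the lowest-paid member
```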
Summary
