Optimizing Python Code With Pandas - Chapter1

This document discusses optimizing Python code with Pandas by measuring performance, locating rows and columns efficiently, and sampling random rows and columns. It shows how to measure code execution time with the time module and compares the speed of different Pandas methods for locating and sampling data, demonstrating performance improvements of up to 173% by using the most efficient methods.


Why we need efficient code and how to measure it
OPTIMIZING PYTHON CODE WITH PANDAS

Leonidas Souliotis
PhD Researcher
The poker dataset
     S1  R1  S2  R2  S3  R3  S4  R4  S5  R5
1     1  10   3  11   3  13   4   4   2   1
2     2  11   2  13   2  10   2  12   2   1
3     3  12   3  11   3  13   3  10   3   1

Each card has a suit (S) and a rank (R). Suits are encoded as:
1. Hearts
2. Diamonds
3. Clubs
4. Spades

How do we measure time?
import time

start_time = time.time()
result = 5 + 2
print("Results from the first method calculated in %s seconds" % (time.time() - start_time))

Results from the first method calculated in 9.48905944824e-05 seconds
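
The start/stop pattern above can be wrapped in a small helper so that every measurement is taken the same way. The following is a minimal sketch, not part of the slides: the helper name `time_call` is hypothetical, and it assumes the standard-library `time.perf_counter()` is an acceptable substitute for `time.time()` when timing very short operations.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # high-resolution clock intended for timing
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Same toy computation as in the slide: 5 + 2
result, elapsed = time_call(sum, [5, 2])
print("Result %s calculated in %s seconds" % (result, elapsed))
```

A single run of such a short operation is dominated by noise, so the printed time will vary from run to run.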

The time.time() function
import numpy as np  # required for np.sum

start_time = time.time()
np.sum(poker['R2'])
print("Results from the first method calculated in %s seconds" % (time.time() - start_time))

Results from the first method calculated in 0.000539915466309 seconds

start_time = time.time()
poker['R2'].sum()
print("Results from the second method calculated in %s \
seconds" % (time.time() - start_time))

Results from the second method calculated in 0.000655038452148 seconds

Difference in speed: 29.1814946619% (the two timings printed above differ by about 21.3%)
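
The slides do not show how the "difference in speed" percentage is computed. One plausible definition, assumed here, is the relative difference `(slower - faster) / faster * 100`. The function name `speed_difference` is hypothetical; the two inputs are the timings printed above for `np.sum(poker['R2'])` and `poker['R2'].sum()`.

```python
def speed_difference(t_fast, t_slow):
    """Percentage by which t_slow exceeds t_fast (assumed definition)."""
    return (t_slow - t_fast) / t_fast * 100


t_np_sum = 0.000539915466309      # np.sum(poker['R2'])
t_series_sum = 0.000655038452148  # poker['R2'].sum()

diff = speed_difference(t_np_sum, t_series_sum)
print("Difference in speed: %.2f%%" % diff)
```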

Where time matters I
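def brute_force():
    # Add the integers 1..1,000,000 one at a time
    res = 0
    for i in range(1, 1000001):
        res += i
    return res

def formula():
    # Closed form n * (n + 1) / 2; // keeps the result an integer in Python 3
    return 1000000 * 1000001 // 2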

Where time matters II
start_time = time.time()
first_method = formula()
print("Results from the first method calculated in %s seconds" % (time.time() - start_time))

Results from the first method calculated in 0.000108957290649 seconds

start_time = time.time()
second_method = brute_force()
print("Results from the second method calculated in %s seconds" % (time.time() - start_time))

Results from the second method calculated in 0.174870967865 seconds

Difference in speed: 160,394.967179%
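
Before comparing the speed of two approaches, it helps to confirm that they return the same answer, and to repeat the measurement more than once. The sketch below is an illustration under stated assumptions: it uses the standard-library `timeit` module, which the slides do not use, and it writes the closed-form sum with integer division so that both functions return an `int` in Python 3.

```python
import timeit


def brute_force():
    # Add the integers 1..1,000,000 one at a time
    res = 0
    for i in range(1, 1000001):
        res += i
    return res


def formula():
    # Closed form n * (n + 1) / 2 for n = 1,000,000
    return 1000000 * 1000001 // 2


# Both approaches must agree before their speed is compared.
same_result = brute_force() == formula()

# Take the best of three runs of one call each to reduce noise.
brute_time = min(timeit.repeat(brute_force, number=1, repeat=3))
formula_time = min(timeit.repeat(formula, number=1, repeat=3))

print("Same result: %s" % same_result)
print("brute_force: %s seconds" % brute_time)
print("formula:     %s seconds" % formula_time)
```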

Let's do it!
Locate rows using the .iloc[] and .loc[] indexers

Leonidas Souliotis
PhD Candidate
Locate targeted rows
rows = range(0, 500)

start_time = time.time()
data.loc[rows]
print("Results from the first method calculated in %s seconds" % (time.time() - start_time))

Results from the first method calculated in 0.001951932 seconds

start_time = time.time()
data.iloc[rows]
print("Results from the second method calculated in %s seconds" % (time.time() - start_time))

Results from the second method calculated in 0.0007140636 seconds

Difference in speed: 173.355592654%
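
The two indexers are interchangeable in the timing above only because the index labels happen to equal the row positions: `.loc[]` selects by label, while `.iloc[]` selects by position. The following sketch illustrates the difference on a synthetic DataFrame, which is a hypothetical stand-in for the poker data (random values, same column names).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poker data: 1000 rows of random card values.
rng = np.random.default_rng(0)
columns = ["S1", "R1", "S2", "R2", "S3", "R3", "S4", "R4", "S5", "R5"]
data = pd.DataFrame(rng.integers(1, 14, size=(1000, 10)), columns=columns)

rows = range(0, 500)

by_label = data.loc[rows]      # rows whose index labels are 0..499
by_position = data.iloc[rows]  # the first 500 rows by position

# With the default integer index, labels and positions coincide.
same_on_default_index = np.array_equal(by_label.to_numpy(), by_position.to_numpy())

# After shuffling the rows, the two indexers return different selections.
shuffled = data.sample(frac=1, random_state=0)
labels_from_loc = shuffled.loc[rows].index.tolist()
labels_from_iloc = shuffled.iloc[rows].index.tolist()
print(labels_from_loc[:5], labels_from_iloc[:5])
```

The speed advantage of `.iloc[]` therefore applies only when positional selection is actually what is wanted.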

Locate targeted columns
start_time = time.time()
data.iloc[:,:3]
print("Results from the first method calculated in %s seconds" % (time.time() - start_time))

Results from the first method calculated in 0.00125193595886 seconds

start_time = time.time()
data[['S1', 'R1', 'S2']]
print("Results from the second method calculated in %s seconds" % (time.time() - start_time))

Results from the second method calculated in 0.000964879989624 seconds

Difference in speed: 29.7504324188%
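
Both column selections above return the same three columns, so the choice between them comes down to speed and readability. The sketch below checks that equivalence on a synthetic DataFrame; the data are a hypothetical stand-in for the poker dataset, with the same column names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poker data.
rng = np.random.default_rng(0)
columns = ["S1", "R1", "S2", "R2", "S3", "R3", "S4", "R4", "S5", "R5"]
data = pd.DataFrame(rng.integers(1, 14, size=(100, 10)), columns=columns)

by_position = data.iloc[:, :3]          # first three columns by position
by_name = data[["S1", "R1", "S2"]]      # the same columns by name

columns_match = by_position.equals(by_name)
print("Same selection: %s" % columns_match)
```

Selecting by name also keeps working if the column order changes, whereas the positional slice would silently pick different columns.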

Let's do it!
Select random rows using .sample()

Leonidas Souliotis
PhD Candidate
Sampling random rows
start_time = time.time()
poker.sample(100, axis=0)
print("Results from the first method calculated in %s seconds" % (time.time() - start_time))

Results from the first method calculated in 0.000750064849854 seconds

Sampling random rows using np.random.randint()
start_time = time.time()
poker.iloc[np.random.randint(low=0, high=poker.shape[0], size=100)]
print("Results from the second method calculated in %s seconds" % (time.time() - start_time))

Results from the second method calculated in 0.00103211402893 seconds

Difference in speed: 37.6033057849%
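
The two row-sampling approaches differ in more than speed: `.sample()` draws without replacement by default, while `np.random.randint()` draws each position independently, so the same row can be picked more than once. The sketch below shows this on a synthetic DataFrame, a hypothetical stand-in for the poker data; the `random_state` argument is added here only to make the `.sample()` draw repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poker data.
rng = np.random.default_rng(0)
columns = ["S1", "R1", "S2", "R2", "S3", "R3", "S4", "R4", "S5", "R5"]
poker = pd.DataFrame(rng.integers(1, 14, size=(1000, 10)), columns=columns)

# .sample() never returns the same row twice unless replace=True is passed.
sampled = poker.sample(100, axis=0, random_state=0)

# randint picks 100 independent positions, so duplicates are possible.
positions = np.random.randint(low=0, high=poker.shape[0], size=100)
picked = poker.iloc[positions]

print("distinct rows via .sample(): %s" % sampled.index.nunique())
print("distinct rows via randint:   %s" % picked.index.nunique())
```

Passing `replace=True` to `.sample()` gives it the same with-replacement behaviour as the `randint` version.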

Sampling random columns
start_time = time.time()
poker.sample(3, axis=1)
print("Results from the first method calculated in %s seconds" % (time.time() - start_time))

Results from the first method calculated in 0.000683069229126 seconds

N = poker.shape[1]
start_time = time.time()
poker.iloc[:, np.random.randint(low=0, high=N, size=3)]
print("Results from the second method calculated in %s seconds" % (time.time() - start_time))

Results from the second method calculated in 0.0010929107666 seconds

Difference in speed: 59.9999999998%
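
With only 10 columns, drawing 3 positions with `np.random.randint()` repeats a column in about 28% of draws (1 - 10·9·8/1000), whereas `.sample(3, axis=1)` always returns three distinct columns. If positional sampling is still preferred, one possible alternative, not shown in the slides, is NumPy's `Generator.choice` with `replace=False`. The sketch below uses a synthetic DataFrame as a hypothetical stand-in for the poker data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poker data.
rng = np.random.default_rng(0)
columns = ["S1", "R1", "S2", "R2", "S3", "R3", "S4", "R4", "S5", "R5"]
poker = pd.DataFrame(rng.integers(1, 14, size=(50, 10)), columns=columns)

# Three distinct columns chosen by pandas.
sampled_cols = poker.sample(3, axis=1, random_state=0)

# Three distinct column positions chosen by NumPy, then selected with .iloc.
N = poker.shape[1]
col_positions = rng.choice(N, size=3, replace=False)
picked_cols = poker.iloc[:, col_positions]

print(list(sampled_cols.columns))
print(list(picked_cols.columns))
```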

Let's do it!