Efficient Python Tricks and Tools for Data Scientists
Testing - By Khuyen Tran
pytest-benchmark: A Pytest Fixture to Benchmark Your Code
$ pip install pytest-benchmark
If you want to benchmark your code while testing with pytest, try
pytest-benchmark.
To use pytest-benchmark, add the benchmark fixture to the test function that you want to benchmark.
# pytest_benchmark_example.py
def list_comprehension(len_list=5):
    return [i for i in range(len_list)]


def test_concat(benchmark):
    res = benchmark(list_comprehension)
    assert res == [0, 1, 2, 3, 4]
On your terminal, type:
$ pytest pytest_benchmark_example.py
You should see statistics on the time it takes to execute the test function printed to your terminal.
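To compare two implementations, benchmark each of them in its own test. The sketch below (the concat_loop function is my own illustration, not from the original example) benchmarks a version that builds the list through repeated concatenation:
# pytest_benchmark_example.py (continued)
def concat_loop(len_list=5):
    # Build the list by repeated concatenation
    # (typically slower than a list comprehension)
    result = []
    for i in range(len_list):
        result = result + [i]
    return result


def test_concat_loop(benchmark):
    res = benchmark(concat_loop)
    assert res == [0, 1, 2, 3, 4]
pytest-benchmark reports both tests in one table, so the faster implementation is easy to spot.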
Link to pytest-benchmark.
pytest.mark.parametrize: Test Your Functions with Multiple Inputs
If you want to test your function with different examples, use the pytest.mark.parametrize decorator.
To use pytest.mark.parametrize, add the @pytest.mark.parametrize decorator to the test function that you want to experiment with.
# pytest_parametrize.py
import pytest


def text_contain_word(word: str, text: str):
    '''Find whether the text contains a particular word'''
    return word in text


test = [
    ('There is a duck in this text', True),
    ('There is nothing here', False),
]


@pytest.mark.parametrize('sample, expected', test)
def test_text_contain_word(sample, expected):
    word = 'duck'
    assert text_contain_word(word, sample) == expected
In the code above, I expect the first sentence to contain the word
"duck" and expect the second sentence not to contain that word.
Let's see if my expectations are correct by running:
$ pytest pytest_parametrize.py
Sweet! 2 tests passed when running pytest.
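pytest.mark.parametrize also accepts an ids argument if you want more descriptive test names in the report. A minimal sketch (the id strings are my own):
@pytest.mark.parametrize(
    'sample, expected',
    test,
    ids=['contains_duck', 'no_duck'],
)
def test_text_contain_word(sample, expected):
    word = 'duck'
    assert text_contain_word(word, sample) == expected
Running pytest -v now shows test_text_contain_word[contains_duck] and test_text_contain_word[no_duck] instead of the raw sentences.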
Link to my article about pytest.
pytest parametrize twice: Test All Possible Combinations of Two Sets of Parameters
If you want to test the combinations of two sets of parameters, writing out all possible combinations is time-consuming and hard to read.
import pytest


def average(n1, n2):
    return (n1 + n2) / 2


def perc_difference(n1, n2):
    return (n2 - n1) / n1 * 100


# Test the combinations of operations and inputs
@pytest.mark.parametrize(
    "operation, n1, n2",
    [
        (average, 1, 2),
        (average, 2, 3),
        (perc_difference, 1, 2),
        (perc_difference, 2, 3),
    ],
)
def test_is_float(operation, n1, n2):
    assert isinstance(operation(n1, n2), float)
You can save time by stacking pytest.mark.parametrize twice instead.
# pytest_combination.py
import pytest


def average(n1, n2):
    return (n1 + n2) / 2


def perc_difference(n1, n2):
    return (n2 - n1) / n1 * 100


# Test the combinations of operations and inputs
@pytest.mark.parametrize("operation", [average, perc_difference])
@pytest.mark.parametrize("n1, n2", [(1, 2), (2, 3)])
def test_is_float(operation, n1, n2):
    assert isinstance(operation(n1, n2), float)
On your terminal, run:
$ pytest -v pytest_combination.py
From the output, we can see that all four combinations of the given operations and inputs are tested.
Pytest Fixtures: Use The Same Data for Different Tests
If you want to use the same data to test different functions, use
pytest fixtures.
To use pytest fixtures, add the decorator @pytest.fixture to the
function that creates the data you want to reuse.
# pytest_fixture.py
import pytest
from textblob import TextBlob


def extract_sentiment(text: str):
    """Extract sentiment using textblob.
    Polarity is within range [-1, 1]"""
    text = TextBlob(text)
    return text.sentiment.polarity


@pytest.fixture
def example_data():
    return 'Today I found a duck and I am happy'


def test_extract_sentiment(example_data):
    sentiment = extract_sentiment(example_data)
    assert sentiment > 0
On your terminal, type:
$ pytest pytest_fixture.py
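If several test files need the same data, you can move the fixture into a conftest.py file in your test directory. pytest discovers fixtures defined there automatically, so no import is needed. A minimal sketch:
# conftest.py
import pytest


@pytest.fixture
def example_data():
    return 'Today I found a duck and I am happy'
Any test function in the same directory can now take example_data as an argument.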
pytest-repeat: Repeat a Test Multiple Times
$ pip install pytest-repeat
It is good practice to test your functions to make sure they work as expected, but sometimes you need to run a test many times to catch the rare cases where it fails. That is when pytest-repeat comes in handy.
To use pytest-repeat, add the decorator @pytest.mark.repeat(N) to the test function you want to repeat N times.
# pytest_repeat_example.py
import pytest
import random


def generate_numbers():
    return random.randint(1, 100)


# Repeat the test 100 times to catch rare failures
@pytest.mark.repeat(100)
def test_generate_numbers():
    # Call the function once and check the result is in range;
    # randint(1, 100) is inclusive on both ends
    num = generate_numbers()
    assert 1 <= num <= 100
On your terminal, type:
$ pytest pytest_repeat_example.py
We can see that 100 experiments are executed and passed:
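As an alternative to the decorator, pytest-repeat also provides a --count command-line option, so you can repeat tests without editing the code:
$ pytest --count=100 pytest_repeat_example.py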
Link to pytest-repeat.
pytest-sugar: Show the Failures and Errors Instantly With a Progress Bar
$ pip install pytest-sugar
It can be frustrating to wait for a lot of tests to run before knowing their status. If you want to see failures and errors instantly along with a progress bar, use pytest-sugar.
pytest-sugar is a plugin for pytest. The command below shows how the output looks when running pytest.
$ pytest
Link to pytest-sugar.
Pandera: A Python Library to Validate Your Pandas DataFrame
$ pip install pandera
The outputs of your pandas DataFrame might not be what you expected, either due to an error in your code or a change in the data format. Using data that is different from what you expected can cause errors or degrade performance.
Thus, it is important to validate your data before using it. A good tool to validate a pandas DataFrame is pandera. Pandera is easy to read and use.
import pandas as pd
import pandera as pa
from pandera import check_input

df = pd.DataFrame(
    {"col1": [5.0, 8.0, 10.0], "col2": ["text_1", "text_2", "text_3"]}
)

schema = pa.DataFrameSchema(
    {
        "col1": pa.Column(float, pa.Check(lambda minute: 5 <= minute)),
        "col2": pa.Column(str, pa.Check.str_startswith("text_")),
    }
)

validated_df = schema(df)
validated_df
   col1    col2
0   5.0  text_1
1   8.0  text_2
2  10.0  text_3
You can also use Pandera's check_input decorator to validate the input DataFrame before it enters the function.
@check_input(schema)
def plus_three(df):
    df["col1_plus_3"] = df["col1"] + 3
    return df


plus_three(df)
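When a DataFrame violates the schema, pandera raises a SchemaError that reports the failed check. A quick sketch (the bad_df values are my own illustration):
bad_df = pd.DataFrame({"col1": [3.0], "col2": ["wrong_prefix"]})

try:
    schema(bad_df)
except pa.errors.SchemaError as err:
    # The message describes the first failing column and check
    print(err)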
DeepDiff: Find Deep Differences of Python Objects
$ pip install deepdiff
When testing the outputs of your functions, it can be frustrating to see your tests fail because of something you don't care much about, such as:
- the order of items in a list
- different ways to specify the same thing, such as an abbreviation
- the exact value up to the last decimal point, etc.
Is there a way that you can exclude certain parts of the object from
the comparison? That is when DeepDiff comes in handy.
from deepdiff import DeepDiff
DeepDiff can output a meaningful comparison like below:
price1 = {'apple': 2, 'orange': 3, 'banana': [3, 2]}
price2 = {'apple': 2, 'orange': 3, 'banana': [2, 3]}

DeepDiff(price1, price2)
{'values_changed': {"root['banana'][0]": {'new_value': 2, 'old_value': 3},
                    "root['banana'][1]": {'new_value': 3, 'old_value': 2}}}
With DeepDiff, you also have full control over which characteristics of the Python object DeepDiff should ignore. In the example below, since the order is ignored, [3, 2] is equivalent to [2, 3].
# Ignore orders
DeepDiff(price1, price2, ignore_order=True)
{}
We can also exclude certain parts of our object from the comparison. In the code below, we ignore ml and machine learning since ml is an abbreviation of machine learning.
experience1 = {"machine learning": 2,
"python": 3}
experience2 = {"ml": 2, "python": 3}
DeepDiff(
experience1,
experience2,
exclude_paths={"root['ml']",
"root['machine learning']"},
)
{}
Compare two numbers up to a specific decimal point:
num1 = 0.258
num2 = 0.259
DeepDiff(num1, num2, significant_digits=2)
{}
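Since an empty diff is falsy, DeepDiff plugs neatly into a pytest assertion. A sketch under the assumption that you have a get_prices function to test (the name is hypothetical):
# test_deepdiff_example.py
from deepdiff import DeepDiff


def get_prices():
    # Hypothetical function under test
    return {'apple': 2, 'orange': 3, 'banana': [2, 3]}


def test_get_prices():
    expected = {'apple': 2, 'orange': 3, 'banana': [3, 2]}
    # An empty diff ({}) is falsy, so the assertion passes
    # even though the list order differs
    assert not DeepDiff(expected, get_prices(), ignore_order=True)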
Link to DeepDiff.
hypothesis: Property-based Testing in Python
$ pip install hypothesis
If you want to test some properties or assumptions, it can be
cumbersome to write a wide range of scenarios. To automatically
run your tests against a wide range of scenarios and find edge cases
in your code that you would otherwise have missed, use hypothesis.
In the code below, I test if the addition of two floats is
commutative. The test fails when either x or y is NaN.
# test_hypothesis.py
from hypothesis import given
from hypothesis.strategies import floats


@given(floats(), floats())
def test_floats_are_commutative(x, y):
    assert x + y == y + x
$ pytest test_hypothesis.py
Now I can rewrite my code to make it more robust against these
edge cases.
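For example, one way to make the test robust is to restrict the strategy so it no longer generates NaN (NaN != NaN by definition) or infinities (inf + -inf produces NaN). A sketch using the floats strategy's allow_nan and allow_infinity parameters:
# test_hypothesis_robust.py
from hypothesis import given
from hypothesis.strategies import floats


# Exclude NaN and infinities, which break the equality check
@given(
    floats(allow_nan=False, allow_infinity=False),
    floats(allow_nan=False, allow_infinity=False),
)
def test_floats_are_commutative(x, y):
    assert x + y == y + x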
Link to hypothesis.
Deepchecks: Check Category Mismatch Between Train and Test Set
$ pip install deepchecks
Sometimes, it is important to know whether your test set contains the same categories as the train set. If you want to check the category mismatch between the train and test sets, use Deepchecks.
In the example below, the result shows that there are 2 new
categories in the test set. They are 'd' and 'e'.
import pandas as pd
from deepchecks.base import Dataset
from deepchecks.checks.integrity.new_category import CategoryMismatchTrainTest

train = pd.DataFrame({"col1": ["a", "b", "c"]})
test = pd.DataFrame({"col1": ["c", "d", "e"]})

train_ds = Dataset(train, cat_features=["col1"])
test_ds = Dataset(test, cat_features=["col1"])

CategoryMismatchTrainTest().run(train_ds, test_ds)
Link to Deepchecks.