Why unit test?
UNIT TESTING FOR DATA SCIENCE IN PYTHON
Dibya Chakravorty
Test Automation Engineer
How can we test an implementation?
def my_function(argument):
    ...

my_function(argument_1)
return_value_1
my_function(argument_2)
return_value_2
my_function(argument_3)
return_value_3
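Life cycle of a function
Implementation, then Test.
Test result PASS: accepted implementation.
Test result FAIL: bugfix, then test again.
After acceptance: a feature request, a refactoring, or a bug being found (followed by a bugfix) sends the function back to testing.
Over the whole life cycle: Test 100 times.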
Example
def row_to_list(row):
    ...

File: housing_data.txt
area (sq. ft.)    price (dollars)
2,081             314,942
1,059             186,606
                  293,410
1,148             206,186
1,506             248,419
1,210             214,114
1,697             277,794
1,268             194,345
2,318             372,162
1,463238,765
1,468             239,007
Data format
def row_to_list(row):
    ...

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]

File: housing_data.txt (a valid row holds an area and a price separated by a tab)
Data isn't clean
def row_to_list(row):
    ...

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]
"\t293,410\n"         Invalid  None
"1,463238,765\n"      Invalid  None

File: housing_data.txt
area (sq. ft.)    price (dollars)
2,081             314,942
1,059             186,606
                  293,410    <-- row with missing area
1,148             206,186
1,506             248,419
1,210             214,114
1,697             277,794
1,268             194,345
2,318             372,162
1,463238,765                 <-- row with missing tab
1,468             239,007
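The slides leave the body of row_to_list() elided, so the following is only a minimal sketch of one implementation that would satisfy the argument table above. The splitting logic and the rule "exactly two non-empty entries" are assumptions made for illustration, not the course's actual code.

```python
def row_to_list(row):
    """Split a tab-separated row into [area, price]; return None if the row is invalid."""
    # Drop the trailing newline, then split on the tab separator.
    entries = row.rstrip("\n").split("\t")
    # Assumption: a valid row has exactly two non-empty entries.
    if len(entries) != 2 or "" in entries:
        return None
    return entries


print(row_to_list("2,081\t314,942\n"))
print(row_to_list("\t293,410\n"))
print(row_to_list("1,463238,765\n"))
```

The first call covers the valid row, the second the row with a missing area, and the third the row with a missing tab.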
Time spent in testing this function
def row_to_list(row):
    ...

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]
"\t293,410\n"         Invalid  None
"1,463238,765\n"      Invalid  None

row_to_list("2,081\t314,942\n")
["2,081", "314,942"]
row_to_list("\t293,410\n")
None
row_to_list("1,463238,765\n")
None
Manual testing vs. unit tests
Unit tests automate the repetitive testing process and save time.
Learn unit testing - with a data science spin
area (sq. ft.)    price (dollars)
2,081             314,942
1,059             186,606
                  293,410
1,148             206,186
1,506             248,419
1,210             214,114
1,697             277,794
1,268             194,345
2,318             372,162
1,463238,765
1,468             239,007
Figure: Linear regression of housing price against area
GitHub repository of the course
Implementation of functions like row_to_list().
Develop a complete unit test suite
data/
src/
|-- data/
|-- features/
|-- models/
|-- visualization/
tests/ # Test suite
|-- data/
|-- features/
|-- models/
|-- visualization/
Write unit tests for your own projects.
Let's practice these concepts!
Write a simple unit test using pytest
Testing on the console
row_to_list("2,081\t314,942\n")
["2,081", "314,942"]
row_to_list("\t293,410\n")
None
row_to_list("1,463238,765\n")
None
Unit tests improve this process.
Python unit testing libraries
pytest
unittest
nosetests
doctest
We will use pytest!
Has all essential features.
Easiest to use.
Most popular.
Step 1: Create a file
Create test_row_to_list.py.
The test_ prefix indicates that the file contains unit tests (naming convention).
Such files are also called test modules.
Step 2: Imports
Test module: test_row_to_list.py
import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py
Step 3: Unit tests are Python functions
Test module: test_row_to_list.py
import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    ...

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]
Theoretical structure of an assertion
assert boolean_expression
assert True
assert False
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
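As a small illustration of the structure above, the sketch below uses a real boolean expression instead of the literals True and False. The variable name area and its value are made up for the example.

```python
area = "2,081"  # hypothetical value, for illustration only

# The expression evaluates to True, so the assertion passes silently.
assert area == "2,081"

# The expression evaluates to False, so Python raises AssertionError.
try:
    assert area == "2,082"
except AssertionError:
    assertion_failed = True
else:
    assertion_failed = False

print(assertion_failed)
```

A passing assertion produces no output at all, which is why only the failing case needs to be caught here.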
Step 4: Assertion
Test module: test_row_to_list.py
import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]
A second unit test
Test module: test_row_to_list.py
import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]
"\t293,410\n"         Invalid  None
Checking for None values
To check whether var is None, do this:
assert var is None
Do not do this:
assert var == None
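The slide does not say why the second form should be avoided. One commonly cited reason is that == can be overridden by a class, while is compares identity and cannot. The sketch below uses a hypothetical class, AlwaysEqual, invented purely to show the pitfall.

```python
class AlwaysEqual:
    """Hypothetical class whose == comparison always succeeds."""

    def __eq__(self, other):
        return True


var = AlwaysEqual()

# The equality check is fooled by the overridden __eq__ method.
equals_none = (var == None)  # noqa: E711 (written this way only to show the pitfall)
# The identity check is not affected.
is_none = (var is None)

print(equals_none)  # True, although var is not None
print(is_none)      # False, the correct answer
```

With `assert var == None`, a test on such an object would pass even though the value is not None.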
A third unit test
Test module: test_row_to_list.py
import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None

def test_for_missing_tab():
    assert row_to_list("1,463238,765\n") is None

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]
"\t293,410\n"         Invalid  None
"1,463238,765\n"      Invalid  None
Step 5: Running unit tests
Run this on the command line:
pytest test_row_to_list.py
Next lesson: test result report
Let's write some unit tests!
Understanding the test result report
Unit tests for row_to_list()
Test module: test_row_to_list.py
import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None

def test_for_missing_tab():
    assert row_to_list("1,463238,765\n") is None

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]
"\t293,410\n"         Invalid  None
"1,463238,765\n"      Invalid  None
Test result report
!pytest test_row_to_list.py
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-4.0.1, py-1.8.0, pluggy-0.9.0
rootdir: /tmp/tmpvdblq9g7, inifile:
plugins: mock-1.10.0
collecting ...
collected 3 items
test_row_to_list.py .F. [100%]
=================================== FAILURES ===================================
____________________________ test_for_missing_area _____________________________
    def test_for_missing_area():
>       assert row_to_list("\t293,410\n") is None
Section 1: general information
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-4.0.1, py-1.8.0, pluggy-0.9.0
rootdir: /tmp/tmpvdblq9g7, inifile:
plugins: mock-1.10.0
Section 2: Test result
collecting ...
collected 3 items
test_row_to_list.py .F. [100%]

Character  Meaning  When                                                Action
F          Failure  An exception is raised when running the unit test.  Fix the function or unit test.
.          Passed   No exception is raised when running the unit test.  Everything is fine. Be happy!

A failure is reported when the assertion raises AssertionError:
def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None  # AssertionError from this line

A failure is also reported when another exception is raised:
def test_for_missing_area():
    assert row_to_list("\t293,410\n") is none  # NameError from this line
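To make the two kinds of failure concrete, the sketch below runs both variants of the test by hand and records which exception each one raises. The stand-in row_to_list() (which returns a list instead of None for the row with a missing area), the check_ function names, and the exception_name() helper are assumptions made for this illustration. pytest itself is not involved.

```python
def row_to_list(row):
    # Stand-in implementation (an assumption): it never returns None.
    return row.rstrip("\n").split("\t")


def check_with_failing_assertion():
    # The boolean expression is False, so the assert raises AssertionError.
    assert row_to_list("\t293,410\n") is None


def check_with_typo():
    # Lowercase "none" is an undefined name, so this line raises NameError.
    assert row_to_list("\t293,410\n") is none


def exception_name(check_function):
    """Return the name of the exception raised by a check, or None if it passes."""
    try:
        check_function()
    except Exception as error:
        return type(error).__name__
    return None


print(exception_name(check_with_failing_assertion))  # AssertionError
print(exception_name(check_with_typo))               # NameError
```

Both cases would show up as F in the test result. The first usually points to a bug in the function and the second to a bug in the test itself.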
Section 3: Information on failed tests
=================================== FAILURES ===================================
____________________________ test_for_missing_area _____________________________
    def test_for_missing_area():
>       assert row_to_list("\t293,410\n") is None
E       AssertionError: assert ['', '293,410'] is None
E        +  where ['', '293,410'] = row_to_list('\t293,410\n')
test_row_to_list.py:7: AssertionError

The line raising the exception is marked by >.
>       assert row_to_list("\t293,410\n") is None
The exception is an AssertionError.
E       AssertionError: assert ['', '293,410'] is None
The line containing where displays return values.
E        +  where ['', '293,410'] = row_to_list('\t293,410\n')
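The report shows what row_to_list() returned but not why. As a hedged guess, an implementation that splits on the tab without rejecting empty entries would produce exactly this return value. The function row_to_list_buggy() below is hypothetical and is not taken from the course code.

```python
def row_to_list_buggy(row):
    """Hypothetical implementation that forgets to reject empty entries."""
    return row.rstrip("\n").split("\t")


return_value = row_to_list_buggy("\t293,410\n")
print(return_value)          # ['', '293,410']
print(return_value is None)  # False, so the assertion in the test fails
```

Reading the where line this way often narrows the bug down to a missing validation step before the function is even opened.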
Section 4: Test result summary
====================== 1 failed, 2 passed in 0.03 seconds ======================
Result summary from all unit tests that ran: 1 failed, 2 passed tests.
Total time for running tests: 0.03 seconds.
Much faster than testing on the interpreter!
Let's practice reading test result reports
More benefits and test types
Unit tests serve as documentation
Test module: test_row_to_list.py
import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None

def test_for_missing_tab():
    assert row_to_list("1,463238,765\n") is None

Created from the test module:
Argument              Return value
"2,081\t314,942\n"    ["2,081", "314,942"]
"\t293,410\n"         None
"1,463238,765\n"      None
Guess a function's purpose by reading unit tests
!cat test_row_to_list.py
More trust
Users can run tests and verify that the package works.
Reduced downtime
All benefits
Time savings.
Improved documentation.
More trust.
Reduced downtime.
Tests we already wrote
row_to_list()
convert_to_int()
Data module
Raw data -> Data module (row_to_list(), convert_to_int()) -> Clean data
Feature module
Clean data -> Feature module -> Features
Models module
Features -> Models module -> Predictive model (housing area -> housing price)
Unit test
[Figure: the pipeline from raw data through the Data, Feature, and Models modules to the predictive model of housing price from housing area]
What is a unit?
Small, independent piece of code.
Python function or class.
Integration test
[Figure: the pipeline from raw data through the Data, Feature, and Models modules to the predictive model of housing price from housing area]
End to end test
[Figure: the pipeline from raw data through the Data, Feature, and Models modules to the predictive model of housing price from housing area]
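The slides name the test types without spelling out the difference in text. Under the usual meaning of the terms, a unit test exercises one unit in isolation, while an integration test checks that several units work together. The sketch below illustrates that contrast. Both function bodies are assumptions: in particular, the slides never show what convert_to_int() does, so turning "2,081" into 2081 is a guess made for illustration, and the check_ function names are invented.

```python
def row_to_list(row):
    """Sketch: split a tab-separated row; return None for invalid rows."""
    entries = row.rstrip("\n").split("\t")
    if len(entries) != 2 or "" in entries:
        return None
    return entries


def convert_to_int(number_with_commas):
    """Assumed behavior (not shown in the slides): '2,081' -> 2081."""
    return int(number_with_commas.replace(",", ""))


def check_unit_row_to_list():
    # Unit-style check: one function, tested in isolation.
    assert row_to_list("2,081\t314,942\n") == ["2,081", "314,942"]


def check_integration_row_to_ints():
    # Integration-style check: the output of one unit feeds the other.
    entries = row_to_list("2,081\t314,942\n")
    assert [convert_to_int(entry) for entry in entries] == [2081, 314942]


check_unit_row_to_list()
check_integration_row_to_ints()
print("both checks passed")
```

An end to end test would extend the same idea to the whole pipeline, from the raw data file to the model's prediction.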
This course focuses on unit tests
Writing unit tests is the best way to learn pytest.
In Chapter 2...
Learn more pytest.
Write more advanced unit tests.
Work with functions in the features and models modules.
Let's practice these concepts!