This course focuses on analyzing data of all types using the Python programming language. No programming experience is necessary.
We start with an introduction to (or refresher on) the command line. We then cover the fundamentals of Python and its data types, followed by the data analysis packages NumPy and Pandas, the plotting packages Matplotlib and Seaborn, statistics, and interactive visualization.
Jupyter (IPython) notebooks are used throughout. Conda is used for package management and virtual environments. All notebooks are in Python 3 unless otherwise noted.
Luke Thompson, Ph.D.
Lecturer, Scripps Institution of Oceanography
Research Associate, National Oceanic and Atmospheric Administration
Lectures from the 2016 course are available on my YouTube channel, Doc Thompson Data Science.
The lessons below match the Jupyter notebooks in the `ipynb` directory. Any data files required by those notebooks are provided in the `data` directory.
Lesson | Title | Readings | Topics | Assignment |
---|---|---|---|---|
0 | Introductions and Syllabus | Obtain Learn Python The Hard Way (Shaw), Python for Data Analysis (McKinney), and Learning Python (Lutz) | Introductions and overview of course | |
1 | Command Line and Bash | Shaw: The Hard Way Is Easier, Exercise 0, Appendix A: Command Line Crash Course | A full introduction to using the command line, the bash shell, and text editors | Assignment for Lesson 1 |
2 | Conda, IPython, and Jupyter Notebooks | Install: Miniconda 3 | Conda tutorial including conda environments, Python packages, and pip; Python and IPython on the command line; Jupyter notebook tutorial and Python crash course | Assignment for Lesson 2 |
3 | Python Basics, Strings, Printing | Shaw: Exercises 1-10; Lutz: Ch 1-7 | Python scripts, error messages, printing strings and variables, strings and string operations, numbers and mathematical expressions, getting help with commands and IPython | Homework 1 |
4 | Taking Input, Reading and Writing Files, Functions | Shaw: Exercises 11-26; Lutz: Ch 9, 14-17 | Taking input, reading files, writing files, functions | Homework 2 |
5 | Logic, Loops, Lists, Dictionaries, and Tuples | Shaw: Exercises 27-39; Lutz: Ch 8-13 | Logic and loops, lists and list comprehensions, tuples, dictionaries, other types | Homework 3 |
6 | Python and IPython Review | McKinney: Appendix: Python Language Essentials, Ch 3 | Review of Python commands, IPython review -- enhanced interactive Python shells with support for data visualization, distributed and parallel computation and a browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media | |
7 | Regular Expressions | Grep tutorials: Drew's Grep Tutorial, Linux Grep Tutorial; Python Regular Expressions Tutorial | Regular expression syntax; command-line tools: `grep`, `sed`, `awk`, `perl -e`; Python examples: built-in and `re` module | |
8 | NumPy, Pandas, and Matplotlib Crash Course | (no readings) | NumPy overview, Pandas overview, Matplotlib overview | Homework 4 |
9 | Pandas Basics | McKinney: Ch 1-2, 4 (Introduction to Scientific Computing with NumPy and Pandas) | `Series`, `DataFrame`, `index`, `columns`, `dtypes`, `info`, `describe`, `read_csv`, `head`, `tail`, `loc`, `iloc`, `ix`, `to_datetime` | |
10 | Pandas Advanced | McKinney: Ch 5-7 (Data Analysis with Pandas); Pandas Documentation: Indexing and Selecting Data | `concat`, `append`, `merge`, `join`, `set_option`, `stack`, `unstack`, `transpose`, dot-notation, `values`, `apply`, `lambda`, `sort_index`, `sort_values`, `to_csv`, `read_csv`, `isnull` | |
11 | Plotting with Matplotlib | McKinney: Ch 8; J.R. Johansson: Matplotlib 2D and 3D plotting in Python | Matplotlib tutorial from J.R. Johansson | |
12 | Plotting with Seaborn | Seaborn Tutorial | Seaborn tutorial from Michael Waskom | Homework 5 |
13 | Pandas Time Series | McKinney: Ch 10; Pandas Documentation: Time Series and Date | Time series data in Pandas | |
14 | Pandas Group Operations | McKinney: Ch 9 | `groupby`, `melt`, `pivot`, `inplace=True`, `reindex` | |
15 | Statistics Packages | (no readings) | Statistics capabilities of Pandas, NumPy, SciPy, and Scikit-bio | |
16 | Interactive Visualization with Bokeh | Bokeh User Guide | Quickstart guide to making interactive HTML and notebook plots with Bokeh | |