
gachet/python-for-data-analysis

 
 

Python for Data Analysis

This course focuses on analyzing data of all types using the Python programming language. No programming experience is necessary.

We start with an introduction (or refresher) to the command line. We then cover the fundamentals of Python and its data types, followed by the data analysis packages NumPy and Pandas, the plotting packages Matplotlib and Seaborn, and finally statistics and interactive visualization.

Jupyter (IPython) notebooks are used throughout. Conda is used for package management and virtual environments. All notebooks are in Python 3 unless otherwise noted.
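
Before opening the notebooks, it can help to confirm that the active environment runs Python 3 and that the packages named in the course description are importable. The following is a minimal sketch using only the standard library. The helper name `check_environment` and the package list are illustrative assumptions, not part of the course materials.

```python
import importlib.util
import sys

# Import names of the packages mentioned in the course description (assumed list).
COURSE_PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn"]


def check_environment(packages=COURSE_PACKAGES):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python 3:", sys.version_info.major == 3)
    for name, found in check_environment().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can then be installed into the active Conda environment before running the notebooks.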

Instructor

Luke Thompson, Ph.D.
Lecturer, Scripps Institution of Oceanography
Research Associate, National Oceanic and Atmospheric Administration

YouTube Channel

All lectures from the 2016 course are available on my YouTube channel, Doc Thompson Data Science.

Lessons

The lessons below match the Jupyter notebooks in the ipynb directory. Any data files required by those notebooks are provided in the data directory.
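
To see which notebooks and data files are present in a local clone, something like the following sketch may be useful. It assumes the clone contains the `ipynb` and `data` directories described above; the function name `list_course_files` is hypothetical.

```python
from pathlib import Path


def _files_in(directory, pattern):
    """Return sorted file names matching pattern, or [] if the directory is absent."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob(pattern) if p.is_file())


def list_course_files(repo_root="."):
    """Collect notebook and data file names from a local clone of the repository."""
    root = Path(repo_root)
    return {
        "notebooks": _files_in(root / "ipynb", "*.ipynb"),
        "data": _files_in(root / "data", "*"),
    }


if __name__ == "__main__":
    for kind, names in list_course_files().items():
        print(f"{kind}: {len(names)} file(s)")
```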

| Lesson | Title | Readings | Topics | Assignment |
|---|---|---|---|---|
| 0 | Introductions and Syllabus | Obtain Learn Python The Hard Way (Shaw), Python for Data Analysis (McKinney), and Learning Python (Lutz) | Introductions and overview of course | |
| 1 | Command Line and Bash | Shaw: The Hard Way Is Easier, Exercise 0, Appendix A: Command Line Crash Course | A full introduction to using the command line, the bash shell, and text editors | Assignment for Lesson 1 |
| 2 | Conda, IPython, and Jupyter Notebooks | Install: Miniconda 3 | Conda tutorial including conda environments, Python packages, and pip; Python and IPython in the command line; Jupyter notebook tutorial and Python crash course | Assignment for Lesson 2 |
| 3 | Python Basics, Strings, Printing | Shaw: Exercises 1-10; Lutz: Ch 1-7 | Python scripts, error messages, printing strings and variables, strings and string operations, numbers and mathematical expressions, getting help with commands and IPython | Homework 1 |
| 4 | Taking Input, Reading and Writing Files, Functions | Shaw: Exercises 11-26; Lutz: Ch 9, 14-17 | Taking input, reading files, writing files, functions | Homework 2 |
| 5 | Logic, Loops, Lists, Dictionaries, and Tuples | Shaw: Exercises 27-39; Lutz: Ch 8-13 | Logic and loops, lists and list comprehension, tuples, dictionaries, other types | Homework 3 |
| 6 | Python and IPython Review | McKinney: Appendix: Python Language Essentials, Ch 3 | Review of Python commands, IPython review -- enhanced interactive Python shells with support for data visualization, distributed and parallel computation and a browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media | |
| 7 | Regular Expressions | Grep tutorials: Drew's Grep Tutorial, Linux Grep Tutorial; Python Regular Expressions Tutorial | Regular expression syntax; command-line tools: grep, sed, awk, perl -e; Python examples: built-in and re module | |
| 8 | NumPy, Pandas and Matplotlib Crash Course | MatPlotLib Cheatsheet | NumPy overview, Pandas overview, Matplotlib overview | Homework 4 |
| 9 | Pandas Basics | McKinney: Ch 1-2, 4 (Introduction to Scientific Computing with NumPy and Pandas) | Series, DataFrame, index, columns, dtypes, info, describe, read_csv, head, tail, loc, iloc, ix, to_datetime | |
| 10 | Pandas Advanced | McKinney: Ch 5-7 (Data Analysis with Pandas); Pandas Documentation: Indexing and Selecting Data | concat, append, merge, join, set_option, stack, unstack, transpose, dot-notation, values, apply, lambda, sort_index, sort_values, to_csv, read_csv, isnull | |
| 11 | Plotting with Matplotlib | McKinney: Ch 8; J.R. Johansson: Matplotlib 2D and 3D plotting in Python | Matplotlib tutorial from J.R. Johansson | |
| 12 | Plotting with Seaborn | Seaborn Tutorial | Seaborn tutorial from Michael Waskom | Homework 5 |
| 13 | Pandas Time Series | McKinney: Ch 10; Pandas Documentation: Time Series and Date | Time series data in Pandas | |
| 14 | Pandas Group Operations | McKinney: Ch 9 | groupby, melt, pivot, inplace=True, reindex | |
| 15 | Statistics Packages | (no readings) | Statistics capabilities of Pandas, NumPy, SciPy, and scikit-bio | |
| 16 | Interactive Visualization with Bokeh | Bokeh User Guide | Quickstart guide to making interactive HTML and notebook plots with Bokeh | |
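
As a small taste of the Lesson 5 and Lesson 7 topics (list comprehensions and the `re` module), the sketch below filters lines the way `grep` would and pulls out part of each line with a capturing group. The `grep` helper and the sample rows are made up for illustration and do not come from the course notebooks or data files.

```python
import re


def grep(pattern, lines, ignore_case=False):
    """Return the lines that contain a match for the regular expression."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in lines if regex.search(line)]


# Illustrative comma-separated rows (hypothetical data).
sample = [
    "sample_01,2016-01-15,12.5",
    "sample_02,2016-02-20,13.1",
    "control,2016-02-21,9.8",
]

# Keep only rows whose first field looks like "sample_<digits>".
matches = grep(r"^sample_\d+,", sample)

# Extract the year-month part of each date with a capturing group.
months = [re.search(r"(\d{4}-\d{2})-\d{2}", line).group(1) for line in sample]

print(matches)
print(months)
```

On the command line, the first filter corresponds roughly to `grep -E '^sample_[0-9]+,'` applied to a file containing the same rows.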

About

An introduction to Python for data analysis.
