Python For Data Science Cheat Sheet
Importing Data
Learn Python for Data Science Interactively

Excel Spreadsheets
>>> file = 'urbanpop.xlsx'
>>> data = pd.ExcelFile(file)
>>> df_sheet2 = data.parse('1960-1966',
                           skiprows=[0],
                           names=['Country',
                                  'AAM: War(2002)'])
>>> df_sheet1 = data.parse(0,
                           parse_cols=[0],
                           skiprows=[0],
                           names=['Country'])
To access the sheet names, use the sheet_names attribute:
>>> data.sheet_names

Pickled Files
>>> import pickle
>>> with open('pickled_fruit.pkl', 'rb') as file:
        pickled_data = pickle.load(file)

HDF5 Files
>>> import h5py
>>> filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
>>> data = h5py.File(filename, 'r')
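The pickled_fruit.pkl file above is assumed to already exist. A minimal round trip that first creates and then reloads such a pickle (the file name and dict contents here are made up) looks like:

```python
import os
import pickle
import tempfile

fruit = {"apples": 3, "pears": 2}                          # made-up contents
path = os.path.join(tempfile.gettempdir(), "pickled_fruit.pkl")

with open(path, "wb") as f:     # pickle is a binary format: write bytes
    pickle.dump(fruit, f)

with open(path, "rb") as f:     # 'rb' mode, as in the snippet above
    pickled_data = pickle.load(f)

print(pickled_data == fruit)    # the round trip preserves the dict
```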
Importing Data in Python
Most of the time, you'll use either NumPy or pandas to import
your data:
>>> import numpy as np
>>> import pandas as pd

Help
>>> np.info(np.ndarray.dtype)
>>> help(pd.read_csv)

SAS Files
>>> from sas7bdat import SAS7BDAT
>>> with SAS7BDAT('urbanpop.sas7bdat') as file:
        df_sas = file.to_data_frame()

Stata Files
>>> data = pd.read_stata('urbanpop.dta')

Matlab Files
>>> import scipy.io
>>> filename = 'workspace.mat'
>>> mat = scipy.io.loadmat(filename)
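help() only prints to the screen; to capture the same documentation text as a string, the standard-library pydoc module can be used. A self-contained sketch on the built-in len (chosen only so nothing external is needed):

```python
import pydoc

# pydoc.render_doc returns the text that help(len) would print
doc = pydoc.render_doc(len, renderer=pydoc.plaintext)
print(doc.splitlines()[0])   # title line naming the documented object
```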
Text Files

Plain Text Files
>>> filename = 'huck_finn.txt'
>>> file = open(filename, mode='r')   Open the file for reading
>>> text = file.read()                Read a file's contents
>>> print(file.closed)                Check whether the file is closed
>>> file.close()                      Close the file
>>> print(text)
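huck_finn.txt is assumed to be on disk; the same open/read/close sequence works on any text file, for example one written to a temporary directory first (the file name and contents here are invented):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "huck_finn_demo.txt")
with open(path, "w") as f:                 # create a stand-in text file
    f.write("first line\nsecond line\n")

file = open(path, mode="r")                # open the file for reading
text = file.read()                         # read the whole contents
file.close()                               # close the file
print(file.closed)                         # -> True
```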
Using the context manager with
>>> with open('huck_finn.txt', 'r') as file:
        print(file.readline())   Read a single line
        print(file.readline())
        print(file.readline())

Relational Databases
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///Northwind.sqlite')
Use the table_names() method to fetch a list of table names:
>>> table_names = engine.table_names()

Querying Relational Databases
>>> con = engine.connect()
>>> rs = con.execute("SELECT * FROM Orders")
>>> df = pd.DataFrame(rs.fetchall())
>>> df.columns = rs.keys()
>>> con.close()
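The snippets above assume an existing Northwind.sqlite database. The same connect/execute/fetch pattern can be exercised end to end with the standard-library sqlite3 module against a throwaway in-memory table (table and column names here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")            # no database file needed
con.execute("CREATE TABLE Orders (OrderID INTEGER, Item TEXT)")
con.executemany("INSERT INTO Orders VALUES (?, ?)",
                [(1, "chai"), (2, "syrup")])

cur = con.execute("SELECT * FROM Orders")
cols = [d[0] for d in cur.description]       # column names of the result
rows = cur.fetchall()                        # all result rows as tuples
con.close()
print(cols, rows)
```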
Using the context manager with
>>> with engine.connect() as con:
        rs = con.execute("SELECT OrderID FROM Orders")
        df = pd.DataFrame(rs.fetchmany(size=5))
        df.columns = rs.keys()

Querying relational databases with pandas
>>> df = pd.read_sql_query("SELECT * FROM Orders", engine)

Exploring Dictionaries

Accessing Elements with Functions
>>> print(mat.keys())         Print dictionary keys
>>> for key in data.keys():   Print dictionary keys
        print(key)
>>> pickled_data.values()     Return dictionary values
>>> print(mat.items())        Return items as a list of (key, value) tuple pairs

Accessing Data Items with Keys
>>> for key in data['meta'].keys():   Explore the HDF5 structure
        print(key)
    Description
    DescriptionURL
    Detector
    Duration
    GPSstart
    Observatory
    Type
    UTCstart
>>> print(data['meta']['Description'].value)   Retrieve the value for a key
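mat, data and pickled_data above only exist once the corresponding files have loaded; the same keys()/values()/items() calls can be tried on any plain dict (the contents below are made up):

```python
# stand-in for the dict that scipy.io.loadmat would return
mat = {"__header__": "MATLAB 5.0", "x": [1, 2, 3]}

print(list(mat.keys()))     # dictionary keys
print(list(mat.values()))   # dictionary values
print(list(mat.items()))    # (key, value) tuple pairs
```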
Table Data: Flat Files

Importing Flat Files with numpy
Files with one data type
>>> filename = 'mnist.txt'
>>> data = np.loadtxt(filename,
                      delimiter=',',   String used to separate values
                      skiprows=2,      Skip the first 2 lines
                      usecols=[0,2],   Read the 1st and 3rd column
                      dtype=str)       The type of the resulting array
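np.loadtxt also accepts any file-like object, so the call above can be tried without mnist.txt on disk (the sample rows below are invented):

```python
import numpy as np
from io import StringIO

raw = StringIO("col_a,col_b,col_c\n"   # header line 1
               "ints,ints,ints\n"      # header line 2
               "1,2,3\n"
               "4,5,6\n")
data = np.loadtxt(raw,
                  delimiter=",",   # values separated by commas
                  skiprows=2,      # skip the two header lines
                  usecols=[0, 2])  # keep the 1st and 3rd column
print(data)   # [[1. 3.] [4. 6.]]
```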
Navigating Your FileSystem

Magic Commands
!ls      List directory contents of files and directories
%cd ..   Change current working directory
%pwd     Return the current working directory path
Files with mixed data types
>>> filename = 'titanic.csv'
>>> data = np.genfromtxt(filename,
                         delimiter=',',   String used to separate values
                         names=True,      Look for column header
                         dtype=None)
>>> data_array = np.recfromcsv(filename)
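With names=True, np.genfromtxt returns a structured array whose columns are addressable by name. A self-contained run on invented data:

```python
import numpy as np
from io import StringIO

raw = StringIO("name,age\nalice,30\nbob,25\n")
data = np.genfromtxt(raw,
                     delimiter=",",
                     names=True,        # first row supplies column names
                     dtype=None,        # infer a type per column
                     encoding="utf-8")  # decode strings, not bytes
print(data.dtype.names)   # ('name', 'age')
print(data["age"])        # [30 25]
```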
Importing Flat Files with pandas
>>> filename = 'winequality-red.csv'
>>> data = pd.read_csv(filename,
                       nrows=5,         Number of rows of file to read
                       header=None,     Row number to use as col names
                       sep='\t',        Delimiter to use
                       comment='#',     Character to split comments
                       na_values=[""])  String to recognize as NA/NaN

Exploring Your Data

NumPy Arrays
>>> data_array.dtype    Data type of array elements
>>> data_array.shape    Array dimensions
>>> len(data_array)     Length of array

pandas DataFrames
>>> df.head()      Return first DataFrame rows
>>> df.tail()      Return last DataFrame rows
>>> df.index       Describe index
>>> df.columns     Describe DataFrame columns
>>> df.info()      Info on DataFrame
>>> data_array = data.values   Convert a DataFrame to a NumPy array

os Library
>>> import os
>>> path = "/usr/tmp"
>>> wd = os.getcwd()          Store the name of the current directory in a string
>>> os.listdir(wd)            Output the contents of the directory in a list
>>> os.chdir(path)            Change the current working directory
>>> os.rename("test1.txt",    Rename a file
              "test2.txt")
>>> os.remove("test1.txt")    Delete an existing file
>>> os.mkdir("newdir")        Create a new directory
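pd.read_csv likewise accepts file-like objects, so the parameters above can be exercised without winequality-red.csv (the sample rows below are invented):

```python
import pandas as pd
from io import StringIO

raw = StringIO("# source: demo file\n"
               "7.4\t0.70\n"
               "7.8\t?\n")
df = pd.read_csv(raw,
                 sep="\t",          # tab-delimited values
                 header=None,       # no header row in the data
                 comment="#",       # drop lines starting with '#'
                 na_values=["?"])   # read '?' as NaN
print(df.shape)   # (2, 2)
```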