
Cheat Sheet: Python For Data Science

The Python for Data Science Cheat Sheet provides a concise summary of key Python concepts for data science, including:
- Common Python data types like numbers, strings, lists, tuples, sets, and dictionaries.
- Common operators for numeric, comparison, boolean, and string operations.
- Key flow control statements like if/else, for loops, while loops, and loop control statements.
- Common list, string, and dictionary operations.
- Concepts in OOP like inheritance, polymorphism, and encapsulation.
- Functions, lambda functions, and comments.
- NumPy array basics like creating arrays of different dimensions initialized to zeros and copying/viewing arrays.

Uploaded by

EVELIN VERA

PYTHON FOR DATA SCIENCE CHEAT SHEET

Python Basics

Datatypes
• Numbers: a=2 (Integer), b=2.0 (Float), c=1+2j (Complex)
• String: a="New String"
• List: a=[1,2,3,'Word']
• Tuple: a=(1,2,4)
• Sets: a={2,3,4,5}
• Dictionary: x={'a': [1,2], 'b': [4,6]}

Operations

List Operations
• list=[]: Defines an empty list
• list[i]=a: Stores a at the ith position
• list[i]: Retrieves the item at the ith position
• list[i:j]: Retrieves the items from index i up to, but not including, index j
• list.append(val): Adds item at the end
• list.pop([i]): Removes and returns item at index i (the last item if i is omitted)

String Operations
• string[i]: Retrieves the character at the ith position
• string[i:j]: Retrieves the characters from index i up to, but not including, index j

Dictionary Operations
• dict={}: Defines an empty dictionary
• dict[i]=a: Stores "a" under the key "i"
• dict[i]: Retrieves the item with the key "i"
• dict.keys(): Gives all the keys
• dict.values(): Gives all the values

Flow Control Method
• if-else (Conditional Statement)
    if price>=700:
        print("Buy.")
    else:
        print("Don't buy.")
• For loop (Iterative Loop Statement)
    a="New Text"
    count=0
    for i in a:
        if i=='e':
            count=count+1
    print(count)
• While loop (Conditional Loop Statement)
    a=0
    i=1
    while i<10:
        a=a*2
        i=i+1
    print(a)
• Loop Control: break, pass and continue

Generic Operations
• range(5): 0,1,2,3,4
• s=input("Enter:")
• len(a): Gives item count in a
• min(a): Gives minimum value in a
• max(a): Gives maximum value in a
• sum(a): Adds up items of an iterable and returns sum
• sorted(a): Sorted list copy of a
• Importing modules: import random

File Operations
    f=open("File Name","opening mode")
(Opening modes: r: read, w: write, a: append, r+: both read and write)
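The list, string and dictionary operations above can be tried together in a few lines. The following is only a minimal sketch: the variable names (items, text, prices) and the sample values are illustrative assumptions, not part of the cheat sheet.

```python
# Illustrative values only; none of these names come from the cheat sheet.
items = []                  # empty list
items.append(10)
items.append(20)
items.append(30)
items[1] = 25               # overwrite the item at index 1
last = items.pop()          # removes and returns the last item
first_two = items[0:2]      # slice: indices 0 and 1

text = "New Text"
first_char = text[0]        # character at index 0
middle = text[1:3]          # characters at indices 1 and 2

# Count how often 'e' occurs, mirroring the for-loop example
count = 0
for ch in text:
    if ch == 'e':
        count = count + 1

prices = {}                 # empty dictionary
prices['pen'] = 5
prices['book'] = 12
keys = list(prices.keys())
values = list(prices.values())

print(items, last, first_two)
print(first_char, middle, count)
print(keys, values, len(items), min(items), max(items), sum(items))
```

Wrapping keys() and values() in list() is a choice made here so the results print as plain lists; the dictionary methods themselves return view objects.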

Operators

Numeric Operator: (Say, a holds 5, b holds 10)
• a + b = 15
• a - b = -5
• a * b = 50
• b / a = 2.0
• b % a = 0
• a ** b = 9765625
• 7.0 // 2.0 = 3.0, -11 // 3 = -4

Comparison Operator:
• (a == b): not true
• (a != b): true
• (a > b): not true
• (a < b): true
• (a >= b): not true
• (a <= b): true

Boolean Operator:
• a and b
• a or b
• not a

Functions
    def new_function():
        print("Hello World")

    new_function()

Lambda Function
    lambda a,b: a+b
    lambda a,b: a*b

Comments
    # Single Line Comment
    """
    Multi-line comment
    """

Try & Except Block
    try:
        [Statement body block]
        raise Exception()
    except Exception as e:
        [Error processing block]

OOPS
Inheritance:
A process of using details from an existing class in a new class without modifying the existing class.
Polymorphism:
A concept of using a common operation in different ways for different data inputs.
Encapsulation:
Hiding the private details of a class from other objects.

Class/object
Class:
    class Pen:
        pass
Object:
    obj=Pen()

FURTHERMORE:
Python for Data Science Certification Training Course
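The three OOP terms and the try/except pattern above are easier to see in a small example. This is a hedged sketch that extends the sheet's Pen class; the ink_ml attribute, the write and ink_left methods and the Marker subclass are all hypothetical names introduced for illustration.

```python
class Pen:
    """Base class; attribute and method names are illustrative assumptions."""

    def __init__(self, ink_ml):
        self.__ink_ml = ink_ml       # double underscore: name-mangled, hidden detail

    def ink_left(self):
        return self.__ink_ml         # controlled access to the hidden detail

    def write(self, text):
        if self.__ink_ml <= 0:
            raise Exception("out of ink")
        self.__ink_ml -= 1
        return "pen: " + text


class Marker(Pen):                   # inheritance: reuses Pen without modifying it
    def write(self, text):           # polymorphism: same call, different behavior
        super().write(text)          # reuse the ink bookkeeping from Pen
        return "marker: " + text.upper()


# The same write() call works on both objects
lines = [tool.write("hi") for tool in (Pen(2), Marker(2))]

# try/except around an operation that raises
empty = Pen(0)
try:
    empty.write("hi")
    outcome = "wrote"
except Exception as e:
    outcome = "error: " + str(e)

print(lines, outcome)
```

Python's double-underscore prefix only mangles the attribute name rather than enforcing privacy, so encapsulation here is a convention backed by the ink_left accessor.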
Python NumPy

What is NumPy?
A library consisting of multidimensional array objects and a collection of routines for processing those arrays.

Why NumPy?
Mathematical and logical operations on arrays can be performed. Also provides high performance.

Import Convention
import numpy as np - Import numpy

ND Array
Space efficient multi-dimensional array, which provides vectorized arithmetic operations.

Creating Array
• a=np.array([1,2,3])
• b=np.array([(1,2,3,4),(7,8,9,10)],dtype=int)

Initial Placeholders
• np.zeros(3) - 1D array of length 3 all zeros
• np.zeros((2,3)) - 2D array of all zeros
• np.zeros((3,2,4)) - 3D array of all zeros
• np.full((3,4),2) - 3x4 array with all values 2
• np.random.rand(3,5) - 3x5 array of random floats between 0-1
• np.ones((3,4)) - 3x4 array with all values 1
• np.eye(4) - 4x4 array of 0 with 1 on diagonal

Saving and Loading
On disk:
• np.save("new_array",x)
• np.load("new_array.npy")
Text/CSV files:
• np.loadtxt('New_file.txt') - From a text file
• np.genfromtxt('New_file.csv',delimiter=',') - From a CSV file
• np.savetxt('New_file.txt',arr,delimiter=' ') - Writes to a text file
• np.savetxt('New_file.csv',arr,delimiter=',') - Writes to a CSV file

Properties:
• array.size - Returns number of elements in array
• array.shape - Returns dimensions of array (rows, columns)
• array.dtype - Returns type of elements in array

Operations
Copying:
• np.copy(array) - Copies array to new memory array.
• array.view(dtype) - Creates view of array elements with type dtype
Sorting:
• array.sort() - Sorts array
• array.sort(axis=0) - Sorts specific axis of array
• array.reshape(2,3) - Reshapes array to 2 rows, 3 columns without changing data.
Adding:
• np.append(array,values) - Appends values to end of array
• np.insert(array,4,values) - Inserts values into array before index 4
Removing:
• np.delete(array,2,axis=0) - Deletes row on index 2 of array
• np.delete(array,3,axis=1) - Deletes column on index 3 of array
Combining:
• np.concatenate((array1,array2),axis=0) - Adds array2 as rows to the end of array1
• np.concatenate((array1,array2),axis=1) - Adds array2 as columns to end of array1
Splitting:
• np.split(array,3) - Splits array into 3 sub-arrays
Indexing:
• a[0]=5 - Assigns array element on index 0 the value 5
• a[2,3]=1 - Assigns array element on index [2][3] the value 1
Subsetting:
• a[2] - Returns the element of index 2 in array a.
• a[3,5] - Returns the 2D array element on index [3][5]
Slicing:
• a[0:4] - Returns the elements at indices 0,1,2,3
• a[0:4,3] - Returns the elements on rows 0,1,2,3 at column 3
• a[:2] - Returns the elements at indices 0,1
• a[:,1] - Returns the elements at index 1 on all rows

Array Mathematics
Arithmetic Operations:
• Addition: np.add(a,b)
• Subtraction: np.subtract(a,b)
• Multiplication: np.multiply(a,b)
• Division: np.divide(a,b)
• Exponential (e**a, element-wise): np.exp(a)
• Square Root: np.sqrt(b)
Comparison:
• Element-wise: a==b
• Array-wise: np.array_equal(a,b)
Functions
• Array-wise Sum: a.sum()
• Array-wise min value: a.min()
• Max value of each column: a.max(axis=0)
• Mean: a.mean()
• Median: np.median(a)
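Several of the NumPy entries above interact in ways a one-line description can hide, in particular the difference between a copy and a view. The sketch below reuses the sheet's array b; the other names (c, col, stacked, halves) are assumptions for illustration, and the view shown is one obtained by basic slicing rather than by view(dtype).

```python
import numpy as np

b = np.array([(1, 2, 3, 4), (7, 8, 9, 10)], dtype=int)   # 2x4 array from the sheet
shape, size = b.shape, b.size                             # dimensions and element count

# A copy owns its data; a basic slice is a view onto the original array
c = np.copy(b)
c[0, 0] = 100             # leaves b untouched
col = b[:, 1]             # view of column 1
col[0] = 20               # also changes b[0, 1]

# Combining and splitting
stacked = np.concatenate((b, np.zeros((1, 4), dtype=int)), axis=0)  # adds one row
halves = np.split(b, 2, axis=1)                                     # two 2x2 blocks

# Aggregations
col_max = b.max(axis=0)   # maximum of each column
median = np.median(b)     # median is a NumPy function, not an ndarray method

print(shape, size)
print(b)
print(stacked.shape, halves[0].shape, col_max, median)
```

If an independent array is needed after slicing, calling np.copy on the slice avoids the write-through behavior shown with col.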
Python Pandas

What is Pandas?
It is a library that provides easy-to-use data structures and data analysis tools for the Python programming language.

Import Convention
import pandas as pd - Import pandas

Pandas Data Structure
• Series:
    s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
• Data Frame:
    data_mobile = {'Mobile': ['iPhone', 'Samsung', 'Redmi'], 'Color': ['Red', 'White', 'Black'], 'Price': ['High', 'Medium', 'Low']}
    df = pd.DataFrame(data_mobile, columns=['Mobile', 'Color', 'Price'])

Importing Data
• pd.read_csv(filename)
• pd.read_table(filename)
• pd.read_excel(filename)
• pd.read_sql(query, connection_object)
• pd.read_json(json_string)

Exporting Data
• df.to_csv(filename)
• df.to_excel(filename)
• df.to_sql(table_name, connection_object)
• df.to_json(filename)

Create Test/Fake Data
• pd.DataFrame(np.random.rand(4,3)) - 3 columns and 4 rows of random floats
• pd.Series(new_series) - Creates a series from an iterable new_series

Plotting
• Histogram: df.plot.hist()
• Scatter Plot: df.plot.scatter(x='column1',y='column2')

Operations
View DataFrame Contents:
• df.head(n) - Look at first n rows of the DataFrame.
• df.tail(n) - Look at last n rows of the DataFrame.
• df.shape - Gives the number of rows and columns.
• df.info() - Information of Index, Datatype and Memory.
• df.describe() - Summary statistics for numerical columns.
Selection:
• iloc
    • df.iloc[0] - Select first row of data frame
    • df.iloc[1] - Select second row of data frame
    • df.iloc[-1] - Select last row of data frame
    • df.iloc[:,0] - Select first column of data frame
    • df.iloc[:,1] - Select second column of data frame
• loc
    • df.loc[[0], [column labels]] - Select values by row label & column labels
    • df.loc['row1':'row3', 'column1':'column3'] - Select and slice on labels
Sort:
• df.sort_index() - Sorts by labels along an axis
• df.sort_values(by='Column label') - Sorts by the values along an axis
• df.sort_values(column1) - Sorts values by column1 in ascending order
• df.sort_values(column2,ascending=False) - Sorts values by column2 in descending order
Group By:
• df.groupby(column) - Returns a groupby object for values from one column
• df.groupby([column1,column2]) - Returns a groupby object for values from multiple columns
• df.groupby(column1)[column2].mean() - Returns the mean of the values in column2, grouped by the values in column1
• df.groupby(column1)[column2].median() - Returns the median of the values in column2, grouped by the values in column1

Functions
Mean:
• df.mean() - mean of each column
Median:
• df.median() - median of each column
Standard Deviation:
• df.std() - standard deviation of each column
Max:
• df.max() - highest value in each column
Min:
• df.min() - lowest value in each column
Count:
• df.count() - number of non-null values in each DataFrame column
Describe:
• df.describe() - Summary statistics for numerical columns
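The selection, sorting and grouping entries above can be combined on one small DataFrame. This is a sketch under stated assumptions: it borrows the sheet's data_mobile layout but adds a fourth row and replaces the Price labels with made-up numbers so that mean() and median() are defined.

```python
import pandas as pd

data_mobile = {
    'Mobile': ['iPhone', 'Samsung', 'Redmi', 'iPhone'],
    'Color': ['Red', 'White', 'Black', 'Black'],
    'Price': [900, 650, 200, 1100],      # made-up numbers, not from the sheet
}
df = pd.DataFrame(data_mobile, columns=['Mobile', 'Color', 'Price'])

n_rows, n_cols = df.shape                # shape is an attribute, not a method
first_row = df.iloc[0]                   # selection by position
price_of_row_2 = df.loc[2, 'Price']      # selection by index label and column label

by_price = df.sort_values('Price', ascending=False)
mean_price = df.groupby('Mobile')['Price'].mean()
median_price = df.groupby('Mobile')['Price'].median()

print(n_rows, n_cols, price_of_row_2)
print(by_price)
print(mean_price)
```

With the default integer index, the row labels used by loc happen to coincide with the positions used by iloc; after sorting or filtering they no longer do, which is when the distinction matters.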
Python Scikit-Learn

Introduction
Scikit-learn: "sklearn" is a machine learning library for the Python programming language.
Simple and efficient tool for data mining, data analysis and machine learning.
Importing Convention - import sklearn

Working On Model

Model Choosing
Supervised Learning Estimator:
• Linear Regression:
    >>> from sklearn.linear_model import LinearRegression
    >>> new_lr = LinearRegression()
    (the normalize=True argument was removed in scikit-learn 1.2; scale the features beforehand instead)
• Support Vector Machine:
    >>> from sklearn.svm import SVC
    >>> new_svc = SVC(kernel='linear')
• Naive Bayes:
    >>> from sklearn.naive_bayes import GaussianNB
    >>> new_gnb = GaussianNB()
• KNN:
    >>> from sklearn import neighbors
    >>> knn = neighbors.KNeighborsClassifier(n_neighbors=1)
Unsupervised Learning Estimator:
• Principal Component Analysis (PCA):
    >>> from sklearn.decomposition import PCA
    >>> new_pca = PCA(n_components=0.95)
• K Means:
    >>> from sklearn.cluster import KMeans
    >>> k_means = KMeans(n_clusters=5, random_state=0)

Train-Test Data
Supervised:
    >>> new_lr.fit(X, y)
    >>> knn.fit(X_train, y_train)
    >>> new_svc.fit(X_train, y_train)
Unsupervised:
    >>> k_means.fit(X_train)
    >>> pca_model_fit = new_pca.fit_transform(X_train)
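The choose, split and fit steps above can be run end to end on toy data. The sketch below assumes scikit-learn and NumPy are installed; the two synthetic clusters, the seed and the sample counts are made-up values chosen only so the example is self-contained.

```python
import numpy as np
from sklearn import neighbors
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters (made-up data, not from the sheet)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Default split keeps 75% of the rows for training and 25% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = neighbors.KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
accuracy = knn.score(X_test, y_test)
print(X_train.shape, X_test.shape, accuracy)
```

Fitting only on X_train and scoring on X_test keeps the accuracy estimate independent of the data the model has already seen.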

Preprocessing

Data Loading
• Using NumPy:
    >>> import numpy as np
    >>> a = np.array([(1,2,3,4),(7,8,9,10)],dtype=int)
    >>> data = np.loadtxt('file_name.csv', delimiter=',')
• Using Pandas:
    >>> import pandas as pd
    >>> df = pd.read_csv('file_name.csv', header=0)

Train-Test Data
    >>> from sklearn.model_selection import train_test_split
    >>> X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=0)

Data Preparation
• Standardization
    >>> from sklearn import preprocessing
    >>> get_names = df.columns
    >>> scaler = preprocessing.StandardScaler()
    >>> scaled_df = scaler.fit_transform(df)
    >>> scaled_df = pd.DataFrame(scaled_df, columns=get_names)
• Normalization
    >>> from sklearn import preprocessing
    >>> df = pd.read_csv("File_name.csv")
    >>> x_array = np.array(df['Column1'])  #Normalize Column1
    >>> normalized_X = preprocessing.normalize([x_array])

Post-Processing

Prediction
Supervised:
    >>> y_predict = new_svc.predict(np.random.random((3,5)))
    >>> y_predict = new_lr.predict(X_test)
    >>> y_predict = knn.predict_proba(X_test)
Unsupervised:
    >>> y_pred = k_means.predict(X_test)

Model Tuning
Grid Search:
    >>> from sklearn.model_selection import GridSearchCV
    >>> params = {"n_neighbors": np.arange(1,3), "metric": ["euclidean", "cityblock"]}
    >>> grid = GridSearchCV(estimator=knn, param_grid=params)
    >>> grid.fit(X_train, y_train)
    >>> print(grid.best_score_)
    >>> print(grid.best_estimator_.n_neighbors)
Randomized Parameter Optimization:
    >>> from sklearn.model_selection import RandomizedSearchCV
    >>> params = {"n_neighbors": range(1,5), "weights": ["uniform", "distance"]}
    >>> rsearch = RandomizedSearchCV(estimator=knn, param_distributions=params, cv=4, n_iter=8, random_state=5)
    >>> rsearch.fit(X_train, y_train)
    >>> print(rsearch.best_score_)

Evaluate Performance
Classification:
1. Confusion Matrix:
    >>> from sklearn.metrics import confusion_matrix
    >>> print(confusion_matrix(y_test, y_pred))
2. Accuracy Score:
    >>> knn.score(X_test, y_test)
    >>> from sklearn.metrics import accuracy_score
    >>> accuracy_score(y_test, y_pred)
Regression:
1. Mean Absolute Error:
    >>> from sklearn.metrics import mean_absolute_error
    >>> y_true = [3, -0.5, 2]
    >>> mean_absolute_error(y_true, y_predict)
2. Mean Squared Error:
    >>> from sklearn.metrics import mean_squared_error
    >>> mean_squared_error(y_test, y_predict)
3. R² Score:
    >>> from sklearn.metrics import r2_score
    >>> r2_score(y_true, y_predict)
Clustering:
1. Homogeneity:
    >>> from sklearn.metrics import homogeneity_score
    >>> homogeneity_score(y_true, y_predict)
2. V-measure:
    >>> from sklearn.metrics import v_measure_score
    >>> v_measure_score(y_true, y_predict)
Cross-validation:
    >>> from sklearn.model_selection import cross_val_score
    >>> print(cross_val_score(knn, X_train, y_train, cv=4))
    >>> print(cross_val_score(new_lr, X, y, cv=2))
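The three regression scores listed above are short formulas, and computing them by hand shows what the library calls return. The sketch below uses plain Python rather than sklearn.metrics; the four targets and predictions are hypothetical values chosen for easy arithmetic.

```python
y_true = [3, -0.5, 2, 7]        # hypothetical targets
y_predict = [2.5, 0.0, 2, 8]    # hypothetical predictions

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_predict)]

# Mean absolute error: average size of the errors
mae = sum(abs(e) for e in errors) / n

# Mean squared error: average of the squared errors
mse = sum(e ** 2 for e in errors) / n

# R² score: 1 minus residual sum of squares over total sum of squares
mean_true = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(mae, mse, round(r2, 3))   # 0.5 0.375 0.949
```

An R² close to 1 means the predictions account for most of the variance in the targets, while MAE and MSE are expressed in the units (or squared units) of the target itself.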
