Machine Learning With Python 2021

This document is a comprehensive guide on machine learning (ML) using Python, covering various concepts such as regression, classification, and model evaluation. It introduces essential libraries and techniques for building predictive models, including supervised and unsupervised learning methods. The document also provides practical examples and labs for hands-on experience with different ML algorithms.


Machine Learning with Python

A Practical Introduction

1400 (2021) / Jadi for Maktabkhooneh


Machine Learning and Python

• Based on the IBM course on edX, taught by
Saeed Aghabozorgi, IBM Data Scientist.
Welcome

• We are going to talk about ML and how it helps in different areas (loans,
segmentation, medicine, recommendations, …)

• We will use Python libraries to create models, say building a model to estimate
the CO2 emissions of cars using scikit-learn, or predicting customer churn

• All code is provided as Jupyter notebooks

• After this course you will have new skills like regression, classification,
clustering, scikit-learn, NumPy, and pandas, AND new projects, especially if
you start working on the datasets freely available on the internet.
Intro
• Say we want to understand whether a transaction is
fraudulent or not, whether a cell is malignant or benign,
what we should show next to this customer, …

• This can be done by ML, by looking at some
characteristics of the data:

• clean the data

• select the proper algorithm

• train the model

• predict new cases
Intro

Machine Learning is the subfield of computer
science that gives "computers the ability to learn
without being explicitly programmed"
- Arthur Samuel, who coined the phrase in 1959
Intro
Examples:
- CO2 emission (Regression)
- Is this cancer? (Classification)
- Bank Loans (Clustering)
- Anomaly detection (credit card fraud)
- Netflix recommendations (recommenders)
Intro
AI (mimics human intelligence)

- Computer Vision
- Language Processing
- Creativity

ML (subset of AI, more statistical)

- Classification
- Clustering
- Neural Networks

Deep Learning (a special field of ML behind the current revolution in ML)
Intro
Python

- You should know the basics

- It is easy
- Libraries like NumPy & pandas + scikit-learn
are used, with a quick intro
- We are Python-dependent. You could do this
with anything else… but why? :D
Supervised vs.
Unsupervised
- Supervised: we "teach the model" with labeled data, and only then can the
model predict unknown or future instances
Supervised vs.
Unsupervised
- Unsupervised: the model works on its own to discover information.
Supervised vs.
Unsupervised
Regression
Regression Intro

• Regression is the process of
predicting a continuous value

• Independent (x, descriptive, ...) vs.
dependent (y, goal, prediction,
...) variables

• y is continuous
Regression Intro
Model
Regression Intro
Types

• Simple (only one independent variable)
• Linear
• Non-Linear
• Multiple (multiple independent variables)
• Linear
• Non-Linear
Regression Intro
Samples

• Household Price
• Customer Satisfaction
• Sales Forecast
• Employment Income
Regression Intro
Algorithms

• Ordinal
• Poisson
• Fast Forest quantile
• Linear, Polynomial, Lasso, Stepwise, Ridge
• Bayesian Linear
• Neural Network
• Decision Forest
• Boosted decision tree
• K-nearest neighbors
Simple Linear
Regression

• Can we predict CO2 emissions
from just one of the independent
variables? (this is why we call
it Simple)

• Let's try engine size...
Simple Linear
Regression
• The relationship is obvious
• There is a line; we assume a straight line
• we can predict the emissions for, say, a car with a 2.4 L engine

• ŷ is the predicted value of the dependent
variable.

• x1 is the independent variable.

• Theta 0 and theta 1 are the parameters of the line
• Theta 1 is known as the slope or gradient of the
fitted line and theta 0 is known as the intercept.

• Theta 0 and theta 1 are also called the
coefficients of the linear equation.
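
In equation form, the fitted line the slide describes is:

$$\hat{y} = \theta_0 + \theta_1 x_1$$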
Simple Linear
Regression
MSE

• The residual error for each point is the
distance of the prediction from the
actual point, so the Mean Squared
Error (MSE) should be minimized

• The minimum MSE can be achieved
with two methods: math or
optimization
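
For n data points, the quantity being minimized is:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$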
Simple Linear
Regression
MSE (Math)
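
The slide's formulas are not reproduced here; for reference, the standard closed-form least-squares estimates (with $\bar{x}$ and $\bar{y}$ the sample means) are:

$$\theta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \theta_0 = \bar{y} - \theta_1\,\bar{x}$$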
Simple Linear
Regression
Pros

• Very fast
• Easy to understand and interpret
• No need for parameter tuning
(unlike, say, K in KNN)
Model Evaluation

• The goal is to build a model that accurately
predicts an unknown case.

• You need to evaluate it to see how
much you can trust your
model/prediction

• Two main methods:

• Train and Test on Same data
• Train/Test split

• Regression Evaluation Metrics
Model Evaluation
Train and Test on Same data

• High "training accuracy"

• not always good
• may be overfitting the data
(say, capturing noise and
producing a non-generalized
model)

• Low "out-of-sample
accuracy"

• Important to have a high
out-of-sample accuracy
Model Evaluation
Train/Test split

• Mutually exclusive split

• More accurate on
out-of-sample data

• ensure that you also train your
model with the testing set
afterwards (once evaluation is
done), as you don't want to
lose potentially valuable data

• The result depends on which
datasets the data is trained
and tested on
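
A minimal sketch of such a split with scikit-learn; the toy arrays below stand in for a real feature matrix X and target y:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y = np.arange(10)                  # 10 target values

# 80% of the rows go to training, 20% to testing;
# random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)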
Model Evaluation
Evaluation Metrics

• used to explain the performance of a
model

• say, comparing actual values with predicted ones

• the error of the model is the difference
between the data points and the
trend line generated by the algorithm

• There are different metrics (next slide),
but the choice is based on the
model, data type, domain, ...
Model Evaluation
Errors

• mean absolute error (MAE)

• mean squared error (MSE)
• root mean squared error (RMSE); interpretable in the
same units as the response vector or y units

• relative absolute error (RAE); the total absolute error
relative to that of a naive predictor

• relative squared error (RSE)

• R²; popular metric for the accuracy of your model.
It represents how close the data values are to the fitted
regression line. The higher the better
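
For reference, the standard definitions of these metrics (with $\bar{y}$ the mean of the actual values) are:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

$$\mathrm{RAE} = \frac{\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert}{\sum_{i=1}^{n}\lvert y_i - \bar{y}\rvert}, \qquad \mathrm{RSE} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, \qquad R^2 = 1 - \mathrm{RSE}$$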
Let's see some
libraries!

• Notebook
• NumPy
• Matplotlib
• pandas
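
A quick taste of these libraries working together; the column names below echo the CO2 dataset used in the labs, but the values are made up:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

arr = np.array([1.6, 2.4, 3.5])                   # NumPy: fast n-dimensional arrays
df = pd.DataFrame({"ENGINESIZE": arr,             # pandas: tabular data
                   "CO2EMISSIONS": arr * 100})
df.plot.scatter(x="ENGINESIZE", y="CO2EMISSIONS") # plotting via Matplotlib
plt.show()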
Lab: Simple Linear
Regression

• ML0101EN-Reg-Simple-Linear-Regression-Co2.ipynb
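
A minimal sketch in the spirit of this lab, fitting a simple linear regression with scikit-learn; the synthetic data below (with arbitrary toy coefficients) stands in for the lab's fuel-consumption CSV:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
engine_size = rng.uniform(1.0, 5.0, size=(100, 1))            # x1
co2 = 125 + 39 * engine_size[:, 0] + rng.normal(0, 10, 100)   # noisy y

model = LinearRegression().fit(engine_size, co2)
print("theta1 (slope):    ", model.coef_[0])
print("theta0 (intercept):", model.intercept_)
print("prediction for a 2.4 L engine:", model.predict([[2.4]])[0])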
Multiple Linear
Regression

• Simple vs. Multiple
• much the same as simple
• usages:
• find the strength of the effect of each independent variable
• predict the impact of a change in one of the independent
variables
Multiple Linear
Regression
Formula
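
The formula generalizes the simple case to n independent variables:

$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^{T}X$$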
Multiple Linear
Regression
Finding parameters

• Again we can compute the MSE

• the best model is the one with the minimized MSE
• One method is called Ordinary Least Squares
• linear algebra
• fine for fewer than ~10K samples; slow beyond that
• Optimization algorithms
• Gradient Descent (starts with random parameters,
then changes them over multiple iterations; see the
sketch below)
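
A minimal NumPy sketch of that idea: start from random parameters and step against the MSE gradient over many iterations (the true parameters 125 and 39 are arbitrary toy values):

import numpy as np

rng = np.random.default_rng(1)
X = np.c_[np.ones(50), rng.uniform(1, 5, 50)]   # bias column + one feature
y = X @ np.array([125.0, 39.0]) + rng.normal(0, 5, 50)

theta = rng.normal(size=2)   # start with random parameters
alpha = 0.01                 # learning rate
for _ in range(5000):        # change theta over multiple iterations
    gradient = 2 / len(y) * X.T @ (X @ theta - y)   # gradient of the MSE
    theta -= alpha * gradient
print(theta)                 # approaches [125, 39]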
Multiple Linear
Regression
Some notes

• Try to have a theoretical justification when choosing
the independent variables. Too many Xs might
result in overfitting

• Xs do not need to be continuous. If they are not,
try to assign values (like 1 and 2) to categories

• there needs to be a linear relationship. Test your
Xs with scatter plots or use your logic. If the
relationship displayed in your scatter plot is not
linear, then you need to use non-linear
regression.
Lab: Multiple Linear
Regression

• ML0101EN-Reg-Mulitple-Linear-Regression-Co2.ipynb
Non-Linear
Regression
Non Linear
regression
Polynomial

• Different types
• if you have x², it is possible to
define a new feature as x². So it
can be represented as a special
case of multiple linear
regression. This is called
polynomial regression
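
A minimal sketch of that trick with scikit-learn, assuming a quadratic relationship; PolynomialFeatures builds the [1, x, x²] columns so plain linear regression can fit them:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

x = np.linspace(0, 4, 30).reshape(-1, 1)
y = 1 + 2 * x[:, 0] + 3 * x[:, 0] ** 2            # an underlying quadratic

X_poly = PolynomialFeatures(degree=2).fit_transform(x)   # columns: [1, x, x^2]
model = LinearRegression(fit_intercept=False).fit(X_poly, y)
print(model.coef_)   # recovers approximately [1, 2, 3]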
Non Linear
regression
Non Linear

• Models a non-linear relationship
between the Xs and Y

• Y is a non-linear function of the
parameters Theta
Non Linear
regression
Notes

• How to tell if it is non-linear:

• plot and see: Y against each X /
calculate the correlation coefficient

• use non-linear regression only if you
cannot solve it with linear regression

• How to model data if it is non-linear:

• polynomial regression
• non-linear regression
• transform the data!
Lab: Polynomial
Regression

• ML0101EN-Reg-Polynomial-Regression-Co2.ipynb
Lab: Non-Linear
Regression

• ML0101EN-Reg-NoneLinearRegression.ipynb
Part Three
Classification
Classification
Intro

• Understand Classification
• Understand different methods
such as KNN, Decision Trees,
Logistic Regression and SVM

• Apply on datasets
• Evaluate
Classification
Intro

• Supervised
• Categorizing unknown items into
classes

• Target is categorical with
discrete values (the model is
called a classifier)

• Binary (2 values) vs. multi-class
Classification
Intro
Classification
Intro

• Loan default (age, income, loan size,
previous records, ...)

• Churn (age, address, income,
equipment, data usage, calls, ...)

• Spam / important email

• Handwriting/speech recognition
• Biometric identification
Classification
Intro

• Decision Trees (ID3, C4.5, C5.0)


• Naive Bayes
• Linear Discriminant Analysis
• K-Nearest Neighbor
• Logistic Regression
• Neural Networks
• Support Vector Machines
Classification
KNN


Classification
KNN

• pick K
• calculate the unknown point's
distance from all cases

• predict based on the K nearest points

• How to find the "distance"? (Euclidean
can be one way)

• How to choose K? (too low -> noise & overfitting;
too high -> too general). Try the different Ks
with the test set and see which K is good.
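
A minimal sketch of this procedure with scikit-learn; the Iris dataset is just a stand-in for the course's own data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4)

# try different Ks on the test set and see which K is good
for k in (1, 3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, accuracy_score(y_test, knn.predict(X_test)))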
Classification
KNN

• KNN can also be used to compute a
continuous target (regression)

• Say, find the 3 closest cases
and take the median
Classification
KNN Evaluation

• Evaluation explains the
performance of our model

• On test data we have y and ŷ

• There are different model
evaluation metrics: Jaccard
index, F1-score, and Log Loss.
Classification
KNN Evaluation / Jaccard Index

• Jaccard Index
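
For a set of true labels y and predicted labels ŷ, the index is:

$$J(y, \hat{y}) = \frac{\lvert y \cap \hat{y}\rvert}{\lvert y \cup \hat{y}\rvert} = \frac{\lvert y \cap \hat{y}\rvert}{\lvert y\rvert + \lvert\hat{y}\rvert - \lvert y \cap \hat{y}\rvert}$$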
Classification
KNN Evaluation / F1-Score
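
The slide's confusion-matrix figure is not reproduced here; the standard definitions are:

$$\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad \mathrm{Recall} = \frac{TP}{TP+FN}, \qquad F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$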
Classification
KNN Evaluation / LogLoss
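
Log loss scores a classifier that outputs a probability ŷ for a binary label y:

$$\mathrm{LogLoss} = -\frac{1}{n}\sum_{i=1}^{n}\bigl[\,y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\,\bigr]$$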
Lab: KNN


Classification
Decision Trees / Intro


Classification
Decision Trees / Intro

Internal node (test), branch (result
of test) & leaf (class)

1. Choose an attribute from the dataset

2. Calculate the significance of the
attribute in the splitting of the data
3. Split the data based on the value of the
best attribute
4. Repeat!
Classification
Decision Trees / Building

• Decision trees are built using recursive
partitioning to classify the data
(split on Cholesterol? Sex? ...)

• The best attribute? The one with the most information gain

• Information gain is the information that can increase the
level of certainty after splitting

• IG = Entropy before split − Weighted entropy after split
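
In symbols (standard definitions; p_c is the proportion of class c in set S, and S_v are the subsets produced by splitting on attribute A):

$$\mathrm{Entropy}(S) = -\sum_{c} p_c \log_2 p_c, \qquad \mathrm{IG}(S, A) = \mathrm{Entropy}(S) - \sum_{v}\frac{\lvert S_v\rvert}{\lvert S\rvert}\,\mathrm{Entropy}(S_v)$$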
Lab: Decision Trees


Classification
Logistic Regression / Intro

• who is leaving, and why?

• Close to regression, but
here Y is a categorical
(here, binary) value

• All Xs should be
continuous, or converted to
"continuous"
Classification
Logistic Regression/ Intro

• Predicting a disease
• chance of mortality based on a situation

• halting a subscription

• purchase

• failure of a product

• ...
Classification
Logistic Regression / Intro

• Use it when the target is categorical (or better, binary)

• when we need the probability of the prediction

• when we need a linear decision boundary (a line, or
even a polynomial)

• when we need to understand the impact of
features (is a feature's theta close to 0, or large?)
Classification
Logistic Regression vs Linear Regression
Classification
Logistic Regression vs Linear Regression

• On the previous data, try linear
regression with age vs. income

• now repeat with age vs.
churn: the fit looks odd, and we
would need a step function as a
threshold
Classification
Logistic Regression vs Linear Regression /
Sigmoid
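
The sigmoid squashes θᵀX into a probability between 0 and 1:

$$\sigma(\theta^{T}X) = \frac{1}{1 + e^{-\theta^{T}X}}$$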
Classification
Logistic Regression Training
Classification
Logistic Regression Training

• Cost function

• we have to minimize the cost

• Can be done via derivatives, but it's difficult
Classification
Logistic Regression Training

• We can define a new cost function!

• for it, there are more approaches to minimizing
the function, say Gradient Descent (an iterative
technique)
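
The new cost function referred to here is the standard log loss over m training examples:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\bigl[\,y^{(i)}\log\hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)})\,\bigr]$$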
Classification
Logistic Regression Training

• Gradient descent is
an iterative approach
to finding the
minimum of a
function. It uses the
derivative of a cost
function to change the
parameter values to
minimize the cost or
error.
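
In symbols, each iteration steps the parameters against the gradient, with a learning rate α:

$$\theta \leftarrow \theta - \alpha\,\nabla J(\theta)$$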
Lab: Logistic Regression


Classification
Support Vector Machines

• supervised
• a classifier based on a
separator
• maps data to a
high-dimensional space so a
hyperplane separator can
be drawn
• Lots of real-world datasets are
linearly non-separable,
but what if we go to a
higher dimension? ;)
Classification
Support Vector Machines

• but… how to move
to n dimensions?

• there are different
kernel functions

• our libraries will do it;
we will just compare

• How to find the
hyperplane?
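
A minimal sketch of that comparison with scikit-learn: the library handles the kernel mapping, and we just try different kernel functions on a toy dataset that is linearly non-separable in 2D:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# concentric circles are linearly non-separable in 2D
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))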
Classification
Support Vector Machines

• to find the
hyperplane, we look
for the largest
margins from the
support vectors
• this can also be solved
using gradient
descent
• once learned, we
can just check
whether a data point is
above the line or
below it, and decide
Classification
Support Vector Machines

• Pros

• accurate in high dimensional spaces

• memory efficient

• Cons

• Prone to over-fitting if we have lots of features

• No probability estimation

• Not computationally efficient for large datasets (n > 1000)


Classification
Support Vector Machines

• Image recognition

• Text Category Assignment

• spam

• category

• sentiment analysis

• Gene Expression Classification

• Outlier detection and clustering


Lab: SVM


Clustering
Intro
• Partitioning a customer base into groups of individuals based on shared
characteristics
• Allows a business to target different groups (high profit & low risk, …)
• we can cross-reference the groups with their purchases
Clustering
Intro

• finding "clusters" in datasets; unsupervised

• Cluster: a group of data points or objects in a
dataset that are similar to other objects in the
group, and dissimilar to data points in other
clusters.

• Different from classification:

• the data does not need to be labeled

• prediction is not the goal

Clustering
Intro / Samples

• Retail & Marketing: identify buying patterns / recommendation systems

• Banking: Fraud detection / identify clusters (loyal, churn, …)

• Insurance: Fraud detection / Risk

• Publication: auto-categorize / recommend

• Medicine: characterize behaviour

• Biology: group genes / cluster genetic markers (family ties)


Clustering
Intro / Where

• Exploratory data analysis

• summary generation

• outlier detection

• finding duplicates

• pre-processing step
Clustering
Intro / algorithms

• Partition-based (K-Means, K-Median, Fuzzy
c-means, …): sphere-like clusters / medium or large
data

• Hierarchical (Agglomerative, Divisive): trees of clusters
/ small datasets

• Density-based (DBSCAN): arbitrary shapes / good for
spatial clusters or noisy data
Clustering
K Means
• Unsupervised; divides data into K non-overlapping
subsets/clusters without any cluster-internal structure
Clustering
K Means

• We need to understand similarity
and dissimilarity

• Goal: minimize intra-cluster
distances ( Dis(x1, x2) ) and
maximize inter-cluster distances (
Dis(c1, c2) )

• It is always good to normalize!

• different distance formulas: Euclidean,
cosine, average distance, … so first
understand the domain knowledge
Clustering
K Means
• decide the number of clusters (K)

• initialize K "centroids" by either:

• random points from the dataset

• random points

• assign each customer to the closest centroid and create the
distance matrix

• update each centroid to the mean of its data points

• continue until the centroids stop moving

• Notes (see the sketch below):

• iterative

• does not guarantee the best result; it may get caught in a local
optimum, but it's fast so we can run it many times!
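
A minimal K-Means sketch with scikit-learn on synthetic blobs:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init=10 runs K-Means 10 times from different random centroids and
# keeps the best result, since a single run may get caught in a local optimum
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroids
print(km.labels_[:10])       # cluster assignment per data point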
Clustering
K Means / More Points

• Review the algorithm

• but how can we evaluate?

• External: compare with the ground truth

• Internal: average distance between data points within a cluster, or the
distance between clusters

• Choosing K is difficult, so we run with different Ks and check the accuracy
(say, the mean distance inside a cluster) BUT increasing K always
reduces this metric. So we use the elbow method
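
A minimal sketch of the elbow method: run K-Means for several Ks and plot the within-cluster sum of squared distances (scikit-learn calls it inertia), then look for the "elbow" where the curve flattens:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

plt.plot(ks, inertias, marker="o")   # look for the "elbow" of the curve
plt.xlabel("K")
plt.ylabel("within-cluster sum of squared distances")
plt.show()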
Clustering
K Means / More Points

• Partition based

• unsupervised

• medium and large datasets (relatively efficient)

• sphere like clusters

• K should be known / guessed


Clustering
K Means / LAB


Clustering
Hierarchical / Intro

• A chart like this is built from the similarity
of 48,000 genetic markers

• A hierarchy of clusters, where each node is a
cluster consisting of the clusters of its
daughter nodes
Clustering
Hierarchical / Intro

• Divisive is top down, so you start with


all observations in a large cluster and
break it down into smaller pieces.

• Agglomerative is the opposite of


divisive. So it is bottom up, where
each observation starts in its own
cluster and pairs of clusters are
merged together as they move up the
hierarchy.
Clustering
Hierarchical / Intro

• Finding Similarity of
city locations in
Canada

• Dendrogram Y is
the similarity

• We can cut Y
somewhere to have
N number of
clusters (say 3)
Clustering
Hierarchical / More


Clustering
Hierarchical / More

• We should be able to calculate distances between data
points (again, say age, BMI, BP)

• but we also need the distance "between" clusters:

• Single Linkage Clustering: minimum distance

• Complete Linkage Clustering: maximum distance

• Average Linkage Clustering: average of distances from
each point to all other points

• Centroid Linkage Clustering: distance between the centroids of clusters
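
A minimal sketch of agglomerative clustering with SciPy, using the average linkage described above: build the linkage matrix, draw the dendrogram, then "cut" it into 3 clusters:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)

Z = linkage(X, method="average")   # average linkage clustering
dendrogram(Z)                      # plot the tree of clusters
plt.show()

labels = fcluster(Z, t=3, criterion="maxclust")   # "cut" into 3 clusters
print(labels)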


Clustering
Hierarchical / More

• Pros
• Works with an unknown number of clusters
• Easy to implement
• Useful dendrograms; good for understanding
• Cons
• The algorithm cannot undo previous merges
• long runtimes
• sometimes difficult to identify the number of clusters
(especially for large datasets)
Clustering
Hierarchical / More

• Hierarchical vs. K-Means: hierarchical clustering…

• Can be slower

• Does not require the number of clusters to run

• Gives more than one partitioning

• Always generates the same clusters


Clustering
Hierarchical / Lab


Clustering
DBSCAN

• K-Means will assign
every data point to a
cluster; no outliers

• Density-based clustering
will find dense areas
and will separate out
the outliers. Good for
anomaly detection

• Density: the number of
points within a radius
Clustering
DBSCAN

• The DBSCAN algorithm is
effective for tasks like
class identification

• effective even in the
presence of noise

• e.g., grouping locations with the
same weather in dense areas
Clustering
DBSCAN


Clustering
DBSCAN
• point types:
• core: within the point's
neighborhood (radius R)
there are at least M
points
• border:
• fewer than M points in its
neighborhood
• but reachable from a core
point
• outlier: neither a core
point nor a border point
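
A minimal DBSCAN sketch with scikit-learn; eps is the neighborhood radius and min_samples plays the role of M:

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))   # cluster ids; -1 marks outliers/noise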
Clustering
DBSCAN

• Arbitrarily shaped clusters

• Robust to outliers

• Does not require specification of the number of


clusters
Clustering
DBSCAN / Lab
Recommenders
Intro
• People's tastes follow patterns (say, books)

• Recommender systems capture the pattern of people's behaviour and use it to
predict what else they might want or like

• Many applications: Netflix, Amazon, Facebook, Twitter, news sites, Digikala,
SnapFood

• Broader exposure -> more usage
Recommenders
Intro / types
Recommenders
Intro / types
• Memory Based

• Uses the entire user-item dataset to generate a recommendation

• Uses statistical techniques to approximate users or items (Pearson
Correlation, Cosine Similarity, Euclidean Distance, …); see the sketch
after this list

• Model Based

• Develops a model of users in an attempt to learn their preferences

• Models can be created using ML techniques like regression, clustering,


classification, ...
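
A toy sketch of the memory-based idea: Pearson correlation between two users' rating vectors over commonly rated items (the numbers are made up):

import numpy as np

user_a = np.array([5.0, 3.0, 4.0, 1.0])   # ratings of the same 4 items
user_b = np.array([4.0, 2.5, 5.0, 2.0])

similarity = np.corrcoef(user_a, user_b)[0, 1]   # Pearson correlation
print(similarity)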
Recommenders
Content Based
• Works based on the user's profile

• Works with user ratings (like, view, …) and then finds the similarity between
the content of those items (tags, category, genres, …)


Recommenders
Content Based
• LAB
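
The lab itself is not reproduced here; a toy sketch of content-based scoring, where item genres are weighted by the user's ratings to build a profile and unseen items are scored against it:

import numpy as np

# rows = movies, columns = one-hot genres (toy data)
genres = np.array([[1, 0, 1],    # movie 0: action + sci-fi
                   [0, 1, 0],    # movie 1: romance
                   [1, 1, 0]])   # movie 2: action + romance
ratings = np.array([5.0, 1.0, 4.0])   # the user's ratings of those movies

profile = ratings @ genres            # weighted genre profile
profile = profile / profile.sum()     # normalize to weights

candidate = np.array([1, 0, 0])       # an unseen action-only movie
print(candidate @ profile)            # the candidate's recommendation score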
Recommenders
Collaborative Filtering
• User-based

• Based on users' similarity or neighborhoods

• Finds similarity between users (say, their rating history)

• Item-based

• Based on item similarity


Recommenders
Collaborative Filtering / User Based


Recommenders
Collaborative Filtering / User Based

Recommenders
Collaborative Filtering / Item Based

Recommenders
Collaborative Filtering / Challenges

• Data Sparsity

• A large number of users, but each rates only a limited number of items

• Cold Start

• What if a new user joins the system? What if a new item is added?

• Scalability

• Performance drops as items/users increase; the matrix becomes larger
and larger

• There are solutions… like hybrid recommenders


Recommenders
Collaborative Filtering
• LAB
