Python ML Algorithm

The document provides information about various machine learning algorithms and concepts: - It discusses supervised algorithms like logistic regression, KNN, naive Bayes, random forest, and support vector machines (SVM). Unsupervised algorithms like K-means and mean shift clustering are also covered. - Example datasets like iris flowers, breast cancer tumors, and social network ads are used to demonstrate how different algorithms can be applied. - Key concepts around kernels, centroids, and iterative processes in algorithms are introduced.


About Python Software

- Processor: 32 or 64 bit
- Windows OS: 7, 8 or 10
- Python: 2.7 or 3.7
- Python IDLE
- OpenCV: OpenCV 2.4, opencv-contrib 3.4
- Anaconda3: 4.4
- Database: MySQL-Front
Basic Points About Python
- Numbers are mainly of two types: integers and floats.
- There is no separate long type; the int type can be an integer of any size.
- Strings look like 'This is a string' or "It's a string!".
- You can specify strings using single quotes, such as 'Quote me on this'.
- Strings in double quotes work exactly the same way as strings in single quotes. An example is "What's your name?" (a short sketch follows).
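A minimal sketch (Python 3) illustrating these points:

# Numbers: integers and floats are the two main numeric types (no separate long type).
count = 12345678901234567890      # an int can be arbitrarily large
price = 3.14                      # a float

# Strings: single and double quotes work exactly the same way.
s1 = 'Quote me on this'
s2 = "What's your name?"          # double quotes make the apostrophe easy
print(type(count), type(price))
print(s1)
print(s2)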
Basic Points About Python
- There is no separate char data type in Python.
- Python is strongly object-oriented in the sense that everything is an object, including numbers, strings and functions.
- The Python for loop is radically different from the C/C++ for loop.
- There is no switch statement in Python; you can use an if..elif..else statement instead.
- Variables are created by simply assigning them a value; no declaration or data type definition is needed (a short sketch follows).
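A minimal sketch illustrating these points:

# Variables are created simply by assignment; no declaration is needed.
x = 10            # x refers to an int
x = "ten"         # the same name can later refer to a string

# The Python for loop iterates over a sequence, unlike the C/C++ counter loop.
for fruit in ["apple", "banana", "cherry"]:
    print(fruit)

# There is no switch statement; an if..elif..else chain is used instead.
marks = 75
if marks >= 80:
    print("Distinction")
elif marks >= 40:
    print("Pass")
else:
    print("Fail")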
Types of Machine Learning Algorithms
- Supervised machine learning algorithms
- Unsupervised machine learning algorithms
- Reinforcement machine learning algorithms
Supervised – Logistic Regression
The logistic regression model is a member of the supervised classification family of algorithms.
Logistic regression measures the relationship between the dependent variable (y) and the independent variables (x) by estimating probabilities using a logistic function (the sigmoid curve).
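For reference, a minimal sketch of the sigmoid function itself, which squashes any real-valued input into a probability between 0 and 1 (NumPy is used here):

import numpy as np

def sigmoid(z):
    # Maps any real number z into the range (0, 1), so the output
    # can be read as a probability.
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5
print(sigmoid(4))    # close to 1
print(sigmoid(-4))   # close to 0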
Linear & Logistic Regression
Example – Marriage: linear regression would predict in which year the marriage will happen, while logistic regression would predict whether the person marries or not.
Independent variables: age, sex, family, job, salary, friends, etc.
Supervised – Logistic Regression
Dataset – Social_Network_Ads
The dataset contains categories such as id, gender, age, etc. Based on these categories we train the machine and predict purchases. Here the independent variables are 'age' and 'estimated salary', and the dependent variable is 'purchased'. The logistic regression algorithm is used to predict purchases from the existing data.
Logistic regression produces results in a binary format. The usual outputs of logistic regression are:
- Yes and No
- True and False
- High and Low
- Pass and Fail
Supervised – Logistic Regression
Dataset – Social_Network_Ads
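A minimal code sketch, assuming a Social_Network_Ads.csv file with 'Age', 'EstimatedSalary' and 'Purchased' columns (the file name and exact column names are assumptions):

# A minimal sketch; the CSV file name and column names are assumed.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

data = pd.read_csv('Social_Network_Ads.csv')
X = data[['Age', 'EstimatedSalary']].values     # independent variables
y = data['Purchased'].values                    # dependent variable (0 or 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Scale the features because age and salary have very different ranges.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print('Accuracy:', accuracy_score(y_test, y_pred))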
KNN – K Nearest Neighbor
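A minimal sketch of a K-Nearest Neighbor classifier, shown here on scikit-learn's built-in iris dataset (the dataset choice, the split ratio and k = 5 are assumptions, not taken from the slides):

# KNN: each prediction is the majority class among the k nearest training points.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5 neighbors (assumed)
knn.fit(X_train, y_train)

print('Accuracy:', accuracy_score(y_test, knn.predict(X_test)))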
Supervised – Gaussian Naïve Bayes
Naïve Bayes Classifier: there are three types of Naïve Bayes models available under the scikit-learn package, named Gaussian, Multinomial and Bernoulli.
Naïve Bayes is a classification technique used to build a classifier using Bayes' theorem.
Naive Bayes can be extended to real-valued attributes, most commonly by assuming a Gaussian distribution. This extension of Naive Bayes is called Gaussian Naive Bayes.
The Gaussian (or normal) distribution is the easiest to work with because you only need to estimate the mean and the standard deviation from your training data.
Supervised – Gaussian Naïve Bayes
Dataset – Breast Cancer Tumors
Dataset: the Breast Cancer Wisconsin Diagnostic Database.
The dataset includes various information about breast cancer tumors, as well as classification labels of malignant or benign. It has 569 instances, or data points, on 569 tumors and includes information on 30 attributes, or features, such as the radius of the tumor, texture, smoothness, and area. We can import this dataset from the sklearn package.
Naïve Bayes
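A minimal sketch using scikit-learn's built-in copy of this dataset (the train/test split ratio is an assumption):

# Gaussian Naive Bayes on the Breast Cancer Wisconsin dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X, y = data.data, data.target        # 569 instances, 30 features, labels 0/1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

gnb = GaussianNB()                   # estimates a mean and std per feature per class
gnb.fit(X_train, y_train)

print('Accuracy:', accuracy_score(y_test, gnb.predict(X_test)))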
Supervised – Random Forest Classifier
Dataset – Breast Cancer Tumors
Random forest can be used both for classification and regression. It is also one of the most flexible and easy-to-use algorithms.
A forest is comprised of trees, and the more trees it has, the more robust the forest is said to be. Random forest creates decision trees on randomly selected data samples, gets a prediction from each tree and selects the best solution by means of voting.
It also provides a pretty good indicator of feature importance.
The random forest algorithm is an ensemble classification algorithm; an ensemble classifier means a group of classifiers.
Supervised – Random Forest Classifier
Example: you want to go on a trip and would like to travel to a place you will enjoy. To find such a place you could search online, read reviews on travel blogs and portals, or ask your friends.
Suppose you decide to ask your friends and talk with them about their past travel experience to various places. You will get some recommendations from every friend.
Now you make a list of those recommended places. Then you ask your friends to vote (select the one best place for the trip) from the list of recommended places you made. The place with the highest number of votes will be your final choice for the trip.
In the above decision process there are two parts. The first is asking your friends about their individual travel experience and getting one recommendation out of the multiple places they have visited. This part is like using the decision tree algorithm: each friend makes a selection from the places he or she has visited so far.
The second part, after collecting all the recommendations, is the voting procedure for selecting the best place from the list of recommendations. This whole process of getting recommendations from friends and voting on them to find the best place is known as the random forest algorithm.
The collection of decision tree classifiers is also known as the forest. The individual decision trees are generated using an attribute selection indicator such as information gain, gain ratio or Gini index for each attribute. Each tree depends on an independent random sample. In a classification problem each tree votes and the most popular class is chosen as the final result. In the case of regression, the average of all the tree outputs is taken as the final result.
Supervised – Random Forest Classifier
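A minimal sketch of a random forest classifier on the same breast cancer dataset (the number of trees and the split ratio are assumptions):

# Random forest: many decision trees trained on random samples, voting on the class.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

print('Accuracy:', accuracy_score(y_test, forest.predict(X_test)))

# Feature importance: how much each of the 30 attributes contributed (top 5 shown).
for name, score in sorted(zip(data.feature_names, forest.feature_importances_),
                          key=lambda pair: pair[1], reverse=True)[:5]:
    print(name, round(score, 3))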
Supervised – Support Vector Machines (SVM)
SVM is a supervised machine learning algorithm that can be used for both regression and classification.
The main concept of SVM is to plot each data item as a point in n-dimensional space, with the value of each feature being the value of a particular coordinate.
Supervised – Support Vector Machines (SVM)
Dataset – Iris Flower
The iris dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. Each instance has four features, namely sepal length, sepal width, petal length and petal width. The SVM classifier predicts the class of the iris plant based on these 4 features.
Supervised – Support Vector Machines (SVM)
Dataset – Iris Flower
The three classes in the Iris dataset:
- Iris-setosa (n=50)
- Iris-versicolor (n=50)
- Iris-virginica (n=50)
The four features of the Iris dataset:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
Supervised – SVM
Kernel: a technique used by SVM. Kernels are basically functions which take a low-dimensional input space and transform it into a higher-dimensional space.
Kernel functions include linear, polynomial, Gaussian (RBF) and sigmoid. In this example, we will use the linear kernel.
SVM and Kernel SVM with Python's Scikit-Learn
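A minimal sketch of a linear-kernel SVM on the iris dataset, along the lines described above (the train/test split ratio is an assumption):

# Linear-kernel SVM on the iris dataset (3 classes, 4 features).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = SVC(kernel='linear')       # other kernels: 'poly', 'rbf', 'sigmoid'
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test),
                            target_names=iris.target_names))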
Unsupervised (Clustering) – K-Means algorithm
Clustering is the task of dividing a set of observations into subsets, called clusters, in such a way that observations in the same cluster are similar in one sense and dissimilar to the observations in other clusters. In simple words, the main goal of clustering is to group the data on the basis of similarity and dissimilarity.
The K-means algorithm is one of the well-known algorithms for clustering data. It assumes that the number of clusters is already known. The steps of this algorithm:
Step 1 − Specify K, the desired number of subgroups (clusters).
Step 2 − Fix the number of clusters and randomly assign each data point to a cluster; in other words, classify the data based on the number of clusters.
Unsupervised (Clustering) – K-Means algorithm
K-Means is a flat, iterative, centroid-based clustering algorithm.
As this is an iterative algorithm, we need to update the locations of the K centroids with every iteration until we find the global optimum, or in other words until the centroids reach their optimal locations.
In centroid-based clustering, clusters are represented by a central vector, or centroid. This centroid need not be a member of the dataset. Centroid-based clustering is an iterative algorithm in which the notion of similarity is derived from how close a data point is to the centroid of the cluster.
https://mubaris.com/posts/kmeans-clustering/
Unsupervised (Clustering) – K-Means algorithm
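A minimal sketch of K-Means using scikit-learn (the synthetic 2-D data and K = 3 are assumptions used only for illustration):

# K-Means with K = 3 on synthetic blob data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)           # cluster index for every point

print('Centroid locations:')
print(kmeans.cluster_centers_)
print('First ten cluster assignments:', labels[:10])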
Unsupervised (Clustering) – Mean Shift algorithm
Mean shift is another popular and powerful clustering algorithm used in unsupervised learning.
It does not make any assumptions about the number or shape of the clusters, hence it is a non-parametric algorithm. It is also called mean shift cluster analysis.
Basic steps of this algorithm (a code sketch follows the list):
- First of all, start with the data points each assigned to a cluster of their own.
- Next, compute the centroids and update the locations of the new centroids.
- By repeating this process, we move closer to the peak of the cluster, i.e. towards the region of higher density.
- The algorithm stops when the centroids do not move anymore.
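A minimal sketch of Mean Shift using scikit-learn; note that the number of clusters is discovered rather than given (the synthetic data and the bandwidth estimate are assumptions):

# Mean Shift: the number of clusters is not specified in advance.
from sklearn.datasets import make_blobs
from sklearn.cluster import MeanShift, estimate_bandwidth

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Bandwidth controls the size of the window shifted towards denser regions.
bandwidth = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)

print('Clusters found:', len(ms.cluster_centers_))
print('Cluster centers:')
print(ms.cluster_centers_)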