Part 2
Logistic regression:-
Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to
predict the probability that an instance belongs to a given class. It is a statistical method that analyzes the
relationship between one or more independent variables and a categorical dependent variable. This section covers
the fundamentals of logistic regression, its types, and its implementation.
Logistic regression is used for binary classification, where the sigmoid function takes the independent variables as
input and produces a probability value between 0 and 1.
For example, with two classes, Class 0 and Class 1, if the value of the logistic function for an input is greater
than 0.5 (the threshold value), the input is assigned to Class 1; otherwise it is assigned to Class 0. It is referred to as
regression because it is an extension of linear regression, but it is mainly used for classification problems.
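As a minimal sketch of this idea (using NumPy, with weight and bias values that are assumptions chosen only for
illustration), the code below computes the sigmoid of a linear combination of the inputs and applies the 0.5 threshold:

import numpy as np

def sigmoid(z):
    # Maps any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters (assumed values, for illustration only)
weights = np.array([0.8, -0.4])
bias = 0.1

x = np.array([1.5, 2.0])          # one input instance with two features
z = np.dot(weights, x) + bias     # linear combination of the independent variables
probability = sigmoid(z)          # a value between 0 and 1

# Apply the 0.5 threshold described above
predicted_class = 1 if probability > 0.5 else 0
print(probability, predicted_class)   # approximately 0.62 -> Class 1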
Key Points:
Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must
be a categorical or discrete value.
It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving exact values of 0 and 1, it
gives probabilistic values that lie between 0 and 1.
In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic function, which
predicts two maximum values (0 or 1).
The sigmoid function is a mathematical function used to map the predicted values to probabilities.
It maps any real value to a value within the range of 0 to 1. Since the output of logistic regression
must lie between 0 and 1 and cannot go beyond this limit, it forms an “S”-shaped curve.
The S-form curve is called the Sigmoid function or the logistic function.
In logistic regression, we use the concept of a threshold value, which decides between the two classes:
values above the threshold are mapped to 1, and values below the threshold are mapped to 0.
On the basis of the categories, Logistic Regression can be classified into three types:
1. Binomial: In binomial Logistic regression, there can be only two possible types of the dependent
variables, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types of the
dependent variable, such as “cat”, “dog”, or “sheep”.
3. Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of the dependent
variable, such as “Low”, “Medium”, or “High”.
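As a hedged sketch of how the binomial and multinomial cases might be fit in practice, the example below uses
scikit-learn's LogisticRegression on toy data; the data values are assumptions made only for illustration.

from sklearn.linear_model import LogisticRegression
import numpy as np

# Toy data (assumed for illustration): 2 features, 3 classes -> the multinomial case
X = np.array([[0.1, 1.0], [0.3, 0.8], [1.2, 0.2],
              [1.5, 0.1], [2.5, 2.4], [2.7, 2.9]])
y = np.array([0, 0, 1, 1, 2, 2])

# scikit-learn handles the binomial case when y has two labels and the
# multinomial case when y has three or more labels.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

print(clf.predict([[1.0, 0.5]]))        # predicted class
print(clf.predict_proba([[1.0, 0.5]]))  # class probabilities between 0 and 1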
Perceptron:-
A Perceptron is an algorithm for supervised learning of binary classifiers. The algorithm enables an artificial neuron to learn by
processing the elements of the training set one at a time.
Basic Components of Perceptron
Perceptron is a type of artificial neural network, which is a fundamental concept in machine learning. The basic
components of a perceptron are:
1. Input Layer: The input layer consists of one or more input neurons, which receive input signals from the external
world or from other layers of the neural network.
2. Weights: Each input neuron is associated with a weight, which represents the strength of the connection between the
input neuron and the output neuron.
3. Bias: A bias term is added to the input layer to provide the perceptron with additional flexibility in modeling complex
patterns in the input data.
4. Activation Function: The activation function determines the output of the perceptron based on the weighted sum of
the inputs and the bias term. Common activation functions used in perceptrons include the step function, sigmoid
function, and ReLU function.
5. Output: The output of the perceptron is a single binary value, either 0 or 1, which indicates the class or category to
which the input data belongs.
6. Training Algorithm: The perceptron is typically trained using a supervised learning algorithm such as the perceptron
learning algorithm or backpropagation. During training, the weights and biases of the perceptron are adjusted to
minimize the error between the predicted output and the true output for a given set of training examples.
Overall, the perceptron is a simple yet powerful algorithm that can be used to perform binary classification tasks and
has paved the way for the more complex neural networks used in deep learning today. A minimal forward pass combining
the components above is sketched below.
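The following sketch (illustrative only, with assumed weights and bias) ties together the inputs, weights, bias,
step activation function, and binary output listed above:

import numpy as np

def step(z):
    # Step activation: output 1 if the weighted sum exceeds 0, otherwise 0
    return 1 if z > 0 else 0

def perceptron_output(x, weights, bias):
    # Weighted sum of the inputs plus the bias, passed through the activation function
    z = np.dot(weights, x) + bias
    return step(z)

# Assumed weights and bias, for illustration only
weights = np.array([0.5, -0.6])
bias = 0.2

print(perceptron_output(np.array([1.0, 0.4]), weights, bias))   # -> 1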
Types of Perceptron:
1. Single layer: A single-layer perceptron can learn only linearly separable patterns.
2. Multilayer: A multilayer perceptron has two or more layers, giving it greater processing power and allowing it to learn patterns that are not linearly separable.
The Perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary.
Note: Supervised learning is a type of machine learning used to learn models from labeled training data; it
enables output prediction for future or unseen data. A rough sketch of the perceptron learning rule is given
below.
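The loop below (a sketch under assumed data, not the exact formulation from these notes) adjusts the weights and
bias whenever a training example is misclassified; the AND-gate data and the learning rate are assumptions chosen
for illustration.

import numpy as np

# Toy linearly separable data (AND gate), assumed for illustration
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

for epoch in range(10):                       # process the training set several times
    for xi, target in zip(X, y):
        prediction = 1 if np.dot(weights, xi) + bias > 0 else 0
        error = target - prediction           # 0 if correct, +1 or -1 if misclassified
        weights = weights + learning_rate * error * xi
        bias = bias + learning_rate * error

print(weights, bias)                          # learned parameters of the linear boundary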
Exponential family:-
In supervised learning, many commonly used models belong to the exponential family of distributions. These
models are powerful because they offer a unified framework for a wide range of probability distributions and
have convenient mathematical properties that simplify learning and inference. Here’s an overview of the
exponential family in the context of supervised learning:
Definition: The exponential family is a class of probability distributions that can be written in the form
p(y; θ) = h(y) · exp(θᵀ T(y) − A(θ)),
where T(y) is the sufficient statistic, θ is the natural parameter, A(θ) is the log-partition function, and h(y) is the
base measure.
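As a brief worked example (not part of the original notes), the Bernoulli distribution with mean φ can be rewritten in
this form: p(y; φ) = φ^y (1 − φ)^(1−y) = exp( y · log(φ/(1−φ)) + log(1−φ) ), so that θ = log(φ/(1−φ)), T(y) = y,
A(θ) = −log(1−φ) = log(1 + e^θ), and h(y) = 1. Inverting the natural parameter gives φ = 1/(1 + e^(−θ)), the sigmoid,
which is exactly the link used in logistic regression.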
Properties
Sufficient Statistics: The function T(y) captures all the necessary information from the data for the purpose of
parameter estimation.
Conjugate Priors: The exponential family often has conjugate prior distributions, making Bayesian inference
more tractable.
Log-Concavity: The log-partition function A(θ) is convex, so the log-likelihood is concave in θ, which facilitates optimization.
Generalized Linear Models (GLMs): GLMs extend linear models to accommodate response variables that have
distributions from the exponential family (a brief sketch follows this list). They include:
o Logistic regression for binary outcomes.
o Poisson regression for count data.
o Gaussian regression for continuous outcomes.
Regularization: L1 and L2 regularization techniques can be easily applied to models in the exponential family to
prevent overfitting.
Bayesian Inference: Conjugate priors simplify the computation of posterior distributions, making Bayesian
methods more practical.
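As a hedged sketch of one GLM from the list above, the example below fits a Poisson regression with scikit-learn's
PoissonRegressor on toy count data; the data values and the choice of no regularization are assumptions made only
for illustration.

from sklearn.linear_model import PoissonRegressor
import numpy as np

# Toy count data (assumed for illustration): counts that grow with the single feature
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, 2, 4, 6, 9])

# Poisson regression: a GLM whose response follows an exponential-family (Poisson) distribution
model = PoissonRegressor(alpha=0.0)    # alpha=0.0 turns off the L2 penalty
model.fit(X, y)

print(model.predict([[7.0]]))          # expected count for a new input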
Advantages
Flexibility: The exponential family covers many common distributions, making it applicable to a wide range of
problems.
Convenience: Mathematical properties such as the existence of sufficient statistics and conjugate priors
facilitate efficient learning and inference.
Unified Framework: Provides a consistent approach to handle different types of response variables in
supervised learning.
In summary, the exponential family of distributions offers a powerful framework for modeling a variety of
supervised learning problems, providing both flexibility and mathematical convenience. This framework
underlies many commonly used models, enabling effective learning and inference across different types of data
and applications.
Generative machine learning is an interesting subset of artificial intelligence in which models are trained to
generate new data samples similar to the original training data. This section explores the fundamentals of
generative machine learning, compares it with discriminative models, and looks at its main applications.
What is Generative Machine Learning?
Generative machine learning involves the development of models that learn the underlying distribution of the
training data. These models are capable of generating new data samples, which have similar characteristics to
the original dataset. Fundamentally, generative models aim to capture the underlying structure of the data in order to
generate new and diverse outputs.
Generative
The basic components of generative learning involve estimating probability distributions, which are then used to
generate new data samples. GANs (Generative Adversarial Networks), VAEs (Variational Autoencoders), and
MCMC-based methods are among the most popular approaches employed in generative learning.
Discriminative
One of the main distinctions between machine learning models is whether they are generative or discriminative.
Discriminative models learn a decision boundary that separates the different classes or categories in the data. For
instance, a classifier for discriminating between cats and dogs would learn to do so based on their features (such as
size and color).
Applications of Generative Machine Learning
Natural Language Generation (NLG): Models like GPT-3 can produce human-like text when prompted, leading to
applications in chatbots, content generation, and language translation.
Image Synthesis: Generative Adversarial Networks (GANs) can be used to create images for art, design, and
computer graphics.
Data Augmentation: Generative models can create new data points and add them to training datasets, improving
the robustness and generalization of models trained on them.
Anomaly Detection: Generative models can learn the distribution of normal data and flag observations that
deviate significantly from this distribution.
Drug Discovery: Generative models can propose new molecular structures, allowing experts in drug discovery to
explore previously unseen chemical compounds rapidly and at scale.
Gaussian Discriminant Analysis (GDA) is a supervised learning algorithm used for classification tasks in
machine learning. It is a variant of the Linear Discriminant Analysis (LDA) algorithm that relaxes the
assumption that the covariance matrices of the different classes are equal.
1. GDA works by assuming that the data in each class follows a Gaussian (normal) distribution, and then
estimating the mean and covariance matrix for each class. It then uses Bayes’ theorem to compute the
probability that a new data point belongs to each class, and chooses the class with the highest probability
as the predicted class.
2. GDA can handle data with arbitrary covariance matrices for each class, unlike LDA, which assumes that
the covariance matrices are equal. This makes GDA more flexible and able to handle more complex
datasets. However, the downside is that GDA requires estimating more parameters, as it needs to
estimate a separate covariance matrix for each class.
3. One disadvantage of GDA is that it can be sensitive to outliers and may overfit the data if the number of
training examples is small relative to the number of parameters being estimated. Additionally, GDA may
not perform well when the decision boundary between classes is highly nonlinear.
Overall, GDA is a powerful classification algorithm that can handle more complex datasets than LDA, but it
requires more parameters to estimate and may not perform well in all situations.
Advantages:
GDA is a flexible algorithm that can handle datasets with arbitrary covariance matrices for each class, making it
more powerful than LDA in some situations.
GDA produces probabilistic predictions, which are useful in applications where a measure of uncertainty in the
predictions is important.
GDA is a well-studied and well-understood algorithm, making it a good choice for many classification tasks.
Disadvantages:
GDA requires estimating more parameters than LDA, which can make it computationally more expensive and
more prone to overfitting if the number of training examples is small relative to the number of parameters.
GDA assumes that the data in each class follows a Gaussian distribution, which may not be true for all datasets.
GDA may not perform well when the decision boundary between classes is highly nonlinear, since its decision
boundaries are at most quadratic.
GDA may be sensitive to outliers in the data, which can affect the estimated parameters and lead to poor
performance.
Gaussian Discriminant Analysis is a generative learning algorithm: in order to capture the distribution of each
class, it fits a Gaussian distribution to every class of the data separately. In contrast to discriminative learning
algorithms, the probability a generative model assigns to a prediction is high when the point lies near the centre of
the Gaussian contour corresponding to its class and decreases as we move away from that centre.
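To make the fit-then-apply-Bayes procedure described above concrete, here is a minimal NumPy/SciPy sketch of GDA
with a separate mean and covariance per class; the toy data points are assumptions chosen only for illustration.

import numpy as np
from scipy.stats import multivariate_normal

# Toy 2-D data for two classes (assumed for illustration)
X0 = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9]])   # class 0
X1 = np.array([[3.0, 3.2], [2.8, 3.1], [3.2, 2.9]])   # class 1

# Fit a Gaussian to each class: a separate mean and covariance matrix per class
mu0, cov0 = X0.mean(axis=0), np.cov(X0, rowvar=False)
mu1, cov1 = X1.mean(axis=0), np.cov(X1, rowvar=False)

# Class priors estimated from the class frequencies
prior0 = len(X0) / (len(X0) + len(X1))
prior1 = 1.0 - prior0

def predict(x):
    # Bayes' theorem: the posterior is proportional to likelihood * prior;
    # choose the class with the higher posterior probability
    p0 = multivariate_normal.pdf(x, mean=mu0, cov=cov0) * prior0
    p1 = multivariate_normal.pdf(x, mean=mu1, cov=cov1) * prior1
    return 0 if p0 > p1 else 1

print(predict([1.0, 1.0]))   # near the class 0 centre -> 0
print(predict([3.0, 3.0]))   # near the class 1 centre -> 1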
Support Vector Machines (SVMs) are powerful supervised learning models used for classification and
regression tasks. The use of kernels in SVMs allows them to efficiently handle non-linear decision boundaries.
Here’s an overview of how kernels are used in SVMs:
What is a Kernel?
A kernel is a function that computes the dot product of two vectors in a high-dimensional space without
explicitly mapping the vectors to that space. This is known as the "kernel trick." Kernels enable SVMs to
perform complex classifications efficiently by implicitly mapping input features into a higher-dimensional
space where a linear separator can be found.
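To illustrate the kernel trick (an illustrative sketch, not part of the original notes), the code below compares an
explicit quadratic feature map with the equivalent degree-2 polynomial kernel and shows that both give the same dot
product, even though the kernel never builds the higher-dimensional vectors.

import numpy as np

def explicit_feature_map(x):
    # Quadratic feature map for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def polynomial_kernel(x, z, degree=2):
    # Computes the same dot product directly in the original feature space
    return np.dot(x, z) ** degree

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# Both values agree (16.0): the kernel avoids constructing the mapped vectors
print(np.dot(explicit_feature_map(x), explicit_feature_map(z)))
print(polynomial_kernel(x, z))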
Why Use Kernels?
1. Non-linear Classification: Kernels allow SVMs to create non-linear decision boundaries in the original feature
space by transforming the input features into a higher-dimensional space.
2. Efficiency: The kernel trick makes it computationally feasible to work in very high-dimensional spaces without
explicitly performing the transformation.
Common Types of Kernels:
1. Linear Kernel:
o Usage: Suitable for linearly separable data.
o Interpretation: No transformation; the decision boundary is a hyperplane in the original feature space.
2. Polynomial Kernel:
o Usage: Useful for data that requires a polynomial decision boundary.
o Interpretation: Maps the input features into a polynomial feature space.
3. Radial Basis Function (RBF) / Gaussian Kernel:
o Usage: Suitable for most types of data, especially when the decision boundary is complex and non-
linear.
o Interpretation: Maps the input features into an infinite-dimensional space.
4. Sigmoid Kernel:
o Usage: Inspired by neural networks, suitable for specific types of data.
o Interpretation: Maps the input features into a space where the decision boundary resembles that of a
neural network.
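For reference, here are hedged one-line implementations of the four kernels listed above; the gamma, degree, and
coefficient values are assumed defaults chosen only for illustration.

import numpy as np

def linear_kernel(x, z):
    # No transformation: a plain dot product in the original feature space
    return np.dot(x, z)

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    # Maps the inputs into a polynomial feature space
    return (np.dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # Gaussian similarity: near 1 for close points, near 0 for distant points
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

def sigmoid_kernel(x, z, alpha=0.01, coef0=0.0):
    # tanh of a scaled dot product, inspired by neural network activations
    return np.tanh(alpha * np.dot(x, z) + coef0)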
How Kernels Work:
1. Transformation: Kernels implicitly transform the input features into a higher-dimensional space where a linear
decision boundary can be found.
2. Dot Product Computation: Instead of computing the dot product explicitly in the high-dimensional space,
kernels compute it directly in the original feature space, making the process computationally efficient.
3. Optimization: SVM optimization finds the hyperplane in the transformed space that maximizes the margin
between different classes.
Steps for Using Kernels in an SVM:
1. Select a Kernel: Choose an appropriate kernel function based on the nature of the data.
2. Kernel Computation: Compute the kernel function for all pairs of data points to form the kernel matrix.
3. Optimization: Solve the SVM optimization problem using the kernel matrix to find the optimal hyperplane in the
high-dimensional space.
4. Classification: Use the learned model to classify new data points by computing the decision function using the
kernel.
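Putting these steps together, the end-to-end sketch below uses scikit-learn's SVC with an RBF kernel on toy data
where one class sits inside a ring of the other; the data, gamma, and C values are assumptions chosen only for
illustration.

from sklearn.svm import SVC
import numpy as np

# Toy non-linearly separable data (assumed): class 1 near the origin, class 0 on a surrounding ring
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 40)
inner = rng.normal(0, 0.3, (40, 2))                      # class 1: points near the origin
outer = np.c_[2 * np.cos(angles), 2 * np.sin(angles)]    # class 0: points on a circle of radius 2
X = np.vstack([inner, outer])
y = np.array([1] * 40 + [0] * 40)

# Steps 1-3: choose the RBF kernel and let SVC solve the kernelized optimization problem
clf = SVC(kernel="rbf", gamma=1.0, C=1.0)
clf.fit(X, y)

# Step 4: classify new points using the learned decision function
print(clf.predict([[0.0, 0.1], [2.0, 0.0]]))             # expected: [1 0]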