
Machine Learning

1. What is machine learning?

● Machine learning is a branch of artificial intelligence and computer science that
focuses on using data and algorithms to imitate the way humans learn, gradually
improving its accuracy.
2. Types of ML :
● Supervised ML
● Unsupervised ML
● Reinforcement ML : (reward maximisation)
3. Issues in ML :
● Inadequate training data
● Poor quality of data
● Overfitting & underfitting
● Lack of skilled resources
4. Applications of ML
● Image filtering
● Automatic image captioning
● Recommendation systems
5. Steps in Deploying ML application
● Problem scoping
● Data Acquisition
● Data Pre-processing
● Modelling
● Evaluation
● Deployment
6. Training Error :
● Training error refers to the error when a model is evaluated on the same dataset
it was trained on.
● A high training error suggests that the model is too simplistic and unable to
capture the underlying patterns in the data, potentially indicating bias error or
underfitting.
7. Generalization Error :
● The error that occurs when the model is evaluated on a new, unseen dataset that
was not used during training.
8. Overfitting :
● Overfitting occurs when a model performs very well on the training data but fails
to generalise to new, unseen data.
● It happens when the model becomes too complex and fits the noise present in the
data rather than the underlying patterns.
9. Underfitting :
● Underfitting occurs when a model is too simplistic and fails to capture the
underlying patterns in the training data

10. Bias Error :

● The inability of a model to capture the true relationship between the features
and the target in the training data.
● The model is too simplistic.
● Underfitting : High bias
● Overfitting : Low bias
11. Variance :
● Variance is the error due to the model's sensitivity to small fluctuations in the
training data.
● The model is too complex.
● Underfitting : Low variance
● Overfitting : High variance

13. What is linear regression ?


● Linear regression is a statistical method used to model the relationship
between a dependent variable (target) and one or more independent variables
(features). The goal is to find the best-fit line, known as the "regression line,"
that represents this relationship.
14. Simple Linear Regression : In simple linear regression, there is one dependent
variable and one independent variable. The relationship between them is represented
as:
y=mx+c

where:

● y is the dependent variable.
● x is the independent variable.
● m (or β1) is the slope of the line, showing how much y changes with a change in x.
● c (or β0) is the y-intercept, the value of y when x = 0.
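The y = mx + c relationship can be recovered from data with an ordinary least-squares fit; a minimal sketch using NumPy, with toy noise-free values chosen so the true line is y = 2x + 1:

```python
import numpy as np

# Toy, noise-free data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares fit of a degree-1 polynomial: returns slope m, then intercept c
m, c = np.polyfit(x, y, deg=1)
```

With noisy real-world data the fitted m and c would only approximate the true relationship, but the call is the same.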

15. Multiple Linear Regression : This involves multiple independent variables.


● y = β0 + β1x1 + β2x2 + ⋯ + βnxn
● Here, each β coefficient represents the weight or influence of a corresponding
independent variable.
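The same idea extends to several features; a sketch that solves for the β coefficients with a least-squares solve, using hypothetical coefficients β0 = 1, β1 = 2, β2 = 3 to generate toy data:

```python
import numpy as np

# Noise-free data generated from y = 1 + 2*x1 + 3*x2 (hypothetical coefficients)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Augment with a column of ones so beta[0] plays the role of the intercept β0
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Here beta recovers [β0, β1, β2]; each entry is the weight of the corresponding column of the design matrix.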
16. Mean Squared Error (MSE): The Mean Squared Error (MSE) is a metric that
measures the average squared difference between predicted and actual values. It
quantifies how well a regression model fits the data; the lower the MSE, the better the
model fits.
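The MSE definition translates directly into code; a sketch with toy actual/predicted values chosen for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Differences are 1, 0, and -2, so MSE = (1 + 0 + 4) / 3
error = mse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```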

17. Cost function :

● The cost function measures how far the model's predictions are from the actual
values over the whole training set. For linear regression it is typically based on
the mean squared error:
J(m, c) = (1/n) Σ (yi − (m·xi + c))²
● Training searches for the parameter values (here m and c) that minimise this cost.

19. Logistic Regression :


● Logistic regression is a statistical method used in machine learning for binary
classification, where the outcome variable has two possible classes (e.g., "yes"
or "no," "spam" or "not spam"). Unlike linear regression, which predicts
continuous values, logistic regression predicts the probability that a given
input belongs to a particular class. It does this by applying the logistic function
(or sigmoid function), which maps any input to a value between 0 and 1.
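A minimal sketch of the sigmoid mapping that logistic regression applies; the weight, bias, and input below are hypothetical values chosen for illustration:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained weights for a single feature: z = w*x + b
w, b = 2.0, -1.0
x = 0.5
probability = sigmoid(w * x + b)  # sigmoid(0.0) = 0.5
```

The predicted probability is then thresholded (commonly at 0.5) to choose between the two classes.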
20. Sigmoid Function :
● The sigmoid (logistic) function is defined as σ(z) = 1 / (1 + e^(−z)). It maps
any real-valued input to a value between 0 and 1, which logistic regression
interprets as a probability.

21. Performance measures used to evaluate the quality of a machine learning model
include:
● Confusion Matrix
● Accuracy
● Recall
● Precision
● Specificity
● F1 Score
22. Define the terms Recall, Precision, and Specificity in the context of performance
evaluation. :
● Recall (Sensitivity): Recall measures the ability of a classifier to correctly
identify positive instances out of all actual positive instances. It is calculated
as the ratio of true positives to the sum of true positives and false negatives.
● Precision: Precision measures the accuracy of positive predictions made by the
classifier. It is calculated as the ratio of true positives to the sum of true
positives and false positives.
● Specificity: Specificity measures the ability of a classifier to correctly identify
negative instances out of all actual negative instances. It is calculated as the
ratio of true negatives to the sum of true negatives and false positives.
23. What is the F1 Score in machine learning, and why is it useful?
● The F1 Score is a measure of a model’s accuracy that considers both precision
and recall. It is the harmonic mean of the two:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
● It is useful when the classes are imbalanced, because a model cannot score well
by sacrificing either precision or recall.
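These metrics all follow directly from confusion-matrix counts; a sketch with hypothetical counts for a binary classifier:

```python
# Hypothetical confusion-matrix counts (chosen for illustration)
tp, fp, fn, tn = 40, 10, 20, 30

recall = tp / (tp + fn)             # true positives / all actual positives
precision = tp / (tp + fp)          # true positives / all predicted positives
specificity = tn / (tn + fp)        # true negatives / all actual negatives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```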


25. Bagging
● Bagging, or Bootstrap Aggregating, is a technique that aims to reduce variance
and avoid overfitting. Here's how it works:
1. Data Sampling: Multiple subsets of the training data are created by random sampling
with replacement (bootstrap sampling).
2. Model Training: A separate model (usually the same type) is trained on each subset
independently.
3. Prediction Aggregation: For classification, the final output is typically decided by
majority voting among models; for regression, it’s the average prediction.

Example Algorithm: Random Forest is a popular bagging method that uses multiple
decision trees to improve accuracy and stability.
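Steps 1 and 3 of bagging can be sketched in plain Python; the data and votes below are stubs for illustration, and a real implementation would train an actual model on each bootstrap sample in step 2:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Step 1: draw len(data) points with replacement (bootstrap sampling)."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Step 3: aggregate the classifiers' outputs by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
sample = bootstrap_sample([1, 2, 3, 4, 5], rng)   # training subset for one model
label = majority_vote(["spam", "ham", "spam"])    # ensemble's final decision
```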

Boosting
Boosting focuses on improving model accuracy by correcting errors of previous models,
aiming to reduce both bias and variance. Here’s the process:

1. Sequential Learning: Models are trained sequentially, with each new model
attempting to correct the errors made by the previous one.
2. Weighted Samples: Incorrectly predicted samples are given higher weights, making
the next model focus on these harder-to-classify examples.
3. Weighted Prediction: The final prediction is often a weighted sum of the individual
models’ predictions.
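The weighted-samples step of boosting can be sketched as a simple weight update; the up-weighting factor here is an illustrative stand-in for the weight formulas used by real algorithms such as AdaBoost:

```python
# One round of a boosting-style weight update (illustrative factor, not AdaBoost's)
weights = [1.0, 1.0, 1.0, 1.0]
correct = [True, False, True, False]  # which samples the current model got right
factor = 2.0                          # hypothetical up-weighting for mistakes

weights = [w * (1.0 if ok else factor) for w, ok in zip(weights, correct)]
total = sum(weights)
weights = [w / total for w in weights]  # renormalise so the weights sum to 1
```

The next model in the sequence then trains against these weights, focusing on the harder-to-classify examples.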

26. Support Vector machine

● A Support Vector Machine (SVM) is a supervised machine learning algorithm
commonly used for classification and regression tasks. The primary goal of SVM is to
find an optimal hyperplane that best separates data points of different classes in a
high-dimensional space. This optimal hyperplane maximizes the margin between the
closest points (support vectors) of each class, thus enhancing the model's ability to
generalize to new data.

27. Types of Kernel :

The kernel trick is a method used to implicitly map the input features into a
higher-dimensional space without explicitly calculating the transformation. It allows SVMs
to efficiently handle non-linear decision boundaries by computing the dot product in the
higher-dimensional space using a kernel function. This makes SVMs versatile and powerful
in capturing complex patterns in the data.

● Linear Kernel
● Polynomial Kernel
● Radial Basis Function (RBF) Kernel
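Each kernel is just a function of two input vectors; minimal sketches of the polynomial and RBF kernels, where gamma, degree, and c are hypothetical hyperparameter choices:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def polynomial_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel: (x . y + c) ** degree."""
    return (np.dot(x, y) + c) ** degree

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])            # identical points -> 1.0
k_poly = polynomial_kernel([1.0, 0.0], [1.0, 0.0])     # (1 + 1) ** 2 = 4.0
```

The SVM only ever needs these kernel values, never the explicit high-dimensional coordinates, which is what makes the kernel trick efficient.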

28. What is clustering?

● Clustering is the process of organising objects into groups whose members are
similar in some way.

29. Types of clustering :

1. Centroid-Based Clustering
2. Density-Based Clustering
3. Distribution-Based Clustering
4. Hierarchical Clustering
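As a sketch of the centroid-based idea, here is one nearest-centroid assignment step (the core of k-means) on toy 2-D points; the points and centroids are hypothetical values chosen so the two groups are obvious:

```python
import numpy as np

def assign_clusters(points, centroids):
    """Assign each point to its nearest centroid (centroid-based clustering)."""
    # Pairwise distances: shape (n_points, n_centroids)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

points = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_clusters(points, centroids)  # two points per cluster
```

Full k-means alternates this assignment step with recomputing each centroid as the mean of its assigned points until the assignments stop changing.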
