

MACHINE LEARNING
By: Yogesh Gunjal

"Machine learning is the bridge where human curiosity meets computational power to solve real-world problems."
Contents

Types of Machine Learning
    Supervised Learning
    Unsupervised Learning
    Reinforcement Learning
Overfitting & Underfitting
Bias & Variance
Supervised Machine Learning Algorithms
    1. Linear Regression
    2. Logistic Regression
    3. Decision Trees
    4. Random Forest
    5. Support Vector Machines (SVM)
    6. K-Nearest Neighbors (KNN)
Unsupervised Machine Learning Algorithms
    1. K-Means Clustering
    2. Hierarchical Clustering
    3. DBSCAN
    4. Principal Component Analysis (PCA)
Ensemble Techniques
    1. Bagging
    2. Boosting
    3. Stacking
    4. Voting
Performance Metrics
    Classification Problem Statement
    Regression Problem Statement
Cross-Validation
Hyperparameter Tuning
Model Fit
Model Transform
Model Fit-Transform
Model Predict
One Hot Encoder
Machine Learning
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables systems to learn and improve
from experience without being explicitly programmed. Instead of following hard-coded instructions, ML
models are trained on data to identify patterns, make decisions, or predict outcomes.

At its core, ML involves three main steps:

1. Data Collection and Preprocessing: Gathering and preparing data for analysis.
2. Model Building: Training algorithms to understand patterns in the data.
3. Evaluation and Deployment: Testing the model's accuracy and deploying it for real-world use.

Types of Machine Learning

Supervised Learning
• Definition: The model is trained on labeled data, where both input and the corresponding
output are provided. The goal is to learn the mapping between inputs and outputs.
• Examples:
o Predicting house prices (input: features like area, location; output: price).
o Classifying emails as spam or not spam.
• Common Algorithms:
o Regression: Linear Regression, Logistic Regression.
o Classification: Decision Trees, Support Vector Machines (SVM), Random Forests,
Neural Networks.

Unsupervised Learning
• Definition: The model is trained on unlabeled data, where only the input is provided. The goal
is to find hidden patterns or groupings in the data.
• Examples:
o Customer segmentation in marketing.
o Dimensionality reduction using PCA (Principal Component Analysis).
• Common Algorithms:
o Clustering: K-Means, Hierarchical Clustering, DBSCAN.
o Dimensionality Reduction: PCA, t-SNE.

Reinforcement Learning
• Definition: The model learns by interacting with an environment and receiving feedback in
the form of rewards or penalties. The goal is to take actions that maximize cumulative
rewards.
• Examples:
o Game-playing AI (e.g., AlphaGo, chess engines).
o Self-driving cars optimizing routes.

• Key Components:
o Agent: The decision-maker (e.g., the model).
o Environment: Where the agent operates (e.g., the game or driving conditions).
o Actions, Rewards, and States.

Choosing the Right Type


• Use Supervised Learning when you have labeled data and a clear prediction goal.
• Use Unsupervised Learning to explore data or find hidden patterns without labeled outcomes.
• Use Reinforcement Learning for decision-making tasks in dynamic environments.

Each type of ML plays a crucial role in advancing AI applications in fields like healthcare, finance, robotics,
and natural language processing.

Supervised Learning Algorithms

Category | Algorithm | Use Case
Regression | Linear Regression | Predicting continuous values (e.g., house prices).
Regression | Polynomial Regression | Predicting nonlinear relationships.
Regression | Ridge/Lasso Regression | Handling multicollinearity in regression.
Classification | Logistic Regression | Binary classification (e.g., spam detection).
Classification | Decision Trees | Interpretability in decision-making tasks.
Classification | Random Forests | Handling complex classification tasks.
Classification | Support Vector Machines (SVM) | High-dimensional classification problems.
Classification | Naïve Bayes | Text classification and sentiment analysis.
Classification | K-Nearest Neighbors (KNN) | Instance-based classification.
Classification | Neural Networks (NNs) | Image and speech recognition.

Unsupervised Learning Algorithms

Category | Algorithm | Use Case
Clustering | K-Means | Customer segmentation in marketing.
Clustering | Hierarchical Clustering | Building hierarchies of data relationships.
Clustering | DBSCAN | Identifying noise and outliers in data.
Dimensionality Reduction | PCA (Principal Component Analysis) | Reducing features for visualization or efficiency.
Dimensionality Reduction | t-SNE | Visualizing high-dimensional datasets.
Dimensionality Reduction | UMAP (Uniform Manifold Approximation and Projection) | Faster and scalable dimensionality reduction.
Anomaly Detection | Isolation Forest | Fraud detection in transactions.
Anomaly Detection | Autoencoders | Detecting rare patterns or anomalies.

Reinforcement Learning Algorithms

Category | Algorithm | Use Case
Value-Based | Q-Learning | Optimizing agent rewards in static environments.
Value-Based | Deep Q-Learning (DQN) | Game-playing agents (e.g., Atari games).
Policy-Based | REINFORCE | Directly learning policies for decision-making.
Policy-Based | Proximal Policy Optimization (PPO) | Stability and efficiency in RL.
Actor-Critic | Advantage Actor-Critic (A2C) | Balancing exploration and exploitation.
Actor-Critic | Deep Deterministic Policy Gradient (DDPG) | Continuous action spaces.
Model-Based | AlphaZero | Chess, Go, and other strategic games.

Overfitting & Underfitting

Overfitting
• Definition: The model learns not only the underlying pattern in the training data but also
noise and outliers. This leads to excellent performance on training data but poor performance
on unseen data.
• Characteristics:
o High accuracy on training data.
o Low accuracy on validation/test data.
o Model is too complex (e.g., too many parameters).

Example:

• Dataset: Predict house prices based on features like size, location, and number of bedrooms.
• Scenario:
o A decision tree model with depth=20 fits every minor fluctuation in the training data.
o Result:
▪ Training accuracy: 98%
▪ Validation accuracy: 65%
o Reason: The model memorized the data rather than learning the general pattern.

Solution:

• Simplify the model (e.g., reduce depth of decision trees, add regularization).
• Use cross-validation.

Underfitting
• Definition: The model is too simple to capture the underlying pattern in the data. It performs
poorly on both training and validation/test data.
• Characteristics:
o Low accuracy on training data.
o Low accuracy on validation/test data.
o Model lacks complexity (e.g., too few parameters).

Example:

• Dataset: Predict house prices based on features like size, location, and number of bedrooms.
• Scenario:
o A linear regression model tries to fit a non-linear relationship between features and
target.
o Result:
▪ Training accuracy: 50%
▪ Validation accuracy: 45%
o Reason: The model fails to capture the complex relationship between features and
target.

Solution:

• Increase model complexity (e.g., use polynomial regression, neural networks).


• Add more relevant features.

Bias & Variance

1. Bias
• Definition: Bias refers to the error due to overly simplistic assumptions in the model. High
bias can cause the model to miss important relationships in the data, leading to underfitting.
• Characteristics:
o The model is too simple.
o High training error and high validation/test error.
o Cannot capture the underlying patterns in the data.

Example:

• Dataset: Predict house prices based on features like size, location, and number of bedrooms.
• Model: Linear regression for data with a clear non-linear relationship.
o Prediction fails to capture the curved trend in the data.
o Both training and validation errors are high because the model is biased towards
linearity.
2. Variance
• Definition: Variance refers to the error due to the model's sensitivity to small fluctuations in
the training data. High variance means the model learns noise and performs poorly on unseen
data, leading to overfitting.
• Characteristics:
o The model is overly complex.
o Low training error but high validation/test error.
o Fails to generalize to new data.

Example:

• Dataset: Predict house prices based on features like size, location, and number of bedrooms.
• Model: A decision tree with very high depth.
o The model memorizes the training data but cannot generalize to new data points.

Supervised Machine Learning Algorithms

1. Linear Regression

Overview:
Linear Regression is used for predicting continuous numerical values. It models the relationship
between the dependent variable y and one or more independent variables x by fitting a straight
line (or a hyperplane for multiple variables) to the data.
Advantages:

• Simple to implement and interpret.


• Works well when relationships are linear.

Limitations:

• Assumes linearity between variables.


• Sensitive to outliers.
• Limited performance on high-dimensional or complex datasets.

Use Cases:

• Predicting house prices based on features like size, location, and age.
• Forecasting sales based on historical data.
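
A minimal scikit-learn sketch of the idea above; the house sizes and prices are made-up illustration values, not real data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: house size (sq. ft.) vs. price.
X = np.array([[800], [1000], [1200], [1500], [1800]])
y = np.array([150_000, 180_000, 210_000, 260_000, 300_000])

model = LinearRegression()
model.fit(X, y)  # learns the slope (coefficient) and intercept of the line

# Predict the price of an unseen 1,350 sq. ft. house.
print(model.coef_, model.intercept_)
print(model.predict([[1350]]))
```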

2. Logistic Regression

Overview:

Logistic Regression is used for binary classification tasks (e.g., yes/no, spam/not spam). It predicts
the probability of the dependent variable belonging to a particular class.

Advantages:

• Works well for binary outcomes.


• Probabilistic interpretation.
• Can be extended to multiclass problems (using techniques like one-vs-rest or softmax).

Limitations:

• Assumes a linear relationship between features and the log-odds of the target variable.
• Sensitive to multicollinearity and outliers.

Use Cases:

• Spam email detection.


• Medical diagnosis (e.g., predicting disease presence).
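
A minimal sketch of binary classification with scikit-learn, assuming a toy spam dataset with one made-up feature (count of suspicious keywords):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature: number of suspicious keywords in an email.
X = np.array([[0], [1], [2], [5], [8], [10]])
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = not spam, 1 = spam

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns class probabilities; predict applies a 0.5 threshold.
print(clf.predict_proba([[4]]))
print(clf.predict([[4]]))
```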

3. Decision Trees

Overview:

Decision Trees are non-linear models used for both classification and regression tasks. They split
data into subsets based on feature conditions, forming a tree-like structure.

How it Works:

• At each node, the algorithm chooses the feature and split point that best separates the
data using criteria like Gini Impurity or Entropy.
• Stops splitting when leaf nodes meet criteria (e.g., pure class or max depth reached).

Advantages:

• Easy to interpret (if the tree is small).


• Handles non-linear relationships and categorical data.
• No need for feature scaling.

Limitations:

• Prone to overfitting (solved using pruning).


• Not robust to small changes in data (can lead to different splits).

Use Cases:

• Credit risk assessment.


• Fraud detection in financial transactions.
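
A short sketch using scikit-learn's built-in Iris dataset; export_text prints the learned splits, illustrating the interpretability described above:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# max_depth limits tree growth, which helps control overfitting.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned decision rules as text.
print(export_text(tree))
```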
4. Random Forest
Overview:

Random Forest is an ensemble method that builds multiple decision trees and combines their
outputs (via averaging for regression or voting for classification).

How it Works:

• Each tree is trained on a random subset of data (bagging) and features.


• Reduces overfitting by averaging results across trees.

Advantages:

• Handles large datasets with higher dimensionality.


• Reduces overfitting compared to individual decision trees.
• Robust to noise.

Limitations:

• Computationally expensive for large datasets.


• Interpretability is lower than a single decision tree.

Use Cases:

• Predicting customer churn.


• Feature selection (through feature importance scores).
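
A brief sketch on scikit-learn's built-in breast-cancer dataset, showing ensemble training and the feature-importance scores mentioned above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample and a random subset of features.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
print("First 5 feature importances:", forest.feature_importances_[:5])
```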

5. Support Vector Machines (SVM)


Overview:
SVM is a classification algorithm that finds the hyperplane with the maximum margin to separate
data points into classes.

How it Works:

• Constructs a decision boundary (hyperplane) such that the margin between classes is
maximized.
• Uses kernel functions (e.g., linear, polynomial, RBF) to handle non-linear separations.
Advantages:

• Effective in high-dimensional spaces.


• Works well for small datasets.
• Can handle non-linear relationships with kernel tricks.

Limitations:

• Computationally intensive for large datasets.


• Requires careful tuning of hyperparameters.

Use Cases:

• Image classification.
• Text categorization and sentiment analysis.
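
A compact sketch on scikit-learn's built-in digits dataset; the scaling step and the RBF kernel here are illustrative defaults, not the only reasonable settings:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel handles non-linear boundaries; feature scaling matters for SVMs.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))
```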

6. K-Nearest Neighbors (KNN)


Overview:

KNN is an instance-based algorithm that assigns a data point to the class of its 𝑘 nearest neighbors.

How it Works:

• Compute the distance (e.g., Euclidean) between the query point and all data points.
• Find the 𝑘 closest points and assign the majority class (classification) or average value (regression).

Advantages:

• Simple to implement.
• No training phase; only stores data points.

Limitations:

• Computationally expensive at inference (distance computation for all points).


• Sensitive to noisy data and irrelevant features.

Use Cases:

• Recommender systems.
• Handwritten digit recognition.
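
A minimal sketch on the built-in digits dataset, roughly matching the handwritten-digit use case above:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" only stores the data; all distance computation happens at prediction time.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```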

Summary Table:

Algorithm | Type | Key Strength | Primary Use Case
Linear Regression | Regression | Simplicity and interpretability | Forecasting sales
Logistic Regression | Classification | Probabilistic predictions | Spam detection
Decision Trees | Both | Easy to interpret | Credit scoring
Random Forest | Both | Robustness to overfitting | Customer churn prediction
SVM | Classification | Works well in high-dimensional spaces | Text categorization
KNN | Both | Simple, no training phase | Recommender systems

Unsupervised Machine Learning Algorithms

1. K-Means Clustering

Overview:

K-Means is one of the simplest and most popular clustering algorithms. It partitions a dataset into
𝑘 clusters, where each cluster is represented by its centroid.

How it Works:

1. Initialization: Randomly initialize 𝑘 centroids.


2. Assignment: Assign each data point to the nearest centroid (based on distance, e.g.,
Euclidean).
3. Update: Recalculate centroids as the mean of all points in the cluster.
4. Repeat: Alternate between assignment and update steps until convergence (e.g.,
centroids stop changing).

Advantages:

• Easy to understand and implement.


• Works well on spherical and well-separated clusters.
• Scales well with large datasets.

Limitations:

• Requires pre-specifying 𝑘 (number of clusters).


• Sensitive to outliers.
• Poor performance on non-spherical or overlapping clusters.

Use Cases:

• Customer segmentation in marketing.


• Image compression (grouping similar pixels).
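
A small sketch with made-up customer data (the [annual spend, visits per month] feature pair is assumed for illustration), where scikit-learn runs the assign/update loop described above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [annual spend, visits per month].
X = np.array([[500, 2], [520, 3], [480, 2],
              [5000, 20], [5200, 22], [4900, 18]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # alternates assignment and centroid updates until convergence

print("Cluster labels:", labels)
print("Centroids:", kmeans.cluster_centers_)
```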

2. Hierarchical Clustering

Overview:
Hierarchical clustering creates a tree-like structure (dendrogram) that groups data points into clusters
based on their similarity.

How it Works:

1. Start with each data point as its own cluster.


2. Agglomerative Approach:
o Merge the two closest clusters based on a distance metric (e.g., single-linkage,
complete-linkage).
o Repeat until a single cluster remains.
3. Alternatively, Divisive Approach:
o Start with one cluster containing all points.
o Recursively split clusters until each data point forms its own cluster.

Advantages:

• No need to predefine the number of clusters.


• Produces a dendrogram, which provides insight into cluster hierarchy.

Limitations:

• Computationally expensive for large datasets.


• Sensitive to noise and outliers.

Use Cases:

• Gene expression analysis in bioinformatics.


• Document clustering for text data.
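
A short sketch of the agglomerative approach using scikit-learn and SciPy on a tiny made-up dataset; the linkage matrix is what a dendrogram plot would be built from:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Bottom-up (agglomerative) clustering with complete linkage.
agg = AgglomerativeClustering(n_clusters=2, linkage="complete")
print("Cluster labels:", agg.fit_predict(X))

# The linkage matrix records each merge; plotting it (e.g., with
# scipy.cluster.hierarchy.dendrogram and matplotlib) gives the dendrogram.
Z = linkage(X, method="complete")
print(Z)
```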

3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Overview:

DBSCAN is a density-based clustering algorithm that identifies clusters based on regions of high
density and can detect outliers.

How it Works:

1. Define two parameters:


o ε (epsilon): The radius of a neighborhood.
o MinPts: Minimum number of points required to form a dense region.
2. Classify points:
o Core points: Have at least MinPts neighbors within ε.
o Border points: Lie within ε of a core point but have fewer than MinPts neighbors.
o Noise points: Neither core nor border points.
3. Form clusters by connecting core points and their neighborhoods.

Advantages:

• Detects clusters of arbitrary shape.


• Identifies noise and outliers.
• No need to predefine the number of clusters.

Limitations:

• Sensitive to the choice of ε and MinPts.


• Struggles with varying density clusters.
Use Cases:

• Anomaly detection in financial transactions.


• Geospatial data analysis.
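
A minimal sketch with made-up points, showing how eps and min_samples correspond to ε and MinPts, and how noise points are labeled -1:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one far-away point that should be flagged as noise.
X = np.array([[1, 1], [1.2, 0.9], [0.8, 1.1],
              [8, 8], [8.1, 7.9], [7.9, 8.2],
              [50, 50]])

db = DBSCAN(eps=1.0, min_samples=2)  # eps = ε, min_samples = MinPts
labels = db.fit_predict(X)

# Label -1 marks noise points; other labels are cluster ids.
print(labels)
```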

4. Principal Component Analysis (PCA)

Overview:

PCA is a dimensionality reduction technique that projects data onto a lower-dimensional space
while retaining as much variance as possible.

How it Works:

1. Standardize the data.


2. Compute the covariance matrix to identify relationships between features.
3. Calculate eigenvalues and eigenvectors of the covariance matrix.
4. Select the top 𝑘 eigenvectors (principal components) based on eigenvalues.
5. Transform the data onto these 𝑘 components.

Advantages:

• Reduces dimensionality and computational complexity.


• Removes multicollinearity between features.
• Enhances interpretability by reducing noise.

Limitations:

• Loses interpretability of original features.


• Assumes linear relationships.

Use Cases:

• Visualizing high-dimensional data.


• Preprocessing for machine learning pipelines.
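
A brief sketch on the built-in Iris dataset following the steps above (standardize, then project onto the top components):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize first so each feature contributes equally to the covariance matrix.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)  # project onto the top 2 principal components

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Reduced shape:", X_2d.shape)
```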
Summary Table

Algorithm | Type | Key Strength | Use Case
K-Means | Clustering | Simple and scalable | Customer segmentation
Hierarchical Clustering | Clustering | Insightful hierarchical relationships | Gene expression analysis
DBSCAN | Density-based Clustering | Detects outliers and arbitrary shapes | Anomaly detection
PCA | Dimensionality Reduction | Simplifies high-dimensional data | Data visualization

Ensemble Techniques

Ensemble techniques combine predictions from multiple models to improve performance and
generalization compared to individual models. The idea is to leverage the strengths of multiple models
while mitigating their weaknesses.

Types of Ensemble Techniques


1. Bagging (Bootstrap Aggregating)
• Concept: Creates multiple subsets of the original dataset by sampling with replacement. A
model is trained on each subset, and their outputs are aggregated (e.g., averaging for
regression, voting for classification).
• Goal: Reduces variance and prevents overfitting by averaging out noise in predictions.

Example Algorithms:

• Random Forest:
o Builds multiple decision trees on bootstrapped datasets.
o Aggregates their predictions (majority vote or average).

Advantages:

• Reduces overfitting in high-variance models.


• Improves stability and accuracy.
Disadvantages:

• Can be computationally expensive with large datasets.
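
A minimal sketch with scikit-learn's BaggingClassifier, whose default base estimator is a decision tree, trained on the built-in breast-cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 base learners (decision trees by default), each fit on a bootstrap sample;
# predictions are aggregated by majority vote.
bagging = BaggingClassifier(n_estimators=50, random_state=0)
bagging.fit(X_train, y_train)
print("Test accuracy:", bagging.score(X_test, y_test))
```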

2. Boosting
• Concept: Builds models sequentially, where each new model tries to correct the errors of the
previous ones. Models are weighted based on their performance.
• Goal: Reduces bias and creates a strong model from weak learners.

Example Algorithms:

• AdaBoost:
o Assigns weights to data points; misclassified points get higher weights in the next
iteration.
• Gradient Boosting:
o Minimizes the loss function by training models sequentially to correct errors of the
previous model.
• XGBoost:
o An optimized version of Gradient Boosting, faster and more efficient.

Advantages:

• Improves performance on complex datasets.


• Highly accurate.

Disadvantages:

• Prone to overfitting if not regularized.
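
A short sketch using scikit-learn's GradientBoostingClassifier; the hyperparameter values are illustrative, not tuned:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree is fit to the errors (gradients) left by the previous trees.
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbc.fit(X_train, y_train)
print("Test accuracy:", gbc.score(X_test, y_test))
```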

3. Stacking (Stacked Generalization)


• Concept: Combines predictions from multiple base models using a meta-model (e.g., logistic
regression). Base models are trained independently, and their predictions are used as inputs
for the meta-model.
• Goal: Leverages the strengths of multiple models and combines them optimally.

Example:

• Use a decision tree, SVM, and k-NN as base models.


• Train a logistic regression model on their predictions to produce the final output.
Advantages:

• Very flexible.
• Can achieve high accuracy.

Disadvantages:

• Computationally expensive.
• Complex to implement and tune.
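
A minimal sketch mirroring the example above (tree, SVM, and k-NN base models combined by a logistic-regression meta-model):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Base models are trained independently; the logistic regression meta-model
# learns how to combine their predictions.
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)),
                ("svm", SVC(probability=True)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
print("Test accuracy:", stack.score(X_test, y_test))
```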

4. Voting
• Concept: Combines predictions from multiple models and uses majority voting (classification)
or averaging (regression) to make the final prediction.
• Goal: Simple way to aggregate model outputs for better accuracy.

Types:

• Hard Voting: Majority vote among classifiers.


• Soft Voting: Averages probabilities for each class and chooses the class with the highest
probability.

Advantages:

• Easy to implement.
• Works well with diverse models.

Disadvantages:

• Limited by the performance of individual models.
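
A brief sketch of soft voting with three different scikit-learn classifiers; the model choices are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# voting="soft" averages predicted probabilities; voting="hard" takes a majority vote.
voter = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(max_depth=3)),
                ("knn", KNeighborsClassifier())],
    voting="soft",
)
voter.fit(X_train, y_train)
print("Test accuracy:", voter.score(X_test, y_test))
```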

Performance Metrics
Performance metrics evaluate how well a machine learning model performs on a dataset. The choice of a
metric depends on the type of problem being solved (classification, regression, etc.).

True Positive (TP): The model correctly predicts the positive class.
True Negative (TN): The model correctly predicts the negative class.

False Positive (FP): The model incorrectly predicts the positive class for a negative instance
(Type I Error).

False Negative (FN): The model incorrectly predicts the negative class for a positive instance
(Type II Error).

Classification Problem Statement


Used when the output is categorical (e.g., spam vs. not spam).

1. Accuracy

• Definition: The proportion of correct predictions out of the total predictions.

• When to Use: Works well for balanced datasets.


• Limitation: Misleading for imbalanced datasets (e.g., predicting 99% "no" in a dataset with 1%
"yes").

2. Precision

• Definition: The proportion of true positive predictions out of all positive predictions.

• When to Use: Useful when false positives are costly (e.g., spam detection).

3. Recall (Sensitivity or True Positive Rate)


• Definition: The proportion of true positive predictions out of all actual positives.
• When to Use: Useful when false negatives are costly (e.g., detecting diseases).

4. F1-Score

• Definition: The harmonic mean of precision and recall.

• When to Use: For imbalanced datasets where both precision and recall are important.

5. ROC-AUC (Receiver Operating Characteristic - Area Under the Curve)

• Definition: Measures the tradeoff between true positive rate (TPR) and false positive rate
(FPR) at different thresholds.
• When to Use: Evaluates model performance across all classification thresholds.

6. Confusion Matrix
A tabular representation of predictions:

                | Predicted Positive | Predicted Negative
Actual Positive | True Positive (TP) | False Negative (FN)
Actual Negative | False Positive (FP) | True Negative (TN)
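
A small sketch computing the classification metrics above with scikit-learn, using made-up labels and predicted probabilities:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical true labels and model outputs for a binary classifier.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_score))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```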


Regression Problem Statement
Used when the output is continuous (e.g., predicting house prices).

1. Mean Absolute Error (MAE)

• Definition: The average of absolute differences between actual and predicted values.

• When to Use: Simpler to interpret but penalizes all errors equally.

2. Mean Squared Error (MSE)

• Definition: The average of squared differences between actual and predicted values.

• When to Use: Penalizes larger errors more heavily.

3. Root Mean Squared Error (RMSE)

• Definition: The square root of MSE.

• When to Use: More interpretable than MSE (in same units as target variable).
4. R-Squared (R²)

• Definition: Represents the proportion of variance explained by the model.

• When to Use: Measures how well the model fits the data.
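
A small sketch computing MAE, MSE, RMSE, and R² with scikit-learn on made-up house prices:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual vs. predicted house prices (in thousands).
y_true = np.array([200, 250, 300, 350])
y_pred = np.array([210, 240, 320, 330])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is in the same units as the target
r2 = r2_score(y_true, y_pred)

print(f"MAE={mae:.1f}  MSE={mse:.1f}  RMSE={rmse:.1f}  R²={r2:.3f}")
```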

Cross-Validation

Definition:
Cross-validation is a technique to evaluate the performance of a machine learning model by
splitting the data into multiple subsets for training and testing.

Purpose:

• Reduces overfitting.
• Provides reliable performance metrics.
• Ensures efficient use of the dataset.

Types:

• K-Fold Cross-Validation: Splits the data into 𝑘 folds; each fold is used as a test set once.
• Stratified K-Fold: Maintains class distribution across folds (useful for imbalanced data).
• Leave-One-Out (LOOCV): Each data point is used as a test set once.
• Time Series CV: Ensures training data precedes test data (for temporal data).
• Nested Cross-Validation: Combines inner and outer loops for hyperparameter tuning and
evaluation.

How It Works:

• Split data into subsets (folds).


• Train the model on some folds, test on the remaining fold.
• Repeat for all folds and average results.

Advantages:
• Improves model generalization.
• Reduces bias from specific train-test splits.
• Evaluates model stability.

Disadvantages:

• Computationally expensive.
• May not always be necessary for very large datasets.
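
A minimal sketch of stratified 5-fold cross-validation with scikit-learn; the estimator choice is illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 5 stratified folds: each fold serves as the test set exactly once,
# while class proportions are preserved in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)

print("Fold accuracies:", scores)
print("Mean accuracy  :", scores.mean())
```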

Hyperparameter Tuning

Definition:
Hyperparameter tuning is the process of optimizing the hyperparameters of a machine learning
model to improve its performance. Hyperparameters are settings that cannot be learned directly
from the data (e.g., learning rate, number of trees in a random forest).

Why Use It?

• To enhance model performance by finding the best combination of hyperparameters.


• To prevent overfitting or underfitting by balancing model complexity.
• To customize the model to the specific dataset.

How It Works:

• Grid Search: Exhaustively tries all combinations of hyperparameter values.


• Random Search: Randomly samples combinations of hyperparameters.
• Bayesian Optimization: Uses probabilistic models to find the best hyperparameters more
efficiently.
• Automated Tuning: Tools like Optuna or Hyperopt automate the search process.

Examples of Hyperparameters:

• Random Forest: Number of trees, max depth.


• Gradient Boosting: Learning rate, number of estimators.
• Neural Networks: Learning rate, batch size, number of layers.

Tools for Tuning:

• GridSearchCV and RandomizedSearchCV in scikit-learn.


• Specialized libraries like Optuna, Hyperopt, and Ray Tune.
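
A brief GridSearchCV sketch; the parameter grid below is an illustrative example, not a recommended search space:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Grid search tries every combination of these hyperparameters with 5-fold CV.
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV score  :", search.best_score_)
```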
Model Fit
• Definition: The process where a machine learning model learns patterns from the training data.
• How: Uses the fit() method in most libraries (e.g., scikit-learn).
• Why Use It:
o To train the model by estimating parameters (e.g., coefficients in linear regression).
o Establishes a relationship between input features and the target variable.

Model Transform
• Definition: Applies the learned transformation or prediction logic to the data.
• How: Uses the transform() method for transformation tasks (e.g., scaling data) or predict() for
predictions.
• Why Use It:
o To process data using the trained model.
o For prediction or feature transformations.

Model Fit-Transform
• Definition: Combines fit and transform in one step.
• How: Commonly used with preprocessing tools like StandardScaler or PCA.
• Why Use It:

o Fits the transformation logic and immediately applies it to the data.

Model Predict
• Definition: Predicts outcomes based on the trained model.
• How: Uses the predict() method.
• Why Use It:
o To generate predictions on unseen (test) data.
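
A short sketch tying the four calls above together in a typical scikit-learn workflow:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit (learn mean/std) + transform in one step
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics only

model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)              # learn model parameters from training data
predictions = model.predict(X_test_scaled)      # predict on unseen data
print(predictions[:10])
```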
One Hot Encoder
One Hot Encoder is a technique used in machine learning to convert categorical data into a
numerical format suitable for algorithms. It creates a binary column for each category in a
categorical variable and assigns a 1 or 0 to indicate the presence of a category in a given record.

Why Use One Hot Encoding?

1. Machine Learning Algorithms Need Numerical Input: Algorithms work better with
numerical data rather than categorical strings.
2. Avoid Ordinal Misinterpretation: Unlike label encoding, one-hot encoding does not imply
any order or hierarchy among the categories.
3. Ensures Better Performance: It allows algorithms to understand categorical distinctions
effectively without introducing bias.

Example:

For a categorical feature like Color = ["Red", "Blue", "Green"]:

• One hot encoding would create:


o Red: [1, 0, 0]
o Blue: [0, 1, 0]
o Green: [0, 0, 1]
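
A minimal sketch with scikit-learn's OneHotEncoder. Note two assumptions: the sparse_output argument requires scikit-learn 1.2 or newer (older versions use sparse), and the output columns follow sorted category order (Blue, Green, Red) rather than the order listed above:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

colors = np.array([["Red"], ["Blue"], ["Green"], ["Blue"]])

# sparse_output=False returns a dense array; each category becomes one binary column.
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(colors)

print(encoder.categories_)  # categories are sorted: ['Blue', 'Green', 'Red']
print(encoded)
```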
