
UNIT-1

#Algorithms and Machine Learning


Supervised Learning
1. Regression
o Linear Regression
 Concept: Models the relationship between dependent and
independent variables with a straight line.
 Key Equation: y=mx+b
 Loss Function: Mean Squared Error (MSE).
 Practice: Predict house prices using features like size, location.
o Logistic Regression
 Concept: Used for binary classification problems.

 Key Equation: Uses the sigmoid function P(y=1|x) = 1 / (1 + e^(-z)), where z = w^T x + b.
 Practice: Spam email classification.
2. Classification
o k-Nearest Neighbors (k-NN)
 Concept: Classifies a data point based on its nearest neighbors.
 Key Idea: Calculate distances (e.g., Euclidean), vote for the majority
class.
 Practice: Handwritten digit recognition.
o Support Vector Machines (SVM)
 Concept: Finds a hyperplane that best separates the classes.
 Key Idea: Maximizes the margin between classes.
 Kernel Trick: Used for non-linearly separable data.
o Decision Trees
 Concept: Splits data based on feature thresholds to form a tree.
 Key Idea: Measures purity using Gini Index or Entropy.
 Practice: Predict loan approvals.
Unsupervised Learning
1. Clustering
o k-Means
 Concept: Groups data into k clusters based on distance from centroids.
 Key Steps: Initialize centroids → Assign points → Update centroids →
Repeat.
 Practice: Customer segmentation.
o Hierarchical Clustering
 Concept: Builds a tree of clusters by iteratively merging/splitting.
 Types: Agglomerative (bottom-up), Divisive (top-down).
 Practice: Gene expression analysis.
2. Dimensionality Reduction
o PCA (Principal Component Analysis)
 Concept: Reduces high-dimensional data by projecting it onto a few
principal components.
 Mathematics: Eigenvalues/Eigenvectors.
 Practice: Visualize high-dimensional datasets.

Reinforcement Learning
 Concept: Learns optimal actions through trial and error by maximizing rewards.
 Key Terms: Agent, Environment, Policy, Reward.
 Algorithms: Q-Learning, Deep Q-Learning.
 Practice: Build a simple game-playing agent (e.g., Tic-Tac-Toe).
Machine learning algorithms are computational models that allow computers to understand
patterns and forecast or make judgments based on data without explicit programming.
These algorithms form the foundation of modern artificial intelligence and are used in
various applications, including image and speech recognition, natural language processing,
recommendation systems, fraud detection, autonomous cars, etc.
1. Supervised Learning
Supervised learning involves training a model on labeled data, where the desired output is
known. The model learns to map inputs to outputs based on the provided examples.
A. Classification
1. Logistic Regression
 Description: Logistic regression models the probability of a binary outcome using a
logistic function. It outputs probabilities and classifies instances by setting a
threshold (usually 0.5).
 Key Points:
o Simple and easy to implement.
o Assumes linear relationship between the input features and the log-odds of
the outcome.
o Works well for binary classification problems.
 Applications: Email spam detection, disease diagnosis, credit scoring.
2. Support Vector Machines (SVM)
 Description: SVMs find the hyperplane that best separates different classes by
maximizing the margin between them.
 Key Points:
o Effective in high-dimensional spaces.
o Works well for both linear and non-linear classification using kernel trick.
o Sensitive to the choice of kernel and regularization parameter.
 Applications: Image classification, text categorization, bioinformatics.
3. k-Nearest Neighbors (k-NN)
 Description: k-NN classifies instances based on the majority class among the k-
nearest neighbors in the feature space.
 Key Points:
o Simple and intuitive.
o No explicit training phase, making it a lazy learner.
o Sensitive to the choice of k and the distance metric.
 Applications: Recommender systems, pattern recognition, anomaly detection.
4. Naive Bayes
 Description: Naive Bayes uses Bayes’ theorem with the assumption of feature
independence to classify instances.
 Key Points:
o Fast and efficient.
o Performs well with high-dimensional data.
o Assumption of feature independence might not hold in all cases.
 Applications: Text classification, sentiment analysis, spam filtering.
5. Decision Trees
 Description: Decision trees split data into subsets based on the value of input
features, creating a tree-like model of decisions.
 Key Points:
o Easy to interpret and visualize.
o Can handle both numerical and categorical data.
o Prone to overfitting without proper pruning.
 Applications: Risk assessment, fraud detection, customer segmentation.
6. Random Forest
 Description: Random forest is an ensemble of decision trees that improves accuracy
and controls overfitting by averaging multiple trees trained on different subsets of
data.
 Key Points:
o Reduces overfitting compared to individual decision trees.
o Handles large datasets with higher dimensionality.
o Requires more computational resources.
 Applications: Financial forecasting, image classification, healthcare diagnostics.
7. Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost)
 Description: Gradient boosting builds models sequentially to correct errors made by
previous models, optimizing for accuracy.
 Key Points:
o Highly accurate and efficient.
o Can handle different types of data.
o Prone to overfitting if not properly tuned.
 Applications: Web search ranking, customer churn prediction, insurance risk
prediction.
8. Neural Networks (e.g., Multilayer Perceptron)
 Description: Neural networks use layers of interconnected nodes to model complex
patterns in data.
 Key Points:
o Capable of learning non-linear relationships.
o Requires large amounts of data and computational power.
o Can be prone to overfitting.
 Applications: Image recognition, speech recognition, natural language processing.
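A minimal sketch of the classification workflow above, using logistic regression as the example model. This assumes scikit-learn is available; the synthetic dataset and hyperparameters are illustrative choices, not part of the original notes.

```python
# Sketch: binary classification with logistic regression (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a real dataset (e.g., spam features).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)   # models P(y=1|x) with the sigmoid
clf.fit(X_train, y_train)                 # learns weights w and bias b
pred = clf.predict(X_test)                # thresholds probabilities at 0.5
print("accuracy:", accuracy_score(y_test, pred))
```

The same fit/predict interface applies to the other classifiers above: swapping in SVC(kernel="rbf") or KNeighborsClassifier(n_neighbors=5) exercises the SVM and k-NN entries with minimal code changes.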
B. Regression
1. Linear Regression
 Description: Linear regression models the relationship between dependent and
independent variables using a linear approach.
 Key Points:
o Simple and easy to implement.
o Assumes a linear relationship between the variables.
o Sensitive to outliers.
 Applications: House price prediction, sales forecasting, risk management.
2. Ridge Regression
 Description: Ridge regression adds L2 regularization to linear regression to handle
multicollinearity and prevent overfitting.
 Key Points:
o Shrinks coefficients to reduce overfitting.
o Handles multicollinearity well.
o Requires tuning of the regularization parameter.
 Applications: Economic forecasting, portfolio optimization, marketing analysis.
3. Lasso Regression
 Description: Lasso regression adds L1 regularization to linear regression to perform
feature selection by shrinking some coefficients to zero.
 Key Points:
o Performs feature selection.
o Can produce sparse models.
o Requires tuning of the regularization parameter.
 Applications: Gene selection, model selection, finance.
4. Support Vector Regression (SVR)
 Description: SVR uses support vector machines for regression tasks by finding a
function that deviates from the actual target values by a value no greater than a
specified margin.
 Key Points:
o Effective in high-dimensional spaces.
o Robust to outliers.
o Sensitive to the choice of kernel and regularization parameter.
 Applications: Time series prediction, stock price forecasting, real estate valuation.
5. Decision Trees Regression
 Description: Decision trees regression splits data into subsets to predict continuous
values.
 Key Points:
o Easy to interpret and visualize.
o Can handle both numerical and categorical data.
o Prone to overfitting without proper pruning.
 Applications: Business forecasting, medical diagnosis, engineering.
6. Random Forest Regression
 Description: Random forest regression is an ensemble of decision trees for
regression tasks, averaging the predictions to improve accuracy and control
overfitting.
 Key Points:
o Reduces overfitting compared to individual decision trees.
o Handles large datasets with higher dimensionality.
o Requires more computational resources.
 Applications: Environmental modeling, energy demand forecasting, market analysis.
7. Gradient Boosting Regression
 Description: Gradient boosting regression sequentially builds models to improve
predictions by correcting errors made by previous models.
 Key Points:
o Highly accurate and efficient.
o Can handle different types of data.
o Prone to overfitting if not properly tuned.
 Applications: Housing price prediction, customer lifetime value prediction, demand
forecasting.
8. Neural Networks Regression
 Description: Neural networks for regression use layers of interconnected nodes to
predict continuous values.
 Key Points:
o Capable of learning non-linear relationships.
o Requires large amounts of data and computational power.
o Can be prone to overfitting.
 Applications: Energy consumption forecasting, algorithmic trading, weather
prediction.
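A minimal sketch contrasting plain linear regression with the ridge (L2) and lasso (L1) variants above. It assumes scikit-learn and NumPy; the synthetic data and alpha values are illustrative.

```python
# Sketch: linear vs. ridge (L2) vs. lasso (L1) regression (scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 1.5])   # mostly zero: sparse signal
y = X @ true_w + rng.normal(scale=0.5, size=200)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    # Lasso drives irrelevant coefficients to exactly zero (feature selection);
    # Ridge only shrinks them toward zero.
    print(type(model).__name__, np.round(model.coef_, 2))
```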

2. Unsupervised Learning
Unsupervised learning works with unlabeled data and aims to find hidden patterns or
intrinsic structures in the input data.
A. Clustering
1. k-Means
 Description: k-Means partitions data into k clusters based on feature similarity,
minimizing the sum of squared distances from each point to the centroid of its
assigned cluster.
 Key Points:
o Simple and efficient.
o Sensitive to the initial placement of centroids.
o Assumes clusters are spherical.
 Applications: Customer segmentation, market research, image compression.
2. Hierarchical Clustering
 Description: Hierarchical clustering builds a hierarchy of clusters using either a
bottom-up (agglomerative) or top-down (divisive) approach.
 Key Points:
o Does not require a predefined number of clusters.
o Produces a dendrogram for visualizing the hierarchy.
o Computationally intensive for large datasets.
 Applications: Social network analysis, gene sequence analysis, document clustering.
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
 Description: DBSCAN groups together points that are close to each other based on
distance and density, and identifies outliers as points that lie alone in low-density
regions.
 Key Points:
o Can find arbitrarily shaped clusters.
o Robust to noise and outliers.
o Requires tuning of the density parameters.
 Applications: Geographic data analysis, fraud detection, biology.
4. Gaussian Mixture Models (GMM)
 Description: GMM assumes data is generated from a mixture of several Gaussian
distributions, each representing a cluster.
 Key Points:
o Can model clusters with different shapes and sizes.
o Uses probabilistic soft assignments of points to clusters.
o Sensitive to initialization and can converge to local optima.
 Applications: Image segmentation, anomaly detection, finance.
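A minimal sketch of two of the clustering algorithms above: k-Means (which needs k chosen in advance) and DBSCAN (which needs density parameters instead). It assumes scikit-learn; the blob data and parameter values are illustrative.

```python
# Sketch: k-Means and DBSCAN on synthetic blobs (scikit-learn).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=7)

km = KMeans(n_clusters=3, n_init=10, random_state=7)   # k chosen up front
labels = km.fit_predict(X)                             # assign/update loop runs internally
print("centroids:\n", km.cluster_centers_)

db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)   # label -1 marks noise
print("DBSCAN clusters found:", len(set(db_labels) - {-1}))
```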
B. Dimensionality Reduction
1. Principal Component Analysis (PCA)
 Description: PCA reduces the dimensionality of data by transforming it to a new set
of orthogonal features (principal components) that capture the maximum variance.
 Key Points:
o Reduces complexity of data.
o Helps in visualizing high-dimensional data.
o Assumes linear relationships among features.
 Applications: Data compression, noise reduction, feature extraction.
2. t-Distributed Stochastic Neighbor Embedding (t-SNE)
 Description: t-SNE reduces dimensions for visualization by preserving the local
structure of the data, making similar points stay close together.
 Key Points:
o Effective for visualizing high-dimensional data.
o Computationally intensive.
o Does not preserve global structure well.
 Applications: Visualizing clusters, exploring high-dimensional data, anomaly
detection.
3. Linear Discriminant Analysis (LDA)
 Description: LDA reduces dimensions by maximizing class separability, transforming
data to a space that best discriminates between classes.
 Key Points:
o Maximizes class separability.
o Assumes normally distributed classes with identical covariances.
o Useful for supervised dimensionality reduction.
 Applications: Pattern recognition, face recognition, bioinformatics.
4. Independent Component Analysis (ICA)
 Description: ICA separates a multivariate signal into additive, independent
components.
 Key Points:
o Assumes statistical independence of components.
o Useful for blind source separation.
o Sensitive to noise.
 Applications: Signal processing, brain imaging, finance.
5. UMAP (Uniform Manifold Approximation and Projection)
 Description: UMAP reduces dimensions while preserving the global structure of the
data, using a manifold learning technique.
 Key Points:
o Preserves both local and global structure.
o Computationally efficient.
o Requires tuning of parameters.
 Applications: Data visualization, clustering, pattern recognition.
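A minimal sketch of the two most common dimensionality-reduction steps above, PCA and t-SNE, reducing 64-dimensional digit images to two components for visualization. It assumes scikit-learn; the dataset choice is illustrative.

```python
# Sketch: PCA and t-SNE for 2-D visualization (scikit-learn).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)           # 64-dimensional digit images

pca = PCA(n_components=2)                     # linear projection onto top-variance axes
X_pca = pca.fit_transform(X)
print("variance explained:", pca.explained_variance_ratio_)

tsne = TSNE(n_components=2, random_state=0)   # nonlinear, preserves local neighborhoods
X_tsne = tsne.fit_transform(X)
print(X_pca.shape, X_tsne.shape)              # both (n_samples, 2)
```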
C. Association
1. Apriori Algorithm
 Description: The Apriori algorithm identifies frequent itemsets in transactional data
and generates association rules.
 Key Points:
o Simple and easy to implement.
o Can handle large datasets.
o Computationally expensive for large itemsets.
 Applications: Market basket analysis, cross-selling strategies, web usage mining.
2. Eclat Algorithm
 Description: The Eclat algorithm uses depth-first search to find frequent itemsets,
improving efficiency by reducing the number of database scans.
 Key Points:
o More efficient than Apriori for large datasets.
o Uses vertical data format.
o Requires sufficient memory for large itemsets.
 Applications: Market basket analysis, bioinformatics, text mining.
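A minimal, self-contained sketch of the Apriori idea above: count frequent single items first, then build candidate pairs only from those frequent items. The toy transactions and support threshold are illustrative assumptions.

```python
# Sketch: Apriori-style frequent 1-itemsets and 2-itemsets (pure Python).
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
min_support = 3   # an itemset must appear in at least 3 of the 5 baskets

# Pass 1: frequent single items.
item_counts = Counter(item for t in transactions for item in t)
frequent_items = {i for i, c in item_counts.items() if c >= min_support}

# Pass 2: candidate pairs built only from frequent items (the Apriori pruning step).
pair_counts = Counter(
    pair for t in transactions
    for pair in combinations(sorted(t & frequent_items), 2))
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent_items, frequent_pairs)
```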

3. Reinforcement Learning
Reinforcement learning involves training agents to make a sequence of decisions by
rewarding them for good actions and penalizing them for bad ones.
A. Model-Free Methods
1. Q-Learning
 Description: Q-Learning learns the value of actions in states to maximize cumulative
reward, updating Q-values based on the Bellman equation.
 Key Points:
o Off-policy learning method.
o Can handle problems with stochastic transitions.
o Convergence can be slow.
 Applications: Robotics, game playing, autonomous vehicles.
2. Deep Q-Network (DQN)
 Description: DQN uses deep learning to approximate Q-values, enabling
reinforcement learning in high-dimensional state spaces.
 Key Points:
o Combines Q-Learning with deep neural networks.
o Handles large state spaces.
o Requires extensive training.
 Applications: Video games, robotics, control systems.
3. SARSA (State-Action-Reward-State-Action)
 Description: SARSA learns the value of the policy being followed by updating Q-
values based on the state-action pairs encountered.
 Key Points:
o On-policy learning method.
o Takes into account the policy’s behavior.
o Sensitive to the choice of policy.
 Applications: Path planning, robotics, autonomous navigation.
4. Policy Gradient Methods (e.g., REINFORCE)
 Description: Policy gradient methods directly learn the policy that maps states to
actions by optimizing the expected reward.
 Key Points:
o Suitable for continuous action spaces.
o Can handle complex policies.
o Prone to high variance in gradient estimates.
 Applications: Robotics, control systems, game playing.
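A minimal sketch of the tabular Q-Learning update above on a toy environment invented here for illustration: a five-state corridor where only reaching the right end pays a reward.

```python
# Sketch: tabular Q-Learning on a toy 5-state corridor (illustrative environment).
import random

N_STATES, ACTIONS = 5, [0, 1]          # action 0 = left, 1 = right; goal is state 4
alpha, gamma, epsilon = 0.1, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Move left/right; reaching state 4 gives reward 1 and ends the episode."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

for _ in range(500):                               # episodes with random start states
    s = random.randrange(N_STATES - 1)
    for _ in range(50):                            # cap on episode length
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Off-policy Q-Learning update: the target uses the best next action.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

print([[round(q, 2) for q in row] for row in Q])   # "move right" values grow toward the goal
```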
B. Actor-Critic and Policy Optimization Methods
1. Deep Deterministic Policy Gradient (DDPG)
 Description: DDPG uses actor-critic methods for continuous action spaces, combining
deep learning with deterministic policy gradients.
 Key Points:
o Handles continuous action spaces.
o Stable learning process.
o Requires tuning of hyperparameters.
 Applications: Robotics, autonomous driving, financial trading.
2. Proximal Policy Optimization (PPO)
 Description: PPO constrains each policy update with a clipped objective function, improving policy stability and performance.
 Key Points:
o Stable and efficient.
o Suitable for complex tasks.
o Requires careful tuning of clipping parameter.
 Applications: Robotics, game playing, simulation-based optimization.
3. Trust Region Policy Optimization (TRPO)
 Description: TRPO ensures stable policy updates by optimizing within a trust region,
preventing large updates that could degrade performance.
 Key Points:
o Stable policy updates.
o Suitable for high-dimensional problems.
o Computationally intensive.
 Applications: Robotics, game playing, resource management.
C. Value-Based Methods
1. Monte Carlo Methods
 Description: Monte Carlo methods estimate value functions based on averaging
sample returns from multiple episodes.
 Key Points:
o Simple and easy to implement.
o Requires complete episodes for updating.
o High variance in estimates.
 Applications: Game playing, inventory management, financial modeling.
2. Temporal Difference (TD) Learning
 Description: TD Learning combines Monte Carlo and dynamic programming ideas,
updating value estimates based on bootstrapped predictions.
 Key Points:
o Does not require complete episodes.
o Balances bias and variance.
o Convergence can be slow.
 Applications: Game playing, robotics, control systems.
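A minimal sketch of the TD(0) update above on the classic five-state random walk (a toy problem chosen here for illustration): each value estimate is bootstrapped from the next state's current estimate, so no complete episode return is needed.

```python
# Sketch: TD(0) value estimation on a 5-state random walk.
import random

N = 5                         # non-terminal states 0..4; exiting right of state 4 pays 1
V = [0.0] * N
alpha, gamma = 0.1, 1.0

for _ in range(2000):
    s = N // 2                                    # start each episode in the middle
    while True:
        s2 = s + random.choice([-1, 1])           # unbiased random walk
        if s2 < 0:                                # left terminal: reward 0
            V[s] += alpha * (0.0 - V[s]); break
        if s2 >= N:                               # right terminal: reward 1
            V[s] += alpha * (1.0 - V[s]); break
        # Bootstrapped TD(0) update: no need to wait for the episode to finish.
        V[s] += alpha * (gamma * V[s2] - V[s])
        s = s2

print([round(v, 2) for v in V])   # roughly [0.17, 0.33, 0.5, 0.67, 0.83]
```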

#Divide&Conquer
Divide and Conquer is a problem-solving technique that solves a problem by dividing it into subproblems, solving those subproblems individually, and then merging their solutions to obtain the solution to the original problem. This section discusses how the Divide and Conquer approach is helpful and how it can be used to solve problems.
Working of Divide and Conquer Algorithm:
Divide and Conquer Algorithm can be divided into three steps: Divide, Conquer and Merge.
1. Divide:
 Break down the original problem into smaller subproblems.
 Each subproblem should represent a part of the overall problem.
 The goal is to divide the problem until no further division is possible.
2. Conquer:
 Solve each of the smaller subproblems individually.
 If a subproblem is small enough (often referred to as the “base case”), we solve it
directly without further recursion.
 The goal is to find solutions for these subproblems independently.
3. Merge:
 Combine the solutions of the subproblems to get the final solution of the whole problem.
 Once the smaller subproblems are solved, we recursively combine their solutions to obtain the solution of the larger problem.
 The goal is to formulate a solution for the original problem by merging the results
from the subproblems.
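A minimal sketch of the three steps above using merge sort, one of the classic divide-and-conquer algorithms listed later in this section.

```python
# Sketch: merge sort as Divide / Conquer / Merge.
def merge_sort(arr):
    if len(arr) <= 1:                 # base case: nothing left to divide
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])      # Divide + Conquer: recurse on each half
    right = merge_sort(arr[mid:])
    return merge(left, right)         # Merge the two solved subproblems

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 7, 3]))   # [1, 2, 3, 5, 7, 9]
```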
Characteristics of Divide and Conquer Algorithm:
Divide and Conquer Algorithm involves breaking down a problem into smaller, more
manageable parts, solving each part individually, and then combining the solutions to solve
the original problem. The characteristics of Divide and Conquer Algorithm are:
 Dividing the Problem: The first step is to break the problem into smaller, more
manageable subproblems. This division can be done recursively until the
subproblems become simple enough to solve directly.
 Independence of Subproblems: Each subproblem should be independent of the
others, meaning that solving one subproblem does not depend on the solution of
another. This allows for parallel processing or concurrent execution of subproblems,
which can lead to efficiency gains.
 Conquering Each Subproblem: Once divided, the subproblems are solved
individually. This may involve applying the same divide and conquer approach
recursively until the subproblems become simple enough to solve directly, or it may
involve applying a different algorithm or technique.
 Combining Solutions: After solving the subproblems, their solutions are combined to
obtain the solution to the original problem. This combination step should be
relatively efficient and straightforward, as the solutions to the subproblems should
be designed to fit together seamlessly.
Applications
QuickSort, MergeSort, Strassen’s Algorithm, Binary Search.
Advantages
Solving difficult problems, improved algorithmic efficiency, efficient memory access (smaller subproblems often fit in cache).
# Randomization
Randomization is a statistical technique that uses chance to assign data samples, models, or
parameters to different groups or conditions. It's a key part of machine learning experiments
and model evaluation, and can help to:
 Improve accuracy
Randomization can help improve the accuracy of machine learning algorithms by making them more robust to outliers and less prone to overfitting.
 Reduce bias
Randomization can help to reduce bias and confounding factors that might affect results,
such as the order of data or the selection of hyperparameters.
 Estimate uncertainty
Randomization can help to estimate the variability and uncertainty of outcomes.
 Perform statistical tests
Randomization can help to perform statistical tests and inference.
Here are some ways that randomization is used in machine learning:
 Splitting data
A common way to use randomization is to split data into training and test sets. This allows
you to train your model on one set of data and evaluate it on another set.
 Resampling
Resampling methods randomize the training and test datasets to help estimate the skill of
the model.
 Repeating experiments
Repeating evaluation experiments helps to estimate the skill of the model with different
random initialization and learning decisions.
 Randomized algorithms
Randomized algorithms, such as LSH and MinHash, can be used to find similar items in large
data sets.
 Randomized models
Randomized models, such as random forests, use random sampling of data and features to reduce variance and overfitting in learning and evaluation.
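A minimal sketch of the first two uses above (random data splitting and repeated resampling), assuming scikit-learn; the dataset and model are illustrative.

```python
# Sketch: randomized train/test split and repeated resampling (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score, ShuffleSplit
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# One random split: hold out 20% of the data for evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("single-split accuracy:", model.score(X_te, y_te))

# Repeated random resampling: estimates the variability of the score.
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("mean accuracy:", scores.mean(), "+/-", scores.std())
```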
Randomization is an algorithmic technique where randomness is introduced into
computations or decision-making to achieve certain goals like simplicity, speed, or
robustness. Randomized algorithms are especially powerful in areas like optimization,
cryptography, and machine learning.

Key Characteristics of Randomized Algorithms


1. Randomness: The algorithm uses random bits or numbers during its execution.
2. Probabilistic Guarantees: Results are often analyzed in terms of expected
performance or probabilistic bounds.
3. Non-determinism: Different runs of the algorithm on the same input may yield different results.

Types of Randomized Algorithms


1. Las Vegas Algorithms
 Definition: Always produce the correct result but may have a random runtime.
 Example:
o Randomized Quick Sort: Randomly selects a pivot for partitioning but
guarantees sorted output.
 Advantages: Reliability of correctness.
 Drawback: Unpredictable runtime.
2. Monte Carlo Algorithms
 Definition: May produce incorrect results with a small probability but typically run
within a fixed time.
 Example:
o Probabilistic primality testing (e.g., Miller-Rabin test).
 Advantages: Faster execution with high probability of correctness.
 Drawback: Possibility of errors.
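A minimal sketch of the Las Vegas example above: randomized quicksort always returns a correctly sorted list, but its running time depends on the random pivot choices.

```python
# Sketch: randomized quicksort (a Las Vegas algorithm).
import random

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)        # random pivot avoids adversarial worst cases
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([7, 3, 9, 3, 1, 8]))   # always correct: [1, 3, 3, 7, 8, 9]
```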

Applications of Randomized Algorithms


1. Sorting and Searching
 Randomized Quick Sort:
o Chooses a pivot randomly to reduce the chance of the worst case, O(n^2).
o Expected Time Complexity: O(n log n).
 Randomized Binary Search:
o Probes a random position within the current range instead of the exact midpoint; the expected number of comparisons is still O(log n).
2. Graph Algorithms
 Randomized Min-Cut:
o Used to find the minimum cut of a graph by randomly contracting edges.

o Time Complexity: O(n^2 log n).
o Correctness: Succeeds with high probability; repeated runs improve the success probability.
 PageRank:
o Google's algorithm uses random walks on graphs to rank web pages.
3. Optimization
 Simulated Annealing:
o Mimics the physical process of annealing to find near-optimal solutions in
complex spaces.
o Uses randomness to escape local minima.
 Randomized Gradient Descent:
o Picks random directions or subsets of data points to speed up convergence in
optimization.
4. Machine Learning
 Random Forests:
o Builds an ensemble of decision trees using random sampling and feature
selection.
 Stochastic Gradient Descent (SGD):
o Optimizes models by randomly sampling subsets of data (minibatches).
5. Cryptography
 Encryption schemes such as RSA with randomized padding (e.g., OAEP) rely on randomness for security.
 Random number generation forms the basis of secure keys and protocols.
6. Numerical and Geometric Algorithms
 Monte Carlo Integration:
o Approximates integrals by sampling random points and averaging.
 Randomized Convex Hulls:
o Speeds up computation of geometric properties in high dimensions.
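A minimal sketch of Monte Carlo integration from item 6 above, estimating the integral of f(x) = x^2 over [0, 1] (true value 1/3) by averaging the integrand at uniformly random sample points.

```python
# Sketch: Monte Carlo integration of f(x) = x^2 over [0, 1].
import random

def monte_carlo_integral(f, n_samples=100_000):
    # The average of f at uniform random points approximates the integral over [0, 1].
    return sum(f(random.random()) for _ in range(n_samples)) / n_samples

print(monte_carlo_integral(lambda x: x * x))   # close to 1/3; error shrinks ~ 1/sqrt(n_samples)
```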

Advantages of Randomization
1. Simplicity: Randomized algorithms often avoid complex deterministic rules.
2. Efficiency: May outperform deterministic counterparts in expected runtime.
3. Robustness: Effective against adversarial inputs in problems like sorting.
4. Scalability: Random sampling can handle large datasets efficiently.

Challenges with Randomized Algorithms


1. Non-Determinism: Different outputs for the same input may complicate debugging.
2. Performance Variability: Runtime or accuracy may vary between runs.
3. Need for Good Randomness: Poor random number generators can degrade
performance or security.
