AI - W6L12
WEEK 06
LECTURE 12
TOPICS TO COVER IN THIS LECTURE
• Hypothesis class
• Feature extraction + learning
• Non-linear/flexible functions
• Piecewise constant functions
• Basics of neural nets
• Feature learning via neural networks
INTRODUCTION TO HYPOTHESIS CLASS
A hypothesis class refers to the set of all possible functions that can be used to model a relationship
between input features and output labels.
Importance: Understanding the hypothesis class helps in selecting appropriate models and assessing
their performance. It also impacts generalization abilities—how well the model performs on unseen
data.
Key Concepts:
o Bias-Variance Trade-off: Balancing error from overly simplistic assumptions (bias) against sensitivity to the particular training sample (variance) to prevent underfitting or overfitting.
o Overfitting vs. Underfitting: Overfitting occurs when a model learns noise; underfitting
happens when it’s too simple to capture underlying patterns.
INTRODUCTION TO HYPOTHESIS CLASS
The key notion is that of a hypothesis class: the set of all possible predictors that you can get by varying the weight vector w. Thus, the feature extractor specifies a hypothesis class F. This lets us take data and learning out of the picture and ask what predictors are expressible at all.
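To make this concrete, here is a minimal Python sketch; the quadratic feature map and the helper names phi and make_predictor are illustrative assumptions, not the lecture's. Fixing the feature extractor defines F, and each choice of w picks out one predictor f_w:

```python
import numpy as np

def phi(x):
    """Hypothetical feature extractor: raw input -> feature vector."""
    return np.array([1.0, x, x ** 2])  # e.g., a quadratic feature map

def make_predictor(w):
    """Fixing phi defines F; each weight vector w picks one predictor f_w in F."""
    return lambda x: np.sign(w @ phi(x))

f1 = make_predictor(np.array([0.5, -1.0, 0.2]))
f2 = make_predictor(np.array([-0.3, 2.0, 0.0]))
print(f1(1.5), f2(1.5))  # two members of the same hypothesis class F
```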
FEATURE EXTRACTION
Feature extraction is the process of transforming raw data into a structured format that can be used
for machine learning models.
Importance: Good feature extraction enhances model performance, reduces noise, and simplifies
the computational process.
Example: In image recognition, converting RGB pixel values into a set of features like edges or
shapes that are more representative of the objects in the image.
FEATURE EXTRACTION
• First, we perform feature extraction (given domain knowledge) to specify a hypothesis class F. Second, we
perform learning (given training data) to obtain a particular predictor fw belonging to F.
• Note that if the hypothesis class doesn't contain any good predictors, then no amount of learning can help. So the real question when extracting features is whether they are expressive enough to represent good predictors. It's okay and expected that F will contain a bunch of bad ones as well.
In the context of email classification, feature extraction involves identifying and quantifying relevant information from emails to create a feature vector.
Steps in Feature Extraction for Email Classification:
1. Text Processing:
   1. Tokenization: Split the email text into words or tokens.
   2. Lowercasing: Convert all text to lowercase to ensure uniformity.
   3. Removing Stop Words: Filter out common words (e.g., "the", "and") that carry little discriminative signal.
2. Feature Selection: Choose relevant features that may indicate whether an email is spam or not (a code sketch follows this list):
•Count Features:
• Number of occurrences of specific keywords (e.g.,
"free," "offer").
• Number of links in the email.
• Number of exclamation marks.
•Length Features:
• Total length of the email in characters.
•Binary Features:
• Presence of certain keywords (e.g., "buy now," "limited
time") as a binary feature (1 if present, 0 if not).
•Metadata Features:
• Sender's domain (e.g., is it a known spam domain?).
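The sketch below turns the steps above into code. The keyword lists, the known-spam-domain set, and the helper name extract_features are all illustrative assumptions; a real pipeline would use a proper tokenizer and curated lists:

```python
import re

SPAM_KEYWORDS = ["free", "offer"]            # illustrative keyword list
BINARY_PHRASES = ["buy now", "limited time"] # illustrative phrase list
SPAM_DOMAINS = {"spam.example"}              # hypothetical known-spam domains

def extract_features(email_text, sender_domain):
    """Map one raw email to the count/length/binary/metadata features above."""
    text = email_text.lower()                  # lowercasing
    tokens = re.findall(r"[a-z']+", text)      # crude tokenization
    features = {}
    for kw in SPAM_KEYWORDS:                   # count features
        features["count_" + kw] = tokens.count(kw)
    features["num_links"] = text.count("http")
    features["num_exclamations"] = email_text.count("!")
    features["length_chars"] = len(email_text) # length feature
    for phrase in BINARY_PHRASES:              # binary features
        features["has_" + phrase.replace(" ", "_")] = int(phrase in text)
    features["known_spam_domain"] = int(sender_domain in SPAM_DOMAINS)
    return features

print(extract_features("FREE offer!!! Buy now: http://x.example", "spam.example"))
```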
Feature extraction transforms raw email data into a
structured numerical format, while the hypothesis class
defines the set of functions the learning algorithm uses to
make classifications based on those features.
TECHNIQUES OF FEATURE EXTRACTION
Statistical Methods:
o Linear Discriminant Analysis (LDA): A technique used to find a linear combination of features
that best separate two or more classes.
Domain-Specific Methods:
o Image: Techniques like SIFT (Scale-Invariant Feature Transform) to detect and describe local
features in images.
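As a concrete instance of the LDA bullet above, the following sketch uses scikit-learn (an assumed tooling choice) to project the 4-dimensional Iris features onto the 2 linear combinations that best separate the three classes:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# LDA as a feature-extraction step: 4-D inputs -> 2 discriminant features.
X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (150, 2): each sample now has 2 discriminant features
```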
LEARNING AND HYPOTHESIS CLASSES
Learning Process: The learning algorithm iteratively adjusts parameters to minimize the difference between predicted outputs and the true labels in the training data.
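As an illustrative sketch of this iterative adjustment, assuming a linear predictor and squared loss (not necessarily the lecture's setup), gradient descent looks like:

```python
import numpy as np

# Gradient descent on mean squared error for a linear predictor f_w(x) = w . x.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([5.0, 4.0, 9.0])
w = np.zeros(2)
lr = 0.01
for _ in range(1000):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    w -= lr * grad                          # adjust parameters iteratively
print(w)  # approaches the weights that minimize prediction error (here [1, 2])
```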
NON-LINEAR/FLEXIBLE FUNCTIONS
Non-linear functions can model complex relationships and are essential for capturing the intricacies
of real-world data.
Non-linear functions are mathematical functions that do not produce a straight line
when graphed.
In contrast to linear functions, which have a constant rate of change, non-linear
functions can change their slope and curvature, making them capable of modeling
complex relationships.
Examples:
o Radial Basis Functions: Used in kernel methods to transform data into higher dimensions.
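For instance, a Gaussian RBF measures a similarity that decays with distance; a short sketch (gamma is an assumed bandwidth parameter):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF: k(x, z) = exp(-gamma * ||x - z||^2)."""
    x, z = np.asarray(x), np.asarray(z)
    return np.exp(-gamma * np.sum((x - z) ** 2))

print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))  # similarity decays with distance
```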
Common Non-linear Models:
o Neural Networks: Neural networks have been around for many decades, but they fell out of favor because they were difficult to train. In the last decade there has been a huge resurgence of interest in neural networks, since they perform so well and training is far less of an issue when you have abundant data and compute.
Structure:
o Neurons: Basic units that receive inputs, apply a weight, and produce an output
through an activation function.
o Layers: Composed of input, hidden, and output layers where transformations
occur.
Activation Functions: Crucial for introducing non-linearity.
The Iris dataset is a classic dataset in machine learning, commonly used for
classification tasks. It contains measurements of iris flowers from three
different species, with four features for each flower. Here’s how neurons
would represent values for the Iris dataset:
Features of the Iris Dataset
The dataset has the following four features (inputs) for each sample:
1. Sepal Length: The length of the sepal in centimeters.
2. Sepal Width: The width of the sepal in centimeters.
3. Petal Length: The length of the petal in centimeters.
4. Petal Width: The width of the petal in centimeters.
• Example of Classification
• For a multi-class classification using a neural network on the Iris dataset:
• Input Layer: Four input neurons, one for each feature.
• Hidden Layers: Several hidden layers with a certain number of neurons,
applying weights and activation functions.
• Output Layer: Three output neurons, each representing one of the iris species.
The output from these neurons would be the probabilities of the input sample
belonging to each species, computed using the softmax activation function.
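A minimal sketch of this architecture, assuming scikit-learn's MLPClassifier and an arbitrary (8, 8) hidden-layer choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 4 input features -> hidden layers -> 3 output classes (softmax over species).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(8, 8), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("class probabilities:", clf.predict_proba(X_test[:1]))  # sums to 1
```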
BASICS OF NEURAL NETWORKS
MATHEMATICAL FOUNDATIONS OF NEURAL NETWORKS
Feedforward Process:
o Each neuron in a layer computes a weighted sum of inputs and passes it through an activation function: z = w^T x + b, a = σ(z).
Activation Function: introduces non-linearity; common choices are the sigmoid σ(z) = 1 / (1 + e^(-z)), tanh(z), and ReLU(z) = max(0, z).
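A bare-bones NumPy sketch of the feedforward pass above, assuming sigmoid activations and randomly initialized weights (the 4-5-3 layer sizes are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, layers):
    """Each layer computes a = sigmoid(W @ a_prev + b), as in the formula above."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(5, 4)), np.zeros(5)),  # input(4) -> hidden(5)
          (rng.normal(size=(3, 5)), np.zeros(3))]  # hidden(5) -> output(3)
print(feedforward(np.array([5.1, 3.5, 1.4, 0.2]), layers))
```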
LOSS FUNCTION IN NEURAL NETWORKS
A loss function quantifies the difference between the predicted output and the true output,
guiding the optimization process.
Common Loss Functions:
o Cross-Entropy Loss: L = -Σ_i y_i log(ŷ_i), where y is the one-hot true label vector and ŷ the vector of predicted class probabilities.
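A direct NumPy transcription of the formula above (the eps term is a standard numerical-stability guard, added here by assumption):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i y_i * log(yhat_i); eps guards against log(0)."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0, 1, 0])          # one-hot: true class is index 1
y_pred = np.array([0.1, 0.8, 0.1])    # predicted class probabilities
print(cross_entropy(y_true, y_pred))  # low loss: confident and correct
```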
Example: In CNNs (Convolutional Neural Networks), layers learn to detect edges, shapes, and textures, building
increasingly complex features.
CONVOLUTIONAL NEURAL NETWORKS (CNNS)
CNNs are designed specifically for processing grid-like data, such as images, utilizing local patterns.
Key Components:
o Pooling Layers: Downsample feature maps, reducing dimensionality while retaining important information.
Mathematical Representation: a feature map entry is computed by sliding a kernel K over the input I: S(i, j) = Σ_m Σ_n I(i + m, j + n) K(m, n).
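A NumPy sketch of both operations, assuming a "valid" convolution (in the cross-correlation form standard for CNNs) and 2×2 max pooling; the toy image and kernel are made up:

```python
import numpy as np

def conv2d(I, K):
    """Valid 2-D cross-correlation: S(i, j) = sum_m sum_n I(i+m, j+n) * K(m, n)."""
    h, w = I.shape[0] - K.shape[0] + 1, I.shape[1] - K.shape[1] + 1
    S = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            S[i, j] = np.sum(I[i:i + K.shape[0], j:j + K.shape[1]] * K)
    return S

def max_pool(S, size=2):
    """Downsample a feature map by taking the max over size x size blocks."""
    h, w = S.shape[0] // size, S.shape[1] // size
    return S[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

I = np.arange(36, dtype=float).reshape(6, 6)  # toy "image"
K = np.array([[1.0, 0.0], [0.0, -1.0]])       # toy edge-like kernel
print(max_pool(conv2d(I, K)))
```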
RECURRENT NEURAL NETWORKS (RNNS)
Definition: RNNs are designed for sequential data, where the order of inputs matters, making them
suitable for tasks like language modeling.
Applications: Used in natural language processing, speech recognition, and time-series forecasting.
Mathematical Representation: the hidden state is updated at each time step: h_t = tanh(W_h h_(t-1) + W_x x_t + b).
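A sketch of the update above for a vanilla RNN, with randomly initialized weights and a made-up 5-step sequence of 2-dimensional inputs:

```python
import numpy as np

def rnn_forward(xs, W_h, W_x, b):
    """Vanilla RNN: h_t = tanh(W_h @ h_{t-1} + W_x @ x_t + b) over a sequence."""
    h = np.zeros(W_h.shape[0])
    for x_t in xs:
        h = np.tanh(W_h @ h + W_x @ x_t + b)
    return h  # final hidden state summarizes the sequence

rng = np.random.default_rng(0)
W_h, W_x, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 2)), np.zeros(4)
sequence = [rng.normal(size=2) for _ in range(5)]  # 5 time steps, 2-dim inputs
print(rnn_forward(sequence, W_h, W_x, b))
```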
ADVANTAGES OF NEURAL NETWORKS
Flexibility: Neural networks can model complex, non-linear relationships that traditional models
might miss.
Scalability: They are capable of handling large datasets and can be trained on distributed systems.
Performance: Many state-of-the-art results in various domains (computer vision, NLP) are achieved
using neural networks.
CHALLENGES IN NEURAL NETWORKS
Overfitting: When a model learns noise and details to the extent that it negatively impacts
performance on new data.
Training Time: Deep networks require substantial computational resources and time for training.
Hyperparameter Tuning: Finding the right combination of learning rates, batch sizes, and network
architectures can be complex and time-consuming.
FUTURE DIRECTIONS IN FEATURE LEARNING AND NEURAL NETWORKS
Transfer Learning: Reusing pre-trained models on similar tasks, significantly reducing training
time and resource requirements.
Explainable AI: Developing methods to understand and interpret how models make decisions,
which is crucial for trust and transparency in AI applications.
QUESTIONS?