AI - W6L12


ARTIFICIAL INTELLIGENCE

WEEK 06
LECTURE 12
TOPICS TO COVER IN THIS LECTURE

• Hypothesis class
• Feature extraction + learning
• Non-linear/flexible functions
• Piecewise constant functions
• Basics of neural nets
• Feature learning via neural networks
INTRODUCTION TO HYPOTHESIS CLASS

 A hypothesis class refers to the set of all possible functions that can be used to model a relationship
between input features and output labels.

 Importance: Understanding the hypothesis class helps in selecting appropriate models and assessing
their performance. It also impacts generalization abilities—how well the model performs on unseen
data.

 Key Concepts:

o Bias-Variance Trade-off: Balancing the error from overly simple assumptions (bias) against
sensitivity to the training data (variance) to prevent overfitting or underfitting.

o Overfitting vs. Underfitting: Overfitting occurs when a model learns noise; underfitting
happens when it’s too simple to capture underlying patterns.
INTRODUCTION TO HYPOTHESIS CLASS

 The key notion is that of a hypothesis class: the set of all possible predictors you can
get by varying the weight vector w. Thus, the feature extractor specifies a hypothesis class F. This
allows us to reason about what the model could express before data and learning enter the picture.
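To make the notion concrete, here is a minimal Python sketch; the toy feature map and weight vectors are invented for illustration and are not from the lecture. Fixing the feature extractor phi defines the hypothesis class F, and each choice of w picks out one predictor in it.

import numpy as np

def phi(x):
    """Feature extractor: maps a raw input to a feature vector.
    Here, a toy two-feature map (illustrative only)."""
    return np.array([x, x ** 2])

def make_predictor(w):
    """With phi fixed, each weight vector w yields one predictor
    f_w in the hypothesis class F = {f_w : w in R^d}."""
    return lambda x: np.dot(w, phi(x))

f1 = make_predictor(np.array([1.0, 0.0]))   # one member of F
f2 = make_predictor(np.array([0.5, -2.0]))  # another member of F
print(f1(3.0), f2(3.0))                     # 3.0  -16.5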
FEATURE EXTRACTION

 Feature extraction is the process of transforming raw data into a structured format that can be used
for machine learning models.

 Importance: Good feature extraction enhances model performance, reduces noise, and simplifies
the computational process.

 Example: In image recognition, converting RGB pixel values into a set of features like edges or
shapes that are more representative of the objects in the image.
FEATURE EXTRACTION

• First, we perform feature extraction (given domain knowledge) to specify a hypothesis class F. Second, we
perform learning (given training data) to obtain a particular predictor fw belonging to F.
• Note that if the hypothesis class doesn't contain any good predictors, then no amount of learning can help. So the
real question when extracting features is whether they are expressive enough to represent good predictors.
It's okay, and expected, that F will contain plenty of bad ones as well.
In the context of email classification, feature extraction involves
identifying and quantifying relevant information
from emails to create a feature vector.
Steps in Feature Extraction for Email Classification:
1. Text Processing:
   • Tokenization: Split the email text into words or tokens.
   • Lowercasing: Convert all text to lowercase to ensure uniformity.
   • Removing Stop Words: Filter out common stop words (e.g., "the," "and") that carry little signal.
2. Feature Selection: Choose relevant features that may indicate whether an email is spam or not:
   • Count Features:
     • Number of occurrences of specific keywords (e.g., "free," "offer").
     • Number of links in the email.
     • Number of exclamation marks.
   • Length Features:
     • Total length of the email in characters.
   • Binary Features:
     • Presence of certain keywords (e.g., "buy now," "limited time") as a binary feature (1 if present, 0 if not).
   • Metadata Features:
     • Sender's domain (e.g., is it a known spam domain?).
Feature extraction transforms raw email data into a
structured numerical format, while the hypothesis class
defines the set of functions the learning algorithm uses to
make classifications based on those features; a sketch of
these features in code follows.
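As a rough illustration, the sketch below computes the count, length, and binary features listed above. The keyword lists are hypothetical placeholders, not a tuned spam vocabulary.

import re

# Hypothetical keyword lists for illustration; a real spam filter
# would choose these from data.
SPAM_KEYWORDS = ["free", "offer"]
BINARY_PHRASES = ["buy now", "limited time"]

def extract_features(email_text):
    """Map raw email text to the feature vector described above:
    counts, length, and binary indicators."""
    text = email_text.lower()               # lowercasing
    tokens = re.findall(r"[a-z']+", text)   # crude tokenization
    features = {}
    for kw in SPAM_KEYWORDS:
        features[f"count({kw})"] = tokens.count(kw)
    features["num_links"] = text.count("http")
    features["num_exclaims"] = email_text.count("!")
    features["length_chars"] = len(email_text)
    for phrase in BINARY_PHRASES:
        features[f"has({phrase})"] = 1 if phrase in text else 0
    return features

print(extract_features("FREE offer!! Buy now: http://spam.example"))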
TECHNIQUES OF FEATURE EXTRACTION

 Statistical Methods:

o Principal Component Analysis (PCA): Reduces dimensionality by transforming data into a new
set of variables (principal components) that capture the most variance.

o Linear Discriminant Analysis (LDA): A technique used to find a linear combination of features
that best separate two or more classes.

o Independent Component Analysis (ICA): Decomposes multivariate signals into additive,
independent components.

 Domain-Specific Methods:

o Text: Utilizing TF-IDF to weigh the importance of words in a document.

o Image: Techniques like SIFT (Scale-Invariant Feature Transform) to detect and describe local
features in images.
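A brief sketch of two of these techniques, assuming scikit-learn is available; the toy corpus and the number of components are illustrative choices.

from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF: weigh words by their importance across a (toy) corpus.
docs = ["free offer inside", "meeting agenda attached", "free free offer"]
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(docs)          # sparse document-term matrix

# PCA: project the (densified) features onto the top-2 principal
# components, the directions capturing the most variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_text.toarray())
print(X_reduced.shape)                      # (3, 2)
print(pca.explained_variance_ratio_)        # variance captured per component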
LEARNING AND HYPOTHESIS CLASSES

 Learning Process: The learning algorithm iteratively adjusts parameters to minimize the difference
between predicted outputs and actual data.
NON-LINEAR/FLEXIBLE FUNCTIONS
 Non-linear functions can model complex relationships and are essential for capturing the intricacies
of real-world data.
 Non-linear functions are mathematical functions that do not produce a straight line
when graphed.
 In contrast to linear functions, which have a constant rate of change, non-linear
functions can change their slope and curvature, making them capable of modeling
complex relationships.

 Examples:

o Polynomials: Functions like f(x) = ax² + bx + c can fit curves (see the sketch after this list).

o Sigmoid Functions: Often used in logistic regression and neural networks, providing a smooth gradient.

o Radial Basis Functions: Used in kernel methods to transform data into higher dimensions.
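As a small worked example, the sketch below fits a quadratic to noisy data. The model is non-linear in x but still linear in the weights, so ordinary least squares suffices; the data and coefficients are made up for illustration.

import numpy as np

# Fit a quadratic to noisy samples of y = 0.5x^2 - x + 1.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 0.5 * x**2 - x + 1 + rng.normal(scale=0.3, size=x.shape)

Phi = np.column_stack([x**2, x, np.ones_like(x)])   # feature map [x^2, x, 1]
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)         # learn the weights
print(w)   # approximately [0.5, -1.0, 1.0]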
Common Non-linear Models:

• Decision Trees: They partition the feature space into regions
based on feature values, allowing for complex decision
boundaries.

• Support Vector Machines (with non-linear kernels): These
can create non-linear decision boundaries in high-dimensional
space.

• Neural Networks: With activation functions like ReLU or
sigmoid, they can learn highly non-linear mappings from inputs to
outputs.
PIECEWISE CONSTANT FUNCTIONS
 These functions consist of segments where each segment has a constant value, allowing for abrupt
changes in output.
 A piecewise constant function is a type of function that is defined by different
constant values over specific intervals of its domain. In other words, the function
remains constant within each interval, but can take on different constant values in
different intervals.
 Use Cases: Frequently used in decision trees, which split the input space into regions, each assigned a
constant output.
 Mathematical Representation: f(x) = cᵢ for x in the interval [aᵢ₋₁, aᵢ), where each cᵢ is a constant and the intervals partition the domain.
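A minimal sketch of such a function in Python; the interval boundaries and constant values are arbitrary illustrations.

import bisect

def make_piecewise_constant(boundaries, values):
    """f(x) = values[i] on the i-th interval; requires
    len(values) == len(boundaries) + 1."""
    def f(x):
        return values[bisect.bisect_right(boundaries, x)]
    return f

# Constant value per interval: (-inf, 0) -> 1, [0, 2) -> 5, [2, inf) -> 3
f = make_piecewise_constant(boundaries=[0, 2], values=[1, 5, 3])
print(f(-1), f(0.5), f(10))   # 1 5 3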
PIECEWISE CONSTANT FUNCTIONS
These functions can be used for
• Simplification of image data
• Feature extraction
• Efficient representation
• Training machine learning models
AN EXAMPLE TASK
Using a piecewise constant function to detect whether the second message in
a conversation is a reply to the first message can be a useful approach.
Feature Engineering
First, you need to define the features that will help determine if the second
message is a reply to the first. You can categorize these features into different
intervals or conditions, which can then be represented by a piecewise
constant function.
Features to Consider
1. Textual Similarity:
   • Cosine Similarity: Measure the similarity between the embeddings of the two messages.
   • Jaccard Similarity: Compare the overlap of words or phrases in both messages.
2. Contextual Cues:
   • Keywords: Check for the presence of reply indicators (e.g., "reply", "re:", "to:").
   • Thread Indicators: Look for patterns in the message formatting that suggest replies.
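One way this might look in code, as a rough sketch: the similarity thresholds, interval scores, and keyword bonus below are hypothetical values, and the cosine similarity is assumed to be computed elsewhere.

def reply_score(cosine_sim, has_reply_keyword):
    """Piecewise constant score: the (hypothetical) thresholds 0.3
    and 0.7 split similarity into three intervals, each mapped to a
    constant score; a reply keyword adds a constant bonus."""
    if cosine_sim >= 0.7:
        score = 0.9
    elif cosine_sim >= 0.3:
        score = 0.5
    else:
        score = 0.1
    if has_reply_keyword:          # e.g., "re:" in the subject line
        score = min(1.0, score + 0.2)
    return score

def is_reply(cosine_sim, has_keyword, threshold=0.6):
    return reply_score(cosine_sim, has_keyword) >= threshold

print(is_reply(0.8, False), is_reply(0.2, True))   # True False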
MOTIVATION FOR NEURAL
NETWORKS
In the spirit of machine learning, we'd like to automate things as
much as possible. In this context, it means creating algorithms
that can take whatever crude features we have and turn them
into predictions, thereby shifting the burden of feature extraction
and moving it to learning.

Neural networks have been around for many decades, but they
fell out of favor because they were difficult to train. In the last
decade, there has been a huge resurgence of interest in neural
networks, since they perform so well and training turns out not to be
such an issue when you have tons of data and compute.

In a sense, neural networks allow one to automatically learn the
features themselves, not just the weights on hand-designed features.
NEURAL NETWORKS

• Neural Networks are computational models that mimic the
complex functions of the human brain.
• The neural networks consist of interconnected nodes or neurons
that process and learn from data, enabling tasks such as pattern
recognition and decision making in machine learning.
• Neural networks extract identifying features from data without any
pre-programmed understanding. Network components include
neurons, connections, weights, biases, propagation functions,
and a learning rule. Neurons receive inputs, governed by
thresholds and activation functions. Connections involve weights
and biases regulating information transfer. Learning, i.e., adjusting
weights and biases, occurs in three stages: input computation,
output generation, and iterative refinement that enhances the
network's performance. These stages include:
1. The neural network is stimulated by a new environment.
2. The free parameters of the neural network are changed as a result of this stimulation.
3. The neural network then responds in a new way to the environment because of the changes in its free parameters.
DEEP NEURAL NETWORKS: TWO OR MORE HIDDEN LAYERS
BASICS OF NEURAL NETWORKS

 Structure:

o Neurons: Basic units that receive inputs, apply a weight, and produce an output
through an activation function.
o Layers: Composed of input, hidden, and output layers where transformations
occur.
 Activation Functions: Crucial for introducing non-linearity.
The Iris dataset is a classic dataset in machine learning, commonly used for
classification tasks. It contains measurements of iris flowers from three
different species, with four features for each flower. Here’s how neurons
would represent values for the Iris dataset:
Features of the Iris Dataset
The dataset has the following four features (inputs) for each sample:
1.Sepal Length: The length of the sepal in centimeters.
2.Sepal Width: The width of the sepal in centimeters.
3.Petal Length: The length of the petal in centimeters.
4. Petal Width: The width of the petal in centimeters.
• Example of Classification
• For a multi-class classification using a neural network on the Iris dataset:
• Input Layer: Four input neurons, one for each feature.
• Hidden Layers: Several hidden layers with a certain number of neurons,
applying weights and activation functions.
• Output Layer: Three output neurons, each representing one of the iris species.
The output from these neurons would be the probabilities of the input sample
belonging to each species, computed using the softmax activation function.
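A compact sketch of such a classifier, assuming scikit-learn is available; the hidden-layer size and other hyperparameters are illustrative choices, not tuned values.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)            # 4 features, 3 species
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One hidden layer of 10 neurons; the output layer has 3 units
# (one per species), yielding a probability for each class.
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print(clf.predict_proba(X_test[:1]))   # probability per species
print(clf.score(X_test, y_test))       # test accuracy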
MATHEMATICAL FOUNDATIONS OF NEURAL NETWORKS
 Feedforward Process:

o Each neuron in a layer computes a weighted sum of inputs and passes it through
an activation function: a = σ(w · x + b), where w are the weights, b is the bias, and σ is the activation.

 Activation Function: common choices include the sigmoid σ(z) = 1/(1 + e⁻ᶻ), which squashes values into (0, 1), and ReLU(z) = max(0, z).
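A minimal NumPy sketch of a single neuron's forward computation; the input values and weights are made up for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b):
    """Weighted sum of inputs passed through an activation:
    a = sigmoid(w . x + b)."""
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([0.5, -1.0, 2.0])   # inputs (illustrative values)
w = np.array([0.1, 0.4, -0.2])   # weights
print(neuron_forward(x, w, b=0.05))   # about 0.33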
LOSS FUNCTION IN NEURAL NETWORKS
 A loss function quantifies the difference between the predicted output and the true output,
guiding the optimization process.
 Common Loss Functions:

o Mean Squared Error (MSE): MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

o Used for regression tasks, penalizing larger errors more heavily.

o Cross-Entropy Loss: L = −Σᵢ yᵢ log(ŷᵢ)

o Used in classification tasks, measuring the performance of a model whose output is a
probability value between 0 and 1.
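Both losses are short enough to sketch directly in NumPy; the example labels and predictions are illustrative.

import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors quadratically."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy for one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)    # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.0])))              # 0.625
print(cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))  # ~0.357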
TRAINING NEURAL NETWORKS
 Process Overview:
o Forward Pass: Compute predictions using current weights and
biases.
o Loss Calculation: Assess the difference between predictions and
actual values using a loss function.
o Backpropagation: Compute gradients of the loss with respect to the weights in order to
optimize them.
 Gradient Descent Update Rule: w ← w − η · ∇L(w), where η is the learning rate.
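A bare-bones sketch of these steps for a linear model with MSE loss; the data, learning rate, and iteration count are illustrative.

import numpy as np

# Synthetic data: y is an exact linear function of X.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.1                                   # learning rate (eta)
for _ in range(200):
    grad = 2 / len(y) * X.T @ (X @ w - y)  # gradient of MSE w.r.t. w
    w -= lr * grad                         # update: w <- w - eta * grad
print(w)   # converges toward [2.0, -1.0, 0.5]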
FEATURE LEARNING VIA NEURAL NETWORKS
 Neural networks can learn features automatically from raw data, reducing the need for manual feature extraction.

 Example: In CNNs (Convolutional Neural Networks), layers learn to detect edges, shapes, and textures, building
increasingly complex features.
CONVOLUTIONAL NEURAL NETWORKS (CNNS)

 CNNs are designed specifically for processing grid-like data, such as images, utilizing local patterns.

 Key Components:

o Convolutional Layers: Apply filters to capture local features.

o Pooling Layers: Downsample feature maps, reducing dimensionality while retaining important
information.

o Fully Connected Layers: At the end, combine features to make predictions.

 Mathematical Representation: (X ∗ K)(i, j) = Σₘ Σₙ X(i + m, j + n) · K(m, n), a filter K slid across the input X.
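A rough sketch of a single convolutional filter pass, in the cross-correlation form most deep-learning libraries use; the image and edge filter are toy examples.

import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the filter over the image and
    take the elementwise product-sum at each position."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_filter = np.array([[1.0, -1.0]])   # responds to horizontal changes
print(conv2d(image, edge_filter))       # 4x3 map of local differences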
RECURRENT NEURAL NETWORKS (RNNS)

 Definition: RNNs are designed for sequential data, where the order of inputs matters, making them
suitable for tasks like language modeling.

 Applications: Used in natural language processing, speech recognition, and time-series forecasting.

 Mathematical Representation: hₜ = σ(W_h hₜ₋₁ + W_x xₜ + b), where the hidden state hₜ carries context from earlier steps in the sequence.
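A minimal sketch of this recurrent update; the dimensions, random weights, and toy sequence are illustrative.

import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One recurrent step: h_t = tanh(W_h h_{t-1} + W_x x_t + b)."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

hidden, inp = 4, 3
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(hidden, hidden))
W_x = rng.normal(scale=0.1, size=(hidden, inp))
b = np.zeros(hidden)

h = np.zeros(hidden)
sequence = [rng.normal(size=inp) for _ in range(5)]
for x_t in sequence:       # the hidden state accumulates context
    h = rnn_step(h, x_t, W_h, W_x, b)
print(h)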
ADVANTAGES OF NEURAL NETWORKS

 Flexibility: Neural networks can model complex, non-linear relationships that traditional models
might miss.

 Scalability: They are capable of handling large datasets and can be trained on distributed systems.

 Performance: Many state-of-the-art results in various domains (computer vision, NLP) are achieved
using neural networks.
CHALLENGES IN NEURAL NETWORKS

 Overfitting: When a model learns noise and details to the extent that it negatively impacts
performance on new data.

o Mitigation Techniques: Regularization, dropout, and early stopping.

 Training Time: Deep networks require substantial computational resources and time for training.

 Hyperparameter Tuning: Finding the right combination of learning rates, batch sizes, and network
architectures can be complex and time-consuming.
FUTURE DIRECTIONS IN FEATURE LEARNING AND NEURAL NETWORKS

 Transfer Learning: Reusing pre-trained models on similar tasks, significantly reducing training
time and resource requirements.

 Generative Models: Explore Generative Adversarial Networks (GANs) and Variational
Autoencoders (VAEs) for unsupervised feature learning.

 Explainable AI: Developing methods to understand and interpret how models make decisions,
which is crucial for trust and transparency in AI applications.
QUESTIONS?
