Introduction to Artificial Intelligence (AI)
4. Learning Algorithms
Motivation
Real-world example:
• Fish packing plant: separate sea bass from salmon using optical sensing
• Features: physical differences such as length, lightness, width, number and shape of fins, position of the mouth
• Noise: variations in lighting, position of the fish on the conveyor, "static" due to the electronics
Motivation
Histograms for the length feature for the two categories
Motivation
Histograms for the lightness feature for the two categories
Decision boundary
The two features of lightness and width for sea bass and salmon
How would our system automatically determine the decision boundary?
Loss
Loss is a function of the error over the training data.
Error is the difference between a single actual value and a single predicted value.
Loss
Regression loss functions
Mean squared loss (RMSE) – Python code

import numpy as np

def rmse(predictions, targets):
    differences = predictions - targets
    differences_squared = differences ** 2
    mean_of_differences_squared = differences_squared.mean()
    rmse_val = np.sqrt(mean_of_differences_squared)
    return rmse_val
Mean absolute loss (MAE) – Python code

import numpy as np

def mae(predictions, targets):
    differences = predictions - targets
    absolute_differences = np.absolute(differences)
    mean_absolute_differences = absolute_differences.mean()
    return mean_absolute_differences
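As a quick sanity check (the prediction and target values below are made up for illustration), both loss functions can be applied to small NumPy arrays:

import numpy as np

predictions = np.array([1.0, 2.5, 0.5])   # hypothetical model outputs
targets = np.array([1.0, 2.0, 1.0])       # hypothetical ground-truth values

print(rmse(predictions, targets))   # ≈ 0.408
print(mae(predictions, targets))    # ≈ 0.333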
Learning algorithms
1950s - 1970s: Early Foundations
• 1957: Perceptron (Frank Rosenblatt) – one of the earliest neural networks, designed for binary classification.
• 1960s: K-nearest neighbors (KNN) – a simple instance-based learning method developed for classification tasks.
• 1969: Strassen's algorithm for matrix multiplication.
1980s: The Rise of Neural Networks
• 1980: Multi-layer perceptron training by backpropagation – developed by Paul Werbos, later popularized in the 1980s for training neural networks.
1990s: Advancements in Ensemble Methods and Optimization
• 1995: Random Forest (Leo Breiman) – a decision tree-based ensemble learning technique that reduces overfitting.
• 1995: Support Vector Machines gain practical relevance with the advent of kernel methods.
(All of these algorithms are implemented on the CPU.)
Learning algorithms
2000s: Kernel Methods and Probabilistic Models
• 2001: AdaBoost – an adaptive boosting method developed by Yoav Freund and Robert Schapire.
• 2006: NVIDIA releases CUDA.
• 2009: Andrew Ng utilizes GPUs to accelerate the training of large neural networks.
2010s: Deep Learning Revolution
• 2012: AlexNet (Krizhevsky et al.) – a deep convolutional neural network that won the ImageNet competition, leading to breakthroughs in computer vision.
• 2014: Generative Adversarial Networks (GANs) (Ian Goodfellow et al.) – introduced a new framework for generating synthetic data through adversarial learning.
• 2017: Transformers (Vaswani et al.) – revolutionized natural language processing (NLP) by eliminating the need for recurrent neural networks.
2020s: Scalable AI and Further Innovations
• 2020: GPT-3 (OpenAI) – a large-scale transformer-based model demonstrating significant progress in language understanding and generation.
Random forest
Adaboost
'Boosting': a family of algorithms which converts weak learners (e.g., decision stumps or decision trees) into strong learners.

$H(x) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i h_i(x)\right)$

$h_i(x)$: the weak learners
$\alpha_i$: the weight of learner $i$
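As a rough illustration (not the slides' own code; the stumps and weights below are hypothetical), the final classifier is just this weighted vote of weak learners:

import numpy as np

def stump_predict(X, feature, threshold, polarity):
    # A decision stump h_i(x): +1 or -1 depending on one thresholded feature
    return polarity * np.where(X[:, feature] > threshold, 1, -1)

def adaboost_predict(X, stumps, alphas):
    # H(x) = sign( sum_i alpha_i * h_i(x) )
    votes = sum(a * stump_predict(X, *s) for s, a in zip(stumps, alphas))
    return np.sign(votes)

# Hypothetical ensemble: (feature index, threshold, polarity) and weights alpha_i
stumps = [(0, 0.5, 1), (1, -0.2, -1), (0, 1.0, 1)]
alphas = [0.8, 0.4, 0.3]
X = np.array([[0.7, 0.3], [-0.3, -0.5]])
print(adaboost_predict(X, stumps, alphas))   # e.g. [ 1. -1.]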
Adaboost
Adaboost
Weak learners for image recognition
Haar filters
Common Haar features; there are 160,000+ possible features associated with each 24 × 24 window.
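A hedged sketch of how a single rectangle feature could be evaluated efficiently with an integral image (the window and rectangle coordinates here are invented for illustration):

import numpy as np

def integral_image(img):
    # Cumulative sums over rows and columns: any rectangle sum becomes O(1)
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] recovered from the integral image
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

window = np.random.rand(24, 24)                 # one 24 x 24 detection window
ii = integral_image(window)
# A two-rectangle Haar-like feature: top half minus bottom half of the window
feature = rect_sum(ii, 0, 0, 12, 24) - rect_sum(ii, 12, 0, 24, 24)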
Cascade filter
Cascade filter
Prepare data:
• Positive images: images which contain the target object
• Negative images: images which do not contain the target object
A ratio of 2:1 or higher between negative and positive samples is considered acceptable.
Cascade filter
Biological Neurons
◉ A typical biological neuron is composed of:
○ A cell body;
○ Dendrites: input channels
○ Axon: output cable; it usually branches.
Biological Neurons
◉ The major job of a neuron:
○ It receives information, usually in the form of electrical pulses, from many other neurons.
○ It sums these inputs in a complex dynamic way.
○ It sends out information in the form of a stream of electrical impulses down its axon and on to many other neurons.
○ The connections (synapses) are crucial for excitation, inhibition or modulation of the cells.
○ Learning is possible by adjusting the synapses!
How can we build a mathematical model of the neuron?
Model of a neuron
◉ Simplest model
inputs
outputs
System
Model of a neuron
Input x → System → output y

Relationship: $y = \sum_{i=1}^{m} x_i$

But:
• The neuron only fires when it is sufficiently excited
• The firing rate has an upper bound
Model of a neuron
◉ Modified model:
○ b: threshold (bias) → the neuron will not fire until its input is "high" enough.
Based upon this model, is it possible for the inputs to inhibit the activation of the neuron?
The synaptic weights!
Model of a neuron
$u_k = \sum_{i=1}^{m} w_i x_i$
$v_k = u_k + b_k$
$y_k = \varphi(u_k + b_k)$
Model of a Neuron
◉ Three basic components for the model of a neuron:
○ A set of synapses or connecting links: characterized by a weight or
strength of its own.
○ An adder for summing the input signals, weighted by the
respective synapses of the neuron (a linear combiner).
○ An activation function for limiting the amplitude of the neuron
output
◉ Mathematical model:
$u_k = \sum_{i=1}^{m} w_i x_i$
$y_k = \varphi(u_k + b_k)$
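A minimal NumPy sketch of this model (the weights, bias, and the choice of the logistic function as φ are arbitrary illustration values):

import numpy as np

def neuron(x, w, b):
    u = np.dot(w, x)                   # linear combiner: u_k = sum_i w_i x_i
    v = u + b                          # induced local field: v_k = u_k + b_k
    return 1.0 / (1.0 + np.exp(-v))    # activation: y_k = phi(v_k), here logistic

x = np.array([0.5, -1.0, 2.0])         # hypothetical inputs
w = np.array([0.2, 0.4, 0.1])          # hypothetical synaptic weights
b = -0.3                               # bias
print(neuron(x, w, b))                 # ≈ 0.40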
Type of Activation (Squash) Functions
◉ Threshold function
(McCulloch-Pitts model -
1943)
◉ Piecewise-linear function
Type of Activation (Squash) Functions
◉ Logistic function:
◉ Hyperbolic tangent
function:
Type of Activation (Squash) Functions
◉ Gaussian functions
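The squash functions above can be sketched as follows (slope and width parameters are placeholder values, not from the slides):

import numpy as np

def threshold(v):                      # McCulloch-Pitts: 1 if v >= 0, else 0
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):               # linear in the middle, clipped to [0, 1]
    return np.clip(v + 0.5, 0.0, 1.0)

def logistic(v, a=1.0):                # 1 / (1 + exp(-a v)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-a * v))

def tanh_act(v):                       # hyperbolic tangent, output in (-1, 1)
    return np.tanh(v)

def gaussian(v, sigma=1.0):            # bell-shaped response centred at v = 0
    return np.exp(-(v ** 2) / (2 * sigma ** 2))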
Network Architectures
◉ Network architecture defines how nodes are connected.
Learning in neural networks
◉ Learning is a process by which the free parameters of a
neural network are adapted through a process of
stimulation by the environment in which the network is
embedded.
◉ Process of learning:
○ The NN is stimulated by an environment
○ The NN undergoes changes in its free parameters as a result of
this stimulation.
○ The NN responds in a new way to the environment because of the
changes that have occurred in its internal structure.
How can the network adjust the weights?
Simplest neural network: Perceptron
◉ Perceptron is built around the McCulloch-Pitts model.
Perceptron
◉ Goal: to correctly classify the set of externally applied stimuli x1, x2, …, xm into one of two classes, C1 and C2.
The input vector: $\mathbf{x}(n) = [+1, x_1(n), x_2(n), \ldots, x_m(n)]^T$
The weight vector: $\mathbf{w}(n) = [b(n), w_1(n), w_2(n), \ldots, w_m(n)]^T$
where n denotes the iteration step.
Perceptron
◉ Output of the neuron: $y(n) = \varphi\big(\mathbf{w}^T(n)\,\mathbf{x}(n)\big)$, where $\varphi$ is the threshold function.
◉ What is the decision boundary?
Decision boundary
◉ m = 1: ?
◉ m = 2: ?
◉ m = 3: ?
◉ How to choose the proper weights?
Selection of weights
Two basic methods can be employed to select a suitable weight vector:
◉ By off-line calculation of weights (without learning).
○ Possible if the system is relatively simple.
◉ By a learning procedure.
○ The weight vector is determined from a given (training) set of input-output vectors (exemplars) in such a way as to achieve the best classification of the training vectors.
Off-line calculation of weights
Example: Truth table of NAND
Three points (0,0), (0,1) and (1,0) belong to one class, and (1,1) belongs to the other class.
The decision boundary is the straight line described by the following equation:
$x_1 + x_2 = 1.5$, or $-x_1 - x_2 + 1.5 = 0$, i.e., $\mathbf{w} = (1.5, -1, -1)$
Is the decision line unique for this problem?
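A small sketch that checks this weight vector against the NAND truth table, using augmented inputs (+1, x1, x2):

import numpy as np

w = np.array([1.5, -1.0, -1.0])            # (bias, w1, w2) from the example
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([1.0, x1, x2])            # augmented input vector
    y = 1 if np.dot(w, x) > 0 else 0       # threshold activation
    print((x1, x2), '->', y)               # reproduces NAND: 1, 1, 1, 0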
Perceptron Learning
◉ If C1 and C2 are linearly separable, there exists a weight vector w such that:
$\mathbf{w}^T\mathbf{x} > 0$ for every input vector $\mathbf{x}$ belonging to class C1
$\mathbf{w}^T\mathbf{x} \le 0$ for every input vector $\mathbf{x}$ belonging to class C2
◉ Given a training set $X = \{(\mathbf{x}(i), d(i))\}$, where $d(i)$ is the desired label.
◉ Training target: find a weight vector w such that the perceptron can correctly classify the training set X.
Perceptron Learning
◉ Feed a pattern x to the perceptron with weight vector w; it will produce a binary output y (1 or 0). First consider the case $\mathbf{w}^T\mathbf{x} \le 0$, i.e., y = 0.
◉ If the correct label (all the labels of the training samples are known) is d = 0, should we update the weights?
◉ If the desired output is d = 1, assume the new weight vector is $\mathbf{w}' = \mathbf{w} + \Delta\mathbf{w}$; then we want $\mathbf{w}'^T\mathbf{x} > \mathbf{w}^T\mathbf{x}$.
◉ But how to choose Δw?
Perceptron Learning
◉ If the true label is d = 1 and the perceptron makes a mistake (it outputs y = 0), its synaptic weights are adjusted by:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \eta\,\mathbf{x}(n)$
Perceptron Learning
◉ Now consider the case $\mathbf{w}^T\mathbf{x} > 0$, i.e., y = 1.
◉ We only adjust the weights when the perceptron makes a mistake (d = 0).
◉ If the true label is d = 0 and the perceptron makes a mistake, its synaptic weights are adjusted by:
$\mathbf{w}(n+1) = \mathbf{w}(n) - \eta\,\mathbf{x}(n)$
Perceptron Learning
◉ To unify this algorithm:
○ Consider the error signal e = d − y
○ The error signal when d = 1: e = 1 − 0 = 1
○ The error signal when d = 0: e = 0 − 1 = −1
◉ Then the two update rules combine into:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \eta\,e(n)\,\mathbf{x}(n)$
Perceptron Learning
◉ Algorithm Perceptron
Start with a randomly chosen weight vector w(1);
while there exist input vectors that are misclassified by w(n) do
    Let x(n) be a misclassified input vector;
    Update the weight vector to w(n+1) = w(n) + η e(n) x(n);
    Increment n;
end-while
Perceptron Learning
◉ Example: let us consider a simple classification problem where the input space is one-dimensional, i.e., a real line:
○ Class 1 (d = 1): x = 0.5, 2
○ Class 2 (d = 0): x = −1, −2
◉ Solution:
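A compact sketch of the learning loop applied to this one-dimensional example (the learning rate, initialization, and epoch limit are illustrative choices, not the slides' values):

import numpy as np

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    # X: inputs with a leading +1 column; d: desired labels in {0, 1}
    w = np.zeros(X.shape[1])                 # w(1); random initialization also works
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(X, d):
            y = 1 if np.dot(w, x) > 0 else 0
            e = target - y                   # error signal e = d - y
            if e != 0:
                w = w + eta * e * x          # w(n+1) = w(n) + eta e(n) x(n)
                mistakes += 1
        if mistakes == 0:                    # every training vector is classified correctly
            break
    return w

X = np.array([[1, 0.5], [1, 2.0], [1, -1.0], [1, -2.0]])   # augmented inputs (+1, x)
d = np.array([1, 1, 0, 0])
print(train_perceptron(X, d))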
Perceptron Learning
Perceptron Convergence Theorem
◉ Perceptron Convergence Theorem:
If C1 and C2 are linearly separable, after a finite number
of steps, the weights stop changing
Multilayer Perceptrons
◉ Multilayer perceptrons (MLPs)
○ Generalization of the single-layer perceptron
◉ Consist of
○ An input layer
○ One or more hidden layers of computation nodes
○ An output layer of computation nodes
◉ Architectural graph of a multilayer perceptron with two hidden layers:
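A minimal sketch of the forward pass through such a network with two hidden layers (layer sizes, random weights, and the logistic activation are arbitrary illustration choices):

import numpy as np

def layer(x, W, b):
    # one layer of computation nodes: phi(W x + b), with the logistic squash function
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 1]                        # input, two hidden layers, output
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

x = rng.standard_normal(4)                  # a hypothetical input vector
for W, b in zip(Ws, bs):
    x = layer(x, W, b)                      # propagate layer by layer
print(x)                                    # network output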
Backpropagation
Data Augmentation
Data augmentation is a technique used to create new artificial data from already existing data sets.
Motivation
Underfitting: the model is too simple and performs poorly on both the training data and the testing data.
Reasons:
• Low variation of the data and a highly biased model.
• The model developed cannot handle complex data.
• Small size of the training dataset.
• Training data of poor quality, containing noise.
Overfitting: the model works well with the training data but performs poorly with the testing data; it has started to pick up noise and incorrect data entries.
Reasons:
• High variation of the data and low bias.
• The model created is too complex and advanced.
• The size of the training data is high.
Data Augmentation
Data augmentation methods:
• Geometric Transformation: flipping, cropping, rotating, zooming
• Color Transformation: brightness, darkness, sharpness, saturation, color augmentation
• AI Generative: Generative Adversarial Networks, Variational Auto-Encoders, Neural Style Transfer
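A small NumPy sketch of a few of the geometric and color transformations listed above (the image is just a random placeholder array):

import numpy as np

img = np.random.rand(24, 24)                 # placeholder grayscale image in [0, 1]

flipped = np.fliplr(img)                     # horizontal flip
rotated = np.rot90(img)                      # rotation by 90 degrees
cropped = img[2:22, 2:22]                    # crop a 20 x 20 region
zoomed = np.kron(cropped, np.ones((2, 2)))   # crude 2x zoom by pixel repetition
brighter = np.clip(img + 0.2, 0.0, 1.0)      # simple brightness shift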
Benefits of Neural Networks
◉ High computational power
○ Generalization: producing reasonable outputs for inputs not encountered during training (learning).
○ Has a massively parallel distributed structure.
◉ Useful properties and capabilities
○ Nonlinearity: most physical systems are nonlinear.
○ Adaptivity (plasticity): has a built-in capability to adapt the synaptic weights to changes in the environment.
○ Fault tolerance: if a neuron or its connecting links are damaged, the overall response may still be acceptable (due to the distributed nature of the information stored in the network).
Limitations of Neural Networks
◉ Fully connected → different from biological neurons
◉ Input size is enormous
◉ Cannot share weights
First convolutional neural network
AlexNet (2012)
CNN layers
Convolution layer
Activation layer
• If the activation derivative saturates at 0, the weights of the deeper layers remain unchanged during training (vanishing gradients)
Pooling layer
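To make the three layer types concrete, here is a hedged sketch of one convolution, a ReLU activation, and 2 x 2 max pooling applied to a small feature map (the filter values are made up):

import numpy as np

def conv2d(img, kernel):
    # 'valid' 2-D convolution (implemented as cross-correlation, as in most CNN libraries)
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # activation layer: negative responses are set to zero
    return np.maximum(x, 0)

def max_pool2x2(x):
    # pooling layer: keep the maximum of each non-overlapping 2 x 2 block
    H, W = x.shape
    x = x[:H - H % 2, :W - W % 2]
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

img = np.random.rand(8, 8)                        # placeholder input patch
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])      # hypothetical 2 x 2 filter
feature_map = max_pool2x2(relu(conv2d(img, kernel)))
print(feature_map.shape)                          # (3, 3)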