21CS743 | DEEP LEARNING | SEARCH CREATORS.
Module-02
Feedforward Networks and Deep Learning
Introduction to Feedforward Neural Networks
1.1 Basic Concepts
• A feedforward neural network is the simplest form of artificial neural network (ANN)
• Information moves in only one direction: forward, from the input nodes through the hidden nodes to the output nodes
• No cycles or loops exist in the network structure
1.2 Historical Context
1. Origins
o Inspired by biological neural networks
o First proposed by Warren McCulloch and Walter Pitts (1943)
o Significant advancement with perceptron by Frank Rosenblatt (1958)
2. Evolution
o Single-layer to multi-layer networks
o Development of backpropagation in 1986
o Modern deep learning revolution (2012-present)
1.3 Network Architecture
1. Input Layer
o Receives raw input data
o No computation performed
o Number of neurons equals number of input features
o Standardization/normalization often applied here
2. Hidden Layers
o Perform intermediate computations
o Can have multiple hidden layers
o Each neuron connected to all neurons in previous layer
o Feature extraction and transformation occur here
3. Output Layer
o Produces final network output
o Number of neurons depends on problem type
o Multi-class classification: typically one neuron per class (binary classification: a single sigmoid neuron)
o Regression: usually one neuron
1.4 Activation Functions
1. Sigmoid (Logistic)
o Formula: σ(x) = 1/(1 + e^(-x))
o Range: (0, 1)
o Used in binary classification
o Properties:
▪ Smooth gradient
▪ Output can be read as a probability
▪ Suffers from vanishing gradient
2. Hyperbolic Tangent (tanh)
o Formula: tanh(x) = (e^x - e^(-x))/(e^x + e^(-x))
o Range: (-1, 1)
o Often performs better than sigmoid
o Properties:
▪ Zero-centered
▪ Stronger gradients
▪ Still has vanishing gradient issue
3. ReLU (Rectified Linear Unit)
o Formula: f(x) = max(0,x)
o Most commonly used
o Helps solve vanishing gradient problem
o Properties:
▪ Computationally efficient
▪ No saturation in positive region
▪ Can suffer from the dying ReLU problem
4. Leaky ReLU
o Formula: f(x) = max(0.01x, x)
o Addresses dying ReLU problem
o Small negative slope
o Properties:
▪ Never completely dies
▪ Allows for negative values
▪ More robust than standard ReLU
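A minimal NumPy sketch of the four activation functions above (the function names and the 0.01 leak factor are illustrative, taken from the formulas in this section):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # output in (0, 1)

def tanh(x):
    return np.tanh(x)                   # output in (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)           # max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)     # small slope for negative inputs

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x))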
2. Gradient-Based Learning
2.1 Understanding Gradients
1. Definition
o Gradient is a vector of partial derivatives
o Points in direction of steepest increase
o Used to minimize loss function
2. Properties
o Direction indicates fastest increase
o Magnitude indicates steepness
o Negative gradient used for minimization
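As a concrete illustration, a few steps of gradient descent on the one-dimensional loss J(w) = w² (gradient 2w); the starting point and learning rate are illustrative:

w = 4.0                   # initial parameter
alpha = 0.1               # learning rate
for step in range(5):
    grad = 2 * w          # dJ/dw for J(w) = w**2
    w = w - alpha * grad  # step against the gradient (direction of steepest decrease)
    print(step, round(w, 4))
# w shrinks toward the minimum at 0: 3.2, 2.56, 2.048, ...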
2.2 Cost Functions
1. Mean Squared Error (MSE)
o Used for regression problems
o Formula: MSE = (1/n)Σ(y_true - y_pred)²
o Properties:
▪ Always positive
▪ Penalizes larger errors more
▪ Differentiable
2. Cross-Entropy Loss
o Used for classification problems
o Formula: -Σ(y_true * log(y_pred))
o Properties:
▪ Measures probability distribution difference
▪ Better for classification than MSE
▪ Provides stronger gradients
3. Huber Loss
o Combines MSE and MAE
o Less sensitive to outliers
o Formula:
▪ L = 0.5(y - f(x))² if |y - f(x)| ≤ δ
▪ L = δ|y - f(x)| - 0.5δ² otherwise
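A short NumPy sketch of the three losses above; the Huber δ of 1.0 is an illustrative choice:

import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: (1/n) * sum((y_true - y_pred)**2)
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true is one-hot, y_pred holds predicted probabilities; eps avoids log(0)
    return -np.sum(y_true * np.log(y_pred + eps))

def huber(y_true, y_pred, delta=1.0):
    err = np.abs(y_true - y_pred)
    quadratic = 0.5 * err ** 2                  # MSE-like region, |error| <= delta
    linear = delta * err - 0.5 * delta ** 2     # MAE-like region, |error| > delta
    return np.mean(np.where(err <= delta, quadratic, linear))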
2.3 Gradient Descent Types
1. Batch Gradient Descent
o Uses entire dataset for each update
o More stable but slower
o Formula: θ = θ - α∇J(θ)
o Memory intensive for large datasets
2. Stochastic Gradient Descent (SGD)
o Updates parameters after each sample
o Faster but less stable
o Better for large datasets
o High variance in parameter updates
3. Mini-batch Gradient Descent
o Compromise between batch and SGD
o Updates parameters after small batches
o Most commonly used in practice
o Typical batch sizes: 32, 64, 128
4. Advanced Optimizers
a) Adam (Adaptive Moment Estimation)
o Combines momentum and RMSprop
o Adaptive learning rates
o Formula includes first and second moments
b) RMSprop
o Adaptive learning rates
o Divides by running average of gradient magnitudes
c) Momentum
o Adds fraction of previous update
o Helps escape local minima
o Reduces oscillation
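A minimal sketch tying these ideas together: one epoch of mini-batch training whose parameter update uses Adam (the β values are the commonly cited defaults; theta, X, y, and compute_gradient are hypothetical placeholders, not defined in this module):

import numpy as np

alpha, beta1, beta2, eps, batch_size = 0.001, 0.9, 0.999, 1e-8, 32
m, v, t = np.zeros_like(theta), np.zeros_like(theta), 0   # Adam state

for start in range(0, len(X), batch_size):                # mini-batch loop
    X_b, y_b = X[start:start + batch_size], y[start:start + batch_size]
    grad = compute_gradient(theta, X_b, y_b)              # hypothetical helper
    t += 1
    m = beta1 * m + (1 - beta1) * grad                    # first moment (momentum term)
    v = beta2 * v + (1 - beta2) * grad ** 2               # second moment (RMSprop term)
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)   # bias correction
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)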
3. Backpropagation and Chain Rule
3.1 Chain Rule Fundamentals
1. Mathematical Basis
o df/dx = df/dy * dy/dx
o Allows computation of composite function derivatives
o Essential for neural network training
2. Application in Neural Networks
o Computes gradients layer by layer
o Propagates error backwards
o Updates weights based on contribution to error
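As a small worked example of the chain rule, take f(y) = y² with y = 3x, so df/dx = (df/dy)(dy/dx) = 2y · 3 = 18x; the snippet checks this numerically at an illustrative x = 2:

def f(x):
    return (3 * x) ** 2                        # composite function: f(y) = y**2 with y = 3x

x, h = 2.0, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)      # central finite difference
analytic = 18 * x                              # chain rule result
print(numeric, analytic)                       # both are approximately 36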
3.2 Forward Pass
1. Input Processing
o Data normalization
o Weight initialization
o Bias addition
2. Layer Computation
# Pseudo-code for the forward pass: propagate activations layer by layer
A = X                          # activations start as the input
for layer in network:
    Z = layer.W @ A + layer.b  # linear transformation
    A = layer.activation(Z)    # apply activation function
3. Output Generation
o Final layer activation
o Prediction computation
o Error calculation
3.3 Backward Pass
1. Error Calculation
o Compare output with target
o Calculate loss using cost function
o Initialize gradient computation
2. Weight Updates
o Calculate gradients using chain rule
o Update weights: w_new = w_old - learning_rate * gradient
o Update biases similarly
3. Detailed Steps
# Pseudo-code for backward pass (m = number of samples in the batch)
# Output layer (linear output with MSE, or softmax/sigmoid with cross-entropy)
dZ = A - Y
dW = (1 / m) * dZ @ A_prev.T
db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
# Hidden layers (repeated backwards through the network)
dA = W_next.T @ dZ_next               # error flowing back from the next layer
dZ = dA * activation_derivative(Z)
dW = (1 / m) * dZ @ A_prev.T
db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
4. Regularization for Deep Learning
4.1 L1 Regularization
1. Mathematical Form
o Adds absolute value of weights to loss
o Formula: L1 = λΣ|w|
o Promotes sparsity
2. Properties
o Feature selection capability
o Produces sparse models
o Less sensitive to outliers
4.2 L2 Regularization
1. Mathematical Form
o Adds squared weights to loss
o Formula: L2 = λΣw²
o Prevents large weights
2. Properties
o Smooth weight decay
o No sparse solutions
o More stable training
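A sketch of adding either penalty to a base loss; the λ value, the weights list, and data_loss are illustrative placeholders:

import numpy as np

lam = 1e-4                                                    # regularization strength λ
l1_penalty = lam * sum(np.sum(np.abs(W)) for W in weights)    # L1 = λ Σ|w|
l2_penalty = lam * sum(np.sum(W ** 2) for W in weights)       # L2 = λ Σ w²
loss = data_loss + l2_penalty                                 # or + l1_penalty for sparsity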
4.3 Dropout
1. Basic Concept
o Randomly deactivate neurons
o Probability p of keeping neurons
o Different network for each training batch
2. Implementation Details
# Pseudo-code for inverted dropout (p = probability of keeping a neuron)
mask = np.random.binomial(1, p, size=A.shape)
A = A * mask   # randomly deactivate neurons
A = A / p      # scale to maintain the expected activation
3. Training vs. Testing
o Used only during training
o Scaled appropriately during inference
o Acts as model ensemble
4.4 Early Stopping
1. Implementation
o Monitor validation error
o Save best model
o Stop when validation error increases
2. Benefits
o Prevents overfitting
o Reduces training time
o Automatic model selection
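A minimal early-stopping loop, assuming hypothetical train_one_epoch, evaluate, get_weights, and set_weights helpers and an illustrative patience of 5 epochs:

best_val, best_weights, patience, wait = float("inf"), None, 5, 0
for epoch in range(max_epochs):
    train_one_epoch(model)                    # hypothetical training step
    val_loss = evaluate(model, val_data)      # hypothetical validation helper
    if val_loss < best_val:                   # validation error improved
        best_val, best_weights, wait = val_loss, model.get_weights(), 0
    else:
        wait += 1
        if wait >= patience:                  # no improvement for `patience` epochs
            break
model.set_weights(best_weights)               # restore the best saved model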
5. Advanced Concepts
5.1 Batch Normalization
1. Purpose
o Normalizes layer inputs
o Reduces internal covariate shift
o Speeds up training
2. Algorithm
# Pseudo-code for batch normalization (per feature, over the mini-batch)
mean = np.mean(x, axis=0)
var = np.var(x, axis=0)
x_norm = (x - mean) / np.sqrt(var + eps)   # eps avoids division by zero
out = gamma * x_norm + beta                # learnable scale and shift
5.2 Weight Initialization
1. Xavier/Glorot Initialization
o Variance = 2/(n_in + n_out)
o Suitable for tanh activation
2. He Initialization
o Variance = 2/n_in
o Better for ReLU activation
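Both schemes in NumPy, where n_in and n_out are the fan-in and fan-out of a layer:

import numpy as np

def xavier_init(n_in, n_out):
    # Glorot/Xavier: variance 2 / (n_in + n_out), suited to tanh layers
    std = np.sqrt(2.0 / (n_in + n_out))
    return np.random.randn(n_out, n_in) * std

def he_init(n_in, n_out):
    # He: variance 2 / n_in, suited to ReLU layers
    std = np.sqrt(2.0 / n_in)
    return np.random.randn(n_out, n_in) * std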
6. Practical Implementation
6.1 Network Design Considerations
1. Architecture Choices
o Number of layers
o Neurons per layer
o Activation functions
2. Hyperparameter Selection
o Learning rate
o Batch size
o Regularization strength
6.2 Training Process
1. Data Preparation
o Splitting data
o Normalization
o Augmentation
2. Training Loop
o Forward pass
o Loss computation
o Backward pass
o Parameter updates
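Putting the four steps together, a schematic training loop; forward, compute_loss, backward, update_parameters, and iterate_minibatches are hypothetical helpers built from the pseudo-code earlier in this module:

for epoch in range(num_epochs):
    for X_batch, y_batch in iterate_minibatches(X_train, y_train, batch_size):
        A, cache = forward(network, X_batch)               # forward pass
        loss = compute_loss(A, y_batch)                    # loss computation
        grads = backward(network, cache, y_batch)          # backward pass
        update_parameters(network, grads, learning_rate)   # parameter updates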
Practice Problems and Exercises
1. Basic Concepts
o Explain the role of activation functions in neural networks
o Compare and contrast different types of gradient descent
o Describe the vanishing gradient problem
2. Mathematical Problems
o Calculate gradients for a simple 2-layer network
o Implement batch normalization equations
o Compute different loss functions
3. Implementation Challenges
o Design a network for MNIST classification
o Implement dropout in Python
o Create a custom loss function
Key Formulas Reference Sheet
1. Activation Functions
o Sigmoid: σ(x) = 1/(1 + e^(-x))
o tanh(x) = (e^x - e^(-x))/(e^x + e^(-x))
o ReLU: f(x) = max(0,x)
2. Loss Functions
o MSE = (1/n)Σ(y_true - y_pred)²
o Cross-Entropy = -Σ(y_true * log(y_pred))
3. Regularization
o L1 = λΣ|w|
o L2 = λΣw²
4. Gradient Descent
o Update: w = w - α∇J(w)
o Momentum: v = βv - α∇J(w), then w = w + v
Common Issues and Solutions
1. Vanishing Gradients
o Use ReLU activation
o Implement batch normalization
o Try residual connections
2. Overfitting
o Add dropout
o Use regularization
o Implement early stopping
3. Poor Convergence
o Adjust learning rate
o Try different optimizers
o Check data normalization