Tutorial 1, 2


Q. What is deep learning? Explain its types with the help of examples.

Ans: Deep learning is a subset of machine learning, which is itself a subset of artificial intelligence, that teaches computers to process data in a way inspired by the human brain. It uses artificial neural networks to mimic how the human brain processes information. These networks consist of layers of interconnected nodes (neurons) that can automatically learn and extract features from large amounts of data. Deep learning models excel at handling unstructured data such as images, text, and sound, and their performance improves as the size of the data increases.
Deep learning typically involves multiple hidden layers in a neural network, enabling it to learn hierarchical patterns or representations in the data.

Types of Deep Learning Models


1. Feedforward Neural Networks (FNNs):
Description: The simplest type of neural network where data moves in one direction—from
the input layer, through the hidden layers, to the output layer.
Example: Used in image classification tasks.
Example Application: Predicting handwritten digits (e.g., MNIST dataset).
2. Convolutional Neural Networks (CNNs):
Description: Designed specifically for processing structured grid data like images. CNNs use
convolutional layers to extract spatial and hierarchical features.
Example: Used in computer vision tasks such as object detection and image recognition.
Example Application: Identifying objects in images (e.g., the ImageNet dataset). A minimal code sketch of the first two model types (FNN and CNN) follows this list.
3. Recurrent Neural Networks (RNNs):
Description: Designed for sequential data where the output depends on previous
computations. RNNs use loops to retain information from prior steps, making them effective
for time-series or sequence data.
Example: Used in natural language processing (NLP) tasks.
Example Application: Sentiment analysis or text prediction.
4. Generative Adversarial Networks (GANs):
Description: Comprises two networks—a generator and a discriminator—that work in tandem
to create realistic data. The generator creates fake data, while the discriminator evaluates it.
Example: Used in image generation and enhancement tasks.
Example Application: Creating realistic images from noise.
5. Autoencoders:
Description: Unsupervised learning models used to compress and reconstruct data. They are
often used for dimensionality reduction or anomaly detection.
Example: Used in data denoising tasks.
Example Application: Removing noise from images.
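
To make the first two model types concrete, the following minimal sketch defines a tiny feedforward network and a tiny convolutional network for 28x28 grayscale images. The MNIST-style input shape, the layer sizes, and the use of PyTorch are illustrative assumptions, not something fixed by the answer above.

# A minimal sketch (assumes PyTorch is installed); layer sizes are illustrative only.
import torch
import torch.nn as nn

# 1. Feedforward Neural Network: data flows input -> hidden -> output.
fnn = nn.Sequential(
    nn.Flatten(),               # 28x28 image -> 784-dimensional vector
    nn.Linear(784, 128),        # input layer -> hidden layer
    nn.ReLU(),
    nn.Linear(128, 10),         # hidden layer -> 10 class scores (e.g., MNIST digits)
)

# 2. Convolutional Neural Network: convolutional layers extract spatial features.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 input channel -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),                 # flattened features -> 10 class scores
)

x = torch.randn(4, 1, 28, 28)      # a dummy batch of 4 grayscale images
print(fnn(x).shape, cnn(x).shape)  # both produce a (4, 10) tensor of class scores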

Q. Explain the working of a feedforward neural network.


Ans: A Feedforward Neural Network (FNN) is the simplest type of artificial neural network, in which information flows in one direction only, from the input layer to the output layer, without any loops or feedback connections. Each node in one layer is connected to every node in the next layer. FNNs are used for tasks such as classification, regression, and feature extraction.
Structure of an FNN
1. Input Layer:
o Accepts the raw input data (e.g., numerical features, pixel values of an
image).
o Each node represents one feature of the input data.
2. Hidden Layers:
o Consist of neurons that process inputs using weights, biases, and an
activation function to transform the data non-linearly.
o Multiple hidden layers help the network learn complex patterns.
3. Output Layer:
o Produces the final result (e.g., classification label, probability, or numerical
value).
o The number of neurons in the output layer depends on the type of task
(e.g., 1 neuron for binary classification, multiple neurons for multi-class
classification).
Working of an FNN
1. Forward Propagation:
• Data flows from the input layer through all intermediate layers to the output layer, where the predictions are computed.
2. Loss Calculation:
• The network's predictions are compared with the actual labels using a loss function (e.g., mean squared error for regression or cross-entropy for classification).
3. Training (Backpropagation):
• During training, the weights are updated to minimize the loss using an optimization algorithm (e.g., gradient descent).
• However, backpropagation itself lies outside the "feedforward" process; a minimal forward-pass sketch is shown below.
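
As a rough illustration of the forward pass described above, the sketch below pushes one input vector through a single hidden layer in NumPy; the layer sizes, random weights, and the choice of ReLU and sigmoid activations are assumptions made only for this example.

# Forward propagation sketch in NumPy (sizes and activations are illustrative).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.normal(size=3)            # input layer: 3 features
W1 = rng.normal(size=(4, 3))      # hidden layer: 4 neurons, each with 3 weights
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))      # output layer: 1 neuron (binary classification)
b2 = np.zeros(1)

h = np.maximum(0.0, W1 @ x + b1)  # hidden layer: weighted sum + ReLU activation
y_hat = sigmoid(W2 @ h + b2)      # output layer: a probability between 0 and 1
print(y_hat)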

Q. Explain gradient descent and its types.


Ans: Gradient Descent is an optimization algorithm used to minimize a loss function (or cost
function) by iteratively adjusting the model's parameters (weights and biases) in the direction
of the steepest descent. The algorithm ensures the model learns from errors and improves
performance.
Key Steps in Gradient Descent
1. Initialize Parameters: Start with random or predefined values for weights and
biases.
2. Compute Loss: Use a loss function to measure the difference between predicted
and actual values.
3. Calculate Gradient: Compute the gradient of the loss function with respect to the
parameters.
4. Update Parameters: Adjust the parameters using the formula:
θ = θ − α ⋅ ∇L
o θ: Model parameters (weights, biases).
o α: Learning rate (step size).
o ∇L: Gradient of the loss function.
5. Repeat: Iterate until the loss converges to a minimum or a predefined threshold is reached (a minimal sketch of this loop follows below).
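
The sketch below is one possible rendering of this loop for a single parameter fitted by least squares; the toy data, learning rate, and number of iterations are assumptions chosen only to keep the example small.

# Gradient descent sketch: fit y ≈ theta * x by minimizing mean squared error.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])         # toy data, roughly y = 2x

theta = 0.0                                # 1. initialize the parameter
alpha = 0.01                               # learning rate (step size)

for step in range(200):                    # 5. repeat until (approximate) convergence
    y_pred = theta * x
    loss = np.mean((y_pred - y) ** 2)      # 2. compute the loss (MSE)
    grad = np.mean(2 * (y_pred - y) * x)   # 3. gradient of the loss w.r.t. theta
    theta = theta - alpha * grad           # 4. update: theta = theta - alpha * grad

print(theta, loss)                         # theta ends up near 2.0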
Types of Gradient Descent
1. Batch Gradient Descent (BGD)
o Description: Uses the entire training dataset to compute gradients for
each update.
o Advantages: Converges smoothly and provides a stable path to the
minimum.
o Disadvantages: Computationally expensive for large datasets.
o Use Case: Small datasets where computational resources are sufficient.
2. Stochastic Gradient Descent (SGD)
o Description: Uses a single randomly selected data point to compute the
gradient at each step.
o Advantages: Faster updates, suitable for large datasets.
o Disadvantages: Highly noisy convergence, might overshoot the minimum.
o Use Case: Real-time systems or large datasets.
3. Mini-Batch Gradient Descent
o Description: Uses a subset (mini-batch) of the training dataset to compute
gradients.
o Advantages: Balances the speed of SGD with the stability of BGD.
o Disadvantages: Requires careful selection of batch size.
o Use Case: The standard choice in deep learning frameworks due to its efficiency (see the comparison sketch after this list).
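
The comparison sketch below shows that the three variants differ only in how much data each parameter update uses; the toy dataset, the batch size of 20, and the learning rate are illustrative assumptions.

# Batch, stochastic, and mini-batch gradient descent on the same toy problem.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)   # roughly y = 3x

def grad(theta, xb, yb):
    # Gradient of the mean squared error over the given batch.
    return np.mean(2 * (theta * xb - yb) * xb)

alpha, theta = 0.1, 0.0

for epoch in range(50):
    # Batch GD: one update per epoch using the entire dataset.
    #   theta -= alpha * grad(theta, x, y)
    # Stochastic GD: one update per single randomly chosen example.
    #   i = rng.integers(len(x)); theta -= alpha * grad(theta, x[i:i+1], y[i:i+1])
    # Mini-batch GD (used here): updates on shuffled subsets of size 20.
    idx = rng.permutation(len(x))
    for start in range(0, len(x), 20):
        batch = idx[start:start + 20]
        theta -= alpha * grad(theta, x[batch], y[batch])

print(theta)   # approaches 3.0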

Q. Explain the different activation and loss functions used in deep learning.


Ans: Activation Functions in Deep Learning
Activation functions introduce non-linearity into neural networks, enabling them to model complex patterns and learn from data. Below is an explanation of commonly used activation functions in deep learning; a combined code sketch follows the list.
1. Sigmoid Activation Function
• Formula: σ(x) = 1 / (1 + e^(−x))
• Range: (0, 1)
• Use Case: Typically used in the output layer for binary classification tasks.
• Advantages:
o Outputs values between 0 and 1, making it suitable for probability
interpretation.
• Disadvantages:
o Vanishing Gradients: For very large or small inputs, gradients become very
small, slowing down training.
o Not zero-centered, which can lead to inefficient gradient updates.
2. Tanh (Hyperbolic Tangent)
• Formula: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
• Range: (-1, 1)
• Use Case: Often used in hidden layers, especially for sequence-based models like
RNNs.
• Advantages:
o Zero-centered, helping the optimization process by making the data
distribution more balanced.
• Disadvantages:
o Still suffers from the vanishing gradient problem for very large or small
inputs.
3. ReLU (Rectified Linear Unit)
• Formula: ReLU(x) = max(0, x)
• Range: [0, ∞)
• Use Case: Commonly used in hidden layers of deep networks, particularly in
CNNs and MLPs.
• Advantages:
o Efficient: Simple and computationally efficient.
o Reduces the vanishing gradient problem for positive values.
• Disadvantages:
o Dying ReLU Problem: Neurons with negative inputs will always output
zero, which can cause neurons to become inactive.
4. Leaky ReLU
• Formula: LeakyReLU(x) = x if x > 0, else α·x, where α is a small positive constant (e.g., 0.01).


• Range: (-∞, ∞)
• Use Case: A variant of ReLU to solve the "dying ReLU" problem.
• Advantages:
o Allows small gradients for negative inputs, preventing neurons from
becoming inactive.
• Disadvantages:
o The choice of α requires tuning.

5. Softmax Activation Function


• Formula: softmax(xᵢ) = e^(xᵢ) / Σⱼ e^(xⱼ)
• Range: (0, 1) with all outputs summing to 1.


• Use Case: Used in the output layer for multi-class classification problems.
• Advantages:
o Outputs probabilities for multi-class classification tasks.
• Disadvantages:
o Usually unnecessary for binary classification tasks, where a single sigmoid output suffices.
6. Swish Activation Function
• Formula: swish(x) = x ⋅ σ(x) = x / (1 + e^(−x))
• Range: (-∞, ∞)
• Use Case: Recently used in advanced models like EfficientNet and Transformer-
based models.
• Advantages:
o Smooth, non-monotonic, and does not suffer from dying neurons.
• Disadvantages:
o Computationally more expensive than ReLU.
7. ELU (Exponential Linear Unit)
• Formula: ELU(x) = x if x > 0, else α·(e^x − 1), where α is a hyperparameter.
• Range: (-α, ∞)
• Use Case: Useful in deeper networks requiring faster convergence and avoiding
dead neurons.

• Advantages:
o Prevents dead neurons and allows for faster convergence.
• Disadvantages:
o More computationally expensive than ReLU.
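
For reference, the sketch below writes out each of the activation functions above in NumPy; the default values of α and the sample inputs are illustrative assumptions.

# NumPy versions of the activation functions described above (illustrative sketch).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - np.max(x))          # subtract the max for numerical stability
    return e / np.sum(e)

def swish(x):
    return x * sigmoid(x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu, softmax, swish, elu):
    print(f.__name__, np.round(f(z), 3))
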
Loss Functions in Deep Learning
Loss functions measure the difference between the predicted output and the actual target, guiding the model's optimization process. Here are some commonly used loss functions; a combined code sketch follows the list:
1. Mean Squared Error (MSE)
• Formula: MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the actual value and ŷᵢ is the predicted value.


• Use Case: Commonly used for regression problems.
• Advantages:
o Simple and intuitive, penalizes large errors.
• Disadvantages:
o Sensitive to outliers.
2. Cross-Entropy Loss (Log Loss)
• Formula (binary case): L = −(1/n) Σᵢ [yᵢ ⋅ log(ŷᵢ) + (1 − yᵢ) ⋅ log(1 − ŷᵢ)], where yᵢ is the actual class label and ŷᵢ is the predicted probability.


• Use Case: Used for classification problems, particularly in binary and multi-class
classification.
• Advantages:
o Measures how well the model’s predictions match the true probability
distribution.
• Disadvantages:
o Sensitive to incorrect predictions when the predicted probability is very
low for the true class.

3. Hinge Loss
• Formula: L = (1/n) Σᵢ max(0, 1 − yᵢ ⋅ ŷᵢ), where yᵢ ∈ {−1, +1} is the true label and ŷᵢ is the predicted output.


• Use Case: Used for Support Vector Machines (SVMs) and large-margin
classification.
• Advantages:
o Focuses on misclassified points and encourages correct classification with
large margins.
• Disadvantages:
o Only useful for binary classification with labels {−1, +1}.

4. Mean Absolute Error (MAE)


• Formula: MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|
• Use Case: Used for regression tasks where the magnitude of error is important.
• Advantages:
o Less sensitive to outliers compared to MSE.
• Disadvantages:
o Does not penalize large errors as much as MSE.
5. Kullback-Leibler (KL) Divergence
• Formula: D_KL(P ∥ Q) = Σᵢ P(i) ⋅ log(P(i) / Q(i)), where P and Q are two probability distributions.


• Use Case: Used in tasks like variational autoencoders (VAEs) and reinforcement
learning.
• Advantages:
o Measures the difference between two probability distributions.

• Disadvantages:
o Only applicable when both distributions are valid probability distributions.
6. Sparse Categorical Cross-Entropy
• Formula: L = −(1/n) Σᵢ log(ŷᵢ,cᵢ), where cᵢ is the integer class label of sample i and ŷᵢ,cᵢ is the model's predicted probability for that class.
• Use Case: Used for multi-class classification problems when the target labels are
integers.
• Advantages:
o Handles multi-class classification without requiring one-hot encoding of
labels.
• Disadvantages:
o Not applicable to binary classification tasks.
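
The sketch below gives NumPy versions of the loss functions above; the small clipping constant and the toy targets and predictions are assumptions added only for numerical safety and illustration.

# NumPy versions of the loss functions described above (illustrative sketch).
import numpy as np

EPS = 1e-12   # small constant to avoid log(0)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def binary_cross_entropy(y, p):
    p = np.clip(p, EPS, 1 - EPS)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def hinge(y, y_hat):                 # y must be in {-1, +1}
    return np.mean(np.maximum(0.0, 1.0 - y * y_hat))

def kl_divergence(p, q):             # p and q are probability distributions
    p, q = np.clip(p, EPS, 1.0), np.clip(q, EPS, 1.0)
    return np.sum(p * np.log(p / q))

def sparse_categorical_cross_entropy(labels, probs):
    # labels: integer class indices; probs: rows of predicted class probabilities.
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(np.clip(picked, EPS, 1.0)))

print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.5])))
print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))
print(sparse_categorical_cross_entropy(np.array([2, 0]),
                                        np.array([[0.1, 0.2, 0.7],
                                                  [0.8, 0.1, 0.1]])))
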
Q. Explain backpropagation in detail.
Ans: Backpropagation (short for "backward propagation of errors") is a fundamental algorithm
in training artificial neural networks. It enables the network to learn by adjusting its weights
and biases to minimize error in predictions. Below is a detailed explanation of how
backpropagation works:
1. Key Components
To understand backpropagation, it's essential to know its core components:
• Neural Network Structure:
o Input Layer: Takes input data (e.g., features of an image).
o Hidden Layers: Process inputs using weights, biases, and activation
functions.
o Output Layer: Produces predictions.
• Weights and Biases: Parameters of the network that are adjusted during
learning.
• Loss Function: Measures the difference between the predicted output and the
actual target value (e.g., Mean Squared Error, Cross-Entropy).
• Learning Rate: A small positive value that controls the step size during
optimization.
2. The Backpropagation Process
The process involves two main steps: forward propagation and backward propagation.
Step 1: Forward Propagation
• The input data passes through the network layer by layer.
• For each neuron in the hidden and output layers:
o Compute the weighted sum of inputs: z = w ⋅ x + b = Σᵢ wᵢxᵢ + b
o where w are the weights, x are the inputs, and b is the bias.


o Apply an activation function (e.g., ReLU, sigmoid) to introduce non-linearity: a = Activation(z)
• The final output is the network's prediction.
Step 2: Backward Propagation
This step computes gradients to update weights and biases, ensuring the error is minimized.
1. Calculate Error at the Output Layer:
o Using the loss function, calculate the difference between the predicted output ŷ and the actual target y.
2. Compute Gradients at the Output Layer:
o Derivatives of the loss function with respect to the output activation are calculated:
δ = ∂L/∂a ⋅ Activation′(z)
3. Propagate Error Backwards:
o For each layer l, compute the gradients of the loss with respect to:
▪ Weights (w): ∂L/∂w_l = δ_l ⋅ a_(l−1)
▪ Biases (b): ∂L/∂b_l = δ_l
▪ Activations of the previous layer: ∂L/∂a_(l−1) = w_lᵀ ⋅ δ_l
o Here each layer's error term δ_l is the gradient propagated from the layer above multiplied by the derivative of the activation function, δ_l = ∂L/∂a_l ⋅ Activation′(z_l).
4. Update Parameters:
o Use the gradients to update the weights and biases via gradient descent, as illustrated in the sketch below:
w = w − η ⋅ ∂L/∂w,  b = b − η ⋅ ∂L/∂b
where η is the learning rate.
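
The sketch below ties the forward pass, loss, backward pass, and parameter update together for a one-hidden-layer network; the XOR-style toy data, layer sizes, sigmoid activations, squared-error loss, and learning rate are all assumptions chosen to keep the example self-contained.

# Backpropagation sketch: one hidden layer, sigmoid activations, squared-error loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # toy inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # XOR-style targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
eta = 0.5                                            # learning rate

for step in range(5000):
    # Forward propagation.
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)            # predictions
    loss = np.mean((a2 - y) ** 2)

    # Backward propagation (chain rule).
    d_a2 = 2 * (a2 - y) / len(X)          # dL/da2
    d_z2 = d_a2 * a2 * (1 - a2)           # sigmoid'(z) = a * (1 - a)
    d_W2 = a1.T @ d_z2
    d_b2 = d_z2.sum(axis=0, keepdims=True)
    d_a1 = d_z2 @ W2.T                    # propagate the error to the hidden layer
    d_z1 = d_a1 * a1 * (1 - a1)
    d_W1 = X.T @ d_z1
    d_b1 = d_z1.sum(axis=0, keepdims=True)

    # Parameter update: w = w - eta * dL/dw, b = b - eta * dL/db.
    W1 -= eta * d_W1; b1 -= eta * d_b1
    W2 -= eta * d_W2; b2 -= eta * d_b2

print(loss, a2.ravel())   # loss typically shrinks; exact values depend on initialization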


3. Intuition Behind Backpropagation
• Chain Rule of Calculus: Backpropagation uses the chain rule to calculate how a
small change in weights affects the loss.
• Gradient Descent: By following the direction of steepest descent (negative
gradient), the algorithm minimizes the loss function.
4. Benefits of Backpropagation
• Automates learning by adjusting weights and biases.
• Efficiently trains networks with many layers (deep learning).
• Works well with stochastic optimization techniques (e.g., SGD).
5. Limitations of Backpropagation
• Vanishing/Exploding Gradients: Gradients can become too small or large,
hindering training in very deep networks.
• Dependency on Good Initialization: Poor initialization may slow convergence.
• Computationally Intensive: Requires significant computational resources for
large datasets.
6. Modern Enhancements
• Batch Normalization: Reduces internal covariate shift.
• Optimizers: Advanced algorithms like Adam and RMSProp improve convergence
speed.
• Dropout: Reduces overfitting by randomly deactivating neurons during training (a combined sketch of these enhancements is shown below).
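
As one possible illustration of how these enhancements are combined in practice, the sketch below builds a small network with batch normalization and dropout and trains it for a single step with the Adam optimizer; the use of PyTorch, the layer sizes, and the hyperparameters are assumptions, not something prescribed above.

# Sketch: batch normalization, dropout, and the Adam optimizer in one training step.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),     # batch normalization after the linear layer
    nn.ReLU(),
    nn.Dropout(p=0.5),      # randomly deactivates neurons during training
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 20)                 # dummy batch: 32 samples, 20 features
target = torch.randint(0, 2, (32,))     # dummy class labels

optimizer.zero_grad()
loss = loss_fn(model(x), target)
loss.backward()                         # backpropagation computes the gradients
optimizer.step()                        # Adam updates the parameters
print(loss.item())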
Backpropagation is at the heart of neural network training and remains a cornerstone of
machine learning despite its challenges.
