Tutorial 1, 2
Q. What is deep learning?
Ans: Deep learning is a subset of machine learning, which is itself a subset of artificial intelligence, that teaches computers to process data in a way inspired by the human brain. It uses artificial neural networks to mimic the way the human brain processes information. These networks consist of layers of interconnected nodes (neurons) that can automatically learn and extract features from large amounts of data. Deep learning models excel at handling unstructured data such as images, text, and sound, and their performance improves as the amount of data increases.
Deep learning typically involves multiple hidden layers in a neural network, enabling it to learn hierarchical patterns or representations in the data.
Activation Functions in Deep Learning
Activation functions introduce non-linearity into a neural network, allowing it to learn complex patterns. Commonly used activation functions include:
1. Sigmoid
• Formula: σ(x) = 1 / (1 + e^(−x))
• Range: (0, 1)
• Use Case: Typically used in the output layer for binary classification tasks.
• Advantages:
o Outputs values between 0 and 1, making it suitable for probability interpretation.
• Disadvantages:
o Vanishing Gradients: For very large or very small inputs, gradients become very small, slowing down training.
o Not zero-centered, which can lead to inefficient gradient updates.
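To make the vanishing-gradient point concrete, here is a minimal NumPy sketch (an illustrative example, not from the notes themselves) of the sigmoid and its derivative:

import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)); squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative: sigmoid(x) * (1 - sigmoid(x)); peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 10.0]:
    print(x, sigmoid(x), sigmoid_grad(x))
# At x = 10 the gradient is ~4.5e-05, illustrating the vanishing-gradient problem.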
2. Tanh (Hyperbolic Tangent)
• Formula: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
• Range: (-1, 1)
• Use Case: Often used in hidden layers, especially for sequence-based models like
RNNs.
• Advantages:
o Zero-centered, helping the optimization process by making the data
distribution more balanced.
• Disadvantages:
o Still suffers from the vanishing gradient problem for very large or small
inputs.
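A quick sketch (illustrative, with assumed sample inputs) showing that tanh outputs are zero-centered while sigmoid outputs are not:

import numpy as np

x = np.linspace(-3.0, 3.0, 7)             # symmetric sample inputs
print(np.tanh(x).mean())                  # ~0.0: tanh is zero-centered
print((1.0 / (1.0 + np.exp(-x))).mean())  # ~0.5: sigmoid is not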
3. ReLU (Rectified Linear Unit)
• Formula: f(x) = max(0, x)
• Range: [0, ∞)
• Use Case: Commonly used in hidden layers of deep networks, particularly in
CNNs and MLPs.
• Advantages:
o Efficient: Simple and computationally efficient.
o Reduces the vanishing gradient problem for positive values.
• Disadvantages:
o Dying ReLU Problem: Neurons with negative inputs will always output
zero, which can cause neurons to become inactive.
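A minimal NumPy sketch (illustrative example) of ReLU and the dying-ReLU effect:

import numpy as np

def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 otherwise
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))       # [0.  0.  0.  1.5]
print(relu_grad(x))  # [0. 0. 0. 1.] -> negative inputs receive zero gradient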
4. Leaky ReLU
• Formula: f(x) = x if x > 0, else αx (with a small constant α, e.g., 0.01)
• Range: (-∞, ∞)
• Use Case: Used in hidden layers as a drop-in replacement for ReLU when dead neurons are a concern.
• Advantages:
o Allows a small gradient for negative inputs, avoiding the dying ReLU problem.
• Disadvantages:
o The slope α is a fixed hyperparameter and may not suit every task.
5. Swish (SiLU)
• Formula: f(x) = x · σ(x)
• Use Case: Recently used in advanced models like EfficientNet and Transformer-based models.
• Advantages:
o Smooth, non-monotonic, and does not suffer from dying neurons.
• Disadvantages:
o Computationally more expensive than ReLU.
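A short NumPy sketch (illustrative, assuming α = 0.01 for Leaky ReLU) contrasting the two variants above:

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small slope alpha keeps a nonzero gradient for negative inputs
    return np.where(x > 0, x, alpha * x)

def swish(x):
    # Swish/SiLU: x * sigmoid(x); smooth and non-monotonic
    return x / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 1.0])
print(leaky_relu(x))  # [-0.02  -0.005  1.   ]
print(swish(x))       # approx [-0.238 -0.189  0.731]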
7. ELU (Exponential Linear Unit)
• Formula: f(x) = x if x > 0, else α(e^x − 1), where α is a hyperparameter.
• Range: (-α, ∞)
• Use Case: Useful in deeper networks requiring faster convergence and avoiding
dead neurons.
• Advantages:
o Prevents dead neurons and allows for faster convergence.
• Disadvantages:
o More computationally expensive than ReLU.
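A minimal ELU sketch (assuming the common default α = 1.0):

import numpy as np

def elu(x, alpha=1.0):
    # x for positive inputs; alpha * (e^x - 1) for negative inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-3.0, -1.0, 2.0])))  # approx [-0.95  -0.632  2.   ]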
Loss Functions in Deep Learning
Loss functions measure the difference between the predicted output and the actual target,
guiding the model's optimization process. Here are some commonly used loss functions:
1. Mean Squared Error (MSE)
• Formula: MSE = (1/n) Σ (y_i − ŷ_i)²
• Use Case: The standard loss for regression tasks.
2. Mean Absolute Error (MAE)
• Formula: MAE = (1/n) Σ |y_i − ŷ_i|
• Use Case: Used for regression tasks where the magnitude of error is important.
• Advantages:
o Less sensitive to outliers compared to MSE.
• Disadvantages:
o Does not penalize large errors as much as MSE.
3. Hinge Loss
• Formula: L = max(0, 1 − y · f(x)), where f(x) is the raw classifier score and y ∈ {−1, +1}
• Use Case: Used for maximum-margin binary classification, notably in support vector machines (SVMs).
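A small NumPy sketch (with illustrative toy values) computing the three losses above:

import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 4.0])

mse = np.mean((y_true - y_pred) ** 2)   # quadratic penalty on errors
mae = np.mean(np.abs(y_true - y_pred))  # linear penalty, robust to outliers
print(mse, mae)  # 0.5  0.666...

# Hinge loss expects labels in {-1, +1} and raw classifier scores
labels = np.array([1.0, -1.0, 1.0])
scores = np.array([0.8, 0.3, 2.0])
print(np.mean(np.maximum(0.0, 1.0 - labels * scores)))  # 0.5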
5. Kullback-Leibler (KL) Divergence
• Formula: D_KL(P ‖ Q) = Σ P(i) · log(P(i) / Q(i))
• Use Case: Measures how one probability distribution P diverges from a reference distribution Q (e.g., in variational autoencoders and knowledge distillation).
• Disadvantages:
o Only applicable when both distributions are valid probability distributions.
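A short sketch of KL divergence between two discrete distributions (toy probabilities assumed for illustration):

import numpy as np

p = np.array([0.4, 0.6])  # true distribution P
q = np.array([0.5, 0.5])  # approximating distribution Q
print(np.sum(p * np.log(p / q)))  # D_KL(P || Q) ≈ 0.020; zero only when P = Q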
6. Sparse Categorical Cross-Entropy
• Formula: L = −(1/n) Σ log(ŷ_i[c_i]), where c_i is the integer class index for sample i and ŷ_i[c_i] is the predicted probability of that class.
• Use Case: Used for multi-class classification problems when the target labels are
integers.
• Advantages:
o Handles multi-class classification without requiring one-hot encoding of
labels.
• Disadvantages:
o Less suited to binary classification tasks, where binary cross-entropy is typically preferred.
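A minimal sketch of sparse categorical cross-entropy with integer labels (toy predictions assumed for illustration):

import numpy as np

# Predicted class probabilities for 2 samples over 3 classes
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
y_true = np.array([0, 1])  # integer class indices; no one-hot encoding needed

# Take -log of the probability assigned to each sample's true class
loss = -np.mean(np.log(y_pred[np.arange(len(y_true)), y_true]))
print(loss)  # mean of -log(0.7) and -log(0.8) ≈ 0.290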
Q. Explain backpropagation in detail.
Ans: Backpropagation (short for "backward propagation of errors") is a fundamental algorithm
in training artificial neural networks. It enables the network to learn by adjusting its weights
and biases to minimize error in predictions. Below is a detailed explanation of how
backpropagation works:
1. Key Components
To understand backpropagation, it's essential to know its core components:
• Neural Network Structure:
o Input Layer: Takes input data (e.g., features of an image).
o Hidden Layers: Process inputs using weights, biases, and activation
functions.
o Output Layer: Produces predictions.
• Weights and Biases: Parameters of the network that are adjusted during
learning.
• Loss Function: Measures the difference between the predicted output and the
actual target value (e.g., Mean Squared Error, Cross-Entropy).
• Learning Rate: A small positive value that controls the step size during
optimization.
2. The Backpropagation Process
The process involves two main steps: forward propagation and backward propagation.
Step 1: Forward Propagation
• The input data passes through the network layer by layer.
• For each neuron in the hidden and output layers:
o Compute the weighted sum of inputs and add the bias (b):
▪ z = W·x + b
o Apply the activation function to obtain the neuron's output:
▪ a = f(z)
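The following minimal NumPy sketch (an illustrative toy network, not from the original notes) implements forward propagation as described above, together with the backward pass that computes gradients via the chain rule and applies the learning-rate update:

import numpy as np

np.random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 features, 1 target each
X = np.random.randn(4, 2)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Parameters: one hidden layer (3 neurons) and an output layer
W1, b1 = np.random.randn(2, 3), np.zeros((1, 3))
W2, b2 = np.random.randn(3, 1), np.zeros((1, 1))
lr = 0.1  # learning rate

# Forward propagation: z = XW + b, then a = f(z), layer by layer
z1 = X @ W1 + b1
a1 = sigmoid(z1)
z2 = a1 @ W2 + b2
y_hat = sigmoid(z2)
loss = np.mean((y - y_hat) ** 2)  # MSE loss

# Backward propagation: chain rule from the loss back to each parameter
d_yhat = 2 * (y_hat - y) / len(y)    # dL/d(y_hat)
d_z2 = d_yhat * y_hat * (1 - y_hat)  # through the output sigmoid
d_W2, d_b2 = a1.T @ d_z2, d_z2.sum(axis=0, keepdims=True)
d_a1 = d_z2 @ W2.T
d_z1 = d_a1 * a1 * (1 - a1)          # through the hidden-layer sigmoid
d_W1, d_b1 = X.T @ d_z1, d_z1.sum(axis=0, keepdims=True)

# Gradient descent update scaled by the learning rate
W1 -= lr * d_W1; b1 -= lr * d_b1
W2 -= lr * d_W2; b2 -= lr * d_b2
print("loss:", loss)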