Tutorial 3
Ans: The learning rate in gradient descent is a hyperparameter that controls the step size at
each iteration while moving toward the minimum of the loss function. It determines how
much the model's parameters (weights and biases) are adjusted during training.
Role of Learning Rate
• Small Learning Rate:
o Leads to smaller updates.
o Ensures more precise convergence but makes the training slower.
o May get stuck in local minima or saddle points.
• Large Learning Rate:
o Leads to larger updates.
o Speeds up training but risks overshooting the minimum or causing
instability (oscillations around the optimal value).
Learning Rate in Gradient Descent Update Rule
In gradient descent, the weights (w) are updated using:
w ← w + η (t − y) x
where t is the true label, y is the model's output, x is the input, and η is the learning rate.
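The following is a minimal sketch (not part of the tutorial) of how the learning rate controls step size in plain gradient descent; the quadratic loss L(w) = (w − 3)² and the η values are illustrative assumptions chosen to show slow convergence, good convergence, and divergence.

```python
import numpy as np

def gradient(w):
    return 2 * (w - 3)          # dL/dw for the toy loss L(w) = (w - 3)^2

def run_gradient_descent(eta, steps=25, w0=0.0):
    w = w0
    for _ in range(steps):
        w -= eta * gradient(w)  # update rule: w <- w - eta * dL/dw
    return w

print("small eta (0.01):", run_gradient_descent(0.01))  # slow, still far from the minimum at 3
print("moderate eta (0.1):", run_gradient_descent(0.1)) # converges close to 3
print("large eta (1.1):", run_gradient_descent(1.1))    # overshoots and diverges (oscillates)
```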
Advantages:
1. Simplicity: Easy to implement and understand.
2. Fast Training: Efficient for small and linearly separable datasets.
3. Foundation for Neural Networks: Forms the basis for more complex
architectures like multi-layer perceptrons.
Disadvantages:
1. Linear Separability: Can only solve problems where data is linearly separable.
2. No Probabilistic Outputs: Outputs are strictly 0 or 1, with no confidence
scores.
3. Limited Flexibility: Cannot handle complex, non-linear relationships.
The perceptron is best suited for simple tasks and serves as a stepping stone for understanding
advanced models.
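As a hedged illustration of the linear-separability limitation above, the sketch below trains a single perceptron with the rule w ← w + η (t − y) x on two toy problems; the dataset, learning rate, and epoch count are illustrative assumptions. It fits AND (linearly separable) but can never fit XOR.

```python
import numpy as np

def train_perceptron(X, T, eta=0.1, epochs=50):
    # Single perceptron with a hard threshold: output is strictly 0 or 1.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, T):
            y = 1 if np.dot(w, x) + b > 0 else 0
            w += eta * (t - y) * x   # perceptron update, scaled by the learning rate
            b += eta * (t - y)
    return [(1 if np.dot(w, x) + b > 0 else 0) for x in X]

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print("AND targets [0,0,0,1], predictions:", train_perceptron(X, np.array([0, 0, 0, 1])))
print("XOR targets [0,1,1,0], predictions:", train_perceptron(X, np.array([0, 1, 1, 0])))
```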
1. Sigmoid:
o Formula: σ(x) = 1 / (1 + e^(−x))
o Range: (0, 1)
o Characteristics: Smooth, but can cause vanishing gradients.
2. ReLU (Rectified Linear Unit):
o Formula: f(x) = max(0, x)
o Range: [0, ∞)
o Characteristics: Efficient, avoids vanishing gradients, but can suffer from
"dead neurons."
3. Tanh:
o Formula: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
o Range: (−1,1)
o Characteristics: Centred around zero; can still suffer from vanishing
gradients.
4. Softmax:
o Formula: softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
o Range: (0, 1), with all outputs summing to 1
o Characteristics: Converts raw scores into a probability distribution; typically used in the output layer for multi-class classification.
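A minimal NumPy sketch of the four activation functions listed above, evaluated on a small example vector (the input values are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # range (0, 1)

def relu(x):
    return np.maximum(0.0, x)         # range [0, inf)

def tanh(x):
    return np.tanh(x)                 # range (-1, 1)

def softmax(x):
    e = np.exp(x - np.max(x))         # shift inputs for numerical stability
    return e / e.sum()                # outputs in (0, 1) and sum to 1

z = np.array([-2.0, 0.0, 3.0])
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("tanh:   ", tanh(z))
print("softmax:", softmax(z), "sum =", softmax(z).sum())
```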
Role of Activation Functions
• Introduce Non-Linearity: Allows networks to model complex data relationships.
• Control Output: Adjusts the range and form of the output for different tasks.
• Facilitate Optimization: Helps in learning by enabling effective
backpropagation.
Fetches:
• Fetches refer to the process of retrieving the output(s) of one or more
operations from the computation graph.
• You specify the nodes (operations) to fetch the results from when running a
session.
How Fetches Work:
1. Specify Fetches: While running the session, you can specify the operations
whose results you want to fetch.
2. Retrieve Outputs: The session returns the results of these specified operations.
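The following is a minimal sketch of fetches using the TensorFlow 1.x-style session API; the small graph (two constants, an add node, and a multiply node) is an illustrative assumption, and tf.compat.v1 is used so the example also runs under TensorFlow 2.x.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

a = tf.constant(3.0)
b = tf.constant(4.0)
sum_op = tf.add(a, b)         # one node in the computation graph
prod_op = tf.multiply(a, b)   # another node

with tf.compat.v1.Session() as sess:
    # The list passed to run() is the "fetches": the session evaluates the graph
    # and returns the results of exactly these specified operations.
    sum_val, prod_val = sess.run([sum_op, prod_op])
    print(sum_val, prod_val)  # 7.0 12.0
```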