Tutorial 3


Q. Explain learning rate in context of gradient descent.

Ans: The learning rate in gradient descent is a hyperparameter that controls the step size at
each iteration while moving toward the minimum of the loss function. It determines how
much the model's parameters (weights and biases) are adjusted during training.
Role of Learning Rate
• Small Learning Rate:
o Leads to smaller updates.
o Ensures more precise convergence but makes the training slower.
o May get stuck in local minima or saddle points.
• Large Learning Rate:
o Leads to larger updates.
o Speeds up training but risks overshooting the minimum or causing
instability (oscillations around the optimal value).
Learning Rate in Gradient Descent Update Rule
In gradient descent, weights (w) are updated using:

w ← w − η · ∂L/∂w

where η is the learning rate and ∂L/∂w is the gradient of the loss function with respect to the weights.


Finding the Right Learning Rate
Choosing an appropriate learning rate is crucial for effective training. Common strategies
include:
• Fixed Learning Rate: Remains constant throughout training.
• Learning Rate Schedulers: Adjust η dynamically (e.g., decay it over epochs).
• Adaptive Methods: Algorithms like Adam or RMSProp adapt the learning rate
per parameter.
A balance between too small and too large learning rates ensures efficient convergence to the
loss function's minimum.
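
A minimal sketch of the update rule in plain Python may help make the role of η concrete; the quadratic loss L(w) = (w − 3)^2, the starting point, and the step count are arbitrary choices for illustration.

```python
# Gradient descent on the illustrative loss L(w) = (w - 3)^2; eta sets the step size.
def gradient(w):
    return 2 * (w - 3)          # dL/dw for this quadratic loss

eta = 0.1                       # learning rate (hyperparameter)
w = 0.0                         # initial weight

for step in range(50):
    w = w - eta * gradient(w)   # update rule: w <- w - eta * dL/dw

print(w)                        # converges toward 3, the minimiser of the loss
```

With a much larger η (e.g. 1.5) the same loop overshoots and diverges instead of converging, which illustrates the trade-off described above.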

Q. Explain linear perceptron in detail.


Ans: A linear perceptron is a simple neural network model used for binary classification. It
computes a weighted sum of inputs and applies a step activation function
to produce an output of 0 or 1. The perceptron adjusts weights and biases during training
using a simple learning rule to minimize classification errors.
Key Features:
1. Weights and Bias:
o Weights determine the importance of inputs.
o Bias helps adjust the decision boundary.
2. Activation Function:
o A step function outputs 1 if z ≥0 and 0 otherwise.
3. Training:
o Weights are updated using the perceptron learning rule:

w ← w + η (t − y) x

where t is the true label, y is the perceptron's output, x is the input, and η is the learning rate.

Advantages:
1. Simplicity: Easy to implement and understand.
2. Fast Training: Efficient for small and linearly separable datasets.
3. Foundation for Neural Networks: Forms the basis for more complex
architectures like multi-layer perceptrons.
Disadvantages:
1. Linear Separability: Can only solve problems where data is linearly separable.
2. No Probabilistic Outputs: Outputs are strictly 0 or 1, with no confidence
scores.
3. Limited Flexibility: Cannot handle complex, non-linear relationships.
The perceptron is best suited for simple tasks and serves as a stepping stone for understanding
advanced models.
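
As an illustration, the learning rule above can be applied to a small linearly separable problem such as the AND function; the dataset, learning rate, and epoch count below are arbitrary choices for this sketch.

```python
import numpy as np

# Perceptron trained on the AND function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])              # true labels

w = np.zeros(2)                         # weights
b = 0.0                                 # bias
eta = 0.1                               # learning rate

for epoch in range(10):
    for xi, ti in zip(X, t):
        z = np.dot(w, xi) + b           # weighted sum
        y = 1 if z >= 0 else 0          # step activation
        w = w + eta * (ti - y) * xi     # perceptron learning rule
        b = b + eta * (ti - y)

print(w, b)                             # parameters of the learned decision boundary
```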

Q. What is activation function? Explain its characteristics.


Ans: An activation function is a mathematical function applied to the output of a neuron in a
neural network. It introduces non-linearity to the network, enabling it to learn and represent
complex patterns. Without activation functions, a neural network would behave like a simple
linear model, regardless of its depth.
Characteristics of Activation Functions:
1. Non-Linearity: Helps the network model complex data relationships.
2. Differentiability: Must be differentiable for backpropagation.
3. Range: Defines the output range (e.g., [0,1], [−1,1]).
4. Monotonicity: Ensures consistent gradient directions for stable optimization.
5. Efficiency: Should be computationally inexpensive.
6. Gradient Stability: Prevents vanishing or exploding gradients during training.
Common Activation Functions and Their Characteristics
1. Sigmoid:
o Formula: σ(x) = 1 / (1 + e^(−x))
o Range: (0,1)
o Characteristics: Smooth, but can cause vanishing gradients.
2. ReLU (Rectified Linear Unit):
o Formula: f(x) = max(0, x)
o Range: [0, ∞)
o Characteristics: Efficient, avoids vanishing gradients, but can suffer from
"dead neurons."
3. Tanh:
o Formula: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
o Range: (−1,1)
o Characteristics: Centred around zero; can still suffer from vanishing
gradients.
4. Softmax:
o Formula: softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
o Range: (0,1) (probabilities sum to 1)
o Characteristics: Used for multi-class classification.
5. Leaky ReLU:
o Formula: f(x) = x if x > 0, else αx (with a small slope α, e.g. 0.01)
o Range: (−∞,∞)
o Characteristics: Addresses the dead neuron issue by allowing small
negative slopes.
6. Swish:
o Formula: f(x) = x · σ(x) = x / (1 + e^(−x))
o Range: approximately [−0.28, ∞)
o Characteristics: Smooth, improves gradient flow and training.
Role of Activation Functions
• Introduce Non-Linearity: Allows networks to model complex data relationships.
• Control Output: Adjusts the range and form of the output for different tasks.
• Facilitate Optimization: Helps in learning by enabling effective
backpropagation.
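
For reference, the functions above can be written directly in NumPy; these are illustrative definitions rather than the implementations used inside any particular framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))         # range (0, 1)

def relu(x):
    return np.maximum(0.0, x)               # range [0, inf)

def tanh(x):
    return np.tanh(x)                       # range (-1, 1)

def softmax(x):
    e = np.exp(x - np.max(x))               # subtract max for numerical stability
    return e / e.sum()                      # outputs sum to 1

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)    # small slope for negative inputs

def swish(x):
    return x * sigmoid(x)                   # smooth, non-monotonic

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), softmax(x), swish(x))
```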

Q. Explain the concept of tensor in tensorflow.


Ans: In TensorFlow, a tensor is the central data structure used to represent data. Tensors are
multidimensional arrays that are generalized versions of scalars, vectors, and matrices. They
enable TensorFlow to efficiently perform numerical computations, particularly for machine
learning and deep learning tasks.
Key Concepts of Tensors
1. Rank (Order):
o Refers to the number of dimensions in the tensor.
o Examples:
▪ Scalar: Rank 0 (e.g., 3).
▪ Vector: Rank 1 (e.g., [3,4,5]).
▪ Matrix: Rank 2 (e.g., [[1,2],[3,4]]).
▪ Higher ranks: [[[...]]], etc.
2. Shape:
o Describes the number of elements along each dimension.
o Example: A tensor with shape (3,2) has 3 rows and 2 columns.
3. Data Type (dtype):
o Specifies the type of data stored in the tensor (e.g., float32, int32, string).
4. Immutability:
o In TensorFlow, tensors are immutable; their values cannot change after
creation.
Tensors in TensorFlow
Tensors in TensorFlow are created using functions like:
• Constant Tensor: tf.constant([1, 2, 3])
• Variable Tensor: tf.Variable([[1.0, 2.0], [3.0, 4.0]])
• Placeholder (for older TF versions): Used to feed data dynamically during
execution.
Operations on Tensors
TensorFlow supports a wide variety of tensor operations such as:
• Arithmetic operations: Addition, subtraction, multiplication, etc.
• Reshaping: Changing the shape without altering the data.
• Slicing: Extracting subsets of data.
• Broadcasting: Automatic expansion of tensors for compatible operations.
Importance of Tensors in TensorFlow
• Core Data Structure: Tensors represent all inputs, outputs, and computations.
• Parallel Processing: Optimized for GPUs and TPUs for high-performance
computing.
• Flexibility: Tensors support various dimensions, data types, and operations.
Tensors enable TensorFlow to model and execute complex numerical computations in an
efficient and scalable way.
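
A short TensorFlow 2.x sketch of these ideas (rank, shape, dtype, and a few operations); the specific values used are arbitrary.

```python
import tensorflow as tf

scalar = tf.constant(3)                     # rank 0
vector = tf.constant([3, 4, 5])             # rank 1
matrix = tf.constant([[1, 2], [3, 4]])      # rank 2, shape (2, 2)
print(matrix.shape, matrix.dtype)           # (2, 2) <dtype: 'int32'>

weights = tf.Variable([[1.0, 2.0], [3.0, 4.0]])   # mutable tensor for parameters

summed = matrix + 10                        # broadcasting a scalar over the matrix
reshaped = tf.reshape(matrix, [4])          # reshaping without changing the data
sliced = matrix[:, 0]                       # slicing: first column
print(summed.numpy(), reshaped.numpy(), sliced.numpy())
```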

Q. Explain difference between tensorflow 1.0 and tensorflow 2.0.


Ans: Key Differences Between TensorFlow 1.0 and TensorFlow 2.0

• Eager Execution: TensorFlow 1.0 used static computation graphs, requiring sessions for execution; TensorFlow 2.0 enables eager execution by default, allowing operations to run immediately like regular Python code.
• Keras Integration: In TensorFlow 1.0, Keras was a separate library; TensorFlow 2.0 fully integrates it as tf.keras for easier model building.
• Simpler Syntax: TensorFlow 1.0 required placeholders and verbose session management; TensorFlow 2.0 eliminates placeholders and sessions, simplifying the workflow.
• Backward Compatibility: TensorFlow 1.0 had no built-in support for future changes; TensorFlow 2.0 provides tf.compat.v1 to support legacy code.
• Debugging: Debugging in TensorFlow 1.0 was difficult due to the static computation graph; TensorFlow 2.0 makes debugging easier with eager execution and dynamic operations.
• API Consistency: TensorFlow 1.0 had inconsistent APIs across modules; TensorFlow 2.0 offers unified and consistent APIs for better usability.
• Sessions: TensorFlow 1.0 required explicit sessions to execute the computation graph; TensorFlow 2.0 removes sessions, with eager execution handling computation dynamically.
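
The contrast is easiest to see side by side; the TensorFlow 1.x snippet below is commented out because it only runs on 1.x (or via tf.compat.v1), while the 2.x version runs as ordinary Python.

```python
# TensorFlow 1.x style: static graph, placeholders, explicit session.
# import tensorflow as tf
# a = tf.placeholder(tf.float32)
# b = tf.placeholder(tf.float32)
# c = a * b
# with tf.Session() as sess:
#     print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))

# TensorFlow 2.x style: eager execution by default, no sessions or placeholders.
import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)
print((a * b).numpy())   # 6.0, computed immediately
```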

Q. Explain difference between constants, variables and placeholders.


Ans: In TensorFlow, constants, variables, and placeholders are used to represent and manage
data in computation graphs. Here's how they differ:
• Definition: Constants are fixed values that do not change during execution. Variables are values that can be updated or modified during training or execution. Placeholders are tensors that act as inputs to the computation graph, with values fed at runtime.
• Usage: Constants hold static data that remains constant, like configuration values or unchanging inputs. Variables hold trainable parameters like weights and biases in machine learning models. Placeholders are used for dynamically feeding data during training or inference.
• Mutability: Constants are immutable; variables are mutable; placeholders require values to be fed at runtime.
• Initialization: Constants have a predefined value; variables must be initialized; placeholders need no initialization.
• TensorFlow 2.0: Constants and variables are fully supported; placeholders are replaced by tf.function and eager execution.
• Example: tf.constant(5) (a constant tensor with value 5); tf.Variable(initial_value=1.0) (a variable with an initial value of 1.0); tf.placeholder(dtype=tf.float32, shape=[None, 3]) (a placeholder for a batch of 3-dimensional inputs).
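
A small runnable sketch of the same distinction in TensorFlow 2.x; the tf.function example stands in for the placeholder pattern, since placeholders themselves no longer exist in 2.x.

```python
import tensorflow as tf

c = tf.constant(5)            # constant: immutable after creation

v = tf.Variable(1.0)          # variable: mutable, typically a trainable parameter
v.assign_add(0.5)             # allowed: variables can be updated in place
print(v.numpy())              # 1.5

# In TF 2.x, data is passed directly to (optionally graph-compiled) functions
# instead of being fed through placeholders.
@tf.function
def scale(batch):
    return batch * 2.0

print(scale(tf.constant([[1.0, 2.0, 3.0]])).numpy())
```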

Q. Explain the concept of computation graph and its advantages.


Ans: A computation graph is a graphical representation of mathematical operations in
TensorFlow. It consists of nodes, which represent operations, and edges, which represent the
data (tensors) flowing between operations. TensorFlow uses this graph to define, optimize, and
execute computations efficiently.
Concept of Computation Graph
1. Nodes: Represent operations like addition, multiplication, or activation
functions.
2. Edges: Represent tensors (data) passed between nodes.
3. Directed Acyclic Graph (DAG): The graph has a direction (data flows from
inputs to outputs) and no cycles.
Example: If you define z = x^2 + y, the computation graph looks like:
• Node 1: Square operation on x.
• Node 2: Add operation combining x^2 and y.
Advantages of Computation Graphs
1. Optimized Execution:
o TensorFlow can optimize computations, like reusing intermediate results
or parallelizing operations.
2. Flexibility:
o Separate graph definition and execution allow deployment on various
devices (CPU, GPU, TPU).
3. Portability:
o Computation graphs can be serialized and executed on different
platforms (e.g., mobile, servers).
4. Support for Distributed Computing:
o Allows splitting computations across multiple devices or machines.
5. Visualization:
o Graphs can be visualized (e.g., using TensorBoard) for debugging and
understanding the model structure.
6. Memory Management:
o TensorFlow manages memory usage efficiently by constructing the graph
before execution.
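
In TensorFlow 2.x the same z = x^2 + y example can be traced into a graph with tf.function; the function name and input values below are illustrative.

```python
import tensorflow as tf

@tf.function
def compute(x, y):
    return tf.square(x) + y           # node 1: square, node 2: add

z = compute(tf.constant(2.0), tf.constant(3.0))
print(z.numpy())                      # 7.0

# Inspect the traced graph's operations (the nodes of the DAG).
concrete = compute.get_concrete_function(tf.constant(2.0), tf.constant(3.0))
print([op.name for op in concrete.graph.get_operations()])
```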
Q. Explain session and fetches in computation graph.
Ans: In the context of computation graphs, especially in frameworks like TensorFlow, sessions
and fetches play a crucial role in executing operations and retrieving results. Let's break down
these concepts:
Session:
• A session is an environment in which operations (nodes) in the computation
graph are executed.
• In TensorFlow, for instance, a tf.Session object encapsulates the environment
and control of the execution of the computation graph.
How Sessions Work:
1. Graph Definition: First, you define the computation graph.
2. Create a Session: Instantiate a session to run the graph.
3. Run Operations: Execute the operations within the session.

Fetches:
• Fetches refer to the process of retrieving the output(s) of one or more
operations from the computation graph.
• You specify the nodes (operations) to fetch the results from when running a
session.
How Fetches Work:
1. Specify Fetches: While running the session, you can specify the operations
whose results you want to fetch.
2. Retrieve Outputs: The session returns the results of these specified operations.
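
A TensorFlow 1.x-style sketch (run here through tf.compat.v1) showing a session executing the graph and a list of fetches returning several node values at once; the values are arbitrary.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()          # restore TF 1.x graph/session behaviour

x = tf.constant(2.0)
y = tf.constant(3.0)
z = tf.square(x) + y                  # graph definition

with tf.Session() as sess:            # create a session
    z_val, x_val = sess.run([z, x])   # fetches: the nodes whose values we want back
    print(z_val, x_val)               # 7.0 2.0
```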
