Deep Learning Interview Questions
Introduction
Are you planning to sit for deep learning interviews? Have you perhaps
already taken the first step, applied, and sat through the ordeal of several
rounds of interviews for a deep learning role and not made the cut?
These are questions every deep learning enthusiast, fresher, and even
expert with the best deep learning course under their belt has asked
themselves at some point.
That was a key reason behind penning down this article, a comprehensive
list of the popular deep learning interview questions and answers. But let
me expand on that a bit more.
Note: Make sure you check out the popular Fundamentals of Deep
Learning course if you harbor any deep learning career ambitions!
We are now seeing a tectonic shift in the industry. As the need
for more complex solutions grows around the world, organizations are
turning to deep learning frameworks. Advances in computer vision and
natural language processing (NLP) have created a need to adopt deep
learning or stay behind the curve.
The demand for deep learning folks is growing every month! This is a
great time to polish your skills and start climbing the deep learning hill.
Also, this does not mean that your interview will be free of Machine
Learning questions. Several concepts are common to both fields and are
extremely crucial for you to know. These include topics like:
Evaluation Metrics
Gradient Descent
Bias vs. Variance (or Underfitting vs. Overfitting)
Cross-validation, etc.
So, here is a definitive interview guide that covers all these topics in detail,
in the form of MCQs and long-form resources: The Most Comprehensive
Data Science & Machine Learning Interview Guide You’ll Ever Need. If
you’re looking for a structured and granular guide including tips, tricks
and case studies on how to crack interviews, I highly recommend taking
the Ace Data Science Interviews course.
The questions are divided into three levels of difficulty:
Beginner
Intermediate
Advanced
There’s something here for everyone! So get your pen and paper ready,
strap in, and prepare to learn.
Your answers to these questions need not be too detailed but do keep in
mind that the interviewer might recall your answer while asking more
advanced questions later.
Essentially, you can have a different bias value at each layer, or a
different bias value at each neuron. However, it is best to have a bias
matrix covering all the neurons in the hidden layers.
A point to note is that these two strategies can give you very different
results.
Step 1: Calculate the sum of all the inputs (X) according to their weights
and include the bias term:
Z = (weights * X) + bias
Step 2: Apply an activation function to this sum:
Y = Activation(Z)
Steps 1 and 2 are performed at each layer. If you recollect, this is nothing
but forward propagation! Now, what if there is no activation function?
Y = Z = (weights * X) + bias
Wait – isn’t this just a simple linear equation? Yes – and that is why we
need activation functions. A linear equation will not be able to capture the
complex patterns in the data – this is even more evident in the case of
deep learning problems.
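To make this concrete, here is a small sketch (the layer sizes and values are made up for illustration) showing that two stacked layers with no activation in between collapse into one equivalent linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation in between (shapes are made up)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

h = W1 @ x + b1          # "layer 1"
y = W2 @ h + b2          # "layer 2"

# The same result from one equivalent linear layer:
W_eq, b_eq = W2 @ W1, W2 @ b1 + b2
print(np.allclose(y, W_eq @ x + b_eq))   # True: depth adds nothing without a non-linearity
```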
In simplest terms, if all the neurons have the same value of weights, each
hidden unit will get exactly the same signal. While this might work during
forward propagation, the derivative of the cost function during backward
propagation will also be identical for every unit, so every weight receives
the same update and the symmetry never breaks.
Therefore, if all weights have the same initial value, the network
effectively behaves like a much smaller model, which leads to underfitting.
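As a quick illustration, here is a sketch (all sizes and values are made up) of a one-hidden-layer network initialized with identical weights; the activations and the gradient rows all come out the same, so the units can never differentiate:

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # one input sample (values are made up)
W1 = np.full((4, 3), 0.1)        # 4 hidden units, all initialized identically
w2 = np.full(4, 0.1)             # identical output weights
y_true = 1.0

h = sigmoid(W1 @ x)              # forward: every hidden unit computes the same value
y = w2 @ h

dy = y - y_true                             # gradient of squared-error loss w.r.t. y
dW1 = np.outer(dy * w2 * h * (1 - h), x)    # chain rule back to the first layer

print(h)     # all 4 activations are identical
print(dW1)   # all 4 rows are identical -> every unit gets the same update, forever
```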
Now, this can be a tricky question. There is a common misconception that
deep learning can only solve unsupervised learning problems. This is not
the case. Some examples of supervised learning problems solved with
deep learning include:
Image classification
Text classification
Sequence tagging
And so on.
These factors can affect your decision to a greater or lesser extent. For
example, if it is raining outside, then you cannot go out to play at all. Or,
if you have only one bat, you can still share it while playing. The
magnitude by which each of these factors can affect the game is called the
weight of that factor.
Factors like the weather or temperature might have a higher weight, and
other factors like equipment would have a lower weight.
However, does this mean that we can play a cricket match with only one
bat? No – we would need 1 ball and 6 wickets as well. This is where bias
comes into the picture. Bias lets you assign some threshold which helps
you activate a decision-point (or a neuron) only when that threshold is
crossed.
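If it helps, the analogy maps directly onto a single neuron. In the sketch below, all the factor values, weights, and the threshold are made up for illustration:

```python
import numpy as np

# Hypothetical factors for the cricket analogy (all values are made up)
factors = np.array([1.0, 0.8, 1.0])   # weather is fine, equipment partly there, friends free
weights = np.array([3.0, 1.0, 2.0])   # weather matters most, equipment least
bias = -4.5                           # the threshold the evidence must cross

decision = weights @ factors + bias   # 3.0 + 0.8 + 2.0 - 4.5 = 1.3
print("play" if decision > 0 else "stay home")   # play
```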
Now, this can be answered in two ways. If you are on a phone interview,
you cannot perform all the calculus in writing and show the interviewer. In
such cases, it is best to explain it as such:
Forward propagation: compute Z = (weights * X) + bias and Y = Activation(Z)
at each layer, passing each layer's output forward as the next layer's input.
Backpropagation: starting from the loss, apply the chain rule layer by
layer to obtain the gradient of the loss with respect to every weight and
bias, and update them in the direction that reduces the loss.
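If you do have a board or screen, a compact sketch like this one (a single sigmoid layer with a squared-error loss; the shapes, data, and learning rate are all illustrative) shows both passes end to end:

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=3)                 # one sample with 3 features (made up)
y_true = 0.0
W, b = rng.normal(size=(1, 3)), np.zeros(1)

# Forward propagation: Z = (weights * X) + bias, Y = Activation(Z)
Z = W @ X + b
Y = sigmoid(Z)
loss = 0.5 * (Y - y_true) ** 2

# Backpropagation: chain rule from the loss back to W and b
dZ = (Y - y_true) * Y * (1 - Y)        # dLoss/dZ through the sigmoid
dW = np.outer(dZ, X)                   # dLoss/dW
db = dZ                                # dLoss/db

# Gradient descent update
lr = 0.1
W -= lr * dW
b -= lr * db
```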
Deep Learning goes right from the simplest data structures like lists to
complicated ones like computation graphs.
1. It will let the interviewer know that you have practical experience as
well
2. Since you are talking about projects that you have implemented, it is
much easier and more comfortable to talk about your own work
Here, I have given an overview of the key concepts in the questions – you
can always customize your answers to add more about your experiences
with some of these deep learning algorithms and techniques.
Once the interviewer has asked you about the fundamentals of deep
learning architectures, they would move on to the key topic of improving
your deep learning model’s performance.
This is the idea behind batch normalization: we basically normalize a[1]
and a[2] here. That is, we normalize the inputs to the layer, and then
apply the activation function to the normalized inputs.
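Here is a minimal sketch of that normalization step (the batch of pre-activations is made up, and the learnable gamma and beta parameters default to the identity):

```python
import numpy as np

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of layer inputs, then scale and shift
    with the learnable parameters gamma and beta."""
    z_hat = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
    return gamma * z_hat + beta

# A made-up batch: 4 samples x 2 units, on very different scales
z = np.array([[1.0, 50.0], [2.0, 60.0], [3.0, 55.0], [4.0, 65.0]])
z_norm = batch_norm(z)

a = np.maximum(z_norm, 0)   # activation (ReLU) applied to the *normalized* inputs
print(z_norm.mean(axis=0))  # ~0 per unit
print(z_norm.std(axis=0))   # ~1 per unit
```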
Sigmoid
Tanh
ReLU
Softmax
While it is not important to know all the activation functions, you can
always score points by knowing the range of these functions and how
they are used. Here is a handy table for you to follow:

Activation | Range | Typical use
Sigmoid | (0, 1) | Output layer for binary classification
Tanh | (-1, 1) | Hidden layers (zero-centered)
ReLU | [0, ∞) | Default choice for hidden layers
Softmax | (0, 1), outputs sum to 1 | Output layer for multi-class classification
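If you are asked to write them out, minimal reference implementations might look like this (the max-subtraction in softmax is a standard numerical-stability trick):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # range (0, 1)

def tanh(z):
    return np.tanh(z)             # range (-1, 1)

def relu(z):
    return np.maximum(z, 0)       # range [0, inf)

def softmax(z):
    e = np.exp(z - z.max())       # subtract max for numerical stability
    return e / e.sum()            # outputs in (0, 1) and sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")
```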
Here is a great guide on how to use these and other activation functions:
Fundamentals of Deep Learning – Activation Functions and When to Use
Them?
The key to this question lies in the Convolution operation. Unlike humans,
the machine sees the image as a matrix of pixel values. Instead of
interpreting a shape like a petal or an ear, it just identifies curves and
edges.
Thus, instead of looking at the entire image at once, it helps to read the
image in parts. For a 300 x 300 pixel image, this means sliding a small
filter (say, 3 x 3) across the matrix and processing one patch at a time.
This is convolution:
Z = X * f
where X is the input matrix and f is the filter.
If you have a board/screen in front of you, you can always illustrate this
with a simple example:
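For instance, here is a bare-bones sketch of a "valid" convolution in NumPy (the 5 x 5 input and the vertical-edge filter are arbitrary examples):

```python
import numpy as np

def convolve2d(X, f):
    """Bare-bones 'valid' convolution: slide filter f over input X."""
    n, k = X.shape[0], f.shape[0]
    out = np.zeros((n - k + 1, n - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i + k, j:j + k] * f)
    return out

X = np.arange(25, dtype=float).reshape(5, 5)                     # a made-up 5 x 5 "image"
f = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)  # vertical-edge filter

Z = convolve2d(X, f)
print(Z.shape)   # (3, 3): matches (n - f + 1) x (n - f + 1)
```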
Thus, to keep the output size the same as the input size, we make use of
padding – adding 0s all around the input matrix so that, for example, a
5 x 5 input grows to at least 7 x 7. With padding p, the output size is
given by the formula (n + 2p – f + 1) X (n + 2p – f + 1). The two common
padding strategies are:
Valid Padding: When we do not use any padding. The resultant matrix
after convolution will have dimensions (n – f + 1) X (n – f + 1)
Same padding: Adding padded elements all around the edges such
that the output matrix will have the same dimensions as that of the
input matrix
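Both cases are easy to check with a small helper implementing the standard output-size formula (stride is included for completeness):

```python
def conv_output_size(n, f, p=0, s=1):
    """Standard output-size formula: n = input size, f = filter size,
    p = padding, s = stride."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(5, 3))        # 3 -> valid padding: (n - f + 1)
print(conv_output_size(5, 3, p=1))   # 5 -> same padding: output matches the input
```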
However, at times, the steps become too large and this results in larger
updates to weights and bias terms – so much so as to cause an overflow
(or a NaN) value in the weights. This leads to an unstable algorithm and is
called an exploding gradient.
On the other hand, if the steps are too small, the changes to the weights
and bias terms are minimal – even negligible at times. We thus might end
up training a deep learning model with almost the same weights and
biases each time and never reach the minimum of the error function. This
is called the vanishing gradient.
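A back-of-the-envelope sketch of both effects: backpropagation multiplies one derivative factor per layer, so the product shrinks or blows up exponentially with depth (the 0.25 bound is the sigmoid's maximum derivative; the 1.5 per-layer factor is an arbitrary example of the exploding case):

```python
# Backprop multiplies one derivative factor per layer, so the product
# shrinks or blows up exponentially with depth.
sigmoid_grad_max = 0.25          # the sigmoid's derivative never exceeds 0.25

for depth in (5, 20, 50):
    print(f"{depth} layers -> gradient factor <= {sigmoid_grad_max ** depth:.2e}")

factor = 1.5                     # an arbitrary per-layer factor above 1
print(f"50 layers of x{factor} -> {factor ** 50:.2e}")   # explodes
```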
The use of transfer learning has been one of the key milestones in deep
learning. Training a large model on a huge dataset, and then using the
final parameters on smaller, simpler datasets, has led to defining
breakthroughs in the form of pretrained models. Be it Computer Vision or
NLP, pretrained models have become the norm in research and in the
industry.
Some popular examples include BERT, ResNet, GPT-2, and VGG-16,
among many others.
It is here that you can earn brownie points by pointing out specific
examples/projects where you used these models and how you used them
as well.
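A typical sketch of this workflow with torchvision (the weights argument assumes a recent torchvision release, and the 10-class task is hypothetical):

```python
import torch.nn as nn
from torchvision import models

# Load ResNet-18 with ImageNet weights (argument assumes torchvision >= 0.13)
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained backbone...
for param in model.parameters():
    param.requires_grad = False

# ...and swap in a new head for a hypothetical 10-class task;
# only this layer's parameters will be trained.
model.fc = nn.Linear(model.fc.in_features, 10)
```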
It is not possible to discuss all of them, so here are a few resources to get
started:
This loop essentially includes a time component in the network. This
helps in capturing sequential information from the data, which would not
be possible with a generic artificial neural network.
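In code, the loop is simply the hidden state feeding back into the next time step. A minimal sketch with made-up sizes and inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = 0.1 * rng.normal(size=(4, 3))   # input -> hidden (sizes are made up)
W_h = 0.1 * rng.normal(size=(4, 4))   # hidden -> hidden: this is the "loop"
b = np.zeros(4)

h = np.zeros(4)                       # initial hidden state
sequence = rng.normal(size=(5, 3))    # 5 time steps, 3 features each (made up)

for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h + b)   # each step sees the previous state

print(h)   # the final state summarizes the whole sequence
```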
As you can see, the LSTM model can become quite complex. To retain the
ability to carry information across time without such a complex model, we
use GRUs. A GRU uses only two gates (reset and update) compared to the
LSTM's three (input, forget, and output).
It is this reduction in the number of gates that makes the GRU less
complex and faster than the LSTM. You can learn about GRUs, LSTMs, and
other sequence models in detail here: Must-Read Tutorial to Learn
Sequence Modeling & Attention Models.
Now, this is one question that is sure to be asked even if none of the
above ones is asked in your deep learning interview. I have included it in
the advanced section since you might be grilled on each and every part of
the code you have written. So make sure you:
have your GitHub code updated with the latest changes you have made
are ready to give in-depth explanations of at least 2-3 projects where
you used deep learning
When you are asked such a question, it is best to give a small 30-second
pitch covering the:
problem statement
data you used and the framework (like PyTorch or TensorFlow)
any pretrained model you used or just the name of the basic model
you built upon
the value of the evaluation metric you achieved
After this, you can start going into detail about the model architecture,
what preprocessing steps you had to take, and how that changed the
data.
End Notes
This is a list of a few key questions you might come across in a deep
learning interview. I have tried to cover more generic topics rather than
go into the details of how deep learning is used in fields like NLP or
computer vision.
I would love to hear your own interview experiences. Share them with me
and the community in the comments section below.