Common DS Interview Questions and Answers - 2
Logistic regression is one of the most popular machine learning models for solving a binary classification problem, that is, a problem where the output can take only one of two possible values. Its equation is given by

$$Y = \frac{1}{1 + e^{-(a + bX)}}$$

where X represents the feature variable, a and b are the coefficients, and Y is the target variable (the predicted probability). Usually, if the value of Y is greater than some threshold value, the input is labeled with class A; otherwise, it is labeled with class B.
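As a minimal sketch, the equation can be applied like this in Python (the coefficients a and b here are made up for illustration; in practice they are learned from data):

```python
import numpy as np

def logistic_predict(x, a, b, threshold=0.5):
    """Predict class A or class B for input x using illustrative
    coefficients a, b (normally learned, e.g., by maximum likelihood)."""
    y = 1.0 / (1.0 + np.exp(-(a + b * x)))  # sigmoid of the linear term
    return np.where(y > threshold, "A", "B"), y

labels, probs = logistic_predict(np.array([-2.0, 0.0, 3.0]), a=0.5, b=1.2)
print(labels, probs)
```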
When only one variable is being analyzed, through graphs like pie charts, the analysis is called univariate. When trends in two variables are compared, using graphs like scatter plots, the analysis is of the bivariate type. When more than two variables are considered for analysis, to understand their correlations, the analysis is termed multivariate.
To find the optimal value for k, one can use the elbow method or the silhouette
method.
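A minimal scikit-learn sketch of both methods (the synthetic data and the range of k values are assumptions for illustration): for the elbow method, look for the k where the inertia curve flattens; for the silhouette method, pick the k with the highest score.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data; the true number of clusters (4) is an assumption here.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia = within-cluster sum of squares; silhouette in [-1, 1]
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
```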
A feature vector is an ordered set of values describing the characteristics of one observation in a dataset. These vectors serve as the inputs to a machine learning model.
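For instance (with made-up height/weight/age features), each row of a NumPy array can serve as one observation's feature vector:

```python
import numpy as np

# Each row is a feature vector describing one observation,
# e.g. (height_cm, weight_kg, age_years) — illustrative features.
X = np.array([
    [170.0, 65.0, 29.0],
    [182.0, 80.0, 35.0],
])
print(X[0])  # the feature vector for the first observation
```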
19. How does the use of dropout work as a regulariser for deep
neural networks?
Dropout is a regularisation method for deep neural networks that effectively trains many different network architectures on a given dataset. While the network is being trained, a fraction of the units (nodes) in the network is randomly dropped at each update. This introduces noise into the network by compelling the remaining nodes within a layer to probabilistically take on more or less responsibility for the inputs. Because no unit can rely on specific other units always being present, dropout prevents units from co-adapting and makes the model more robust.
The dropout regularisation method proves most beneficial when the dataset is small and a deep neural network is likely to overfit during training. For large datasets, the added computational cost has to be considered, as it may outweigh the benefit of dropout regularisation.

Note that dropout randomly removes units, not entire layers, and because a different thinned network is sampled at each update, it typically slows convergence during training rather than speeding it up.
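A minimal PyTorch sketch of dropout between fully connected layers (the layer sizes and the dropout rate p=0.5 are illustrative assumptions):

```python
import torch
import torch.nn as nn

# A small network with dropout between fully connected layers.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of units during training
    nn.Linear(64, 2),
)

model.train()            # dropout is active in training mode
x = torch.randn(8, 20)
print(model(x).shape)    # torch.Size([8, 2])

model.eval()             # dropout is disabled at inference time
```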
The expected test MSE (mean squared error), for a given value $x_0$, can always be decomposed into the sum of three fundamental quantities: the variance of $\hat{f}(x_0)$, the squared bias of $\hat{f}(x_0)$, and the variance of the error term $\varepsilon$. That is,

$$E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon)$$

Here the notation $E[(y_0 - \hat{f}(x_0))^2]$ defines the expected test MSE and refers to the average test MSE one would obtain by repeatedly estimating $f$ on a large number of training sets and testing each fit at $x_0$. Also, $\hat{f}(x_0)$ refers to the output of the fitted ML model for the input $x_0$, and $\varepsilon$ is the irreducible error, the deviation of the observed value $y_0$ from the true value $f(x_0)$.
The equation above suggests that, to minimize the expected test error, we need to select a statistical learning method that simultaneously achieves low variance and low bias; good test set performance requires both. This is referred to as a trade-off because it is easy to obtain a method with extremely low bias but high variance (for instance, by drawing a curve that passes through every single training observation) or a method with very low variance but high bias (by fitting a horizontal line to the data). The challenge lies in finding a method for which both the variance and the squared bias are low.
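The decomposition can be checked numerically. The sketch below (with an assumed true function sin(x), noise level, and polynomial models of varying flexibility) repeatedly refits a model on fresh training sets and estimates the squared bias and variance of its prediction at a single test point $x_0$:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(x)                  # assumed true function

x0, sigma = 1.0, 0.3                  # test point and noise level (assumed)

def fit_and_predict(degree):
    """Fit a polynomial of the given degree to a fresh training set
    and return its prediction at x0."""
    x = rng.uniform(0, 3, 30)
    y = true_f(x) + rng.normal(0, sigma, 30)
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x0)

for degree in (1, 3, 9):              # low to high flexibility
    preds = np.array([fit_and_predict(degree) for _ in range(500)])
    bias2 = (preds.mean() - true_f(x0)) ** 2
    var = preds.var()
    # Expected test MSE at x0 ≈ bias² + variance + σ²
    print(degree, round(bias2, 4), round(var, 4), round(bias2 + var + sigma**2, 4))
```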
Interpolating the data means estimating a value that lies between two known values of a variable in the dataset. Extrapolating the data, on the other hand, means estimating a value that lies outside the known range of the variable.
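A small NumPy sketch of both (the known points and the linear trend are assumed for illustration):

```python
import numpy as np

x_known = np.array([0.0, 1.0, 2.0, 3.0])
y_known = np.array([0.0, 2.0, 4.0, 6.0])   # assumed linear trend y = 2x

# Interpolation: estimate y at a point inside the known range.
print(np.interp(1.5, x_known, y_known))    # 3.0

# Extrapolation: estimate y outside the known range, e.g. with a
# fitted line; riskier, since the trend may not continue.
a, b = np.polyfit(x_known, y_known, 1)
print(a * 5.0 + b)                          # 10.0
```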
26. Do gradient descent methods always converge to the same
point?
No, gradient descent methods do not always converge to the same point; on non-convex loss surfaces they may converge to different local minima. Where they end up depends a lot on the data one is dealing with, the initial parameter values, and the learning rate.
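This can be demonstrated on a simple non-convex function. The sketch below (the function, learning rate, and starting points are all chosen for illustration) runs plain gradient descent from two different initial values and reaches two different local minima:

```python
def f(x):
    return x**4 - 3 * x**2 + x        # non-convex: two local minima

def grad(x):
    return 4 * x**3 - 6 * x + 1       # derivative of f

def gradient_descent(x, lr=0.01, steps=1000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

for x0 in (-2.0, 2.0):
    x_final = gradient_descent(x0)
    print(f"start {x0:+.1f} -> x* = {x_final:.3f}, f(x*) = {f(x_final):.3f}")
```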
If an algorithm learns something from the training data so that the knowledge can be applied to the test data, it is referred to as supervised learning. If the algorithm does not learn anything beforehand, because there is no response variable or labeled training data, it is referred to as unsupervised learning.
In L2 regularization, the penalty term is the sum of the squares of the model coefficients, $\lambda \sum_j \beta_j^2$, while in L1 regularization it is the sum of their absolute values, $\lambda \sum_j |\beta_j|$.
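A short scikit-learn sketch contrasting the two penalties (the synthetic regression data and alpha values are assumptions): the L1 penalty (Lasso) drives many coefficients exactly to zero, while the L2 penalty (Ridge) only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.datasets import make_regression

# Regression data where only a few features are informative (assumed setup).
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: alpha * sum(coef**2)
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: alpha * sum(|coef|)

print(np.round(ridge.coef_, 2))      # shrunk toward zero, mostly nonzero
print(np.round(lasso.coef_, 2))      # many coefficients exactly zero
```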
Gradient descent is one of the most popular machine learning and deep learning
optimization algorithms used to update a learning model's parameters. There are 3
variants of gradient descent.
Batch Gradient Descent: Computation is carried out on the entire dataset in batch
gradient descent.
Stochastic Gradient Descent: Computation is carried over only one training
sample in stochastic gradient descent.
Mini Batch Gradient Descent: A small number/batch of training samples is used
for computation in mini-batch gradient descent.
For example, if a dataset has 1000 data points, then batch GD will train on all 1000 data points per update, stochastic GD will train on only a single sample per update, and mini-batch GD will use a batch of, say, 100 data points to update the parameters.
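A minimal NumPy sketch of the three variants on a simple one-parameter linear model (the data, learning rate, and batch size are illustrative assumptions; setting batch_size to 1000 gives batch GD and to 1 gives stochastic GD):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))                 # 1000 data points, as in the example
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, 1000)   # assumed true slope = 3

def gd_step(w, xb, yb, lr=0.1):
    """One update of the single weight w for the model y ≈ w * x,
    using the gradient of the mean squared error on batch (xb, yb)."""
    grad = -2 * np.mean(xb[:, 0] * (yb - w * xb[:, 0]))
    return w - lr * grad

batch_size = 100   # 1000 -> batch GD, 1 -> stochastic GD, 100 -> mini-batch GD
w = 0.0
for epoch in range(50):
    idx = rng.permutation(len(X))              # shuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        w = gd_step(w, X[b], y[b])
print(round(w, 3))  # ≈ 3.0
```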