
Data Science Interview Questions2

The document explains logistic regression, decision tree construction, and various metrics for evaluating models like RMSE, MSE, accuracy, precision, and recall. It also discusses the concept of stationary time-series data and the collaborative filtering algorithm used for recommendations on platforms like Amazon. Additionally, it includes a programming task for generating a FizzBuzz output.

Uploaded by

Ankit Kamble
Copyright: © All Rights Reserved

2. How is logistic regression done?

Logistic regression measures the relationship between the dependent variable (our label of what we want to predict) and one or more independent variables (our features) by estimating probability using its underlying logistic function (sigmoid).
The image shown below depicts how logistic regression works:

The formula and graph for the sigmoid function are as shown:
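
For reference, the standard sigmoid function is σ(z) = 1 / (1 + e^(−z)), which maps any real number to a value between 0 and 1. A minimal Python sketch of how a logistic regression model turns features into a probability might look like this; the weights, bias, and feature values are made-up illustrations, not fitted parameters.

```python
import math

def sigmoid(z):
    """Map a real number to (0, 1); split by sign to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(features, weights, bias):
    """Probability of the positive class for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical, hand-picked parameters for illustration only.
weights = [0.8, -0.4]
bias = -0.2
p = predict_probability([2.0, 1.0], weights, bias)
label = 1 if p >= 0.5 else 0  # 0.5 is a common default threshold
```

In practice the weights and bias would be learned from training data rather than chosen by hand.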

3. Explain the steps in making a decision tree.

1. Take the entire data set as input.
2. Calculate the entropy of the target variable, as well as of the predictor attributes.
3. Calculate the information gain of each attribute (how much splitting on that attribute reduces the entropy of the target, that is, how well it sorts the classes apart).
4. Choose the attribute with the highest information gain as the root node.
5. Repeat the same procedure on every branch until the decision node of each branch is finalized.
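
Steps 2 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the rows, the attribute names (`salary_high`, `short_commute`), and the target name (`accept`) are hypothetical, and the attributes are treated as categorical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum((c / total) * log2(total / c) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def best_attribute(rows, attributes, target):
    """Attribute with the highest information gain (candidate root node)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical toy data: here salary alone decides the outcome.
rows = [
    {"salary_high": True, "short_commute": True, "accept": "yes"},
    {"salary_high": True, "short_commute": False, "accept": "yes"},
    {"salary_high": False, "short_commute": True, "accept": "no"},
    {"salary_high": False, "short_commute": False, "accept": "no"},
]
root = best_attribute(rows, ["salary_high", "short_commute"], "accept")
```

Step 5 would apply `best_attribute` again to the rows in each branch, using the attributes that remain.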
For example, let's say you want to build a decision tree to decide whether you should accept or decline a job offer. The decision tree for this case is as shown:
It is clear from the decision tree that an offer is accepted if:
• Salary is greater than $50,000
• The commute is less than an hour
• Incentives are offered
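
Read as a chain of checks, the three conditions above could be written as the following Python sketch. It assumes all three conditions must hold for acceptance and that the tree tests them in the listed order; the function and parameter names are hypothetical.

```python
def accept_offer(salary, commute_hours, incentives_offered):
    """Walk the tree from the root: decline as soon as one check fails."""
    if salary <= 50_000:
        return False  # salary not greater than $50,000
    if commute_hours >= 1:
        return False  # commute not less than an hour
    return bool(incentives_offered)

decision = accept_offer(salary=60_000, commute_hours=0.5, incentives_offered=True)
```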

8. In your choice of language, write a program that prints the numbers ranging from one to 50.

But for multiples of three, print "Fizz" instead of the number, and for the multiples of five, print "Buzz." For numbers which are multiples of both three and five, print "FizzBuzz."

The code is shown below:


Note that a range of 51 runs from zero to 50, whereas the question asks for one to 50. Therefore, in the above code, you can specify the range as (1, 51).
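
A minimal Python sketch of the task, using `range(1, 51)` as the note suggests, might look like this. The helper name `fizzbuzz_line` is an illustrative choice.

```python
def fizzbuzz_line(n):
    """Return the FizzBuzz text for a single number."""
    if n % 15 == 0:  # multiple of both three and five
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

# range(1, 51) yields 1 through 50; the upper bound is exclusive.
lines = [fizzbuzz_line(n) for n in range(1, 51)]
print("\n".join(lines))
```

The check for 15 comes first because a multiple of both three and five would otherwise be caught by the "Fizz" branch.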

The output of the above code is as shown:


15. How do you find RMSE and MSE in a linear regression model?

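RMSE and MSE are two of the most common measures of error, and hence of accuracy, for a linear regression model.

RMSE stands for Root Mean Square Error: the square root of the mean of the squared differences between the predicted and actual values.

MSE stands for Mean Square Error: the mean of the squared differences between the predicted and actual values.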
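
Written out, MSE = (1/n) Σ (actual − predicted)² and RMSE = √MSE. A minimal Python sketch follows; the actual and predicted values are made-up numbers for illustration.

```python
import math

def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical values: squared errors are 1, 0, 1, 4.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
mse_value = mse(actual, predicted)    # 6 / 4 = 1.5
rmse_value = rmse(actual, predicted)  # sqrt(1.5)
```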

19. How can time-series data be declared as stationary?

It is stationary when the variance and mean of the series are constant with time.

Here is a visual example:


In the first graph, the variance is constant with time. Here, X is the time factor and Y is the variable. The value of Y goes through the same points all the time; in other words, it is stationary.

In the second graph, the waves get bigger, which means it is non-stationary and the variance is changing with time.
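
One informal way to see the difference is to compare the mean and variance of the first and second halves of a series. The Python sketch below does this for two synthetic series that imitate the two graphs described above; it is a rough illustration, not a formal statistical test, and the series and the helper name `half_stats` are assumptions made for the example.

```python
import math
from statistics import mean, pvariance

def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))

# Synthetic data: a wave of constant size and a wave that keeps growing.
steady = [math.sin(0.1 * i) for i in range(2000)]
growing = [(1 + 0.01 * i) * math.sin(0.1 * i) for i in range(2000)]

(_, steady_var_1), (_, steady_var_2) = half_stats(steady)
(_, growing_var_1), (_, growing_var_2) = half_stats(growing)

steady_ratio = steady_var_2 / steady_var_1     # close to 1: variance is stable
growing_ratio = growing_var_2 / growing_var_1  # well above 1: variance grows
```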

20. How can you calculate accuracy using a confusion matrix?

Consider this confusion matrix:


You can see the values for total data, actual values, and predicted values.

The formula for accuracy is:

Accuracy = (True Positive + True Negative) / Total Observations

= (262 + 347) / 650

= 609 / 650

≈ 0.937

As a result, we get an accuracy of about 93.7 percent.
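
The same calculation as a minimal Python sketch, using the counts from the worked example above. The function name is an illustrative choice.

```python
def accuracy(true_positive, true_negative, total):
    """Share of all observations that were classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (true_positive + true_negative) / total

acc = accuracy(true_positive=262, true_negative=347, total=650)
print(f"accuracy = {acc:.3f}")
```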

21. Write the equation and calculate the precision and recall rate.

Consider the same confusion matrix used in the previous question.

Precision = (True Positive) / (True Positive + False Positive)

= 262 / 277

≈ 0.95

Recall Rate = (True Positive) / (True Positive + False Negative)

= 262 / 288

≈ 0.91
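
A minimal Python sketch of both calculations follows. The false positive and false negative counts are not stated directly; they are inferred from the denominators above (277 − 262 = 15 and 288 − 262 = 26).

```python
def precision(true_positive, false_positive):
    """Of everything predicted positive, the share that really is positive."""
    return true_positive / (true_positive + false_positive)

def recall(true_positive, false_negative):
    """Of everything actually positive, the share the model found."""
    return true_positive / (true_positive + false_negative)

# Counts inferred from the worked example (assumption, see lead-in).
tp, fp, fn = 262, 15, 26
precision_value = precision(tp, fp)  # 262 / 277
recall_value = recall(tp, fn)        # 262 / 288
```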

22. 'People who bought this also bought…' recommendations seen on Amazon are a result of which algorithm?

The recommendation engine is built on collaborative filtering. Collaborative filtering draws on the behavior of other users and their purchase history in terms of ratings, selection, etc.

The engine makes predictions on what might interest a person based on the preferences of other users. In this algorithm, item features are unknown.

For example, a sales page shows that a certain number of people buy a new phone and also buy tempered glass at the same time. Next time, when a person buys a phone, he or she may see a recommendation to buy tempered glass as well.
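
The phone and tempered glass example can be sketched in Python by counting which items are bought together. This is a simplified co-occurrence count in the spirit of item-based collaborative filtering, not a description of Amazon's actual system; the baskets and function names are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_co_purchase_counts(baskets):
    """Count, for each item, how often every other item shares its basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for item, other in permutations(set(basket), 2):
            counts[item][other] += 1
    return counts

def recommend(counts, item, top_n=3):
    """Items most often bought together with `item`, most frequent first."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]

# Hypothetical purchase history: only who bought what, no item features.
baskets = [
    ["phone", "tempered glass"],
    ["phone", "tempered glass", "case"],
    ["phone", "tempered glass"],
    ["laptop", "mouse"],
]
counts = build_co_purchase_counts(baskets)
suggestions = recommend(counts, "phone", top_n=2)
```

Note that the sketch uses only purchase behavior, which matches the point above that item features are unknown to the algorithm.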
