2. How is logistic regression done?
Logistic regression models the relationship between the
dependent variable (the label we want to predict) and
one or more independent variables (our features) by
estimating the probability of the label with the logistic
(sigmoid) function.
The image shown below depicts how logistic regression works:
The formula and graph for the sigmoid function are as shown:
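In its standard form, the sigmoid is sigmoid(z) = 1 / (1 + e^(-z)), where z is a weighted sum of the features plus a bias term. The sketch below shows how a probability and a label could be derived from it. The weights, bias, feature values, and the 0.5 threshold are assumptions chosen only for illustration; in practice the weights are learned from training data.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Estimated probability of the positive label for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical, hand-picked parameters (not fitted to any data)
weights = [0.8, -0.4]
bias = -0.2

p = predict_proba([2.0, 1.0], weights, bias)  # z = 1.6 - 0.4 - 0.2 = 1.0
label = 1 if p >= 0.5 else 0                  # 0.5 is an assumed threshold
print(round(p, 3), label)
```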
3. Explain the steps in making a decision tree.
1. Take the entire data set as input
2. Calculate entropy of the target variable, as well as the
predictor attributes
3. Calculate the information gain of each attribute (the
reduction in entropy obtained by splitting the data on it)
4. Choose the attribute with the highest information gain as
the root node
5. Repeat the same procedure on every branch until the
decision node of each branch is finalized
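Steps 2 to 4 above can be sketched in a few lines of Python. This is a minimal illustration, not a full tree builder: the toy data set and its column names (high_salary, short_commute, accept) are made up, and only the choice of the root attribute is shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """Attribute with the highest information gain (candidate root node)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data, invented for this example
offers = [
    {"high_salary": "yes", "short_commute": "yes", "accept": "yes"},
    {"high_salary": "yes", "short_commute": "no", "accept": "yes"},
    {"high_salary": "no", "short_commute": "yes", "accept": "no"},
    {"high_salary": "no", "short_commute": "no", "accept": "no"},
]

root = best_attribute(offers, ["high_salary", "short_commute"], "accept")
print(root)
```

Step 5 would apply the same selection recursively to the subset of rows in each branch, stopping when a branch contains a single class or no attributes remain.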
For example, let's say you want to build a decision tree to
decide whether you should accept or decline a job offer. The
decision tree for this case is as shown:
It is clear from the decision tree that an offer is accepted if:
• Salary is greater than $50,000
• The commute is less than an hour
• Incentives are offered
8. In your choice of language, write a program that prints
the numbers ranging from one to 50.
But for multiples of three, print "Fizz" instead of
the number, and for multiples of five, print
"Buzz". For numbers that are multiples of both
three and five, print "FizzBuzz".
The code is shown below:
Note that range(51) covers zero to 50. However, the
question asks for one to 50. Therefore, the range in
the above code should be written as range(1, 51).
The output of the above code is as shown:
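For reference, here is a minimal Python sketch of one possible solution that already uses the corrected range of one to 50. The function name fizzbuzz is an arbitrary choice for this example.

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:  # multiple of both three and five
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# range(1, 51) yields 1 through 50 inclusive
lines = [fizzbuzz(n) for n in range(1, 51)]
print("\n".join(lines))
```

Checking divisibility by 15 first matters: if the test for three came first, multiples of both would print "Fizz" only.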
15. How do you find RMSE and MSE in a linear regression
model?
RMSE and MSE are two of the most common
measures of error for a linear
regression model.
MSE stands for Mean Square Error: the average of the
squared differences between predicted and actual values.
RMSE stands for Root Mean Square Error: the square root
of MSE.
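Both measures can be computed directly from the actual and predicted values. The following is a minimal sketch using only the standard library; the sample values are made up for illustration.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical values, invented for this example
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

print(mse(actual, predicted))   # squared errors 1, 0, 1, 4 -> mean 1.5
print(rmse(actual, predicted))
```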
19. How can time-series data be declared as stationary?
It is stationary when the variance and mean of the
series are constant with time.
Here is a visual example:
In the first graph, the variance is constant with
time. Here, X is the time factor and Y is the
variable. The value of Y stays within the same
range throughout; in other words, it is stationary.
In the second graph, the waves get bigger, which
means the variance is changing with time and the
series is non-stationary.
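A rough way to see this numerically is to split a series into consecutive windows and compare the mean and variance of each window. The sketch below does that for two synthetic series that imitate the two graphs described above; the series, the window count, and the function name window_stats are assumptions made for this example. This is only an informal heuristic, not a formal statistical test such as the augmented Dickey-Fuller test.

```python
import math
import statistics


def window_stats(series, n_windows=4):
    """Return (mean, variance) for each of n_windows consecutive chunks."""
    size = len(series) // n_windows
    windows = [series[i * size:(i + 1) * size] for i in range(n_windows)]
    return [(statistics.mean(w), statistics.pvariance(w)) for w in windows]


# Synthetic data: a wave of constant height and a wave that keeps growing
steady = [math.sin(2 * math.pi * t / 25) for t in range(200)]
growing = [(1 + t / 50) * math.sin(2 * math.pi * t / 25) for t in range(200)]

for name, series in (("steady", steady), ("growing", growing)):
    variances = [round(var, 2) for _, var in window_stats(series)]
    print(name, variances)
```

For the steady wave the per-window variances stay essentially equal, while for the growing wave they increase from window to window, which signals non-stationarity.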
20. How can you calculate accuracy using a confusion
matrix?
Consider this confusion matrix:
You can see the values for total data, actual
values, and predicted values.
The formula for accuracy is:
Accuracy = (True Positive + True Negative) / Total
Observations
= (262 + 347) / 650
= 609 / 650
≈ 0.937
As a result, we get an accuracy of about 93.7 percent.
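The same calculation can be written as a short Python sketch. The counts are the ones used in the worked example above; the function name accuracy is an arbitrary choice.

```python
def accuracy(true_positive: int, true_negative: int, total: int) -> float:
    """Share of all observations that were classified correctly."""
    return (true_positive + true_negative) / total


# Counts taken from the worked example
acc = accuracy(true_positive=262, true_negative=347, total=650)
print(f"{acc:.3f}")
```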
21. Write the equation and calculate the precision and
recall rate.
Consider the same confusion matrix used in the
previous question.
Precision = (True Positive) / (True Positive + False
Positive)
= 262 / 277
≈ 0.946
Recall Rate = (True Positive) / (True Positive +
False Negative)
= 262 / 288
≈ 0.910
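As a sketch, the two rates can be computed in Python as follows. The false positive and false negative counts are not stated directly in the text; they are inferred here from the denominators of the worked example (277 - 262 = 15 and 288 - 262 = 26).

```python
def precision(tp: int, fp: int) -> float:
    """Of everything predicted positive, the share that is truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Of everything truly positive, the share that was predicted positive."""
    return tp / (tp + fn)


tp = 262
fp = 277 - tp  # inferred from the precision denominator
fn = 288 - tp  # inferred from the recall denominator

print(f"precision = {precision(tp, fp):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
```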
22. 'People who bought this also bought…' recommendations
seen on Amazon are a result of which algorithm?
The recommendation engine is built on
collaborative filtering. Collaborative filtering
uses the behavior of other users and their
purchase history in terms of ratings, selection, etc.
The engine makes predictions on what might
interest a person based on the preferences of other
users. In this algorithm, item features do not need
to be known.
For example, a sales page shows that a certain
number of people buy a new phone and also buy
tempered glass at the same time. Next time, when
a person buys a phone, he or she may see a
recommendation to buy tempered glass as well.
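The phone example can be illustrated with a very simplified item-to-item co-occurrence count, which is one basic flavor of collaborative filtering. This is a sketch under stated assumptions, not a description of how Amazon's system is implemented: the order data, item names, and function names are all made up.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders):
    """Count how often each pair of items appears in the same order."""
    counts = {}
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(item, counts, top_n=3):
    """Items most frequently bought together with the given item."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


# Hypothetical purchase history, invented for this example
orders = [
    ["phone", "tempered glass"],
    ["phone", "tempered glass", "case"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "tempered glass"],
]

counts = co_purchase_counts(orders)
print(recommend("phone", counts))
```

Note that the function only looks at which items were bought together; it never inspects what the items are, which matches the point that item features are not needed.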