Model Definition
2. Logistic Regression
To introduce logistic regression, I’ll first write the simple linear regression equation
with the dependent variable enclosed in a link function:

ln(odds) = ln(p/(1-p)) = a*X + b

Since the dependent variable here follows a binomial distribution, we need to choose
the link function best suited to that distribution: the logit function. In the equation
above, the parameters are chosen to maximize the likelihood of observing the sample
values, rather than to minimize the sum of squared errors (as in ordinary regression).
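The maximum-likelihood fitting described above can be sketched with gradient ascent on the log-likelihood. This is a minimal plain-Python illustration on toy data of my own, not the article's example:

```python
import math

def sigmoid(z):
    # Inverse of the logit link: maps a linear score to a probability p
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    # Choose (a, b) in ln(p/(1-p)) = a*x + b by gradient ascent
    # on the log-likelihood (single feature, toy version)
    a, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(a * x + b)
            a += lr * (y - p) * x   # gradient w.r.t. the slope a
            b += lr * (y - p)       # gradient w.r.t. the intercept b
    return a, b

# Toy data: larger x values are labelled 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
a, b = fit_logistic(xs, ys)
print(sigmoid(a * 0.0 + b) < 0.5, sigmoid(a * 5.0 + b) > 0.5)  # True True
```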
Linear Regression
It is used to estimate real values (cost of houses, number of calls, total sales, etc.)
based on continuous variable(s). Here, we establish a relationship between the
independent and dependent variables by fitting a best-fit line. This best-fit line is
known as the regression line and is represented by the linear equation Y = a*X + b.
Y – Dependent Variable
a – Slope
X – Independent variable
b – Intercept
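The slope a and intercept b above have a closed-form least-squares solution, sketched here in plain Python with toy points of my own:

```python
def fit_line(xs, ys):
    # Ordinary least squares for Y = a*X + b:
    # a = cov(X, Y) / var(X), b = mean(Y) - a * mean(X)
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Points lying exactly on Y = 2*X + 1 are recovered exactly
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 2.0 1.0
```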
Decision Tree
In this algorithm, we split the population into two or more homogeneous sets. The split is
made on the most significant attributes/independent variables, so as to make the groups as
distinct from one another as possible.
In the image above, you can see that the population is classified into four different groups
based on multiple attributes, to identify ‘if they will play or not’. To split the population
into distinct heterogeneous groups, the algorithm uses various techniques such as Gini,
information gain, chi-square, and entropy.
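To make one of those split criteria concrete, here is a plain-Python sketch of Gini impurity, with toy labels of my own; a tree would pick the candidate split with the lowest weighted impurity:

```python
def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gini(left, right):
    # Weighted impurity of a candidate split; lower is better
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

mixed = ["play", "play", "no", "no"]
pure_split = split_gini(["play", "play"], ["no", "no"])
bad_split = split_gini(["play", "no"], ["play", "no"])
print(gini(mixed), pure_split, bad_split)  # 0.5 0.0 0.5
```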
Support Vector Machine (SVM)
In this algorithm, we’d first plot the two variables in two-dimensional space, where each
point has two co-ordinates. (The points lying closest to the separating boundary are the
ones known as support vectors.)
Now, we will find a line that splits the data between the two differently classified
groups. This will be the line for which the distance to the closest point in each of
the two groups is as large as possible.
In the example shown above, the line which splits the data into the two differently
classified groups is the black line, since it is the line from which the two closest
points are farthest away. This line is our classifier. Then, depending on which side
of the line a new test point lands, that is the class we assign to it.
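The classify-by-side-of-the-line idea can be sketched as follows, in plain Python with a hand-picked line rather than a trained one (toy numbers of my own):

```python
import math

def classify(w, b, point):
    # Which side of the line w·x + b = 0 the point falls on decides its class
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return 1 if score >= 0 else -1

def distance_to_line(w, b, point):
    # Geometric distance |w·x + b| / ||w||; an SVM picks the line that
    # maximizes the smallest such distance over the training points
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

w, b = [1.0, 1.0], -3.0          # the line x + y = 3
print(classify(w, b, [4, 4]))    # 1  (above the line)
print(classify(w, b, [0, 0]))    # -1 (below the line)
print(round(distance_to_line(w, b, [2, 2]), 3))  # 0.707
```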
Naive Bayes
It is a classification technique based on Bayes’ theorem, with an assumption of independence
between predictors. For example, a fruit may be considered to be an apple if it is red,
round, and about 3 inches in diameter. Even if these features depend on each other or on
the existence of the other features, a naive Bayes classifier considers all of these
properties to contribute independently to the probability that this fruit is an apple.
A naive Bayesian model is easy to build and particularly useful for very large data sets.
Along with its simplicity, naive Bayes can outperform even highly sophisticated
classification methods.
Bayes’ theorem provides a way of calculating the posterior probability P(c|x) from P(c),
P(x) and P(x|c). Look at the equation below:

P(c|x) = P(x|c) * P(c) / P(x)

Here,
P(c|x) is the posterior probability of class (target) given predictor (attribute).
P(c) is the prior probability of class.
P(x|c) is the likelihood which is the probability of predictor given class.
P(x) is the prior probability of predictor.
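The four probabilities above plug together directly. A minimal plain-Python sketch, using toy numbers of my own invention (the text does not give actual frequencies for the fruit example):

```python
def posterior(prior_c, likelihood_x_given_c, prior_x):
    # Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)
    return likelihood_x_given_c * prior_c / prior_x

# Toy numbers: 60% of fruit are apples, 80% of apples are red,
# and 50% of all fruit are red.
p_apple = 0.6
p_red_given_apple = 0.8
p_red = 0.5
print(round(posterior(p_apple, p_red_given_apple, p_red), 2))  # 0.96
```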
K-Means
It is a type of unsupervised algorithm which solves the clustering problem.
Its procedure follows a simple and easy way to classify a given data set through
a certain number of clusters (assume k clusters). Data points inside a cluster are
homogeneous with each other, and heterogeneous with respect to peer groups.
Remember figuring out shapes from ink blots? K-means is somewhat similar to that
activity. You look at the shape and spread to decipher how many different clusters /
populations are present!
How K-means forms clusters:
1. K-means picks k points, known as centroids, one for each cluster.
2. Each data point is assigned to its closest centroid, forming k clusters.
3. The centroid of each cluster is recomputed from the points assigned to it, and steps
2–3 are repeated until the centroids stop moving.
Each cluster thus has its own centroid. The sum of squared differences between the
centroid and the data points within a cluster constitutes the within-cluster sum of
squares for that cluster. When these values for all the clusters are added up, we get
the total within-cluster sum of squares for the cluster solution.
We know that as the number of clusters increases, this value keeps decreasing, but
if you plot the result you may see that the sum of squared distances decreases
sharply up to some value of k, and then much more slowly after that. Here, we can
find the optimum number of clusters.
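The assign-and-recompute loop and the within-cluster sum of squares can be sketched on 1-D toy data of my own choosing, in plain Python:

```python
def kmeans(points, centroids, iters=10):
    # Assign each point to its nearest centroid, then move each
    # centroid to the mean of its cluster; repeat a few times
    for _ in range(iters):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: (p - c) ** 2)
            clusters[nearest].append(p)
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centroids)

def wss(points, centroids):
    # Total within-cluster sum of squares (the quantity plotted
    # against k when looking for the elbow)
    return sum(min((p - c) ** 2 for c in centroids) for p in points)

points = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
final = kmeans(points, [0.0, 5.0])
print(final)               # centroids settle near 1.0 and 9.0
print(wss(points, final))  # small residual within-cluster spread
```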
8. Random Forest
Random Forest is a trademarked term for an ensemble of decision trees. In a Random
Forest, we have a collection of decision trees (hence “forest”). To classify a new
object based on its attributes, each tree gives a classification, and we say the tree
“votes” for that class. The forest chooses the classification having the most votes
(over all the trees in the forest).
Each tree is planted and grown as follows:
1. If the number of cases in the training set is N, then a sample of N cases is taken
at random, but with replacement. This sample will be the training set for growing the
tree.
2. If there are M input variables, a number m << M is specified such that at each node,
m variables are selected at random out of the M, and the best split on these m is
used to split the node. The value of m is held constant while the forest is grown.
3. Each tree is grown to the largest extent possible. There is no pruning.
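Steps 1 and 2 above can be sketched in plain Python; the helper names and numbers below are my own, not part of any library:

```python
import random

def bootstrap_sample(cases, rng):
    # Step 1: sample N cases at random *with* replacement,
    # so the same case may appear more than once
    n = len(cases)
    return [cases[rng.randrange(n)] for _ in range(n)]

def feature_subset(n_features, m, rng):
    # Step 2: at each node, pick m of the M input variables at random
    return rng.sample(range(n_features), m)

rng = random.Random(0)          # fixed seed for a repeatable sketch
cases = list(range(10))         # 10 training cases (by index)
sample = bootstrap_sample(cases, rng)
subset = feature_subset(25, 5, rng)
print(len(sample) == len(cases))  # True: same size, duplicates allowed
print(sorted(subset))             # 5 distinct variable indices out of M = 25
```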