Statistics Theory Notes


Regression

The process of estimating the value of a dependent variable on the basis of an independent variable is called regression.

Simple Linear Regression:

When we want to model the dependence of one variable on a single independent variable, it is called simple linear regression.

e.g. Y=a+bX is a simple linear regression line in which Y is a dependent variable and X is an independent variable.

Assumptions of Simple Linear Regression

Simple linear regression makes certain assumptions about the data. These assumptions are:

• Homogeneity of variance (homoscedasticity): the variance of the error term is constant, i.e. Var(ɛ) = E(ɛ²) = σ².

• The expected value of the error term is zero, i.e. E(ɛ) = 0.

• Normality: the errors are normally distributed with mean zero and constant variance.

• Independence of observations: E(ɛᵢɛⱼ) = 0 for i ≠ j, i.e. the error terms are independent of each other (no autocorrelation between errors).

• E(Xɛ) = 0, i.e. X and the error term are independent of each other.

• The relationship between the independent and dependent variables is linear: the line of best fit through the data points is a straight line.
Properties of Least Squares Regression Lines

The least squares regression lines have the following properties:

• The least squares regression line always passes through the point of means (X̄, Ȳ) of the data.

• The sum of the deviations of the observed values from the least squares regression line is always equal to zero, i.e. Σ(Yᵢ − Ŷᵢ) = 0.

• The sum of the squared deviations of the observed values from the least squares regression line is a minimum, i.e. Σ(Yᵢ − Ŷᵢ)² is minimum.

• The least squares regression line obtained from a random sample is the line of best fit because a and b are unbiased estimates of the parameters α and β.
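The first two numerical properties above can be checked directly. Below is a minimal sketch using NumPy; the data values are made up purely for illustration and are not from these notes:

```python
import numpy as np

# Hypothetical sample data (e.g. heights in cm, weights in kg), for illustration only.
X = np.array([150.0, 155.0, 160.0, 165.0, 170.0, 175.0])
Y = np.array([50.0, 53.0, 55.0, 60.0, 62.0, 66.0])

# Least squares estimates: b = Sxy / Sxx, a = Ybar - b * Xbar
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()
Y_hat = a + b * X  # fitted values

# Property 1: the fitted line passes through the point of means (Xbar, Ybar).
assert np.isclose(a + b * X.mean(), Y.mean())

# Property 2: the residuals (deviations from the line) sum to zero.
assert np.isclose(np.sum(Y - Y_hat), 0.0)
```

The assertions hold (up to floating-point rounding) for any data set, which is exactly what the properties claim.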

Scatter Diagram: In a scatter diagram we plot the paired observations, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve. The scatter diagram reveals the relationship between the two variables: in (a) it is positive and linear, in (b) negative and linear, in (c) curvilinear, and in (d) no relationship exists.
Correlation: The interdependence between two variables is called correlation. For example, as a person's height increases his weight also tends to increase, so height and weight are correlated with each other.

Positive Correlation: If an increase (or decrease) in one variable is accompanied by an increase (or decrease) in the other variable, there is positive correlation between the variables, e.g. an increase in height is accompanied by an increase in weight.

Negative Correlation: If an increase in one variable is accompanied by a decrease in the other variable, and vice versa, there is negative correlation between the variables, e.g. an increase in price is accompanied by a decrease in demand.

Properties of Correlation Coefficient

The coefficient of correlation has the following properties:

• The correlation coefficient lies between −1 and +1.

• It is independent of the units employed.

• It is independent of origin and scale.

• The correlation coefficient is the geometric mean of the two regression coefficients, i.e. r = ±√(bd), where b and d are the regression coefficients of Y on X and of X on Y, and the sign of r is the common sign of b and d.
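The geometric-mean relation between r and the two regression coefficients can be verified numerically. The following sketch uses NumPy with made-up data (the values are illustrative only):

```python
import numpy as np

# Hypothetical paired data, for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

Sxy = np.sum((X - X.mean()) * (Y - Y.mean()))  # corrected sum of cross-products
Sxx = np.sum((X - X.mean()) ** 2)              # corrected sum of squares of X
Syy = np.sum((Y - Y.mean()) ** 2)              # corrected sum of squares of Y

r = Sxy / np.sqrt(Sxx * Syy)  # correlation coefficient
b = Sxy / Sxx                 # regression coefficient of Y on X
d = Sxy / Syy                 # regression coefficient of X on Y

# r lies in [-1, 1] and |r| is the geometric mean of the two regression coefficients.
assert -1.0 <= r <= 1.0
assert np.isclose(abs(r), np.sqrt(b * d))
```

Note that b and d always share the sign of Sxy, so the product bd under the radical is never negative.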

Coefficient of Determination:

It measures the proportion of the variability of the dependent variable (Y) that is explained by its linear relation with the independent variable (X), and is defined as the ratio of the explained variation to the total variation:

r² = Explained Variation / Total Variation = Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)² = 1 − Σ(Y − Ŷ)² / Σ(Y − Ȳ)²
Standard Error of Estimate: The observed values of (X, Y) do not fall on the regression line but scatter away from it. The degree of scatter (or dispersion) of the observed values about the regression line is measured by what is called the standard deviation of regression, or standard error of estimate, of Y on X. It is defined as:

Sy.x = √[ Σ(Y − Ŷ)² / (n − 2) ]

Probability: The likelihood of occurrence of an event is called probability. When all outcomes of an experiment are equally likely, we can find the probability using this formula:

P(A) = Number of favorable outcomes / Total number of possible outcomes

For example, if we toss a fair coin, the probability that a head appears equals 1/2.

Sample Space: The collection of all possible outcomes of an experiment is called the sample space. Each element of a sample space is called a sample point. For example, if we roll a die its sample space is S = {1, 2, 3, 4, 5, 6}.

Event: A subset of a sample space is called an event. For example, if we toss two coins the sample space is S = {HH, HT, TH, TT}, and the subset B = {HH, HT} is an event.

Mutually Exclusive Events: Two events A and B are said to be mutually exclusive if they cannot both occur at the same time, or in other words there is no element common to A and B, i.e. A∩B = φ. For example, when we toss a coin, if a head occurs a tail cannot occur, and vice versa, so they are mutually exclusive events.
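The classical probability formula, the two-coin sample space, and the mutually exclusive condition can all be illustrated in a few lines of Python (using the fractions module for exact probabilities; the event names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins: ['HH', 'HT', 'TH', 'TT'].
S = [a + b for a, b in product("HT", repeat=2)]

def prob(event):
    # Classical probability: favorable outcomes / total possible outcomes.
    return Fraction(len(event), len(S))

A = {"HH"}  # event: both heads
B = {"TT"}  # event: both tails

assert prob(A) == Fraction(1, 4)
assert A & B == set()  # A and B are mutually exclusive: no common element
```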
Permutation: An arrangement of objects in a definite order is called a permutation. If we have n objects and we want to arrange r of them in a specific order, then we use this formula:

nPr = n! / (n − r)!

Combination: A selection of objects without regard to order is called a combination. If we have n objects and we want to select r of them without any order, then we use this formula:

nCr = n! / (r!(n − r)!)
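Both counting formulas are available directly in Python's math module (perm and comb, Python 3.8+); a quick sketch checking them against the factorial definitions above:

```python
from math import comb, factorial, perm

n, r = 5, 3  # illustrative values: arrange/select 3 objects out of 5

# Permutations: ordered arrangements, nPr = n! / (n - r)!
assert perm(n, r) == factorial(n) // factorial(n - r) == 60

# Combinations: unordered selections, nCr = n! / (r! (n - r)!)
assert comb(n, r) == factorial(n) // (factorial(r) * factorial(n - r)) == 10
```

As the counts show, each combination of 3 objects corresponds to 3! = 6 permutations, so nPr = nCr · r!.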

Law of Complement:

P(A)=1-P(A’)
P(A’)=1-P(A)

Additive Law for mutually exclusive events:

P(AUB)=P(A)+P(B)

Additive Law for not mutually exclusive events:

P(AUB)=P(A)+P(B)-P(A∩B)

For independent events, P(A∩B) = P(A) P(B).
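The complement law and both additive laws can be verified on the single-die sample space; a small sketch with illustrative events (the events A, B, C are made up for the example):

```python
from fractions import Fraction

# Rolling one fair die: sample space {1, ..., 6}.
S = set(range(1, 7))

def prob(event):
    # Classical probability with exact fractions.
    return Fraction(len(event & S), len(S))

A = {1, 2}     # roll a 1 or 2
B = {5, 6}     # roll a 5 or 6 (mutually exclusive with A)
C = {2, 4, 6}  # roll an even number (overlaps A)

# Law of complement: P(A') = 1 - P(A)
assert prob(S - A) == 1 - prob(A)

# Additive law for mutually exclusive events (A and B share no element):
assert prob(A | B) == prob(A) + prob(B)

# Additive law for not mutually exclusive events (A and C overlap at {2}):
assert prob(A | C) == prob(A) + prob(C) - prob(A & C)
```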
