White Paper On Regression

PRAXIS BUSINESS SCHOOL
White Paper on regression

A Report
Submitted to
Dr. Prithwis Mukherjee
In partial fulfilment of the requirements of the course
Quantitative Technique-2
On 07/09/2010
By
Ashish Maheshwari
(B09004)
Statistical Modelling:
Statistical modelling involves the appropriate application of statistical techniques, each
requiring certain assumptions, to perform hypothesis tests, interpret the data and reach valid
conclusions. Data from experiments, product testing, simulation, surveys, and statistical
process and quality control must be appropriately analysed before results can be
determined and conclusions drawn. The results of an experiment or test must be obtained by
following established statistical procedures, including experimental design and the
appropriate use of statistical analysis and modelling techniques. These results can then be
reproduced, within sampling error, by repeating the experiment.
Statistical modelling requires careful selection of analytical techniques, verification of
assumptions, and verification of data. Descriptive statistics, graphs and relational plots of
the data should first be examined to evaluate the legitimacy of the data, identify possible
outcomes and assumptions, and form preliminary ideas on variable relationships for
modelling.
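As a minimal sketch of that first exploratory pass, the Python below computes summary statistics, the Pearson correlation between two variables and a crude outlier flag, using only the standard library. The function names and the two-standard-deviation outlier rule are illustrative assumptions, not part of the original analysis.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for one variable as a dict."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def flag_outliers(values, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [v for v in values if abs(v - m) > k * s]
```

For example, `flag_outliers([1, 1, 1, 1, 1, 1, 10])` returns `[10]`, a point worth inspecting before it pulls a regression line towards itself.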
Benefits:
• Application of appropriate statistical analysis techniques
• Development of appropriate conclusions and key learning from the data
• Ensuring results address experimental objectives
• Maximizing information gained from the data
• Maximizing chances of the experiment being successful
Techniques:
1. Statistical analysis and modelling techniques
2. Descriptive techniques
3. Data graphs, plots and exploratory data analysis
4. Multiple linear regression analysis
5. Logistic regression
6. Time series analysis
7. Discriminant analysis
8. Factor analysis
9. Cluster analysis
10. Multivariate analysis
11. Nonparametric analysis
12. Experimental design
Pitfalls in using regression:

Regression analysis is a statistical tool that, when properly used, can help people make
decisions. Often, however, it is not used properly and is misused, and as a result decision
makers make inaccurate forecasts. The most common errors made while using regression are
as follows:
1. Limited range over which the regression equation holds:
A common mistake is to assume that the estimating line can be applied over any range of
values. Hospital administrators can properly use regression analysis to predict the
relationship between cost per bed and occupancy levels. Some administrators, however,
incorrectly use the same regression to predict the cost per bed for occupancy levels that are
significantly higher than those used to estimate the regression line. They make decisions
based on one set of costs and then find that costs change drastically as occupancy
increases.
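One simple guard against this pitfall is to refuse predictions outside the range of the data used to fit the line. The sketch below assumes an already fitted line y = a + b·x; the coefficients and occupancy figures in the example are hypothetical and are not taken from any hospital study.

```python
def predict_within_range(intercept, slope, x_new, x_observed):
    """Predict y = intercept + slope * x_new, but only inside the observed x range."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        # Extrapolating beyond the fitted data: the linear relationship may not hold.
        raise ValueError(f"x = {x_new} is outside the fitted range [{lo}, {hi}]")
    return intercept + slope * x_new


# Hypothetical example: cost per bed fitted on occupancy levels between 50% and 80%.
occupancy = [50, 55, 62, 70, 74, 80]
print(predict_within_range(1200.0, -4.5, 65, occupancy))  # inside the range
try:
    predict_within_range(1200.0, -4.5, 95, occupancy)  # far above the range
except ValueError as err:
    print("refused:", err)
```

Raising an error rather than silently returning a number forces the analyst to collect data at the new occupancy levels before trusting a prediction there.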
2. Regression analysis does not determine cause and effect:
Another mistake is to assume that a change in one variable is caused by a change in the
other variable. Consider the example of research and development expenses and annual
profit, often used to illustrate regression analysis. It is unlikely that profit in a
given year is caused by research and development expenditure in that same year. In
high-technology industries, research and development activity can help explain profits,
but a better approach is to predict current profits in terms of past research and
development expenditure together with economic conditions, dollars spent on advertising and
other variables. This can be done using multiple regression techniques.
3. Conditions change and invalidate the regression equation:
Care must be taken when we use historical data to estimate the regression equation.
Conditions can change and violate one or more of the assumptions on which our regression
analysis depends.
4. Values of variables change over time:
Another error arises when some variables depend on time. Suppose a firm uses regression
analysis to determine the relationship between the number of employees and production
volume. If the observations used in the analysis extend back several years, the resulting
regression line may be too steep because it fails to account for the effect of changing
technology.
5. Relationships that have no common bond:
When applying regression analysis, people sometimes find a relationship between two
variables that in fact have no common bond. For example, one might find a statistical
relationship between the miles per gallon achieved by eight different cars and the
distances from Earth to eight other planets. But because there is no common bond between
gas mileage and the distance to other planets, this relationship would be meaningless.
6. Finding things that do not exist:
If one runs a large number of regressions between many pairs of variables, it is possible
to obtain some interesting-looking relationships purely by chance. For example, one might
find a strong statistical relationship between your income and the amount of beer consumed
in the US, or even between the length of weight train and the weather. In neither case is
there a factor common to both variables, so such relationships are meaningless.
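The danger of running many regressions can be illustrated with a small simulation: correlate many pairs of completely unrelated random series and count how often the correlation looks "significant". In this sketch the sample size of 10 and the cut-off of 0.632 (approximately the 5% two-tailed critical value of r for n = 10) are assumptions chosen for illustration; roughly 5% of the unrelated pairs should clear the cut-off by chance alone.

```python
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def count_spurious(pairs=1000, n=10, threshold=0.632, seed=1):
    """Correlate `pairs` pairs of independent random series and count |r| > threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(pairs):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(pearson_r(x, y)) > threshold:
            hits += 1
    return hits


print(count_spurious(), "of 1000 unrelated pairs look 'significant'")
```

Every one of those "hits" is a relationship with no common bond, which is why a significant-looking coefficient found by searching many pairs should not be trusted on its own.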
7. Misinterpreting r and r²:
The coefficient of determination is misinterpreted if r² is used to describe the percentage
of change in the dependent variable that is caused by a change in the independent variable.
This is wrong because r² measures only how well one variable describes another, not how much
of the change in one variable is caused by the other.
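A quick way to see that r² carries no information about direction or cause: regressing y on x and regressing x on y give different slopes but exactly the same r². The sketch below uses hypothetical data.

```python
def simple_fit_stats(x, y):
    """Return (slope of y on x, slope of x on y, r squared) for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope_y_on_x = sxy / sxx
    slope_x_on_y = sxy / syy
    r2 = sxy * sxy / (sxx * syy)  # symmetric in x and y
    return slope_y_on_x, slope_x_on_y, r2


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical data
b_yx, b_xy, r2 = simple_fit_stats(x, y)
print(b_yx, b_xy, r2)
```

Here the two slopes are 0.6 and 1.0 while r² is 0.6 either way; the product of the two slopes always equals r², so r² cannot tell which variable drives the other.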
Techniques of regression that can be used to model social and business scenarios:
Regression analysis is a statistical forecasting model concerned with describing and
evaluating the relationship between a dependent variable and one or more independent
(explanatory) variables. It can predict the outcome of a key business indicator (the
dependent variable) based on the interactions of other related business drivers (the
explanatory variables).
Uses of regression in business models:
1. Trend line analysis:
Linear regression is used to create trend lines, which use past data to predict future
performance or trends. In business, trend lines are usually used to show the movement of
financial or product attributes over time. Stock prices, oil prices or product
specifications can all be analysed using trend lines.
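A linear trend line is a least-squares line fitted against time. The sketch below numbers the periods t = 0, 1, 2, ... and extends the line forward; the price series in the example is hypothetical. Note that any forward projection is extrapolation, so pitfall 1 above applies to it.

```python
def fit_trend(values):
    """Least-squares trend values[t] = a + b*t, with t = 0, 1, ..., n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    sty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    b = sty / stt
    return y_mean - b * t_mean, b


def forecast(values, steps_ahead=1):
    """Extend the fitted trend line `steps_ahead` periods past the last observation."""
    a, b = fit_trend(values)
    return a + b * (len(values) - 1 + steps_ahead)


# Hypothetical monthly prices.
prices = [52.0, 54.5, 53.8, 57.2, 58.9]
print(fit_trend(prices), forecast(prices, steps_ahead=2))
```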
2. Risk analysis for investments:

The capital asset pricing model (CAPM) is commonly estimated and tested using linear
regression analysis, and a common measure of the volatility of a stock or investment relative
to the market is its beta, which can be determined using linear regression. Linear regression
is therefore key in assessing the risk associated with most investment vehicles.
3. Sales or market forecasts:

Multiple linear regression is a method for forecasting sales volume or market movement in
order to create comprehensive plans for growth. This method can be more accurate than trend
analysis, because trend analysis looks only at how one variable changes with respect to another.
4. Total quality control:

Quality control methods make frequent use of linear regression to analyse key product
specifications and other measurable parameters of product or organisational quality (such
as the number of complaints over time).
5. Linear regression in human resources:

Linear regression methods are also used to predict the demographics and types of the future
workforce for large companies. This helps companies anticipate their workforce needs through
good hiring plans and training plans for existing employees.
Social Model:
1. Health survey:
Take the example of tuberculosis (TB) in the National Family Health Survey. To study the
relationship between reporting TB infection and seeking treatment for men and women across
various socio-economic characteristics, multivariate logistic regression is applied to find
the significant factors explaining TB reporting and treatment seeking.
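To show what applying logistic regression involves, here is a deliberately small sketch that fits P(y = 1) = 1 / (1 + e^-(w0 + w1·x1 + ...)) by batch gradient ascent on the log-likelihood. The data are hypothetical (one 0/1 covariate standing in for a socio-economic characteristic), and a real survey analysis would normally use a statistical package that also handles sampling weights and standard errors.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w0 + w1*x1 + ...) by batch gradient ascent.

    rows: list of feature lists; labels: list of 0/1 outcomes.
    Returns [w0, w1, ...] where w0 is the intercept.
    """
    k = len(rows[0])
    n = len(rows)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            err = y - p  # derivative of the log-likelihood w.r.t. the linear predictor
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical data: x = 1 for one group, 0 for the other; y = 1 if TB was reported.
rows = [[0] for _ in range(8)] + [[1] for _ in range(8)]
labels = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]
w0, w1 = fit_logistic(rows, labels)
print(f"intercept={w0:.3f}, slope={w1:.3f}, odds ratio={math.exp(w1):.2f}")
```

With 25% of one group and 75% of the other reporting, the fit converges to an intercept of about ln(1/3) ≈ -1.099 and a slope of about ln 9 ≈ 2.197, so exp(slope), the odds ratio, is about 9. Odds ratios of this kind are what such studies report as the effect of each factor.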
2. Analysis of urbanisation:
Take the example of projecting China's urbanisation level, which can be done by applying a
linear regression model and an S-curve regression model.
The linear model is: $u_t = a_0 + a_1 t$
where $t$, the year, is the independent variable and $u_t$, the urbanisation level in year $t$,
is the dependent variable.
Based on urbanisation levels under the 1990 census definition for the period 1983-1999, the
constants in this formula are estimated, giving the linear regression simulation equation:
$u_t = -1026.54 + 0.529\,t$
The statistical features of this equation are as follows:
R² = 0.98, F = 714.46, Sig. F = 0.00000,
which indicates that the simulation model is statistically significant.
Source: (www.iiasa.ac.at/admin)
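The fitted equation can be evaluated directly, and the reported F and R² can be checked against each other using F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below assumes n = 17 annual observations (1983 to 1999 inclusive) and k = 1 predictor; the source does not state n explicitly, so that figure is an assumption.

```python
def urbanisation_level(year):
    """Urbanisation level projected by the fitted line u_t = -1026.54 + 0.529 t."""
    return -1026.54 + 0.529 * year


def f_from_r2(r2, n, k=1):
    """Overall F statistic implied by R^2 for n observations and k predictors."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def r2_from_f(f, n, k=1):
    """Inverse of f_from_r2: the R^2 implied by an overall F statistic."""
    return f * k / (f * k + (n - k - 1))


N = 17  # assumption: one observation per year, 1983-1999 inclusive
print(round(urbanisation_level(1990), 2))
print(round(f_from_r2(0.98, N), 1), round(r2_from_f(714.46, N), 4))
```

Under that assumption the equation gives about 26.17 for 1990, an R² of exactly 0.98 would imply F = 735, and the reported F = 714.46 implies R² ≈ 0.9794, which is consistent with the R² of 0.98 after rounding.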
3. Land use change scenario projections:
If the study area includes all the countries in the world, we derive future proportions of
artificial surfaces per region from projections of population and GDP, using a regression
model. We calculated a linear regression model linking the proportion of artificial surfaces
per region to population and gross domestic product per capita, with the country and urban
type as additional factors.
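Factors such as country or urban type enter a linear regression as indicator (dummy) variables. The sketch below shows one common encoding with a reference level; the level names are hypothetical, and the cited study may have encoded its factors differently.

```python
def dummy_encode(categories, reference):
    """Turn a categorical factor into 0/1 indicator columns for a linear regression.

    One column per level except `reference`, which is absorbed by the intercept
    (including every level would make the design matrix collinear).
    """
    levels = sorted(set(categories) - {reference})
    return {level: [1 if c == level else 0 for c in categories] for level in levels}


# Hypothetical urban-type labels for four regions.
urban_type = ["urban", "rural", "peri-urban", "urban"]
print(dummy_encode(urban_type, reference="rural"))
```

Each resulting column gets its own coefficient, which measures the shift relative to the reference level ("rural" here) with population and GDP per capita held constant.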
How does one test the validity of a regression model, in terms of:
a. Coefficient of determination:
In statistics, the coefficient of determination, R², is used in the context of statistical models
whose main purpose is the prediction of future outcomes on the basis of other related information.
It is the proportion of variability in a data set that is accounted for by the statistical model,
and it provides a measure of how well future outcomes are likely to be predicted by the model.
There are several different definitions of R², which are only sometimes equivalent. One class of
such cases is linear regression. In this case, R² is simply the square of the sample correlation
coefficient between the outcomes and their predicted values or, in simple linear regression,
between the outcome and the values being used for prediction. In such cases (least squares with an
intercept), the value ranges from 0 to 1. The closer it is to 1, the more of the variation in the
dependent variable the model explains; the closer it is to 0, the less it explains.
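The definition R² = 1 - SS_residual / SS_total, and the fact that it equals the squared correlation for a least-squares line with an intercept, can be checked with a short sketch on hypothetical data.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def correlation(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


x = [1, 2, 3, 4]
y = [2, 3, 5, 6]  # hypothetical data
a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
print(r_squared(y, fitted), correlation(x, y) ** 2)
```

Both printed values are 0.98 (up to floating-point rounding): the line explains 98% of the variation in y around its mean.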
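b. Statistical significance of the identified slope coefficients:

The slope coefficient gives the change in the dependent variable associated with a one-unit
change in the independent variable. For example, if the slope coefficient is -2, a one-unit
increase in the independent variable is associated with a two-unit decrease in the dependent
variable; when both variables are measured as percentage changes (as in the beta model below),
a 1 percentage-point rise is associated with a 2 percentage-point fall. The coefficient also
indicates how important the independent variable is in predicting the dependent variable. Its
statistical significance is tested with the t statistic, t = coefficient / standard error: if
the corresponding p-value is below the chosen significance level (commonly 0.05), the slope is
judged to be significantly different from zero.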
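The t test described above can be sketched for a simple regression: estimate the slope, its standard error s / sqrt(Sxx) where s is the standard error of the regression, and their ratio. The data in the first example are hypothetical; the last lines apply the same ratio to the per capita income coefficient and standard error reported in the Toyota model later in this paper, with 2.776 being the two-tailed 5% critical value of t for 4 degrees of freedom.

```python
import math


def slope_t_test(x, y):
    """OLS slope, its standard error, and t statistic for simple regression y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the regression
    se_b = s / math.sqrt(sxx)
    return b, se_b, b / se_b


# Hypothetical data with a strong linear relationship.
b, se_b, t = slope_t_test([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(b, 3), round(se_b, 4), round(t, 1))

# Same ratio for a coefficient reported by Excel (per capita income, Toyota model).
t_income = 77.99742 / 41.60968
print(round(t_income, 4), abs(t_income) > 2.776)
```

The hypothetical slope has t ≈ 33, far beyond any usual critical value, whereas the income coefficient's t ≈ 1.87 falls short of 2.776, matching the p-value of about 0.13 in that output.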
Business model

DATE     OCL       CHANGE (Y)   SENSEX      CHANGE (X)
Mar-04   212.05    -            5,649.30    -
Apr-04   252       19%          5,599.12    -1%
May-04   299       19%          5,645.86    1%
Jun-04   305       2%           4,792.01    -15%
Jul-04   309.9     2%           4,813.76    0%
Aug-04   310       0%           5,193.25    8%
Sep-04   338       9%           5,202.16    0%
Oct-04   389.5     15%          5,587.46    7%
Nov-04   361       -7%          5,678.65    2%
Dec-04   359       -1%          6,259.28    10%
Jan-05   421       17%          6,626.49    6%
Feb-05   395       -6%          6,565.21    -1%
Mar-05   426       8%           6,725.92    2%
Apr-05   564       32%          6,506.60    -3%
May-05   580       3%           6,183.07    -5%
Jun-05   572.1     -1%          6,729.39    9%
Jul-05   575       1%           7,165.45    6%
Aug-05   650       13%          7,632.01    7%
Sep-05   188       -71%         7,818.90    2%
Oct-05   159.9     -15%         8,662.99    11%
Nov-05   120       -25%         7,989.86    -8%
Dec-05   151       26%          8,813.82    10%
Jan-06   155       3%           9,422.49    7%
Feb-06   150.3     -3%          9,959.24    6%
Mar-06   144       -4%          10,368.75   4%
Apr-06   148.95    3%           11,342.96   9%
May-06   206.9     39%          12,103.78   7%
Jun-06   159.95    -23%         10,472.46   -13%
Jul-06   142.65    -11%         10,616.97   1%
Aug-06   153.3     7%           10,737.50   1%
Sep-06   158.85    4%           11,699.57   9%
Oct-06   172.5     9%           12,473.79   7%
Nov-06   170.25    -1%          12,992.62   4%
Dec-06   172       1%           13,729.67   6%
Jan-07   166.6     -3%          13,827.77   1%
Feb-07   172       3%           14,124.36   2%
Mar-07   154.2     -10%         13,013.74   -8%
Apr-07   141       -9%          12,811.93   -2%
May-07   149       6%           13,987.77   9%
Jun-07   151.75    2%           14,610.28   4%
Jul-07   147.65    -3%          14,685.16   1%
Aug-07   148       0%           15,344.02   4%
Sep-07   143       -3%          15,401.99   0%
Oct-07   162       13%          17,356.99   13%
Nov-07   302.1     86%          20,130.23   16%
Dec-07   320       6%           19,547.09   -3%
Jan-08   340       6%           20,325.27   4%
Feb-08   227       -33%         17,820.67   -12%
Mar-08   209.6     -8%          17,227.56   -3%
Apr-08   150       -28%         15,771.72   -8%
May-08   138.25    -8%          17,560.15   11%
Jun-08   132       -5%          16,591.46   -6%
Jul-08   99.05     -25%         13,480.02   -19%
Aug-08   95.15     -4%          14,064.26   4%
Sep-08   96.6      2%           14,412.99   2%
Oct-08   68        -30%         13,006.72   -10%
Nov-08   62        -9%          10,209.37   -22%
Dec-08   41        -34%         9,162.94    -10%
Jan-09   43.45     6%           9,720.55    6%
Feb-09   50.9      17%          9,340.37    -4%
Mar-09   43.6      -14%         8,762.88    -6%
Apr-09   45.95     5%           9,745.77    11%
May-09   71        55%          11,635.24   19%
Jun-09   95.55     35%          14,746.51   27%
Jul-09   96.9      1%           14,506.43   -2%
Aug-09   112.95    17%          15,694.78   8%
Sep-09   131.05    16%          15,691.27   0%
Oct-09   138       5%           17,186.20   10%
Nov-09   110       -20%         15,838.63   -8%
Dec-09   111.45    1%           16,947.46   7%
Jan-10   126.8     14%          17,473.45   3%
Feb-10   128.5     1%           16,339.32   -6%
AVERAGE            1.61367594%              1.8631125%
(Source: www.bseindia.com)
The data above shows the monthly closing price of Orissa Cements Limited (OCL) from March 2004
to February 2010 vis-à-vis the Sensex over the same period. By regressing the monthly
percentage change in OCL's price (Y) on the monthly percentage change in the Sensex (X), we can
calculate the beta of the stock. When analysts use the capital asset pricing model (CAPM), they
generally use regression to calculate beta. Beta is used to calculate a company's cost of equity,
and hence its cost of capital. It helps in valuing a company and supports equity research and
recommendations to investors.
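Beta is the slope of the stock's returns regressed on the market's returns, which equals cov(stock, market) / var(market). The sketch below computes it from monthly closing prices. It is demonstrated on only the first six months of the table above, so the printed value is not comparable with the full-sample estimate; run over all 72 months, the same calculation should reproduce the slope in the Excel output below, provided the changes are computed in the same way.

```python
def pct_changes(prices):
    """Period-on-period proportional changes: p[t] / p[t-1] - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


def beta(stock_returns, market_returns):
    """OLS slope of stock returns on market returns: cov(stock, market) / var(market)."""
    n = len(market_returns)
    ms = sum(stock_returns) / n
    mm = sum(market_returns) / n
    cov = sum((s - ms) * (m - mm) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    return cov / var  # the (n - 1) divisors cancel, so plain sums are used


# First six months of the table above (Mar-04 to Aug-04), for illustration only.
ocl = [212.05, 252, 299, 305, 309.9, 310]
sensex = [5649.30, 5599.12, 5645.86, 4792.01, 4813.76, 5193.25]
print(round(beta(pct_changes(ocl), pct_changes(sensex)), 3))
```

A beta above 1 means the stock has tended to move more than the market in the same direction, which is what Hypothesis 2 below is about.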
• Hypothesis 1:
The stock price of the company depends on the Sensex.
• Hypothesis 2:
The stock price of the company is more sensitive than the Sensex (that is, its beta is greater than 1).
Because the statistics behind regression may overwhelm some readers, Microsoft Excel includes a
regression tool in its standard installation. Below, Excel 7.0 is used to illustrate how easily the
regression can be calculated.
Step 1:
Dependent variable: stock price of OCL (monthly percentage change).
Step 2:
Independent variable: Sensex (monthly percentage change).
Step 3:
Obtain data for the dependent and independent variables from past periods. For this business
model, we use OCL's stock and the Sensex from March 2004 to February 2010.
Step 4:
Run the regression to assess the level of fit. To complete the regression analysis, we first need to
enable an add-in that comes with the standard version of Excel (the Analysis ToolPak). Once the data
are entered, select the data to be analysed and run the Regression tool to open the Regression dialog
box. Keep in mind that the Y range is the dependent variable and the X range is the independent variable.
Regression Statistics
Multiple R           0.554717
R Square             0.307711
Adjusted R Square    0.2976771
Standard Error       0.17378469
Observations         71

ANOVA
             df    SS
Regression   1     0.926242
Residual     69    2.083865
Total        70    3.010107

               Coefficients   Standard Error
Intercept      -0.009164      0.021124
X Variable 1   1.357989       0.245213

Notes on the output:
1. Multiple R: the basic R statistic.
2. R Square: the R² statistic used for analysis purposes.
3. Standard Error: the standard error of the regression.
4. Regression SS: the sum of squares explained by the regression.
5. Residual SS: the total of the squared errors.
6. Total SS: the total sum of squares.

The Sensex reflects the collective performance of the thirty companies that make up the index
on BSE. We assume here that movements in the Sensex affect the stock price of a company. If an
increase in the Sensex is accompanied by an increase in the stock price, there is a positive
correlation between them, and vice versa.
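For a simple regression, the figures in this output are linked: R² = SS_regression / SS_total, the standard error is the square root of SS_residual / (n - 2), and the overall F statistic (not shown in the output) equals the square of the slope's t statistic. The sketch below recomputes these from the reported figures as a consistency check; the constants are copied from the output, and everything else is arithmetic.

```python
import math

# Figures copied from the Excel output (monthly OCL change regressed on Sensex change).
n_obs = 71
k = 1  # one explanatory variable
ss_reg, ss_res, ss_tot = 0.926242, 2.083865, 3.010107
slope, se_slope = 1.357989, 0.245213

df_res = n_obs - k - 1
r2 = ss_reg / ss_tot
adj_r2 = 1 - (ss_res / df_res) / (ss_tot / (n_obs - 1))
std_error = math.sqrt(ss_res / df_res)  # standard error of the regression
f_stat = (ss_reg / k) / (ss_res / df_res)
t_slope = slope / se_slope

print(f"R^2 = {r2:.6f}, adjusted R^2 = {adj_r2:.6f}, standard error = {std_error:.6f}")
print(f"F = {f_stat:.2f}, t(slope) = {t_slope:.2f}, t^2 = {t_slope ** 2:.2f}")
```

The recomputed R², adjusted R² and standard error agree with the output to rounding, and F ≈ 30.67 equals t² for the slope (t ≈ 5.54), so the slope is highly significant even though R² is modest.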
Y=0.2305x+0.0159
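Executive Summary:
The above linear regression model gives an estimate of the beta of the company's stock, which in turn
indicates the volatility of that stock relative to the market. The slope of 1.358 suggests that, on
average, OCL's monthly return moved about 1.36 percentage points for each 1 percentage-point move in the
Sensex, consistent with Hypothesis 2. However, R² is only about 0.31, so the Sensex explains roughly 31% of
the variation in OCL's monthly returns. The model also shows how the company's stock is performing in the
market and whether it is in line with the economic growth of the country, and it indicates whether the
Sensex return for a month has a positive or negative impact on the stock's monthly return.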
Business Model:
Year   No. of cars sold   Fuel price per barrel in Rs   1/fuel price per barrel in Rs   Per capita income
2002   6626387            1112.67                       0.000898738                     19040
2003   6240526            1292.85                       0.000773487                     20989
2004   6814554            1702.16                       0.000587491                     23241
2005   7338314            2177.74                       0.000459191                     20813
2006   8036010            2643.91                       0.000378228                     23222
2007   8534690            2605.88                       0.000383748                     29382
2008   9237780            4258.39                       0.00023483                      37490
I have taken data on the number of Toyota cars sold, the fuel price per barrel and per capita income
for the years 2002 to 2008.
Sources:
• Number of passenger vehicles sold in India (2002-2008): www.siam.com
• Per capita income of India (2002-2008): www.economywatch.com
• Crude oil price (2002-2008): www.ioga.com
Dependent variable: number of cars sold

Independent variables: 1/fuel price per barrel in Rs, and per capita income
The business model in this context is to find out how the sales of Toyota cars depend on fuel price and
per capita income. From this model we can forecast Toyota's sales.
Hypothesis 1:
Sales of Toyota cars depend on per capita income.
Hypothesis 2:
Sales of Toyota cars depend on fuel price.
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.949342
R Square             0.901249
Adjusted R Square    0.851874
Standard Error       421834.6
Observations         7

ANOVA
             df   SS         MS         F          Significance F
Regression   2    6.5E+12    3.25E+12   18.25304   0.009752
Residual     4    7.12E+11   1.78E+11
Total        6    7.21E+12

                               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept                      6958610        1556367          4.471058   0.011066   2637441     11279779    2637441       11279779
1/fuel price per barrel in Rs  -2.5E+09       1.14E+09         -2.23968   0.088653   -5.7E+09    6.11E+08    -5.7E+09      6.11E+08
Per capita income              77.99742       41.60968         1.874502   0.134131   -37.5296    193.5244    -37.5296      193.5244
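R² is 0.90 (Multiple R is 0.95), which is close to 1 and indicates that fuel price and per capita income
together explain about 90% of the variation in car sales over this period. The overall regression is
significant at the 5% level (Significance F = 0.0098), although with only seven observations neither
slope is individually significant at that level (p = 0.089 and 0.134). The fitted model is:
$Y = 6958610 - 2.5 \times 10^{9} X_1 + 77.99742 X_2$
where
$Y$ = sales of Toyota cars,
$X_1$ = 1/fuel price per barrel in Rs,
$X_2$ = per capita income.
Simple one-variable trendlines: against 1/fuel price, $Y = -4 \times 10^{9} x + 1 \times 10^{7}$;
against per capita income, $Y = 149.56x + 4 \times 10^{6}$.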
Executive Summary:
The above model gives an idea of the expected sales of Toyota cars next year. In this model, fuel price
and per capita income are taken as the independent variables. Since projections of per capita income and
fuel price are relatively easy to obtain, we can put them into this model to estimate the expected sales of
Toyota cars next year. The model assumes that the sales of Toyota cars depend only on these two variables,
which may or may not be true. A further limitation is that the model applies only to India.
