Statistical Techniques - Formatted


STATISTICAL TECHNIQUES

Areas to be covered:
• Forecast and Budget
• Forecasting Techniques
• High Low Method
• Scatter Diagram
• Time Series Analysis
• Seasonal Variation
• Linear Regression
• Index Numbers
FORECAST AND BUDGET
• Forecast: A forecast is an estimate of what might happen in the future. It is based
on assumptions about the conditions that are expected to apply.

• Budget: A budget is a plan of what the organization is aiming to achieve and what it has
set as a target. Budgets are more realistic because management will try to establish
some control over the conditions that will apply in the future.

FORECASTING METHODS
1. High-low method
2. Scatter graph method
3. Linear regression analysis
4. Time series analysis
1. High Low Method
It is a simple forecasting technique based on historical data. It has already been discussed in
the cost classification chapter.
Advantages:
• It is easy to use and understand.
• It needs just two activity levels (highest and lowest).

Disadvantages:
• It considers only two extreme points, which may not be representative of normal conditions.
• Because it is based on only two points, the resulting formula is not very accurate.
• It is based on historical data.
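
The detail of the high-low method is left to the cost classification chapter, so the following is only a minimal sketch. It assumes the usual high-low rule: variable cost per unit is the change in cost between the highest and lowest activity levels divided by the change in activity. The figures in `observations` are hypothetical and do not come from this document.

```python
# Hypothetical data (not from the source): (units produced, total cost in $).
observations = [(1_000, 9_000), (1_400, 11_000), (1_800, 13_400), (2_200, 15_000)]


def high_low(data):
    """Return (fixed_cost, variable_cost_per_unit) using only the
    highest and lowest activity levels."""
    low = min(data, key=lambda pair: pair[0])
    high = max(data, key=lambda pair: pair[0])
    variable_per_unit = (high[1] - low[1]) / (high[0] - low[0])
    fixed_cost = high[1] - variable_per_unit * high[0]
    return fixed_cost, variable_per_unit


fixed_cost, variable_per_unit = high_low(observations)

# Forecast total cost at an assumed activity level of 2,000 units.
forecast_cost = fixed_cost + variable_per_unit * 2_000
print(fixed_cost, variable_per_unit, forecast_cost)
```

Note that the two middle observations play no part in the result, which is the weakness listed under the disadvantages.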
2. Scatter graph method
The scatter graph method is a graphical way of forecasting.
The steps involved in forecasting under the scatter graph method are:
a. Collect data on past volumes of output and the associated cost of producing that
output.
b. Plot the data on a graph with cost on the vertical axis and volume of output on the
horizontal axis.
c. Draw the line of best fit through the middle of the plotted points, so that the distance of
points above the line is the same as the distance of points below the line.

The point where the line of best fit meets the vertical axis is the fixed cost, and the slope of the line
represents the variable cost per unit. The line is drawn by visual judgment, which is a disadvantage of this
method.
Correlation:
Two variables are said to be correlated if a change in the value of one variable is
accompanied by a change in the value of another variable.
For example:
• Total variable cost and production units.
• Selling price of a product and its demand.
The purpose of correlation analysis is to measure and interpret the strength of linear
relationship between two variables.

Degrees of correlation:
Two variables might be perfectly correlated, partly correlated or uncorrelated. Correlation
can be positive or negative. The differing degrees of correlation can be illustrated on scatter
diagrams.
Perfect correlation
[Figure: scatter diagrams (a) and (b), Y plotted against X, with all points on a straight line]
All the pairs of values lie on a straight line. An exact linear relationship
exists between the two variables.

Partial correlation
[Figure: scatter diagrams (a) and (b), Y plotted against X, with points scattered around a line]
In (a), although there is no exact relationship, low values of X tend to
be associated with low values of Y, and high values of X with high
values of Y.
In (b), again, there is no exact relationship, but low values of X tend to be
associated with high values of Y and vice versa.
No correlation
[Figure: scatter diagram (c), Y plotted against X, with points showing no pattern]
The values of these two variables are not correlated with each other.

Positive and negative correlation

Correlation, whether perfect or partial, can be positive or negative.

Positive correlation: low values of one variable are associated with low
values of the other, and high values of one variable are associated with
high values of the other.
Negative correlation: low values of one variable are associated with high
values of the other, and high values of one variable are associated with
low values of the other.

Correlation coefficient (r):

The degree of linear correlation between two variables is measured by the correlation
coefficient, r, which ranges from -1 to +1. Two variables may be perfectly or partially
correlated; if they are partially correlated, the degree of correlation may be high or low.

Formula:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

Where x and y represent pairs of data for the two variables X and Y, and n is the
number of pairs of data used in the calculation. Remember that the correlation coefficient (r)
always lies between -1 and +1. If your calculation gives anything outside this range, you must
recheck your working.
Example 1:
Statistical department of a provincial government is currently developing a data base to find
out whether there is any relationship between an individual’s annual income and his level of
education. Following information has been so far collected:
Individual    Income ($’000)    Education (years)
              X                 Y
1             45                20
2             63                19
3             36                16
4             52                20
5             29                12

You are required to calculate correlation coefficient using above data.


Solution:
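
As a hedged sketch of the working for Example 1, the correlation coefficient formula can be applied directly to the five data pairs. The function name `correlation` and the variable names are illustrative choices, not part of the original material.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r computed from the sums used in the formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


income = [45, 63, 36, 52, 29]      # X, $'000
education = [20, 19, 16, 20, 12]   # Y, years

r_example_1 = correlation(income, education)
print(round(r_example_1, 2))  # prints 0.8
```

A value of about +0.80 indicates a fairly strong positive correlation between income and years of education in this small sample.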
Correlation in time series
A time series is a sequence of data points, typically measured at
successive times spaced at uniform intervals. It is often the
case that a variable depends on the passage of time, and a trend can be
found by analyzing the variable's movement with respect to time.
The correlation coefficient is calculated with time as the X variable
(the independent variable) and the other variable as Y (the dependent
variable). When analyzing correlation in a time series, it is more
convenient to replace years with digits: X values of 2001, 2002, 2003,
2004 and so on should be replaced with 0, 1, 2, 3 and so on. Note that
the first year is replaced with 0, not 1. Any consistent numbering of
the years can be used, but 0, 1, 2, 3… is the simplest.
Example 2: Demand for a particular product between 1999 and 2004 was as follows:
Year                   1999  2000  2001  2002  2003  2004
Demand (units ‘000)    22    19    18    16    15    12
Determine whether there is any relationship between time and demand.
Solution:
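
A minimal sketch of the working for Example 2, assuming the years 1999 to 2004 are replaced with 0 to 5 as recommended. The helper `correlation` and the variable names are illustrative only.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r computed from the sums used in the formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


time_index = [0, 1, 2, 3, 4, 5]        # 1999 -> 0, 2000 -> 1, ...
demand = [22, 19, 18, 16, 15, 12]      # units '000

r_time_demand = correlation(time_index, demand)
r_squared = r_time_demand ** 2
print(round(r_time_demand, 3), round(r_squared, 3))  # prints -0.988 0.975
```

The result is close to -1, so demand shows a strong negative correlation with time: it has fallen steadily over the period.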
The coefficient of determination (r²)
The coefficient of determination, r², is simply the square of the
correlation coefficient, r. It is useful because it gives the proportion of
the variance (fluctuation) of one variable that is predictable from the
other variable. In other words, r² expresses the proportion of the total
variance in the value of one variable that can be explained by
the other variable.

The coefficient of determination is such that 0 ≤ r² ≤ +1, i.e. it cannot
be negative. It denotes the strength of the linear association between X and Y.

If r = -0.99, the coefficient of determination is +0.98, which means that
98% of the variation in one variable is explained by the other variable.


3. Linear Regression
Regression analysis is the study of the relationship between variables. It is one of the most
commonly used business analysis tools and is easy to use.

Line of best fit

The correlation coefficient shows whether there is a linear relationship between two
variables, but on its own it cannot be used to predict the value of the dependent variable, Y,
from the independent variable, X. Once it is found that two variables are correlated, we can
use the equation of the line of best fit for forecasting: put in a value for variable X and derive
a forecast value for the dependent variable Y.
Where,
Dependent variable: is the single variable being explained/predicted by regression model.
Independent variable: is the explanatory variable used to predict dependent variable.

Estimating line of best fit


The line of best fit is a cost equation of the form:
Y = a + bx
Where,
a = total fixed cost,
b = gradient/slope of the line, or variable cost per unit
Regression analysis is used to establish the line of best fit. Once the equation of the line of
best fit is determined, it can be used for forecasting.
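Regression analysis uses the following formulas for estimating the line of best fit:

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \qquad \text{and} \qquad a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} $$

Where n is the number of data pairs used in the analysis.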
Example 3: The following data is available for levels of output and the costs incurred at each
output level:
Output (‘000 units)    10  15  13  18  19  20
Cost ($’000)           40  55  48  65  69  81
Calculate total cost at an activity level of 22,500 units using regression analysis.

Solution:
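
A hedged sketch of the working for Example 3, applying the regression formulas for b and a to the six data pairs. The function name `line_of_best_fit` is an illustrative choice. Output is kept in thousands of units and cost in $'000, so 22,500 units is entered as 22.5.

```python
def line_of_best_fit(xs, ys):
    """Return (a, b) for Y = a + bx using the least-squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


output = [10, 15, 13, 18, 19, 20]   # '000 units
cost = [40, 55, 48, 65, 69, 81]     # $'000

a, b = line_of_best_fit(output, cost)
forecast = a + b * 22.5             # total cost at 22,500 units, in $'000
print(round(a, 4), round(b, 4), round(forecast, 2))  # prints 0.0713 3.7639 84.76
```

On these figures the forecast total cost is roughly $84,760. Since 22,500 units lies above the highest observed output of 20,000 units, this is an extrapolation and should be treated with corresponding caution.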
Regression line and time series analysis
A regression line can also be used in time series analysis. Time is taken as the
independent variable, and years are replaced with 0, 1, 2, 3 and so on, in the same way as
when the correlation coefficient is calculated.

The reliability of regression model in forecasting

As is the case with any other model, results from regression analysis will not always be
accurate or reliable. There are a number of limitations of this model which cast doubt on
its results:
Reliability of Forecast in Linear Regression:
• The model assumes that a linear relationship exists, but this is not always true: the
relationship might be non-linear. The model is only appropriate if there is a linear
relationship between the two variables.
• The model assumes that there are only two variables. The value of one variable, the
dependent variable Y, is predicted from the value of one other variable, the independent
variable X. This is quite unrealistic, as the value of Y might be affected by many other factors
not considered at all.
• Past behavior is used to forecast the future. The model assumes that the past pattern of
movement of the two variables will continue in the future. Again, this is an unrealistic assumption.
• A linear regression model is limited to predicting numeric output only. It cannot be used to
predict any other sort of information.
• A lack of explanation about what has been learned can be a problem. A predicted
figure is not always all that is desired.
• The model is only appropriate if used to predict the value of the dependent variable within the
relevant range. Predicted results are not reliable if the model is used for extrapolation.
o Interpolation means using a line of best fit to predict a value within the two extreme
points of the observed range.
o Extrapolation means using a line of best fit to predict a value outside the two extreme
points.
• There must be a sufficient number of data pairs. Even if the correlation between the two
variables is high, any forecast based on fewer than ten pairs of data should be regarded as
somewhat unreliable.
We can still use the forecast produced by the model with high confidence if the correlation
coefficient between the two variables is high. The coefficient of determination tells us how
much of the variation in cost can be explained by the volume level. The higher the coefficient of
determination, the more reliance can be placed on the predicted result. As a general
rule, if correlation is high (say +0.9 or -0.9), the actual values will all lie close to the
regression line. If correlation is below 0.7 (-0.7 ≤ r ≤ +0.7), the predicted value will only be a
rough estimate of what the value of Y is likely to be.


Advantages of regression analysis:
• It gives a definitive line of best fit after taking account of all the given data.
• It produces good forecasting results where the relationship is linear.
• Many processes are linear, so they are well described by regression analysis.
4. Time Series Analysis

A time series is a series of figures relating to the changing value of a variable
over time. The data often conforms to a certain pattern over time. It is used, for example, to
forecast sales.

The graph of a time series is called a HISTORIGRAM.

The pattern can be extrapolated into the future, and hence forecasts are possible.

Time periods may be any measure of time, including days, weeks, months and
quarters.

For example:
• Annual cost for the last ten years,
• Number of people employed in each of the last 10 years,
• Output per day for the last month,
• Sales per month for the last 3 years, etc.


The Four Components of a time series are:
a. Trend: this describes the long term general movement of the data recorded.

b. Seasonal variations: are short term fluctuations in recorded values, a regular variation

around the trend over a fixed time period, usually one year.

c. Cyclical variations: are long term fluctuations in recorded values, economic cycle of

booms and slumps. It takes several years to complete.

d. Random variations: irregular, random fluctuations in the data usually caused by factors

specific to the time series. They are unpredictable.


a. Trends
Long term movement over time in the value of the data recorded. For example:

         Downward trend         Upward trend     No clear movement/static
Years    Output/hour (units)    Cost/unit ($)    Number of employees
4        30                     1.00             100
5        24                     1.08             103
6        26                     1.20             96
7        22                     1.15             102
8        21                     1.18             103
9        17                     1.25             98
Finding a trend
One method of finding the trend is the use of moving averages, taken over a
period that covers one full cycle.

• Time series with an odd number of periods per cycle: apply the moving average once.
• Time series with an even number of periods per cycle: apply the moving average twice,
because each trend value should relate to a specific period.

Remember that when finding the moving average of an even number of results, a
second moving average has to be calculated so that the values relate to
specific actual figures. This method attempts to remove seasonal (or cyclical)
variation from a time series by averaging, so as to leave a set of
figures representing the trend. Each moving average figure relates to the midpoint
of the overall period it covers.
Example 4:
(Odd number of periods)

Year    Sales units
2000    390
2001    380
2002    460
2003    450
2004    470
2005    440
2006    500
Take a moving average of the annual sales over a period of three years.
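
A minimal sketch of the three-year moving average for Example 4. The function name `moving_average` is an illustrative choice. Because the window is odd, each average relates directly to the middle year of its window (2001 for the first, 2005 for the last).

```python
def moving_average(values, window):
    """Simple moving average; each result relates to the midpoint of its window."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


years = [2000, 2001, 2002, 2003, 2004, 2005, 2006]
sales_units = [390, 380, 460, 450, 470, 440, 500]

three_year_trend = moving_average(sales_units, 3)

# The first and last years have no trend value because they are never a midpoint.
for year, value in zip(years[1:-1], three_year_trend):
    print(year, round(value, 2))
```

The trend values run 410, 430, 460, 453.33 and 470 for 2001 to 2005.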
Moving average of an even number of results
If the moving average were taken of results in an even number of time periods, the
basic technique would be the same, but the midpoint of the overall period would not
relate to a single period. The trend line average figures need to relate to a particular time
period. To overcome this difficulty, take a moving average of the moving average.
Example 5:
Calculate the trend using moving averages.

Year    Quarter    Volume of sales (‘000 units)
2005    1          600
        2          840
        3          420
        4          720
2006    1          640
        2          860
        3          420
        4          740
2007    1          670
        2          900
        3          430
        4          760
Solution:
Year    Quarter    Actual volume    Moving average of    Midpoint of 2 moving
                   of sales         4 quarters’ sales    averages (trend line)
                   ‘000 units       ‘000 units           ‘000 units
2005    1          600
        2          840
                                    645
        3          420                                   650
                                    655
        4          720                                   657.50
                                    660
2006    1          640                                   660
                                    660
        2          860                                   662.50
                                    665
        3          420                                   668.75
                                    672.50
        4          740                                   677.50
                                    682.50
2007    1          670                                   683.75
                                    685
        2          900                                   687.50
                                    690
        3          430
        4          760
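
The two-stage averaging used for Example 5 can be sketched as follows. The function names `moving_average` and `centred_trend` are illustrative choices. The second pass averages each adjacent pair of four-quarter averages so that every trend value lines up with an actual quarter.

```python
def moving_average(values, window):
    """Simple moving average over a sliding window."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def centred_trend(values, window):
    """Moving average of the moving average, for an even window length."""
    first_pass = moving_average(values, window)
    return [(a + b) / 2 for a, b in zip(first_pass, first_pass[1:])]


# Quarterly sales ('000 units), 2005 Q1 to 2007 Q4.
quarterly_sales = [600, 840, 420, 720, 640, 860, 420, 740, 670, 900, 430, 760]

four_quarter_average = moving_average(quarterly_sales, 4)
trend_line = centred_trend(quarterly_sales, 4)   # 2005 Q3 to 2007 Q2
print(four_quarter_average)
print(trend_line)
```

With a window of four, the first trend value relates to the third quarter of the series and the last to the third-from-last, so two quarters at each end have no trend figure.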
b. Seasonal Variation
Short term fluctuations due to changes in season. They affect seasonal businesses
such as ice-cream manufacturing.

Finding the seasonal variation

There are two models for finding seasonal variations:
• Additive model
• Multiplicative model

Additive model
Seasonal variations are the difference between actual and trend figures. An
average of the seasonal variations for each time period within the cycle must be
determined and then adjusted so that the total of the seasonal variations sums to
zero.
Seasonal variation = actual sales – trend
So
Time series (actual sales) = trend + seasonal variation
In full, the additive model is Y = T + S + R, where T is the trend, S the seasonal
variation and R the random variation; the working here ignores R.
Continue Example 5:
Year    Quarter    Actual volume of sales    Trend         Seasonal variation
                   ‘000 units                ‘000 units    ‘000 units
2005    1          600
        2          840
        3          420                       650           -230
        4          720                       657.50        62.50
2006    1          640                       660           -20
        2          860                       662.50        197.50
        3          420                       668.75        -248.75
        4          740                       677.50        62.50
2007    1          670                       683.75        -13.75
        2          900                       687.50        212.50
        3          430
        4          760
The variation between the actual result for any particular quarter and the trend line
average is not the same from year to year, but an average of these variations can be
taken.

                          Q1         Q2        Q3          Q4
2005                                           -230        62.50
2006                      -20        197.50    -248.75     62.50
2007                      -13.75     212.50
Total                     -33.75     410       -478.75     125
Average (divided by 2)    -16.875    205       -239.375    62.50
The estimate of the seasonal or quarterly variation is almost done, but there is one more
important step to take. Variations around the basic trend line should cancel each other out
and add up to zero. At the moment they do not. Therefore spread the total of the
variations (11.25) across the four quarters (11.25/4 = 2.8125 each) so that the final variations sum
to zero.
                                           Q1          Q2          Q3           Q4         Total
Estimated quarterly variations             -16.875     205         -239.375     62.50      11.25
Adjustment to reduce total to 0            -2.8125     -2.8125     -2.8125      -2.8125    -11.25
Final estimates of quarterly variations    -19.6875    202.1875    -242.1875    59.6875    0
These might be rounded as follows:         -20         202         -242         60         0
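
The additive working for Example 5 can be sketched as below. It takes the trend values from the worked solution as given, and the variable names are illustrative only. Each quarter's variations are averaged, then the residual total is spread evenly so the four seasonal variations sum to zero.

```python
# Quarterly sales ('000 units), 2005 Q1 to 2007 Q4.
sales = [600, 840, 420, 720, 640, 860, 420, 740, 670, 900, 430, 760]
# Trend values from the worked solution, covering 2005 Q3 to 2007 Q2.
trend = [650, 657.5, 660, 662.5, 668.75, 677.5, 683.75, 687.5]

# Group (actual - trend) by quarter number; the trend starts at index 2 (2005 Q3).
by_quarter = {1: [], 2: [], 3: [], 4: []}
for offset, trend_value in enumerate(trend):
    index = offset + 2
    quarter = index % 4 + 1
    by_quarter[quarter].append(sales[index] - trend_value)

averages = {q: sum(v) / len(v) for q, v in by_quarter.items()}

# Spread the residual total evenly so that the variations sum to zero.
correction = sum(averages.values()) / 4
seasonal = {q: avg - correction for q, avg in averages.items()}
print(averages)
print(seasonal)
```

A forecast for a future quarter under this model would then be the projected trend plus that quarter's seasonal variation.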


Multiplicative model
The additive model assumes that the components of the series are independent of each
other. In the multiplicative model, each actual figure is instead expressed as a proportion of the trend.

Seasonal variation = actual sales / trend

So
Time series Y = T x S x R

The trend component will be the same in both models, but the seasonal and random
components will vary according to the model. In our example, we assume that the random
component is small and so ignore it. So:

Y = T x S
Then:
S = Y/T
Continue Example 5:
Year    Quarter    Actual volume of sales (Y)    Trend (T)     Seasonal variation (Y/T)
                   ‘000 units                    ‘000 units    (ratio)
2005    1          600
        2          840
        3          420                           650           0.646
        4          720                           657.50        1.095
2006    1          640                           660           0.970
        2          860                           662.50        1.298
        3          420                           668.75        0.628
        4          740                           677.50        1.092
2007    1          670                           683.75        0.980
        2          900                           687.50        1.309
        3          430
        4          760

                          Q1       Q2        Q3       Q4
2005                                         0.646    1.095
2006                      0.970    1.298     0.628    1.092
2007                      0.980    1.309     -        -
Total                     1.950    2.607     1.274    2.187
Average (divided by 2)    0.975    1.3035    0.637    1.0935

Instead of summing to zero, the averages should sum to 4, i.e. an average of 1 for each of the
four quarters.

                                           Q1          Q2          Q3          Q4          Total
Estimated quarterly variations             0.975       1.3035      0.637       1.0935      4.009
Adjustment to reduce total to 4            -0.00225    -0.00225    -0.00225    -0.00225    -0.009
Final estimates of quarterly variations    0.97275     1.30125     0.63475     1.09125     4
These might be rounded as follows:         0.97        1.30        0.64        1.09        4

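The multiplicative model is generally better than the additive model when the trend is rising or falling, because the seasonal variation is then a constant proportion of the trend rather than a constant absolute amount.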
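
A hedged sketch of the multiplicative working for Example 5, taking the trend values from the worked solution as given. Variable names are illustrative. The sketch keeps full precision instead of rounding each ratio to three decimals first, so its results can differ from the tabulated figures in the third or fourth decimal place.

```python
# Quarterly sales ('000 units), 2005 Q1 to 2007 Q4.
sales = [600, 840, 420, 720, 640, 860, 420, 740, 670, 900, 430, 760]
# Trend values from the worked solution, covering 2005 Q3 to 2007 Q2.
trend = [650, 657.5, 660, 662.5, 668.75, 677.5, 683.75, 687.5]

# Group the ratio actual / trend by quarter number; the trend starts at index 2.
ratios_by_quarter = {1: [], 2: [], 3: [], 4: []}
for offset, trend_value in enumerate(trend):
    index = offset + 2
    quarter = index % 4 + 1
    ratios_by_quarter[quarter].append(sales[index] / trend_value)

average_ratio = {q: sum(v) / len(v) for q, v in ratios_by_quarter.items()}

# Spread the excess over 4 evenly, as in the worked table, so the ratios sum to 4.
excess = (sum(average_ratio.values()) - 4) / 4
seasonal_ratio = {q: avg - excess for q, avg in average_ratio.items()}

for quarter in sorted(seasonal_ratio):
    print(quarter, round(seasonal_ratio[quarter], 4))
```

A forecast under this model would be the projected trend multiplied by the relevant quarter's seasonal ratio.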


Index Number / Indices
An index is a measure of change over time relative to a chosen base. It
provides a standard way of comparing values.

Types of index:
• Price index: a measure of change in the money value of a group of items over time.
• Quantity index: a measure of change in the non-monetary value of a group of items over time.

$$ \text{Price index} = \frac{P_n}{P_0} \times 100 \qquad \text{Quantity index} = \frac{Q_n}{Q_0} \times 100 $$

where the subscript 0 denotes the base period and n the current period.

An index number is calculated by taking a base, which may be fixed or chained:
• Fixed base: one base is selected and all subsequent changes are measured against that fixed base.
• Chain base: the base value is taken from the immediately preceding period (used where the basic nature of the commodity changes over time).
Example 6:
Great Ltd sold leather jackets in 20X5 for $20, in 20X6 they were $25, in 20X7 $30 and in 20X8
$35. Assuming the base year to be 20X5, the price index numbers for the years 20X6 to
20X8 can be calculated as follows:

Solution:

20X6 index number = 25/20 x 100 = 125

20X7 index number = 30/20 x 100 = 150

20X8 index number = 35/20 x 100 = 175


Example 7:
Teakwood Ltd produces and sells high quality furniture in the UK. The total number of cupboards sold
by Teakwood Ltd was 4,000 in 20X5, 6,000 in 20X6, 9,000 in 20X7 and 10,000 in 20X8.
Assuming the base year to be 20X5, the quantity index numbers for the years 20X6 to 20X8
can be calculated as follows:

Solution:
20X6 index number = 6,000/4,000 x 100 = 150

20X7 index number = 9,000/4,000 x 100 = 225

20X8 index number = 10,000/4,000 x 100 = 250
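
The index calculations in Examples 6 and 7 can be sketched as follows, with 20X5 as the fixed base. The function names are illustrative choices. A chain-base version is included for comparison: it measures each year against the year immediately before, which the examples themselves do not ask for.

```python
def fixed_base_index(values):
    """Index each value against the first value (the base period = 100)."""
    base = values[0]
    return [value / base * 100 for value in values]


def chain_base_index(values):
    """Index each value against the immediately preceding period."""
    return [current / previous * 100 for previous, current in zip(values, values[1:])]


jacket_prices = [20, 25, 30, 35]             # $, 20X5 to 20X8 (Example 6)
cupboards_sold = [4000, 6000, 9000, 10000]   # units, 20X5 to 20X8 (Example 7)

price_index = fixed_base_index(jacket_prices)
quantity_index = fixed_base_index(cupboards_sold)
price_chain = chain_base_index(jacket_prices)   # 20X6, 20X7, 20X8 vs the prior year

print([round(v, 2) for v in price_index])
print([round(v, 2) for v in quantity_index])
print([round(v, 2) for v in price_chain])
```

On a fixed base the jacket price index rises by 25 points each year, while the chain-base figures fall from 125 to about 116.67 because each $5 rise is measured against a higher previous price.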


Laspeyres index number
A Laspeyres index is a special type of weighted average index. It always uses weights
from the base time period.


Example 8: The following is the list of ingredients used by Popo to make soup:

                         20X4                            20X6
Items                    Quantity (kg)    Price ($)      Quantity (kg)    Price ($)
Melted Butter            8                45             14               34
Inventory                15               66             30               55
Corn flour               9                32             12               86
Liquidized Vegetables    20               65             5                25

Calculate the Laspeyres and Paasche price and quantity indices for 20X6, if
20X4 is taken as the base year.
Solution:
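
A hedged sketch of the working for Example 8. The Laspeyres indices use base-period (20X4) weights, as described above. The text does not define the Paasche index, so the sketch assumes its standard form, which uses current-period (20X6) weights instead. All names in the code are illustrative.

```python
# item: (20X4 quantity kg, 20X4 price $, 20X6 quantity kg, 20X6 price $)
items = {
    "Melted Butter": (8, 45, 14, 34),
    "Inventory": (15, 66, 30, 55),
    "Corn flour": (9, 32, 12, 86),
    "Liquidized Vegetables": (20, 65, 5, 25),
}

q0 = [row[0] for row in items.values()]   # base-period quantities
p0 = [row[1] for row in items.values()]   # base-period prices
qn = [row[2] for row in items.values()]   # current-period quantities
pn = [row[3] for row in items.values()]   # current-period prices


def weighted_total(prices, quantities):
    """Sum of price x quantity across all items."""
    return sum(p * q for p, q in zip(prices, quantities))


# Laspeyres: base-period weights.
laspeyres_price = weighted_total(pn, q0) / weighted_total(p0, q0) * 100
laspeyres_quantity = weighted_total(p0, qn) / weighted_total(p0, q0) * 100

# Paasche (assumed standard definition): current-period weights.
paasche_price = weighted_total(pn, qn) / weighted_total(p0, qn) * 100
paasche_quantity = weighted_total(pn, qn) / weighted_total(pn, q0) * 100

print(round(laspeyres_price, 2), round(paasche_price, 2))
print(round(laspeyres_quantity, 2), round(paasche_quantity, 2))
```

On these figures the price indices come to about 80.70 (Laspeyres) and 98.92 (Paasche), and the quantity indices to about 112.97 (Laspeyres) and 138.46 (Paasche). The wide gap between the two methods reflects how much the mix of quantities changed between 20X4 and 20X6.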
Advantages of index number
• Index numbers help management to understand information

• The information or data is presented in percentage terms. For example, "sales increased by 5%"
is more meaningful than "sales increased from $4,567,990 to $4,796,390".

• Comparing data and drawing conclusions is much easier with the help of indices.

• Calculating the quantity and price index separately helps the management to know

both the variables independently.


Disadvantages of index number
• A base period has to be selected.

• The index can be calculated by different methods, therefore, there is no single correct

method to calculate an index.

• The results are on an approximation basis, and not exact.

• The figures obtained are averages. Significant changes in variables cannot be seen with

the final results.

• They could be misleading as certain things can be changed over time.

• Indices consider new products that may appear; the old ones may be ignored.
Forecasting problems:

All forecasting methods are subject to error, but the size of the error varies from case to case. Some main
problems are:

• The future is always uncertain.

• The less data is available, the less reliable the forecast.

• The trend and the pattern of seasonal variations cannot be guaranteed to continue in
the future.

• There is always a danger of random variations.


Other changes which affect future forecasts:

• Political and economic changes, which create uncertainty (for example, changes in interest
rates, exchange rates or inflation).

• Environmental and social changes (changes in the market will affect other companies'
markets).

• Technological changes and advances.
