Unit II
Descriptive Statistics
Descriptive statistics is concerned with describing or summarising the
numerical properties of data. Its methodology includes classification, tabulation,
graphical representation and the calculation of indicators such as the mean, median
and range, which summarise important features of the data. Generalisation is
restricted to the particular group of individuals being observed. No conclusion can
be drawn beyond this group, because the data describe only the group on which they
were collected. Many action research studies involve descriptive analysis. Such
studies provide valuable information about the nature of a specific group of
individuals.
Inferential Statistics
Inferential statistics, which is also referred to as statistical inference, is
concerned with derivation of scientific inference about generalisation of results from
the study of a few particular cases. Technically speaking, the methods of statistical
inference help in generalising the results of a sample to the entire population from
which the sample is drawn. When selecting a sample, it should be kept in mind that
the sample should closely represent the larger population, so that the
characteristics of the sample reflect the characteristics of the whole population.
The nature of inference is inductive in the sense that we make general statements
from the study of a few cases. Inferential statistics provides the tools for making
inductive inference scientific and rigorous. In such inference, it is accepted that
the generalisation cannot be made with certainty.
Some uncertainty is inevitable since in some cases the inference drawn from
the data of a sample survey or an experiment can be wrong. However, the degree of
uncertainty is itself measurable and one can make rigorous statements about the
uncertainty (or the chance of being wrong) associated with a particular inference.
This uncertainty in inference is dealt with by applying the theory of probability, which
is the backbone of statistical inference. It is a branch of mathematical statistics that
deals with measurement of the extent of certainty of events whose occurrence
depends on chance.
Measures of Central Tendency
Mean:
Mean = Sum of all values / number of values.
The mean is typically the best measure of central tendency because it takes all
values into account, but it is easily affected by any extreme value (outlier). Note
that the mean can only be defined for data at the interval and ratio levels of
measurement.
Merits of mean:
4. Arithmetic mean cannot be computed when class intervals have open ends
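The effect of an extreme value on the mean can be seen in a small sketch. The income figures below are hypothetical and chosen only for illustration; the function name arithmetic_mean is likewise an assumption, not part of the notes.

```python
def arithmetic_mean(values):
    """Mean = sum of all values / number of values."""
    return sum(values) / len(values)

# Hypothetical incomes (illustrative numbers only).
incomes = [20, 22, 24, 26, 28]
# The same series with one extreme value (outlier) added.
incomes_with_outlier = incomes + [300]

plain_mean = arithmetic_mean(incomes)
skewed_mean = arithmetic_mean(incomes_with_outlier)

print(plain_mean)   # 24.0
print(skewed_mean)  # 70.0
```

A single outlier moves the mean from 24 to 70, a value larger than every ordinary observation in the series.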
Median
The median is that value of the series which divides the group into two equal
parts, one part comprising all values greater than the median value and the other part
comprising all the values smaller than the median value.
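A minimal sketch of locating the median, assuming the usual convention that an even-sized series takes the average of its two middle values. The series below are hypothetical.

```python
def median(values):
    """Middle value of the sorted series; for an even number of items,
    the average of the two middle values (a common convention)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_series = [7, 3, 9, 5, 11]        # sorted: 3, 5, 7, 9, 11
even_series = [7, 3, 9, 5]           # sorted: 3, 5, 7, 9
with_extreme = [7, 3, 9, 5, 1000]    # one extreme value replaces 11

print(median(odd_series))    # 7
print(median(even_series))   # 6.0
print(median(with_extreme))  # 7
```

Replacing 11 by 1000 leaves the median at 7, because only the position of the values matters, not their size.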
Merits of Median:
Simplicity: - It is a very simple measure of the central tendency of the series. In
the case of a simple statistical series, just a glance at the data is enough to
locate the median value.
Free from the effect of extreme values: - Unlike the arithmetic mean, the median
value is not distorted by the extreme values of the series.
Certainty: - Certainty is another merit of the median. The median is always a
certain, specific value in the series.
Real value: - The median value is a real value and is a better representative value
of the series than the arithmetic mean, the value of which may not exist in the
series at all.
Graphic presentation: - Besides algebraic approach, the median value can be
estimated also through the graphic presentation of data.
Possible even when data is incomplete: - Median can be estimated even in the case of
certain incomplete series. It is enough if one knows the number of items and the
middle item of the series.
Demerits of median:
Following are the various demerits of median:
Lack of representative character: - The median fails to be a representative measure
for series whose values lie far apart from each other. Also, the median is of
limited representative character as it is not based on all the items in the series.
Unrealistic: - When the median is located somewhere between the two middle values,
it remains only an approximate measure, not a precise value.
Lack of algebraic treatment: - The arithmetic mean is capable of further algebraic
treatment, but the median is not. For example, multiplying the median by the number
of items in the series will not give us the sum total of the values of the series.
However, the median is quite a simple method of finding an average of a series. It
is quite commonly used for series that relate to qualitative observations, such as
the health of the student.
Mode
The value of the variable which occurs most frequently in a distribution is called the
mode.
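A minimal sketch of finding the mode by counting frequencies. The function returns every value that shares the highest frequency, which is one possible way to handle ties; the data are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

shoe_sizes = [7, 8, 8, 9, 8, 10, 9]   # hypothetical data

print(modes(shoe_sizes))        # [8]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (two modal values)
print(modes([1, 2, 3]))         # [1, 2, 3]  (all frequencies identical)
```

When all frequencies are identical, every value is returned, so no single modal value can be singled out.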
Merits of Mode:
Simple and popular: - The mode is a very simple measure of central tendency.
Sometimes, just a glance at the series is enough to locate the modal value.
Because of its simplicity, it is a very popular measure of central tendency.
Less effect of marginal values: - Compared to the mean, the mode is less affected by
marginal values in the series. The mode is determined only by the value with the
highest frequency.
Graphic presentation:- Mode can be located graphically, with the help of histogram.
Best representative: - Mode is that value which occurs most frequently in the series.
Accordingly, mode is the best representative value of the series.
No need of knowing all the items or frequencies: - The calculation of mode does not
require knowledge of all the items and frequencies of a distribution. In simple series,
it is enough if one knows the items with highest frequencies in the distribution.
Demerits of mode:
Following are the various demerits of mode:
Uncertain and vague: - Mode is an uncertain and vague measure of the central
tendency.
Not capable of algebraic treatment: - Unlike mean, mode is not capable of further
algebraic treatment.
Difficult: - When the frequencies of all items are identical, it is difficult to
identify the modal value.
Complex procedure of grouping: - Calculation of the mode involves a cumbersome
procedure of grouping the data. If the extent of grouping changes, there will be a
change in the modal value.
Ignores extreme marginal frequencies: - It ignores extreme marginal frequencies. To
that extent, the modal value is not a representative value of all the items in a
series. Besides, one can question the representative character of the modal value as
its calculation does not involve all items of the series.
Measures of Dispersion
As the name suggests, a measure of dispersion shows the scatter of the data. It
tells how much the observations vary from one another and gives a clear idea about
the distribution of the data. A measure of dispersion shows the homogeneity or the
heterogeneity of the distribution of the observations.
Suppose you have four datasets of the same size and the mean is also same, say, m. In
all the cases the sum of the observations will be the same. Here, the measure of
central tendency is not giving a clear and complete idea about the distribution for the
four given sets.
Can we get an idea about the distribution if we get to know about the dispersion of
the observations from one another within and between the datasets? The main idea
about the measure of dispersion is to get to know how the data are spread. It shows
how much the data vary from their average value.
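The four-dataset situation can be sketched as follows. The numbers are hypothetical, chosen so that all four sets have the same size and the same mean (50) while their spread differs. The range and the population standard deviation from Python's statistics module are used here as two example measures of dispersion.

```python
from statistics import mean, pstdev

# Four hypothetical datasets: same size, same mean, different spread.
datasets = {
    "A": [50, 50, 50, 50, 50],
    "B": [48, 49, 50, 51, 52],
    "C": [40, 45, 50, 55, 60],
    "D": [10, 30, 50, 70, 90],
}

means = {name: mean(v) for name, v in datasets.items()}
ranges = {name: max(v) - min(v) for name, v in datasets.items()}
std_devs = {name: pstdev(v) for name, v in datasets.items()}

for name in datasets:
    print(name, means[name], ranges[name], round(std_devs[name], 2))
```

The mean column is identical for all four sets, while the range and standard deviation columns grow from A to D, which is the information the mean alone cannot give.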
Characteristics of Measures of Dispersion
Easy to calculate
Easy to understand
The mean deviation takes its minimum value when the deviations are taken from the
median
Ignoring the negative signs of the deviations is artificial and makes the measure
unsuitable for further mathematical treatment
Standard Deviation
The standard deviation is the positive square root of the arithmetic mean of the
squares of the deviations of the given values from their arithmetic mean. It is
denoted by the Greek letter sigma, σ. It is also referred to as the root mean square
deviation. For n observations yi with mean ȳ, the standard deviation is given as
\[ \sigma = \sqrt{\frac{1}{n}\sum_i (y_i - \bar{y})^2} = \sqrt{\frac{1}{n}\sum_i y_i^2 - \bar{y}^2} \]
For a grouped frequency distribution with frequencies fi and N = Σi fi, it is
\[ \sigma = \sqrt{\frac{1}{N}\sum_i f_i (y_i - \bar{y})^2} = \sqrt{\frac{1}{N}\sum_i f_i y_i^2 - \bar{y}^2} \]
The square of the standard deviation is the variance. It is also a measure of
dispersion.
\[ \sigma^2 = \frac{1}{n}\sum_i (y_i - \bar{y})^2 = \frac{1}{n}\sum_i y_i^2 - \bar{y}^2 \]
For a grouped frequency distribution, it is
\[ \sigma^2 = \frac{1}{N}\sum_i f_i (y_i - \bar{y})^2 = \frac{1}{N}\sum_i f_i y_i^2 - \bar{y}^2 \]
If, instead of the mean, the deviations are taken from any other arbitrary number,
say A, the standard deviation becomes the root mean square deviation about A.
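As a rough check of the formulas in this section, the sketch below computes the standard deviation from the definition (mean of squared deviations from the mean), from the shortcut form (mean of squares minus square of the mean), and for a grouped frequency distribution. The data and all function names are assumptions made for illustration.

```python
from math import sqrt, isclose

def std_dev_definition(ys):
    """Square root of the mean of squared deviations from the mean."""
    n = len(ys)
    y_bar = sum(ys) / n
    return sqrt(sum((y - y_bar) ** 2 for y in ys) / n)

def std_dev_shortcut(ys):
    """Square root of (mean of the squares minus square of the mean)."""
    n = len(ys)
    y_bar = sum(ys) / n
    return sqrt(sum(y * y for y in ys) / n - y_bar ** 2)

def std_dev_grouped(values, freqs):
    """Standard deviation for values y_i with frequencies f_i."""
    total = sum(freqs)  # N = sum of the frequencies
    y_bar = sum(f * y for y, f in zip(values, freqs)) / total
    return sqrt(sum(f * (y - y_bar) ** 2 for y, f in zip(values, freqs)) / total)

def rms_deviation_about(ys, a):
    """Root mean square deviation about an arbitrary number a."""
    return sqrt(sum((y - a) ** 2 for y in ys) / len(ys))

ys = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
sigma = std_dev_definition(ys)

print(sigma)                                             # 2.0
print(isclose(sigma, std_dev_shortcut(ys)))              # True
print(isclose(sigma, std_dev_grouped([2, 4, 5, 7, 9],
                                     [1, 3, 2, 1, 1])))  # True
print(rms_deviation_about(ys, 6) > sigma)                # True
```

The last line illustrates that the root mean square deviation about a number other than the mean (here A = 6) is larger than the standard deviation.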
Variance of the Combined Series
If σ1 and σ2 are the standard deviations of two series of sizes n1 and n2 with means
ȳ1 and ȳ2, the variance of the combined series of size n1 + n2 is:
\[ \sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2} \]
where
\[ d_1 = \bar{y}_1 - \bar{y}, \quad d_2 = \bar{y}_2 - \bar{y}, \quad \bar{y} = \frac{n_1\bar{y}_1 + n_2\bar{y}_2}{n_1 + n_2} \]
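The combined-series result can be checked numerically. In this sketch the combined variance is the weighted sum n1(σ1² + d1²) + n2(σ2² + d2²) divided by n1 + n2, and it is compared with the variance computed directly from the pooled observations. The two series and the helper names are hypothetical.

```python
from math import isclose

def mean_of(ys):
    return sum(ys) / len(ys)

def variance_of(ys):
    """Population variance: mean of squared deviations from the mean."""
    y_bar = mean_of(ys)
    return sum((y - y_bar) ** 2 for y in ys) / len(ys)

def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Variance of the combined series from the summary figures only."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    return (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / (n1 + n2)

series1 = [2, 4, 6]                # hypothetical first series
series2 = [10, 12, 14, 16, 18]     # hypothetical second series

from_summaries = combined_variance(
    len(series1), mean_of(series1), variance_of(series1),
    len(series2), mean_of(series2), variance_of(series2),
)
from_raw_data = variance_of(series1 + series2)

print(from_summaries, from_raw_data)  # both approximately 29.4375
```

The agreement between the two numbers shows that the combined variance can be obtained without access to the individual observations.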
Merits of Standard Deviation
Coefficient of Dispersion
Correlation Analysis
Definition:
The Correlation Analysis is the statistical tool used to study the closeness of the
relationship between two or more variables. The variables are said to be correlated
when the movement of one variable is accompanied by the movement of another
variable. Correlation analysis is used when the researcher wants to determine the
possible association between the variables. To begin with, the following steps are
to be followed:
Determining whether the relation exists and then measuring it (the measure of
correlation is called the Coefficient of Correlation).
Testing its significance.
Establishing the cause-and-effect relation, if any.
In correlation analysis, there are two types of variables: dependent and
independent. The purpose of such analysis is to find out whether a change in the
independent variable results in a change in the dependent variable. Why study
correlation? The study of correlation is very useful in practical life for
the following reasons:
3. The correlation analysis helps the manufacturing firm in estimating the price,
cost, sales of its product on the basis of the other variables that are
functionally related to it.
3. Whereas, in the case of a partial correlation we study more than two variables,
but consider only two among them that would be influencing each other such
that the effect of the other influencing variable is kept constant. Such as, in
the above example, if we study the relationship between the yield and
fertilizers used during the periods when certain average temperature existed,
then it is a problem of partial correlation.
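A minimal sketch of the idea, assuming Pearson's product-moment coefficient as the coefficient of correlation and the standard first-order partial correlation formula, which removes the linear effect of a third variable. The yield, fertilizer and temperature figures are hypothetical, and the function names are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two series."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

def partial_r(xs, ys, zs):
    """Correlation between x and y with the linear effect of z held constant."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical observations for six plots of land.
fertilizer = [10, 20, 30, 40, 50, 60]
temperature = [22, 25, 21, 27, 24, 26]
crop_yield = [30, 38, 41, 52, 55, 63]

simple = pearson_r(crop_yield, fertilizer)
partial = partial_r(crop_yield, fertilizer, temperature)
print(round(simple, 3), round(partial, 3))
```

Both coefficients lie between -1 and +1. The partial coefficient answers the narrower question of how yield and fertilizer move together once temperature is accounted for.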
Review and understand how different variables impact all of these things.
Predict what sales will look like in the next six months.
Though this sounds complicated, it's actually fairly simple. You could simply
look back at the activity of the GDP in the last quarter or in the last three-
month period, and compare it to your sales figure. In reality, the government
reported that the GDP grew 2.6 percent in the fourth quarter of 2018. If your
sales rose 5.2 percent during that same period, you'd have a pretty good idea
that your sales generally rise at twice the rate of GDP growth because:
5.2 ÷ 2.6 = 2
The "2" means that your sales are rising at twice the rate of the GDP. You might
want to go back a couple of more quarters to be sure this trend continues, say
for an entire year. Suppose you sell car parts, wheat, or forklifts. It would be
the same regardless of the products or services you sell. Since you know that
your sales are increasing at twice the rate of GDP growth, then if the GDP
increases 4 percent the next quarter, your sales will likely rise 8 percent. If the
GDP goes up 3 percent, your sales would likely rise 6 percent, and so on.
In this way, regression analysis can be a valuable tool for forecasting sales and
help you determine whether you need to increase supplies, labor, production
hours, and any number of other factors.
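The GDP arithmetic above can be sketched in a few lines. This is only the single-quarter ratio described in the text, not a fitted regression; the function names are illustrative.

```python
def growth_multiple(sales_growth_pct, gdp_growth_pct):
    """How many times faster sales grew than GDP in the same period."""
    return sales_growth_pct / gdp_growth_pct

def forecast_sales_growth(multiple, expected_gdp_growth_pct):
    """Projected sales growth if the same multiple continues to hold."""
    return multiple * expected_gdp_growth_pct

# Figures from the text: sales up 5.2 percent while GDP grew 2.6 percent.
multiple = growth_multiple(5.2, 2.6)

print(multiple)                             # 2.0
print(forecast_sales_growth(multiple, 4))   # 8.0
print(forecast_sales_growth(multiple, 3))   # 6.0
```

A multiple taken from one quarter is fragile, which is why the text suggests checking several more quarters before relying on it.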
Using Regression Analysis to Formulate Strategies
4. Correcting errors: Even the most informed and careful managers do make
mistakes in judgment. Regression analysis helps managers, and businesses in
general, recognize and correct errors. Suppose, for example, a retail store
manager feels that extending shopping hours will increase sales. Regression
analysis may show that the modest rise in sales might not be enough to offset
the increased cost for labor and operating expenses (such as using more
electricity, for example). Using regression analysis could help a manager
determine that an increase in hours would not lead to an increase in profits.
This could help the manager avoid making a costly mistake.
5. New Insights: Looking at the data can provide new and fresh insights. Many
businesses gather lots of data about their customers. But that data is
meaningless without proper regression analysis, which can help find the
relationship between different variables to uncover patterns. For example,
looking at the data through regression analysis might indicate a spike in sales
during certain days of the week and a drop in sales on others. Managers could
then make adjustments to compensate, such as making sure to maintain stock
on those days, bringing in extra help, or even ensuring that the best sales or
service people are working on those days.
What Is the Significance of Regression Analysis in Business?
Regression analysis, then, is clearly a significant factor in business because it is
a statistical method that allows firms, and their managers, to make better-informed
decisions based on hard numbers. As Amy Gallo notes in the Harvard Business Review:
"In order to conduct a regression analysis, you gather the data on the variables in
question....You take all of your monthly sales numbers for, say, the past three years
and any data on the independent variables you’re interested in. So, in this case, let’s
say you find out the average monthly rainfall for the past three years. . . Glancing at
this data, you probably notice that sales are higher on days when it rains a lot. That’s
interesting to know - but by how much? If it rains 3 inches, do you know how much
you’ll sell? What about if it rains 4 inches?". Regression analysis is significant, then,
because it forces you, or any business, to take a look at the actual data, rather than
simply guessing. In Gallo's example, a business would plot the points showing monthly
rainfall for the past three years. That would be the independent variable. Then, you
would look at the monthly sales figures for the business for the past three years,
which is the dependent variable. In essence, you're saying rising or falling sales
depend on the amount of rainfall in a given month.
Rain vs. Sales
Suppose your business is selling umbrellas, winter jackets, or spray-on waterproof
coating. You might find that sales rise a bit when there are 2 inches of rain in a
month. But you might also see that sales rise 25 percent or more during months of
heavy rainfall, where there are more than 4 inches of rain. You could, then, be sure
to stock up on umbrellas, winter jackets or spray-on waterproof coating during those
heavy-rain months. You might also extend business hours during those months and
possibly bring in more help.
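The rain-versus-sales idea can be sketched with an ordinary least-squares line, sales = intercept + slope × rainfall. The monthly figures below are invented purely for illustration, and fit_line and predict are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(slope, intercept, x):
    return intercept + slope * x

# Hypothetical data: monthly rainfall (inches) and units sold.
rainfall_inches = [1, 2, 3, 4, 5, 6]
units_sold = [100, 105, 112, 130, 140, 155]

slope, intercept = fit_line(rainfall_inches, units_sold)
sales_at_3_inches = predict(slope, intercept, 3)
sales_at_4_inches = predict(slope, intercept, 4)
print(round(sales_at_3_inches, 1), round(sales_at_4_inches, 1))
```

The slope estimates how many extra units are sold per additional inch of rain, which is the "by how much?" question raised in the quotation.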
Moving Averages:
A moving average is the mean of time series data (observations equally spaced in
time) from several consecutive periods. It is called 'moving' because it is
continually recomputed as new data becomes available: it progresses by dropping the
earliest value and adding the latest value. For example, the moving average of
six-month sales may be computed by taking the average of sales from January to June,
then the average of sales from February to July, then of March to August, and so on.
Moving averages (1) reduce the effect of temporary variations in data, (2) improve
the 'fit' of data to a line (a process called 'smoothing') to show the data's trend
more clearly, and (3) highlight any value above or below the trend.
Moving Averages in Technical Analysis explained
Moving averages are perhaps among the most commonly used tools for technical
analysis. They do not predict the price direction, but define the current direction
with a lag. That is why they are called 'lagging' indicators. Moving averages work
well when prices are trending. However, one needs to be cautious, as the tool can
give false signals when prices are not trending.
For the short-term trend, one can use 5, 11 and 21-day moving averages, while for
the medium/intermediate term, 21 to 100 days is generally considered a good measure.
Finally, any moving average that uses 100 days or more can be considered a measure
of long-term momentum. The shorter the MA, the more sensitive the signal.
Types of Moving Averages:
Simple Moving Average (SMA) and the Exponential Moving Average (EMA) are the
most popular types of moving averages. These moving averages can be used to spot
the direction of the trend or to identify potential support and resistance levels.
Simple Moving Average (SMA)
A simple moving average is computed by taking the average price of a security
over a certain number of periods. Simple moving averages are usually constructed
using the closing price, although it is also possible to calculate them from the
open, the high and the low data points. For example, a 5-day SMA is calculated by
adding the closing prices for the last 5 days and dividing the total by 5.
Daily closing prices: 5, 6, 7, 8, 9 and 10
First day of 5-day SMA: (5+6+7+8+9)/5 = 7
Second day of 5-day SMA: (6+7+8+9+10)/5 = 8
Whether you choose a 21-day average or a 52-week average, the calculation is the
same: instead of adding five days you add 21 days or 52 weeks and divide by 21 or
52, respectively.
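The worked example above can be reproduced with a short sketch. The function name simple_moving_average is an assumption for illustration; the prices are the ones given in the text.

```python
def simple_moving_average(prices, window):
    """Average of each run of `window` consecutive prices."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [
        sum(prices[i:i + window]) / window
        for i in range(len(prices) - window + 1)
    ]

closing_prices = [5, 6, 7, 8, 9, 10]
sma_5 = simple_moving_average(closing_prices, 5)

print(sma_5)  # [7.0, 8.0]
```

Changing the window argument to 21 (days) or 52 (weeks) gives the longer averages with the same calculation, provided enough prices are supplied.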
Exponential Moving Average:
The exponential moving average is used to reduce the lag of the simple moving
average. It reduces the lag by applying more weight to recent prices relative to
older prices, and so it reacts more quickly to a recent price change than an SMA.
For example, a 5-period exponential moving average gives the most recent price a
weight of 33.33%.
The formula for an exponential moving average is
EMA= (Closing price –EMA (previous day)) * (Multiplier) + EMA (previous day)
A 5-period EMA’s Multiplier is calculated as below:
2/(Time period+1)= 2/(5+1) = 33.33%
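A minimal sketch of the EMA recursion, using the multiplier 2/(period + 1). The first EMA value has to be seeded somehow; here it is assumed to be the simple average of the first `period` prices, which is a common convention but is not stated in the text. The price series is hypothetical.

```python
def ema_multiplier(period):
    """Weight given to the most recent price: 2 / (period + 1)."""
    return 2 / (period + 1)

def exponential_moving_average(prices, period):
    """EMA series; seeded with the simple average of the first
    `period` prices (an assumption, not taken from the text)."""
    k = ema_multiplier(period)
    ema = sum(prices[:period]) / period
    result = [ema]
    for price in prices[period:]:
        # EMA = (closing price - previous EMA) * multiplier + previous EMA
        ema = (price - ema) * k + ema
        result.append(ema)
    return result

# Hypothetical closing prices with a jump on the last day.
closing_prices = [5, 6, 7, 8, 9, 14]
ema_5 = exponential_moving_average(closing_prices, 5)
sma_last = sum(closing_prices[-5:]) / 5   # 5-day SMA on the last day

print(round(ema_multiplier(5), 4))   # 0.3333
print([round(v, 2) for v in ema_5])  # [7.0, 9.33]
print(sma_last)                      # 8.8
```

After the jump to 14, the EMA (about 9.33) has moved further toward the new price than the 5-day SMA (8.8), which is the reduced lag described above.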
Role Of Moving Average In Determining Market Trends:
Moving averages can be used to determine the trend. For instance, if the
moving average is rising, the trend is considered to be up. On the other hand, if
the moving average is falling, the trend is considered to be down. Moving averages
also help in identifying support and resistance levels.
As seen in the following chart of Reliance Capital, the 100-day medium-term
moving average, which was previously acting as a resistance level, is now acting as
a strong support. The stock also broke its 20-day short-term moving average on
several occasions but was able to sustain above the 100-day moving average, which
indicates that the medium-term trend of the stock is very strong. An investor can
gain insights from the two moving averages in order to make buy or sell
decisions, depending upon the intended holding period.