2) Elaborate the Statistical Expression of Data
Statistics is a field of mathematics that pertains to data analysis. Statistical methods and
equations can be applied to a data set in order to analyze and interpret results, explain variations
in the data, or predict future data. A few examples of statistical information we can calculate are
the mean, median, mode, and standard deviation of a data set.
What is a Statistic?
A parameter is a property of a population. As illustrated in the example above, most of the time it
is infeasible to measure a population parameter directly. Instead, a sample must be taken and
a statistic calculated for the sample. This statistic can be used to estimate the population
parameter. (A branch of statistics known as inferential statistics involves using samples to infer
information about a population.) In the example above, the population parameter is the average
weight of all 7th graders in the United States and the sample statistic is the average weight of a
group of 7th graders.
A large number of statistical inference techniques require the sample to be a single random sample
that is gathered independently. In short, this allows statistics to be treated as random variables. An
in-depth discussion of these consequences is beyond the scope of this text. It is also important to
note that statistics can be flawed due to large variance, bias, inconsistency, and other errors that
may arise during sampling. Whenever performing or reviewing statistical analysis, a skeptical
eye is always valuable.
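The relationship between a population parameter and a sample statistic can be sketched in a few lines of Python. The population below is synthetic (made-up weights drawn from a normal distribution with an assumed mean of 45 and standard deviation of 8); it is not real data on 7th graders, and the sample size of 500 is an arbitrary choice for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic "population": 100,000 made-up weights (illustrative, not real data).
population = [rng.gauss(45.0, 8.0) for _ in range(100_000)]

# Population parameter: in practice this is usually infeasible to measure.
parameter = statistics.fmean(population)

# Random sample drawn without replacement; every member is equally likely.
sample = rng.sample(population, 500)

# Sample statistic: used as an estimate of the population parameter.
statistic = statistics.fmean(sample)

print(f"population mean = {parameter:.2f}, sample mean = {statistic:.2f}")
```

The two printed values should be close but not identical; the gap between them is the sampling error that inferential statistics tries to quantify.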
Basic Statistics
When performing statistical analysis on a set of data, the mean, median, mode, and standard
deviation are all helpful values to calculate. The mean, median, and mode are all estimates of
where the "middle" of a set of data is. These values are useful when creating groups or bins to
organize larger sets of data. The standard deviation measures the typical distance between the
data points and the mean (formally, the root-mean-square deviation).
The mean (also known as the average) is obtained by dividing the sum of observed values by the
number of observations, n. Although data points fall above, below, or on the mean, it can be
considered a good estimate for predicting subsequent data points. The formula for the mean is
given below as Equation 13.1.1. The Excel syntax for the mean is AVERAGE(starting cell:ending
cell).
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \tag{13.1.1}$$
However, Equation 13.1.1 can only be used when the error associated with each measurement is the
same or unknown. Otherwise, the weighted average, which incorporates the standard deviation of
each measurement, should be calculated using Equation 13.1.2 below.
$$X_{wav} = \frac{\sum w_i x_i}{\sum w_i} \tag{13.1.2}$$
where
$$w_i = \frac{1}{\sigma_i^2}$$
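As a minimal sketch, Equations 13.1.1 and 13.1.2 can be written directly in Python. The function names `mean` and `weighted_mean`, and the measurement values and standard deviations below, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mean(values):
    """Arithmetic mean, Equation 13.1.1: sum of observations divided by n."""
    return sum(values) / len(values)


def weighted_mean(values, sigmas):
    """Weighted average, Equation 13.1.2, with weights w_i = 1 / sigma_i**2."""
    weights = [1.0 / s**2 for s in sigmas]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical measurements and their standard deviations (illustrative only).
x = [10.0, 12.0, 11.0]
sigma = [1.0, 2.0, 1.0]

plain = mean(x)                     # (10 + 12 + 11) / 3
weighted = weighted_mean(x, sigma)  # weights are 1, 0.25, 1

print(plain, weighted)
```

Because the second measurement has twice the standard deviation of the others, it receives one quarter of their weight, so the weighted average is pulled less toward 12 than the plain mean is.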
Median
The median is the middle value of a sorted set of data containing an odd number of values, or the
average of the two middle values of a sorted set of data with an even number of values. The median
is especially helpful when separating data into two equal-sized bins. The Excel syntax to find the
median is MEDIAN(starting cell:ending cell).
Mode
The mode of a set of data is the value that occurs most frequently. The Excel syntax for the
mode is MODE(starting cell:ending cell).
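The median and mode definitions above can be checked with Python's standard `statistics` module. The two small data sets are hypothetical; one has an odd number of values and the other an even number, to exercise both cases of the median definition.

```python
import statistics

odd_data = [3, 1, 4, 1, 5]      # sorted: 1, 1, 3, 4, 5
even_data = [3, 1, 4, 1, 5, 9]  # sorted: 1, 1, 3, 4, 5, 9

# Odd count: the median is the single middle value of the sorted data.
median_odd = statistics.median(odd_data)

# Even count: the median is the average of the two middle values.
median_even = statistics.median(even_data)

# Mode: the most frequently occurring value.
most_common = statistics.mode(odd_data)

print(median_odd, median_even, most_common)
```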
Considerations
Now that we've discussed some different ways in which you can describe a data set, you might
be wondering when to use each one. If all the data points are relatively close together, the
average gives you a good idea of what the points are closest to. If, on the other hand, almost all
the points fall close to one value, or a group of close values, but occasionally a value that differs
greatly appears, then the mode might be more accurate for describing the system, whereas
the mean would incorporate the occasional outlying data. The median is useful if you are
interested in the range of values your system could be operating in. Half the values should be
above and half the values should be below, so you have an idea of where the middle operating
point is.
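The effect of an occasional outlier on each measure can be illustrated with a small, made-up set of readings. The values below are hypothetical: six readings near 5 and one reading of 50.

```python
import statistics

# Hypothetical readings: tightly grouped near 5, plus one large outlier.
readings = [5.0, 5.0, 5.0, 5.1, 4.9, 5.0, 50.0]

avg = statistics.mean(readings)    # pulled upward by the single outlier
mid = statistics.median(readings)  # middle of the sorted readings
top = statistics.mode(readings)    # most frequent reading

print(f"mean = {avg:.2f}, median = {mid}, mode = {top}")
```

Here the mean lands above 11 even though six of the seven readings are near 5, while the median and mode both stay at 5.0, which is the behavior the paragraph above describes.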
The standard deviation gives an idea of how close the entire set of data is to the average value.
Data sets with a small standard deviation have tightly grouped, precise data. Data sets with large
standard deviations have data spread out over a wide range of values. The formula for standard
deviation is given below as Equation 13.1.3. The Excel syntax for the standard deviation is
STDEV(starting cell:ending cell).
$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} \tag{13.1.3}$$
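Equation 13.1.3 can be sketched in Python and compared against `statistics.stdev`, which also uses the n − 1 divisor. The helper name `sample_std` and the data values are hypothetical, chosen so that the mean is exactly 5 and the squared deviations sum to 32.

```python
import math
import statistics


def sample_std(values):
    """Standard deviation per Equation 13.1.3 (divides by n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical data: mean is 5.0, squared deviations sum to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s = sample_std(data)              # sqrt(32 / 7)
s_check = statistics.stdev(data)  # library version of the same formula

print(s, s_check)
```

Dividing by n − 1 rather than n is what distinguishes this sample standard deviation from the population version (`statistics.pstdev` in Python).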