Difference Between Descriptive and Inferential Statistics
Descriptive and inferential statistics are two broad categories in the field of statistics.
In this blog post, I show you how both types of statistics are important for different
purposes. Interestingly, some of the statistical measures are similar, but the goals
and methodologies are very different.
Descriptive Statistics
Both descriptive and inferential statistics help make sense out of row after row of
data!
Use descriptive statistics to summarize and graph the data for a group that you
choose. This process allows you to understand that specific set of observations.
The process involves taking a potentially large number of data points in the sample
and reducing them down to a few meaningful summary values and graphs. This
procedure gives us more insight into the data, and a clearer picture of it, than simply
poring over row upon row of raw numbers!
Central tendency: Use the mean or the median to locate the center of the dataset.
This measure tells you where most values fall.
Dispersion: How far out from the center do the data extend? You can use the range
or standard deviation to measure the dispersion. A low dispersion indicates that the
values cluster more tightly around the center. Higher dispersion signifies that data
points fall further away from the center. We can also graph the frequency
distribution.
Skewness: The measure tells you whether the distribution of values is symmetric
or skewed.
You can present this summary information using both numbers and graphs. These
are the standard descriptive statistics, but there are other descriptive analyses you
can perform, such as assessing the relationships of paired data using correlation and
scatterplots.
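To make these measures concrete, here is a minimal sketch that computes them with Python's standard library. The scores are made up for illustration, and describe is a hypothetical helper name. I use the population standard deviation here on the assumption that we measured everyone in the group, and a simple moment-based formula for skewness, which is one of several definitions in use.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a fully observed group."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population SD, assuming every member of the group was measured
    sd = statistics.pstdev(values)
    # Moment-based skewness: about 0 for symmetric data,
    # positive for right-skewed data, negative for left-skewed data
    skew = sum((x - mean) ** 3 for x in values) / n / sd ** 3 if sd > 0 else 0.0
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "std_dev": sd,
        "skewness": skew,
    }

# Hypothetical test scores, invented for this example
scores = [62, 70, 74, 75, 78, 80, 81, 83, 85, 88, 90, 94]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If you treat the group as a sample rather than the whole group of interest, swap statistics.pstdev for statistics.stdev.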
Mean 79.18
Collectively, this information gives us a pretty good picture of this specific class.
There is no uncertainty surrounding these statistics because we gathered the scores
for everyone in the class. However, we can’t take these results and extrapolate to a
larger population of students.
Inferential Statistics
Inferential statistics takes data from a sample and makes inferences about the larger
population from which the sample was drawn. Because the goal of inferential
statistics is to draw conclusions from a sample and generalize them to a population,
we need to have confidence that our sample accurately reflects the population. This
requirement affects our process. At a broad level, we must do the following:
We don’t get to pick a convenient group. Instead, random sampling allows us to have
confidence that the sample represents the population. This process is a primary
method for obtaining samples that mirror the population on average. Random
sampling produces statistics, such as the mean, that do not tend to be too high or
too low. Using a random sample, we can generalize from the sample to the broader
population. Unfortunately, gathering a truly random sample can be a complicated
process.
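A small simulation can illustrate why random sampling matters. This is only a sketch under stated assumptions: the population is a made-up set of 10,000 normally distributed scores, and the "convenience" group is an extreme case I chose to exaggerate the contrast (the 100 highest scorers).

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 test scores (invented for illustration)
population = [rng.gauss(79, 9) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# Draw many simple random samples of 100 and record each sample mean
sample_means = [
    statistics.fmean(rng.sample(population, 100))
    for _ in range(2_000)
]

# Averaged over many random samples, the sample mean sits close to the
# population mean: it does not tend to be too high or too low
average_of_sample_means = statistics.fmean(sample_means)

# A hand-picked group (here, the top 100 scorers) is systematically off
convenience_mean = statistics.fmean(sorted(population)[-100:])

print(f"population mean:            {population_mean:.2f}")
print(f"average of random samples:  {average_of_sample_means:.2f}")
print(f"convenience group mean:     {convenience_mean:.2f}")
```

Any single random sample will still miss the population mean by some amount. The point is that the misses are not systematically in one direction.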
You gain tremendous benefits by working with a random sample drawn from a
population. In most cases, it is simply impossible to measure the entire population to
understand its properties. The alternative is to gather a random sample and then use
the methodologies of inferential statistics to analyze the sample data.
While samples are much more practical and less expensive to work with, there are
tradeoffs. Typically, we learn about the population by drawing a relatively small
sample from it. We are a very long way off from measuring all people or objects in
that population. Consequently, when you estimate the properties of a population
from a sample, the sample statistics are unlikely to equal the actual population value
exactly.
For instance, your sample mean is unlikely to equal the population mean exactly.
The difference between the sample statistic and the population value is the sampling
error. Inferential statistics incorporate estimates of this error into the statistical
results.
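Here is a minimal sketch of that idea. It assumes a simulated population so that the true mean is known, which is never the case in a real study. In practice you cannot observe the sampling error itself, so you estimate its typical size from the sample alone, for example with the standard error of the mean.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is reproducible

# Hypothetical population (invented); in real studies this is unobserved
population = [rng.gauss(79, 9) for _ in range(10_000)]
population_mean = statistics.fmean(population)

sample = rng.sample(population, 100)
sample_mean = statistics.fmean(sample)

# Sampling error: sample statistic minus the population value.
# We can compute it here only because we simulated the whole population.
sampling_error = sample_mean - population_mean

# Standard error of the mean: an estimate of the typical size of the
# sampling error, computed from the sample alone
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"sampling error: {sampling_error:+.2f}")
print(f"standard error: {standard_error:.2f}")
```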
Hypothesis tests
Suppose we define our population as all high school basketball players. Then, we
draw a random sample from this population and calculate a sample mean height of 181
cm. This sample estimate of 181 cm is our best point estimate of the mean height of the
population. However, it’s virtually guaranteed that our estimate of the population
parameter is not exactly correct.
Regression analysis
For example, the fitted line plot below displays the relationship in
the regression model between height and weight in adolescent girls. Because the
relationship is statistically significant, we have sufficient evidence to conclude that
this relationship exists in the population rather than just our sample.
For this example, suppose we conducted our study on test scores for a specific class
as I detailed in the descriptive statistics section. Now we want to perform an
inferential statistics study for that same test. Let’s assume it is a standardized
statewide test. By using the same test, but now with the goal of drawing inferences
about a population, I can show you how that changes the way we conduct the study
and the results that we present.
In descriptive statistics, we picked the specific class that we wanted to describe and
recorded all of the test scores for that class. Nice and simple. For inferential
statistics, we need to define the population and then draw a random sample from
that population.
Let’s define our population as 8th-grade students in public schools in the State of
Pennsylvania in the United States. We need to devise a random sampling plan to
help ensure a representative sample. This process can actually be arduous. For the
sake of this example, assume that we are given a list of names for the entire
population, draw a random sample of 100 students from it, and obtain their test
scores. Note that these students will not be in one class but will come from many
different classes in different schools across the state.
Statistic Population Parameter Estimate (CIs)
Given the uncertainty associated with these estimates, we can be 95% confident that
the population mean is between 77.4 and 80.9. The population standard deviation (a
measure of dispersion) is likely to fall between 7.7 and 10.1. And, the population
proportion of satisfactory scores is expected to be between 77% and 92%.
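As a rough sketch of how intervals like these can be computed, the code below builds approximate 95% confidence intervals for a mean and a proportion using the normal approximation. The data are simulated, the cutoff of 70 for a "satisfactory" score is an assumption of mine, and mean_ci and proportion_ci are hypothetical helper names. Statistical software typically uses the t-distribution for the mean and a chi-square method for the standard deviation, so those results will differ slightly; I leave the standard deviation interval out because the standard library has no chi-square quantile function.

```python
import math
import random
import statistics

def mean_ci(sample, confidence=0.95):
    """Approximate CI for a mean using the normal distribution."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

def proportion_ci(successes, n, confidence=0.95):
    """Approximate (Wald) CI for a proportion, clipped to [0, 1]."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical random sample of 100 scores (simulated, not real study data)
sample_scores = [rng.gauss(79, 9) for _ in range(100)]

# Assumed cutoff: a score of 70 or above counts as "satisfactory"
satisfactory = sum(score >= 70 for score in sample_scores)

low, high = mean_ci(sample_scores)
p_low, p_high = proportion_ci(satisfactory, len(sample_scores))

print(f"95% CI for the mean:       {low:.1f} to {high:.1f}")
print(f"95% CI for the proportion: {p_low:.0%} to {p_high:.0%}")
```

With a sample of 100, the normal and t-based intervals for the mean are close. For small samples, the t-distribution is the safer choice.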
Another key inferential statistic is the standard error of the mean. To learn more
about it, read my post The Standard Error of the Mean.
For descriptive statistics, we choose a group that we want to describe and then
measure all subjects in that group. The statistical summary describes this group with
complete certainty (outside of measurement error).
For inferential statistics, we need to define the population and then devise a
sampling plan that produces a representative sample. The statistical results
incorporate the uncertainty that is inherent in using a sample to understand an entire
population. The sample size becomes a vital characteristic. The law of large
numbers states that as the sample size grows, sample statistics (e.g., the sample
mean) converge on the population value.
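You can see the effect of sample size in a quick simulation. This is a sketch under assumptions: the population is a made-up set of 100,000 scores, and I average the absolute error over 100 repeated samples at each size so that one lucky or unlucky draw does not dominate.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population of 100,000 scores (invented for illustration)
population = [rng.gauss(79, 9) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# For each sample size, average the absolute gap between the sample mean
# and the population mean over repeated random samples
errors = {}
for n in (10, 100, 1_000, 10_000):
    gaps = [
        abs(statistics.fmean(rng.sample(population, n)) - population_mean)
        for _ in range(100)
    ]
    errors[n] = statistics.fmean(gaps)

for n, err in errors.items():
    print(f"n = {n:>6}: average absolute error = {err:.3f}")
```

The average error shrinks as the sample size grows, which is the pattern the law of large numbers describes.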