Standard Error
The standard error is the approximate standard deviation of a statistical sample population. The standard error is
a statistical term that measures the accuracy with which a sample represents a population. In statistics, a sample
mean deviates from the actual mean of a population; this deviation is the standard error.
Key Takeaways
The standard error is the approximate standard deviation of a statistical sample population.
The standard error can include the variation between the calculated mean of the population and one
which is considered known, or accepted as accurate.
The more data points involved in the calculations of the mean, the smaller the standard error tends to
be.
The term "standard error" is used to refer to the standard deviation of various sample statistics, such as the mean
or median. For example, the "standard error of the mean" refers to the standard deviation of the distribution of
sample means taken from a population. The smaller the standard error, the more representative the sample will
be of the overall population.
The relationship between the standard error and the standard deviation is such that, for a given sample size, the
standard error equals the standard deviation divided by the square root of the sample size. The standard error is
therefore inversely proportional to the square root of the sample size: the larger the sample size, the smaller the
standard error, because the statistic approaches the actual value.
The standard error is considered part of descriptive statistics. It represents the standard deviation of the mean
within a dataset. This serves as a measure of variation for random variables, providing a measurement for the
spread. The smaller the spread, the more precisely the sample mean estimates the population mean.
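To make the formula concrete, here is a minimal Python sketch that computes the standard error of the mean as the sample standard deviation divided by the square root of the sample size, reusing the seven-value data set from the worked example later in this section.

```python
# Standard error of the mean (SEM) for the example data set.
import statistics

sample = [20, 24, 25, 36, 25, 22, 23]
n = len(sample)

s = statistics.stdev(sample)          # sample standard deviation (divides by n - 1)
sem = s / n ** 0.5                    # SE = s / sqrt(n)

print(f"sample std dev = {s:.3f}")    # ~5.164
print(f"standard error = {sem:.3f}")  # ~1.952
```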
How to Find the Mean, Median, Mode, Range, and Standard Deviation
By Karen G Blaettler; Updated May 14, 2018
Simplify comparisons of sets of numbers, especially large sets of numbers, by calculating the center values using
mean, mode and median. Use the ranges and standard deviations of the sets to examine the variability of the data.
Calculating Mean
The mean identifies the average value of the set of numbers. For example, consider the data set containing the
values 20, 24, 25, 36, 25, 22, 23.
To find the mean, use the formula: Mean equals the sum of the numbers in the data set divided by the number of
values in the data set. In mathematical terms: Mean=(sum of all terms)÷(how many terms or values in the set).
Divide by the number of data points in the set. This set has seven values so divide by 7.
Insert the values into the formula to calculate the mean. The mean equals the sum of the values (175) divided by
the number of data points (7). Since 175÷7=25, the mean of this data set equals 25. Not all mean values will
equal a whole number.
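As a quick check, the same calculation in Python, using the example data set above:

```python
# Mean: sum of the values divided by the count.
data = [20, 24, 25, 36, 25, 22, 23]
mean = sum(data) / len(data)
print(mean)  # 175 / 7 = 25.0
```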
Calculating Median
Put the numbers in order from smallest to largest. Use the example set of values: 20, 24, 25, 36, 25, 22, 23.
Placed in order, the set becomes: 20, 22, 23, 24, 25, 25, 36.
Since this set of numbers has seven values, the median or value in the center is 24.
If the set of numbers has an even number of values, calculate the average of the two center values. For example,
suppose the set of numbers contains the values 22, 23, 25, 26. The middle lies between 23 and 25. Adding 23
and 25 yields 48. Dividing 48 by two gives a median value of 24.
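The same steps in Python, using the standard library's statistics.median, which sorts the data and averages the two middle values when the count is even:

```python
# Median: middle value of the sorted data (average of the two middle
# values when the count is even).
import statistics

print(statistics.median([20, 24, 25, 36, 25, 22, 23]))  # 24 (odd count)
print(statistics.median([22, 23, 25, 26]))              # (23 + 25) / 2 = 24.0
```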
Calculating Mode
The mode identifies the most common value or values in the data set. Depending on the data, there might be one
or more modes, or no mode at all.
Like finding the median, order the data set from smallest to largest. In the example set, the ordered values
become: 20, 22, 23, 24, 25, 25, 36.
A mode occurs when values repeat. In the example set, the value 25 occurs twice. No other numbers repeat.
Therefore, the mode is the value 25.
In some data sets, more than one mode occurs. The data set 22, 23, 23, 24, 27, 27, 29 contains two modes, one
each at 23 and 27. Other data sets may have more than two modes, may have modes with more than two
numbers (as 23, 23, 24, 24, 24, 28, 29: mode equals 24) or may not have any modes at all (as 21, 23, 24, 25, 26,
27, 29). The mode may occur anywhere in the data set, not just in the middle.
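In Python, statistics.mode returns a single mode and statistics.multimode returns every mode (or all of the values when nothing repeats), matching the cases above:

```python
# Mode: the most frequent value(s) in a data set.
import statistics

print(statistics.mode([20, 22, 23, 24, 25, 25, 36]))       # 25
print(statistics.multimode([22, 23, 23, 24, 27, 27, 29]))  # [23, 27]
print(statistics.multimode([21, 23, 24, 25, 26, 27, 29]))  # no repeats: all values returned
```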
Calculating Range
Range shows the mathematical distance between the lowest and highest values in the data set. Range measures
the variability of the data set. A wide range indicates greater variability in the data, or perhaps a single outlier
far from the rest of the data. Outliers may skew, or shift, the mean value enough to impact data analysis.
In the sample group, the lowest value is 20 and the highest value is 36.
To calculate range, subtract the lowest value from the highest value. Since 36-20=16, the range equals 16.
In the sample set, the high data value of 36 exceeds the previous value, 25, by 11. This value seems extreme,
given the other values in the set. The value of 36 might be an outlier data point.
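In code, the range is simply the maximum minus the minimum:

```python
# Range: highest value minus lowest value.
data = [20, 24, 25, 36, 25, 22, 23]
print(max(data) - min(data))  # 36 - 20 = 16
```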
Standard deviation measures the variability of the data set. Like range, a smaller standard deviation indicates
less variability.
Finding the standard deviation requires squaring the difference between each data point and the mean, summing
those squares [∑(x-µ)²], dividing the sum by one less than the number of values (N-1), and finally taking the
square root of the quotient. Mathematically, start with calculating the mean.
Calculate the mean by adding all the data point values, then dividing by the number of data points. In the sample
data set, 20+24+25+36+25+22+23=175. Divide the sum, 175, by the number of data points, 7, or 175÷7=25.
The mean equals 25.
Next, subtract the mean from each data point, then square each difference. The formula looks like this: ∑(x-µ)²,
where ∑ means sum, x represents each data set value and µ represents the mean value. Continuing with the
example set, the values become: 20-25=-5 and (-5)²=25; 24-25=-1 and (-1)²=1; 25-25=0 and 0²=0; 36-25=11 and
11²=121; 25-25=0 and 0²=0; 22-25=-3 and (-3)²=9; and 23-25=-2 and (-2)²=4.
Divide the sum of the squared differences by one less than the number of data points. The example data set has
7 values, so N-1 equals 7-1=6. The sum of the squared differences, 160, divided by 6 equals approximately
26.6667.
Calculate the standard deviation by taking the square root of that quotient. In the example, the square
root of 26.6667 equals approximately 5.164. Therefore, the standard deviation equals approximately 5.164.
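Here are the same steps in Python, reproducing the worked example end to end:

```python
# Sample standard deviation, step by step, matching the worked example.
data = [20, 24, 25, 36, 25, 22, 23]
mean = sum(data) / len(data)                      # 25.0
squared_diffs = [(x - mean) ** 2 for x in data]   # 25, 1, 0, 121, 0, 9, 4
variance = sum(squared_diffs) / (len(data) - 1)   # 160 / 6 ≈ 26.667
std_dev = variance ** 0.5                         # ≈ 5.164
print(round(std_dev, 3))                          # 5.164
```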
Standard deviation helps evaluate data. Numbers in the data set that fall within one standard deviation of the
mean are typical values. Numbers that fall more than two standard deviations from the mean are extreme values or outliers.
In the example set, the value 36 lies more than two standard deviations from the mean, so 36 is an outlier.
Outliers may represent erroneous data or may suggest unforeseen circumstances and should be carefully
considered when interpreting data.
A measure of central tendency is a value that represents a typical, or central, entry of a data set. The most
common measures of central tendency are:
• Mean (Average): The sum of all the data entries divided by the number of entries. Population mean: µ = ∑x ÷ N.
Sample mean: x̄ = ∑x ÷ n.
• Median: The value that lies in the middle of the data when the data set is ordered. If the data set has an odd
number of entries, then the median is the middle data entry. If the data has an even number of entries, then the
median is obtained by adding the two numbers in the middle and dividing the result by two.
• Mode: The data entry that occurs with the greatest frequency. A data set may have one mode, more than one
mode, or no mode. If no entry is repeated the data set has no mode.
Measures of Variation:
• Range: The difference between the maximum and minimum data entries in the set. Range = (Max. data entry)
– (Min. data entry)
In statistics, the sample maximum and sample minimum, also called the largest observation and smallest
observation, are the values of the greatest and least elements of a sample.
• The standard deviation measures the variability and consistency of the sample or population. In most real-world
applications, consistency is a great advantage. In statistical data analysis, less variation is often better.
Population standard deviation: σ = √(∑(x - µ)² ÷ N). Sample standard deviation: s = √(∑(x - x̄)² ÷ (n - 1)).
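A short Python sketch contrasts the two formulas on the earlier example data set; the standard library's pstdev divides by N (population) while stdev divides by n - 1 (sample):

```python
# Population vs. sample standard deviation: the only difference is the
# divisor (N for the population formula, n - 1 for the sample formula).
import statistics

data = [20, 24, 25, 36, 25, 22, 23]
print(statistics.pstdev(data))  # population: sqrt(160 / 7) ≈ 4.781
print(statistics.stdev(data))   # sample:     sqrt(160 / 6) ≈ 5.164
```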
Degrees of freedom are the number of values in a study that have the freedom to vary. They are commonly
discussed in relationship to various forms of hypothesis testing in statistics, such as a chi-square. It is essential
to calculate degrees of freedom when trying to understand the importance of a chi-square statistic and the
validity of the null hypothesis.
For example, consider a student needs to take nine courses to graduate, and there are only nine courses offered
the student can take. In this example, there are eight degrees of freedom - the student is able to choose eight of
the classes that are available, but the ninth class is the only class left, and the student has to enroll in it to
graduate.
There are two different kinds of chi square tests: the test of independence, which asks a question of relationship,
such as, "Is there a relationship between gender and SAT scores?"; and the goodness-of-fit test, which asks
something like "If a coin is tossed 100 times, will it come up heads 50 times and tails 50 times?" For these tests,
degrees of freedom are utilized to determine if a certain null hypothesis can be rejected based on the total
number of variables and samples within the experiment. For example, when considering students and course
choice, a sample size of 30 or 40 students is likely not large enough to generate significant data. Getting the
same or similar results from a study using a sample size of 400 or 500 students is more valid.
Hypothesis testing starts with setting up the premises, which is followed by selecting a significance level. Next,
we have to choose the test statistic, i.e. the t-test or the f-test. While the t-test is used to compare the means of two
samples, the f-test is used to test the equality of the variances of two populations.
A hypothesis is a simple proposition that can be proved or disproved through various scientific techniques; it
establishes the relationship between an independent variable and a dependent variable. It is capable of being tested and
verified to ascertain its validity by an unbiased examination. Testing a hypothesis attempts to make clear
whether or not the supposition is valid.
For a researcher, it is imperative to choose the right test for his or her hypothesis, as the entire decision of
accepting or rejecting the null hypothesis rests on it. Read on to understand the
difference between the t-test and the f-test.
Definition of T-test
A t-test is a form of the statistical hypothesis test, based on Student’s t-statistic and t-distribution to find out the
p-value (probability) which can be used to accept or reject the null hypothesis.
A t-test analyses whether the means of two data sets are significantly different from each other, i.e. whether the population
mean is equal to or different from the standard mean. It can also be used to ascertain whether the regression line
has a slope different from zero. The test relies on a number of assumptions: the population is normally
distributed, the sample observations are random and independent, the population standard deviation is not
known, and (in the two-sample case) the two populations have equal variances.
The means and standard deviations of the two samples are used to compare them, such that:
t = (x̄1 - x̄2) ÷ (s√(1/n1 + 1/n2))
where x̄1 and x̄2 are the sample means, s is the pooled sample standard deviation, and n1 and n2 are the sample sizes.
Definition of F-test
The f-test is described as a type of hypothesis test that is based on Snedecor's F-distribution under the null
hypothesis. The test is performed when it is not known whether the two populations have the same variance.
The f-test can also be used to check whether the data conform to a regression model acquired through least-squares
analysis. In multiple linear regression analysis, it examines the overall validity of the model
or determines whether any of the independent variables has a linear relationship with the dependent
variable. A number of predictions can be made through the comparison of the two datasets. The f-test value is
expressed as the ratio of the variances of the two observations, which is shown as under:
F = σ1² ÷ σ2²
where σ² = variance.
1. The t-test is a univariate hypothesis test that is applied when the standard deviation is not known and the
sample size is small. On the other hand, the f-test is a statistical test that determines the equality of the variances of
two normal datasets.
2. The t-test is based on a t-statistic that follows the Student's t-distribution under the null hypothesis. Conversely,
the f-test is based on an F-statistic that follows Snedecor's F-distribution under the null hypothesis.
3. The t-test is used to compare the means of two populations. In contrast, the f-test is used to compare the
variances of two populations.
Comparison Chart

Basis for Comparison | T-Test | F-Test
Meaning | A univariate hypothesis test applied when the standard deviation is not known and the sample size is small. | A statistical test that determines the equality of the variances of two normal populations.
Application | Comparing the means of two populations. | Comparing two population variances.
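To illustrate both tests side by side, here is a minimal Python sketch using scipy.stats; the two samples are made up for illustration, and the F-test is computed directly as the ratio of the sample variances, as described above.

```python
# Contrasting the t-test (means) and the f-test (variances).
import statistics
from scipy import stats

a = [20, 24, 25, 36, 25, 22, 23]
b = [19, 21, 26, 28, 24, 20, 22]

# t-test: are the two sample means significantly different?
t_stat, t_p = stats.ttest_ind(a, b)
print(f"t = {t_stat:.3f}, p = {t_p:.3f}")

# f-test: ratio of the two sample variances, referred to the
# F-distribution with (n1 - 1, n2 - 1) degrees of freedom.
f_stat = statistics.variance(a) / statistics.variance(b)
f_p = 2 * min(stats.f.sf(f_stat, len(a) - 1, len(b) - 1),
              stats.f.cdf(f_stat, len(a) - 1, len(b) - 1))  # two-sided
print(f"F = {f_stat:.3f}, p = {f_p:.3f}")
```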
Sometimes, measuring every single item is just not practical. That is why we developed and use
statistical methods to solve problems. The most practical way to do it is to measure just a sample of the
population. Some methods test hypotheses by comparison. Two of the better-known statistical hypothesis tests
are the T-test and the Z-test. Let us try to break down the two.
A T-test is a statistical hypothesis test. In such a test, the test statistic follows a Student's T-distribution if the null
hypothesis is true. The T-statistic was introduced by W.S. Gosset under the pen name "Student", so the T-test is
also referred to as the "Student T-test". It is very likely that the T-test is the most commonly used statistical data
analysis procedure for hypothesis testing, since it is straightforward and easy to use. Additionally, it is flexible
and adaptable to a broad range of circumstances.
There are various T-tests; the two most commonly applied are the one-sample and two-sample T-tests.
One-sample T-tests are used to compare a sample mean with the known population mean. Two-sample T-tests,
on the other hand, are used to compare either independent samples or dependent (paired) samples.
The T-test is best applied, at least in theory, when you have a limited sample size (n < 30), as long as the variables are
approximately normally distributed and the variation of scores in the two groups is not reliably different. It is
also appropriate when you do not know the population's standard deviation. If the standard deviation is known, then it
would be best to use another type of statistical test, the Z-test. The Z-test is also used to compare sample and
population means to see whether there is a significant difference between them. Z-tests always use the normal
distribution and are ideally applied when the standard deviation is known. Z-tests are used when certain
conditions are met; otherwise, other statistical tests such as T-tests are applied as a substitute. Z-tests are often applied
to large samples (n > 30). When the T-test is used on large samples, it becomes very similar to the Z-test.
However, fluctuations can occur in T-test sample variances that do not exist in Z-tests, so there can be
differences between the two tests' results.
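A minimal sketch of the z-test side of this comparison: when the population standard deviation is known, the test statistic is referred to the normal distribution. The numbers below (hypothesized mean, known sigma, sample size) are made up for illustration.

```python
# One-sample z-test: requires a known population standard deviation.
import math
from scipy import stats

sample_mean, mu0 = 103.2, 100.0   # observed sample mean vs. hypothesized mean
sigma, n = 15.0, 50               # known population std dev, sample size

z = (sample_mean - mu0) / (sigma / math.sqrt(n))
p = 2 * stats.norm.sf(abs(z))     # two-sided p-value
print(f"z = {z:.3f}, p = {p:.3f}")
```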
Summary:
1. Z-test is a statistical hypothesis test that follows a normal distribution while T-test follows a Student’s T-
distribution.
2. A T-test is appropriate when you are handling small samples (n < 30) while a Z-test is appropriate when you
are handling moderate to large samples (n > 30).
3. T-test is more adaptable than Z-test since Z-test will often require certain conditions to be reliable.
Additionally, T-test has many methods that will suit any need.
4. T-tests are more commonly used than Z-tests.
5. Z-tests are preferred over T-tests when standard deviations are known.
A 95% confidence level does not mean that for a given realized interval there is a 95% probability that the
population parameter lies within the interval (i.e., a 95% probability that the interval covers the population
parameter).[10] According to the strict frequentist interpretation, once an interval is calculated, this interval either
covers the parameter value or it does not; it is no longer a matter of probability. The 95% probability relates to
the reliability of the estimation procedure, not to a specific calculated interval. [11] Neyman himself (the original
proponent of confidence intervals) made this point in his original paper:[3]
"It will be noticed that in the above description, the probability statements refer to the problems of
estimation with which the statistician will be concerned in the future. In fact, I have repeatedly stated
that the frequency of correct results will tend to α. Consider now the case when a sample is already
drawn and the calculations have given [particular limits]. Can we say that in this particular case the
probability of the true value [falling between these limits] is equal to α? The answer is obviously in the
negative. The parameter is an unknown constant, and no probability statement concerning its value
may be made..."
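A short simulation makes the frequentist reading concrete: if we repeatedly draw samples and build 95% intervals, about 95% of those intervals cover the true parameter, even though any single interval either does or does not. The population parameters below are made up for illustration.

```python
# Coverage of 95% t-intervals over many repeated samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mu, sigma, n, trials = 50.0, 10.0, 25, 10_000

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mu, sigma, n)
    m, se = sample.mean(), sample.std(ddof=1) / np.sqrt(n)
    lo, hi = stats.t.interval(0.95, df=n - 1, loc=m, scale=se)
    covered += lo <= true_mu <= hi

print(covered / trials)  # close to 0.95
```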
The correlation coefficient is a statistical measure that calculates the strength of the relationship between the
relative movements of two variables. The values range between -1.0 and 1.0. A calculated number greater than
1.0 or less than -1.0 means that there was an error in the correlation measurement. A correlation of -1.0 shows a
perfect negative correlation, while a correlation of 1.0 shows a perfect positive correlation. A correlation of 0.0
shows no relationship between the movements of the two variables.
There are several types of correlation coefficients, but the one that is most common is the Pearson correlation (r).
This measures strength and direction of the linear relationship between two variables. It cannot capture
nonlinear relationships between two variables and cannot differentiate between dependent and independent
variables.
A value of exactly 1.0 means there is a perfect positive relationship between the two variables. For a positive
increase in one variable, there is also a positive increase in the second variable. A value of -1.0 means there is a
perfect negative relationship between the two variables. This shows that the variables move in opposite
directions - for a positive increase in one variable, there is a decrease in the second variable. If the correlation is
0, there is no relationship between the two variables.
The strength of the relationship varies in degree based on the value of the correlation coefficient. For example, a
value of 0.2 shows there is a positive relationship between the two variables, but it is weak and likely
insignificant. Experts do not consider correlations significant until the value surpasses at least 0.8. However, a
correlation coefficient with an absolute value of 0.9 or greater would represent a very strong relationship.
This statistic is useful in finance. For example, it can be helpful in determining how well a mutual fund
performs relative to its benchmark index, or another fund or asset class. By adding a low or negatively
correlated mutual fund to an existing portfolio, the investor gains diversification benefits.
Key Takeaways
Correlation coefficients are used to measure the strength of the relationship between two variables.
Pearson correlation is the one most commonly used in statistics. This measures the strength and
direction of a linear relationship between two variables.
Values always range between -1 (strong negative relationship) and +1 (strong positive relationship).
Values at or close to zero imply weak or no relationship.
Correlation coefficient values between -0.8 and +0.8 are not considered significant.
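A minimal sketch computing the Pearson correlation with scipy.stats; the two series below are made up and nearly linear, so r comes out close to +1.

```python
# Pearson correlation coefficient and its p-value.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2]

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.4f}")  # r near +1: strong positive linear relationship
```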
Result
The results page looks a little complex, but actually isn’t as baffling as it might at first seem.
The chi square statistic appears in the Value column of the Chi-Square Tests table immediately to the right of
“Pearson Chi-Square”. In this example, the value of the chi square statistic is 6.718. The p-value appears in the
same row in the “Asymptotic Significance (2-sided)” column (.010). The result is significant if this value is
equal to or less than the designated alpha level (normally .05).
In this case, the p-value is smaller than the standard alpha value, so we’d reject the null hypothesis that asserts
the two variables are independent of each other. To put it simply, the result is significant – the data suggests that
the variables Religion and Eating are associated with each other.
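For readers working outside SPSS, here is a minimal sketch of the same kind of test of independence using scipy.stats.chi2_contingency; the contingency table is made up for illustration.

```python
# Chi-square test of independence on a made-up contingency table
# (rows: religion categories, columns: eating categories).
from scipy.stats import chi2_contingency

observed = [[30, 10, 20],
            [20, 25, 15]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
# Reject independence if p <= 0.05.
```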
Determine the degrees of freedom of your chi-square value. If you are comparing results for a single sample
with multiple categories, the degrees of freedom is the number of categories minus 1. For example, if you were
evaluating the distribution of colors in a jar of jellybeans and there were four colors, the degrees of freedom
would be 3. If you are comparing tabular data, the degrees of freedom equals (number of rows - 1)
multiplied by (number of columns - 1).
Determine the critical p value that you will use to evaluate your data. This is the percent probability (divided by
100) that a specific chi-square value was obtained by chance alone. Another way of thinking about p is that it is
the probability that your observed results deviated from the expected results by the amount that they did solely
due to random variation in the sampling process.
Look up the p value associated with your chi-square test statistic using the chi-square distribution table. To do
this, look along the row corresponding to your calculated degrees of freedom. Find the value in this row closest
to your test statistic. Follow the column that contains that value upwards to the top row and read off the p value.
If your test statistic is in between two values in the initial row, you can read off an approximate p value
intermediate between two p values in the top row.
Compare the p value obtained from the table to the critical p value earlier decided upon. If your tabular p value
is above the critical value, you will conclude that any deviation between the sample category values and the
expected values was due to random variation and was not significant. For example, if you chose a critical p
value of 0.05 (or 5%) and found a tabular value of 0.20, you would conclude there was no significant variation.
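If software is available, the table lookup can be replaced by the chi-square survival function; the statistic and degrees of freedom below are made up (7.815 with 3 degrees of freedom sits right at the usual 0.05 critical value).

```python
# Exact p-value from a chi-square statistic and its degrees of freedom,
# instead of interpolating in a printed table.
from scipy.stats import chi2

test_statistic, dof = 7.815, 3   # made-up jellybean example: 4 colors -> df = 3
p_value = chi2.sf(test_statistic, dof)
print(round(p_value, 4))         # ~0.05: right at the usual critical level
```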
Tip
Remember that any conclusion made based on this test will still have a chance of being wrong, proportionate to
the p value obtained.
SPSS always assumes that the independent variable is represented numerically. In the sample data set, MAJOR
is a string, so first convert the string variable into a numerical variable. Once the conversion is complete, you are
ready to run the ANOVA.
The SPSS output for Grade Point Average includes a Descriptives section, the Test of Homogeneity of
Variances, the ANOVA table, a Multiple Comparisons (post hoc) table, and a means plot (Graph).
You can use a t-test to compare the means of two samples, but when there are more than two samples to be
compared, ANOVA is the appropriate method.
Assumptions of ANOVA
There are four main assumptions: the dependent variable is measured on an interval or ratio scale; the
observations are independent of one another; the residuals within each group are approximately normally
distributed; and the groups have approximately equal variances (homogeneity of variance).
Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an observed aggregate variability
found inside a data set into two parts: systematic factors and random factors. The systematic factors have a
statistical influence on the given data set, while the random factors do not. Analysts use the ANOVA test to
determine the influence that independent variables have on the dependent variable in a regression study.
The t- and z-test methods developed in the 20th century were used for statistical analysis until 1918, when
Ronald Fisher created the analysis of variance method. ANOVA is also called the Fisher analysis of variance,
and it is the extension of the t- and z-tests. The term became well-known in 1925, after appearing in Fisher's
book, "Statistical Methods for Research Workers." It was employed in experimental psychology and later
expanded to subjects that were more complex.
The ANOVA test is the initial step in analyzing factors that affect a given data set. Once the test is finished, an
analyst performs additional testing on the systematic factors that measurably contribute to the data set's
inconsistency. The analyst utilizes the ANOVA test results in an f-test to generate additional data that aligns
with the proposed regression models.
The ANOVA test allows a comparison of more than two groups at the same time to determine whether a
relationship exists between them. The result of the ANOVA formula, the F statistic (also called the F-ratio),
allows for the analysis of multiple groups of data to determine the variability between samples and within
samples.
If no real difference exists between the tested groups, which is called the null hypothesis, the result of the
ANOVA's F-ratio statistic will be close to 1. Fluctuations in its sampling will likely follow the Fisher F
distribution. This is actually a group of distribution functions, with two characteristic numbers, called the
numerator degrees of freedom and the denominator degrees of freedom.
Key Takeaways
Analysis of variance, or ANOVA, is a statistical method that separates observed variance data into
different components to use for additional tests.
A one-way ANOVA is used for three or more groups of data, to gain information about the relationship
between the dependent and independent variables.
If no true variance exists between the groups, the ANOVA's F-ratio should be close to 1.
A researcher might, for example, test students from multiple colleges to see if students from one of the colleges
consistently outperform students from the other colleges. In a business application, an R&D researcher might
test two different processes of creating a product to see if one process is better than the other in terms of cost
efficiency.
The type of ANOVA test used depends on a number of factors. It is applied when the data are experimental.
Analysis of variance can also be computed by hand when there is no access to statistical software; it is simple to
use and best suited to small samples. With many experimental designs, the sample sizes have to be the same for
the various factor-level combinations.
ANOVA is helpful for testing three or more variables. It is similar to multiple two-sample t-tests. However, it
results in fewer type I errors and is appropriate for a range of issues. ANOVA determines differences by
comparing the means of each group, and it involves partitioning the overall variance into diverse sources. It is
employed with subjects, test groups, between groups and within groups.
There are two types of ANOVA: one-way (or unidirectional) and two-way. One-way or two-way refers to the
number of independent variables in your analysis of variance test. A one-way ANOVA evaluates the impact of a
sole factor on a sole response variable. It determines whether all the samples are the same. The one-way
ANOVA is used to determine whether there are any statistically significant differences between the means of
three or more independent (unrelated) groups.
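A minimal one-way ANOVA sketch using scipy.stats.f_oneway; the three groups below are made up (for instance, exam scores from three colleges, echoing the earlier example).

```python
# One-way ANOVA: does at least one group mean differ?
from scipy.stats import f_oneway

college_a = [82, 75, 90, 85, 78]
college_b = [70, 68, 74, 72, 69]
college_c = [88, 91, 79, 84, 86]

f_stat, p = f_oneway(college_a, college_b, college_c)
print(f"F = {f_stat:.3f}, p = {p:.4f}")
# A small p (e.g., <= 0.05) suggests at least one group mean differs.
```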
A two-way ANOVA is an extension of the one-way ANOVA. With a one-way, you have one independent
variable affecting a dependent variable. With a two-way ANOVA, there are two independents. For example, a
two-way ANOVA allows a company to compare worker productivity based on two independent variables, such
as salary and skill set. It is utilized to observe the interaction between the two factors and tests the effect of two
factors at the same time.
The dependent variable is 'dependent' on the independent variable. As the experimenter changes the independent
variable, the change in the dependent variable is observed and recorded. When you take data in an experiment,
the dependent variable is the one being measured.
A scientist is testing the effect of light and dark on the behavior of moths by turning a light on and off.
The independent variable is the amount of light and the moth's reaction is the dependent variable. A
change in the independent variable (amount of light) directly causes a change in the dependent variable
(moth behavior).
You are interested in learning which kind of chicken produces the largest eggs. The size of the eggs
depends on the breed of chicken, so breed is the independent variable and egg size is the dependent
variable.
You want to know whether or not stress affects heart rate. Your independent variable is the stress,
while the dependent variable would be the heart rate. To perform an experiment, you would provide
stress and measure the subject's heartbeat. Note in a good experiment, you'd want to choose a stress
you could control and quantify. Your choice could lead you to perform additional experiments, since it might
turn out that the change in heart rate after a 40-degree decrease in temperature (physical stress) differs from the
change after failing a test (psychological stress). Even though
your independent variable might be a number that you measure, it's one you control, so it's not
"dependent".
Sometimes it's easy to tell the two types of variables apart, but if you get confused, here are tips to help keep
them straight:
If you change one variable, which is affected? If you're studying the rate of growth of plants using
different fertilizers, can you identify the variables? Start by thinking about what you are controlling and
what you will be measuring. The type of fertilizer is the independent variable. The rate of growth is the
dependent variable. So, to perform an experiment, you would fertilize plants with one fertilizer and
measure the change in height of the plant over time, then switch fertilizers and measure the height of
plants over the same span of time. You might be tempted to identify time or height as your variable, not
the rate of growth (distance per time). It may help to look at your hypothesis or purpose to remember
your goal.
Write out your variables as a sentence stating cause and effect. The (independent variable) causes a
change in the (dependent variable). Usually, the sentence won't make sense if you get them wrong. For
example:
(Taking vitamins) affects the numbers of (birth defects). = makes sense
(Birth defects) affects the number of (vitamins). = probably not so much
Hypothesis Testing
In order to undertake hypothesis testing you need to express your research hypothesis as a null and alternative
hypothesis. The null hypothesis and alternative hypothesis are statements regarding the differences or effects
that occur in the population. You will use your sample to test which statement (i.e., the null hypothesis or
alternative hypothesis) is most likely (although technically, you test the evidence against the null hypothesis).
So, with respect to our teaching example, the null and alternative hypothesis will reflect statements about all
statistics students on graduate management courses.
The null hypothesis is essentially the "devil's advocate" position. That is, it assumes that whatever you are trying
to prove did not happen (hint: it usually states that something equals zero). For example, the two different
teaching methods did not result in different exam performances (i.e., zero difference). Another example might
be that there is no relationship between anxiety and athletic performance (i.e., the slope is zero). The alternative
hypothesis states the opposite and is usually the hypothesis you are trying to prove (e.g., the two different
teaching methods did result in different exam performances). Initially, you can state these hypotheses in more
general terms (e.g., using terms like "effect", "relationship", etc.), as shown below for the teaching methods
example:
Null Hypothesis (H0): Undertaking seminar classes has no effect on students' performance.
Alternative Hypothesis (HA): Undertaking seminar classes has an effect on students' performance.
Depending on how you want to "summarize" the exam performances will determine how you might want to
write a more specific null and alternative hypothesis. For example, you could compare the mean exam
performance of each group (i.e., the "seminar" group and the "lectures-only" group). This is what we will
demonstrate here, but other options include comparing the distributions, medians, amongst other things. As
such, we can state:
Null Hypothesis (H0): The mean exam mark for the "seminar" and "lecture-only"
teaching methods is the same in the population.
Alternative Hypothesis (HA): The mean exam mark for the "seminar" and "lecture-only"
teaching methods is not the same in the population.
Now that you have identified the null and alternative hypotheses, you need to find evidence and develop a
strategy for declaring your "support" for either the null or alternative hypothesis. We can do this using some
statistical theory and some arbitrary cut-off points. Both these issues are dealt with next.
Significance levels
The level of statistical significance is often expressed as the so-called p-value. Depending on the statistical test
you have chosen, you will calculate a probability (i.e., the p-value) of observing your sample results (or more
extreme) given that the null hypothesis is true. Another way of phrasing this is to consider the probability that
a difference in a mean score (or other statistic) could have arisen based on the assumption that there really is no
difference. Let us consider this statement with respect to our example where we are interested in the difference
in mean exam performance between two different teaching methods. If there really is no difference between the
two teaching methods in the population (i.e., given that the null hypothesis is true), how likely would it be to see
a difference in the mean exam performance between the two teaching methods as large as (or larger than) that
which has been observed in your sample?
So, you might get a p-value such as 0.03 (i.e., p = .03). This means that there is a 3% chance of finding a
difference as large as (or larger than) the one in your study given that the null hypothesis is true. However, you
want to know whether this is "statistically significant". Typically, if there was a 5% or less chance (5 times in
100 or less) that the difference in the mean exam performance between the two teaching methods (or whatever
statistic you are using) is as different as observed given the null hypothesis is true, you would reject the null
hypothesis and accept the alternative hypothesis. Alternately, if the chance was greater than 5% (5 times in 100
or more), you would fail to reject the null hypothesis and would not accept the alternative hypothesis. As such,
in this example where p = .03, we would reject the null hypothesis and accept the alternative hypothesis. We
reject it because, with a p-value of .03 (i.e., less than a 5% chance), a difference this large would occur too
rarely by chance alone for chance to be a convincing explanation, so we conclude that it was the two teaching
methods that had an effect on exam performance.
Whilst there is relatively little justification for using a significance level of 0.05 rather than, for example, 0.01
or 0.10, it is widely used in academic research. However, if you want to be particularly confident in your
results, you can set a more stringent level of 0.01 (a 1% chance or less; a 1 in 100 chance or less).
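Putting the decision rule into code: compare the p-value from a two-sample t-test against the chosen significance level. The exam marks below are made up to echo the seminar versus lecture-only example.

```python
# Decision rule: reject the null hypothesis when p <= alpha.
from scipy.stats import ttest_ind

seminar      = [72, 85, 78, 90, 81, 76, 88, 83]
lecture_only = [65, 70, 74, 68, 72, 77, 66, 71]

t_stat, p = ttest_ind(seminar, lecture_only)
for alpha in (0.05, 0.01):
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"alpha = {alpha}: p = {p:.4f} -> {decision}")
```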