MT 21 Activity Lopez
MT 21 – Biostatistics and Epidemiology
INTRODUCTION
Statistics is the collection, organization, analysis, interpretation, and presentation of data. It is
categorized into two types: descriptive statistics and inferential statistics. The two differ in scope:
descriptive statistics only describes the data at hand, while inferential statistics allows predictions to be
made from the data. Inferential statistics allows researchers to draw inferences and conclusions from the
given data. I have provided an outline of my discussion in sections: Section A (hypothesis, statistical
tests, probability theory, normal distribution, and the central limit theorem), Section B (choosing a
statistical test), Section C (confidence intervals and margin of error), Section D (hypothesis and
hypothesis testing), Section E (normal distribution, z-scores, and calculating probabilities), Section F
(null hypothesis, alternative hypothesis, p-value, statistical significance, and Type 1 and Type 2 errors),
Section G (what confidence intervals are), Section H (normality tests in SPSS), and Section I (hypothesis
testing, sampling, sampling distributions, dependent and independent samples, and specificity,
sensitivity, and validity). All of the above are discussed thoroughly and comprehensively.
DISCUSSIONS OF TOPICS
SECTION A
a. Hypothesis
The hypothesis is the part of a research paper that states the research question and sets forth the
appropriate statistical evaluation. It can be null or alternative, stating that there is no statistical
difference or association between variables, or that there is a statistical difference or association
between variables, respectively. Furthermore, a hypothesis can be one-tailed, where the outcome is
expected in a single direction, or two-tailed, where the direction is unknown.
b. Statistical tests
Statistical tests provide mechanisms for making quantitative decisions about processes in research.
They determine whether there is enough evidence to reject or fail to reject the hypothesis. A one-tailed
test examines the possibility of a relationship in one direction only, completely disregarding the
possibility of a relationship in the other direction, while a two-tailed test examines significance in both
directions.
c. Probability theory
Probability is the measure of the likelihood that an event will occur. It is the basis for decision-making
in statistical inference. Probability is computed as the number of favorable outcomes over the total
number of possible outcomes. It is expressed on a scale from 0 to 1: if p = 0, the event cannot happen,
while if p = 1, the event is certain to happen.
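The ratio definition above can be computed directly. A minimal sketch in Python (not part of the original activity), using a hypothetical example of drawing an ace from a standard deck:

```python
from fractions import Fraction

# Probability = favorable outcomes / total possible outcomes.
# Hypothetical example: drawing an ace from a standard 52-card deck.
favorable = 4          # four aces
total = 52             # total cards
p_ace = Fraction(favorable, total)

print(p_ace)           # 1/13
print(float(p_ace))    # ~0.0769
assert 0 <= p_ace <= 1  # probabilities always lie between 0 and 1
```

Using Fraction keeps the ratio exact; the float form gives the familiar decimal probability.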
d. Tests for normality
Testing for normality determines whether a parametric or non-parametric inferential method should
be used. The sample should represent the various characteristics of the population, which depends on
how it was obtained, its size, and the distribution of its variables. The tests for normality include the
Kolmogorov-Smirnov test, Lilliefors test, Shapiro-Wilk test, and Anderson-Darling test, as well as
descriptive methods such as histogram analysis and inspection of skewness and kurtosis.
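Although the activity uses SPSS, the same check can be sketched in Python. A hedged illustration with hypothetical simulated data, assuming SciPy is available (scipy.stats.shapiro implements the Shapiro-Wilk test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: one sample from a normal distribution,
# one from a clearly skewed (exponential) distribution.
normal_sample = rng.normal(loc=50, scale=10, size=200)
skewed_sample = rng.exponential(scale=10, size=200)

# Shapiro-Wilk: H0 = "the data come from a normal distribution".
stat_n, p_n = stats.shapiro(normal_sample)
stat_s, p_s = stats.shapiro(skewed_sample)

print(f"normal sample: W={stat_n:.3f}, p={p_n:.3f}")  # p should be large
print(f"skewed sample: W={stat_s:.3f}, p={p_s:.3f}")  # p should be tiny
```

A small p-value leads to rejecting normality, which would point toward a non-parametric method.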
e. Central limit theorem
This theorem states that as the sample size increases, the sampling distribution of the mean
increasingly approaches a normal distribution.
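The theorem can be demonstrated by simulation. A minimal sketch (not from the activity) drawing many sample means from a non-normal population, a uniform distribution on [0, 1], whose standard deviation is 1/√12 ≈ 0.2887:

```python
import random
import statistics

random.seed(42)

# Population: uniform on [0, 1] (clearly not normal; mean 0.5, sd ~0.2887).
def sample_mean(n):
    return statistics.fmean(random.random() for _ in range(n))

# Draw many sample means for a fixed sample size.
n, draws = 30, 5000
means = [sample_mean(n) for _ in range(draws)]

# CLT prediction: the sample means cluster around the population mean,
# with spread sigma / sqrt(n) = 0.2887 / sqrt(30) ~ 0.0527.
print(statistics.fmean(means))   # close to 0.5
print(statistics.stdev(means))   # close to 0.0527
```

Plotting a histogram of `means` would show the familiar bell shape even though the population itself is flat.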
SECTION B
a. Choosing a statistical test
Choosing a test begins with understanding the purpose of your research question: it may concern a
comparison, which looks at the difference between two groups, or a relationship, which looks at the
connection between two variables. Determining what type of data you are looking at also matters; it can
be categorical or continuous. Categorical data represent qualitative characteristics and continuous data
represent quantitative characteristics. You can then choose among three families of tests: chi-squared
tests, t-tests, and correlation. Chi-squared tests are used to test homogeneity and independence. T-tests
include the one-sample t-test, the 2-sample unpaired t-test, the 2-sample paired t-test, and one-way
ANOVA. Correlation covers Pearson’s correlation, regression, and Spearman’s correlation.
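As one worked instance of this decision path: a comparison question with continuous data and two independent groups points to the 2-sample unpaired t-test. A hedged Python sketch with hypothetical measurements (the group values are invented for illustration), assuming SciPy is available:

```python
from scipy import stats

# Hypothetical continuous data for two independent groups,
# e.g. a measurement taken in a treatment and a control group.
group_a = [120, 118, 125, 130, 122, 119, 124, 127]
group_b = [135, 140, 132, 138, 141, 129, 136, 133]

# Comparison question + continuous data + two unpaired groups
# -> 2-sample unpaired t-test.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A relationship question with two continuous variables would instead call for Pearson's correlation (`stats.pearsonr`), and categorical data would call for a chi-squared test.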
SECTION C
a. Confidence intervals
Sampling a population and gathering data from the sample is the first step. The sampling
distribution of the sample proportions is specific to the sample size. A confidence interval is a range of
estimates for the unknown parameter. It reflects how many standard deviations you are from the mean
(based on the video); specifically, it displays the probability that a parameter will fall between values
around the mean. The most common confidence levels are 95% and 99%.
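A 95% confidence interval for a mean can be sketched as mean ± z · s/√n, with z = 1.96 for 95% confidence. A minimal illustration with hypothetical sample values (for small samples a t critical value would be more appropriate than 1.96):

```python
import math
import statistics

# Hypothetical sample measurements.
sample = [4.9, 5.1, 4.8, 5.3, 5.0, 5.2, 4.7, 5.1, 5.0, 4.9]

n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)

# 95% confidence interval using the normal critical value z = 1.96.
z = 1.96
half_width = z * sd / math.sqrt(n)
ci = (mean - half_width, mean + half_width)
print(f"mean = {mean:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interpretation: intervals constructed this way capture the true mean in about 95% of repeated samples.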
b. Margin of error
A related idea to the confidence interval is the notion of margin of error. The margin of error
determines the width of the confidence interval: it dictates how many points, or what range, you are
above and below the sample proportion.
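For a sample proportion, the margin of error at 95% confidence is commonly written as z·√(p̂(1−p̂)/n). A minimal sketch with a hypothetical poll (the counts are invented for illustration):

```python
import math

# Hypothetical poll: 520 of 1000 respondents favor option A.
n = 1000
p_hat = 520 / n

# Margin of error at 95% confidence: z * sqrt(p(1-p)/n).
z = 1.96
moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"sample proportion = {p_hat:.3f}, margin of error = +/-{moe:.3f}")
print(f"interval: ({p_hat - moe:.3f}, {p_hat + moe:.3f})")
```

Here the result is roughly ±3 percentage points, the figure often reported with polls of this size.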
SECTION D
a. Hypothesis
Again, a hypothesis is defined as an educated guess that can be supported or refuted by
experimentation and observation. It is a statement about a future event, or an event whose outcome is
unknown at the time of the prediction, set forth in such a way that it can be rejected or not rejected.
b. Hypothesis testing
A hypothesis test is an inferential procedure in which the researcher seeks to determine how likely it
is that the results of the study are due to chance. There are four steps in hypothesis testing: stating the
null and alternative hypotheses, choosing a statistical significance level, carrying out the appropriate
statistical procedure, and making a decision regarding the hypothesis.
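The four steps can be walked through in code. A hedged sketch with hypothetical data and a hypothetical reference value of 100 (a one-sample t-test via SciPy stands in for "the appropriate statistical procedure"):

```python
from scipy import stats

# Hypothetical data: 12 measurements; suppose the reference
# population mean is 100.
data = [104, 109, 98, 112, 107, 103, 110, 101, 108, 105, 111, 106]

# Step 1: state H0: mu = 100 and Ha: mu != 100 (two-tailed).
mu0 = 100
# Step 2: choose a statistical significance level.
alpha = 0.05
# Step 3: carry out the appropriate procedure: one-sample t-test.
t_stat, p_value = stats.ttest_1samp(data, popmean=mu0)
# Step 4: make a decision regarding the hypothesis.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.2f}, p = {p_value:.4f} -> {decision}")
```

The structure is the same for any test; only step 3 changes with the study design.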
SECTION E
a. Normal distribution
A normal distribution exhibits a bell-shaped, symmetrical curve in which the mean and median are
the same number. The standard normal distribution is a normal distribution expressed in z-scores; it has
a mean of zero and a standard deviation of one. The z-table can then be used to calculate the area under
the curve between two data points.
b. Z-score
The z-score is equal to x minus the mean, divided by the standard deviation. It is a measure of the
distance between a data point and the mean. A positive z-score lies above the mean and a negative
z-score lies below it. The z-table is used to determine what area corresponds to the z-score.
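The formula and the table lookup can both be sketched in a few lines; the standard normal CDF (what a z-table tabulates) is available through the error function. The exam-score numbers below are a hypothetical example:

```python
import math

def z_score(x, mean, sd):
    """Distance of x from the mean in standard-deviation units."""
    return (x - mean) / sd

def normal_cdf(z):
    """Area under the standard normal curve to the left of z
    (what a z-table would give), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical example: a score of 85 when the mean is 70 and sd is 10.
z = z_score(85, 70, 10)
print(z)              # 1.5 -> 1.5 sd above the mean
print(normal_cdf(z))  # ~0.9332, area to the left of z = 1.5
```

So about 93.3% of the distribution lies below this data point, matching the z-table entry for 1.50.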
SECTION F
a. Null hypothesis
Represented by H0, it states that there is no difference between groups; in other words, there appears
to be no relationship between risk factors and outcomes.
b. Alternative hypothesis
Represented by Ha or H1, it states that there is a difference between groups; in other words, there
appears to be a relationship between risk factors and outcomes.
c. P-value
The p-value is the probability of obtaining a result at least as extreme as the current one, assuming
the null hypothesis is true. It is a measurement of how much the observed data disagree with the null
hypothesis. A very small p-value means the data strongly disagree with the null hypothesis, and this is
when you reject the null. A high p-value means the data disagree less with the null hypothesis, and this
is when you fail to reject it.
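For a z-statistic, the two-sided p-value is twice the tail area beyond |z|. A minimal sketch showing that small statistics give large p-values and large statistics give small ones:

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p(z):
    """P(result at least as extreme as z, assuming H0 is true)."""
    return 2 * (1 - normal_cdf(abs(z)))

# A small z-statistic -> data agree with H0 -> large p-value.
print(round(two_sided_p(0.5), 4))   # ~0.6171
# A large z-statistic -> strong disagreement -> tiny p-value.
print(round(two_sided_p(3.0), 4))   # ~0.0027
```

The same logic applies to t-statistics, only with the t distribution's CDF in place of the normal one.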
d. Statistical significance
Statistical significance is also referred to as the level of significance, or alpha. It is a selected cut-off
point that determines whether we consider a p-value acceptably high or low. When p < α, we conclude
that there is a statistical difference between groups. When p > α, we conclude that there is no statistical
difference between groups. A 5% alpha is often used based on the consensus of researchers; it implies
that a 5% probability of incorrectly rejecting the null hypothesis is acceptable.
e. Type 1 error and Type 2 error
When you reject the null hypothesis, there is a chance a mistake was made: you rejected a hypothesis
that is true, or failed to reject a hypothesis that is false. A Type 1 error occurs when you incorrectly
reject the null hypothesis; for example, the researchers say there is a difference between groups when
actually there is none. It is also known as a false positive study result, and alpha is the probability of
making a Type 1 error. A Type 2 error occurs when you fail to reject the null when in reality you should
have rejected it. It is also known as a false negative study result, and beta is the probability of making a
Type 2 error. Power is the probability of not making a Type 2 error.
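The claim that alpha is the probability of a Type 1 error can be checked by simulation: if samples are repeatedly drawn from a population where H0 is genuinely true, about alpha of the tests should still reject it. A hedged sketch (all numbers are illustrative):

```python
import math
import random
import statistics

random.seed(1)

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

alpha = 0.05
n, trials = 30, 2000
false_positives = 0

# H0 is true in every trial: each sample really comes from N(0, 1),
# so every rejection is, by construction, a Type 1 error.
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = statistics.fmean(sample) * math.sqrt(n)  # z-test with known sd = 1
    p = 2 * (1 - normal_cdf(abs(z)))
    if p < alpha:
        false_positives += 1

rate = false_positives / trials
print(rate)   # long-run Type 1 error rate, close to alpha = 0.05
```

A similar simulation with a true difference built into the samples would estimate power, the probability of avoiding a Type 2 error.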
SECTION G
a. What are confidence intervals
Statistics is about estimation; the sample mean is an estimate of the true average. A confidence
interval is a range of values within which we are reasonably sure the true value lies. It is determined by
the number of observations, the mean, and the standard deviation.
SECTION H
a. Normality tests in SPSS
In research, it is important to know whether a distribution is normal or not. After the Data View is
filled with the necessary data, click the Analyze ribbon, then Descriptive Statistics, then Explore.
Histograms are then used to determine whether the data are normally distributed.
SUMMARY OF LEARNINGS
Clearly, inferential statistics allows us to make predictions and conclusions based on our data. It
helps us determine whether there is a relationship between variables. I gained many ideas and learnings
from the lecture notes and videos provided. With inferential statistics, we can support or refute theories,
determine associations between variables, and determine whether findings are significant. For me, the
most important lesson I will always remember is determining whether the findings from a sample can be
generalized to the entire population. Inferential statistics therefore has two primary uses: to make
estimates about a population and to test hypotheses so that researchers can draw conclusions. I have
also learned that inferential statistics is important in healthcare, as it is used to form predictions,
theories, and summaries about a population. By identifying statistical trends, health care providers can
monitor current local conditions and compare them with provincial, national, and international states.
Overall, I learned about hypotheses, the normal distribution, normality tests, statistical tests, measures
of validity, the importance of sampling, and the other topics mentioned above.