Biostatistics World
General principles
Miscellaneous (Multisystem)
Pediatrics
Behavioral science
Human development
Human sexuality
Defense mechanism
Changing patient’s behavior
Ethical and legal issues
Normal sleep and sleep disorders
Substance abuse
Psychiatric diagnosis and related treatment
Adverse event
Bias
Causality
Cohort studies
o Unknown residual confounders may impact the statistical analysis of study results.
Correlation coefficient
Cross-sectional study
Hazard ratio
Hypothesis testing
Likelihood ratio
Measures of association
Meta-analysis
Odds ratio
Patient safety
Prevention levels
Regression techniques
Risk
Statistical tests
The paired t-test compares the means of 2 related (i.e. matched, paired) groups. The test requires that a quantitative dependent variable (i.e. outcome) be evaluated in the 2 related groups.
The independent-samples (two-sample) t-test compares the means of 2 independent groups. The test requires that a quantitative dependent variable (i.e. outcome) be evaluated in 2 independent groups.
The analysis of variance (ANOVA) test compares the means of ≥ 3 groups. The test requires a quantitative dependent variable (e.g. outcome) and a categorical independent variable (e.g. exposure, risk factor).
ANOVA compares the means of independent groups and determines only whether at least 2 group means differ, not which ones. Nonoverlapping confidence intervals always imply a statistically significant difference between groups; however, the opposite is not necessarily true.
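As a concrete sketch of the paired t-test described above, the statistic is simply the mean of the pairwise differences divided by its standard error. The blood-pressure values below are hypothetical, purely for illustration:

```python
import math

def paired_t(before, after):
    """Paired t statistic: mean pairwise difference / its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# Hypothetical systolic blood pressures in 5 patients before/after a drug.
before = [150, 142, 138, 160, 155]
after = [144, 139, 135, 151, 153]
print(round(paired_t(before, after), 2))  # -3.57 (negative: pressures fell)
```

In practice this statistic would be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.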
An association between a risk factor and an outcome is more likely to be causal if its strength
increases as the exposure level increases (i.e. dose-response relationship or biological gradient).
Although a dose-response relationship is a highly important factor of causality, it is not
necessary to infer causation.
Know how to interpret the strength of association and dose-response relationship from a study.
Study designs
The Kaplan-Meier survival curve is a commonly used graphic representation of the probabilities that subjects in a study survive without an event of interest. The event-free survival rates of ≥ 2 study groups can be compared; they are considered statistically significantly different only if the p-value is below the chosen significance level (conventionally 0.05).
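The product-limit idea behind the Kaplan-Meier curve can be sketched in a few lines: at each event time, the running survival probability is multiplied by (1 - events / number still at risk). The event and censoring times below are invented for illustration:

```python
def kaplan_meier(event_times, censor_times):
    """Kaplan-Meier estimate: S(t) = product over event times t_i <= t of
    (1 - d_i / n_i), where d_i = events at t_i, n_i = subjects still at risk."""
    curve, surv = [], 1.0
    for t in sorted(set(event_times)):
        d = event_times.count(t)
        # At risk: everyone who has not had the event or been censored before t.
        n = (sum(1 for e in event_times if e >= t)
             + sum(1 for c in censor_times if c >= t))
        surv *= 1 - d / n
        curve.append((t, surv))
    return curve

# Hypothetical study: events at months 2, 4, 4, 7; 2 subjects censored (5, 8).
print(kaplan_meier([2, 4, 4, 7], [5, 8]))
```

Each tuple is (time, estimated event-free survival probability); the curve steps down only at event times, while censored subjects simply leave the risk set.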
Scatter plots are useful for crude analysis of data. They can demonstrate the type of association
(i.e. linear, non-linear), if any is present.
When comparing the effects of a treatment on a composite outcome, it is important to note any
differences between the individual endpoints.
The applicability of study results beyond the specific population studied depends on whether particular patients are similar to those studied. Compliance with treatment can be an important factor.
A. Types of Studies
Abstract
Objectives: This review aims to explore various study designs in biostatistics, emphasizing their
methodologies, objectives, strengths, limitations, and applications in clinical and public health
research. Specifically, it focuses on randomized controlled trials (RCTs), cohort studies, case-
control studies, cross-sectional studies, case series, systematic reviews, meta-analyses, ecological
studies, and vaccine efficacy studies.
Methods: Each study design is critically examined for its unique approach to data collection and
analysis, the temporal relationship between exposures and outcomes, and its capacity to infer
causality or associations. Common measures of association, including relative risk (RR) and
odds ratio (OR), are discussed in relation to each design.
Results:
Conclusions: Different study designs serve distinct roles in biostatistics, with RCTs providing
the strongest evidence for causal relationships, while observational studies are indispensable for
understanding associations, generating hypotheses, and informing public health strategies.
Researchers must carefully choose the appropriate design based on the research question, ethical
considerations, and feasibility constraints.
Experimental studies (e.g. randomized controlled trials) can help establish causal relationships, whereas observational studies (e.g. case series) only suggest associations. Results of
observational studies do not provide enough scientific evidence to make clinical decisions (e.g.
treating patients with new drugs or therapies).
Abstract
Objectives: This review explores key concepts in study design, focusing on the principles of
epidemiology, selection of study populations, blinding, randomization, intention-to-treat versus
per-protocol analyses, validity, and reliability. It aims to provide a comprehensive understanding
of methodological foundations critical for conducting and interpreting clinical and
epidemiological research.
Methods: Each concept is examined for its role in ensuring scientific rigor, minimizing bias, and
improving the applicability of study results. The advantages and limitations of different
approaches are discussed, with examples from clinical and epidemiological contexts.
Results:
Principles of Epidemiology
Selection of Study Population
Blinding and Randomization
o Blinding is the concealment of intervention assignments from individuals
involved in a clinical research study. The optimal strategy to maximize unbiased
ascertainment of outcomes is to blind as many individuals as possible in a clinical
study, especially those collecting and analyzing the data.
Intention-to-Treat vs. Per-Protocol Analysis
o Per-protocol analysis includes only those participants who strictly adhered to and completed the study protocol (i.e. nondropouts). This approach tends to provide an estimate of the true effect of an intervention for a perfect scenario, but it can overestimate the effect of the intervention in a practical clinical setting.
Validity (Internal and External)
o External validity (or generalizability) is the applicability of study results beyond the specific population studied (e.g. the results of a study in middle-aged women would not be expected to apply to elderly men). Results from a randomized controlled trial conducted in a very specific population of patients are generalizable only to that specific subgroup of the population.
o Internal validity refers to the ability of a research design to provide evidence of a
causal relationship between a treatment and an outcome. Blinding increases the
internal validity of a study.
Reliability and Reproducibility
o Precision is the measure of random error. The tighter the confidence interval, the
more precise the result. Increasing the sample size increases precision.
o A precise tool is one that consistently provides a very similar or identical value
when measuring a fixed quantity. An accurate tool is one that provides a
measurement similar to the actual value or within a specified range of a reference
value (as reflected by a gold-standard measurement).
Abstract
Objectives: This study examines the types and impacts of biases and errors in research,
emphasizing strategies to mitigate their effects on validity and reliability. Key areas include
selection bias, information bias, confounding variables, effect modification, and other specific
forms such as reporting and observer bias.
Methods: A detailed exploration of each bias type was conducted, analyzing its mechanisms,
implications for study results, and mitigation strategies. Real-world examples and clinical
scenarios are included to illustrate concepts, alongside evidence-based practices for minimizing
these biases during study design and analysis.
Results:
Selection Bias: Subtypes such as volunteer, susceptibility, and attrition bias can threaten
internal and external validity. Randomization and intention-to-treat analyses are critical
for minimizing these biases.
Information Bias: Includes errors in data collection or reporting, often leading to
inaccurate study outcomes. Observer and response biases are notable examples.
Confounding Variables: Confounding occurs when external variables obscure true
associations. Strategies such as randomization, matching, and stratified analysis are
effective in controlling confounding.
Effect Modification: This occurs when a third variable alters the strength or direction of
an independent variable's effect on an outcome. Stratified analyses help distinguish
between effect modification and confounding.
Specific Bias Types:
o Reporting and Hawthorne biases affect self-reported and observational data.
o Lead-time bias in screening tests can overstate survival benefits without altering
disease prognosis.
o Berkson bias highlights the challenges of using hospitalized controls, limiting
generalizability.
o Latency periods demonstrate the delayed effects of exposure or pathogenesis on
outcomes.
Conclusions: Awareness and management of biases are critical for ensuring the validity and
reliability of study findings. Employing robust study designs, blinding, randomization, and
rigorous data analysis are essential to minimize bias and error. Researchers must also consider
the natural history of diseases and the potential variability in treatment outcomes based on
population characteristics.
Keywords: bias, selection bias, information bias, confounding variables, effect modification,
reporting bias, observer bias, lead-time bias, study validity, research methodology.
Know the different kinds of bias, which can decrease the validity of study results.
Consider the natural history of a disease when evaluating the effectiveness of a drug in a
trial.
Given different disease manifestations, the same treatment may lead to significantly
different clinical outcomes.
Selection Bias
o Differential loss to follow-up is a subtype of selection bias and represents a threat
to the internal validity of a study.
o Volunteer bias (a subtype of selection bias) results when a sample of volunteers
(e.g. nonrandom sample method) misrepresents its target population and threatens
the generalizability (i.e. external validity) and utility of findings in a clinical
setting.
o When the treatment regimen selected for a patient depends on the severity of the
patient’s condition, a form of selection bias known as susceptibility bias
(confounding by indication) can result. To avoid selection bias in studies, patients
are randomly assigned to treatments to minimize potential confounding variables.
Many studies also perform an intention-to-treat analysis, which compares the
initial randomized treatment groups (the original intention) regardless of the
eventual treatment.
o Loss to follow-up in prospective studies creates a potential for attrition bias, a
subtype of selection bias. When a substantial number of subjects are lost to
follow-up, the study may overestimate or underestimate the association between
the exposure and the disease. Investigators try to achieve high rates of follow-up
to reduce the potential for attrition bias.
o Nonrandom treatment assignment may lead to selection bias because study
participants may have an unequal chance of receiving the treatments of interest.
Information Bias
Confounding Variables
o Confounding bias occurs when an extraneous (i.e. confounding) variable masks the relationship between an independent variable (e.g. risk factor, treatment) and a dependent variable (i.e. outcome, disease of interest). It may lead to false conclusions about the association (i.e. statistically significant results that are truly invalid). Failure to adjust for potential confounding factors may lead to residual confounding and a distortion of the true relationship between variables.
o Matching is frequently used in case-control studies because it is an efficient
method to control confounding. Remember: matching variables should always be
the potential confounders of the study (e.g., age, race). Cases and controls are then
selected based on the matching variables, such that both groups have a similar
distribution in accordance with the variables.
o A confounder is an extraneous factor that has properties linking it with both the
exposure and the outcome of interest.
o Randomization is used to control for confounders during the design stage of a
study. It helps to control for known, unknown, and difficult-to-measure
confounders.
o Know the concept of confounding. Distinguish between crude and adjusted
measures of association. Confounding refers to the bias that can result when the
exposure-disease relationship is mixed with the effect of extraneous factors (i.e.,
confounders).
Effect Modification
o Effect modification occurs when the magnitude or direction of the effect of the
independent variable on the dependent variable (outcome) varies depending on
the level of a third variable (effect modifier). Separate (stratified) analyses should
be conducted for each level of the effect modifier.
o Effect modification results when an external variable positively or negatively
impacts the effect of a risk factor on the disease of interest. Stratified analysis
helps determine whether a variable is a confounder or an effect modifier.
Reporting bias may occur if subjects over- or under-report exposure history due to
perceived social stigmatization.
Observer bias
o Observer bias occurs when the investigator's assessment is influenced by knowledge of the exposure status.
o It also occurs when observers (e.g. researchers, physicians) misclassify data (e.g. treatment outcomes) due to preconceived expectations regarding the treatment.
The Hawthorne effect is the tendency of study subjects to change their behavior, and thereby the outcome, because they are aware that they are being studied.
Response bias occurs when participants purposely give desirable responses to questions
about topics perceived to be sensitive (e.g. health behaviors). This practice results in
responses that are inaccurate and may lead to incorrect conclusions (e.g. lower than
expected prevalence of disease or frequency of risk factors).
Understand the concept of lead-time bias in screening tests.
o The typical example of lead-time bias is prolongation of apparent survival in
patients to whom a test is applied, without changing the prognosis of the disease.
The concept of a latency period can be applied to both disease pathogenesis and
exposure to risk modifiers. Exposure to risk factors and the initial steps in disease
pathogenesis sometimes occur years before clinical manifestations are evident. In
addition, exposure to risk modifiers may need to be continuous over a certain period
before influencing the outcome.
Berkson bias occurs when controls are chosen from among hospitalized patients only,
resulting in a potential bias (e.g. hospitalized controls may have an exposure related to
the outcome of interest) and limiting generalizability.
A. Epidemiological Measures
Abstract
Objectives: This study outlines key epidemiological measures, including risk, rate, and odds
ratio, used to quantify disease frequency and assess associations between exposures and
outcomes. Additionally, concepts like attributable risk and risk reduction are explored to enhance
understanding of clinical trial and observational study results.
Methods: Fundamental principles and formulas for epidemiological metrics are detailed, with
explanations of their applications in research settings. Statistical significance is evaluated
through confidence intervals and adjusted measures to control for confounding variables.
Results:
Conclusions: Epidemiological measures provide essential tools for assessing disease risk,
evaluating interventions, and interpreting study outcomes. Understanding and applying these
measures are critical for evidence-based decision-making in clinical and public health research.
Keywords: epidemiology, risk, relative risk, absolute risk reduction, relative risk reduction,
incidence, prevalence, odds ratio, attributable risk, statistical significance.
Risk is the probability of getting a disease over a certain period of time. To calculate the risk, divide the number of diseased subjects by the total number of subjects at risk.
A relative risk (RR) < 1.0 indicates a lower risk for the outcome in the treatment group; RR = 1.0 indicates no association between treatment and outcome; and RR > 1.0 indicates a greater risk for the outcome in the treatment group. A confidence interval (CI) that includes the null value (i.e. RR = 1.0) indicates that the association is not statistically significant, whereas a CI that excludes the null value indicates statistical significance. Adjusted RRs minimize confounding and provide better estimates of associations.
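The null-value rule for a relative-risk confidence interval reduces to a one-line check (a minimal sketch; the function name is made up):

```python
def rr_ci_significant(ci_low, ci_high):
    """An RR confidence interval indicates statistical significance only
    when it excludes the null value RR = 1.0."""
    return not (ci_low <= 1.0 <= ci_high)

print(rr_ci_significant(0.6, 0.9))  # True: CI excludes 1.0 (protective effect)
print(rr_ci_significant(0.8, 1.3))  # False: CI contains 1.0 (no association shown)
```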
Absolute risk reduction (ARR) describes the difference in the rate (or risk) of an unfavorable
outcome between control and treatment groups. It is calculated as follows: ARR = (Rate[control]
- Rate [treatment]). The rate of an unfavorable outcome is equal to the number of unfavorable
outcomes in a group divided by the sample size of that group.
Relative risk reduction (RRR) measures how much a treatment reduces the risk (i.e. rate) of an unfavorable outcome. RRR may be calculated using the absolute risk reduction (ARR) or the relative risk (RR) as follows: RRR = ARR / Rate[control] = (Rate[control] - Rate[treatment]) / Rate[control]; RRR = 1 - RR.
When comparing the relative risk (RR) of the same outcome in 2 cohorts: RR = rate of outcome
in cohort 1 / rate of outcome of cohort 2.
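The risk, RR, ARR, and RRR formulas above fit in a few lines; the event counts below are hypothetical:

```python
def risk(events, n_at_risk):
    """Risk = number of diseased subjects / total subjects at risk."""
    return events / n_at_risk

# Hypothetical 2-arm trial: 20/200 events in controls, 12/200 on treatment.
r_control = risk(20, 200)       # 0.10
r_treatment = risk(12, 200)     # 0.06

rr = r_treatment / r_control    # relative risk, ~0.6
arr = r_control - r_treatment   # absolute risk reduction, ~0.04
rrr = arr / r_control           # relative risk reduction, ~0.4 (= 1 - RR)
print(rr, arr, rrr)
```

Note that the two RRR formulas agree: ARR / Rate[control] equals 1 - RR by algebra.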
B. Patient-Oriented Metrics
A. Statistical Measures
A regression analysis is a statistical technique used to describe the effect of ≥ 1 independent (i.e. explanatory) variables, which may be quantitative or qualitative, on 1 quantitative dependent (i.e. outcome) variable.
Factorial design studies randomize participants to combinations of 2 or more interventions, allowing the effects of multiple variables to be studied simultaneously.
The correlation coefficient (r) describes the direction (i.e. positive, negative) and strength (values closer to -1 or 1 indicate stronger linear relationships) of the linear relationship between 2 quantitative variables. It does not necessarily imply causality.
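A direct translation of the Pearson correlation coefficient (the data points are invented):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r between 2 quantitative variables:
    covariance term divided by the product of the deviation norms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A perfectly linear relationship gives r = 1 (or r = -1 if decreasing).
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))
print(pearson_r([1, 2, 3, 4], [9, 7, 5, 3]))
```

Even an r near ±1 only describes the linear association; it says nothing about causation.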
More sensitive diagnostic tests have lower false negative rates than less sensitive tests. More
specific diagnostic tests have lower false positive rates than less specific tests.
Unlike sensitivity and specificity, positive and negative predictive values depend on the
prevalence of disease in the population being tested. A change in a test cutoff point that causes
an increase in the number of false positives and true positives will decrease positive predictive
value.
Raising the cutoff point of a screening test (i.e. requiring a higher value for a positive result) increases specificity and decreases sensitivity.
Changing the cutoff point of a quantitative diagnostic test will inversely affect its sensitivity and
specificity. Typically, lowering the cutoff value will increase sensitivity (fewer false negatives)
and decrease specificity (more false positives). Screening tests need high sensitivity, and
confirmatory tests need high specificity.
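The cutoff trade-off can be seen numerically; the test values below are invented, with higher values more suggestive of disease:

```python
def sens_spec(diseased, healthy, cutoff):
    """Call values >= cutoff positive; return (sensitivity, specificity)."""
    tp = sum(1 for v in diseased if v >= cutoff)   # true positives
    tn = sum(1 for v in healthy if v < cutoff)     # true negatives
    return tp / len(diseased), tn / len(healthy)

diseased = [6, 7, 8, 9, 10, 12]   # hypothetical values in diseased patients
healthy = [2, 3, 4, 5, 6, 7]      # hypothetical values in healthy patients

print(sens_spec(diseased, healthy, 7))   # lower cutoff: sensitivity favored
print(sens_spec(diseased, healthy, 9))   # higher cutoff: specificity favored
```

Moving the cutoff from 7 to 9 here trades false positives for false negatives, exactly the inverse relationship described above.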
Changing the cutoff value of a test in a way that alters the proportion of true-positive and false-
negative results will change the sensitivity. Likewise, a change in the test that modifies the
proportion of false-positive and true-negative results will change the specificity. Alterations in
test sensitivity and specificity, as well as changes in disease prevalence, will affect the positive
and negative predictive values.
Predictive values change depending on the disease prevalence in a study population. As disease
prevalence increases, the positive predictive value increases and the negative predictive value
decreases, and vice versa.
Positive predictive value is the probability that an individual has a disease given a positive test
result. Negative predictive value is the probability that an individual does not have a disease
given a negative test result.
Negative predictive value (NPV) is the probability that an individual does not have a disease
given a negative test result. It is calculated as NPV = true negative / (true negative + false
negative).
NPV is the probability of being free of a disease if the test result is negative. Remember: the NPV varies with the pretest probability of disease. A test applied to a patient with a high pretest probability of having a disease will have a low NPV, and one applied to a patient with a low pretest probability will have a high NPV.
Both the positive predictive value (PPV) and negative predictive value (NPV) of a test depend on
the prevalence of the disease of interest in the population in which the test is applied. PPV
increases and NPV decreases with an increase in prevalence.
If a test result is negative, the probability of having the disease is 1 - negative predictive value.
The accuracy of a diagnostic test is the probability that an individual is correctly classified by the
test: Accuracy = ( True positives + True negatives ) / Total number of individuals tested.
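All of these metrics follow from a single 2x2 table. A sketch with two hypothetical populations sharing the same sensitivity and specificity (0.9 each) but different prevalences shows why the predictive values shift:

```python
def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, and accuracy from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# 1000 people, 10% prevalence: 100 diseased, 900 healthy.
low_prev = metrics(tp=90, fp=90, fn=10, tn=810)
# 1000 people, 50% prevalence: 500 diseased, 500 healthy.
high_prev = metrics(tp=450, fp=50, fn=50, tn=450)

print(low_prev["ppv"], high_prev["ppv"])   # PPV rises with prevalence
print(low_prev["npv"], high_prev["npv"])   # NPV falls with prevalence
```

Sensitivity and specificity stay at 0.9 in both scenarios; only the predictive values move with prevalence.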
In a clinical setting, the pretest probability of disease is equal to the prevalence of disease in the
population of interest; it is used to calculate the pretest odds of disease as follows: Pretest odds
of disease = Pretest probability / (1 - Pretest probability)
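The probability-odds conversion above, together with its inverse (a trivial sketch; function names are made up):

```python
def prob_to_odds(p):
    """Pretest odds = pretest probability / (1 - pretest probability)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds / (1 + odds)

# A 20% disease prevalence corresponds to pretest odds of 0.25 (i.e. 1:4).
print(prob_to_odds(0.20))
print(odds_to_prob(0.25))
```

Pretest odds are the quantity multiplied by a likelihood ratio to obtain posttest odds, which are then converted back to a posttest probability.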
E. Advanced Topics
Systematic Reviews
Meta-Analysis Techniques
Causality in Epidemiology
A. Levels of Prevention
Primary Prevention
Secondary Prevention
Tertiary Prevention
B. Health Outcomes
C. Patient Safety
- Atherosclerosis
- Cervical cancer
- Genetic inheritance
- Diabetes mellitus
- Dilated cardiomyopathy