Biostatistics World
General principles
Miscellaneous (Multisystem)
Pediatrics
Behavioral science
Human development
Human sexuality
Defense mechanism
Changing patient’s behavior
Ethical and legal issues
Normal sleep and sleep disorders
Substance abuse
Psychiatric diagnosis and related treatment
Adverse event
Bias
Causality
Cohort studies
o Unknown residual confounders may impact the statistical analysis of study results.
Correlation coefficient
Cross-sectional study
Hazard ratio
Hypothesis testing
Likelihood ratio
Measures of association
Meta-analysis
Odds ratio
Patient safety
Prevention levels
Regression techniques
Risk
Statistical tests
The paired t-test compares the means of 2 related (i.e. matched, paired) groups. The test requires that a quantitative dependent variable (i.e. outcome) be evaluated in the 2 related groups.
The independent-samples (two-sample) t-test compares the means of 2 independent groups. The test requires that a quantitative dependent variable (i.e. outcome) be evaluated in 2 independent groups.
The analysis of variance (ANOVA) test compares the means of ≥ 3 groups. The test requires a quantitative dependent variable (e.g. outcome) and a categorical independent variable (e.g. exposure, risk factor).
ANOVA compares the means of independent groups and determines only whether at least 2 group means differ, not which ones. Nonoverlapping confidence intervals always imply a statistically significant difference between groups; however, the opposite is not necessarily true.
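As a concrete sketch of the paired t-test described above, the statistic is simply the mean of the pairwise differences divided by its standard error. The blood-pressure values below are hypothetical, purely for illustration:

```python
import math

def paired_t(before, after):
    """Paired t statistic: mean pairwise difference / its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# Hypothetical systolic blood pressures in 5 patients before/after a drug.
before = [150, 142, 138, 160, 155]
after = [144, 139, 135, 151, 153]
print(round(paired_t(before, after), 2))  # -3.57 (negative: pressures fell)
```

In practice this statistic would be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.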
An association between a risk factor and an outcome is more likely to be causal if its strength
increases as the exposure level increases (i.e. dose-response relationship or biological gradient).
Although a dose-response relationship is a highly important factor of causality, it is not
necessary to infer causation.
Know how to interpret the strength of association and dose-response relationship from a study.
Study designs
The Kaplan-Meier survival curve is a commonly used graphic representation of the probabilities that subjects in a study survive without an event of interest. The event-free survival rates of ≥ 2 study groups can be compared; they are considered statistically significantly different only if the p-value is below the chosen significance level (conventionally 0.05).
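The product-limit idea behind the Kaplan-Meier curve can be sketched in a few lines: at each event time, the running survival probability is multiplied by (1 - events / number still at risk). The event and censoring times below are invented for illustration:

```python
def kaplan_meier(event_times, censor_times):
    """Kaplan-Meier estimate: S(t) = product over event times t_i <= t of
    (1 - d_i / n_i), where d_i = events at t_i, n_i = subjects still at risk."""
    curve, surv = [], 1.0
    for t in sorted(set(event_times)):
        d = event_times.count(t)
        # At risk: everyone who has not had the event or been censored before t.
        n = (sum(1 for e in event_times if e >= t)
             + sum(1 for c in censor_times if c >= t))
        surv *= 1 - d / n
        curve.append((t, surv))
    return curve

# Hypothetical study: events at months 2, 4, 4, 7; 2 subjects censored (5, 8).
print(kaplan_meier([2, 4, 4, 7], [5, 8]))
```

Each tuple is (time, estimated event-free survival probability); the curve steps down only at event times, while censored subjects simply leave the risk set.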
Scatter plots are useful for crude analysis of data. They can demonstrate the type of association
(i.e. linear, non-linear), if any is present.
When comparing the effects of a treatment on a composite outcome, it is important to note any
differences between the individual endpoints.
The applicability of study results beyond the specific population studied depends on whether particular patients are similar to those studied. Compliance with treatment can be an important factor.
A. Types of Studies
Abstract
Objectives: This review aims to explore various study designs in biostatistics, emphasizing their
methodologies, objectives, strengths, limitations, and applications in clinical and public health
research. Specifically, it focuses on randomized controlled trials (RCTs), cohort studies, case-
control studies, cross-sectional studies, case series, systematic reviews, meta-analyses, ecological
studies, and vaccine efficacy studies.
Methods: Each study design is critically examined for its unique approach to data collection and
analysis, the temporal relationship between exposures and outcomes, and its capacity to infer
causality or associations. Common measures of association, including relative risk (RR) and
odds ratio (OR), are discussed in relation to each design.
Results:
Conclusions: Different study designs serve distinct roles in biostatistics, with RCTs providing
the strongest evidence for causal relationships, while observational studies are indispensable for
understanding associations, generating hypotheses, and informing public health strategies.
Researchers must carefully choose the appropriate design based on the research question, ethical
considerations, and feasibility constraints.
Experimental studies (e.g. randomized controlled trials) can help establish causal relationships, whereas observational studies (e.g. case series) only suggest associations. Results of
observational studies do not provide enough scientific evidence to make clinical decisions (e.g.
treating patients with new drugs or therapies).
Abstract
Objectives: This review explores key concepts in study design, focusing on the principles of
epidemiology, selection of study populations, blinding, randomization, intention-to-treat versus
per-protocol analyses, validity, and reliability. It aims to provide a comprehensive understanding
of methodological foundations critical for conducting and interpreting clinical and
epidemiological research.
Methods: Each concept is examined for its role in ensuring scientific rigor, minimizing bias, and
improving the applicability of study results. The advantages and limitations of different
approaches are discussed, with examples from clinical and epidemiological contexts.
Results:
Principles of Epidemiology
Selection of Study Population
Blinding and Randomization
o Blinding is the concealment of intervention assignments from individuals
involved in a clinical research study. The optimal strategy to maximize unbiased
ascertainment of outcomes is to blind as many individuals as possible in a clinical
study, especially those collecting and analyzing the data.
Intention-to-Treat vs. Per-Protocol Analysis
o Per-protocol analysis includes only those participants who strictly adhered to and completed the study protocol (i.e. nondropouts). This approach tends to provide an estimate of the true effect of an intervention for a perfect scenario, but it can overestimate the effect of the intervention in a practical clinical setting.
Validity (Internal and External)
o External validity (or generalizability) is the applicability of study results beyond the specific population studied (e.g. the results of a study in middle-aged women would not be expected to apply to elderly men). Results from a randomized controlled trial conducted in a very specific population of patients are generalizable only to that specific subgroup of the population.
o Internal validity refers to the ability of a research design to provide evidence of a
causal relationship between a treatment and an outcome. Blinding increases the
internal validity of a study.
Reliability and Reproducibility
o Precision is the measure of random error. The tighter the confidence interval, the
more precise the result. Increasing the sample size increases precision.
o A precise tool is one that consistently provides a very similar or identical value
when measuring a fixed quantity. An accurate tool is one that provides a
measurement similar to the actual value or within a specified range of a reference
value (as reflected by a gold-standard measurement).
Abstract
Objectives: This study examines the types and impacts of biases and errors in research,
emphasizing strategies to mitigate their effects on validity and reliability. Key areas include
selection bias, information bias, confounding variables, effect modification, and other specific
forms such as reporting and observer bias.
Methods: A detailed exploration of each bias type was conducted, analyzing its mechanisms,
implications for study results, and mitigation strategies. Real-world examples and clinical
scenarios are included to illustrate concepts, alongside evidence-based practices for minimizing
these biases during study design and analysis.
Results:
Selection Bias: Subtypes such as volunteer, susceptibility, and attrition bias can threaten
internal and external validity. Randomization and intention-to-treat analyses are critical
for minimizing these biases.
Information Bias: Includes errors in data collection or reporting, often leading to
inaccurate study outcomes. Observer and response biases are notable examples.
Confounding Variables: Confounding occurs when external variables obscure true
associations. Strategies such as randomization, matching, and stratified analysis are
effective in controlling confounding.
Effect Modification: This occurs when a third variable alters the strength or direction of
an independent variable's effect on an outcome. Stratified analyses help distinguish
between effect modification and confounding.
Specific Bias Types:
o Reporting and Hawthorne biases affect self-reported and observational data.
o Lead-time bias in screening tests can overstate survival benefits without altering
disease prognosis.
o Berkson bias highlights the challenges of using hospitalized controls, limiting
generalizability.
o Latency periods demonstrate the delayed effects of exposure or pathogenesis on
outcomes.
Conclusions: Awareness and management of biases are critical for ensuring the validity and
reliability of study findings. Employing robust study designs, blinding, randomization, and
rigorous data analysis are essential to minimize bias and error. Researchers must also consider
the natural history of diseases and the potential variability in treatment outcomes based on
population characteristics.
Keywords: bias, selection bias, information bias, confounding variables, effect modification,
reporting bias, observer bias, lead-time bias, study validity, research methodology.
Know the different kinds of bias, which can decrease the validity of study results.
Consider the natural history of a disease when evaluating the effectiveness of a drug in a
trial.
Given different disease manifestations, the same treatment may lead to significantly
different clinical outcomes.
Selection Bias
o Differential loss to follow-up is a subtype of selection bias and represents a threat
to the internal validity of a study.
o Volunteer bias (a subtype of selection bias) results when a sample of volunteers
(e.g. nonrandom sample method) misrepresents its target population and threatens
the generalizability (i.e. external validity) and utility of findings in a clinical
setting.
o When the treatment regimen selected for a patient depends on the severity of the
patient’s condition, a form of selection bias known as susceptibility bias
(confounding by indication) can result. To avoid selection bias in studies, patients
are randomly assigned to treatments to minimize potential confounding variables.
Many studies also perform an intention-to-treat analysis, which compares the
initial randomized treatment groups (the original intention) regardless of the
eventual treatment.
o Loss to follow-up in prospective studies creates a potential for attrition bias, a
subtype of selection bias. When a substantial number of subjects are lost to
follow-up, the study may overestimate or underestimate the association between
the exposure and the disease. Investigators try to achieve high rates of follow-up
to reduce the potential for attrition bias.
o Nonrandom treatment assignment may lead to selection bias because study
participants may have an unequal chance of receiving the treatments of interest.
Information Bias
Confounding Variables
o Confounding bias occurs when an extraneous (i.e. confounding) variable masks the relationship between an independent variable (e.g. risk factor, treatment) and a dependent variable (i.e. outcome, disease of interest). It may lead to false conclusions about the association (i.e. statistically significant results that are truly invalid). Failure to adjust for potential confounding factors may lead to residual confounding and a distortion of the true relationship between variables.
o Matching is frequently used in case-control studies because it is an efficient
method to control confounding. Remember: matching variables should always be
the potential confounders of the study (e.g., age, race). Cases and controls are then
selected based on the matching variables, such that both groups have a similar
distribution in accordance with the variables.
o A confounder is an extraneous factor that has properties linking it with both the
exposure and the outcome of interest.
o Randomization is used to control for confounders during the design stage of a
study. It helps to control for known, unknown, and difficult-to-measure
confounders.
o Know the concept of confounding. Distinguish between crude and adjusted
measures of association. Confounding refers to the bias that can result when the
exposure-disease relationship is mixed with the effect of extraneous factors (i.e.,
confounders).
Effect Modification
o Effect modification occurs when the magnitude or direction of the effect of the
independent variable on the dependent variable (outcome) varies depending on
the level of a third variable (effect modifier). Separate (stratified) analyses should
be conducted for each level of the effect modifier.
o Effect modification results when an external variable positively or negatively
impacts the effect of a risk factor on the disease of interest. Stratified analysis
helps determine whether a variable is a confounder or an effect modifier.
Reporting bias may occur if subjects over- or under-report exposure history due to
perceived social stigmatization.
Observer bias
o Observer bias occurs when the investigator's assessment is influenced by knowledge of the exposure status.
o It also occurs when observers (e.g. researchers, physicians) misclassify data (e.g. treatment outcomes) due to preconceived expectations regarding the treatment.
The Hawthorne effect is the tendency of study subjects to change their behavior, and thereby the outcome, because they are aware that they are being studied.
Response bias occurs when participants purposely give desirable responses to questions
about topics perceived to be sensitive (e.g. health behaviors). This practice results in
responses that are inaccurate and may lead to incorrect conclusions (e.g. lower than
expected prevalence of disease or frequency of risk factors).
Understand the concept of lead-time bias in screening tests.
o The typical example of lead-time bias is prolongation of apparent survival in
patients to whom a test is applied, without changing the prognosis of the disease.
The concept of a latency period can be applied to both disease pathogenesis and
exposure to risk modifiers. Exposure to risk factors and the initial steps in disease
pathogenesis sometimes occur years before clinical manifestations are evident. In
addition, exposure to risk modifiers may need to be continuous over a certain period
before influencing the outcome.
Berkson bias occurs when controls are chosen from among hospitalized patients only,
resulting in a potential bias (e.g. hospitalized controls may have an exposure related to
the outcome of interest) and limiting generalizability.
A. Epidemiological Measures
Abstract
Objectives: This study outlines key epidemiological measures, including risk, rate, and odds
ratio, used to quantify disease frequency and assess associations between exposures and
outcomes. Additionally, concepts like attributable risk and risk reduction are explored to enhance
understanding of clinical trial and observational study results.
Methods: Fundamental principles and formulas for epidemiological metrics are detailed, with
explanations of their applications in research settings. Statistical significance is evaluated
through confidence intervals and adjusted measures to control for confounding variables.
Results:
Conclusions: Epidemiological measures provide essential tools for assessing disease risk,
evaluating interventions, and interpreting study outcomes. Understanding and applying these
measures are critical for evidence-based decision-making in clinical and public health research.
Keywords: epidemiology, risk, relative risk, absolute risk reduction, relative risk reduction,
incidence, prevalence, odds ratio, attributable risk, statistical significance.
Risk is the probability of getting a disease over a certain period of time. To calculate the risk, divide the number of diseased subjects by the total number of subjects at risk.
A relative risk (RR) < 1.0 indicates a lower risk for the outcome in the treatment group; RR = 1.0 indicates no association between treatment and outcome; and RR > 1.0 indicates a greater risk for the outcome in the treatment group. A confidence interval (CI) that includes the null value (i.e. RR = 1.0) indicates that the association is not statistically significant, whereas a CI that excludes the null value indicates statistical significance. Adjusted RRs minimize confounding and provide better estimates of associations.
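The null-value rule for a relative-risk confidence interval reduces to a one-line check (a minimal sketch; the function name is made up):

```python
def rr_ci_significant(ci_low, ci_high):
    """An RR confidence interval indicates statistical significance only
    when it excludes the null value RR = 1.0."""
    return not (ci_low <= 1.0 <= ci_high)

print(rr_ci_significant(0.6, 0.9))  # True: CI excludes 1.0 (protective effect)
print(rr_ci_significant(0.8, 1.3))  # False: CI contains 1.0 (no association shown)
```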
Absolute risk reduction (ARR) describes the difference in the rate (or risk) of an unfavorable
outcome between control and treatment groups. It is calculated as follows: ARR = (Rate[control]
- Rate [treatment]). The rate of an unfavorable outcome is equal to the number of unfavorable
outcomes in a group divided by the sample size of that group.
Relative risk reduction (RRR) measures how much a treatment reduces the risk (i.e. rate) of an unfavorable outcome. RRR may be calculated using the absolute risk reduction (ARR) or the relative risk (RR) as follows: RRR = ARR / Rate[control] = (Rate[control] - Rate[treatment]) / Rate[control]; RRR = 1 - RR.
When comparing the relative risk (RR) of the same outcome in 2 cohorts: RR = rate of outcome
in cohort 1 / rate of outcome of cohort 2.
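The risk, RR, ARR, and RRR formulas above fit in a few lines; the event counts below are hypothetical:

```python
def risk(events, n_at_risk):
    """Risk = number of diseased subjects / total subjects at risk."""
    return events / n_at_risk

# Hypothetical 2-arm trial: 20/200 events in controls, 12/200 on treatment.
r_control = risk(20, 200)       # 0.10
r_treatment = risk(12, 200)     # 0.06

rr = r_treatment / r_control    # relative risk, ~0.6
arr = r_control - r_treatment   # absolute risk reduction, ~0.04
rrr = arr / r_control           # relative risk reduction, ~0.4 (= 1 - RR)
print(rr, arr, rrr)
```

Note that the two RRR formulas agree: ARR / Rate[control] equals 1 - RR by algebra.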
B. Patient-Oriented Metrics
A. Statistical Measures
A regression analysis is a statistical technique used to describe the effect of ≥ 1 independent (i.e. explanatory) variables, which may be quantitative or qualitative, on 1 quantitative dependent (i.e. outcome) variable.
Factorial design studies randomize participants to combinations of 2 or more interventions, allowing the effects of multiple variables to be studied simultaneously.
The correlation coefficient (r) describes the direction (i.e. positive, negative) and strength (values closer to -1 or 1 indicate stronger linear relationships) of the linear relationship between 2 quantitative variables. It does not necessarily imply causality.
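A direct translation of the Pearson correlation coefficient (the data points are invented):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r between 2 quantitative variables:
    covariance term divided by the product of the deviation norms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A perfectly linear relationship gives r = 1 (or r = -1 if decreasing).
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))
print(pearson_r([1, 2, 3, 4], [9, 7, 5, 3]))
```

Even an r near ±1 only describes the linear association; it says nothing about causation.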
More sensitive diagnostic tests have lower false negative rates than less sensitive tests. More
specific diagnostic tests have lower false positive rates than less specific tests.
Unlike sensitivity and specificity, positive and negative predictive values depend on the
prevalence of disease in the population being tested. A change in a test cutoff point that causes
an increase in the number of false positives and true positives will decrease positive predictive
value.
Raising the cutoff point of a screening test (i.e. requiring a higher value for a positive result) increases specificity and decreases sensitivity.
Changing the cutoff point of a quantitative diagnostic test will inversely affect its sensitivity and
specificity. Typically, lowering the cutoff value will increase sensitivity (fewer false negatives)
and decrease specificity (more false positives). Screening tests need high sensitivity, and
confirmatory tests need high specificity.
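The cutoff trade-off can be seen numerically; the test values below are invented, with higher values more suggestive of disease:

```python
def sens_spec(diseased, healthy, cutoff):
    """Call values >= cutoff positive; return (sensitivity, specificity)."""
    tp = sum(1 for v in diseased if v >= cutoff)   # true positives
    tn = sum(1 for v in healthy if v < cutoff)     # true negatives
    return tp / len(diseased), tn / len(healthy)

diseased = [6, 7, 8, 9, 10, 12]   # hypothetical values in diseased patients
healthy = [2, 3, 4, 5, 6, 7]      # hypothetical values in healthy patients

print(sens_spec(diseased, healthy, 7))   # lower cutoff: sensitivity favored
print(sens_spec(diseased, healthy, 9))   # higher cutoff: specificity favored
```

Moving the cutoff from 7 to 9 here trades false positives for false negatives, exactly the inverse relationship described above.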
Changing the cutoff value of a test in a way that alters the proportion of true-positive and false-
negative results will change the sensitivity. Likewise, a change in the test that modifies the
proportion of false-positive and true-negative results will change the specificity. Alterations in
test sensitivity and specificity, as well as changes in disease prevalence, will affect the positive
and negative predictive values.
Predictive values change depending on the disease prevalence in a study population. As disease
prevalence increases, the positive predictive value increases and the negative predictive value
decreases, and vice versa.
Positive predictive value is the probability that an individual has a disease given a positive test
result. Negative predictive value is the probability that an individual does not have a disease
given a negative test result.
Negative predictive value (NPV) is the probability that an individual does not have a disease
given a negative test result. It is calculated as NPV = true negative / (true negative + false
negative).
NPV is the probability of being free of a disease if the test result is negative. Remember: the NPV varies with the pretest probability of disease. A test applied to a patient with a high pretest probability of having a disease will have a low NPV, and one applied to a patient with a low pretest probability will have a high NPV.
Both the positive predictive value (PPV) and negative predictive value (NPV) of a test depend on
the prevalence of the disease of interest in the population in which the test is applied. PPV
increases and NPV decreases with an increase in prevalence.
If a test result is negative, the probability of having the disease is 1 - negative predictive value.
The accuracy of a diagnostic test is the probability that an individual is correctly classified by the
test: Accuracy = ( True positives + True negatives ) / Total number of individuals tested.
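All of these metrics follow from a single 2x2 table. A sketch with two hypothetical populations sharing the same sensitivity and specificity (0.9 each) but different prevalences shows why the predictive values shift:

```python
def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, and accuracy from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# 1000 people, 10% prevalence: 100 diseased, 900 healthy.
low_prev = metrics(tp=90, fp=90, fn=10, tn=810)
# 1000 people, 50% prevalence: 500 diseased, 500 healthy.
high_prev = metrics(tp=450, fp=50, fn=50, tn=450)

print(low_prev["ppv"], high_prev["ppv"])   # PPV rises with prevalence
print(low_prev["npv"], high_prev["npv"])   # NPV falls with prevalence
```

Sensitivity and specificity stay at 0.9 in both scenarios; only the predictive values move with prevalence.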
In a clinical setting, the pretest probability of disease is equal to the prevalence of disease in the
population of interest; it is used to calculate the pretest odds of disease as follows: Pretest odds
of disease = Pretest probability / (1 - Pretest probability)
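The probability-odds conversion above, together with its inverse (a trivial sketch; function names are made up):

```python
def prob_to_odds(p):
    """Pretest odds = pretest probability / (1 - pretest probability)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds / (1 + odds)

# A 20% disease prevalence corresponds to pretest odds of 0.25 (i.e. 1:4).
print(prob_to_odds(0.20))
print(odds_to_prob(0.25))
```

Pretest odds are the quantity multiplied by a likelihood ratio to obtain posttest odds, which are then converted back to a posttest probability.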
E. Advanced Topics
Systematic Reviews
Meta-Analysis Techniques
Causality in Epidemiology
A. Levels of Prevention
Primary Prevention
Secondary Prevention
Tertiary Prevention
B. Health Outcomes
C. Patient Safety
- Atherosclerosis
- Cervical cancer
- Genetic inheritance
- Diabetes mellitus
- Dilated cardiomyopathy