Statistics

An ongoing charity (sadaqah jariyah) for me and my family; please do not forget us
in your good prayers
Youssef Maayouf
What is the Null Hypothesis?
• A null hypothesis states that two treatments are equally effective (and is hence
negatively phrased). In simple words, it means that the effect of the new
treatment/condition is zero ("null") and that it adds nothing for the patient
• A significance test uses the sample data to assess how compatible the data are
with the null hypothesis
What is the P value?
• The p value is the probability of obtaining a result at least as extreme as the
one that was actually observed, assuming that the null hypothesis is true.
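As a rough illustration of "at least as extreme", the sketch below computes a one-sided p value for a binomial outcome. The function name and the patient counts are hypothetical and chosen only for the example; real analyses would normally use a statistics package.

```python
from math import comb


def one_sided_binomial_p_value(successes, trials, p_null=0.5):
    """P(X >= successes) if the null hypothesis (success probability p_null) is true."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 9 of 10 patients improve; the null hypothesis says 50% would.
p_value = one_sided_binomial_p_value(9, 10)
print(round(p_value, 4))  # 0.0107, i.e. smaller than 0.05
```

Here the result (9 or 10 improvements out of 10) would occur in about 1% of studies if the null hypothesis were true.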
The null hypothesis is rejected if the p-value is
smaller than 5%, T/F?
• True (by convention, when the significance level is set at 5%)
What are the types of errors that may be
encountered when testing the Null Hypothesis?
• Two types of errors may occur when testing the null hypothesis: Type 1 and
Type 2
What is Type 1 Error (one of the two types that
are encountered when testing the Null
Hypothesis)?
• Type I: the null hypothesis is rejected when it is true (you say that there is
significance after the treatment or the added condition while there actually is
not), i.e. showing a difference between two groups when it doesn't exist. The
probability of a Type I error equals the significance level
What is Type 2 Error (one of the two types that
are encountered when testing the Null
Hypothesis)?
• Type II: the null hypothesis is accepted (not rejected) when it is false (you say
there is no significance after the treatment or the added condition while there is),
i.e. failing to spot a difference when one really exists
What is the power of a study?
• The power of a study is the probability of (correctly) rejecting the null hypothesis
when it is false (you confidently can say that there is effect when there is effect)
• power = 1 - the probability of a type II error
• power can be increased by increasing the sample size
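The link between power and sample size can be sketched for the simplest case, a one-sided one-sample z-test with a known standard deviation. That test, the function name and the effect size used are assumptions for illustration; the notes themselves do not specify a test.

```python
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Approximate power of a one-sided, one-sample z-test (assumed setting)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under the null
    shift = effect * n ** 0.5 / sd           # how far the true effect moves the test statistic
    return 1 - std_normal.cdf(z_crit - shift)


# Power = 1 - P(type II error); it rises as the sample size grows.
for n in (10, 20, 40, 80):
    print(n, round(z_test_power(effect=0.5, sd=1.0, n=n), 3))
```

With no true effect (effect = 0) the function returns alpha, which is the type I error rate rather than useful power.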

What are the types of Significance tests?
• The type of significance test used depends on whether the data is parametric
(something which can be measured, usually normally distributed) or non-parametric
What are the Parametric tests?
• Student's t-test - paired or unpaired
• Pearson's product-moment coefficient - correlation

What are the Non Parametric tests?
• Mann-Whitney - unpaired data

• Wilcoxon matched-pairs - compares two sets of observations on a single sample
• Chi-squared test - used to compare proportions or percentages
• Spearman, Kendall rank – correlation
• McNemar's test is used on nominal data to determine whether the row and
column marginal frequencies are equal
What is Paired and Unpaired data?
• Paired data refers to data obtained from a single group of patients, e.g.
measurement before and after an intervention.
• Unpaired data comes from two different groups of patients, e.g. comparing
response to different interventions in two groups
What is Meta-Analysis?
• In statistics, a meta-analysis refers to methods focused on contrasting and
combining results from different studies, in the hope of identifying patterns
among study results, sources of disagreement among those results, or other
interesting relationships that may come to light in the context of multiple studies
What is Funnel Plot?
• A funnel plot is a useful graph designed to check the existence of publication
bias in systematic reviews and meta-analyses.
• It assumes that the largest studies will be near the average, and small studies will
be spread on both sides of the average.
• Variation from this assumption can indicate publication bias.
• Funnel plots are usually drawn with treatment effects on the horizontal axis and
study size on the vertical axis.
What are the different interpretations of Funnel
Plot?
• A symmetrical, inverted funnel shape indicates that publication bias is unlikely
• Conversely, an asymmetrical funnel indicates a relationship between treatment
effect and study size.
• This indicates either publication bias or a systematic difference between smaller
and larger studies ('small study effects')
What is Central Limit Theorem (CLT)?
• Central Limit Theorem (CLT): the sampling distribution of the mean of random
samples tends to be normal as the sample size increases, irrespective of the
distribution of the population from which the samples were drawn.
• The mean of the random sampling distribution of means is equal to the mean of
the original population
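A small simulation can make the two statements above concrete. This is only a sketch: the exponential population (mean 1, strongly skewed), the sample size of 30 and the helper name are all assumptions chosen for the example.

```python
import random
from statistics import mean, stdev


def sample_means(sample_size, repeats, seed=0):
    """Draw repeated samples from a skewed (exponential, mean 1) population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        mean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(repeats)
    ]


means = sample_means(sample_size=30, repeats=2000)
print("mean of sample means:", round(mean(means), 3))     # close to the population mean, 1
print("spread of sample means:", round(stdev(means), 3))  # close to 1 / sqrt(30)
```

A histogram of `means` would look roughly bell-shaped even though the underlying population is far from normal.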
What is Confidence Interval?
• Confidence Interval (CI): describes the range of values around an estimate (e.g. a
mean or an odds ratio) within which the true population value is likely to lie.
• 95% CI: there is a 5% chance that the true mean value for the variable lies outside the
range. 95% CI = mean ± 1.96 x SE (Standard Error), often rounded to mean ± 2 x SE
• In a Normal Distribution, the range from the mean - (1.96 * SD) to the mean + (1.96 * SD)
contains about 95% of the individual observations, i.e. if a repeat sample of 100
observations is taken from the same group, 95 of them would be expected to lie in that
range. This range describes the spread of observations; the 95% confidence interval for
the mean is built from the SE, not the SD
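A minimal sketch of the mean ± 1.96 x SE calculation, assuming a normal approximation (small samples would usually use a t-distribution multiplier instead of 1.96). The function name and the measurements are hypothetical.

```python
from statistics import mean, stdev


def mean_ci_95(values):
    """Approximate 95% confidence interval for the mean: mean +/- 1.96 * SE."""
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / n ** 0.5   # SE = SD / sqrt(n)
    return m - 1.96 * standard_error, m + 1.96 * standard_error


# Hypothetical measurements from one sample of eight patients
lower, upper = mean_ci_95([2, 4, 4, 4, 5, 5, 7, 9])
print(round(lower, 2), round(upper, 2))
```

Because the SE shrinks with the square root of the sample size, a larger sample gives a narrower interval around the same mean.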
In normal distribution Mean = Median = Mode,
T/F?
• True
What is the standard deviation?
• The standard deviation (SD) is a measure of how far, on average, each observation
in a sample lies from the sample mean
• SD = square root (variance)

What should you remember about Skewed
Distribution?
• In Normal distributions: mean = median = mode
• Positively skewed distribution: mean > median > mode
• Negatively skewed distribution: mean < median < mode
• To remember the above note how they are in alpha order, think positive going
forward with '>', whilst negative going backwards '<'
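The ordering can be checked on a small made-up data set. The values below are hypothetical and were picked so that a single large (or very negative) value creates the tail.

```python
from statistics import mean, median, mode

# Long tail to the right: positively skewed
positive_skew = [1, 2, 2, 3, 4, 5, 11]
assert mean(positive_skew) > median(positive_skew) > mode(positive_skew)

# Mirror image, long tail to the left: negatively skewed
negative_skew = [-x for x in positive_skew]
assert mean(negative_skew) < median(negative_skew) < mode(negative_skew)

print(mean(positive_skew), median(positive_skew), mode(positive_skew))
```

The mean is pulled furthest towards the tail, the mode stays at the peak, and the median sits between them.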
Standard error of the mean = standard
deviation / square root (number of patients),
T/F?
• True
What is the Standard Error of the Mean?
• The Standard Error of the mean (SEM) is a measure of the spread expected for the
mean of the observations, i.e. how 'accurate' the calculated sample mean is as an
estimate of the true population mean
• SEM = S / square root (n)

• Where S = standard deviation and n = sample size
• Therefore the SEM gets smaller as the sample size (n) increases
What is the Relative Risk?
• Relative Risk (RR) is the ratio of risk in the experimental group (experimental
event rate, EER) to risk in the control group (control event rate, CER)
• Simply: RR = EER / CER
• CER = rate at which events occur in the control group
• EER = rate at which events occur in the experimental group

Control event rate = (Number who had particular outcome with the control) /
(Total number who had the control), T/F?
• True
Experimental event rate = (Number who had particular
outcome with the intervention) / (Total number who had
the intervention), T/F?
• True
What is Absolute Risk Reduction?
• Absolute risk reduction = (Control event rate) - (Experimental event rate)

How do we calculate Relative Risk Reduction
(RRR)?
• Relative risk reduction (RRR) is calculated by dividing the absolute risk reduction
by the control event rate
• RRR = (CER - EER) / CER

What is the Hazard Ratio?
• The Hazard Ratio (HR) is similar to relative risk but is used when risk is not
constant with respect to time. It is typically used when analysing survival over time
What is the equation for calculating Number
needed to treat (NNT)?
• NNT = 1 / (CER - EER), or 1 / Absolute Risk Reduction

What is Number needed to treat (NNT)?
• Number needed to treat (NNT) is a measure that indicates how many patients
would require an intervention to decrease the expected number of outcomes by 1.
• It is rounded up to the next whole number
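The risk formulas above (CER, EER, RR, ARR, RRR and NNT) can be worked through together. The sketch below uses a hypothetical function name and made-up trial counts; exact fractions are used so that rounding the NNT up is not disturbed by floating-point error.

```python
from fractions import Fraction
from math import ceil


def risk_measures(control_events, control_total, treated_events, treated_total):
    """Return CER, EER, RR, ARR, RRR and NNT from event counts in two groups."""
    cer = Fraction(control_events, control_total)   # control event rate
    eer = Fraction(treated_events, treated_total)   # experimental event rate
    arr = cer - eer                                 # absolute risk reduction
    if arr <= 0:
        raise ValueError("NNT is only defined here when the treatment lowers the event rate")
    return {
        "CER": float(cer),
        "EER": float(eer),
        "RR": float(eer / cer),
        "ARR": float(arr),
        "RRR": float(arr / cer),
        "NNT": ceil(1 / arr),   # rounded up to the next whole number
    }


# Hypothetical trial: 30/100 events with control, 20/100 with the intervention
print(risk_measures(30, 100, 20, 100))
```

In this example ARR is 0.10, so 10 patients would need the intervention to prevent one outcome, while RR is about 0.67 and RRR about 0.33.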

What is Odds Ratio?
• Odds Ratio may be defined as the ratio of the odds of a particular outcome with
experimental treatment to the odds of that outcome with control
• Odds: remember, this is the ratio of the number of people who incur a particular
outcome to the number of people who do not incur the outcome, NOT TO THE
TOTAL NUMBER OF PEOPLE
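To show how odds differ from risk, here is a short sketch with hypothetical counts (20 of 100 patients with the outcome on treatment, 30 of 100 on control). The function name is an assumption for the example.

```python
def odds_ratio(treated_events, treated_non_events, control_events, control_non_events):
    """Odds ratio = (odds of the outcome with treatment) / (odds with control)."""
    treated_odds = treated_events / treated_non_events   # events vs non-events, not vs total
    control_odds = control_events / control_non_events
    return treated_odds / control_odds


# 20 events and 80 non-events on treatment; 30 events and 70 non-events on control
print(round(odds_ratio(20, 80, 30, 70), 3))  # 0.583
```

For the same counts the relative risk would be 0.20 / 0.30, about 0.67, so the two measures are not interchangeable unless the outcome is rare.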
What is Pre-test probability (= Prevalence of a condition)?
• The proportion of people with the target disorder in the population at risk at a
specific time (point prevalence) or time interval (period prevalence)
• For example, the prevalence of rheumatoid arthritis in the UK is 1%

What is Post-test probability?
• The proportion of patients with that particular test result who have the target
disorder
• Post-test probability = post-test odds / (1 + post-test odds)

What is Pre-test odds?
• The odds that the patient has the target disorder before the test is carried out
• Pre-test odds = pre-test probability / (1 - pre-test probability)

What is Post-test odds?
• The odds that the patient has the target disorder after the test is carried out
• Post-test odds = pre-test odds x likelihood ratio
• Where the likelihood ratio for a positive test result = sensitivity / (1 - specificity)
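The chain from pre-test probability to post-test probability can be sketched as a single function. The function name and the test characteristics (90% sensitivity, 90% specificity) are hypothetical; the 1% pre-test probability echoes the prevalence example given earlier in these notes.

```python
def post_test_probability(pre_test_probability, sensitivity, specificity, positive_result=True):
    """Convert a pre-test probability into a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    if positive_result:
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        likelihood_ratio = (1 - sensitivity) / specificity
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pre-test probability 1%, positive result on a test with 90% sensitivity and specificity
print(round(post_test_probability(0.01, 0.90, 0.90), 3))  # 0.083
```

Even with a reasonably good test, a positive result here raises the probability of disease only from 1% to about 8%, because the condition is rare.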
What is Incidence and Prevalence?
• These two terms are used to describe the frequency of a condition in a
population.
What is Incidence?
• The incidence is the number of new cases per population in a given time period
• For example, if condition X has caused 40 new cases over the past 12 months per
1,000 of the population the annual incidence is 0.04 or 4%.
What is the Prevalence?
• The prevalence is the total number of cases per population at a particular point in
time
• For example, imagine a questionnaire is sent to 2,500 adults asking them how
much they weigh.
• If 500 of the adults in this sample were obese, then the prevalence of obesity
would be 0.2 or 20%.
What is the relationship between incidence and
prevalence?
• prevalence = incidence * duration of condition
• In chronic diseases the prevalence is much greater than the incidence
• In acute diseases the prevalence and incidence are similar.
• For conditions such as the common cold the incidence may be greater than the
prevalence
Can you give a story to elaborate true positive, true negative, false positive and
false negative?
• Imagine a scenario where people are tested for a disease. The test outcome can be positive (sick)
or negative (healthy), while the actual health status of the persons may be different. In that
setting:
• True positive: Sick people correctly diagnosed as sick
• False positive: Healthy people wrongly identified as sick
• True negative: Healthy people correctly identified as healthy
• False negative: Sick people wrongly identified as healthy
Can you give an equation for sensitivity?
• Sensitivity = Number of true positives / (Number of true positives + Number of
false negatives)
What is a test with 100% sensitivity?
• A sensitivity of 100% means that the test recognizes all sick people as such. Thus
in a high sensitivity test, a negative result is used to rule out the disease.
If sensitivity is used to evaluate the effectiveness
of the test to diagnose positive cases, what is the
parameter used to diagnose negative cases?
• Sensitivity alone does not tell us how well the test predicts other classes (that is,
the negative cases).
• In binary classification, as illustrated above, the corresponding measure is the
specificity, or equivalently, the sensitivity for the other class.
What is the difference between sensitivity and
positive predictive value?
• Sensitivity is not the same as the positive predictive value (ratio of true positives
to combined true and false positives), which is as much a statement about the
proportion of actual positives in the population being tested as it is about the
test.
When you're doing a test over a number of samples to
determine the sensitivity of the test, what would you do about
samples that give indeterminate results (neither positive nor
negative)?
• The calculation of sensitivity does not take into account indeterminate test
results.
• If a test cannot be repeated, the options are to exclude indeterminate samples
from analysis (but the number of exclusions should be stated when quoting
sensitivity), or, alternatively, indeterminate samples can be treated as false
negatives (which gives the worst-case value for sensitivity and may therefore
underestimate it).
What is the equation for specificity?
• Specificity = Number of true negatives / (Number of true negatives + Number of
false positives)
What is the meaning of Specificity of 100%?
• A specificity of 100% means that the test recognizes all healthy people as healthy.
Thus a positive result in a high specificity test is used to confirm the disease.
• The maximum is trivially achieved by a test that claims everybody healthy
regardless of the true condition.
• Therefore, the specificity alone does not tell us how well the test recognizes
positive cases.
• We also need to know the sensitivity of the test to the class, or equivalently, the
specificities to the other classes.
A test with high specificity has a low Type I error
rate, T/F?
• True
What is the difference between specificity and
Precision?
• Specificity is sometimes confused with the precision or the positive predictive
value, both of which refer to the fraction of returned positives that are true
positives.
• The distinction is critical when the classes are different sizes. A test with very high
specificity can have very low precision if there are far more true negatives than
true positives, and vice versa.
Increasing the cut-off of a positive test result
will decrease the number of false positives and
hence increase the specificity, T/F?
• True (assuming that higher test values indicate disease)
What is Sensitivity?
• Sensitivity = TP / (TP + FN)
• the percentage of sick patients that the test correctly identifies

What is Specificity?
• Specificity = TN / (TN + FP)
• the percentage of healthy patients that the test correctly identifies

What is Positive predictive value?
• Positive predictive value = TP / (TP + FP)
• the proportion of test-positive samples that are actually sick

What is Negative predictive value?
• Negative predictive value = TN / (TN + FN)
• the proportion of test-negative samples that are actually healthy

What is the Likelihood ratio for a positive test
result?
• Likelihood ratio for a positive test result = sensitivity / (1- specificity)

What is the Likelihood ratio for a negative test
result?
• Likelihood ratio for a negative test result = (1- sensitivity) /specificity

Positive and negative predictive values are not prevalence
dependent. Likelihood ratios are prevalence dependent, T/F?
• False
• Positive and negative predictive values are prevalence dependent.
• Likelihood ratios are not prevalence dependent
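The prevalence dependence can be seen by applying the same test to two hypothetical populations of 1,000 people. The function name and the counts are assumptions for illustration; in both populations the test has 90% sensitivity and 80% specificity.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, predictive values and likelihood ratios from a 2x2 table.
    Assumes specificity is below 100% so that LR+ is defined."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Same test, two populations of 1,000 people
low_prevalence = diagnostic_metrics(tp=90, fp=180, tn=720, fn=10)     # 10% have the disease
high_prevalence = diagnostic_metrics(tp=450, fp=100, tn=400, fn=50)   # 50% have the disease

print(round(low_prevalence["ppv"], 2), round(high_prevalence["ppv"], 2))  # 0.33 0.82
print(round(low_prevalence["lr_positive"], 2), round(high_prevalence["lr_positive"], 2))  # 4.5 4.5
```

The positive predictive value rises from about 33% to about 82% as the disease becomes more common, while the likelihood ratio stays at 4.5.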

What is The correlation coefficient?
• The correlation coefficient (sometimes referred to as Pearson's product-moment
coefficient) indicates how closely the points lie to a line drawn through the
plotted data.
• It is denoted by the value R which may lie anywhere between -1 and 1.

What is the difference between the correlation
coefficient and Linear regression?
• Whilst correlation coefficients give information about how one variable may
increase or decrease as another variable increases, they do not give information
about how much the variable will change.
• They also do not provide information on cause and effect
• In contrast to the correlation coefficient, linear regression may be used to predict
how much one variable changes when a second variable is changed.
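The difference can be illustrated with two made-up data sets that have the same correlation coefficient but very different regression slopes. The function below is a plain least-squares sketch with a hypothetical name, not a reference implementation.

```python
from statistics import mean


def pearson_r_and_line(xs, ys):
    """Return (r, slope, intercept) for a simple least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / (sxx * syy) ** 0.5        # how closely the points follow a line
    slope = sxy / sxx                   # how much y changes per unit change in x
    intercept = my - slope * mx
    return r, slope, intercept


dose = [1, 2, 3, 4, 5]
print(pearson_r_and_line(dose, [2, 4, 6, 8, 10]))      # r = 1.0, slope = 2.0
print(pearson_r_and_line(dose, [20, 40, 60, 80, 100])) # r = 1.0, slope = 20.0
```

Both data sets are perfectly correlated, yet only the regression slope shows that the second response changes ten times as much per unit of dose.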
What is Randomized Controlled Trial (RCT)?
• Randomized Controlled Trial (RCT) involves the random allocation of different
interventions (treatments or conditions) to subjects.
• As long as the numbers of subjects are sufficient, randomization is an effective
method for balancing confounding factors between treatment groups
• Randomized treatment prevents systematic differences between treatment groups

What is a Cohort Study?
• A Cohort Study follows a group of people who share a common characteristic or experience
within a defined period (e.g., are born, leave school, lose their job, are exposed to a drug or a
vaccine, etc.).
• Thus a group of people who were born on a day or in a particular period, say 1948, form a birth
cohort.
• The comparison group may be the general population from which the cohort is drawn, or it may
be another cohort of persons thought to have had little or no exposure to the substance under
investigation, but otherwise similar.
• Alternatively, subgroups within the cohort may be compared with each other
What is a Case-Control study?
• Case-Control is a type of epidemiological study design.
• Case-control studies are used to identify factors that may contribute to a medical
condition by comparing subjects who have that condition (the 'cases') with
patients who do not have the condition but are otherwise similar (the 'controls')
What are Cross-Sectional Studies?
• Cross-Sectional Studies (also known as Cross-sectional analysis) form a class of
research methods that involve observation of some subset of a population of
items all at the same time, in which groups can be compared at different ages
with respect to independent variables, such as IQ and memory.
• Cross-sectional studies are used in most branches of science, in the social
sciences and in other fields as well.
• Cross-sectional research takes a 'slice' of its target group and bases its overall
finding on the views or behaviours of those targeted, assuming them to be typical
of the whole group.
What is difference between Cross-sectional and
Longitudinal studies?
• The fundamental difference between cross-sectional and longitudinal studies is
that cross-sectional studies take place at a single point in time, whereas a
longitudinal study involves a series of measurements taken over a period of time.
Both are a type of observational study.
What are the Ia, Ib and IIa levels of evidence?
• Ia - evidence from meta-analysis of randomised controlled trials
• Ib - evidence from at least one randomised controlled trial
• IIa - evidence from at least one well designed controlled trial which is not
randomised
What are the IIb, III and IV levels of evidence?
• IIb - evidence from at least one well designed experimental trial
• III - evidence from case, correlation and comparative studies
• IV - evidence from a panel of experts

Review the last page of statistics (you didn't
complete it yet)
• It's about the study design: superiority, equivalence and non-inferiority.
Superiority needs a large number of patients
• Equivalence: the confidence interval lies between -delta and +delta
• Non-inferiority: the confidence interval must lie entirely above -delta
• Drug companies often aim for non-inferiority and then compete on price
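The delta rules above can be sketched as a simple check on a confidence interval. Several things here are assumptions, not taken from these notes: the interval is for the difference (new treatment minus control) with higher values favouring the new treatment, superiority is taken to mean the whole interval lies above zero, and the function name is hypothetical.

```python
def assess_margin(ci_lower, ci_upper, delta):
    """Compare a confidence interval for (new - control) with the margin delta (> 0)."""
    return {
        "superior": ci_lower > 0,                               # assumed definition
        "non_inferior": ci_lower > -delta,                      # interval entirely above -delta
        "equivalent": ci_lower > -delta and ci_upper < delta,   # interval within (-delta, +delta)
    }


# Hypothetical trial result: difference CI from -1.0 to 1.5, margin delta = 2.0
print(assess_margin(-1.0, 1.5, delta=2.0))
```

In this example the new treatment would count as non-inferior and equivalent, but not superior.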
