Lecture 7

This document discusses statistical significance and receiver-operator characteristic (ROC) curves. It defines key terms like p-value, sensitivity, specificity, and area under the ROC curve (AROC). A good diagnostic test is represented by an ROC curve that climbs rapidly towards the upper left corner, indicating low false positive and false negative rates across cut-off values. The AROC quantifies test accuracy, with higher values indicating better discrimination. An AROC of 0.5 represents random chance, while values above 0.7 indicate minimum acceptable accuracy.

Uploaded by

Monrell Tai Baba

Evaluation Measurements

1
Statistical significance

• In statistics, a result is significant if it would be unlikely to occur
by chance alone when a presumed null hypothesis is true.
• More precisely, in traditional frequentist statistical hypothesis testing,
the significance level of a test is the maximum probability of
rejecting a true null hypothesis in favour of the alternative
hypothesis (a decision known as a Type I error).
• The significance of a result is commonly reported as its p-value; the
smaller the p-value, the more significant the result is said to be.

2
Statistical significance

• For example, one may choose a significance level of, say, 5%, and
calculate a critical value of a statistic (such as the mean) so that the
probability of it exceeding that value, given the truth of the
null hypothesis, would be 5%. If the actual, calculated statistic value
exceeds the critical value, then it is significant "at the 5% level".
Symbolically speaking, the significance level is denoted by α.
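
As a rough illustration of the critical-value idea above, the sketch below assumes the test statistic follows a standard normal distribution under the null hypothesis and that the test is one-sided (upper tail). Both are assumptions made for the example; the function names are hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha: float) -> float:
    """Upper-tail critical value of a standard normal test statistic.

    Assumption: under the null hypothesis the statistic is N(0, 1), so the
    probability of exceeding the returned value is exactly alpha.
    """
    return NormalDist().inv_cdf(1.0 - alpha)


def is_significant(z: float, alpha: float = 0.05) -> bool:
    """True if the observed statistic exceeds the critical value."""
    return z > critical_value(alpha)


crit_5 = critical_value(0.05)
print(round(crit_5, 3))        # about 1.645
print(is_significant(2.0))     # True: significant "at the 5% level"
print(is_significant(1.0))     # False
```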

3
Statistical significance

• If the significance level is smaller, a value will be less likely to be
more extreme than the critical value. So a result which is "significant
at the 1% level" is more significant than a result which is "significant
at the 5% level". However, a test at the 1% level is more likely to
have a Type II error than a test at the 5% level, and so will have less
statistical power. In devising a hypothesis test, the tester will aim to
maximize power for a given significance, but ultimately have to
recognise that the best which can be achieved is likely to be a
balance between significance and power, in other words between
the risks of Type I and Type II errors. It is important to note that
Type I error is not necessarily any worse than a Type II error, and
vice versa. The severity of an error depends on each individual
case.
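
The trade-off between significance and power can be made concrete with a small sketch. It assumes a one-sided z-test whose statistic is N(0, 1) under the null hypothesis and N(shift, 1) under the alternative; the value of `shift` is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided_z(alpha: float, shift: float) -> float:
    """Power of a one-sided z-test at level alpha.

    Assumption: the statistic is N(0, 1) under the null hypothesis and
    N(shift, 1) under the alternative.
    """
    crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    # Probability that the statistic exceeds the critical value when the
    # alternative is true.
    return 1.0 - STD_NORMAL.cdf(crit - shift)


shift = 2.0  # hypothetical standardized effect
power_5 = power_one_sided_z(0.05, shift)
power_1 = power_one_sided_z(0.01, shift)

# Type II error rate is 1 - power, so it is larger at the 1% level.
print(f"5% level: power={power_5:.2f}, Type II={1 - power_5:.2f}")
print(f"1% level: power={power_1:.2f}, Type II={1 - power_1:.2f}")
```

With this particular shift the power is roughly 0.64 at the 5% level and roughly 0.37 at the 1% level.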

4
Statistical significance

• If the alternative hypothesis is in fact true, then a sufficiently large
sample size is likely to give a highly significant result, even if the
difference between the null hypothesis and the alternative
hypothesis is very small. The statistical significance of a result is
therefore not an indication of how substantial or important the
difference is.
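
A minimal sketch of this effect, assuming a one-sided z-test on a sample mean with known standard deviation, and assuming the observed mean difference happens to equal a small true difference. All numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(mean_diff: float, sigma: float, n: int) -> float:
    """Upper-tail p-value for a z-test on a sample mean.

    Assumption: known standard deviation sigma and a sample of size n.
    """
    z = mean_diff / (sigma / sqrt(n))
    # Upper-tail area, computed as the lower tail at -z.
    return NormalDist().cdf(-z)


tiny_difference = 0.01  # hypothetical, practically unimportant difference
p_small_n = one_sided_p_value(tiny_difference, sigma=1.0, n=100)
p_large_n = one_sided_p_value(tiny_difference, sigma=1.0, n=1_000_000)

print(f"n = 100:       p = {p_small_n:.2f}")   # not significant
print(f"n = 1,000,000: p = {p_large_n:.1e}")   # highly significant
```

The same tiny difference is nowhere near significant with 100 observations and overwhelmingly significant with a million, which is why significance says nothing about importance.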

5
Receiver-Operator-Characteristic
(ROC)
• An ROC curve shows the relationship of the probability of false alarm
(x-axis) to the probability of detection (y-axis) for a certain test.
• Expressed in medical terms: it relates the probability of a positive
test given no disease to the probability of a positive test given
disease.
• The ROC curve may be used to determine an optimal cutoff point
for the test.

6
Determining significance of
database matches
• When searching a DB, the challenge for
analysis methods is to determine if
matches are related (true-positive, TP) or
unrelated (true-negative, TN)
• At a given scoring threshold, it is likely that
some unrelated sequences will be matched
erroneously (false-positives, FP) and some
correct matches will be missed
(false-negatives, FN)
• The aim is to improve the resolution
between the score curves of related and
unrelated matches; where the curves
overlap, it is difficult or impossible to
establish if matches are significant
• Different methods tackle this problem in
different ways

7
Practical Problem

• For example, doctors have measured the S100 protein in serum and
found that higher values tend to be associated with Creutzfeldt-
Jakob disease. The median value is 395 pg/ml for the 108 patients
with the disease and only 109 pg/ml for the 74 patients without the
disease.
• The doctors set a cut off of 213 pg/ml, even though they realized
that 22.2% of the diseased patients had values below the cut off and
18.9% of the disease-free patients had values above the cut off.

8
Practical Problem

• The two percentages listed above are the false negative and false
positive rates, respectively.
• If we lowered the cut off value, we would decrease the false
negative rate, but we would also increase the false
positive rate.
• Similarly, if we raised the cut off, we would decrease the false
positive rate, but we would increase the false negative rate.

9
Practical Problem: Graphical
Representation

[Figure: count (N) plotted against score, with regions labelled true
negative, true positive, false negative, and false positive]

10
ROC

• An ROC curve is a graphical representation of the trade off between
the false negative and false positive rates for every possible cut off.
Equivalently, the ROC curve is the representation of the tradeoffs
between sensitivity (Sn) and specificity (Sp).
• By tradition, the plot shows the FP rate on the X axis and (1 - FN
rate) on the Y axis.
• You could also describe this as a plot with 1-Sp on the X axis and
Sn on the Y axis.

11
Sensitivity (Sn)

• The sensitivity of a binary classification test or algorithm, such as a
blood test to determine if a person has a certain disease, is a
parameter that expresses something about the test's performance.
• The sensitivity of such a test is the proportion of all positive cases
tested (e.g., people with the disease, faulty products) that receive a
positive test result.
• A sensitivity of 100% means that all sick people or faulty products
were recognized as such.
• Sensitivity alone does not tell us everything about the test, because
a 100% sensitivity can be trivially achieved by labeling all test cases
positive. Therefore, we also need to know the specificity of the test.

$$\mathrm{Sn} = \frac{\#TP}{\#TP + \#FN}$$
12
Specificity (Sp)

• In binary testing, e.g. a medical diagnostic test for a certain disease,
specificity is the proportion of all the negative samples tested that
are true negatives.
• For a test to determine who has a certain disease, a specificity of
100% means that all non-sick people are labeled as non-sick.
• Specificity alone does not tell us everything about the test, because
a 100% specificity can be trivially achieved by labeling all test cases
negative. Therefore, we also need to know the sensitivity of the test.
$$\mathrm{Sp} = \frac{\#TN}{\#TN + \#FP}$$

13
Precision

• Also called positive predictive value: the proportion of positive
test results that are true positives.
• It is as much a statement about the proportion of actual
positives in the population being tested as it is about the test.

$$\mathrm{Precision} = \frac{\#TP}{\#TP + \#FP}$$

14
Accuracy

• The proportion of all cases that are correctly identified (TP and TN)

$$\mathrm{Accuracy} = \frac{\#TP + \#TN}{\#TP + \#TN + \#FP + \#FN}$$
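
The four measures can be computed directly from the confusion-matrix counts. The sketch below uses counts reconstructed from the S100 example in the Practical Problem slides: 22.2% of 108 diseased patients is about 24 false negatives, and 18.9% of 74 disease-free patients is about 14 false positives. These rounded counts are an assumption derived from the stated percentages, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sn = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Sp = TN / (TN + FP)."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Precision (positive predictive value) = TP / (TP + FP)."""
    return tp / (tp + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Assumed counts for the S100 cut off of 213 pg/ml (rounded from percentages).
tp, fn = 84, 24   # 108 patients with the disease
tn, fp = 60, 14   # 74 patients without the disease

print(round(sensitivity(tp, fn), 3))       # 0.778
print(round(specificity(tn, fp), 3))       # 0.811
print(round(precision(tp, fp), 3))         # 0.857
print(round(accuracy(tp, tn, fp, fn), 3))  # 0.791
```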

15
How to tell a good ROC curve from a
bad one?
• It is the diagnostic test which can be good or bad.
• A good diagnostic test is one that has small FP and FN rates across
a reasonable range of cut off values.
• A bad diagnostic test is one where the only cut offs that make the
FP rate low have a high FN rate and vice versa.
• Good test: ROC curve climbs rapidly towards upper left hand corner
of the graph. This means that (1 - FN rate) is high and the FP rate is
low.
• Poor test: ROC curve follows a diagonal path from the lower left
hand corner to the upper right hand corner. This means that every
improvement in the FP rate is matched by a corresponding
worsening of the FN rate.

16
Area under ROC curve

• You can quantify how quickly the ROC curve rises to the upper left
hand corner by measuring the area under the curve.
• The larger the area, the better the diagnostic test.
– If the area is 1.0, you have an ideal test, because it achieves
both 100% sensitivity and 100% specificity.
– If the area is 0.5, then you have a test which has effectively 50%
sensitivity and 50% specificity. This is a test that is no better
than flipping a coin.
• Area under the curve does have one direct interpretation. If you take
a random healthy patient and get a score of X and a random
diseased patient and get a score of Y, then the area under the curve
is an estimate of P[Y>X] (assuming that large values of the test are
indicative of disease).
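
The P[Y>X] interpretation can be checked by counting pairs. The sketch below uses the five-value example table from later in this lecture (values A to E, 50 diseased and 50 healthy patients) and assumes that A < B < C < D < E and that a tied pair counts as half, which is a common convention rather than something the slides state.

```python
# Counts per test value, taken from the lecture's example table.
diseased = {"A": 2, "B": 4, "C": 10, "D": 14, "E": 20}
healthy = {"A": 28, "B": 14, "C": 5, "D": 2, "E": 1}
ORDER = "ABCDE"  # assumed ordering: larger values indicate disease


def prob_diseased_scores_higher(diseased, healthy, order=ORDER):
    """Estimate P[Y > X] + 0.5 * P[Y = X] over all diseased/healthy pairs."""
    rank = {value: i for i, value in enumerate(order)}
    wins = 0
    ties = 0
    for d_value, d_count in diseased.items():
        for h_value, h_count in healthy.items():
            pairs = d_count * h_count
            if rank[d_value] > rank[h_value]:
                wins += pairs
            elif rank[d_value] == rank[h_value]:
                ties += pairs
    total_pairs = sum(diseased.values()) * sum(healthy.values())
    return (wins + 0.5 * ties) / total_pairs


auc_estimate = prob_diseased_scores_higher(diseased, healthy)
print(round(auc_estimate, 2))  # 0.91
```

The result matches the area of 0.91 reported for the same example.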

17
Area under ROC curve (AROC)

• As a rough guide, an area under ROC (AROC) value of 1.0
represents a perfect prediction, values 0.9 to 1.0 represent excellent
accuracy, 0.8 to 0.9 represent good accuracy, 0.7 to 0.8 represent
minimum acceptable accuracy, while 0.5 to 0.7 represent
predictions that are close to random choice (Swets 1988)

18
Example of an ROC curve

• Consider a diagnostic test that can only take on five values,
A, B, C, D, and E.
• We classify diseased (D+) and healthy (D-) patients by this
test and get the following results:

Test Value   Diseased   Healthy
A            2          28
B            4          14
C            10         5
D            14         2
E            20         1
Total        50         50

19
• Convert this table to cumulative percentages.
• Add a row (*) to represent the cumulative percentage of 0%,
which will end up corresponding to a diagnostic test where all the
results are considered positive regardless of the test value.
• The last row represents the other extreme, where all the results are
considered negative regardless of the test value.

Test Value   Diseased (%)   Healthy (%)
*            0              0
A            4              56
B            12             84
C            32             94
D            60             98
E            100            100

20
• The complementary percentages (100 minus each cumulative
percentage) represent the TP rate (or Sn) and the FP rate (or 1-Sp).

Test Value                         TP (%)   FP (%)
Always positive                    100      100
A is negative, B-E is positive     96       44
A-B is negative, C-E is positive   88       16
A-C is negative, D-E is positive   68       6
A-D is negative, E is positive     40       2
Never positive                     0        0
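
The conversion from raw counts to ROC points can be sketched as follows. The counts are those of the example table (values A to E); the function name and the list layout are illustrative choices, not part of the lecture.

```python
from itertools import accumulate

# Counts per test value A..E, from the example table.
diseased = [2, 4, 10, 14, 20]
healthy = [28, 14, 5, 2, 1]


def roc_points(diseased, healthy):
    """Return (FP rate %, TP rate %) for every cut off.

    The first point treats every result as positive ("always positive");
    the last treats every result as negative ("never positive").
    """
    n_diseased = sum(diseased)
    n_healthy = sum(healthy)
    # Cumulative counts of patients at or below each value, starting at 0.
    cum_diseased = [0] + list(accumulate(diseased))
    cum_healthy = [0] + list(accumulate(healthy))
    # Complement of the cumulative percentage = share called positive.
    return [
        (100 * (n_healthy - h) / n_healthy, 100 * (n_diseased - d) / n_diseased)
        for d, h in zip(cum_diseased, cum_healthy)
    ]


points = roc_points(diseased, healthy)
for fp_rate, tp_rate in points:
    print(f"FP = {fp_rate:5.1f}%   TP = {tp_rate:5.1f}%")
```

The printed pairs reproduce the TP and FP columns of the table, from (100, 100) down to (0, 0).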
21
22
• The area under the curve for this example is 0.91. This is
quite good; it is close to the ideal value of 1.0 and much larger than
the worst case value of 0.5.

• Next is an artificial ROC curve with an area equal to 0.5.

– Notice that each gain in sensitivity is balanced by the exact
same loss in specificity and vice versa. Also notice that this
curve includes the point corresponding to 50% for both
sensitivity and specificity. You could achieve this level of
diagnostic ability by flipping a coin. Clearly, this curve represents
a worst case scenario.
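
Both areas mentioned above can be reproduced with the trapezoid rule. This is a minimal sketch that assumes straight-line interpolation between ROC points; the point list for the example is taken from the TP and FP rates in the table, expressed as fractions.

```python
def trapezoid_auc(points):
    """Area under an ROC curve given (FP rate, TP rate) points as fractions.

    Assumption: consecutive points are joined by straight lines.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# ROC points of the five-value example (FP rate, TP rate).
example_curve = [
    (0.00, 0.00), (0.02, 0.40), (0.06, 0.68),
    (0.16, 0.88), (0.44, 0.96), (1.00, 1.00),
]
# The artificial curve: a straight diagonal from corner to corner.
diagonal_curve = [(0.0, 0.0), (1.0, 1.0)]

example_area = trapezoid_auc(example_curve)
diagonal_area = trapezoid_auc(diagonal_curve)
print(round(example_area, 2))  # 0.91
print(diagonal_area)           # 0.5
```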

23
24
What's a good value for the area
under the curve?
• This is tricky, and it depends a lot on the context of your individual problem.

• So here is one interpretation of the areas:

0.50 to 0.75 = fair
0.75 to 0.92 = good
0.92 to 0.97 = very good
0.97 to 1.00 = excellent.

• These are very rough guidelines

25
Reference

• The magnificent ROC (Receiver Operating Characteristic curve).
Jo van Schalkwyk. Accessed on 2003-09-08.
www.anaesthetist.com/mnm/stats/roc/
