02-2021 - Quant Advanced 2
• PCA/PLS Principle
• Component assignment
• Change path / Copy Spectra
• Set test set by wet chem
• Set test set by PCA
• Remove redundant
• Sample statistics
• Explain each plot in the graph
• Explain the validation report
• Explain regression and loadings
• Routine analysis
• Quant2/Filelist
Principles and properties of factor analysis
Factor analysis of spectra
Factor Analysis breaks apart the spectral data into the most
common spectral variations (factors, loadings, principal
components) and the corresponding scaling coefficients
(scores)
Spectral data matrix (n × p) = Scores (n × d) × Factors (d × p)
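As an illustration of this decomposition (a minimal sketch in Python/NumPy, not the OPUS implementation; all variable names and data are hypothetical), scores and factors can be obtained from a singular value decomposition of the mean-centered data matrix:

import numpy as np

# Hypothetical spectral data matrix X: n spectra, each with p data points
rng = np.random.default_rng(0)
n, p = 50, 1000
X = rng.normal(size=(n, p))

# Factor analysis via SVD of the mean-centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

d = 5                       # number of factors kept
factors = Vt[:d]            # (d, p): loadings / principal components
scores = U[:, :d] * S[:d]   # (n, d): one score per spectrum and factor

# scores @ factors approximates the centered data matrix
residual = Xc - scores @ factors
print(f"unexplained variation: {residual.std() / Xc.std():.1%}")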
Factor analysis: PCA
Without component values
Inverse Factor analysis: PCA
Reconstruction of spectrum using all factors
[Figure: the spectrum is rebuilt as the sum of the factors weighted by their scores: 7.731, -0.699, 3.67E-04, -1.15E-02, -2.04E-02]
• In the software the spectra are not reconstructed. Each spectrum is represented just by its few score values (data compression), which are used in the modeling calculations.
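A minimal sketch of this idea, continuing the hypothetical NumPy example above: a spectrum could be rebuilt from its scores and the factors, but in practice only the few score values per spectrum need to be stored.

import numpy as np

def compress(spectrum, factors, mean_spectrum):
    # Data compression: represent the spectrum by its few score values only
    return factors @ (spectrum - mean_spectrum)

def reconstruct(score_values, factors, mean_spectrum):
    # Weighted sum of the factors, e.g. 7.731 * factor 1 + (-0.699) * factor 2 + ...
    return mean_spectrum + score_values @ factors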
Moving from factor analysis to PLS
PLS Factors for components A and B
[Figure: PLS calculates a separate set of factors for each component: PLS factors 1-3 for component A and PLS factors 1-3 for component B]
Analysis of spectra using PCA or PLS models is based on scores and loadings
• For the measured spectrum the scores are calculated according to the
factors (loadings) stored in the model.
• The scores are used for the final evaluation in the PCA model
(identification) or PLS model (quantification).
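A hedged sketch of this evaluation step (hypothetical names; the actual OPUS computation is internal): the stored loadings turn the measured spectrum into scores, and a PLS-type model then evaluates those scores linearly.

import numpy as np

def evaluate(spectrum, mean_spectrum, loadings, score_coefficients, intercept):
    scores = loadings @ (spectrum - mean_spectrum)    # project the spectrum onto the stored loadings
    return intercept + scores @ score_coefficients    # quantification from the scores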
Component Assignment
Note: component names are case sensitive!
1. Select parameter
2. Click the arrow
3. Add component values
Important: use average component values for spectra of the same sample!
Change path
COPY Spectra
COPY Spectra – Standard
CAUTION: Be careful if spectra have the same name but are stored in different folders!
COPY Spectra – AVERAGE
Selection of calibration and test samples
• Calibration and test set samples should be well distributed over the entire property range.
• As many samples as possible should be used for the test set, but important samples must be in the calibration set. For big data sets, the split is typically 50-50 or 60-40 between calibration and test set (a simple split is sketched below).
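A simple random 60-40 split, one possible realisation of the rule of thumb above (sketched with scikit-learn on hypothetical data; automatic selection on component values is sketched further below):

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))   # hypothetical spectra
y = rng.normal(size=100)          # hypothetical reference values

# 60 % calibration set, 40 % test set
X_cal, X_test, y_cal, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
print(len(y_cal), "calibration samples,", len(y_test), "test samples")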
Distribution of samples
[Figure: prediction vs. reference value; most samples lie in the typical concentration range, while a "rare sample" or outlier lies outside it]
The concentration range of the calibration should extend beyond the expected analysis range if possible.
Validation set
• Cross validation is used only when there is a limited number of samples (feasibility test or very costly wet chem). Normally, 2-5% of the samples are left out per iteration; there is no statistical benefit of full cross validation (leave-one-out) over leaving 2-5% out (a sketch follows after this list).
• A test set is the more reliable method, but you need sufficient samples in both the calibration and the test set. A sufficient number of calibration samples ensures a proper PLS loading calculation (normally about 10 samples per rank/factor are needed). On the other hand, a sufficient number of test samples ensures that the validation results (RMSEP, Bias, RPD) are not over-optimistic.
• Cross validation should be used first, before the test set selection, to remove severe outliers.
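A hedged sketch of "leave 5 % out" cross validation with scikit-learn (hypothetical data; 20 folds so that about 5 % of the samples are left out per fold, and an arbitrarily chosen rank of 6):

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))                         # hypothetical spectra
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=200)   # hypothetical property values

pls = PLSRegression(n_components=6)                     # rank / number of PLS factors
cv = KFold(n_splits=20, shuffle=True, random_state=0)   # ~5 % left out per fold
y_cv = cross_val_predict(pls, X, y, cv=cv).ravel()

rmsecv = np.sqrt(np.mean((y - y_cv) ** 2))
print(f"RMSECV = {rmsecv:.3f}")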
How to select Test set samples
• Remove obvious outliers by (i) looking at the spectra, (ii) using cross validation, and (iii) checking the PCA scores.
• If there are only one or very few kinds of samples in the matrix, use the component values to separate the test set. Caution: this cannot be done if the component values of some samples are missing; those samples must be excluded first.
Automatic selection of test samples on component values (Kennard-Stone)
Samples with the lowest and highest property values are put into the calibration set; the next inner ones go into the test set.
The next test sample is chosen with the maximum distance from the already selected ones in all dimensions (properties). Here it is found in the middle.
10 % Test samples
The next test sample is always chosen with the maximum distance from the already selected ones in all dimensions (properties), until the required percentage of test samples is reached (a code sketch follows below).
20 % Test samples
50 % Test samples
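A simplified sketch of this selection on the component (property) values, following the description on these slides (an illustration only, not necessarily the exact OPUS algorithm):

import numpy as np

def select_test_set(Y, fraction=0.1):
    # Y: (n_samples, n_properties) component values.
    # Samples with the lowest and highest property values stay in the calibration set;
    # test samples are then picked with maximum distance to everything already selected.
    Y = np.asarray(Y, dtype=float)
    n_test = int(round(fraction * len(Y)))
    selected = set(np.argmin(Y, axis=0)) | set(np.argmax(Y, axis=0))   # extremes -> calibration
    test = []
    while len(test) < n_test:
        chosen = list(selected)
        # distance of every sample to its nearest already selected sample (all properties)
        dists = np.min(np.linalg.norm(Y[:, None, :] - Y[chosen][None, :, :], axis=2), axis=1)
        dists[chosen] = -1.0            # never pick a sample twice
        pick = int(np.argmax(dists))    # farthest from everything chosen so far
        test.append(pick)
        selected.add(pick)
    return test

# Hypothetical usage: 100 samples with 2 component values each, 10 % test samples
rng = np.random.default_rng(0)
Y = rng.uniform(0.0, 100.0, size=(100, 2))
print(select_test_set(Y, fraction=0.1))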
Set Test set by component values
Clear Test Set
Set Test set by PCA
Remove redundant
• This is needed when you have very big populations, such as several thousand samples, many of which contain repeated information (see the sketch below).
• All samples must be "Calibration" samples.
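One simple way to flag redundant spectra (an illustration only; the criterion actually used by OPUS may differ) is to drop samples whose nearest neighbour in PCA score space is closer than a chosen threshold:

import numpy as np

def flag_redundant(scores, threshold):
    # scores: (n, d) PCA scores of the spectra.
    # Keeps a sample only if it is farther than `threshold` from every sample kept
    # so far; all others are flagged as redundant.
    kept, redundant = [], []
    for i, s in enumerate(scores):
        if kept and np.min(np.linalg.norm(scores[kept] - s, axis=1)) < threshold:
            redundant.append(i)
        else:
            kept.append(i)
    return kept, redundant

# Hypothetical usage: 2000 spectra compressed to 10 scores each
rng = np.random.default_rng(0)
scores = rng.normal(size=(2000, 10))
kept, redundant = flag_redundant(scores, threshold=2.0)
print(len(kept), "kept /", len(redundant), "flagged as redundant")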
Remove redundant samples
Quant2 OPUS 7: exclude redundant samples
[Figure: validation result after excluding redundant samples: RMSEP = 0.73]
Sample Statistics
Error values for characterizing calibration performance and validation
Normal (Gaussian) Distribution
• +/- 2 standard deviations (+/- 2σ) contain 95.5% of the values
• +/- 3 standard deviations (+/- 3σ) contain 99.7% of the values
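These coverage values can be checked with a quick calculation (not part of the software, just the normal distribution itself):

from math import erf, sqrt

def coverage(k):
    # Probability that a normally distributed value lies within +/- k standard deviations
    return erf(k / sqrt(2.0))

print(f"+/-2 sigma: {coverage(2):.2%}")   # about 95.45 %
print(f"+/-3 sigma: {coverage(3):.2%}")   # about 99.73 %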
Ruminant Feed – Fat: Test Set Validation
Error distribution
R² and its meaning: expresses the relation of error bar and value range
[Figure: three calibration plots with R² = 66.4%, R² = 81.4%, and R² = 98.9%]
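R² and RMSEP can be computed directly from predicted and reference values; the sketch below uses hypothetical numbers. Because the denominator of R² is the spread of the reference values, the same error gives a higher R² when the value range is wide and a lower R² when it is narrow.

import numpy as np

def r2_and_rmsep(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmsep = np.sqrt(np.mean(residuals ** 2))
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return 100.0 * r2, rmsep          # R² in percent, as on the slides

y_ref = np.array([2.1, 3.4, 4.0, 5.2, 6.8, 7.5])    # hypothetical reference values
y_prd = np.array([2.3, 3.2, 4.1, 5.0, 6.9, 7.8])    # hypothetical predictions
r2, rmsep = r2_and_rmsep(y_ref, y_prd)
print(f"R2 = {r2:.1f} %, RMSEP = {rmsep:.2f}")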
R²: Calibration of Fat in Milk
[Figure: prediction vs. reference plot, annotated with the RMSEP]
Statistics for the model validation
Mahalanobis Distance threshold
In OPUS 7 the threshold is set based on the calibration set statistic (99.9% probability). Almost all calibration spectra will be below the threshold; this is logical because those samples belong to the calibration set. However, if your calibration set is still small, setting an MDI factor of 2 in O/LAB or ME could be a good idea.
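A hedged sketch of how a Mahalanobis distance and a probability-based threshold could be computed from the calibration scores (an assumption for illustration; the exact OPUS 7 statistic may differ):

import numpy as np
from scipy.stats import chi2

def mahalanobis_distances(scores):
    # scores: (n, d) scores of the calibration spectra
    centered = scores - scores.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(centered, rowvar=False))
    return np.sqrt(np.einsum("ij,jk,ik->i", centered, cov_inv, centered))

rng = np.random.default_rng(0)
cal_scores = rng.normal(size=(80, 5))     # hypothetical calibration scores
md = mahalanobis_distances(cal_scores)

# 99.9 % probability threshold from the chi-square distribution (assumed statistic)
threshold = np.sqrt(chi2.ppf(0.999, df=cal_scores.shape[1]))
print(f"{np.mean(md < threshold):.1%} of the calibration spectra are below the threshold")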
Regression coefficients (b-vector)
The regression coefficients show the weighting of each data point (wavenumber or wavelength) in the model.
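As a sketch of how the b-vector is used (hypothetical names): the prediction is the sum of all data points weighted by the regression coefficients, plus an offset, so a large coefficient means that the corresponding data point strongly influences the result.

import numpy as np

def predict(spectrum, b_vector, offset=0.0):
    # Prediction = offset + sum over all data points of (regression coefficient * intensity)
    return offset + float(np.dot(spectrum, b_vector))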
PLS loadings (factors)
Robust model does not mean lowest RMSEP
Prediction of independent samples across instruments
[Figure: protein predictions of independent samples across instruments.
Model 1: RMSECV = 1.0, SEP = 1.3
Model 2: RMSECV = 0.99, SEP = 1.7
Model 3: RMSECV = 1.1, SEP = 1.7
Model 4: RMSECV = 1.1, SEP = 1.7
Model 5: RMSECV = 1.2, SEP = 2.5]
Routine analysis
Methods must be validated over time
[Diagram: the calibration set is validated repeatedly over time against successive validation sets and test sets (robustness of the model)]
Quant 2/Filelist
Adding true values (reference) for comparison with predictions
Copy/paste of true values (reference) for comparison with predictions
Predictions overview
Prediction vs. true value (reference) with target and regression line (blue)
Easy comparison of different models
Difference vs. true value (reference) with bias line (blue)
Quant2/Filelist
Marking of out-of-range samples and outliers
Marking according to the indication in the table on the 'Analysis Results' page:
• MD/range OK
• MD not OK (outlier)
• Out of range
• MD and range not OK
Result statistics
Troubleshooting in case of poor prediction
Troubleshooting in case of outliers
• Don't just throw away samples after the NIR or wet chem analysis. Sometimes a revision of the measurement (either NIR or wet chem) is required for outliers.
• Don't throw away all the red dots when developing calibrations; the red dots only indicate potential outliers.
Innovation with Integrity
© 2011 Bruker Corporation. All rights reserved.
www.bruker.com