Orthogonal Signal Correction of Near-Infrared Spectra: Svante Wold, Henrik Antti, Fredrik Lindgren, Jerker Ohman
pp. 175–185
a Research Group for Chemometrics, Department of Organic Chemistry, Umeå University, S-901 87 Umeå, Sweden
b Astra-Draco, Box 34, S-221 00 Lund, Sweden
c Umetri, Box 7960, S-907 19 Umeå, Sweden
Received 30 October 1997; revised 21 April 1998; accepted 12 June 1998
Abstract
Near-infrared (NIR) spectra are often pre-processed in order to remove systematic noise such as base-line variation and
multiplicative scatter effects. This is done by differentiating the spectra to first or second derivatives, by multiplicative signal
correction (MSC), or by similar mathematical filtering methods. This pre-processing may, however, also remove information
from the spectra regarding Y (the measured response variable in multivariate calibration applications). We here show how a
variant of PLS can be used to achieve a signal correction that is as close to orthogonal as possible to a given Y-vector or
Y-matrix. Thus, one ensures that the signal correction removes as little information as possible regarding Y. In the case when
the number of X-variables (K) exceeds the number of observations (N), strict orthogonality is obtained. The approach is called
orthogonal signal correction (OSC) and is here applied to four different data sets of multivariate calibration. The results are
compared with those of traditional signal correction as well as with those of no pre-processing, and OSC is shown to give
substantial improvements. Prediction sets of new data, not used in the model development, are used for the comparisons.
© 1998 Elsevier Science B.V. All rights reserved.
Keywords: Orthogonal signal correction; Near-infrared spectra; Multiplicative signal correction
1. Introduction
Near-infrared (NIR) spectroscopy is being increasingly used for the characterisation of solid,
semi-solid, fluid and vapour samples. Frequently the
objective of this characterisation is to determine the
value of one or several concentrations in the samples. Multivariate calibration is then used to develop
a quantitative relation between the digitised spectra
(the matrix X) and the concentrations (in the matrix
Y), as reviewed by Martens and Naes [1]. NIR spectroscopy is also increasingly used to infer other prop
0169-7439/98/$ - see front matter © 1998 Elsevier Science B.V. All rights reserved.
PII: S0169-7439(98)00109-9
as transposed vectors, e.g., v′, and hence transposition is indicated by a prime (′). The index i is used as
sample index (rows in X and Y; i = 1, 2, ..., N) and
the index k as index of X-variables (k = 1, 2, ..., K).
3. Filtering and calibration
Before multivariate calibration, the unanimous
agreement was that ideal spectra looked like well resolved NMR or IR spectra, i.e., mainly a straight
baseline plus some narrow and symmetrical peaks
unambiguously rising above this baseline. Noise introduced wiggles, but could be removed by a judicious filtering of the spectra. Many of the objectives of filtering are still formulated accordingly, as a
way to make signals and spectra smooth and pleasing to the eye, and thus easy to interpret.
With multivariate analysis, however, we evaluate
our spectra by means of mathematical methods buried
in computers. There is not much evidence that
smooth and eye-pleasing signals contain more information for computerised mathematical approaches
than do rough un-filtered spectra.
In essence, in order to construct an efficient filtering approach, we need to quantitatively formulate
criteria for what the filtering is supposed to achieve.
In multivariate calibration, we can quantitatively
specify at least one objective of filtering, namely that
the filtering should NOT remove information about Y
from the spectra (X). Here Y is what we calibrate
against, i.e., analyte concentrations or other sample
properties. This requirement can be formulated
stringently in mathematical terms, namely that the information in Y should be un-related (orthogonal) to what is
removed from X by the filtering.
To achieve such orthogonality, however, we must
express our filters in such a way that their orthogonality to a vector or matrix, Y, can be quantified. This
seems most easily accomplished by expressing the filtering as removing a bilinear structure from X, i.e., a
product of a score matrix, T, times a loading matrix, P′. Orthogonality to Y then means that both T′Y
and Y′T are matrices with only zero-valued elements.
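A short worked step may help to show why this condition protects the information about Y. Writing the filtered matrix as X minus the removed bilinear structure (the symbol X_filt is introduced here only for illustration and is not the paper's notation), the cross-product between Y and X is left unchanged whenever Y′T = 0:

```latex
\begin{aligned}
X_{\mathrm{filt}} &= X - TP' \\
Y'X_{\mathrm{filt}} &= Y'X - (Y'T)\,P' = Y'X \qquad \text{when } Y'T = 0
\end{aligned}
```

In other words, under this assumption the covariance structure between Y and X that a subsequent calibration relies on is the same before and after filtering.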
As discussed by Sun [5], we can formulate a class
of filters as PCA-like multivariate projections. Basing the filter on unmodified PCA has the advantage
that a PC model describes as much as possible of X,
$$ t^{*} = \left(1 - Y (Y'Y)^{-1} Y'\right) t $$

$$ Y't^{*} = Y'\left(1 - Y (Y'Y)^{-1} Y'\right) t = \left(Y' - Y'Y (Y'Y)^{-1} Y'\right) t = 0 $$

$$ \ldots^{-1}\, Y')\, x_k $$
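The projection t* = (1 − Y(Y′Y)⁻¹Y′)t can be checked numerically. The following is a minimal sketch in NumPy; the function name `orthogonalise_to_y` and the random test data are assumptions made for illustration, and the sketch covers only this single orthogonalisation step, not a complete OSC algorithm.

```python
import numpy as np

def orthogonalise_to_y(t, Y):
    """Return t* = (1 - Y (Y'Y)^-1 Y') t, the part of t orthogonal to Y."""
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]  # treat a response vector as an N x 1 matrix
    # Solve (Y'Y) b = Y't instead of forming the inverse explicitly.
    b = np.linalg.solve(Y.T @ Y, Y.T @ t)
    return t - Y @ b

# Hypothetical data: N = 20 samples, M = 2 responses.
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 2))
t = rng.standard_normal(20)

t_star = orthogonalise_to_y(t, Y)
print(np.abs(Y.T @ t_star).max())  # zero up to rounding error
```

Solving the normal equations rather than inverting Y′Y is a numerical convenience; it assumes that the columns of Y are linearly independent.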
7. Scaling
The results of any projection, including OSC, are
influenced by the scaling of the original data in X. In
NIR applications one normally uses either un-scaled
data, data scaled to unit variance (auto-scaling), or
something in between these two, e.g., so-called Pareto
scaling [10].
A problem with scaling of the original data occurs
when much of the variation in the X-data is due to
light scattering and other phenomena which will be
removed by the OSC-filtration. Then, the auto-scaling or Pareto scaling will be based on major variation in X that is irrelevant to the actual calibration
model.
Hence the use of un-scaled X-data would seem to
be more appropriate for OSC-filtering. To circumvent this difficulty when scaling is wanted, one can run the OSC-algorithm on
the original (scaled or un-scaled) data and then use
the filtered X-matrix to calculate a new scaling of the
original data. The OSC-algorithm is then run again on the
re-scaled X-data.
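The two-pass procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `simple_orthogonal_filter` is a hypothetical stand-in that removes a single Y-orthogonal component from the centred data and is much simpler than a full OSC algorithm, and the `power` parameter is introduced here to switch between unit-variance scaling (1.0) and Pareto scaling (0.5, i.e., division by the square root of the standard deviation).

```python
import numpy as np

def simple_orthogonal_filter(X, Y):
    """Stand-in filter (NOT the full OSC algorithm): remove one bilinear
    component t* p' from centred X, where t* is orthogonal to centred Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Score vector of the first principal component of the centred X.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    t = U[:, 0] * s[0]
    # Orthogonalise the score to Y: t* = (1 - Y (Y'Y)^-1 Y') t.
    t_star = t - Yc @ np.linalg.lstsq(Yc, t, rcond=None)[0]
    # Loading obtained by regressing the columns of X on t*.
    p = Xc.T @ t_star / (t_star @ t_star)
    return Xc - np.outer(t_star, p)

def two_pass_filter(X, Y, power=0.5):
    """Filter, derive a new scaling from the filtered data, filter again.

    power=1.0 gives unit-variance scaling, power=0.5 Pareto scaling.
    """
    X_first = simple_orthogonal_filter(X, Y)        # pass 1 on the original data
    sd = X_first.std(axis=0, ddof=1)                # spread remaining after filtering
    sd[sd == 0] = 1.0                               # guard against constant columns
    X_rescaled = (X - X.mean(axis=0)) / sd**power   # new scaling of the ORIGINAL data
    return simple_orthogonal_filter(X_rescaled, Y)  # pass 2 on the re-scaled data
```

The point of the sketch is the order of operations: the scaling weights are computed from the filtered matrix but applied to the original one, so that variation that will be filtered away does not dominate the scaling.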
8. Data sets
In this study four different data sets were used for
comparison of filtering (signal correction) methods.
Three of the data sets are NIR data collected on cellulose derivatives in order to predict the measured
viscosity. The fourth data set is NIR data on pulp
samples from the pulp and paper industry on which
17 physical properties have been measured.
Table 2
RMSEP-values for the test set for each of the calculated models

        Ground     XFM        Sheets     Pulp
Raw     199.76     616.56     138.55     85.51
MSC     195.13     478.99     130.54     96.48
OSC     183.16     428.48      78.43     78.74

        Ground     XFM        Sheets     Pulp
Raw     0.72       0.614      0.536      0.54
MSC     0.623      0.618      0.604      0.54
OSC     0.94       0.839      0.863      0.57

        Ground     XFM        Sheets     Pulp
Raw     5          3          4          4
MSC     3          6          6          4
OSC     3          2          1          2
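For readers who want to reproduce figures of the kind reported as RMSEP in Table 2, the following minimal sketch computes a root mean square error of prediction for a test set. It assumes the usual definition, the square root of the mean squared difference between observed and predicted responses; the function name and the example numbers are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def rmsep(y_observed, y_predicted):
    """Root mean square error of prediction over a test set."""
    y_observed = np.asarray(y_observed, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return float(np.sqrt(np.mean((y_observed - y_predicted) ** 2)))

# Hypothetical test-set values, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(rmsep(observed, predicted))
```

Because RMSEP is expressed in the units of the response, values are comparable between filtering methods within one data set but not between data sets.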
Fig. 3. Observed vs. predicted viscosity values for the PLS models calculated from the ground cellulose data set. Calibration set (filled circles), test set (open circles). (a) Results from a model based on raw data. (b) Results from a model based on MSC filtered data. (c) Results
from a model based on OSC filtered data.
10. Conclusions
Since projection methods such as PLS
are evidently affected by strong systematic variation in the predictor matrix (X) which is unrelated to the response
matrix, Y, there is a need for removing such variation from X before further modelling. We have here
presented an approach where signal correction (filtering) is made in such a way that the removed parts are
linearly un-related (orthogonal) to the response matrix, Y. OSC seems to have additional advantages
beyond improved predictability of the PLS model,
such as substantially simpler (fewer components)
calibration models, which facilitates the interpretation of the models.
Computationally, we have based OSC on PLS and
the NIPALS algorithm, calculating one OSC component at a time. This choice was made because for us
this is the easiest way to implement the orthogonality
constraint, and also to make the method applicable to
incomplete data matrices.
To apply OSC to the filtering of a signal matrix,
one of course also needs a response vector or matrix, Y. This is always present in multivariate calibration applications, but in other cases of filtering it may
not be available. In signal analysis, for instance, one
often wants to filter time series from unwanted noise.
Similarly, spectra used for the characterization of
materials or products, such as pharmaceutical tablets,
may look noisy and a filtering would be warranted.
When one scrutinizes the objective of the characterization, however, it is often possible to construct a
fuzzy or soft response matrix, Y. In time series
analysis, for instance, we may want to use the signals to look for trends (linear or quadratic, or maybe
exponential), and then we can construct a Y-matrix
accordingly with columns varying linearly, quadratically, and exponentially with time. Analogously, if a
spectral matrix from a material characterization is
used for classification (e.g., bad materials vs. acceptable ones), one may be able to construct a Y-matrix corresponding to this classification.
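Such soft response matrices are straightforward to build. The sketch below is illustrative only: the function names, the exponential rate constant `rate`, and the 0/1 indicator coding of classes are assumptions made here, not choices prescribed by the text.

```python
import numpy as np

def trend_response_matrix(n_points, rate=0.05):
    """Y with columns varying linearly, quadratically and exponentially in time."""
    time = np.arange(n_points, dtype=float)
    return np.column_stack([time, time**2, np.exp(rate * time)])

def class_response_matrix(labels):
    """Y with one 0/1 indicator column per class, plus the class order used."""
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted unique class names
    Y = (labels[:, None] == classes[None, :]).astype(float)
    return Y, classes

Y_trend = trend_response_matrix(50)
Y_class, classes = class_response_matrix(["bad", "ok", "ok", "bad"])
print(Y_trend.shape, Y_class.shape, classes)
```

In practice the columns would normally be centred, and possibly scaled, before use, since the quadratic and exponential columns can differ from the linear one by orders of magnitude.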
Other application areas where OSC may be found
useful include 3D-QSAR, where the structures of a
set of molecules are translated to a set of structure
descriptor vectors by means of, e.g., CoMFA [12] or
GRID [13]. The resulting X-matrix, which often has
thousands of columns but just a few rows, say 15 to
50, is then related by PLS to a matrix Y with measured biological activity values. Since there are often
large parts of X that are unrelated to Y, OSC may help
to clean up the data before the analysis, and hence
improve the predictivity and interpretability of the
solution.
When the number of X-variables, K, is larger than
N (the number of training samples), it is always possible to find an exactly orthogonal OSC solution,
while if K < N this is not always possible. This also
means that for K > N, there are infinitely many OSC
solutions, where the present algorithm is set up to find
the one that models as much of X as possible in each
component. This solution may not always be the best,
however, and additional constraints on the OSC w
vectors may be warranted. This question and other
possible OSC modifications will hopefully be investigated in this laboratory in the near future, together
with the performance of OSC in applications
other than multivariate calibration based on NIR spectroscopy.
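One way to read the K > N statement is that, when X has more columns than rows and full row rank, any target score vector t* that is orthogonal to Y can be reproduced exactly as Xw for some weight vector w. The sketch below illustrates this reading with random, uncentred data and a minimum-norm solution from the pseudo-inverse; the function name, the data and the use of `np.linalg.pinv` are assumptions made for illustration and are not the NIPALS-based procedure of the paper.

```python
import numpy as np

def fit_orthogonal_score(N, K, seed=0):
    """Try to reproduce a Y-orthogonal target score t* as X @ w.

    Returns (norm of X w - t*, absolute value of Y'(X w)).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, K))  # hypothetical predictor matrix
    Y = rng.standard_normal((N, 1))  # hypothetical single response
    t = X[:, 0].copy()               # any starting score vector
    # Target score: the part of t orthogonal to Y.
    t_star = t - Y @ np.linalg.lstsq(Y, t, rcond=None)[0]
    # Minimum-norm (K > N) or least-squares (K < N) weights.
    w = np.linalg.pinv(X) @ t_star
    fitted = X @ w
    return float(np.linalg.norm(fitted - t_star)), abs(float(Y[:, 0] @ fitted))

print(fit_orthogonal_score(N=10, K=50))  # wide X: exact fit expected
print(fit_orthogonal_score(N=50, K=10))  # tall X: generally only approximate
```

For the wide matrix the pseudo-inverse returns one particular w among infinitely many that fit exactly, which mirrors the remark that additional constraints on the w vectors could be used to select among solutions.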
Acknowledgements
Financial support to SW and HA by the Swedish
Natural Science Research Council (NFR) and the
Swedish Foundation for Strategic Research is gratefully acknowledged. We are most grateful for permission to use data from our collaborations with Akzo
References
[1] H. Martens, T. Naes, Multivariate Calibration, Wiley, New York, 1989.
[2] A. Savitzky, M.J.E. Golay, Anal. Chem. 65 (1993) 3279–3289.
[3] P. Geladi, D. MacDougall, H. Martens, Linearization and scatter-correction for near-infrared reflectance spectra of meat, Appl. Spectrosc. 3 (1985) 491–500.
[4] P.C. Williams, K. Norris, Near-Infrared Technology in Agricultural and Food Industries, American Cereal Association, St. Paul, MN, 1987.
[5] J. Sun, Statistical analysis of NIR data: data pretreatment, J. Chemometr. 11 (1997) 525–532.
[6] M. Baroni, S. Clementi, G. Cruciani, G. Constantino, D. Riganelli, Predictive ability of regression models: Part 2. Selec-
[7] … (1988) 211–228.
[8] O.M. Kvalheim, T.V. Karstang, Interpretation of latent-variable regression models, Chemometrics and Intelligent Laboratory Systems 7 (1989) 39–51.
[9] O.H.J. Christie, Data laundering by target rotation in chemistry-based oil exploration, J. Chemometr. 10 (1996) 453–461.
[10] S. Wold, PLS for multivariate linear modelling, in: H. van de Waterbeemd (Ed.), QSAR: Chemometric Methods in Molecular Design, Methods and Principles in Medicinal Chemistry, Vol. 2, Verlag Chemie, Weinheim, Germany, 1995.
[11] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra, Appl. Spectrosc. 43 (1989) 772–777.
[12] R.D. Cramer III, D.E. Patterson, J.D. Bruce, Comparative molecular field analysis (CoMFA): I. Effect of shape on binding of steroids to carrier proteins, J. Am. Chem. Soc. 110 (1988) 5959–5967.
[13] P.J. Goodford, A computational procedure for determining energetically favourable binding sites on biologically important macromolecules, J. Med. Chem. 28 (1985) 849–857.