Flexible Model Selection Criterion for Multiple Regression
Kunio Takezawa
ABSTRACT
Predictors of a multiple linear regression equation selected by GCV (Generalized Cross-Validation) may include undesirable predictors that have no linear functional relationship with the target variable and are chosen only by accident. This is because GCV estimates prediction error but does not control the probability of selecting predictors that are irrelevant to the target variable. To take this possibility into account, a new statistic, GCVf ("f" stands for "flexible"), is suggested. The rigidity with which GCVf accepts predictors is adjustable, and GCVf is a natural generalization of GCV. For example, GCVf can be designed so that the probability of erroneous identification of linear relationships is 5 percent when none of the predictors has a linear relationship with the target variable. Predictors of a multiple linear regression equation selected by this method are highly likely to have linear relationships with the target variable.
Keywords: GCV; GCVf; Identification of Functional Relationship; Knowledge Discovery; Multiple Regression;
Significance Level
1. Introduction
There are two categories of methods for selecting the predictors of regression equations such as multiple linear regression. One comprises methods that use statistical tests such as the F-test. The other comprises methods that choose predictors by optimizing a statistic such as GCV or AIC (Akaike's Information Criterion). The former methods have the problem that they examine only a small part of the multiple linear regression equations that can be built from the candidate predictors (e.g., p. 193 in Myers [1]). In this respect, all-possible-regression procedures are desirable, and this has spread the use of statistics such as GCV and AIC for producing multiple linear regression equations.
Studies of statistics such as GCV and AIC aim to construct multiple linear regression equations with a small prediction error in terms of the residual sum of squares or the log-likelihood. In addition, discussion of the practical use of multiple linear regression equations proceeds on the assumption that a linear relationship exists between the predictors adopted in a multiple linear regression equation and the target variable. However, we should consider the possibility that some predictors used in a multiple linear regression equation have no linear relationship with the target variable. If we cannot neglect the probability that predictors with no linear relationship with the target variable reduce the prediction error by accident, there is some probability that one or more predictors with no linear relationship with the target variable are adopted in the resulting regression equation.
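To illustrate this risk concretely, the short Python sketch below (my own construction, not the paper's simulation design) generates a target variable that is independent of all candidate predictors and counts how often GCV (Equation (1) below) nevertheless prefers a one-predictor model over the intercept-only model:

```python
import numpy as np

rng = np.random.default_rng(0)

def gcv(Z, y):
    """GCV of Equation (1) for the least-squares fit of y on the columns of Z
    (Z already contains the intercept column, so k = q + 1)."""
    n, k = Z.shape
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ coef
    return (r @ r) / (n * (1.0 - k / n) ** 2)

n, m, hits = 40, 4, 0
for _ in range(500):
    X = rng.normal(size=(n, m))
    y = rng.normal(size=n)                    # y has no relationship with X
    base = gcv(np.ones((n, 1)), y)            # intercept-only model, q = 0
    best = min(gcv(np.column_stack([np.ones(n), X[:, [j]]]), y)
               for j in range(m))             # best single-predictor model
    if best < base:                           # an irrelevant predictor "helps"
        hits += 1
print(f"an irrelevant predictor was selected in {hits}/500 datasets")
```

Even though no predictor carries any information about y, the count is far from zero, which is exactly the phenomenon discussed in this paper.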
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   28.0049     10.6679    2.625   0.01179
x.1            6.7668      1.6640    4.067   0.00019 ***
x.2            0.5825      0.3324    1.752   0.08651
x.3            7.3779      1.1860    6.221   1.47e-07 ***
x.4            0.5784      0.3121    1.854   0.07037
Signif. codes: *** 0.001; ** 0.01.
$$\mathrm{GCV}(q) = \frac{\mathrm{RSS}(q)}{n\left(1 - \dfrac{q+1}{n}\right)^{2}}, \tag{1}$$

where

$$\mathrm{RSS}(q) = \begin{cases} \displaystyle\sum_{i=1}^{n}\left(y_i - \hat{a}_0\right)^{2} & \text{if } q = 0,\\[2ex] \displaystyle\sum_{i=1}^{n}\left(y_i - \hat{a}_0 - \sum_{j=1}^{q} \hat{a}_j x_{ij}\right)^{2} & \text{if } q \geq 1. \end{cases} \tag{2}$$
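A minimal Python sketch of Equations (1) and (2) follows; the function and variable names are my own, and the coefficients are obtained by ordinary least squares with NumPy:

```python
import numpy as np

def rss(X, y, q):
    """RSS(q) of Equation (2): residual sum of squares of the least-squares
    fit of y on an intercept plus the first q columns of X."""
    n = len(y)
    Z = np.ones((n, 1)) if q == 0 else np.column_stack([np.ones(n), X[:, :q]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ coef
    return float(r @ r)

def gcv(X, y, q):
    """GCV(q) of Equation (1)."""
    n = len(y)
    return rss(X, y, q) / (n * (1.0 - (q + 1) / n) ** 2)
```

Model selection by GCV amounts to taking the q that minimizes gcv(X, y, q).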
Equation (1) can be rewritten as

$$\mathrm{GCV}(q) = \frac{n\,\mathrm{RSS}(q)}{(n-q-1)^{2}}. \tag{3}$$

When the q-th predictor is added to a multiple linear regression equation with q − 1 predictors, it is accepted by GCV if

$$\frac{\mathrm{GCV}(q)}{\mathrm{GCV}(q-1)} < 1. \tag{4}$$

Equation (3) gives

$$\frac{\mathrm{GCV}(q)}{\mathrm{GCV}(q-1)} = \frac{\mathrm{RSS}(q)}{\mathrm{RSS}(q-1)} \cdot \frac{(n-q)^{2}}{(n-q-1)^{2}}. \tag{5}$$

The F-statistic for testing the addition of the q-th predictor is

$$F(n,q) = (n-q-1)\left(\frac{\mathrm{RSS}(q-1)}{\mathrm{RSS}(q)} - 1\right). \tag{6}$$

Substituting Equation (6) into Equation (5), the condition of Equation (4) becomes

$$\left(\frac{F(n,q)}{n-q-1} + 1\right)^{-1} \frac{(n-q)^{2}}{(n-q-1)^{2}} < 1. \tag{7}$$

That is,

$$F(n,q) > \frac{2(n-q)-1}{n-q-1}. \tag{8}$$

With the coefficient of determination

$$R^{2}(q) = 1 - \frac{\mathrm{RSS}(q)}{\displaystyle\sum_{i=1}^{n}\left(y_i - n^{-1}\sum_{j=1}^{n} y_j\right)^{2}},$$

F(n,q) is also written as

$$F(n,q) = \frac{(n-q-1)\left(R^{2}(q) - R^{2}(q-1)\right)}{1 - R^{2}(q)}. \tag{9}$$

When the q-th predictor has no linear relationship with the target variable,

$$F(n,q) \sim F_{1,\,n-q-1}. \tag{10}$$

Hence, the significance level p corresponding to F(n,q) is

$$p = \int_{F(n,q)}^{\infty} \operatorname{den}(1,\, n-q-1,\, x)\,\mathrm{d}x, \tag{11}$$

where den(1, n − q − 1, x) denotes the probability density function of the F distribution with (1, n − q − 1) degrees of freedom.

Figure 2. Frequencies of the number of predictors selected by GCV.
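Under the reconstruction of Equations (6) and (11) above, F(n, q) and its significance level can be computed with SciPy as follows (a sketch; the names are mine):

```python
from scipy import stats

def f_statistic(rss_prev, rss_q, n, q):
    """F(n, q) of Equation (6): F-statistic for adding the q-th predictor,
    with rss_prev = RSS(q-1) and rss_q = RSS(q)."""
    return (n - q - 1) * (rss_prev / rss_q - 1.0)

def significance(f_nq, n, q):
    """p of Equation (11): upper-tail area of the F(1, n-q-1)
    distribution beyond the observed F(n, q)."""
    return stats.f.sf(f_nq, 1, n - q - 1)
```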
AIC with q predictors is written as

$$\mathrm{AIC}(q) = n \log\!\left(\frac{\mathrm{RSS}(q)}{n}\right) + 2q + 4, \tag{12}$$

where the term 2q + 4 = 2(q + 2) counts the intercept, the q regression coefficients, and the error variance. Hence, we have

$$\mathrm{AIC}(q) - \mathrm{AIC}(q-1) = n \log\!\left(\frac{\mathrm{RSS}(q)}{\mathrm{RSS}(q-1)}\right) + 2. \tag{13}$$

Therefore, AIC(q) < AIC(q − 1) is equivalent to

$$\frac{\mathrm{RSS}(q)}{\mathrm{RSS}(q-1)} < \exp\!\left(-\frac{2}{n}\right). \tag{14}$$

Substituting Equation (6), this becomes

$$\frac{F(n,q)}{n-q-1} > \exp\!\left(\frac{2}{n}\right) - 1. \tag{15}$$

That is,

$$F(n,q) > (n-q-1)\left(\exp\!\left(\frac{2}{n}\right) - 1\right). \tag{16}$$
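The following sketch compares the F thresholds of Equations (8) and (16), as reconstructed here, together with the significance level each implies; both thresholds approach 2, and hence p approaches about 0.1573, as n grows:

```python
import numpy as np
from scipy import stats

def gcv_f_threshold(n, q):
    # Equation (8): GCV accepts the q-th predictor when F(n, q) exceeds this.
    return (2 * (n - q) - 1) / (n - q - 1)

def aic_f_threshold(n, q):
    # Equation (16): AIC accepts the q-th predictor when F(n, q) exceeds this.
    return (n - q - 1) * np.expm1(2.0 / n)

for n in (20, 50, 200, 1000):
    q = 1
    for name, thr in (("GCV", gcv_f_threshold(n, q)),
                      ("AIC", aic_f_threshold(n, q))):
        p = stats.f.sf(thr, 1, n - q - 1)  # implied significance level
        print(f"n={n:5d}  {name}: threshold={thr:.3f}, implied p={p:.4f}")
```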
K. TAKEZAWA
405
4. Introduction of GCVf
In the previous section, we associated GCV and AIC with forward and backward selection based on the F-statistic. This indicates that GCV is desirable as long as the implied significance level p is nearly independent of q. However, if the p corresponding to a model selection criterion should be exactly independent of q, we may well develop a new model selection criterion that meets this requirement. Then, if p is given, F(n, q, p) is calculated using
$$p = \int_{F(n,q,p)}^{\infty} \operatorname{den}(1,\, n-q-1,\, x)\,\mathrm{d}x. \tag{17}$$

The q-th predictor should then be accepted if and only if F(n, q) > F(n, q, p), that is,

$$\frac{\mathrm{RSS}(q-1)}{\mathrm{RSS}(q)} > \frac{F(n,q,p)}{n-q-1} + 1. \tag{18}$$

GCVf(q) is defined so that minimizing it realizes this acceptance rule:

$$\mathrm{GCV}_f(q) = \begin{cases} \dfrac{\mathrm{RSS}(q)}{n^{2}} & \text{if } q = 0,\\[2ex] \dfrac{\mathrm{RSS}(q)}{n^{2}} \displaystyle\prod_{k=1}^{q}\left(\frac{F(n,k,p)}{n-k-1} + 1\right) & \text{if } q \geq 1, \end{cases} \tag{19}$$

because

$$\frac{\mathrm{GCV}_f(q)}{\mathrm{GCV}_f(q-1)} = \frac{\mathrm{RSS}(q)}{\mathrm{RSS}(q-1)}\left(\frac{F(n,q,p)}{n-q-1} + 1\right) < 1 \tag{20}$$

is equivalent to Equation (18). Let the value corresponding to CGCV(q) in GCVf(q) (Equation (19)) be CGCVf(q). Then, we have

$$\mathrm{CGCV}_f(q) = \frac{\mathrm{RSS}(q)}{n}\prod_{k=1}^{q}\left(\frac{F(n,k,p)}{n-k-1} + 1\right), \tag{21}$$

while

$$\mathrm{CGCV}(q) = \frac{\mathrm{RSS}(q)}{n}\left(1 + \frac{2q+2}{n}\right). \tag{22}$$

The threshold of Equation (8) is close to 2, and the probability that an F-distributed variable with (1, ∞) degrees of freedom exceeds 2 is 0.1573; hence GCV approximately corresponds to p = 0.1573. With this p, F(n, k, 0.1573) ≈ 2, and

$$\mathrm{GCV}_f(q) \approx \frac{\mathrm{RSS}(q)}{n^{2}}\prod_{k=1}^{q}\left(\frac{2}{n-k-1} + 1\right), \tag{23}$$

$$\mathrm{CGCV}_f(q) \approx \frac{\mathrm{RSS}(q)}{n}\prod_{k=1}^{q}\left(\frac{2}{n-k-1} + 1\right) \approx \frac{\mathrm{RSS}(q)}{n}\left(1 + \frac{2q}{n}\right). \tag{24}$$

If q = 1, we have

$$\mathrm{CGCV}(1) = \frac{\mathrm{RSS}(1)}{n}\left(1 + \frac{4}{n}\right), \tag{25}$$

$$\mathrm{CGCV}_f(1) \approx \frac{\mathrm{RSS}(1)}{n}\left(1 + \frac{2}{n}\right). \tag{26}$$
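Finally, a sketch of GCVf under the reconstruction of Equations (17) and (19) above, with F(n, q, p) taken from SciPy's F quantile function (the names are mine):

```python
from scipy import stats

def f_n_q_p(n, q, p):
    """F(n, q, p) of Equation (17): upper p-point of F(1, n-q-1)."""
    return stats.f.isf(p, 1, n - q - 1)

def gcv_f(rss_q, n, q, p=0.05):
    """GCV_f(q) of Equation (19); rss_q is RSS(q) of Equation (2)."""
    value = rss_q / n ** 2
    for k in range(1, q + 1):
        value *= f_n_q_p(n, k, p) / (n - k - 1) + 1.0
    return value
```

With p = 0.1573 each factor is approximately 2/(n − k − 1) + 1, so GCVf behaves much like GCV, as in Equations (23) and (24).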
Table 1. Frequencies of the number of predictors selected by GCVf for various values of p (500 simulated datasets).

Number of selected predictors   p = 0.1   p = 0.06   p = 0.05   p = 0.05
0                                 404       446        453        461
1                                  66        44         37         31
2                                  29        10         10          —
and the target variable was wrapped. When model selection by GCVf with p = 0.05 was carried out for the 500 datasets, we obtained Table 2. {x1, x3, x4} was chosen for 166 datasets. On the other hand, {x1, x2, x3} was selected for 157 datasets. Therefore, {x1, x3, x4} is not the only choice as a set of predictors with linear relationships with the target variable; {x1, x2, x3} is also a possible choice when we proceed with the discussion of these data.
6. Conclusions
We have assumed that when GCV or AIC yields a multiple linear regression equation with a small prediction error, there is a linear functional relationship between the predictors employed in the regression equation and the target variable. Not much attention has been paid to the probability that one or more selected predictors actually have no linear functional relationship with the target variable. However, we should not ignore the possibility that, when several predictors with no linear functional relationship with the target variable are contained among the candidate predictors, one or more of them are adopted as appropriate predictors in a multiple linear regression equation. This is because, when many candidate predictors have no linear relationship with the target variable, one or more such predictors will be selected with high probability, since p in Figure 4 does not depend on the number of candidate predictors.
Hence, another statistic for model selection, based on an approach different from the use of prediction error, is required for choosing predictors with linear relationships with the target variable. The new statistic should raise the threshold for accepting predictors when quite a few predictors have no linear functional relationship with the target variable. Although this strategy poses a relatively high risk of rejecting predictors that actually have linear functional relationships with the target variable, the predictors it does adopt are highly likely to have linear relationships with the target variable.
REFERENCES
[1] R. H. Myers, "Classical and Modern Regression with Applications," 2nd Edition, Duxbury Press, Pacific Grove, 1990.
[2] D. C. Montgomery, E. A. Peck and G. G. Vining, "Introduction to Linear Regression Analysis," 3rd Edition, Wiley, New York, 2001.
[3] Y. Wang, "Smoothing Splines: Methods and Applications," Chapman & Hall/CRC, Boca Raton, 2011. doi:10.1201/b10954
Table 2. Frequencies of the sets of predictors selected by GCVf with p = 0.05.

Predictors       Frequency   Predictors   Frequency
{x1, x3, x4}        166       {x2, x3}        14
{x1, x2, x3}        157       {x1, x3}        87
                              {x1, x2}        71