
8 Semiparametric Single Index Models

8.1 Index Models


An object of interest, such as the conditional density $f(y \mid x)$ or conditional mean $E(y \mid x)$, is a single index model when it depends on the vector $x$ only through a single linear combination $x'\beta$.
Most parametric models are single index, including Normal regression, Logit, Probit, Tobit, and Poisson regression.
In a semiparametric single index model, the object of interest depends on $x$ through the function $g(x'\beta)$, where $\beta \in \mathbb{R}^k$ and $g : \mathbb{R} \to \mathbb{R}$ are unknown. $g$ is sometimes called a link function. In single index models there is only one nonparametric dimension, so these methods fall in the class of dimension reduction techniques.
The semiparametric single index regression model is

$$E(y \mid x) = g(x'\beta) \qquad (1)$$

where $g$ is an unknown link function.


The semiparametric single index binary choice model is

$$P(y = 1 \mid x) = E(y \mid x) = g(x'\beta) \qquad (2)$$

where $g$ is an unknown distribution function. We use $g$ (rather than, say, $F$) to emphasize the connection with the regression model.
In both contexts, the function $g$ absorbs any location and level shift, so the vector $X_i$ cannot include an intercept. The scale of $\beta$ is not identified, so some normalization of $\beta$ is needed. It is typically easier to impose this on $\beta$ than on $g$. One approach is to set $\beta'\beta = 1$. A second approach is to set one component of $\beta$ equal to one. (This second approach requires that this variable indeed has a non-zero coefficient.)
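As a small illustration (my own snippet, not from the notes), the two normalizations can be imposed on any candidate coefficient vector, here in Python with an arbitrary three-dimensional $\beta$:

    import numpy as np

    beta = np.array([2.0, -1.0, 0.5])          # an arbitrary unnormalized coefficient vector

    beta_unit = beta / np.sqrt(beta @ beta)     # normalization 1: impose beta'beta = 1
    beta_first_one = beta / beta[0]             # normalization 2: set the first coefficient to 1
                                                # (requires that this coefficient is truly non-zero)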
The vector $X_i$ must be of dimension 2 or larger. If $X_i$ is one-dimensional, then $\beta$ is simply normalized to one, and the model is the one-dimensional nonparametric regression $E(y \mid x) = g(x)$ with no semiparametric component.
Identification of $\beta$ and $g$ also requires that $X_i$ contains at least one continuously distributed variable, and that this variable has a non-zero coefficient. If not, $X_i'\beta$ takes only a discrete set of values, and it would be impossible to identify a continuous function $g$ on this discrete support.

8.2 Single Index Regression and Ichimura’s Estimator


The semiparametric single index regression model is

$$y_i = g(X_i'\beta) + e_i$$
$$E(e_i \mid X_i) = 0$$

This model generalizes the linear regression model (which sets g(z) to be linear), and is a
restriction of the nonparametric regression model.
The gain over full nonparametrics is that there is only one nonparametric dimension, so the
curse of dimensionality is avoided.
Suppose $g$ were known. Then you could estimate $\beta$ by (nonlinear) least squares. The LS criterion would be

$$S_n(\beta, g) = \sum_{i=1}^n \left(y_i - g(X_i'\beta)\right)^2.$$

We could think about replacing $g$ with an estimate $\hat g$, but since $g(z)$ is the conditional mean of $y_i$ given $X_i'\beta = z$, $g$ depends on $\beta$, so a two-step estimator is likely to be inefficient.
In his PhD thesis, Ichimura proposed a semiparametric estimator, published later in the Journal
of Econometrics (1993).
Ichimura suggested replacing $g$ with the leave-one-out NW estimator

$$\hat g_{-i}(X_i'\beta) = \frac{\sum_{j \neq i} k\!\left(\dfrac{(X_j - X_i)'\beta}{h}\right) y_j}{\sum_{j \neq i} k\!\left(\dfrac{(X_j - X_i)'\beta}{h}\right)}.$$

The leave-one-out version is used since we are estimating the regression at the $i$'th observation.
Since the NW estimator only converges uniformly over compact sets, Ichimura introduces trimming for the sum of squared errors. The criterion is then

$$S_n(\beta) = \sum_{i=1}^n \left(y_i - \hat g_{-i}(X_i'\beta)\right)^2 1_i(b).$$

He is not too specific about how to pick the trimming function, and it is likely that it is not important in applications.
The estimator of $\beta$ is then
$$\hat\beta = \operatorname*{argmin}_\beta \, S_n(\beta).$$
The criterion is somewhat similar to cross-validation. Indeed, Härdle, Hall, and Ichimura (Annals of Statistics, 1993) suggest picking $\beta$ and the bandwidth $h$ jointly by minimization of $S_n(\beta)$.
In his paper, Ichimura claims that $\hat g_{-i}(X_i'\beta)$ could be replaced by any other uniformly consistent estimator and the consistency of $\hat\beta$ would be maintained, but his asymptotic normality result would be lost. In particular, his proof rests on the asymptotic orthogonality of the derivative of $\hat g_{-i}(X_i'\beta)$ with $e_i$, which holds because the former is a leave-one-out estimator, and fails if it is a conventional NW estimator.
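As a concrete illustration, here is a minimal Python sketch of the estimator (my own illustration, not Ichimura's code). It assumes a Gaussian kernel, a fixed bandwidth h, the normalization that the first coefficient of $\beta$ equals one, and simply sets the trimming indicator $1_i(b)$ to one for all observations:

    import numpy as np
    from scipy.optimize import minimize

    def loo_nw(index, y, h):
        # leave-one-out Nadaraya-Watson estimate of E(y | index) at each observed index value
        u = (index[:, None] - index[None, :]) / h
        K = np.exp(-0.5 * u**2)                  # Gaussian kernel weights
        np.fill_diagonal(K, 0.0)                 # drop observation i from its own fit
        return K @ y / K.sum(axis=1)

    def S_n(theta, y, X, h):
        # sum of squared errors with beta = (1, theta), first coefficient normalized to one
        beta = np.concatenate(([1.0], theta))
        ghat = loo_nw(X @ beta, y, h)
        return np.sum((y - ghat) ** 2)

    def ichimura(y, X, h, theta_start):
        # minimize S_n over the free coefficients and return the full normalized beta
        res = minimize(S_n, theta_start, args=(y, X, h), method="Nelder-Mead")
        return np.concatenate(([1.0], res.x))

Following Härdle, Hall, and Ichimura, the bandwidth h could in principle be included as an additional argument of the same minimization.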

8.3 Asymptotic Distribution of Ichimura’s Estimator


Let $\beta_0$ denote the true value of $\beta$.

The tricky thing is that $\hat g_{-i}(X_i'\beta)$ is not estimating $g(X_i'\beta_0)$; rather, it is estimating

$$G(X_i'\beta) = E(y_i \mid X_i'\beta) = E\left(g(X_i'\beta_0) \mid X_i'\beta\right),$$

the second equality since $y_i = g(X_i'\beta_0) + e_i$.

That is,
$$G(z) = E(y_i \mid X_i'\beta = z)$$
(so $G$ implicitly depends on $\beta$), and $G(\cdot)$ is then evaluated at $X_i'\beta$.


Note that
$$G(X_i'\beta_0) = g(X_i'\beta_0),$$
but for other values of $\beta$,
$$G(X_i'\beta) \neq g(X_i'\beta).$$

Härdle, Hall, and Ichimura (1993) show that the LS criterion is asymptotically equivalent to replacing $\hat g_{-i}(X_i'\beta)$ with $G(X_i'\beta)$, so

$$S_n(\beta) \simeq S_n^*(\beta) = \sum_{i=1}^n \left(y_i - G(X_i'\beta)\right)^2.$$

This approximation is essentially the same as Andrews’ MINPIN argument, and relies on the estimator $\hat g_{-i}(X_i'\beta)$ being a leave-one-out estimator, so that it is orthogonal to the error $e_i$.
This means that $\hat\beta$ is asymptotically equivalent to the minimizer of $S_n^*(\beta)$, a NLLS problem. As we know from Econ 710, the asymptotic distribution of the NLLS estimator is identical to least-squares on the regressor

$$X_i^* = \frac{\partial}{\partial\beta} G(X_i'\beta).$$
This implies
$$\sqrt{n}\left(\hat\beta - \beta_0\right) \to_d N(0, V)$$

$$V = Q^{-1} \Omega Q^{-1}$$
$$Q = E\left(X_i^* X_i^{*\prime}\right)$$
$$\Omega = E\left(X_i^* X_i^{*\prime} e_i^2\right)$$

To complete the derivation, we now find this $X_i^*$.


As $\hat\beta$ is $n^{-1/2}$-consistent, we can use a Taylor expansion of $g(X_i'\beta_0)$ to find

$$g(X_i'\beta_0) \simeq g(X_i'\beta) + g^{(1)}(X_i'\beta)\, X_i'(\beta_0 - \beta)$$

where
$$g^{(1)}(z) = \frac{d}{dz}\, g(z).$$

Then

$$\begin{aligned}
G(X_i'\beta) &= E\left(g(X_i'\beta_0) \mid X_i'\beta\right) \\
&\simeq E\left(g(X_i'\beta) + g^{(1)}(X_i'\beta)\, X_i'(\beta_0 - \beta) \mid X_i'\beta\right) \\
&= g(X_i'\beta) + g^{(1)}(X_i'\beta)\, E\left(X_i \mid X_i'\beta\right)'(\beta_0 - \beta)
\end{aligned}$$

since $g(X_i'\beta)$ and $g^{(1)}(X_i'\beta)$ are measurable with respect to $X_i'\beta$. Another Taylor expansion, this time of $g(X_i'\beta)$ about $X_i'\beta_0$, yields that this is approximately

$$\begin{aligned}
G(X_i'\beta) &\simeq g(X_i'\beta_0) - g^{(1)}(X_i'\beta)\left(X_i - E(X_i \mid X_i'\beta)\right)'(\beta_0 - \beta) \\
&\simeq g(X_i'\beta_0) - g^{(1)}(X_i'\beta_0)\left(X_i - E(X_i \mid X_i'\beta_0)\right)'(\beta_0 - \beta),
\end{aligned}$$

the final approximation holding for $\beta$ in a $n^{-1/2}$ neighborhood of $\beta_0$. (The error is of smaller stochastic order.)
We see that
$$X_i^* = \frac{\partial}{\partial\beta}\, G(X_i'\beta)\Big|_{\beta = \beta_0} \simeq g^{(1)}(X_i'\beta_0)\left(X_i - E(X_i \mid X_i'\beta_0)\right).$$
Ichimura rigorously establishes this result.
This asymptotic distribution is slightly different from that which would be obtained if the function $g$ were known a priori. In that case, the asymptotic design depends on $X_i$, not $X_i - E(X_i \mid X_i'\beta_0)$:

$$Q = E\left(g^{(1)}(X_i'\beta_0)^2\, X_i X_i'\right).$$

This is the cost of the semiparametric estimation.


Recall from the identification discussion that we required the dimension of $X_i$ to be 2 or larger. Suppose that $X_i$ is one-dimensional. Then $X_i - E(X_i \mid X_i'\beta_0) = 0$, so $Q = 0$ and the above theory is vacuous (as it should be).
The Ichimura estimator achieves the semiparametric efficiency bound for estimation of $\beta$ when the error is conditionally homoskedastic. Ichimura also considers a weighted least-squares estimator, setting the weight to be the inverse of an estimate of the conditional variance function (as in Robinson’s FGLS estimator). This weighted LS estimator is then semiparametrically efficient.

8.4 Klein and Spady’s Binary Choice Estimator


Klein and Spady (Econometrica, 1993) proposed an estimator of the semiparametric single index
binary choice model which has strong similarities with Ichimura’s estimator.
The model is

$$y_i = 1\left(X_i'\beta \geq e_i\right)$$

where $e_i$ is an error.

If $e_i$ is independent of $X_i$ and has distribution function $g$, then the data satisfy the single-index regression
$$E(y \mid x) = g(x'\beta).$$

It follows that Ichimura’s estimator can be directly applied to this model.


Klein and Spady suggest a semiparametric likelihood approach. Given $g$, the log-likelihood is

$$L_n(\beta, g) = \sum_{i=1}^n \left[y_i \ln g(X_i'\beta) + (1 - y_i)\ln\left(1 - g(X_i'\beta)\right)\right].$$

This is analogous to the sum-of-squared-errors function $S_n(\beta, g)$ for the semiparametric regression model.
As with Ichimura, Klein and Spady suggest replacing $g$ with the leave-one-out NW estimator

$$\hat g_{-i}(X_i'\beta) = \frac{\sum_{j \neq i} k\!\left(\dfrac{(X_j - X_i)'\beta}{h}\right) y_j}{\sum_{j \neq i} k\!\left(\dfrac{(X_j - X_i)'\beta}{h}\right)}.$$
Making this substitution, and adding the trimming function, leads to the feasible likelihood criterion

$$L_n(\beta) = \sum_{i=1}^n \left[y_i \ln \hat g_{-i}(X_i'\beta) + (1 - y_i)\ln\left(1 - \hat g_{-i}(X_i'\beta)\right)\right] 1_i(b).$$

Klein and Spady emphasize that the trimming indicator should not be a function of $\beta$, but instead of a preliminary estimator. They suggest

$$1_i(b) = 1\left\{\hat f_{X'\tilde\beta}(X_i'\tilde\beta) \geq b\right\}$$

where $\tilde\beta$ is a preliminary estimator of $\beta$, and $\hat f_{X'\tilde\beta}$ is an estimate of the density of $X_i'\tilde\beta$. Klein and Spady observe that trimming does not seem to matter in their simulations.
The Klein-Spady estimator of $\beta$ is the value $\hat\beta$ which maximizes $L_n(\beta)$.
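A minimal Python sketch of this criterion is below (my own illustration, not Klein and Spady's code). For readability it uses a second-order Gaussian kernel and no trimming, although as noted below Klein and Spady actually require a fourth-order kernel; the first coefficient of $\beta$ is normalized to one:

    import numpy as np
    from scipy.optimize import minimize

    def loo_nw(index, y, h):
        # leave-one-out Nadaraya-Watson estimate of P(y = 1 | index) at each observed index value
        u = (index[:, None] - index[None, :]) / h
        K = np.exp(-0.5 * u**2)
        np.fill_diagonal(K, 0.0)
        return K @ y / K.sum(axis=1)

    def L_n(theta, y, X, h, eps=1e-6):
        # semiparametric log-likelihood with beta = (1, theta); ghat is clipped away from 0 and 1
        beta = np.concatenate(([1.0], theta))
        p = np.clip(loo_nw(X @ beta, y, h), eps, 1.0 - eps)
        return np.sum(y * np.log(p) + (1 - y) * np.log(1.0 - p))

    def klein_spady(y, X, h, theta_start):
        # maximize L_n by minimizing its negative
        res = minimize(lambda t: -L_n(t, y, X, h), theta_start, method="Nelder-Mead")
        return np.concatenate(([1.0], res.x))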
In many respects the Ichimura and Klein-Spady estimators are quite similar.
Unlike Ichimura, Klein and Spady impose the assumption that the kernel $k$ must be fourth-order (i.e. bias reducing). They also impose that the bandwidth $h$ satisfy the rate $n^{-1/6} < h < n^{-1/8}$, which is smaller than the optimal $n^{-1/9}$ rate for a fourth-order kernel. It is unclear to me if these are merely technical sufficient conditions, or if there is a substantive difference with the semiparametric regression case.
Klein and Spady also do not discuss how to select the bandwidth. Following the ideas of Härdle, Hall and Ichimura, it seems sensible that it could be selected jointly with $\beta$ by maximization of $L_n(\beta)$, but this is just a conjecture.
They establish the asymptotic distribution for their estimator. Similarly as in Ichimura, letting $g$ denote the distribution of $e_i$, define the function

$$G(X_i'\beta) = E\left(g(X_i'\beta_0) \mid X_i'\beta\right).$$

Then
$$\sqrt{n}\left(\hat\beta - \beta_0\right) \to_d N\left(0, H^{-1}\right)$$

$$H = E\left(\frac{\partial}{\partial\beta} G(X_i'\beta)\, \frac{\partial}{\partial\beta} G(X_i'\beta)'\, \frac{1}{g(X_i'\beta_0)\left(1 - g(X_i'\beta_0)\right)}\right).$$

They are not specific about the derivative component, but if I understand it correctly it is the same as in Ichimura, so

$$\frac{\partial}{\partial\beta} G(X_i'\beta) \simeq g^{(1)}(X_i'\beta_0)\left(X_i - E(X_i \mid X_i'\beta_0)\right).$$
@
The Klein-Spady estimator achieves the semiparametric efficiency bound for the single-index binary choice model.
Thus in the context of binary choice, it is preferable to use Klein-Spady over Ichimura. Ichimura’s LS estimator is inefficient (as the regression model is heteroskedastic), and it is much easier and cleaner to use the Klein-Spady estimator rather than a two-step weighted LS estimator.

8.5 Average Derivative Estimator


Let the conditional mean be
$$E(y \mid x) = \mu(x).$$

Then the derivative is
$$\mu^{(1)}(x) = \frac{\partial}{\partial x}\,\mu(x)$$

and a weighted average of the derivative is
$$E\left(\mu^{(1)}(X)\, w(X)\right)$$

where $w(x)$ is a weight function. It is particularly convenient to set $w(x) = f(x)$, the marginal density of $X$. Thus Powell, Stock and Stoker (Econometrica, 1989) define this as the average derivative
$$\delta = E\left(\mu^{(1)}(X)\, f(X)\right).$$

This is a measure of the average effect of $X$ on $y$. It is a simple vector, and therefore easier to report than a full nonparametric estimator.
There is a connection with the single index model, where
$$\mu(x) = g(x'\beta),$$

for then
$$\mu^{(1)}(x) = g^{(1)}(x'\beta)\,\beta$$
and
$$\delta = c\,\beta$$
where
$$c = E\left(g^{(1)}(X'\beta)\, f(X)\right).$$

Since $\beta$ is identified only up to scale, the constant $c$ doesn't matter. That is, a (normalized) estimate of $\delta$ is an estimate of normalized $\beta$.
PSS observe that by integration by parts

$$\begin{aligned}
\delta &= E\left(\mu^{(1)}(X)\, f(X)\right) \\
&= \int \mu^{(1)}(x)\, f(x)^2\, dx \\
&= -2\int \mu(x)\, f(x)\, f^{(1)}(x)\, dx \\
&= -2\, E\left(\mu(X)\, f^{(1)}(X)\right) \\
&= -2\, E\left(y\, f^{(1)}(X)\right).
\end{aligned}$$

By the same reasoning as used for cross-validation (CV), an estimator of this is

$$\hat\delta = -\frac{2}{n}\sum_{i=1}^n y_i\, \hat f_{(-i)}^{(1)}(X_i)$$

where $\hat f_{(-i)}(X_i)$ is the leave-one-out density estimator, and $\hat f_{(-i)}^{(1)}(X_i)$ is its first derivative.
This is a convenient estimator. There is no random denominator to interfere with uniform convergence, and only a density estimator is needed, not a conditional mean.
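A minimal Python sketch of this estimator follows (my own illustration, not PSS's code). It uses a product Gaussian kernel and a single scalar bandwidth h for simplicity, even though, as discussed below, PSS require a higher-order kernel once dim(X) > 1:

    import numpy as np

    def pss_average_derivative(y, X, h):
        # deltahat = -(2/n) sum_i y_i * fhat^{(1)}_{(-i)}(X_i), using a leave-one-out
        # kernel estimate of the density derivative at each observation
        n, q = X.shape
        u = (X[None, :, :] - X[:, None, :]) / h                              # u[i, j, :] = (X_j - X_i)/h
        K = np.exp(-0.5 * np.sum(u ** 2, axis=2)) / (2 * np.pi) ** (q / 2)   # product Gaussian kernel
        grad = u * K[:, :, None]                               # kernel gradient terms, up to 1/h factors
        idx = np.arange(n)
        grad[idx, idx, :] = 0.0                                # leave out j = i (already zero since u = 0)
        fgrad = grad.sum(axis=1) / ((n - 1) * h ** (q + 1))    # fhat^{(1)}_{(-i)}(X_i), an (n, q) array
        return -(2.0 / n) * (y[:, None] * fgrad).sum(axis=0)   # q-vector deltahat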
PSS show that $\hat\delta$ is $n^{-1/2}$-consistent and asymptotically normal, with a convenient covariance matrix. The asymptotic bias is a bit complicated.
Let $q = \dim(X)$. Set $p = (q+4)/2$ if $q$ is even and $p = (q+3)/2$ if $q$ is odd; e.g. $p = 2$ for $q = 1$, $p = 3$ for $q = 2$ or $q = 3$, and $p = 4$ for $q = 4$.
PSS require that the kernel used for estimation of $f$ be of order at least $p$: thus a second-order kernel for $q = 1$, and a fourth-order kernel for $q = 2$, $3$, or $4$.
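This rule can be transcribed directly (my own helper, matching the formula above):

    def pss_kernel_order(q):
        # smallest kernel order required by PSS as a function of q = dim(X)
        return (q + 4) // 2 if q % 2 == 0 else (q + 3) // 2

    # pss_kernel_order(1) == 2, pss_kernel_order(2) == 3, pss_kernel_order(3) == 3, pss_kernel_order(4) == 4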
PSS then show that the asymptotic bias satisfies

$$n^{1/2}\, E\left(\hat\delta - \delta\right) = O\left(n^{1/2} h^p\right),$$

which is $o(1)$ if the bandwidth is selected so that $n h^{2p} \to 0$. This is violated ($h$ is too big) if $h$ is selected to be optimal for estimation of $f$ or $f^{(1)}$. The requirement is that the bandwidth must undersmooth in order to reduce the bias. This type of result is commonly seen in semiparametric methods. Unfortunately, it does not lead to a practical rule for bandwidth selection.
