0% found this document useful (0 votes)

130 views27 pages

GAM: The Predictive Modeling Silver Bullet: Author: Kim Larsen

This document introduces Generalized Additive Models (GAM) and argues they are a powerful yet underutilized technique for predictive modeling. Key points: - GAMs are interpretable like GLMs but more flexible by allowing nonlinear relationships without specifying them in advance. - GAMs automatically derive predictor functions during estimation, avoiding tedious transformations required by parametric models. - Regularization of predictor functions in GAMs helps prevent overfitting, balancing bias and variance like l2 regularization. An example demonstrates how the smoothing parameter in GAMs controls this tradeoff. In summary, the document advocates for wider use of GAMs in predictive modeling tasks.

Uploaded by

Nick Corona

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

130 views27 pages

GAM: The Predictive Modeling Silver Bullet: Author: Kim Larsen

Uploaded by

Nick Corona

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 27

GAM: The Predictive Modeling Silver Bullet

Author: Kim Larsen

Introduction

Imagine that you step into a room of data scientists; the dress code is casual
and the scent of strong coffee is hanging in the air. You ask the data scientists if
they regularly use generalized additive models (GAM) to do their work. Very
few will say yes, if any at all.
Now let’s replay the scenario, only this time we replace GAM with, say, random
forest or support vector machines (SVM). Everyone will say yes, and you might
even spark a passionate debate.
Despite its lack of popularity in the data science community, GAM is a powerful
and yet simple technique. Hence, the purpose of this post is to convince more
data scientists to use GAM. Of course, GAM is no silver bullet, but it is a
technique you should add to your arsenal. Here are three key reasons:

• Easy to interpret.
• Flexible predictor functions can uncover hidden patterns in the data.
• Regularization of predictor functions helps avoid overfitting.

In general, GAM has the interpretability advantages of GLMs where the contribu-
tion of each independent variable to the prediction is clearly encoded. However, it
has substantially more flexibility because the relationships between independent
and dependent variable are not assumed to be linear. In fact, we don’t have to
know a priori what type of predictive functions we will eventually need. From an
estimation standpoint, the use of regularized, nonparametric functions avoids the
pitfalls of dealing with higher order polynomial terms in linear models. From an
accuracy standpoint, GAMs are competitive with popular learning techniques.
In this post, we will lay out the principles of GAM and show how to quickly get
up and running in R. We have also put together an PDF that gets into more
detail around smoothing, model selection and estimation.

What is GAM?

Generalized additive models were originally invented by Trevor Hastie and Robert
Tibshirani in 1986 (see [1], [2]). The GAM framework is based on an appealing
and simple mental model:

• Relationships between the individual predictors and the dependent variable

follow smooth patterns that can be linear or nonlinear.

1
• We can estimate these smooth relationships simultaneously and then predict
g(E(Y ))) by simply adding them up.

Mathematically speaking, GAM is an additive modeling technique where the

impact of the predictive variables is captured through smooth functions which—
depending on the underlying patterns in the data—can be nonlinear:

s1(x1) s2(x2) sp(xp)

g(E(Y)) = + +…+

x1 x2 xp

We can write the GAM structure as:

g(E(Y )) = α + s1 (x1 ) + · · · + sp (xp ),

where Y is the dependent variable (i.e., what we are trying to predict), E(Y )
denotes the expected value, and g(Y ) denotes the link function that links the
expected value to the predictor variables x1 , . . . , xp .
The terms s1 (x1 ), . . . , sp (xp ) denote smooth, nonparametric functions. Note
that, in the context of regression models, the terminology nonparametric means
that the shape of predictor functions are fully determined by the data as opposed
to parametric functions that are defined by a typically small set of parameters.
This can allow for more flexible estimation of the underlying predictive patterns
without knowing upfront what these patterns look like. For more details on
how to create these smooth functions, see the section called “Splines 101” in the
PDF.
Note that GAMs can also contain parametric terms as well as two-dimensional
smoothers. Moreover, like generalized linear models (GLM), GAM supports
multiple link functions. For example, when Y is binary, we would use the logit
link given by

P (Y = 1)
g(E(Y )) = log .
P (Y = 0)

Why Use GAM?

As mentioned in the intro, there are at least three good reasons why you want
to use GAM: interpretability, flexibility/automation, and regularization. Hence,
when your model contains nonlinear effects, GAM provides a regularized and
interpretable solution – while other methods generally lack at least one of

2
these three features. In other words, GAMs strike a nice balance between the
interpretable, yet biased, linear model, and the extremely flexible, “black box”
learning algorithms.

Interpretability

When a regression model is additive, the interpretation of the marginal impact

of a single variable (the partial derivative) does not depend on the values of the
other variables in the model. Hence, by simply looking at the output of the model,
we can make simple statements about the effects of the predictive variables that
make sense to a nontechnical person. For example, for the graphic illustration
above, we can say that the (transformed) expected value of Y increases linearly
as x2 increases, holding everything else constant. Or, the (transformed) expected
value of Y increases with xp until xp hits a certain point, etc.
In addition, an important feature of GAM is the ability to control the smoothness
of the predictor functions. With GAMs, you can avoid wiggly, nonsensical
predictor functions by simply adjusting the level of smoothness. In other words,
we can impose the prior belief that predictive relationships are inherently smooth
in nature, even though the dataset at hand may suggest a more noisy relationship.
This plays an important role in model interpretation as well as in the believability
of the results.

Flexibility and Automation

GAM can capture common nonlinear patterns that a classic linear model would
miss. These patterns range from “hockey sticks” – which occur when you observe
a sharp change in the response variable – to various types of “mountain shaped”
curves:

g(E(Y))

When fitting parametric regression models, these types of nonlinear effects are
typically captured through binning or polynomials. This leads to clumsy model
formulations with many correlated terms and counterintuitive results. Moreover,
selecting the best model involves constructing a multitude of transformations,
followed by a search algorithm to select the best option for each predictor – a
potentially greedy step that can easily go awry.
We don’t have this problem with GAM. Predictor functions are automatically
derived during model estimation. We don’t have to know up front what type of

3
functions we will need. This will not only save us time, but will also help us find
patterns we may have missed with a parametric model.
Obviously, it is entirely possible that we can find parametric functions that look
like the relationships extracted by GAM. But the work to get there is tedious,
and we do not have 20/20 hindsight prior to model estimation.

Regularization

As mentioned above, the GAM framework allows us to control smoothness of the

predictor functions to prevent overfitting. By controlling the wiggliness of the
predictor functions, we can directly tackle the bias/variance tradeoff. Moreover,
the type of penalties applied in GAMs have connections to Bayesian regression
and l2 regularization (see the PDF for details).
In order to see how this works, let’s look at a simple, simulated example in R.
We are simulating a dataset with 100 data points and two variables, x and Y .
The true relationship between x and Y follows the sine function, but our data
has normally distributed random errors.

set.seed(3)
x <- seq(0,2*pi,0.1)
z <- sin(x)
y <- z + rnorm(mean=0, sd=0.5*sd(z), n=length(x))
d <- cbind.data.frame(x,y,z)

We want to predict Y given x by fitting the simple model:

y = sλ (x) + e,

where sλ (x) is some smooth function. The level of smoothness is determined by

the smoothing parameter, which we denote by λ. The higher the value of λ, the
smoother the curve. In the PDF, you can find a more details on how λ works to
create smoothness as well as how to estimate s(x). But for now, let’s just think
of s(x) as a smooth function. For more details on smoothers, see the section
called “Splines 101” in the PDF.
We fit the model above to the simulated data with four different values for λ.
For each value of λ, we calculated the distance against the true function (the
underlying sine curve). The results are shown in the charts below. The dots
represent the actual data points, the punctuated line is the true curve, and the
solid line is this smoother.
Clearly, the model with λ = 0 provides the best fit of the data, but the resulting
curve looks very wiggly and would be hard to explain. Moreover, it has the
highest distance to the sine curve, which means that it does not do a good job

4
of capturing the true relationship. Indeed, the best choice in this case seems to
be some intermediate value, like λ = 0.6.
Notice how the smoothing parameter allows us to explicitly balance the
bias/variance tradeoff; smoother curves have more bias (in-sample error), but
also less variance. Curves with less variance tend to make more sense and
validate better in out-of-sample tests. However, if the curve is too smooth, we
may miss an important pattern.

Lambda=0, Dist = 2.31 Lambda=0.6, Dist = 0.95

1 1

0 0
y

−1 −1

0 2 4 6 0 2 4 6
x x

Lambda=0.3, Dist = 1.62 Lambda=1, Dist = 2.94

1 1

0 0
y

−1 −1

0 2 4 6 0 2 4 6
x x

Smoothing 101

Smoothers are the cornerstones of GAM and hence a quick overview is in order
before we get into model estimation. At a high level, there are three classes of
smoothers used for GAM:

• Local regression (loess)

• Smoothing splines
• Regression splines (B-splines, P-splines, thin plate splines)

In general, regression splines are more practical. They are computationally

cheap, and can be written as linear combinations of basis functions that do not

5
depend on the dependent variable, Y , which is convenient for prediction and
estimation.

Local Regression (loess)

Loess belongs to the class of nearest neighborhood-based smoothers. In order

to appreciate loess, we have to understand the most simplistic member of this
family: the running mean smoother.
Running mean smoothers are symmetric, moving averages. Smoothing is achieved
by sliding a window based on the nearest neighbors across the data, and com-
puting the average of Y at each step. The level of smoothness is determined by
the width of the window. While appealing due to their simplicity, running mean
smoothers have two major issues: they’re not very smooth and they perform
poorly at the boundaries of the data. This is a problem when we build predictive
models, and hence we need more sophisticated choices, such as loess.

How Loess Works

Loess produces a smoother curve than the running mean by fitting a weighted
regression within each nearest-neighbor window, where the weights are based
on a kernel that suppresses data points far away from the target data point.
For example, to produce a loess-smoothed value for target data point x, loess
deploys the following steps:

1. Determine smoothness using the span parameter. For example, if span =

0.6, each symmetric sliding neighborhood will contain 60% of the data –
(30% to the left and 30% to the right).
2. Calculate di = (xi − x)/h where h is the width of the neighborhood.
Create weights using the tri-cube function wi = (1 − d3i )3 , if xi is inside
the neighborhood, and 0 elsewhere.
3. Fit a weighted regression with Y as the dependent variable using the weights
from step 3. The fitted value at target data point x is the smoothed value.

Below is a loess smoother applied to the simulated data, loess function in R with
a span of 0.6. As we can see, loess overcomes the issues with the running mean
smoother:

Smoothing Splines

Smoothing splines take a completely different approach to deriving smooth

curves. Rather than using a nearest-neighbor moving window, we estimate the
smooth function by minimizing the penalized sum of squares

6
Basic Runnuing Mean Loess

1 1

0 0

−1 −1

0 2 4 6 0 2 4 6
x x

n
X Z
2
(yi − f (xi )) + λ (s00 (x))2 dx,
i=1

where the residual sum of squares

n
X
(yi − s(xi ))2
i=1

ensures that we fit the observed data, while the penalty term
Z
λ (s00 (x))2 dx

imposes smoothness (i.e., penalizes wiggliness).

Note that the penalty term imposes smoothness by calculating the integrated
square of the second derivatives. Intuitively, this makes sense: the second
derivative measures the slopes of the slopes. Thus, a wiggly curve will have
large second derivatives, while a straight line will have second derivatives of 0.
Hence we’re essentially “adding up” the squared second derivatives to measure
the wiggliness of the curve.

7
The tradeoff between model fit and smoothness is controlled by the smoothing
parameter, λ. Clearly, the smoothing parameter operates in a different way than
the span parameter in loess, which controls the width of the window, although
they both serve the same ultimate purpose.
Interestingly, it turns out that the function that minimizes the penalized sum
of squares is a natural cubic spline with knots at every data point, which is
also known as a smoothing spline. However, for predictive modeling, smoothing
splines have a major drawback: it is not practical to have knots at every data
point when dealing with large models. Moreover, having knots at every data
point only helps us in situations where we want to estimate wiggly functions
with small values of λ. This is a rare use case for predictive modeling where
we generally want to avoid overfitting. Thus, a smoothing spline is essentially
wasteful, as the effective degrees of freedom used will be much smaller than the
number of knots (due to the penalty).

Regression Splines

Regression splines offer a more practical alternative to smoothing splines. The

main advantage of regression splines is that they can be expressed as a linear
combination of a finite set of basis functions that do not depend on the dependent
variable Y , which is practical for prediction and estimation.
We can write a regression spline of order q as

K
X
s(x) = Bl,q (x)βl = B 0 β,
l=1

where Bp,1 (x), . . . , Bp,K (x) are basis functions, B is the model matrix of basis
functions, and β = [β1 : β2 : · · · : βp ] are the coefficients. The number of basis
functions depends on the number of inner knots – a set of ordered, distinct values
of xj – as well as the order of the spline. Specifically, if we let m denote the
number of inner knots, the number of basis functions is given by K = p + 1 + m.

How Regression Splines Work

To see how this works, let’s try fitting a simple, non-penalized cubic B-spline to
the data from the example above. This requires a total of 2(q + 1) + m knots,
where the additional knots are q + 1 equal boundary knots and q + 1 equal lower
boundary knots. The boundary knots are arbitrary as long as they are outside
the inner knots. The equivalent knots at the boundaries are needed to ensure
that the spline passes through the boundary knots.
Given a set of knots, k1 , . . . , k(2(q+1)+m) , we can calculate the basis functions
using a recursive formula (see [6])

8
Bj,0 (x) = I(kj ≤ x < kj+1 ) Bj,q (x)
x − kj kj+q+1 − x
= Bj,i−1 (x) + Bj+1,q−1 (x).
kj+q − kj kj+q+1 − kj+1

In this example we are taking the easy route by using quantiles to define the
inner knots. The outer knots are set to the min and max of x:

min(x)
> 0
max(x)
> 6.2
quantile(x, probs=c(0.25, .50, .75))
> 1.55 3.10 4.65

Since this is a cubic spline, we only need the third order basis functions – i.e.,
B3,1 , . . . , B3,7 . But, due to the recursive relationship, we cannot calculate these
functions without first calculating the lower order basis functions.
Here is how you can use R to create basis functions and estimate their corre-
sponding, non-penalized coefficients:

### Create basis functions

B <- bs(x, degree=3, intercept=TRUE, Boundary.knots=c(0, 6.2), knots=c(1.55, 3.10, 4.65))

### Get the coefficients

model <- lm(y~0 + B)

### The fitted values from the lm object are the smooth values
smoother <- lm$fitted

Generally, one does not need to worry too much about knot placement. Quantiles
seem to work well in most cases (although more than three knots is usually
required). For example, here we are getting a decent fit with only three inner
knots based on arbitrarily chosen quantiles, see the graphic on page 10.
Last, but not least, plotting the basis functions, along with the final spline, helps
illuminate what is going on behind the scenes. The plot below shows the basis
functions multiplied by their respective coefficients – i.e., B3,j βj – along with
the final spline. It is easy to imagine how more knots, which means more basis
functions, create a more flexible curve, see the graphic on page 10.

Penalized Regression Splines

In the simple example above, the only lever we have to control the smoothness of
the B-spline is the number of knots – fewer knots translate into more smoothness.

9
Cubic B−Spline (3 inner knots, no penalty)

−1

0 2 4 6
x

0.5

variable

B13

B23
0.0
B33

B43

B53

B63

B73

Spline
−0.5

−1.0

0 2 4 6
x

10
However, we can apply a penalty when estimating the basis function coefficients
to promote smoothness, just like we do with full smoothing splines. Thus, instead
of solving for βj with a standard linear model like we did above, we can minimize
the penalized sum of squares to get the smoother for x
n o
min ||y − B 0 β||2 + β 0 P β .
β

Note that the coefficients applied to the basis functions are essentially amplifiers
of the curvature of the spline. Hence, a popular way to penalize B-spline basis
functions is to use P-splines which efficiently impose smoothness by directly
penalizing the differences between adjacent coefficients. For example, for a
P-spline, the penalty term can look like this:

K−1
X
β0P β = (βl+1 − βl )2 .
l=1

There are many other available choices for regression splines, but that is beyond
the scope of this post. Typically, you don’t need anything fancier than the splines
covered here. For more on penalty matrices and different type of smoothers, see
[3].
In the next section we will discuss how to minimize the penalized sum of squares
for a model with more than one smoother – which is the ultimate use case we
are after.

Estimating GAMs

As mentioned in the intro, GAMs consist of multiple smoothing functions. Thus,

when estimating GAMs, the goal is to simultaneously estimate all smoothers,
along with the parametric terms (if any) in the model, while factoring in the
covariance between the smoothers. There are two ways of doing this:

• Local scoring algorithm.

• Solving GAM as a large GLM with penalized iterative reweighted least
squares (PIRLS).

For details on GAM estimation, see the “Estimation” section in the PDF.
In general, the local scoring algorithm is more flexible in the sense that you can
use any type of smoother in the model whereas the GLM approach only works
for regression splines (see the “Smoothing 101” section in the PDF). However,
the local scoring algorithm is computationally more expensive and it does not
lend itself as nicely to automated selection of smoothing parameters as the GLM
approach.

11
When fitting a GAM, the choice of smoothing parameters – i.e., the parameters
that control the smoothness of the predictive functions – is key for the aesthetics
and fit of the model. We can choose to pre-select the smoothing parameters or
we may choose to estimate the smoothing parameters from the data. There are
two ways of estimating the smoothing parameter for a logistic GAM:

• Generalized cross validation criteria (GCV).

• Mixed model approach via restricted maximum likelihood (REML).

REML only applies if we are casting GAM as a large GLM. Generally the REML
approach converges faster than GCV, and GCV tends to under-smooth (see [3],
[9]).

Penalized Likelihood

For both local scoring and the GLM approach, the ultimate goal is to maximize
the penalized likelihood function, although they take very different routes. The
penalized likelihood function is given by

2l(α, s1 (x1 ), . . . , sp (xp )) − penalty,

where l(α, s1 , . . . , sp ) is the standard log likelihood function. For a binary GAM
with a logistic link function, the penalized likelihood is defined as

n
X
l(α, s1 (x1 ), . . . , sp (xp )) = (yi log p̂i + (1 − yi ) log(1 − p̂i )).
i=1

 −1
p
X
p̂i = 1 + exp(−α̂ − sj (xij )) .
j=1

where p̂i is given by

 −1
p
X
p̂i = P (Y = 1|x1 , . . . , xp ) = 1 + exp(−α̂ − sj (xij )) .
j=1

The penalty can, for example, be based on the second derivatives

p
X Z
penalty = λj (s00j (xj ))2 dx.
j=1

12
The parameters, λ1 , . . . , λp , are the aforementioned smoothing parameters which
control how much penalty (smoothness) we want to impose on the model. The
higher the value of λj , the smoother the curve. These parameters can be
preselected or trained from the data. See the “Estimation” section of the PDF
for more details.
Intuitively, this type of penalty function makes sense: the second derivative
measures the slopes of the slopes. This means that wiggly curve will have large
second derivatives, while a straight line will have second derivatives of 0. Thus we
can quantify the total wiggliness by “adding up” the squared second derivatives.

Local Scoring Algorithm

The local scoring algorithm is an extension of the backfitting algorithm, which

in turn is based on the Gauss-Seidel procedure for solving linear systems.
This is an iterative procedure with multiple nested loops. The backfitting/Gauss-
Seidel framework can be used to solve a wide array of messy systems, and does
not require calculation of derivatives. Here is how it works (see [2]):
Step 1: Set all smooth functions to 0, i.e., sj (xj ) ≡ 0
Step 2: Cycle through variables to get the smooth functions
First, define the estimated log-odds for observation i , i = 1, . . . , n, as

p
X
νi = α̂ + sj (xij ),
j=1

and then construct the pseudo dependent variable

yi − p̂i
zi = νi + ,
p̂i (1 − p̂i )

as well as the weights

wi = p̂i (1 − p̂i ).

To get the function s1 (x1 ) simply smooth the pseudo dependent variable z
against x1 , using the weights defined above. We can then do the same thing for
x2 , after adjusting ν, p̂, w to account for the change to s1 (x1 ). Same goes for
x3 , . . . , x p .
Note that cycling through the predictor to get the weighted smoothers requires
an extra layer of iterations because the weights change at every iteration. Hence,
there are a significant amount of computations to be done.
Step 3: Repeat step 2 until the functions converge.

13
Solving GAM as a Large GLM

The basic idea here is to recast a GAM as a parametric, penalized GLM. The
GLM approach is a more direct approach as it reduces step 2 of the local scoring
to a single step where the coefficients are estimated simultaneously. Moreover, it
comes with the properties of the battle-tested GLM framework.

Casting GAM as a Large GLM

As discussed above, regression splines can be written as a linear combination of
the basis functions and the corresponding coefficients. For example, the spline
for predictor xj can be written as

sj (xj ) = Bj0 βj ,

where Bj is a matrix of basis functions for xj , and βj = [βj,1 , . . . , β1,Kj ] denotes

the corresponding coefficients.
If we create the pseudo dependent variable z as in step 2 of the local scoring
algorithm, and the corresponding weighting matrix W = diag(w1 , . . . , wp ), we
can get the coefficients by minimizing the penalized sum of squares given by

√
|| W (z − B 0 β)||2 + β 0 P β.

Here B = [B1 : B2 : · · · : Bp ] is P
the overall model matrix, which contains all basis
functions and hence will have Kl columns and n rows. The overall penalty
matrix P is a block diagonal matrix defined as

P = block diag(λ1 P1 , . . . , λp Pp ).

Penalized Re-weighted Iterative Least Squares (PIRLS)

On the surface, this looks like a garden-variety, penalized least squares problem.
However, z and W are not static since they depend on the fitted values (estimated
probabilities). Hence, we need an iterative approach to get β̂. This approach is
called penalized re-weighted iterative least squares (PIRLS) which is an extension
of the widely used iterative re-weighted least squares (IRLS) used for GLMs.
The coefficient estimates at iteration k are given by

β̂ (k+1) = (B 0 W(k) B + P )−1 B 0 W(k) z(k) ,

and we continue to increment k until we reach convergence. Note that the

iteration subscript applied to W and z indicate that they both depend on the
estimated probabilities p̂ which change at every iteration.

14
Bayesian Interpretation

It turns out that the penalties imposed by GAM are equivalent to treating the
coefficients of the large GLM as random variables with the normally distributed
priors (see [9], [10])

β ∼ N 0, σ 2 P −1 .

This connection makes sense: when we estimate smooth functions in a GAM

we are essentially imposing the prior belief that the predictive functions follow
smooth patterns. Moreover, the fact that GAM is a regularized model means
that GAMs have some level of built-in fortification against multicollinearity.
Note that if we have no penalty – i.e., if λj → 0 for all j – the variance goes
to infinity and we get a “free” estimate. Conversely, if we set all λj = ∞, all
coefficients are shrunk to 0. Moreover, if the penalty matrix is an identity matrix,
which means that we are simply “muting” the splines in a blanket fashion, we
get ridge regression (L2 regularization).

Effective Degrees of Freedom

For a regular GLM, the degrees of freedom equal the number of parameters to
be estimated, which can also be calculated by taking the trace of the hat matrix:

model df = tr(H) = tr X(X 0 X)−1 X 0 .

We can follow the same idea for a GAM and calculate the effective degrees of
freedom as

model edf = tr B(B 0 W B + P )−1 B 0 W .

If there is no penalty, the model is a regular large GLM and the trace is the
number of parameters. However, the number of effective degrees of freedom will
decrease as the as the smoothing parameters, λ1 , . . . , λp , increase.
This is important, as GAM software packages will show effective degrees of
freedom as part of the model output.

Selecting the Smoothing Parameter

Until now, we have assumed that the smoothing parameters are pre-determined.
However, although a λ of 0.6 seems to work quite well for most models, we can
determine the smoothing parameters from the data. It is more costly from a
computational perspective, but more appealing from a scientific perspective.

15
At a high level, there are two ways of estimating the smoothing parameter for a
logistic GAM:

• Generalized cross validation criteria (CGV).

• Mixed model approach via restricted maximum likelihood (REML).

REML only applies if we are casting GAM as a large GLM. Generally the REML
approach converges faster than GCV, and GCV tends to under-smooth (see [3],
[9]).
Optimizing the smoothing parameters requires an extra layer of iterations. This
is either done through an inner loop or an outer loop. The inner loop approach
optimizes the smoothing parameter within each iteration of PIRLS, while the
outer loop runs a full PIRLS for each trial vector λ = λ1 , . . . , λp . In the mgcv
R package, which we will discuss later, these two approaches are referred to as
performance iterations and nested iterations, respectively.

Generalized Cross Validation Criteria (GCV)

This is based on a “leave one out” cross-validation approach. The strategy is to

remove one data point at a time, fit a smoother to the remaining data, and then
fit of the smoother against the entire dataset. The goal is to pick the λj that
minimizes the average error across all n validations.
Fortunately, it turns out that we do not have to fit n smoothers to achieve this
(for details see [3], [7]). In fact, for a logistic GAM, we can use the GCV statistic:
√
n|| W (z − B 0 β)||2
GCV = ,
(n − tr(H))2

where H is the hat matrix and B is the model matrix consisting of basis functions.
This statistic essentially calculates the error of the model and adjusts for the
degrees of freedom and is a linear transformation of the AIC statistic. Hence
we can use this statistic for model comparison in general, not just selection of
smoothing parameters.
When running GAMs in R, you will see the UBRE score in outputs. This is
essentially the GCV score when the scale parameter of the distribution of the
response variable is known. For example, when Y is binomial where the scale
parameter is known, you will see the UBRE score used in outputs. See [3] for
more details.

REML

Since GAM has a Bayesian interpretation (see the section on Bayesian Interpre-
tation above), we can treat it like a standard mixed model by separating out the

16
fixed effects and estimating the smoothing parameters as variance parameters.
(Note that the variance of the coefficients depend on P , which in turn depends
on λ = λ1 , . . . , λp .)
Here is how it works: the restricted likelihood function, given the vector of
smoothing parameters, λ, is obtained by integrating out beta from the joint
density of the data and the coefficients
Z
lr (β̂, λ) = f (y|β)f (β)dβ.

The restricted likelihood function depends on λ and the estimates β̂ (through

the penalty), but not the random parameters β. Thus we can use this function
to derive trial vectors for λ for a nested PIRLS iteration:

1. Given a trial vector λ, estimate β using PIRLS.

2. Update λ by maximizing the restricted log likelihood.
3. Repeat steps 1 and 2 until convergence.

For more details see [4] and [10].

Variable Selection

GAM is not the type of modeling technique where you can leisurely throw in
hundreds of variables to see what “sticks.” GAMs are somewhat parameter-heavy,
which can carry nontrivial computational costs for large models. Moreover, GAM
is not immune to multicollinearity, which means that the “kitchen sink” approach
can lead to strange results.
Hence it is advised to pre-screen the candidate predictors before we run a GAM.
In other words, we need a way to measure the “univariate” strength of each
predictor and use this information to remove variables that are never going to
contribute in any meaningful way.

Variable Pre-Screening With the Information Value

Perhaps the most powerful framework for exploring the univariate strength of
a predictive variable, within the context of a binary regression model, is an
approach that has close ties to the Kullback-Leibler divergence: the information
value (IV) and weight of evidence (WOE).
This framework has a number of appealing features:

• Detect any relationship between a predictor xj and a dependent variable

Y , whether it is nonlinear or linear.

17
• Visualize the relationship between xj and Y .
• Assess the predictive value of missing values.
• Seamlessly compare the strength of continuous and categorical variables.

The WOE/IV framework is based on the following relationship:

P (Y = 1|xj ) P (Y = 1) f (xj |Y = 1)
log = log + log .
P (Y = 0|xj ) P (Y = 0) f (xj |Y = 0)

This is saying that the conditional logit, given xj , can be written as the overall
log-odds (the “intercept”) plus the log-density ratio – also known as the weight of
evidence (WOE). Thus, when we are fitting a logit model, we are – whether we like
it or not – attempting to estimate WOE. Hence, for pre-modeling visualization,
we can plot WOE against xj in order to visualize how xj affects Y in a univariate
setting.

Measuring Univariate Variable Strength

We can leverage WOE to measure the predictive strength of xj – i.e., how well
it helps us separate cases when Y = 1 from cases when Y = 0. This is done
through the information value (IV), which is defined like this:

f (xj |Y = 1)
Z
IV = log (f (xj |Y = 1) − f (xj |Y = 0)) dx.
f (xj |Y = 0)

Note that the IV is essentially a weighted sum of all the individual WOE
values where the weights incorporate differences between the numerators and
the denominators. Generally, if IV < 0.05, the variable has very little predictive
power.

Estimating WOE
The most common approach to estimating the conditional densities needed to
calculate WOE is to bin xj and then use a histogram-type estimator.
Here is how it works: create a k × 2 table where k is the number of bins, and
the cells within the two columns count the number of records where Y = 1 and
Y = 0, respectively. The conditional densities are then obtained by calculating
the “column percentages” from this table. The typical number of bins used is
10-20. If xj is categorical, no binning is needed and the histogram estimator can
be used directly.
If B1 , . . . , Bk denote the bins for xj , the WOE for bin i can be written as

P (Xj ∈ Bi |Y = 1)
WOE(xj )i = log ,
P (Xj ∈ Bi |Y = 0)

18
and the IV is the weighted sum of the k WOE values

X
IV(xj ) = k(P (Xj ∈ Bi |Y = 1) − P (Xj ∈ Bi |Y = 0)) × WOE(xj )i
i=1

You can use the Information R package to do these calculations. The package
also allows for cross validation of IV and can be downloaded from this repository:
https://github.com/klarsen1/Information

Multivariate Variable Selection

Pre-screening of variables is simply a way to reduce the pool of variables we

want to search from. It does not dictate the final model. Whether or not we
choose to perform variable screening, chances are that we still need to apply
some sort of “multivariate” selection of variables inside GAM to select the final
model – just like we would with any other type of regression model.
There are two available approaches to variable selection in R:

1. Stepwise selection (forward and backward)

2. Shrinkage

Stepwise selection needs no explanation. The spirit of the shrinkage approach

is similar to lasso regression, although the paradigm of these two approaches
is quite different. The basic idea is to add additional penalties to the model,
by adding constants to the diagonal of the penalty matrix, in order to “shrink”
smoothers that are very wiggly (during REML) to 0. Hence, instead of physically
removing variables in a stepwise fashion, we are leaving weak smoothers in the
model with coefficients near 0. For more information, see [3] and the mgcv
documentation.

Fitting GAMs in R

The two main packages in R that can be used to fit generalized additive models
are gam and mgcv. The gam package was written by Trevor Hastie and closely
follows the theory outlined in [2]. The mgcv package was written by Simon
Wood, and, while it follows [2] in many ways, it is much more general because it
considers GAM to be any penalized GLM (for more details, see [3]).
The differences are described in detail in the documentation for mgcv. Here is a
cheat sheet:

Component gam mgcv

Confidence intervals Frequentist Bayesian

19
Component gam mgcv
Splines Smoothing splines and Does not support loess
loess or smoothing splines, but
supports a wide array
of regression splines (P-
splines, B-splines, thin
plate splines, tensors) +
tensors
Parametric terms Supported Supported, and you can pe-
nalize or treat as random
effects
Variable selection Stepwise selection Shrinkage
Optimization Local scoring PIRLS
Selecting smoothing No default approach Finds smoothing parame-
parameters ters by default. Supports
both REML and GCV
Large datasets Can parallelize stepwise Special bam function for
variable selection with the large datasets. Can also
doMC package parallelize certain opera-
tions in the gam function
through openMP
Missing values Clever approach to No special treatment.
dealing with miss- Omits observations with
ing values through missing values
na.action=gam.replace
Multi dimensional Supported with loess Supported with tensors
smoothers and thin plate splines
Model diagnostics Standard GAM diagnos- Standard GAM diagnos-
tics tics + the concurvity mea-
sure which is a generaliza-
tion of collinearity

gam and mgcv do not work well when loaded at the same time. Restart the R
session if you want to switch between the two packages – detaching one of the
packages is not sufficient.
Here is an example of how to fit a GAM in R:

### GAM example using mgcv

library(mgcv)
library(ggplot2)
# fake data
n <- 50
sig <- 2

20
dat <- gamSim(1,n=n,scale=sig)

# P-spline smoothers (with lambda=0.6) used for x1 and x2; x3 is parametric.

b1 <- mgcv::gam(y ~ s(x1, bs='ps', sp=0.6) + s(x2, bs='ps', sp=0.6) + x3, data = dat)
summary(b1)
plot(b1)

# plot the smooth predictor function for x1 with ggplot to get a nicer looking graph
p <- predict(b1, type="lpmatrix")
beta <- coef(m)[grepl("x1", names(coef(b1)))]
s <- p[,grepl("x1", colnames(p))] %*% beta
ggplot(data=cbind.data.frame(s, dat$x1), aes(x=dat$x1, y=s)) + geom_line()

# predict
newdf <- gamSim(1,n=n,scale=sig)
f <- predict(b1, newdata=newdf)

# select smoothing parameters with REML, using P-splines

b2 <- mgcv::gam(y ~ s(x1, bs='ps') + s(x2, bs='ps') + x3, data = dat, method="REML")

# select variables and smoothing parameters

b3 <- mgcv::gam(y ~ s(x0) + s(x1) + s(x2) + s(x3) , data = dat, method="REML", select=TRUE)

# loess smoothers with the gam package (restart R before loading gam)
library(gam)
b4 <- gam::gam(y ~ lo(x1, span=0.6) + lo(x2, span=0.6) + x3, data = dat)
summary(b4)

Comparing GAM Performance With Other Techniques

Business Problem

We will be using a marketing example from the insurance industry (source

undisclosed). The data contains information on customer responses to a historical
direct mail marketing campaign. Our goal is to improve the performance of
future waves of this campaign by targeting people who are likely to take the
offer. We will do this by building a “look-alike” model to predict the probability
that a given client will accept the offer, and then use that model to select the
target audience going forward[1f].
Obviously, we want a model that is accurate so that we can find the best
possible target audience. In addition, we want to be able to provide insights

21
from the model, such as partial impact charts, that show how the average
propensity changes across various client features. We want to make sure that
the relationships we find stand to reason from a business perspective.

Data

The dataset has 68 predictive variables and 20k records. For modeling and
validation purposes, we split the data into 2 parts:

• 10k records for training. This dataset will be used to estimate models.
• 10k records for testing. This dataset will be kept in a vault to the very
end and used to compare models.

The success of the model will be based on its ability to predict the probability
that the customer takes the offer (captured by the PURCHASE indicator), for
the validation dataset.
Most variables contain credit information, such as number of accounts, active
account types, credit limits, and utilization. The dataset also captures the age
and location of the individuals.
Let’s return to our marketing case study. Recall that we are trying to predict
whether a person takes a direct marketing offer. Hence, we are trying to build a
GAM that looks like this:

P (convert)
log = s1 (x1 ) + · · · + sp (xp ) + x0 β
1 − P (convert)

where
( x’β
)areparametricterms(dummyvariablesinourcase).

Model Comparison Strategy

We built six models with six different techniques using the training dataset. The
models were then validated against the validation dataset. The area under the
ROC curve was used to evaluate model performance.
In order to make the comparison as fair as possible, we used the same set
of variables for each model. The variables were selected using the following
procedure:

1. Remove all variables with an information value (IV) less than 0.05. (See
the PDF for more details on IV.). You can use the Information Package
to calculation information values.

22
2. Eliminate highly correlated variables using variable clustering (ClustOfVar
package). We generated 20 clusters and picked the variable with the highest
IV within each cluster.

Obviously, we could have used variable selection techniques inside GAM as

described above, but we wanted to use the same 20 variables for each model.
List of the seven models tested:

1. Random forest with 100 trees using the openMP enabled randomForestSRC
package.
2. GAM (mgcv) using P-splines with smoothing parameters of 0.6 for all
variables (except dummy variables).
3. Same as #2, but optimal smoothing parameters are selected with REML
(instead of using 0.6 for all variables).
4. Same as #2, but optimal smoothing parameters are selected with REML
(see the PDF for details) and weak variables are shrunk towards 0 using
selection=TRUE in mgcv.
5. SVM built with the e1071 package, using a Gaussian radial kernel.
6. KNN classifier with k=100. Distance metrics were weighted using an
Epanechnikov kernel. See the kknn package for more details.
7. Linear logistic regression model.

The code used can be downloaded from the github repository.

Testing Results

Model Validation AUROC Estimation Time Scoring Time

Random forest 0.809 6.39 39.38
GAM, lambda=0.6 0.807 3.47 0.52
GAM, estimate lambdas 0.815 42.72 0.29
GAM, estimate lambdas, extra shrinkage 0.814 169.73 0.33
SVM 0.755 13.41 1.12
Linear logit 0.800 0.1 0.006
KNN with K=100 0.800 NA 3.34

Note that in order to get the AUROC for the SVM, we used the enhanced
version of Platt’s method to convert class assignments into probabilities (to get a
continuous measure, see [11]). Settings for KNN and SVM were based on trying
different combinations.
As we can see, GAM performs well compared to the other methods. Obviously,
this test is based on a single dataset, so no universal conclusions can be drawn,

23
but the dataset has enough correlation and “chunky” variables to make the
results relevant.
The GAM models where smoothing parameters were automatically selected with
REML perform better than the model where we used a flat smoothing parameter
of 0.6 across all variables (which tends to work well for most models). However,
in this example, the models with automatic selection also tend to produce more
wiggly functions than the model with λ = 0.6 across all variables. For a targeting
model, the additional wiggliness is not worth the loss of model intuition.
The biggest surprises in this test are the performances of SVM and the linear
logit model. The linear logit model is performing surprisingly well given that
the strongest variable in the model (N_OPEN_REV_ACTS) is not linearly
correlated with the log odds of success (PURCHASE). The reason could be
that, while this relationship is not linear, it is monotonic. Also, the AUROC is
based on the estimated probability, which is indeed not linear in the predictive
variables due to the Sigmoidal transformation (1 + exp(−ν))−1 . SVM on the
other hand, is performing surprisingly poorly. However, it should be mentioned
that the author of this post has very little experience with SVM, which could
be a disadvantage for SVM. Also, the Pratt conversion of SVM classification to
probabilities could play a role here.

Partial Relationships

As stated earlier, an important part of this modeling exercise is to look at the

partial relationships between the binary dependent variable (PURCHASE) and
the predictors.
This is shown below for the variable N_OPEN_REV_ACTS (number of open
revolving accounts) for random forest and GAM. Note that for the random forest
model, these plots are generated by sending different values of xj (in our case
20) through the forest and getting the estimated probabilities at each value of
xj . For GAM, we simply plot the final regression spline.
Note that, unlike GAM, random forest does not try to promote smoothness.
This clearly shows in the chart below, as the GAM-based predictive function is
smoother than the one from random forest. However, the GAM model does some
potentially dangerous interpolation beyond x = 20 where the data is thin. (only
1.64% of the sample have N_OPEN_REV_ACTS>20 although the conversion
rate for this group is 2.3 times higher than the average).
And here are the partial impact plots for one the weakest variables. The random
forest curve does not look very intuitive:

24
Random Forest GAM (lambda=0.6)

0.6 0.6

0.5 0.5

0.4 0.4
P(Y=1)

0.3 0.3

0.2 0.2

0 20 40 0 20 40
x x

Random Forest GAM (lambda=0.6)

0.30 0.30

0.25 0.25
P(Y=1)

0.20 0.20

0.15 0.15

0 25 50 75 100 0 25 50 75 100
x x

25
Final Words

As stated in the introduction, the purpose of this post is to get more data
scientists to use GAM. Hopefully, after reading this post, you’ll agree that GAM
is a simple, transparent, and flexible modeling technique that can compete with
other popular methods. The code in the github repository should be sufficient
to get started with GAM.
Of course, GAM is no silver bullet; one still needs to think about what goes
into the model to avoid strange results. In fact, random forest is probably the
closest thing to a silver bullet. However, random forest is much more of a black
box, and you cannot control smoothness of the predictor functions. This means
that you cannot combat the bias variance tradeoff as directly as with GAMs or
ensure interpretable predictor functions. For those reasons, every data scientist
should make room in their toolbox for GAM.

References

[1] Hastie, Trevor and Tibshirani, Robert. (1990), Generalized Additive Models,
New York: Chapman and Hall.
[2] Hastie, Trevor and Tibshirani, Robert. (1986), Generalized Additive Models,
Statistical Science, Vol. 1, No 3, 297-318.
[3] Wood, S. N. (2006), Generalized Additive Models: an introduction with R,
Boca Raton: Chapman & Hall/CRC
[4] Wood, S. N. (2004). Stable and efficient multiple smoothing parameter
estimation for generalized additive models. Journal of the American Statistical
Association 99, 673–686
[5] Marx, Brian D and Eilers, Paul H.C. (1998). Direct generalized additive
modeling with penalized likelihood, Computational Statistics & Data Analysis
28 (1998) 193-20
[6] Sinha, Samiran, A very short note on B-splines, http://www.stat.tamu.edu/
~sinha/research/note1.pdf
[7] German Rodrıguez (2001), Smoothing and Non-Parametric Regression,
http://data.princeton.edu/eco572/smoothing.pd
[8] Notes on GAM By Simon Wood. http://people.bath.ac.uk/sw283/mgcv/
tampere/gam.pdf
[9] Notes on Smoothing Parameter Selection By Simon Wood, http://people.
bath.ac.uk/sw283/mgcv/tampere/smoothness.pdf
[10] Notes on REML & GAM By Simon Wood, http://people.bath.ac.uk/sw283/
talks/REML.pdf

26
[11] Karatzoglou, Alexandros, Meyer, David and Hornik, Kurt (2006), Support
Vector Machines in R, Journal of Statistical Software Volume 15, Issue 9, http:
//www.jstatsoft.org/v15/i09/paper
[12] “e1071” package, https://cran.r-project.org/web/packages/e1071/e1071.pdf
[13] “mgcv” package, https://cran.r-project.org/web/packages/mgcv/mgcv.pdf
[14] “gam” package, https://cran.r-project.org/web/packages/gam/gam.pdf
[15] “randomForestSRC” package, https://cran.r-project.org/web/packages/
randomForestSRC/randomForestSRC.pdf
[1f] When we target clients with the highest propensity, we may end up preaching
to the choir as opposed to driving uplift. But that is beyond the scope of this
post.

Test To Identify Outliers in Data Series
100% (1)
Test To Identify Outliers in Data Series
16 pages
Advanced Strategies For Metabolomic Data Analysis
100% (1)
Advanced Strategies For Metabolomic Data Analysis
31 pages
cfpb_adult-fin-ed_chinese-style-guide-glossary
No ratings yet
cfpb_adult-fin-ed_chinese-style-guide-glossary
81 pages
Logistic Regression
100% (1)
Logistic Regression
21 pages
Cockroach and Frog
No ratings yet
Cockroach and Frog
3 pages
Tom a. B. Snijders - Multilevel Analysis_ an Introduction to Basic and Advanced Multilevel Modeling (2011)-1
No ratings yet
Tom a. B. Snijders - Multilevel Analysis_ an Introduction to Basic and Advanced Multilevel Modeling (2011)-1
521 pages
Neonatal Sepsis
100% (1)
Neonatal Sepsis
38 pages
Notes
No ratings yet
Notes
5 pages
Farasapy-Tests - Ipynb - Colaboratory
No ratings yet
Farasapy-Tests - Ipynb - Colaboratory
7 pages
Missing and Outlier
No ratings yet
Missing and Outlier
20 pages
LEAP T304p UserGuide
No ratings yet
LEAP T304p UserGuide
46 pages
Glimmix
No ratings yet
Glimmix
244 pages
ED345942
No ratings yet
ED345942
21 pages
Difference Between Logit and Probit Models
100% (1)
Difference Between Logit and Probit Models
7 pages
Spline and Penalized Regression
No ratings yet
Spline and Penalized Regression
45 pages
5.13. What Are The Advantages and Disadvantages of Allowing Employees To Use Their Personal Smartphones For Work?
No ratings yet
5.13. What Are The Advantages and Disadvantages of Allowing Employees To Use Their Personal Smartphones For Work?
3 pages
Faith First Chapter Reviews PDF
No ratings yet
Faith First Chapter Reviews PDF
2 pages
Western Museums Should Repatriate Cultural Artefacts - Debating Matters
No ratings yet
Western Museums Should Repatriate Cultural Artefacts - Debating Matters
10 pages
Syarikat Jurukur Malaysia
No ratings yet
Syarikat Jurukur Malaysia
6 pages
Adventure Works Cycles
No ratings yet
Adventure Works Cycles
2 pages
Module 1 Notes
100% (1)
Module 1 Notes
73 pages
Philippine Waste Agricultural Biomass: Prospects and Opportunities
No ratings yet
Philippine Waste Agricultural Biomass: Prospects and Opportunities
2 pages
2A, 55V, 100Khz Step-Down Converter
No ratings yet
2A, 55V, 100Khz Step-Down Converter
12 pages
Apply Automotive Mechanical System Fundamental TTLM Final
No ratings yet
Apply Automotive Mechanical System Fundamental TTLM Final
69 pages
A Seminar Report On
No ratings yet
A Seminar Report On
4 pages
ARMA-Stochastic Time Series Modeling
100% (1)
ARMA-Stochastic Time Series Modeling
19 pages
BCM 241 - DA - Rubric - A2022
No ratings yet
BCM 241 - DA - Rubric - A2022
2 pages
Modeling With Penalized Splines
No ratings yet
Modeling With Penalized Splines
50 pages
Gamlss-Manual Instructions On How To Use The Gamlss Package 2008
No ratings yet
Gamlss-Manual Instructions On How To Use The Gamlss Package 2008
206 pages
TCM X30017
No ratings yet
TCM X30017
40 pages
Study Guide Answers Exam 2 Bio210 2021 - Tagged
No ratings yet
Study Guide Answers Exam 2 Bio210 2021 - Tagged
13 pages
Outlier Detection: Techniques and Applications: N. N. R. Ranga Suri Narasimha Murty M G. Athithan
No ratings yet
Outlier Detection: Techniques and Applications: N. N. R. Ranga Suri Narasimha Murty M G. Athithan
227 pages
Handbook Statistical Foundations of Machine Learning
No ratings yet
Handbook Statistical Foundations of Machine Learning
267 pages
Chap 3
No ratings yet
Chap 3
20 pages
Stan Reference 2.7.0
No ratings yet
Stan Reference 2.7.0
534 pages
1 - Course Slides - Data Science and ML Fundamentals
No ratings yet
1 - Course Slides - Data Science and ML Fundamentals
92 pages
Statistics I
100% (2)
Statistics I
686 pages
RYAN, THOMAS P. - [Wiley Series in Probability and Statistics] Modern Regression Methods __ (2
No ratings yet
RYAN, THOMAS P. - [Wiley Series in Probability and Statistics] Modern Regression Methods __ (2
658 pages
Dealer Agreement International
No ratings yet
Dealer Agreement International
7 pages
JD - Strategy Manager - YES BANK
No ratings yet
JD - Strategy Manager - YES BANK
2 pages
Bilateral Cleft Lip and Palate
No ratings yet
Bilateral Cleft Lip and Palate
10 pages
Maha Ganapati Homam
100% (1)
Maha Ganapati Homam
36 pages
Least Squares Problems: How To State and Solve Them, Then Evaluate Their Solutions
100% (1)
Least Squares Problems: How To State and Solve Them, Then Evaluate Their Solutions
63 pages
Exploring the Limits of Bootstrap
No ratings yet
Exploring the Limits of Bootstrap
458 pages
Binomial Logistic Regression Using SPSS
No ratings yet
Binomial Logistic Regression Using SPSS
11 pages
Doug Bates Mixed Models
No ratings yet
Doug Bates Mixed Models
75 pages
MiniTab Introduction
100% (1)
MiniTab Introduction
124 pages
R Manual To Agresti's Categorical Data Analysis
100% (1)
R Manual To Agresti's Categorical Data Analysis
280 pages
Bayesian Lecture Notes
No ratings yet
Bayesian Lecture Notes
28 pages
BBAG6 MidQ1-HELE
No ratings yet
BBAG6 MidQ1-HELE
5 pages
BiodiversityR PDF
No ratings yet
BiodiversityR PDF
128 pages
Regular 1811835011024 Gorakh Nath: EXAMINATION 2018-19
No ratings yet
Regular 1811835011024 Gorakh Nath: EXAMINATION 2018-19
1 page
4. AASHTO T 309-20 Standard Method of Test for Temperature of Freshly Mixed Portland Cement Concrete
No ratings yet
4. AASHTO T 309-20 Standard Method of Test for Temperature of Freshly Mixed Portland Cement Concrete
4 pages
Linear Regression: Major: All Engineering Majors Authors: Autar Kaw, Luke Snyder
100% (1)
Linear Regression: Major: All Engineering Majors Authors: Autar Kaw, Luke Snyder
25 pages
Diggle 2013 Statistical Analysis of Spatial and
No ratings yet
Diggle 2013 Statistical Analysis of Spatial and
69 pages
In Class Exercise Linear Regression in R
No ratings yet
In Class Exercise Linear Regression in R
6 pages
Anthony Tito: Education
No ratings yet
Anthony Tito: Education
2 pages
(GAM) Application PDF
No ratings yet
(GAM) Application PDF
30 pages
Priors Algorithms Bayesian
No ratings yet
Priors Algorithms Bayesian
108 pages
Kape 204
100% (1)
Kape 204
16 pages
Applied Regression Analysis: Third Edition
0% (1)
Applied Regression Analysis: Third Edition
9 pages
Sequential Analysis Hypothesis Testing and Changepoint Detection ( Etc.) (Z-Library)
No ratings yet
Sequential Analysis Hypothesis Testing and Changepoint Detection ( Etc.) (Z-Library)
600 pages
Statistical Modeling
No ratings yet
Statistical Modeling
22 pages
Fuse box relay Vauxhall Opel Vectra C (Signum) relay with assignment and location
No ratings yet
Fuse box relay Vauxhall Opel Vectra C (Signum) relay with assignment and location
9 pages
GAMS Getting Started
No ratings yet
GAMS Getting Started
31 pages
Generalised Linear Models and Bayesian Statistics
No ratings yet
Generalised Linear Models and Bayesian Statistics
35 pages
Prophet R
No ratings yet
Prophet R
18 pages
Sampling Distribution and Simulation in R
No ratings yet
Sampling Distribution and Simulation in R
10 pages
Solution CH # 5
No ratings yet
Solution CH # 5
39 pages
Regression Splines
No ratings yet
Regression Splines
4 pages
One-Sample T-Test
No ratings yet
One-Sample T-Test
9 pages
Regression
No ratings yet
Regression
46 pages
Survival Plots SURVMINER Package Tutorial
No ratings yet
Survival Plots SURVMINER Package Tutorial
5 pages
Intermediate R - Nonlinear Regression in R
No ratings yet
Intermediate R - Nonlinear Regression in R
4 pages
Polynomial Regression and Step Function
100% (1)
Polynomial Regression and Step Function
6 pages
2003 Makipaa 1
No ratings yet
2003 Makipaa 1
15 pages
Statistical Inference For Data Science
100% (1)
Statistical Inference For Data Science
124 pages
13 Pag Design and Analysis of Experiments in The Health Sciences
100% (1)
13 Pag Design and Analysis of Experiments in The Health Sciences
13 pages
R Markdown
No ratings yet
R Markdown
15 pages
Generalized Additive Model
No ratings yet
Generalized Additive Model
10 pages
HW 03 Sol
No ratings yet
HW 03 Sol
9 pages
Syllabus of Ph.D. Course
No ratings yet
Syllabus of Ph.D. Course
6 pages
The Evolution of An In-Line Inspection Solution: Axial Flaw Detection
No ratings yet
The Evolution of An In-Line Inspection Solution: Axial Flaw Detection
6 pages
Random Variable Generation
No ratings yet
Random Variable Generation
5 pages
Variable Selection
No ratings yet
Variable Selection
15 pages
Examples and Problems in Mathematical Statistics
From Everand
Examples and Problems in Mathematical Statistics
Shelemyahu Zacks
5/5 (2)
XGBoost for Regression Predictive Modeling and Time Series Analysis: Learn how to build, evaluate, and deploy predictive models with expert guidance
From Everand
XGBoost for Regression Predictive Modeling and Time Series Analysis: Learn how to build, evaluate, and deploy predictive models with expert guidance
Partha Pritam Deka
No ratings yet
Advanced Dynamic-System Simulation: Model Replication and Monte Carlo Studies
From Everand
Advanced Dynamic-System Simulation: Model Replication and Monte Carlo Studies
Granino A. Korn
No ratings yet
Effective Amazon Machine Learning
From Everand
Effective Amazon Machine Learning
Alexis Perrier
No ratings yet
An Introduction to Statistical Computing: A Simulation-based Approach
From Everand
An Introduction to Statistical Computing: A Simulation-based Approach
Jochen Voss
No ratings yet