Article
Local Influence for the Thin-Plate Spline Generalized
Linear Model
Germán Ibacache-Pulgar 1,2, * , Pablo Pacheco 3 , Orietta Nicolis 4 and Miguel Angel Uribe-Opazo 5
1 Institute of Statistics, Universidad de Valparaíso, Av. Gran Bretaña 1111, Valparaíso 2360102, Chile
2 Centro de Estudios Atmosféricos y Cambio Climático (CEACC), Universidad de Valparaíso,
Valparaíso 2360102, Chile
3 Dirección de Educación Virtual, Universidad de Playa Ancha, Avenida Guillermo González de Hontaneda
855, Playa Ancha, Valparaíso 2360072, Chile; ppachecog@upla.cl
4 Facultad de Ingenieria, Universidad Andres Bello, Calle Quillota 980, Viña del Mar 2520000, Chile;
orietta.nicolis@unab.cl
5 Centro de Ciências Exatas e Tecnológicas, Western Paraná State University (UNIOESTE),
Cascavel 85819-110, Paraná, Brazil; miguel.opazo@unioeste.br
* Correspondence: german.ibacache@uv.cl
Abstract: Thin-Plate Spline Generalized Linear Models (TPS-GLMs) are an extension of Semiparametric Generalized Linear Models (SGLMs), because they allow a smoothing spline to be extended to two or more dimensions. This class of models makes it possible to incorporate the nonlinear joint effects of some covariates in order to explain the variability of a certain variable of interest. In the spatial context, these models are quite useful, since they allow the effects of locations to be included, both in trend and dispersion, through a smooth surface. In this work, we extend the local influence technique to the TPS-GLM in order to evaluate the sensitivity of the maximum penalized likelihood estimators to small perturbations in the model and data. We fit our model through a joint iterative process based on the Fisher scoring and weighted backfitting algorithms. In addition, we obtain the normal curvature for the case-weight and response-variable additive perturbation schemes, in order to detect influential observations on the model fit. Finally, two data sets from different areas (agronomy and environment) are used to illustrate the methodology proposed here.
Citation: Ibacache-Pulgar, G.; Pacheco, P.; Nicolis, O.; Uribe-Opazo, M.A. Local Influence for the Thin-Plate Spline Generalized Linear Model. Axioms 2024, 13, 346. https://doi.org/10.3390/axioms13060346

Keywords: exponential family; smoothing spline; penalized likelihood function; weighted backfitting algorithm; diagnostics measures

MSC: 62P12; 62J20; 62G05
where θi is the canonical form of the location parameter and depends on the mean µi .
The term ai (ϕ) represents a known function of the unknown dispersion parameter ϕ
(or a vector of unknown dispersion parameters). The function c depends on both the
dispersion parameter and the responses, while ψ is a known function, such that the mean
and variance of yi are given by: µi = E(yi ) = ∂ψ(θi )/∂θi and Var(yi ) = ai (ϕ) Vi , with
Vi = V(µi ) = ∂2 ψ(θi )/∂θi2 , respectively. The TPS-GLM is defined by Equation (1) and the
following systematic component:
f = (f(t1), . . . , f(tn))⊤ = Eδ + T⊤a,

with

T = [ 1  1  · · ·  1 ; t1  t2  · · ·  tn ],

so that the linear predictor takes the form
η = X β + Eδ,
where the regression matrix is structured as X = [T⊤  W], with W = (w1, . . . , wn)⊤, and the vector of regression coefficients as β = (a⊤, α⊤)⊤ = (β1, . . . , βp+3)⊤. The log-likelihood function of θ corresponds to L(θ) = ∑_{i=1}^{n} Li(θ), as given in Equation (2), where

Li(θ) = {yi θi − ψ(θi)}/ai(ϕ) + c(yi, ϕ).
To ensure the identifiability of the parameter vector α, we assume that f belongs to the
function space where all partial derivatives of total order m reside within the Hilbert space
L2 [ Ed ], the space of square-integrable functions on Euclidean d-space. Incorporating a
penalty function over f , we have that the penalized log-likelihood function can be expressed
as (see, for instance, Green and Silverman [7])
Lp(θ, λf) = L(θ) + λ∗f Jm^d(f),   (3)
For simplicity, in this work, we will consider the case in which d = 2, m = 2 and f = f(t1, t2). Consequently, the penalty function J22(f) is expressed in the form
J22(f) = ∫∫_{R²} { (∂²f/∂t1²)² + 2(∂²f/∂t1∂t2)² + (∂²f/∂t2²)² } dt1 dt2,
and measures the rapid variation in f and the departure from local linearity. In this case, the
estimation of f leads to a natural thin-plate spline. According to Green and Silverman [7],
we may express the penalty functional as J22 ( f ) = δ T Eδ. Then, if we consider λ∗f = −λ f /2,
the penalized log-likelihood function (3) can be expressed as
Lp(θ, λf) = L(θ) − (λf/2) δ⊤Eδ.   (4)
The first term on the right-hand side of Equation (4) measures the goodness-of-fit, while the second term penalizes the roughness of f through the fixed parameter λf. Selecting an appropriate smoothing parameter is crucial in the estimation process, because it controls the trade-off between goodness-of-fit and the smoothness (or regularity) of the estimated function. In this work, the smoothing parameter is selected through the Akaike Information Criterion (AIC) based on the penalized log-likelihood function given in Equation (3). More details of the method are given in Section 3.7.
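To fix ideas, the following Python sketch shows one common construction of the matrices E and T of Section 2.1 for d = 2 and m = 2 (see Green and Silverman [7]): E has entries η(∥ti − tj∥) with radial basis η(r) = r² log r, so that J22(f) = δ⊤Eδ holds up to a multiplicative constant that can be absorbed into λf. This is an illustrative sketch, not the authors' MATLAB implementation; the function name tps_matrices and the use of NumPy are our own choices.

import numpy as np

def tps_matrices(coords):
    """Illustrative construction of the thin-plate spline matrices for d = 2,
    m = 2: E[i, j] = eta(||t_i - t_j||) with eta(r) = r^2 log(r) (constants
    absorbed into lambda_f), and T = (1; t_1 ... t_n) as in Section 2.1."""
    coords = np.asarray(coords, dtype=float)      # (n, 2) locations t_i
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=2))          # pairwise Euclidean distances
    with np.errstate(divide="ignore", invalid="ignore"):
        E = np.where(r > 0.0, r ** 2 * np.log(r), 0.0)   # eta(0) = 0 by convention
    T = np.vstack([np.ones(n), coords.T])         # (3 x n) matrix T
    return E, T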
Up^β(θ) = ∂Lp(θ, λf)/∂β = X⊤T̃(y − µ),

where X is an (n × (3 + p)) matrix whose ith row is xi⊤, T̃ = diag_{1≤i≤n}{(ai(ϕ))^{-1}(∂µi/∂ηi)Vi^{-1}} is an (n × n) diagonal matrix, with Vi = V(µi) = ∂²ψ(θi)/∂θi² the variance function and ai(ϕ) a function of ϕ, and y = (y1, . . . , yn)⊤ and µ = (µ1, . . . , µn)⊤ are (n × 1) vectors.
Conversely, to derive the score function for δ, we need to compute ∂Lpi(θ, λf)/∂δℓ for i ∈ {1, . . . , n} and ℓ ∈ {1, . . . , n}. Again, after some algebraic operations, the score function for δ can be written in matrix form as follows:

Up^δ(θ) = ∂Lp(θ, λf)/∂δ = E⊤T̃(y − µ) − λf Eδ,
where the matrix E is defined in Section 2.1. Finally, the score function for ϕ is given by
Up^ϕ(θ) = ∂Lp(θ, λf)/∂ϕ = −∑_{i=1}^{n} (ai(ϕ))^{-2}{yi θi − ψ(θi)} + ∑_{i=1}^{n} c′(yi, ϕ),
with c′ (yi , ϕ) = ∂c(yi , ϕ)/∂ϕ, for i ∈ {1, . . . , n}. Thus, the vector of penalized score func-
tions of θ can be expressed compactly as
Up(θ) = (Up^β(θ)⊤, Up^δ(θ)⊤, Up^ϕ(θ))⊤.
Note that if the model under consideration includes only the parametric component in the linear predictor, that is, if the nonparametric component is omitted, the expressions of the remaining score functions reduce to those obtained under the classical generalized linear model.
The elements of the main diagonal of the penalized Hessian matrix are given by

Lp^{ββ} = ∂²Lp(θ, λf)/∂β∂β⊤ = −X⊤M*X,

Lp^{δδ} = ∂²Lp(θ, λf)/∂δ∂δ⊤ = −E⊤M*E − λf E, and

Lp^{ϕϕ} = ∂²Lp(θ, λf)/∂ϕ² = ∑_{i=1}^{n} 2(ai(ϕ))^{-3}(yi θi − ψ(θi)) + ∑_{i=1}^{n} c″(yi, ϕ),

where M* = diag_{1≤i≤n}{(ai(ϕ))^{-1}(∂µi/∂ηi)² Vi^{-1}} and c″(yi, ϕ) = ∂²c(yi, ϕ)/∂ϕ², for 1 ≤ i ≤ n. The elements outside the main diagonal of the Hessian matrix take the form
Lp^{βδ} = ∂²Lp(θ, λf)/∂β∂δ⊤ = −X⊤M*E,

Lp^{βjϕ} = ∂²Lp(θ, λf)/∂βj∂ϕ = −∑_{i=1}^{n} (ai(ϕ))^{-2}(yi − µi)Vi^{-1}(∂µi/∂ηi) xij, and

Lp^{δℓϕ} = ∂²Lp(θ, λf)/∂δℓ∂ϕ = −∑_{i=1}^{n} (ai(ϕ))^{-2}(yi − µi)Vi^{-1}(∂µi/∂ηi) eiℓ,

where xij denotes the (i, j)th element of the matrix X and eiℓ denotes the (i, ℓ)th element of the matrix E, for i ∈ {1, . . . , n}, j ∈ {1, . . . , p + 3} and ℓ ∈ {1, . . . , n}. Thus, the penalized Hessian matrix can be represented as
Lp(θ) = [ Lp^{ββ}  Lp^{βδ}  Lp^{βϕ} ; Lp^{βδ⊤}  Lp^{δδ}  Lp^{δϕ} ; Lp^{βϕ⊤}  Lp^{δϕ⊤}  Lp^{ϕϕ} ].
It is noteworthy that this matrix simplifies to the Hessian matrix used in generalized
linear models when the nonparametric component is absent. The primary application of
this matrix lies in the normal curvature, which is essential for developing the local influence
technique. This will be discussed in the following section.
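To make the role of this matrix concrete, the sketch below assembles the (β, δ) block of the penalized Hessian for the particular case of a Poisson response with canonical log link, for which ai(ϕ) = 1, ∂µi/∂ηi = Vi = µi and there is no dispersion block. The matrices X and E and the parameter λf are assumed available (for instance from the sketch above); the code is illustrative rather than the authors' implementation.

import numpy as np

def penalized_hessian_poisson(X, E, beta, delta, lam):
    """Illustrative (beta, delta) block of the penalized Hessian Lp for a
    Poisson TPS-GLM with log link: Lp = -[X'M*X, X'M*E; E'M*X, E'M*E + lam*E],
    where M* = diag(mu) in this particular case."""
    eta = X @ beta + E @ delta
    mu = np.exp(eta)
    Mstar = np.diag(mu)
    top = np.hstack([X.T @ Mstar @ X, X.T @ Mstar @ E])
    bottom = np.hstack([E.T @ Mstar @ X, E.T @ Mstar @ E + lam * E])
    return -np.vstack([top, bottom])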
The penalized expected information matrix is given by

Jp(θ) = −E[∂²Lp(θ, λf)/∂θ∂θ⊤],

where

Jp^{βδ}(θ) = [ X⊤M*X  X⊤M*E ; E⊤M*X  E⊤M*E + λf E ]

and

Jp^{ϕϕ}(θ) = ∑_{i=1}^{n} −2(ai(ϕ))^{-3}(µi θi − ψ(θi)) − ∑_{i=1}^{n} E(c″(yi, ϕ)).
To obtain the maximum penalized likelihood estimate of θ, based on the penalized likelihood function given by Equation (4), we have to solve the equations Up^1(θ) = 0 and Up^2(θ) = 0, where Up^1(θ) = (Up^β(θ)⊤, Up^δ(θ)⊤)⊤ corresponds to θ1 = (β⊤, δ⊤)⊤ and Up^2(θ) = Up^ϕ(θ).
These estimating equations are nonlinear, and necessitate an iterative approach for
their solution. An alternative frequently proposed in the context of generalized linear
models is the Fisher scoring algorithm (Nelder and Wedderburn, [33]), considering the fact
that in some situations the matrix −Lp (θ) can be non-positive definite. Then, the algorithm
for estimating θ1 , with ϕ fixed, is given by
θ1^{new} = θ1^{old} + (Jp^{βδ}(θ^{old}))^{-1} Up^1(θ^{old}),
which, after some algebraic manipulation, can be written as the system

[ I  Sβ^{old}E ; Sδ^{old}X  I ] [ β^{new} ; δ^{new} ] = [ Sβ^{old} z^{old} ; Sδ^{old} z^{old} ],   (5)

where z^{old} denotes the working response vector evaluated at θ^{old}, and

Sϑ = (X⊤M*^{old}X)^{-1} X⊤M*^{old}   for ϑ = β,
Sϑ = (E⊤M*^{old}E + λf E)^{-1} E⊤M*^{old}   for ϑ = δ.
It is crucial to note that the system of Equations (5) is consistent, and the back-fitting algorithm converges to a solution for any initial values, provided that the weight matrix M* is positive definite.
δ^{new} = Sδ^{old} z^{old}.
Summarizing, each iteration of the Fisher scoring algorithm updates β and δ using Equations (6) and (7), evaluating the matrices Sϑ and M* at the MPLE of θ obtained in the previous iteration, θ^{old}, until convergence is reached. The joint iterative process that solves Up(θ) = 0 is presented below.
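As an illustration of this joint process, the following sketch implements the Fisher scoring/weighted backfitting iteration for the particular case of a Poisson response with log link (so ai(ϕ) = 1, M* = diag(µi) and the working response is z = η + (y − µ)/µ). The matrices X and E and the smoothing parameter λf are assumed given, identifiability constraints on δ are omitted for brevity, and the code is a minimal sketch rather than the authors' MATLAB implementation.

import numpy as np

def fit_tps_glm_poisson(X, E, y, lam, n_iter=100, tol=1e-6):
    """Minimal sketch of the joint Fisher scoring / weighted backfitting
    iteration for a Poisson TPS-GLM with log link; assumes the matrices
    involved in the solves are non-singular."""
    n, q = X.shape
    beta, delta = np.zeros(q), np.zeros(n)
    for _ in range(n_iter):
        eta = X @ beta + E @ delta
        mu = np.exp(eta)                          # inverse of the log link
        W = np.diag(mu)                           # M* = diag{(dmu/deta)^2 / V(mu)} = diag(mu)
        z = eta + (y - mu) / mu                   # working response
        # weighted backfitting: alternate the updates of beta and delta
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ (z - E @ delta))
        delta_new = np.linalg.solve(E.T @ W @ E + lam * E, E.T @ W @ (z - X @ beta_new))
        change = max(np.max(np.abs(beta_new - beta)), np.max(np.abs(delta_new - delta)))
        beta, delta = beta_new, delta_new
        if change < tol:
            break
    return beta, delta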
At convergence, the estimated smooth surface is obtained as

f̂ = Eδ̂ + T⊤â,

and the approximate covariance matrix of the MPLE is given by

Ĉov(θ̂) ≈ Jp^{-1}(θ)|_{θ=θ̂}.
The effective degrees of freedom associated with the smooth surface are given by

df(λf) = tr(E⊤Sδ),

which approximately represents the number of effective parameters used in the modeling process to estimate the smooth surface f.
Regarding the selection of the smoothing parameter, we propose to use the Akaike Information Criterion (AIC) (see, for instance, [24,35]), defined as

AIC(λf) = −2Lp(θ̂, λf) + 2{p + df(λf)},

where Lp(θ̂, λf) denotes the penalized likelihood function evaluated at the MPLE of θ, and p denotes the number of parameters in β. As usual, the idea is to select the value of λf that minimizes AIC(λf).
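Under the assumed form AIC(λf) = −2Lp(θ̂, λf) + 2{p + df(λf)}, the selection can be carried out by a simple grid search, as in the following sketch, which reuses the Poisson fitting function above and drops additive constants of the log-likelihood; again, this is illustrative and not the authors' code.

import numpy as np

def aic_lambda(X, E, y, lam):
    """Illustrative AIC(lambda_f) for the Poisson sketch: -2*Lp + 2*(p + df),
    with df(lambda_f) = tr(E' S_delta) and log-likelihood constants dropped."""
    beta, delta = fit_tps_glm_poisson(X, E, y, lam)
    eta = X @ beta + E @ delta
    mu = np.exp(eta)
    W = np.diag(mu)
    S_delta = np.linalg.solve(E.T @ W @ E + lam * E, E.T @ W)
    df = np.trace(E.T @ S_delta)                  # effective degrees of freedom
    lp = np.sum(y * eta - mu) - 0.5 * lam * (delta @ E @ delta)
    return -2.0 * lp + 2.0 * (X.shape[1] + df)

# grid search over candidate values of lambda_f
# lams = np.logspace(-3, 3, 25)
# lam_opt = min(lams, key=lambda lam: aic_lambda(X, E, y, lam))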
4. Local Influence
In this section, we extend the local influence technique to evaluate the sensitivity of
the MPLE under the TPS-GLM. Specifically, we present some theoretical aspects of the
method and, subsequently, we derive the normal curvature for three perturbation schemes.
To quantify the effect of a perturbation vector ω ∈ Ω on the MPLE, we consider the penalized likelihood displacement

LD(ω) = 2[Lp(θ̂, λf) − Lp(θ̂ω, λf)] ≥ 0,

where θ̂ω is the MPL estimate under Lp(θ, λf | ω). The measure LD(ω) is useful for assessing the distance between θ̂ and θ̂ω. Cook [10] suggested examining the local behavior of LD(ω) around ω0, the vector of no perturbation. The procedure involves selecting a unit direction d ∈ Ω, with ∥d∥ = 1, and then plotting LD(ω0 + ad) against a, where a ∈ R. This plot, called the lifted line, can be characterized by considering the normal curvature Cd(θ) around a = 0. The suggestion is to take the direction d = dmax corresponding to the largest curvature Cdmax(θ). The index plot of dmax can identify those cases that, under small perturbations, have a significant potential influence on LD(ω). According to Cook [10], the normal curvature in the unit direction d is expressed as

Cd(θ) = 2|d⊤∆p⊤ Lp^{-1} ∆p d|,
with

Lp = ∂²Lp(θ, λf)/∂θ∂θ⊤ |_{θ=θ̂}   and   ∆p = ∂²Lp(θ, λf | ω)/∂θ∂ω⊤ |_{θ=θ̂, ω=ω0}.
Note that −Lp represents the penalized observed information matrix evaluated at θ̂ (see Section 3.2), and ∆p is the penalized perturbation matrix evaluated at θ̂ and ω0. It is essential to highlight that Cd(θ) denotes the local influence on the estimate θ̂ after perturbing the model or data. Escobar and Meeker [36] suggested examining the normal curvature in the direction d = ei, where ei is an n × 1 vector with a one at the ith position and zeros elsewhere. Consequently, the normal curvature, referred to as the total local influence of the ith case, takes the form Cei(θ) = 2|cii| for i ∈ {1, . . . , n}, where cii is the ith principal diagonal element of the matrix C = ∆p⊤ Lp^{-1} ∆p.
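In practice, Cei(θ) and dmax can be computed directly from Lp and ∆p, as in the following sketch; an eigendecomposition of the symmetric matrix C yields the direction of largest normal curvature. The inputs are assumed to be dense arrays and the code is illustrative only.

import numpy as np

def local_influence(Lp, Delta_p):
    """Illustrative total local influence C_{e_i} = 2|c_ii| and direction d_max
    of largest normal curvature, given the penalized Hessian Lp (at the MPLE)
    and the perturbation matrix Delta_p."""
    C = Delta_p.T @ np.linalg.solve(Lp, Delta_p)      # C = Delta_p' Lp^{-1} Delta_p
    Cei = 2.0 * np.abs(np.diag(C))                    # total local influence of each case
    eigval, eigvec = np.linalg.eigh((C + C.T) / 2.0)  # symmetrize for numerical stability
    dmax = eigvec[:, np.argmax(np.abs(eigval))]       # eigenvector of largest |eigenvalue|
    return Cei, dmax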
Under the case-weight perturbation scheme, the perturbation matrix takes the form

∆p = [ X⊤Dτ ; E⊤Dτ ; û⊤ ],

where Dτ = diag_{1≤i≤n}{τi} and u = (u1, . . . , un)⊤, with τi = (ai(ϕ))^{-1}(yi − ∂ψ(h(ηi))/∂h(ηi)) ∂h(ηi)/∂ηi, h(ηi) = ψ′^{-1}(ηi), where ψ′^{-1}(·) denotes the inverse function of ψ′(·), ui = {−(ai(ϕ))^{-2}(yi h(ηi) − ψ(h(ηi))) + c′(yi, ϕ)}ein⊤, and ein is a vector with 1 at the ith position and zeros elsewhere.
To perturb the response variable values, we consider yiω = yi + ωi for i ∈ {1, . . . , n}, where ω = (ω1, . . . , ωn)⊤ is the vector of perturbations. The vector of no perturbation is ω0 = (0, . . . , 0)⊤. The perturbed penalized log-likelihood function is constructed from Equation (3) with yi replaced by yiω, as follows:

Lp(θ, λf | ω) = L(θ | ω) − (λf/2) δ⊤Eδ,
where L(·) is defined in Equation (2) with yiω replacing yi. By differentiating Lp(θ, λf | ω) with respect to the elements of θ and ωi, and after some algebraic manipulation, we obtain

∆p = [ X⊤Dc ; E⊤Dc ; d̂⊤ ],

where Dc = diag_{1≤i≤n}{ci} and d = (d1, . . . , dn)⊤, with ci = ∂h(ηi)/∂ηi.
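For the Poisson special case used in the sketches above (canonical log link, ai(ϕ) = 1, no dispersion row), these two perturbation matrices simplify to τi = yi − µi and Dc = In, and can be assembled as follows; this is an illustrative sketch only.

import numpy as np

def perturbation_matrices_poisson(X, E, y, beta, delta):
    """Illustrative case-weight and response-perturbation matrices Delta_p for
    the Poisson sketch (canonical link, so dh/deta = 1 and a_i(phi) = 1)."""
    eta = X @ beta + E @ delta
    mu = np.exp(eta)
    D_tau = np.diag(y - mu)                              # tau_i = y_i - mu_i
    Delta_cw = np.vstack([X.T @ D_tau, E.T @ D_tau])     # case-weight scheme
    Delta_resp = np.vstack([X.T, E.T])                   # response scheme (D_c = I)
    return Delta_cw, Delta_resp

# e.g., combining the sketches (beta_hat, delta_hat from fit_tps_glm_poisson):
# Lp = penalized_hessian_poisson(X, E, beta_hat, delta_hat, lam)
# Delta_cw, _ = perturbation_matrices_poisson(X, E, y, beta_hat, delta_hat)
# Cei, dmax = local_influence(Lp, Delta_cw)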
5. Applications
In this section, we show the applicability of the TPS-GLM and the local influence
method with two real data applications. The model estimation and diagnostics have been
implemented using MatLab 9.13.0 (R2022b) software [38] (the developed code is available from the authors on request).
per plant, latitude, and longitude. Figure 1 shows the scatterplots between the response
variable Soya and the explanatory variables Height and Pods. In addition, the plot of the
response variable against the coordinates is shown. Clearly, from Figure 1a,b, it can be seen
that the explanatory variables Height (X2) and Pods (X3) are linearly related to the response
variable Soya (Z). The spatial effect given by the coordinates (X,Y) will be incorporated into
the model through a smooth surface.
Figure 1. Scatter plots: Soya versus Height (a), Soya versus Pods (b), and Soya versus coordinates in
UTM (c).
Table 1. MPLEs with their standard errors (within parentheses), AIC and R2(Adj).

Parameters      Gaussian Linear Model      TPS-GLM
β0              1.1921 (0.672)             0.497 (0.751)
β1              0.0116 (0.0128)            0.032 (0.015)
β2              0.0339 (0.0079)            0.030 (0.008)
AIC             149.99                     139.9992
R2(Adj)         0.168                      0.315
The value of the smoothing parameter λf was selected in such a way that the AIC criterion was minimized. The adjusted coefficients of determination (R2(Adj)) were evaluated for assessing the goodness-of-fit of the two models. It is important to note that our model has a lower AIC and a higher R2(Adj) compared with the multiple regression model that does not consider the spatial effect. Figure 2a shows the QQ-plot for the standardized residuals, whose adjustment to the Gaussian TPS-GLM seems to be reasonable. However, the presence of some atypical observations is observed in one of the tails of the distribution. Figure 2b displays the scatter plot between the observed values of Soya and their fitted values. Considering the trend of the points, we conclude that the estimates are good, since they generate fitted values that are consistent with the observed response.
Figure 2. QQ-plot of the standardized residuals for the TPS-GLM with its confidence interval (dashed lines) (a) and scatterplot between the observed and fitted values of Soya (b), under the model fitted to the Wypych data.
Figure 3. Index plots of Bi for the TPS-GLM fitted to the Wypych data; the highlighted cases include #6, #38, #61, #66, #69 and #71.
Figure 4. Index plots of Bi for the TPS-GLM fitted to the Wypych data; the highlighted cases include #3, #32, #42, #75, #80 and #88.
We conclude that the maximum penalized likelihood estimates (MPLE) of the regres-
sion coefficients and the smooth surface exhibit sensitivity to modifications made to the
data or the model. This analysis has shown that observations identified as influential for the
parametric component do not necessarily exert influence on the non-parametric component,
and vice versa. For instance, under the case-weight perturbation scheme, observations
#69 and #71 were detected as influential for the parametric component, but not for the
nonparametric component.
Table 2. Relative changes (RCs) (in %) in the MPL estimates of βj under the case-weight perturbation scheme for the TPS-GLM. The last two columns indicate the AIC and R2(Adj) of the model with dropped observations.
On the other hand, Table 3 shows the relative changes in the vector of regression coefficients under the additive perturbation scheme of the response variable. Here, we consider the four most influential observations. As can be seen in the table, observations #32, #69, #75 and #80 generate important relative changes in the estimates of the parametric component of the model. However, no significant inferential changes were observed. Regarding the AIC and R2(Adj), there are no evident differences.
Table 3. Relative changes (RCs) (in %) in the MPL estimates of βj under the response variable perturbation scheme for the TPS-GLM. The last two columns indicate the AIC and R2(Adj) of the model with dropped observations.
Figure 5. 3D plots between the response variable and the explanatory variables: logarithm of ozone data versus temperature and day variables (a), and logarithm of ozone data versus visibility and day variables (b).
Figure 5a shows a curved surface in the relationship between the variable log(O3) and the joint effect of the explanatory variables Temp and Day, whereas the relationship between log(O3) and the joint effect of the explanatory variables Vis and Day shows less curvature; see Figure 5b. This graphical analysis suggests the inclusion in the model of a nonparametric component, specifically a surface, that can explain the relationship between log(O3) and the combined effect of the explanatory variables Temp and Day. For simplicity, in this work, we will include the effect of the explanatory variable Vis in a linear form. To begin our analysis, we are going to consider the fit of a GLM assuming that the variable of interest O3 is Poisson distributed with mean µi and logarithmic link function. Different structures of the linear predictor for the explanatory variables Vis, Temp, and Day will be considered (see Table 4).
Table 4. Four structures of the linear predictor for the explanatory variables Vis, Temp, and Day, assuming that the response variable O3 follows a Poisson(µi) distribution.

Model      g(µi) = log(µi)
I          β0 + β1 Visi + β2 Tempi + β3 Dayi
II         β0 + β1 Visi + β2 Tempi + f(Dayi)
III        β0 + β1 Visi + β2 Tempi + β3 Dayi + β4 Tempi × Dayi
IV         β0 + β1 Visi + f(Tempi, Dayi)
For Model I, we consider only the individual effects of the explanatory variables Vis, Temp and Day. Note that all these effects were incorporated in a linear form in the systematic component of the model. For Model II, we consider the inclusion of a nonparametric term to model the nonlinear effects of the explanatory variable Day; see Ibacache et al. [41]. Model III considers a systematic component that contains the individual effects of the explanatory variables Vis, Temp and Day, in addition to the incorporation of the interaction effect between the explanatory variables Temp and Day. Here, the interaction effect is introduced linearly in the model. Model IV corresponds to a TPS-GLM where the joint effect of the Temp and Day explanatory variables is included nonlinearly by using a smooth surface. Table 5 contains the ML and MPL estimates associated with the parametric component for the four fitted models.
It is important to note that both the individual and interaction effects are statistically significant, as the corresponding p-values (not shown here) are less than 0.05. Additionally, the estimates of β1 are similar across the four models, whereas the estimates of β0 vary considerably, particularly in Model IV. Concerning the associated standard errors, all
the estimators exhibit small values. The last two rows of Table 5 display the Akaike
Information Criterion (AIC) and R2 values, respectively. It is evident that the TPS-GLM,
with AIC(λ f ) = 1777.705, provides the best fit to the Ozone data, followed by Model II
with an AIC of 1806.837. This is corroborated by the QQ-plots in Figure 6, specifically
Figure 6b,d. Furthermore, the R2 value associated with our model is higher than those
of Models I, II, and III. The smoothing parameter λ f was chosen such that the effective
degrees of freedom were approximately 7. Figure 7 illustrates the 3D plot of the adjusted
log(O3) against the explanatory variables Temp and Day, showing an adequate fit of the
TPS-GLM.
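For orientation, the illustrative sketches given earlier could be combined to set up a Model IV-type fit as follows; the variables temp, day, vis and o3 below are synthetic placeholders (not the Ozone data), the locations are standardized before building E, and the value of λf is arbitrary. This is a usage sketch, not the authors' analysis.

import numpy as np

rng = np.random.default_rng(1)
n = 150
temp = rng.uniform(20, 100, n)
day = rng.uniform(1, 365, n)
vis = rng.uniform(0, 350, n)
o3 = rng.poisson(5.0, n).astype(float)              # synthetic counts, illustration only

coords = np.column_stack([temp, day])
coords = (coords - coords.mean(0)) / coords.std(0)  # standardize locations
E, T = tps_matrices(coords)
X = np.column_stack([T.T, vis])                     # X = [T'  W], as in Section 2
beta_hat, delta_hat = fit_tps_glm_poisson(X, E, o3, lam=10.0)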
Table 5. AIC, R2 (Adj), ML and MPL estimates for all four fitted models to the Ozone data.
Parameters I II III IV
β0 0.577 (0.104) 0.478 (0.142) 0.787 (0.198) 2.507 (0.040)
β1 −0.002 (0.0003) −0.002 (0.0003) −0.002 (0.0003) −0.002 (0.0003)
β2 0.035 (0.001) 0.033 (0.002) 0.032 (0.003) -
β3 −0.001 (0.002) - −0.002 (0.001) -
β4 - - 0.00002 (0.00002) -
AIC 1887.312 1806.837 1887.757 1789.92
R2 (Adj) 0.673 0.715 0.670 0.728
Figure 6. QQ-plot of the standardized residuals for the models described in Table 5: Model I (a),
Model II (b), Model III (c) and Model IV (d).
Figure 7. 3D plot of the fitted log(O3), log(µ̂), against the explanatory variables Temp and Day.
Figure 8. Index plots of Bi for the TPS-GLM fitted to the Ozone data under the case-weight perturbation scheme; the highlighted cases include #167, #168, #177 and #220.
From the local influence analysis, we conclude that the MPLE of the regression co-
efficients and the smooth surface are sensitive to perturbations in the data or the model.
Furthermore, this analysis revealed that observations identified as influential for the para-
metric component are also influential for the nonparametric component, and vice versa. For
example, under the case-weight perturbation scheme, observations #167, #220, #168, and
#177 were found to be influential for both the parametric and nonparametric components.
Figure 9. Index plots of Bi for the TPS-GLM fitted to the Ozone data; the highlighted cases include #125, #167, #175, #218, #219, #220 and #221.
Table 6. Relative changes (RCs) (in %) in the MPL estimates of β j under the TPS-GLM. The last two
columns indicate the AIC and R2 (Adj) of the model with dropped observations.
it was observed that the adjusted values of the response variable were consistent. In addition, our model presented a better fit for modeling the soybean yield and ozone concentration data, compared with some classic parametric and semiparametric models, respectively. In our analysis, it was found that those observations detected as potentially influential generated important changes in the estimates, but not significant inferential changes. In addition, our study confirms the need to develop the local influence method to evaluate the sensitivity of the maximum penalized likelihood estimators and thus identify those observations that may exert an excessive influence on the parametric component, on the non-parametric component, or on both.
As future work, we propose to incorporate a correlation component in the model
and extend the local influence technique to other perturbation schemes, mainly on the
non-parametric component of the model.
Author Contributions: Conceptualization, G.I.-P., P.P., M.A.U.-O. and O.N.; methodology, G.I.-P. and
O.N.; software, G.I.-P. and P.P.; validation, G.I.-P. and P.P.; formal analysis, P.P.; investigation, G.I.-P.,
P.P., O.N. and M.A.U.-O.; data curation, P.P. and M.A.U.-O.; writing—original draft preparation,
G.I.-P. and P.P.; writing—review and editing, O.N. and M.A.U.-O.; supervision, G.I.-P. and O.N.;
project administration, O.N.; funding acquisition, O.N. All authors have read and agreed to the
published version of the manuscript.
Funding: This research was funded by ANID-Fondecyt grant number 1201478.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data cannot be shared openly but are available on request from authors.
Acknowledgments: The third author acknowledges the support of ANID-Fondecyt 1201478.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989.
2. Duchon, J. Interpolation des fonctions de deux variables suivant le principe de la flexion des plaques minces. RAIRO Anal. Numér.
1976, 10, 5–12. [CrossRef]
3. Duchon, J. Splines minimizing rotation-invariant semi-norms in Sobolev spaces. Lect. Notes Math. 1977, 571, 85–100.
4. Bookstein, F.L. Principal warps: Thin-plate splines and decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell.
1989, 11, 567–585. [CrossRef]
5. Chen, C.; Li, Y.; Yan, C.; Dai, H.; Liu, G. A Thin Plate Spline-Based Feature-Preserving Method for Reducing Elevation Points
Derived from LiDAR. Remote Sens. 2015, 7, 11344–11371. [CrossRef]
6. Wahba, G. Spline Models for Observational Data; SIAM: Philadelphia, PA, USA, 1990.
7. Green, P.J.; Silverman, B.W. Nonparametric Regression and Generalized Linear Models; Chapman and Hall: Boca Raton, FL, USA, 1994.
8. Wood, S.N. Thin plate regression splines. J. R. Stat. Soc. Ser. B (Methodol.) 2003, 65, 95–114. [CrossRef]
9. Moraga, M.S.; Ibacache-Pulgar, G.; Nicolis, O. On an elliptical thin-plate spline partially varying-coefficient model. Chil. J. Stat.
2021, 12, 205–228.
10. Cook, R.D. Assessment of Local Influence. J. R. Stat. Soc. Ser. B (Methodol.) 1986, 48, 133–169. [CrossRef]
11. Thomas, W.; Cook, R.D. Assessing influence on regression coefficients in generalized linear models. Biometrika 1989, 76, 741–749.
[CrossRef]
12. Ouwens, M.N.M.; Tan, F.E.S.; Berger, M.P.F. Local influence to detect influential data structures for generalized linear mixed
models. Biometrics 2001, 57, 1166–1172. [CrossRef]
13. Zhu, H.; Lee, S. Local influence for incomplete-data models. J. R. Stat. Soc. Ser. B 2001, 63, 111–126. [CrossRef]
14. Zhu, H.; Lee, S. Local influence for generalized linear mixed models. Can. J. Stat. 2003, 31, 293–309. [CrossRef]
15. Espinheira, P.L.; Ferrari, P.L.; Cribari-Neto, F. Influence diagnostics in beta regression. Comput. Stat. Data Anal. 2008, 52, 4417–4431.
[CrossRef]
16. Rocha, A.; Simas, A. Influence diagnostics in a general class of beta regression models. TEST 2011, 20, 95–119. [CrossRef]
17. Ferrari, S.; Espinheira, P.; Cribari-Neto, F. Diagnostic tools in beta regression with varying dispersion. Stat. Neerl. 2011, 65, 337–351.
[CrossRef]
18. Ferreira, C.S.; Paula, G.A. Estimation and diagnostic for skew-normal partially linear models. J. Appl. Stat. 2017, 44, 3033–3053.
[CrossRef]
19. Emami, H. Local influence for Liu estimators in semiparametric linear models. Stat. Pap. 2018, 59, 529–544. [CrossRef]
Axioms 2024, 13, 346 20 of 20
20. Liu, Y.; Mao, G.; Leiva, V.; Liu, S.; Tapia, A. Diagnostic Analytics for an Autoregressive Model under the Skew-Normal Distribution.
Mathematics 2020, 8, 693. [CrossRef]
21. Thomas, W. Influence diagnostics for the cross-validated smoothing parameter in spline smoothing. J. Am. Stat. Assoc. 1991,
9, 693–698. [CrossRef]
22. Ibacache, G.; Paula, G.A. Local Influence for student-t partially linear models. Comput. Stat. Data Anal. 2011, 55, 1462–1478.
[CrossRef]
23. Ibacache-Pulgar, G.; Paula, G.A.; Galea, M. Influence diagnostics for elliptical semiparametric mixed models. Stat. Model. 2012,
12, 165–193. [CrossRef]
24. Ibacache, G.; Paula, G.A.; Cysneiros, F. Semiparametric additive models under symmetric distributions. Test 2013, 22, 103–121.
[CrossRef]
25. Zhang, J.; Zhang, X.; Ma, H.; Zhiya, C. Local influence analysis of varying coefficient linear model. J. Interdiscip. Math. 2015,
3, 293–306. [CrossRef]
26. Ibacache-Pulgar, G.; Reyes, S. Local influence for elliptical partially varying coefficient model. Stat. Model. 2018, 18, 149–174.
[CrossRef]
27. Ibacache-Pulgar, G.; Figueroa-Zuñiga, J.; Marchant, C. Semiparametric additive beta regression models: Inference and local
influence diagnostics. REVSTAT-Stat. J. 2019, 19, 255–274.
28. Cavieres, J.; Ibacache-Pulgar, G.; Contreras-Reyes, J. Thin plate spline model under skew-normal random errors: Estimation and
diagnostic analysis for spatial data. J. Stat. Comput. Simul. 2023, 93, 25–45. [CrossRef]
29. Jeldes, N.; Ibacache-Pulgar, G.; Marchant, C.; López-Gonzales, J.L. Modeling Air Pollution Using Partially Varying Coefficient
Models with Heavy Tails. Mathematics 2022, 10, 3677. [CrossRef]
30. Saavedra-Nievas, J.C.; Nicolis, O.; Galea, M.; Ibacache-Pulgar, G. Influence diagnostics in Gaussian spatial—Temporal linear
models with separable covariance. Environ. Ecol. Stat. 2023, 30, 131–155. [CrossRef]
31. Sánchez, L.; Ibacache-Pulgar, G.; Marchant, C.; Riquelme, M. Modeling Environmental Pollution Using Varying-Coefficients
Quantile Regression Models under Log-Symmetric Distributions. Axioms 2023, 12, 976. [CrossRef]
32. Green, P.J. Penalized Likelihood for General Semi-Parametric Regression Models. Int. Stat. Rev. 1987, 55, 245–259. [CrossRef]
33. Nelder, J.A.; Wedderburn, R.W.M. Generalized Linear Models. J. R. Stat. Soc. Ser. A (Gen.) 1972, 135, 370–384. [CrossRef]
34. Wood, S.N. Generalized Additive Models: An Introduction with R, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2017.
35. Akaike, H. Information theory as an extension of the maximum likelihood principle. In Proceedings of the Second International
Symposium on Information Theory; Petrov, B.N., Csaki, F., Eds.; Academiai Kiado: Budapest, Hungary, 1973.
36. Escobar, L.A.; Meeker, W.Q. Assessing Influence in Regression Analysis with Censored Data. Biometrics 1992, 48, 507–528.
[CrossRef] [PubMed]
37. Billor, N.; Loynes, R.M. Local influence: A new approach. Comm. Statist. Theory Meth. 1993, 22, 1595–1611. [CrossRef]
38. MathWorks Inc. MATLAB Version: 9.13.0 (R2022b); The MathWorks Inc.: Natick, MA, USA, 2022. Available online: https:
//www.mathworks.com (accessed on 10 October 2022).
39. Uribe-Opazo, M.A.; Borssoi, J.A.; Galea, M. Influence diagnostics in Gaussian spatial linear models. J. Appl. Stat. 2012, 3, 615–630.
[CrossRef]
40. Breiman, L.; Friedman, J.H. Estimating optimal transformations for multiple regression and correlation. J. Am. Stat. Assoc. 1985,
80, 580–598. [CrossRef]
41. Ibacache-Pulgar, G.; Lira, V.; Villegas, C. Assessing Influence on Partially Varying-coefficient Generalized Linear Model. REVSTAT-
Stat. J. 2022. Available online: https://revstat.ine.pt/index.php/REVSTAT/article/view/507 (accessed on 10 October 2022).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.