International Journal of Statistics and Probability; Vol. 8, No. 5; September 2019
ISSN 1927-7032 E-ISSN 1927-7040
Published by Canadian Center of Science and Education
On Generalized Gamma Distribution and Its Application to
Survival Data
Kiche J¹, Oscar Ngesa² & George Orwa³
¹ Mathematics Department, Pan African University, Institute of Basic Sciences, Technology, and Innovation, Kenya
² Mathematics and Informatics Department, Taita-Taveta University, Kenya
³ Statistics and Actuarial Sciences Department, Jomo Kenyatta University of Agriculture and Technology, Kenya
Correspondence: Kiche J, Mathematics Department, Pan African University, Institute of Basic Sciences, Technology, and
Innovation, Kenya. E-mail: kicheopiyo07@gmail.com
Received: July 7, 2019    Accepted: August 27, 2019    Online Published: August 30, 2019
doi:10.5539/ijsp.v8n5p85    URL: https://doi.org/10.5539/ijsp.v8n5p85
Abstract
The generalized gamma distribution is a continuous probability distribution with three parameters. It is a generalization of the two-parameter gamma distribution. Since many distributions commonly used for parametric models in survival analysis (such as the exponential distribution, the Weibull distribution and the gamma distribution) are special cases of the generalized gamma, it is sometimes used to determine which parametric model is appropriate for a given set of data. The generalized gamma distribution is one of the distributions used in frailty modeling. In this study, it is shown that the generalized gamma distribution has three sub-families, and its application to the analysis of survival data is explored. A parametric modeling approach is carried out to obtain the results.
Keywords: generalized gamma, sub-families, Kullback-Leibler divergence
1. Introduction
The early generalization of the gamma distribution can be traced back to Amoroso (1925), who discussed a generalized gamma distribution and applied it to fit income rates. Johnson et al. gave a four-parameter generalized gamma distribution which reduces to the generalized gamma distribution defined by Stacy (1962) when the location parameter is set to zero. The generalized gamma defined by Stacy (1962) is a three-parameter exponentiated gamma distribution. Mudholkar and Srivastava (1993) introduced the exponentiated method to derive a distribution. Agarwal and Al-Saleh (2001) applied the generalized gamma to study hazard rates. Balakrishnan and Peng (2006) applied this distribution to develop a generalized gamma frailty model. Nadarajah and Gupta (2007) proposed another type of generalized gamma distribution with application to fitting drought data. Cordeiro et al. (2012) derived another generalization of Stacy's generalized gamma distribution using the exponentiated method, and applied it to lifetime and survival analysis.
The generalized gamma distribution presents a flexible family in the varieties of shapes and hazard functions for modeling
duration. Distributions that are used in duration analysis in economics include exponential, log-normal, gamma, and
Weibull. The generalized gamma family, which encompasses exponential, gamma, and Weibull as subfamilies, and log-normal as a limiting distribution, has been used in economics by Jaggia, Yamaguchi, and Allenby et al. Some authors have
argued that the flexibility of generalized gamma makes it suitable for duration analysis, while others have advocated use
of simpler models because of estimation difficulties caused by the complexity of generalized gamma parameter structure.
Obviously, there would be no need to endure the costs associated with the application of a complex generalized gamma
model if the data do not discriminate between the generalized gamma and members of its subfamilies, or if the fit of a
simpler model to the data is as good as that for the complex generalized gamma. Estimation problems noted by Hager and Bain inhibited applications of the generalized gamma model.
Prentice resolved the convergence problem using a nonlinear transformation of the generalized gamma model. Maximum-likelihood estimation of the parameters, and quasi-maximum-likelihood estimators for its subfamily (the two-parameter gamma distribution), can be found in the literature. Hwang, T. et al. introduced a new moment estimation of the parameters of the generalized gamma distribution using its characterization. In information theory, a maximum entropy (ME) derivation of the generalized gamma is thus far found in Kapur, where it is referred to as the generalized Weibull distribution, and the entropy of the GG has appeared in the context of flexible families of distributions. Some concepts of this family in information theory have been introduced by Dadpay et al.
2. Parametric Distributions and Their Properties
2.1 Log-Normal Distribution
A log-normal distribution is a continuous probability distribution of a random variable whose logarithm is normally
distributed. If the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Likewise, if
Y has a normal distribution, then the exponential function of Y, X = exp(Y), has a log-normal distribution. A random
variable which is log-normally distributed takes only positive real values. A log-normal process is the statistical realization
of the multiplicative product of many independent random variables, each of which is positive. This is justified by
considering the central limit theorem in the log domain. The log-normal distribution is the maximum entropy probability distribution for a random variate X for which the mean and variance of ln(X) are specified.
A positive random variable X is log-normally distributed if the logarithm of X is normally distributed, ln(X) ∼ N(µ, σ²). The probability density function of the log-normal distribution is given by:
$$f_X(x) = \frac{1}{xσ\sqrt{2π}}\,\exp\!\left(-\frac{(\ln x - µ)^{2}}{2σ^{2}}\right) \tag{1}$$
The cumulative distribution is given as
$$F_X(x) = Φ\!\left(\frac{\ln x - µ}{σ}\right) \tag{2}$$
which may also be expressed as
$$\frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{\ln x - µ}{σ\sqrt{2}}\right)\right] \tag{3}$$
where erf is the error function.
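As a quick numerical check of Eqs. (1)-(3), the following minimal sketch (assuming NumPy and SciPy are available; scipy.stats.lognorm takes s = σ and scale = e^µ, and the parameter values are illustrative) compares the formulas against library routines:

```python
# A minimal check of Eqs. (1)-(3); scipy.stats.lognorm takes s = sigma, scale = exp(mu).
import numpy as np
from scipy.stats import lognorm, norm

mu, sigma = 1.0, 0.5   # illustrative values
x = 2.0

pdf_manual = np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))
cdf_manual = norm.cdf((np.log(x) - mu) / sigma)        # Eq. (2)

assert np.isclose(pdf_manual, lognorm.pdf(x, s=sigma, scale=np.exp(mu)))
assert np.isclose(cdf_manual, lognorm.cdf(x, s=sigma, scale=np.exp(mu)))
```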
An approximating formula for the characteristic function φ(t) can be given as
$$φ(t) \approx \frac{\exp\!\left(-\dfrac{ω^{2}(-itσ^{2}e^{µ}) + 2ω(-itσ^{2}e^{µ})}{2σ^{2}}\right)}{\sqrt{1 + ω(-itσ^{2}e^{µ})}} \tag{4}$$
where ω is the Lambert W function.
For a log-normal random variable, the partial expectation is given by
$$g(k) = e^{µ + \frac{1}{2}σ^{2}}\, Φ\!\left(\frac{µ + σ^{2} - \ln k}{σ}\right) \tag{5}$$
The quantile for the log-normal distribution is given by
$$\exp\!\left(µ + \sqrt{2σ^{2}}\,\mathrm{erf}^{-1}(2F - 1)\right) \tag{6}$$
while the variance is given as
$$[\exp(σ^{2}) - 1]\exp(2µ + σ^{2}) \tag{7}$$
The skewness is given by the formula
$$(e^{σ^{2}} + 2)\sqrt{e^{σ^{2}} - 1} \tag{8}$$
The excess kurtosis is then
$$\exp(4σ^{2}) + 2\exp(3σ^{2}) + 3\exp(2σ^{2}) - 6 \tag{9}$$
The support is x ∈ (0, +∞), the mean is $\exp(µ + \frac{σ^{2}}{2})$, the mode is $\exp(µ - σ^{2})$, and the entropy is given as
$$\log_2\!\left(σ e^{µ + \frac{1}{2}}\sqrt{2π}\right)$$
2.2 Weibull Distribution
The Weibull distribution is a continuous probability distribution. It is named after the Swedish mathematician Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet (1927) and first applied by Rosin and Rammler (1933) to describe a particle size distribution.
The Weibull distribution is one of the most widely used lifetime distributions in reliability engineering. It is a versatile
distribution that can take on the characteristics of other types of distributions, based on the value of the shape parameter.
The probability density function of a Weibull random variable is
$$f(x; ξ, ν) = \frac{ν}{ξ}\left(\frac{x}{ξ}\right)^{ν-1} e^{-(x/ξ)^{ν}}, \quad x \ge 0 \tag{10}$$
where ν > 0 is the shape parameter and ξ > 0 is the scale parameter of the distribution.
The cumulative distribution function for the Weibull distribution is
$$F(x; ξ, ν) = 1 - e^{-(x/ξ)^{ν}} \tag{11}$$
The failure rate h (or hazard function) is given by
$$h(x; ξ, ν) = \frac{ν}{ξ}\left(\frac{x}{ξ}\right)^{ν-1} \tag{12}$$
The quantile (inverse cumulative distribution) function for the Weibull distribution is
$$Q(p; ξ, ν) = ξ\left(-\ln(1 - p)\right)^{1/ν} \tag{13}$$
The moment generating function of the logarithm of a Weibull distributed random variable is given by
$$E[e^{t\log X}] = ξ^{t}\, Γ\!\left(\frac{t}{ν} + 1\right) \tag{14}$$
The mean and variance of a Weibull random variable can be expressed respectively as
$$E(X) = ξ\, Γ\!\left(1 + \frac{1}{ν}\right) \tag{15}$$
and
$$\mathrm{Var}(X) = ξ^{2}\left[Γ\!\left(1 + \frac{2}{ν}\right) - \left(Γ\!\left(1 + \frac{1}{ν}\right)\right)^{2}\right] \tag{16}$$
The skewness is given by
$$γ_1 = \frac{Γ(1 + \frac{3}{ν})\,ξ^{3} - 3µσ^{2} - µ^{3}}{σ^{3}} \tag{17}$$
where the mean is denoted by µ and the standard deviation is denoted by σ. The excess kurtosis is given by
$$γ_2 = \frac{-6Γ_1^{4} + 12Γ_1^{2}Γ_2 - 3Γ_2^{2} - 4Γ_1Γ_3 + Γ_4}{\left[Γ_2 - Γ_1^{2}\right]^{2}} \tag{18}$$
where Γ_i = Γ(1 + i/ν). The information entropy is given by
$$H(ξ, ν) = γ\left(1 - \frac{1}{ν}\right) + \ln\!\left(\frac{ξ}{ν}\right) + 1 \tag{19}$$
where γ is the Euler-Mascheroni constant.
The Kullback-Leibler divergence for Weibull distribution is given by
$$D_{KL}(\mathrm{Weib}_1 \,\|\, \mathrm{Weib}_2) = \log\frac{ν_1}{ξ_1^{ν_1}} - \log\frac{ν_2}{ξ_2^{ν_2}} + (ν_1 - ν_2)\left[\log ξ_1 - \frac{γ}{ν_1}\right] + \left(\frac{ξ_1}{ξ_2}\right)^{ν_2} Γ\!\left(\frac{ν_2}{ν_1} + 1\right) - 1 \tag{20}$$
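The closed form in Eq. (20) can be checked against a direct numerical integration of the divergence integral; a sketch assuming NumPy/SciPy, with illustrative parameter values:

```python
# Check the closed-form Weibull KL divergence, Eq. (20), by quadrature.
import numpy as np
from scipy.special import gamma as G
from scipy.integrate import quad

def weib_pdf(x, xi, nu):                       # Eq. (10)
    return (nu / xi) * (x / xi) ** (nu - 1) * np.exp(-(x / xi) ** nu)

def kl_weibull(xi1, nu1, xi2, nu2):            # Eq. (20)
    g = np.euler_gamma
    return (np.log(nu1 / xi1 ** nu1) - np.log(nu2 / xi2 ** nu2)
            + (nu1 - nu2) * (np.log(xi1) - g / nu1)
            + (xi1 / xi2) ** nu2 * G(nu2 / nu1 + 1) - 1)

xi1, nu1, xi2, nu2 = 2.0, 1.5, 3.0, 0.8        # illustrative parameters
numeric, _ = quad(lambda x: weib_pdf(x, xi1, nu1)
                  * np.log(weib_pdf(x, xi1, nu1) / weib_pdf(x, xi2, nu2)), 0, np.inf)
print(numeric, kl_weibull(xi1, nu1, xi2, nu2))  # the two values should agree closely
```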
The variances and covariances of ν̂ and ξ̂ are estimated from the inverse local Fisher matrix as
$$\begin{pmatrix} \widehat{\mathrm{Var}}(\hat{ν}) & \widehat{\mathrm{Cov}}(\hat{ν}, \hat{ξ}) \\ \widehat{\mathrm{Cov}}(\hat{ν}, \hat{ξ}) & \widehat{\mathrm{Var}}(\hat{ξ}) \end{pmatrix} = \begin{pmatrix} -\dfrac{∂^{2}Λ}{∂ν^{2}} & -\dfrac{∂^{2}Λ}{∂ν\,∂ξ} \\ -\dfrac{∂^{2}Λ}{∂ν\,∂ξ} & -\dfrac{∂^{2}Λ}{∂ξ^{2}} \end{pmatrix}^{-1}_{\;ν=\hat{ν},\,ξ=\hat{ξ}} \tag{21}$$
2.3 Exponential Distribution
The exponential model, with only one unknown parameter, is the simplest of all life distribution models. The exponential
distribution is one of the widely used continuous distributions. It is often used to model the time elapsed between events.
That is the exponential distribution is often concerned with the amount of time until some specific event occurs. For
example, the amount of time (beginning now) until an earthquake occurs has an exponential distribution. Other examples
include the length, in minutes, of long distance business telephone calls, and the amount of time, in months, a car battery
lasts. An interesting property of the exponential distribution is that it can be viewed as a continuous analogue of the
geometric distribution.
The probability density function of exponential distribution is given by
$$f(t; θ) = θ\,e^{-θt}, \quad t \ge 0 \tag{22}$$
The cumulative distribution function is thus denoted by
$$F(t) = 1 - e^{-θt} \tag{23}$$
The reliability and failure rate of the exponential distribution are given respectively as
R(t) = e−θt and h(t) = θ
while the mean and variance are given by mean = 1/θ and variance = 1/θ2 .
Note that the failure rate reduces to the constant θ for any time. The exponential distribution is the only distribution to
have a constant failure rate. Also, another name for the exponential mean is the Mean Time To Failure or MTTF and we
have MTTF = 1/θ.
Kullback-Leibler divergence is given by
$$D_{KL}(\exp_0 \,\|\, \exp_1) = \log(θ_0) - \log(θ_1) + \frac{θ_1}{θ_0} - 1 \tag{24}$$
The most important property is that the exponential distribution is memoryless. Specifically, the memoryless property says that an exponentially distributed random variable T obeys the relation:
$$\Pr(T > a + t \mid T > a) = \Pr(T > t), \quad \forall\, a, t \ge 0 \tag{25}$$
Proof
$$\begin{aligned} \Pr(T > a + t \mid T > a) &= \frac{\Pr(T > a + t \,\cap\, T > a)}{\Pr(T > a)} \\ &= \frac{\Pr(T > a + t)}{\Pr(T > a)} \\ &= \frac{e^{-θ(a+t)}}{e^{-θa}} \\ &= e^{-θt} \\ &= \Pr(T > t) \end{aligned} \tag{26-30}$$
hence the proof.
The memoryless property says that knowledge of what has occurred in the past has no effect on future probabilities.
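A small simulation sketch of the memoryless property (25), assuming NumPy; the rate and thresholds are illustrative:

```python
# Simulate Pr(T > a + t | T > a) and Pr(T > t) for an exponential T; both should
# match exp(-theta * t) by the memoryless property, Eq. (25).
import numpy as np

rng = np.random.default_rng(0)
theta, a, t = 0.5, 1.0, 2.0                 # illustrative values
T = rng.exponential(scale=1 / theta, size=1_000_000)

lhs = np.mean(T[T > a] > a + t)             # conditional tail probability
rhs = np.mean(T > t)                        # unconditional tail probability
print(lhs, rhs, np.exp(-theta * t))         # all three approximately equal
```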
2.4 Gamma Distribution
The gamma distribution is another widely used distribution. Its importance is largely due to its relation to exponential
and normal distributions. The gamma distribution is a two-parameter family of continuous probability distributions. The
exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution.
The gamma distribution is used to model errors in multi-level Poisson regression models, because the combination of
the Poisson distribution and a gamma distribution is a negative binomial distribution. The gamma distribution is widely
used as a conjugate prior in Bayesian statistics. It is the conjugate prior for the precision (i.e. inverse of the variance)
of a normal distribution. It is also the conjugate prior for the exponential distribution. The gamma distribution can be
parameterized in terms of a shape parameter φ = κ and an inverse scale parameter β = 1/θ, called a rate parameter. A
random variable X that is gamma-distributed with shape φ and rate β is denoted by
X ∼ Γ(φ, β) ≡ Gamma(φ, β)
The corresponding probability density function in the shape-rate parametrization is
$$f(x; φ, β) = \frac{β^{φ}\, x^{φ-1}\, e^{-βx}}{Γ(φ)}, \quad x > 0,\; φ, β > 0, \tag{31}$$
and in the shape-scale parametrization it is
$$f(x; φ, θ) = \frac{x^{φ-1}\, e^{-x/θ}}{θ^{φ}\,Γ(φ)}, \quad x > 0,\; φ, θ > 0,$$
where Γ(φ) is the gamma function.
Both parameterizations are common because either can be more convenient depending on the situation.
The cumulative distribution function is the regularized gamma function:
$$F(x; φ, β) = \int_0^{x} f(u; φ, β)\,du = \frac{γ(φ, βx)}{Γ(φ)} \tag{32}$$
where γ(φ, βx) is the lower incomplete gamma function.
If φ is a positive integer (i.e., the distribution is an Erlang distribution), the cumulative distribution function has the
following series expansion:
$$F(x; φ, β) = 1 - \sum_{i=0}^{φ-1}\frac{(βx)^{i}}{i!}\,e^{-βx} = e^{-βx}\sum_{i=φ}^{∞}\frac{(βx)^{i}}{i!}. \tag{33}$$
The Kullback-Leibler divergence (KL-divergence) of Γ(φ_p, β_p) (the "true" distribution) from Γ(φ_q, β_q) (the "approximating" distribution) is given by
$$D_{KL}(φ_p, β_p;\, φ_q, β_q) = (φ_p - φ_q)\,ψ(φ_p) - \log Γ(φ_p) + \log Γ(φ_q) + φ_q\left(\log β_p - \log β_q\right) + φ_p\,\frac{β_q - β_p}{β_p}. \tag{34}$$
Written using the k, θ parameterization, the KL-divergence of Γ(k_p, θ_p) from Γ(k_q, θ_q) is given by
$$D_{KL}(k_p, θ_p;\, k_q, θ_q) = (k_p - k_q)\,ψ(k_p) - \log Γ(k_p) + \log Γ(k_q) + k_q\left(\log θ_q - \log θ_p\right) + k_p\,\frac{θ_p - θ_q}{θ_q}. \tag{35}$$
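The following sketch (assuming NumPy/SciPy and illustrative parameters) verifies the shape-rate form (34) by quadrature:

```python
# Verify the gamma KL divergence in the shape-rate form, Eq. (34), numerically.
import numpy as np
from scipy.special import gammaln, digamma
from scipy.stats import gamma
from scipy.integrate import quad

def kl_gamma_rate(ap, bp, aq, bq):             # Eq. (34)
    return ((ap - aq) * digamma(ap) - gammaln(ap) + gammaln(aq)
            + aq * (np.log(bp) - np.log(bq)) + ap * (bq - bp) / bp)

ap, bp, aq, bq = 3.0, 2.0, 1.5, 0.7            # illustrative parameters
f = lambda x: gamma.pdf(x, ap, scale=1 / bp)   # "true" density
g = lambda x: gamma.pdf(x, aq, scale=1 / bq)   # "approximating" density
numeric, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), 0, np.inf)
print(numeric, kl_gamma_rate(ap, bp, aq, bq))
```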
The Laplace transform of the gamma probability density function is
$$F(s) = (1 + θs)^{-φ} = \frac{β^{φ}}{(s + β)^{φ}}. \tag{36}$$
Parameterization 1: If X_k ∼ Γ(φ_k, θ_k) are independent, then
$$\frac{φ_2 θ_2 X_1}{φ_1 θ_1 X_2} \sim F(2φ_1, 2φ_2), \quad \text{or equivalently,} \quad \frac{X_1}{X_2} \sim β'\!\left(φ_1, φ_2, 1, \frac{θ_1}{θ_2}\right).$$
Parameterization 2: If X_k ∼ Γ(φ_k, β_k) are independent, then
$$\frac{φ_2 β_1 X_1}{φ_1 β_2 X_2} \sim F(2φ_1, 2φ_2), \quad \text{or equivalently,} \quad \frac{X_1}{X_2} \sim β'\!\left(φ_1, φ_2, 1, \frac{β_2}{β_1}\right),$$
where β′ denotes the beta prime distribution.
If X ∼ Γ(φ, θ) and Y ∼ Γ(β, θ) are independently distributed, then X/(X + Y) has a beta distribution with parameters
φ and β, and X/(X + Y) is independent of X + Y, which is Γ(φ + β, θ)-distributed.
3. Methods
3.1 Generalized Gamma Distribution
The generalized gamma has three parameters, given in this case as κ > 0, ϕ > 0, and β > 0.
For non-negative x, the probability density function of the generalized gamma is:
$$f(x; κ, ϕ, β) = \frac{(β/κ^{ϕ})\, x^{ϕ-1}\, e^{-(x/κ)^{β}}}{Γ(ϕ/β)} \tag{37}$$
where Γ(.) denotes the gamma function.
3.2 The Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable X denoted by F(x), is defined as F(x) = Pr(X ≤ x).
Using the identity for the probability of disjoint events, if X is a discrete random variable, then
$$F(x) = \sum_{k=1}^{n} \Pr(X = x_k) \tag{38}$$
where xn gives the largest possible value of X that is less than or equal to x.
The cumulative distribution function for a random variable at x gives the probability that the random variable X is less
than or equal to that number x. Note that in the formula for CDFs of discrete random variables, we always have n ≤ N,
where N is the number of possible outcomes of X.
This function, CDF(x), simply gives the probability of measuring any value up to and including x.
The cumulative distribution function (c.d.f.) of a continuous random variable X is defined as:
$$F(x) = \int_{-∞}^{x} f(t)\,dt \quad \text{for } -∞ < x < ∞. \tag{39}$$
3.3 Properties of the CDF
• 0 ≤ F(x) ≤ 1, ∀x
• lim x→−∞ F(x) = 0
• lim x→∞ F(x) = 1
• F(x) is a non-decreasing function of x.
A function f (x) is said to be nondecreasing if f (x1 ) ≤ f (x2 ) whenever x1 < x2 .
The cumulative distribution function of generalized gamma distribution is given by:
$$F(x; κ, ϕ, β) = \frac{γ\!\left(ϕ/β,\, (x/κ)^{β}\right)}{Γ(ϕ/β)} \tag{40}$$
where γ(.) denotes the lower incomplete gamma function.
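For computation, the density (37) and CDF (40) can be mapped onto scipy.stats.gengamma with a = ϕ/β, c = β and scale = κ; this mapping is an assumption about SciPy's parameterization that the sketch below checks directly (parameter values are illustrative):

```python
# Map the paper's (kappa, phi, beta) onto scipy.stats.gengamma: a = phi/beta,
# c = beta, scale = kappa. The sketch verifies Eqs. (37) and (40) against the library.
import numpy as np
from scipy.stats import gengamma
from scipy.special import gamma as G, gammainc

kappa, phi, beta = 2.0, 3.0, 1.5               # illustrative parameters
x = 1.7

pdf_manual = (beta / kappa ** phi) * x ** (phi - 1) * np.exp(-(x / kappa) ** beta) / G(phi / beta)
cdf_manual = gammainc(phi / beta, (x / kappa) ** beta)   # regularized lower incomplete gamma

dist = gengamma(a=phi / beta, c=beta, scale=kappa)
assert np.isclose(pdf_manual, dist.pdf(x))
assert np.isclose(cdf_manual, dist.cdf(x))
```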
The GG can also be written as a three-parameter (β, σ > 0, k) family whose survival function is given as
$$S_{GG}(t) = 1 - Γ\!\left[k^{-2}\exp\!\left(k[\log(t) - β]/σ\right);\; k^{-2}\right] \tag{41}$$
or, equivalently,
$$S_{GG}(t) = 1 - Γ\!\left[k^{-2}\left(e^{-β}t\right)^{k/σ};\; k^{-2}\right] \tag{42}$$
where Γ[x; a] denotes the regularized incomplete gamma function.
3.4 Kullback-Leibler Divergence
The Kullback-Leibler divergence (also called relative entropy) is a measure of how one probability distribution diverges
from a second, expected probability distribution. Applications include characterizing the relative (Shannon) entropy in
information systems, randomness in continuous time-series, and information gain when comparing statistical models of
inference. In contrast to variation of information, it is a distribution-wise asymmetric measure and thus does not qualify
as a statistical metric of spread.
In the simple case, a Kullback-Leibler divergence of 0 indicates that we can expect similar, if not the same, behavior of
two different distributions, while a Kullback-Leibler divergence of 1 indicates that the two distributions behave in such a
different manner that the expectation given the first distribution approaches zero.
If f1 and f2 are the probability density functions of two generalized gamma distributions, then their Kullback-Leibler divergence is given by
$$D_{KL}(f_1 \,\|\, f_2) = \int_0^{∞} f_1(x; κ_1, ϕ_1, β_1)\,\ln\frac{f_1(x; κ_1, ϕ_1, β_1)}{f_2(x; κ_2, ϕ_2, β_2)}\,dx \tag{43}$$
$$= \ln\frac{β_1\, κ_2^{ϕ_2}\, Γ(ϕ_2/β_2)}{β_2\, κ_1^{ϕ_1}\, Γ(ϕ_1/β_1)} + \left[\frac{ψ(ϕ_1/β_1)}{β_1} + \ln κ_1\right](ϕ_1 - ϕ_2) + \frac{Γ\!\left((ϕ_1 + β_2)/β_1\right)}{Γ(ϕ_1/β_1)}\left(\frac{κ_1}{κ_2}\right)^{β_2} - \frac{ϕ_1}{β_1} \tag{44}$$
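A numerical sanity check of Eqs. (43)-(44), assuming NumPy/SciPy and illustrative parameter values:

```python
# Compare the closed form Eq. (44) with direct integration of Eq. (43).
import numpy as np
from scipy.special import gammaln, digamma, gamma as G
from scipy.integrate import quad

def gg_pdf(x, k, p, b):                        # Eq. (37) with (kappa, phi, beta)
    return (b / k ** p) * x ** (p - 1) * np.exp(-(x / k) ** b) / G(p / b)

def kl_gg(k1, p1, b1, k2, p2, b2):             # Eq. (44)
    term1 = (np.log(b1) + p2 * np.log(k2) + gammaln(p2 / b2)
             - np.log(b2) - p1 * np.log(k1) - gammaln(p1 / b1))
    term2 = (digamma(p1 / b1) / b1 + np.log(k1)) * (p1 - p2)
    term3 = G((p1 + b2) / b1) / G(p1 / b1) * (k1 / k2) ** b2 - p1 / b1
    return term1 + term2 + term3

pars1, pars2 = (1.2, 2.0, 1.5), (2.0, 1.0, 1.0)   # illustrative parameters
numeric, _ = quad(lambda x: gg_pdf(x, *pars1) * np.log(gg_pdf(x, *pars1) / gg_pdf(x, *pars2)),
                  0, np.inf)
print(numeric, kl_gg(*pars1, *pars2))
```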
3.5 Moments of Generalized Gamma Distribution
If X has a generalized gamma distribution as above, then
$$E(X^{r}) = κ^{r}\,\frac{Γ\!\left((ϕ + r)/β\right)}{Γ(ϕ/β)} \tag{45}$$
The mean of a generalized gamma distribution is given by
$$κ\,\frac{Γ\!\left((ϕ + 1)/β\right)}{Γ(ϕ/β)} \tag{46}$$
The mode of a generalized gamma distribution is given by
$$κ\left(\frac{ϕ - 1}{β}\right)^{1/β} \quad \text{for } ϕ > 1, \text{ otherwise } 0 \tag{47}$$
The variance of a generalized gamma distribution is given by
$$κ^{2}\left(\frac{Γ\!\left((ϕ + 2)/β\right)}{Γ(ϕ/β)} - \left(\frac{Γ\!\left((ϕ + 1)/β\right)}{Γ(ϕ/β)}\right)^{2}\right) \tag{48}$$
while the entropy is given by
$$\ln\frac{κ\,Γ(ϕ/β)}{β} + \frac{ϕ}{β} + \left(\frac{1}{β} - \frac{ϕ}{β}\right)ψ\!\left(\frac{ϕ}{β}\right) \tag{49}$$
Skewness is given by
$$\frac{Γ(ϕ + \frac{3}{β})\,Γ(ϕ)^{2} - 3Γ(ϕ + \frac{2}{β})\,Γ(ϕ + \frac{1}{β})\,Γ(ϕ) + 2Γ(ϕ + \frac{1}{β})^{3}}{\left(Γ(ϕ + \frac{2}{β})\,Γ(ϕ) - Γ(ϕ + \frac{1}{β})^{2}\right)^{3/2}} \tag{50}$$
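A short check of the moment formulas (46) and (48) against scipy.stats.gengamma, using the mapping noted earlier (illustrative parameters):

```python
# Check the mean (46) and variance (48) against scipy.stats.gengamma,
# using the mapping a = phi/beta, c = beta, scale = kappa.
import numpy as np
from scipy.special import gamma as G
from scipy.stats import gengamma

kappa, phi, beta = 2.0, 3.0, 1.5               # illustrative parameters
dist = gengamma(a=phi / beta, c=beta, scale=kappa)

mean = kappa * G((phi + 1) / beta) / G(phi / beta)                  # Eq. (46)
var = kappa ** 2 * (G((phi + 2) / beta) / G(phi / beta)
                    - (G((phi + 1) / beta) / G(phi / beta)) ** 2)   # Eq. (48)

assert np.isclose(mean, dist.mean())
assert np.isclose(var, dist.var())
```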
3.6 Properties of Generalized Gamma Distribution
The generalized gamma family is flexible in that it includes several well-known models as subfamilies. In the GG(α, τ, λ) parameterization used below, the subfamilies of the generalized gamma thus far considered in the literature are the exponential (α = τ = 1), the gamma (τ = 1), and the Weibull (α = 1). The log-normal distribution is also obtained as a limiting distribution when α → ∞. An important property of the generalized gamma family for information analysis is that the family is closed under power transformation.
That is, if X ∼ GG(α, τ, λ), then
$$Y = X^{s} \sim GG\!\left(α, \frac{τ}{s}, λ^{s}\right), \quad s > 0 \tag{51}$$
It also has the property that Z = ηX has a GG(α, τ, ηλ) distribution for η > 0.
We use the transformed probability density function of the generalized gamma distribution to show that the generalized gamma contains the exponential, Weibull and gamma distributions as subfamilies. That is, we let the pdf of the generalized gamma distribution GG(α, τ, λ) be given by
$$f(x; α, τ, λ) = \frac{τ}{λΓ(α)}\left(\frac{x}{λ}\right)^{ατ-1} e^{-(x/λ)^{τ}} \tag{52}$$
Then, if α = τ = 1, the subfamily is the exponential distribution.
Proof I
From the probability density function of generalized gamma distribution given by
$$f(x; α, τ, λ) = \frac{τ}{λΓ(α)}\left(\frac{x}{λ}\right)^{ατ-1} e^{-(x/λ)^{τ}} \tag{53}$$
if we replace α = τ = 1, then we have that
$$f(x; 1, 1, λ) = \frac{1}{λΓ(1)}\left(\frac{x}{λ}\right)^{1-1} e^{-(x/λ)} \tag{54}$$
$$= \frac{1}{λ}\, e^{-x/λ} \tag{55}$$
which is the probability density function of an exponential distribution with parameter λ, hence the proof.
Proof II
Similarly, if we now let τ = 1, then from the probability density function of the generalized gamma distribution given by
$$f(x; α, τ, λ) = \frac{τ}{λΓ(α)}\left(\frac{x}{λ}\right)^{ατ-1} e^{-(x/λ)^{τ}} \tag{56}$$
we have that
$$= \frac{1}{λΓ(α)}\left(\frac{x}{λ}\right)^{α-1} e^{-(x/λ)} \tag{57}$$
Simplifying gives
$$= \frac{1}{λΓ(α)}\, x^{α-1}\left(\frac{1}{λ}\right)^{α}\left(\frac{1}{λ}\right)^{-1} e^{-(x/λ)} \tag{58}$$
$$= \frac{(1/λ)^{α}\, x^{α-1}\, e^{-(x/λ)}}{Γ(α)} \tag{59}$$
which is the probability density function of a gamma distribution with parameters α and λ.
Proof III
In the last proof, if we now let α = 1, then from the probability density function of the generalized gamma distribution given by
$$f(x; α, τ, λ) = \frac{τ}{λΓ(α)}\left(\frac{x}{λ}\right)^{ατ-1} e^{-(x/λ)^{τ}} \tag{60}$$
we show that the generalized gamma distribution reduces to the Weibull distribution as a sub-family. That is, we have
$$= \frac{τ}{λΓ(1)}\left(\frac{x}{λ}\right)^{τ-1} e^{-(x/λ)^{τ}} \tag{61}$$
This reduces to
$$= τ\, x^{τ-1}\left(\frac{1}{λ}\right)\left(\frac{1}{λ}\right)^{τ-1} e^{-(x/λ)^{τ}} \tag{62}$$
$$= τ\left(\frac{1}{λ}\right)^{τ} x^{τ-1}\, e^{-(x/λ)^{τ}} \tag{63}$$
$$= \frac{τ}{λ}\left(\frac{x}{λ}\right)^{τ-1} e^{-(x/λ)^{τ}} \tag{64}$$
which is the probability density function of a Weibull distribution with parameters λ and τ, hence the proof.
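The three reductions can also be confirmed numerically; the sketch below (assuming SciPy, with GG(α, τ, λ) mapped onto gengamma(a=α, c=τ, scale=λ)) compares the densities on a grid:

```python
# Numerical confirmation of Proofs I-III: GG(alpha, tau, lambda) in Eq. (52)
# corresponds to scipy's gengamma(a=alpha, c=tau, scale=lam).
import numpy as np
from scipy.stats import gengamma, expon, gamma, weibull_min

lam, alpha, tau = 2.0, 3.0, 1.5                # illustrative parameters
x = np.linspace(0.1, 5, 50)

# Proof I: alpha = tau = 1 gives the exponential distribution
assert np.allclose(gengamma.pdf(x, a=1, c=1, scale=lam), expon.pdf(x, scale=lam))
# Proof II: tau = 1 gives the gamma distribution
assert np.allclose(gengamma.pdf(x, a=alpha, c=1, scale=lam), gamma.pdf(x, alpha, scale=lam))
# Proof III: alpha = 1 gives the Weibull distribution
assert np.allclose(gengamma.pdf(x, a=1, c=tau, scale=lam), weibull_min.pdf(x, tau, scale=lam))
```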
3.7 Parameter Estimation Using Maximum Likelihood Estimation
Maximum likelihood estimation (MLE) is a method of estimating the parameters of a distribution by maximizing a
likelihood function, so that under the assumed statistical model the observed data is most probable. The point in the
parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum
likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
If the likelihood function is differentiable, the derivative test for determining maxima can be applied. In some cases, the
first-order conditions of the likelihood function can be solved explicitly; for instance, the ordinary least squares estimator
maximizes the likelihood of the linear regression model. Under most circumstances, however, numerical methods will be
necessary to find the maximum of the likelihood function.
From a statistical standpoint, the observations y = (y1 , y2 , . . . , yn ) are a random sample from an unknown population.
The goal is to make inferences about the population that is most likely to have generated the sample, specifically the
probability distribution corresponding to the population. Associated with each probability distribution is a unique vector
θ = [θ1 , θ2 , . . . , θk ]T of parameters that index the probability distribution within a parametric family { f (· ; θ) | θ ∈ Θ}.
As θ changes in value, different probability distributions are generated. The idea of maximum likelihood is to re-express
the joint probability of the sample data f (y1 , y2 , . . . , yn ; θ) as a likelihood function L(θ ; y) that treats θ as a variable. For
independent and identically distributed random variables, the likelihood function is defined as
$$L(θ\,; y) = f(y_1; θ)\, f(y_2; θ)\cdots f(y_n; θ) = \prod_{i=1}^{n} f(y_i; θ) \tag{65}$$
and evaluated at the observed data sample. The goal is then to find the values of the model parameter that maximize the
likelihood function over the parameter space Θ . Intuitively, this selects the parameter values that make the observed data
most probable. The problem is thus to find the supremum value of the likelihood function by choice of the parameter, $L(\hat{θ}\,; y) = \sup_{θ \in Θ} L(θ\,; y)$, where the estimator $\hat{θ} = \hat{θ}(y)$ is a function of the sample. A sufficient but not necessary condition for its existence is for the likelihood function to be continuous over a parameter space Θ that is compact. For an open Θ the likelihood function may increase without ever reaching a supremum value.
In practice, it is often convenient to work with the natural logarithm of the likelihood function, called the log-likelihood:
ℓ(θ ; y) = ln L(θ ; y). Since the logarithm is a monotonic function, the maximum of ℓ(θ ; y) occurs at the same value of θ
as does the maximum of L. If ℓ(θ ; y) is differentiable in θ, the necessary conditions for the occurrence of a maximum (or
a minimum) are
$$\frac{∂ℓ}{∂θ_1} = 0, \quad \frac{∂ℓ}{∂θ_2} = 0, \quad \ldots, \quad \frac{∂ℓ}{∂θ_k} = 0, \tag{66}$$
known as the likelihood equations. For some models, these equations can be explicitly solved for $\hat{θ}$, but in general no closed-form solution to the maximization problem is known or available, and an MLE can only be found via numerical optimization. Another problem is that in finite samples, there may exist multiple roots for the likelihood equations. Whether the identified root $\hat{θ}$ of the likelihood equations is indeed a (local) maximum depends on whether the matrix of second-order partial and cross-partial derivatives,
$$H(\hat{θ}) = \begin{pmatrix} \dfrac{∂^{2}ℓ}{∂θ_1^{2}} & \dfrac{∂^{2}ℓ}{∂θ_1∂θ_2} & \cdots & \dfrac{∂^{2}ℓ}{∂θ_1∂θ_k} \\ \dfrac{∂^{2}ℓ}{∂θ_2∂θ_1} & \dfrac{∂^{2}ℓ}{∂θ_2^{2}} & \cdots & \dfrac{∂^{2}ℓ}{∂θ_2∂θ_k} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{∂^{2}ℓ}{∂θ_k∂θ_1} & \dfrac{∂^{2}ℓ}{∂θ_k∂θ_2} & \cdots & \dfrac{∂^{2}ℓ}{∂θ_k^{2}} \end{pmatrix}\Bigg|_{θ=\hat{θ}}, \tag{67}$$
known as the Hessian matrix, is negative semi-definite at $\hat{θ}$, which indicates local concavity. Conveniently, most common probability distributions, in particular the exponential family, are logarithmically concave.
Note that the probability density function using the shape-scale parameterization for gamma distribution is given by
$$f(x; k, θ) = \frac{x^{k-1}\, e^{-x/θ}}{θ^{k}\,Γ(k)}, \quad x > 0,\; k, θ > 0. \tag{68}$$
Here Γ(k) is the gamma function evaluated at k. The likelihood function for N iid observations (x1 , ..., xN ) is
$$L(k, θ) = \prod_{i=1}^{N} f(x_i; k, θ),$$
from which we calculate the log-likelihood function
$$ℓ(k, θ) = (k - 1)\sum_{i=1}^{N}\ln(x_i) - \sum_{i=1}^{N}\frac{x_i}{θ} - Nk\ln(θ) - N\ln(Γ(k)) \tag{69}$$
Finding the maximum with respect to θ by taking the derivative and setting it equal to zero yields the maximum likelihood
estimator of the θ parameter:
$$\hat{θ} = \frac{1}{kN}\sum_{i=1}^{N} x_i.$$
Substituting this into the log-likelihood function gives
$$ℓ = (k - 1)\sum_{i=1}^{N}\ln(x_i) - Nk - Nk\ln\!\left(\frac{\sum_{i} x_i}{kN}\right) - N\ln(Γ(k)) \tag{70}$$
Finding the maximum with respect to k by taking the derivative and setting it equal to zero yields
$$\ln(k) - ψ(k) = \ln\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) - \frac{1}{N}\sum_{i=1}^{N}\ln(x_i) \tag{71}$$
An initial value of k can be found either using the method of moments, or using the approximation
$$\ln(k) - ψ(k) \approx \frac{1}{2k}\left(1 + \frac{1}{6k + 1}\right) \tag{72}$$
If we let
$$s = \ln\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) - \frac{1}{N}\sum_{i=1}^{N}\ln(x_i) \tag{73}$$
then k is approximately
$$k \approx \frac{3 - s + \sqrt{(s - 3)^{2} + 24s}}{12s} \tag{74}$$
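A sketch of this estimation recipe for the gamma shape and scale, assuming NumPy/SciPy; the simulated data and the Newton refinement of Eq. (71) are illustrative choices:

```python
# Gamma MLE recipe: initialize k from Eqs. (73)-(74), refine by Newton's method
# on Eq. (71), then recover theta from theta-hat = mean(x)/k.
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.5, scale=1.3, size=5000)            # simulated data, illustrative

s = np.log(np.mean(x)) - np.mean(np.log(x))               # Eq. (73)
k = (3 - s + np.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)   # Eq. (74)

for _ in range(5):                                        # Newton steps on ln(k) - psi(k) = s
    k -= (np.log(k) - digamma(k) - s) / (1 / k - polygamma(1, k))

theta = np.mean(x) / k                                    # the MLE of theta given k
print(k, theta)                                           # should be near 2.5 and 1.3
```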
Suppose random variable X is exponentially distributed with rate parameter θ, and x1 , . . . , xn are n independent samples
from X, with sample mean x̄.
The likelihood function for θ, given an independent and identically distributed sample x = (x_1, ..., x_n) drawn from the variable, is
$$L(θ) = \prod_{i=1}^{n} θ\exp(-θx_i) = θ^{n}\exp\!\left(-θ\sum_{i=1}^{n} x_i\right) = θ^{n}\exp\left(-θn\bar{x}\right), \tag{75}$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean.
The derivative of the likelihood function's logarithm is
$$\frac{d}{dθ}\ln(L(θ)) = \frac{d}{dθ}\left(n\ln(θ) - θn\bar{x}\right) = \frac{n}{θ} - n\bar{x}\;\begin{cases} > 0, & 0 < θ < \frac{1}{\bar{x}}, \\ = 0, & θ = \frac{1}{\bar{x}}, \\ < 0, & θ > \frac{1}{\bar{x}}. \end{cases} \tag{76}$$
Consequently, the maximum likelihood estimate for the rate parameter is:
$$\hat{θ} = \frac{1}{\bar{x}} = \frac{n}{\sum_i x_i} \tag{77}$$
Although this is not an unbiased estimator of θ, $\bar{x}$ is an unbiased MLE estimator of 1/θ = β, where β is the scale parameter of the alternative parameterization and the mean of the distribution.
The bias of θ̂mle is equal to
$$b \equiv E\left[\hat{θ}_{mle} - θ\right] = \frac{θ}{n} \tag{78}$$
which yields the bias-corrected maximum likelihood estimator
$$\hat{θ}^{\,*}_{mle} = \hat{θ}_{mle} - \hat{b} \tag{79}$$
The Fisher information, denoted I(θ), for an estimator of the rate parameter θ is given as:
$$I(θ) = E\left[\left(\frac{∂}{∂θ}\log f(x; θ)\right)^{2}\,\Bigg|\,θ\right] = \int \left(\frac{∂}{∂θ}\log f(x; θ)\right)^{2} f(x; θ)\,dx \tag{80}$$
Substituting in the distribution and solving gives:
$$I(θ) = \int_0^{∞}\left(\frac{∂}{∂θ}\log θe^{-θx}\right)^{2} θe^{-θx}\,dx = \int_0^{∞}\left(\frac{1}{θ} - x\right)^{2} θe^{-θx}\,dx = θ^{-2}. \tag{81}$$
This determines the amount of information each independent sample of an exponential distribution carries about the unknown rate parameter θ.
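A Monte Carlo sketch (assuming NumPy; sample sizes are illustrative) of the bias in Eq. (78) and the correction in Eq. (79). Note that subtracting b̂ = θ̂/n is equivalent to using (n − 1)/Σxᵢ, which is exactly unbiased:

```python
# Monte Carlo look at the bias of the exponential rate MLE and its correction.
import numpy as np

rng = np.random.default_rng(2)
theta_true, n, reps = 2.0, 10, 200_000             # illustrative settings

x = rng.exponential(scale=1 / theta_true, size=(reps, n))
theta_mle = 1 / x.mean(axis=1)                     # Eq. (77)
theta_star = theta_mle - theta_mle / n             # Eq. (79) with b-hat = theta-hat/n

print(theta_mle.mean())    # noticeably above 2.0 for small n
print(theta_star.mean())   # close to 2.0; (n - 1)/sum(x_i) is exactly unbiased
```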
3.8 Joint Moments of i.i.d. Exponential Order Statistics
Let X1 , . . . , Xn be n independent and identically distributed exponential random variables with rate parameter θ. Let
X(1) , . . . , X(n) denote the corresponding order statistics. For i < j, the joint moment E[X(i) X( j) ] of the order statistics X(i)
and X( j) is given by
$$E[X_{(i)} X_{(j)}] = \frac{1}{(n - j + 1)θ}\,E[X_{(i)}] + E[X_{(i)}^{2}] \tag{82}$$
This can be seen by invoking the law of total expectation and the memoryless property:
$$\begin{aligned} E[X_{(i)} X_{(j)}] &= \int_0^{∞} E[X_{(i)} X_{(j)} \mid X_{(i)} = x]\, f_{X_{(i)}}(x)\,dx \\ &= \int_{x=0}^{∞} x\, E[X_{(j)} \mid X_{(j)} \ge x]\, f_{X_{(i)}}(x)\,dx \quad (\text{since } X_{(i)} = x \implies X_{(j)} \ge x) \\ &= \int_{x=0}^{∞} x\left[\frac{1}{(n - j + 1)θ} + x\right] f_{X_{(i)}}(x)\,dx \quad (\text{by the memoryless property}) \\ &= \frac{1}{(n - j + 1)θ}\,E[X_{(i)}] + E[X_{(i)}^{2}]. \end{aligned} \tag{83}$$
The first equation follows from the law of total expectation. The second equation exploits the fact that once we condition
on X(i) = x, it must follow that X( j) ≥ x. The third equation relies on the memoryless property to replace E[X( j) | X( j) ≥ x]
with E[X( j) ] + x.
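A Monte Carlo sketch of identity (82), assuming NumPy. We check consecutive order statistics (j = i + 1), the case where the memoryless replacement used in the derivation applies most cleanly; the settings are illustrative:

```python
# Monte Carlo check of identity (82) for consecutive order statistics (j = i + 1).
import numpy as np

rng = np.random.default_rng(3)
n, i, j, theta = 5, 3, 4, 1.5                      # illustrative settings, i < j

x = np.sort(rng.exponential(scale=1 / theta, size=(500_000, n)), axis=1)
xi, xj = x[:, i - 1], x[:, j - 1]                  # 1-indexed order statistics

lhs = np.mean(xi * xj)
rhs = np.mean(xi) / ((n - j + 1) * theta) + np.mean(xi ** 2)   # Eq. (82)
print(lhs, rhs)                                    # agree up to Monte Carlo error
```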
3.9 Sum of Two Independent Exponential Random Variables
The sum of two independent variables corresponds to the convolution of their probability distributions. If X1 and X2 are independent exponential random variables with respective rate parameters θ1 and θ2, then the probability density of Z = X1 + X2 is given by
$$\begin{aligned} f_Z(z) &= \int_0^{z} f_{X_1}(x_1)\, f_{X_2}(z - x_1)\,dx_1 \\ &= θ_1 θ_2\, e^{-θ_2 z}\int_0^{z} e^{(θ_2 - θ_1)x_1}\,dx_1 \\ &= \frac{θ_1 θ_2}{θ_2 - θ_1}\left(e^{-θ_1 z} - e^{-θ_2 z}\right), \quad θ_1 \ne θ_2 \end{aligned} \tag{84}$$
The mean and variance of Z = X1 + X2 are found to be
$$E[Z] = \int_0^{∞} z\, f_Z(z)\,dz = θ_1^{-1} + θ_2^{-1}$$
$$\mathrm{var}(Z) = E[Z^{2}] - E[Z]^{2} = \int_0^{∞} z^{2}\, f_Z(z)\,dz - \left(θ_1^{-1} + θ_2^{-1}\right)^{2} = θ_1^{-2} + θ_2^{-2} \tag{85}$$
The cumulative distribution function of the sum of two independent exponential random variables then follows as,
$$F_Z(z) = \Pr(Z \le z) = \int_0^{z} f_Z(t)\,dt = 1 + \frac{θ_1}{θ_2 - θ_1}\, e^{-θ_2 z} - \frac{θ_2}{θ_2 - θ_1}\, e^{-θ_1 z} \tag{86}$$
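A simulation check of Eqs. (84)-(86) for θ₁ ≠ θ₂, assuming NumPy and illustrative rates:

```python
# Check Eqs. (84)-(86) for the sum of two independent exponentials (theta1 != theta2).
import numpy as np

rng = np.random.default_rng(4)
t1, t2, z = 1.0, 3.0, 0.8                          # illustrative rates and point

def f_Z(z):                                        # Eq. (84)
    return t1 * t2 / (t2 - t1) * (np.exp(-t1 * z) - np.exp(-t2 * z))

def F_Z(z):                                        # Eq. (86)
    return 1 + t1 / (t2 - t1) * np.exp(-t2 * z) - t2 / (t2 - t1) * np.exp(-t1 * z)

Z = rng.exponential(1 / t1, 1_000_000) + rng.exponential(1 / t2, 1_000_000)
print(F_Z(z), np.mean(Z <= z))                         # CDF vs simulation
print(f_Z(z), np.mean(np.abs(Z - z) < 0.01) / 0.02)    # pdf vs local density estimate
print(1 / t1 + 1 / t2, Z.mean())                       # Eq. (85): mean
print(1 / t1 ** 2 + 1 / t2 ** 2, Z.var())              # Eq. (85): variance
```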
For the Weibull distribution, the maximum likelihood estimator for the ξ parameter given ν is
$$\hat{ξ}^{\,ν} = \frac{1}{n}\sum_{i=1}^{n} x_i^{ν} \tag{87}$$
The maximum likelihood estimator for ν is the solution for ν of the following equation
$$0 = \frac{\sum_{i=1}^{n} x_i^{ν}\ln x_i}{\sum_{i=1}^{n} x_i^{ν}} - \frac{1}{ν} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i \tag{88}$$
This equation defining $\hat{ν}$ only implicitly, one must generally solve for ν by numerical means.
When x1 > x2 > · · · > xN are the N largest observed samples from a dataset of more than N samples, then the maximum
likelihood estimator for the ξ parameter given ν is
$$\hat{ξ}^{\,ν} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^{ν} - x_N^{ν}\right) \tag{89}$$
Also given that condition, the maximum likelihood estimator for ν is
$$0 = \frac{\sum_{i=1}^{N}\left(x_i^{ν}\ln x_i - x_N^{ν}\ln x_N\right)}{\sum_{i=1}^{N}\left(x_i^{ν} - x_N^{ν}\right)} - \frac{1}{N}\sum_{i=1}^{N}\ln x_i \tag{90}$$
Again, this being an implicit function, one must generally solve for ν by numerical means.
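Equation (88) has no closed-form solution, so a root finder can be used; the sketch below (assuming NumPy/SciPy, with simulated Weibull data as an illustration) solves Eq. (88) for ν̂ by bracketing and then recovers ξ̂ from Eq. (87):

```python
# Solve the implicit Weibull likelihood equation (88), then get xi-hat from Eq. (87).
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(5)
x = rng.weibull(1.8, size=2000) * 3.0              # simulated data: shape 1.8, scale 3.0

def g(nu):                                         # right-hand side of Eq. (88)
    xv = x ** nu
    return (xv * np.log(x)).sum() / xv.sum() - 1 / nu - np.log(x).mean()

nu_hat = brentq(g, 0.1, 20.0)                      # root of Eq. (88)
xi_hat = np.mean(x ** nu_hat) ** (1 / nu_hat)      # Eq. (87)
print(nu_hat, xi_hat)                              # near 1.8 and 3.0
```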
4. Statistical Results and Data Analysis
An application of the distributions discussed above was carried out on the Ovarian Cancer Survival Data, in which a randomized trial was performed comparing two treatments for ovarian cancer. The specified distributions were then compared parametrically, and the best choice was based on the minimum value of the AIC results.
There are different methods for selecting the most appropriate model in statistical analysis. The most commonly used methods include information and likelihood based criteria. To compare the different sub-families of a distribution used in the study, information-based criteria are applied. The most commonly used model selection criteria are the Akaike information criterion (AIC) and Bayesian information criterion (BIC). AIC is given by the expression
$$AIC = -2\log(L) + 2k \tag{91}$$
where L is the maximized likelihood value and k is the number of parameters in the model.
BIC is given by the expression
$$BIC = -2\log(L) + k\ln(N) \tag{92}$$
where N is the total sample size.
The model with the smallest AIC value is considered a better fit.
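The AIC values in Tables 1-3 follow directly from Eq. (91); a minimal sketch, assuming the reported log-likelihoods and parameter counts (the sample size N = 26, that of the classic ovarian dataset in R's survival package, is an assumption and is only needed for BIC):

```python
# Reproduce the AIC values in Tables 1-3 from Eq. (91); bic() implements Eq. (92).
import numpy as np

def aic(loglik, k):
    return -2 * loglik + 2 * k

def bic(loglik, k, N):        # N = 26 is an assumption about the ovarian data
    return -2 * loglik + k * np.log(N)

print(aic(-96.94907, 3))      # generalized gamma: 199.898
print(aic(-98.0322, 1))       # exponential:       198.064
print(aic(-97.9539, 2))       # Weibull:           199.908
```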
Table 1. Results when the generalized gamma is used as the distribution

Parameter        Estimate    L95       U95       SE
κ                6.426       4.984     7.868     0.736
ϕ                1.426       0.888     2.292     0.345
β                -0.766      -3.340    1.807     1.313
Log-likelihood   -96.94907
AIC              199.8981
Figure 1. Fitted generalized gamma distribution
Table 2. Results when the exponential is used as the distribution

Parameter        Estimate    L95        U95        SE
rate             0.000770    0.000437   0.001356   0.000222
Log-likelihood   -98.0322
AIC              198.0644
Table 3. Results when the Weibull is used as the distribution

Parameter        Estimate    L95       U95        SE
shape            1.108       0.674     1.822      0.281
scale            1225.419    690.421   2174.979   358.714
Log-likelihood   -97.9539
AIC              199.9078
5. Discussion and Conclusion
Because of its constant failure rate property, the exponential distribution is an excellent model for the long flat ”intrinsic
failure” portion of the Bathtub Curve. Since most components and systems spend most of their lifetimes in this portion
of the Bathtub Curve, this justifies frequent use of the exponential distribution (when early failures or wear out is not a
concern). Just as it is often useful to approximate a curve by piecewise straight line segments, we can approximate any
failure rate curve by week-by-week or month-by-month constant rates that are the average of the actual changing rate
during the respective time durations. That way we can approximate any model by piecewise exponential distribution segments patched together. Some natural phenomena have a constant failure rate (or occurrence rate) property; for example,
the arrival rate of cosmic ray alpha particles or Geiger counter ticks. The exponential model works well for inter-arrival
times (while the Poisson distribution describes the total number of events in a given period). When these events trigger
failures, the exponential life distribution model will naturally apply. A probability plot of normalized exponential data
was generated, so that a perfect exponential fit is a diagonal line with slope 1. The probability plot for 200 normalized
Figure 2. Weibull distribution superimposed on the generalized gamma distribution
Figure 3. Exponential distribution superimposed on the generalized gamma distribution
Figure 4. Generalized gamma hazard ratio (status) against period, for the AFT and PH specifications
Figure 5. Weibull Q-Q plot: ln(1/(1 − F(t))) against time
Figure 6. Exponential distribution quantile plot against theoretical quantiles
random exponential observations (θ = 0.02) is as shown in the graph above. The exponential PDF and CDF at 200 hours
for the case where θ = 0.02 were computed and found as the PDF value 0.0003663128 and the CDF value 0.9816844.
Because of its flexible shape and ability to model a wide range of failure rates, the Weibull has been used successfully
in many applications as a purely empirical model. The Weibull model can be derived theoretically as a form of Extreme
Value Distribution, governing the time to occurrence of the ”weakest link” of many competing failure processes. This
may explain why it has been so successful in applications such as capacitor, ball bearing, relay and material strength
failures. Another special case of the Weibull occurs when the shape parameter is 2. The distribution is called the Rayleigh
Distribution and it turns out to be the theoretical probability model for the magnitude of radial error when the x andy
coordinate errors are independent normals with 0 mean and the same standard deviation. In case the data follow a Weibull
distribution, the points should follow a straight line. The PDF and CDF values for failure time T = 2000 were computed,
using the Weibull distribution with κ = 1.8 and λ = 3000 with 200 generated values. The PDF value is 0.0002678883 and
the CDF value is 0.3824452. Special Case: When κ = 1, the Weibull reduces to the Exponential Model, with λ = 1/θ =
the mean time to failure (MTTF).
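These reported PDF and CDF values can be reproduced directly; a sketch assuming SciPy (weibull_min takes the shape as its first argument and the scale via scale=):

```python
# Reproduce the quoted PDF/CDF values with SciPy.
from scipy.stats import expon, weibull_min

# Exponential at t = 200 with theta = 0.02 (scale = 1/theta = 50):
print(expon.pdf(200, scale=1 / 0.02))             # approx 0.0003663128
print(expon.cdf(200, scale=1 / 0.02))             # approx 0.9816844

# Weibull at t = 2000 with shape kappa = 1.8 and scale lambda = 3000:
print(weibull_min.pdf(2000, 1.8, scale=3000))     # approx 0.0002678883
print(weibull_min.cdf(2000, 1.8, scale=3000))     # approx 0.3824452
```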
The Weibull distribution has the ability to assume the characteristics of many different types of distributions. This has
made it extremely popular among engineers and quality practitioners, who have made it the most commonly used distribution for modeling reliability data. Most researchers like incorporating the Weibull distribution into their data analysis
because it is flexible enough to model a variety of data sets. The Weibull distribution can also model hazard functions that
are decreasing, increasing or constant, allowing it to describe any phase of an item’s lifetime. The Weibull distribution is
used extensively in reliability applications to model failure times.
Acknowledgements
Our appreciation goes to PAUISTI and AU as a whole for their support during the research study. We would also like to thank the supervisors for their tireless effort during the study.
References
Stacy, E. W., & Mihram, G. A. (1965). Parameter estimation for a generalized gamma distribution. Technometrics, 7,
349-358. https://doi.org/10.2307/1266594
Shukla, G., & Kumar, V. (2006). Use of generalized gamma type model in life failure data. Indian Journal of Applied
Statistics, 10, 13-20.
Lawless, J. F. (1980). Inference in the generalized gamma and log gamma distributions. Technometrics, 22(3), 409-419.
https://doi.org/10.1080/00401706.1980.10486173
Cox, C., Chu, H., Schneider, M. F., & Munoz, A. (2007). Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine, 26, 4352-4374.
Carroll, R. J., & Stefanski, L. A. (1990). Approximate quasilikelihood estimation in models with surrogate predictor. J.
Amer. Statist. Assoc. 85, 652-663. https://doi.org/10.2307/2290000
Stacy, E. W. (1962). A generalization of the gamma distribution. Annals of Mathematical Statistics, 33, 1187-1192. https://doi.org/10.1214/aoms/1177704481
Nadarajah, S., & Gupta, A. K. (2007). A generalized gamma distribution with application to drought data. Mathem.
Comput. Simulation, 74, 1-7. https://doi.org/10.1016/j.matcom.2006.04.004
Dadpay, A., Soofi, E. S., & Soyer, R. (2007). Information measures for generalized gamma family. J. Econometrics, 138,
568-585. https://doi.org/10.1016/j.jeconom.2006.05.010
Ghitany, M. E. (1998). On the recent generalization of gamma distribution. Communication in Statistics-Theory and
Methods, 27(1), 223-233. https://doi.org/10.1080/03610929808832662
Khodabin, M., & Ahmadabadi, A. R. (2010). Some properties of generalized gamma distribution. Mathematical Sciences, 4(1).
Leipnik, R. B. (1991). On Lognormal Random Variables: I-The Characteristic Function. Journal of the Australian Mathematical Society Series B, 32(3), 327-347. https://doi.org/10.1017/S0334270000006901
Jiang, R., & Murthy, D. N. P. (2011). A study of Weibull shape parameter: Properties and significance. Reliability
Engineering and System Safety, 96(12), 1619-26. https://doi.org/10.1016/j.ress.2011.09.003
Bauckhage, C. (2013). Computing the Kullback-Leibler Divergence between two Weibull Distributions.
Sharif, M. N., & Islam, M. N. (1980). The Weibull distribution as a general model for forecasting technological change. Technological Forecasting and Social Change, 18(3), 247-256. https://doi.org/10.1016/0040-1625(80)90026-8
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 21. Wiley,
New York.
Weibull, W. (1951). A Statistical Distribution Function of Wide Applicability. Journal of Applied Mechanics.
Park, S. Y., & Bera, A. K. (2009). Maximum entropy autoregressive conditional heteroskedasticity model. Journal of
Econometrics, 150(2), 219-230.
Ye, Z.-S., & Chen, N. (2017). Closed-Form Estimators for the Gamma Distribution Derived from Likelihood Equations. The American Statistician, 71(2), 177-181.
Ahrens, J. H., & Dieter, U. (1974). Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing, 12(3), 223-246. https://doi.org/10.1007/BF02293108
Choi, S. C., & Wette, R. (1969). Maximum Likelihood Estimation of the Parameters of the Gamma Distribution and Their
Bias. Technometrics, 11(4), 683-690. https://doi.org/10.1080/00401706.1969.10490731
Marsaglia, G., & Tsang, W. W. (2000). A simple method for generating gamma variables. ACM Transactions on Mathematical Software, 26(3), 363-372. https://doi.org/10.1145/358407.358414
Minka, T. P. (2002). Estimating a Gamma distribution.
Rossi, R. J. (2018). Mathematical Statistics : An Introduction to Likelihood Based Inference. New York: John Wiley and
Sons. p. 227. https://doi.org/10.1002/9781118771075
Myung, I. J. (2003). Tutorial on Maximum Likelihood Estimation. Journal of Mathematical Psychology, 47(1), 90-100.
https://doi.org/10.1016/S0022-2496(02)00028-7
Gould, W., Pitblado, J., & Poi, B. (2010). Maximum Likelihood Estimation with Stata (Fourth ed.). College Station: Stata
Press. pp. 13-20.
Pratt, J. W. (1976). F. Y. Edgeworth and R. A. Fisher on the efficiency of maximum likelihood estimation. The Annals of
Statistics, 4(3), 501-514. https://doi.org/10.1214/aos/1176343457
Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912-1922. Statistical Science, 12(3), 162-176.
https://doi.org/10.1214/ss/1030037906
Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license
(http://creativecommons.org/licenses/by/4.0/).