
On Generalized Gamma Distribution and Its Application to Survival Data

2019, International Journal of Statistics and Probability

International Journal of Statistics and Probability; Vol. 8, No. 5; September 2019
ISSN 1927-7032   E-ISSN 1927-7040
Published by Canadian Center of Science and Education

Kiche J (1), Oscar Ngesa (2) & George Orwa (3)

(1) Mathematics Department, Pan African University, Institute of Basic Sciences, Technology, and Innovation, Kenya
(2) Mathematics and Informatics Department, Taita-Taveta University, Kenya
(3) Statistics and Actuarial Sciences Department, Jomo Kenyatta University of Agriculture and Technology, Kenya

Correspondence: Kiche J, Mathematics Department, Pan African University, Institute of Basic Sciences, Technology, and Innovation, Kenya. E-mail: kicheopiyo07@gmail.com

Received: July 7, 2019   Accepted: August 27, 2019   Online Published: August 30, 2019
doi:10.5539/ijsp.v8n5p85   URL: https://doi.org/10.5539/ijsp.v8n5p85

Abstract

The generalized gamma distribution is a continuous probability distribution with three parameters. It is a generalization of the two-parameter gamma distribution. Since many distributions commonly used for parametric models in survival analysis (such as the exponential, Weibull and gamma distributions) are special cases of the generalized gamma, it is sometimes used to determine which parametric model is appropriate for a given set of data. The generalized gamma distribution is also one of the distributions used in frailty modeling. In this study, it is shown that the generalized gamma distribution has three sub-families, and its application to the analysis of a survival data set is explored. A parametric modeling approach is carried out to obtain the expected results.

Keywords: generalized gamma, sub-families, Kullback-Leibler divergence

1. Introduction

The early generalization of the gamma distribution can be traced back to Amoroso (1925), who discussed a generalized gamma distribution and applied it to fit income rates. Johnson et al. gave a four-parameter generalized gamma distribution which reduces to the generalized gamma distribution defined by Stacy (1962) when the location parameter is set to zero. The generalized gamma defined by Stacy (1962) is a three-parameter exponentiated gamma distribution. Mudholkar and Srivastava (1993) introduced the exponentiated method to derive a distribution. Agarwal and Al-Saleh (2001) applied the generalized gamma to study hazard rates. Balakrishnan and Peng (2006) applied this distribution to develop a generalized gamma frailty model. Nadarajah and Gupta (2007) proposed another type of generalized gamma distribution with application to fitting drought data. Cordeiro et al. (2012) derived another generalization of Stacy's generalized gamma distribution using the exponentiated method and applied it to lifetime and survival analysis.

The generalized gamma distribution presents a flexible family with a wide variety of shapes and hazard functions for modeling duration. Distributions used in duration analysis in economics include the exponential, log-normal, gamma, and Weibull. The generalized gamma family, which encompasses the exponential, gamma, and Weibull as subfamilies, and the log-normal as a limiting distribution, has been used in economics by Jaggia, Yamaguchi, and Allenby et al. Some authors have argued that the flexibility of the generalized gamma makes it suitable for duration analysis, while others have advocated simpler models because of the estimation difficulties caused by the complexity of the generalized gamma parameter structure.
Obviously, there would be no need to endure the costs associated with the application of a complex generalized gamma model if the data do not discriminate between the generalized gamma and members of its subfamilies, or if the fit of a simpler model to the data is as good as that of the more complex generalized gamma. Estimation difficulties reported by Hager and Bain inhibited applications of the generalized gamma model; Prentice resolved the convergence problem using a nonlinear transformation of the generalized gamma model. Maximum likelihood estimation of the parameters, and quasi-maximum-likelihood estimators for its subfamily (the two-parameter gamma distribution), can be found in the literature. Hwang, T. et al. introduced a new moment estimation of the parameters of the generalized gamma distribution using its characterization. In information theory, a maximum entropy (ME) derivation of the generalized gamma is found in Kapur, where it is referred to as the generalized Weibull distribution, and the entropy of the GG has appeared in the context of flexible families of distributions. Some concepts of this family in information theory were introduced by Dadpay et al.

2. Parametric Distributions and Their Properties

2.1 Log-Normal Distribution

A log-normal distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. If the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Likewise, if Y has a normal distribution, then the exponential function of Y, X = exp(Y), has a log-normal distribution. A random variable which is log-normally distributed takes only positive real values. A log-normal process is the statistical realization of the multiplicative product of many independent random variables, each of which is positive; this is justified by considering the central limit theorem in the log domain. The log-normal distribution is the maximum entropy probability distribution for a random variate X for which the mean and variance of ln(X) are specified.

A positive random variable X is log-normally distributed if the logarithm of X is normally distributed, ln(X) ∼ N(µ, σ²). The probability density function of the log-normal distribution is given by

\[ f_X(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right) \qquad (1) \]

The cumulative distribution function is given as

\[ F_X(x) = \Phi\!\left(\frac{\ln x - \mu}{\sigma}\right) \qquad (2) \]

which may also be expressed as

\[ F_X(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right)\right] \qquad (3) \]

where erf is the error function. An approximating formula for the characteristic function φ(t) is

\[ \varphi(t) \approx \frac{\exp\!\left(-\dfrac{W(-it\sigma^2 e^{\mu})^2 + 2W(-it\sigma^2 e^{\mu})}{2\sigma^2}\right)}{\sqrt{1 + W(-it\sigma^2 e^{\mu})}} \qquad (4) \]

where W is the Lambert W function. For a log-normal random variable, the partial expectation is given by

\[ g(k) = e^{\mu + \frac{1}{2}\sigma^2}\, \Phi\!\left(\frac{\mu + \sigma^2 - \ln k}{\sigma}\right) \qquad (5) \]

The quantile function of the log-normal distribution is given by

\[ \exp\!\left(\mu + \sqrt{2\sigma^2}\, \operatorname{erf}^{-1}(2F - 1)\right) \qquad (6) \]

while the variance is given as

\[ \left[\exp(\sigma^2) - 1\right]\exp(2\mu + \sigma^2) \qquad (7) \]

The skewness is given by the formula

\[ \left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1} \qquad (8) \]

The excess kurtosis is then

\[ \exp(4\sigma^2) + 2\exp(3\sigma^2) + 3\exp(2\sigma^2) - 6 \qquad (9) \]

The support is x ∈ (0, +∞), the mean is exp(µ + σ²/2), the mode is exp(µ − σ²), and the entropy is given as

\[ \log_2\!\left(\sigma e^{\mu + \frac{1}{2}}\sqrt{2\pi}\right) \]
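As a quick numerical check of the formulas above, the following sketch evaluates the density (1) directly and compares it, together with the stated mean and variance, against scipy.stats.lognorm. The parameter values are illustrative only (not taken from the paper); scipy's parameterization uses s = σ and scale = exp(µ).

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.5, 0.75                     # illustrative values, not from the paper
X = lognorm(s=sigma, scale=np.exp(mu))    # scipy: s = sigma, scale = exp(mu)

x = 1.3
# density from equation (1), evaluated by hand
pdf_manual = np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))
assert np.isclose(pdf_manual, X.pdf(x))

# mean exp(mu + sigma^2/2) and variance (7): [exp(sigma^2) - 1] exp(2 mu + sigma^2)
assert np.isclose(X.mean(), np.exp(mu + sigma ** 2 / 2))
assert np.isclose(X.var(), (np.exp(sigma ** 2) - 1) * np.exp(2 * mu + sigma ** 2))
```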
2.2 Weibull Distribution

The Weibull distribution is a continuous probability distribution. It is named after the Swedish mathematician Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet (1927) and first applied by Rosin and Rammler (1933) to describe a particle size distribution. The Weibull distribution is one of the most widely used lifetime distributions in reliability engineering. It is a versatile distribution that can take on the characteristics of other types of distributions, depending on the value of the shape parameter.

The probability density function of a Weibull random variable is

\[ f(x; \xi, \nu) = \frac{\nu}{\xi}\left(\frac{x}{\xi}\right)^{\nu - 1} e^{-(x/\xi)^{\nu}}, \quad x \ge 0 \qquad (10) \]

where ν > 0 is the shape parameter and ξ > 0 is the scale parameter of the distribution. The cumulative distribution function for the Weibull distribution is

\[ F(x; \xi, \nu) = 1 - e^{-(x/\xi)^{\nu}} \qquad (11) \]

The failure rate h (or hazard function) is given by

\[ h(x; \xi, \nu) = \frac{\nu}{\xi}\left(\frac{x}{\xi}\right)^{\nu - 1} \qquad (12) \]

The quantile (inverse cumulative distribution) function for the Weibull distribution is

\[ Q(p; \xi, \nu) = \xi\left(-\ln(1 - p)\right)^{1/\nu} \qquad (13) \]

The moment generating function of the logarithm of a Weibull-distributed random variable is given by

\[ E\!\left[e^{t\log X}\right] = \xi^{t}\,\Gamma\!\left(\frac{t}{\nu} + 1\right) \qquad (14) \]

The mean and variance of a Weibull random variable can be expressed respectively as

\[ E(X) = \xi\,\Gamma\!\left(1 + \frac{1}{\nu}\right) \qquad (15) \]

and

\[ \operatorname{Var}(X) = \xi^2\left[\Gamma\!\left(1 + \frac{2}{\nu}\right) - \left(\Gamma\!\left(1 + \frac{1}{\nu}\right)\right)^{2}\right] \qquad (16) \]

The skewness is given by

\[ \gamma_1 = \frac{\Gamma\!\left(1 + \frac{3}{\nu}\right)\xi^3 - 3\mu\sigma^2 - \mu^3}{\sigma^3} \qquad (17) \]

where the mean is denoted by µ and the standard deviation by σ. The excess kurtosis is given by

\[ \gamma_2 = \frac{-6\Gamma_1^4 + 12\Gamma_1^2\Gamma_2 - 3\Gamma_2^2 - 4\Gamma_1\Gamma_3 + \Gamma_4}{\left[\Gamma_2 - \Gamma_1^2\right]^2} \qquad (18) \]

where Γ_i = Γ(1 + i/ν). The information entropy is given by

\[ H(\xi, \nu) = \gamma\left(1 - \frac{1}{\nu}\right) + \ln\!\left(\frac{\xi}{\nu}\right) + 1 \qquad (19) \]

where γ is the Euler-Mascheroni constant. The Kullback-Leibler divergence between two Weibull distributions is given by

\[ D_{KL}(\mathrm{Weib}_1 \parallel \mathrm{Weib}_2) = \log\frac{\nu_1}{\xi_1^{\nu_1}} - \log\frac{\nu_2}{\xi_2^{\nu_2}} + (\nu_1 - \nu_2)\left[\log\xi_1 - \frac{\gamma}{\nu_1}\right] + \left(\frac{\xi_1}{\xi_2}\right)^{\nu_2}\Gamma\!\left(\frac{\nu_2}{\nu_1} + 1\right) - 1 \qquad (20) \]

The variances and covariances of ν̂ and ξ̂ are estimated from the inverse of the local Fisher information matrix:

\[ \begin{pmatrix} \widehat{\operatorname{Var}}(\hat\nu) & \widehat{\operatorname{Cov}}(\hat\nu, \hat\xi) \\ \widehat{\operatorname{Cov}}(\hat\nu, \hat\xi) & \widehat{\operatorname{Var}}(\hat\xi) \end{pmatrix} = \begin{pmatrix} -\dfrac{\partial^2 \Lambda}{\partial \nu^2} & -\dfrac{\partial^2 \Lambda}{\partial \nu\,\partial \xi} \\ -\dfrac{\partial^2 \Lambda}{\partial \nu\,\partial \xi} & -\dfrac{\partial^2 \Lambda}{\partial \xi^2} \end{pmatrix}^{-1}_{\;\nu = \hat\nu,\; \xi = \hat\xi} \qquad (21) \]
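The closed-form divergence (20) can be validated numerically. The sketch below (with illustrative shape/scale values, not from the paper) compares equation (20) against direct numerical integration of f₁ ln(f₁/f₂):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as G

nu1, xi1 = 1.5, 2.0     # illustrative (shape, scale) for Weib_1
nu2, xi2 = 2.5, 1.0     # illustrative (shape, scale) for Weib_2
euler_gamma = 0.5772156649015329

def weib_pdf(x, xi, nu):
    # density of equation (10)
    return (nu / xi) * (x / xi) ** (nu - 1) * np.exp(-(x / xi) ** nu)

# closed form, equation (20)
kl_closed = (np.log(nu1 / xi1 ** nu1) - np.log(nu2 / xi2 ** nu2)
             + (nu1 - nu2) * (np.log(xi1) - euler_gamma / nu1)
             + (xi1 / xi2) ** nu2 * G(nu2 / nu1 + 1) - 1)

# numerical integral of f1 * log(f1/f2) over (0, inf)
kl_numeric, _ = quad(lambda x: weib_pdf(x, xi1, nu1)
                     * np.log(weib_pdf(x, xi1, nu1) / weib_pdf(x, xi2, nu2)),
                     0, np.inf)
print(kl_closed, kl_numeric)   # the two values should agree to several decimals
```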
2.3 Exponential Distribution

The exponential model, with only one unknown parameter, is the simplest of all life distribution models. The exponential distribution is one of the most widely used continuous distributions. It is often used to model the time elapsed between events; that is, the exponential distribution is often concerned with the amount of time until some specific event occurs. For example, the amount of time (beginning now) until an earthquake occurs has an exponential distribution. Other examples include the length, in minutes, of long-distance business telephone calls, and the amount of time, in months, a car battery lasts. An interesting property of the exponential distribution is that it can be viewed as a continuous analogue of the geometric distribution.

The probability density function of the exponential distribution is given by

\[ f(t; \theta) = \theta e^{-\theta t}, \quad t \ge 0 \qquad (22) \]

The cumulative distribution function is thus

\[ F(t) = 1 - e^{-\theta t} \qquad (23) \]

The reliability and failure rate of the exponential distribution are given respectively as R(t) = e^{−θt} and h(t) = θ, while the mean and variance are given by 1/θ and 1/θ². Note that the failure rate reduces to the constant θ for all time; the exponential distribution is the only distribution with a constant failure rate. Also, another name for the exponential mean is the Mean Time To Failure (MTTF), and we have MTTF = 1/θ. The Kullback-Leibler divergence between two exponential distributions is given by

\[ D_{KL}(\exp_0 \parallel \exp_1) = \log(\theta_0) - \log(\theta_1) + \frac{\theta_1}{\theta_0} - 1 \qquad (24) \]

The most important property is that the exponential distribution is memoryless. Specifically, the memoryless property says that an exponentially distributed random variable T obeys the relation

\[ \Pr(T > a + t \mid T > a) = \Pr(T > t), \quad \forall\, a, t \ge 0 \qquad (25) \]

Proof.

\[ \Pr(T > a + t \mid T > a) = \frac{\Pr(T > a + t \,\cap\, T > a)}{\Pr(T > a)} = \frac{\Pr(T > a + t)}{\Pr(T > a)} = \frac{e^{-\theta(a + t)}}{e^{-\theta a}} = e^{-\theta t} = \Pr(T > t) \qquad (26) \]

hence the proof. The memoryless property says that knowledge of what has occurred in the past has no effect on future probabilities.

2.4 Gamma Distribution

The gamma distribution is another widely used distribution. Its importance is largely due to its relation to the exponential and normal distributions. The gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. The gamma distribution is used to model errors in multi-level Poisson regression models, because the combination of the Poisson distribution and a gamma distribution is a negative binomial distribution. The gamma distribution is widely used as a conjugate prior in Bayesian statistics: it is the conjugate prior for the precision (i.e., the inverse of the variance) of a normal distribution, and it is also the conjugate prior for the exponential distribution.

The gamma distribution can be parameterized in terms of a shape parameter φ = κ and an inverse scale parameter β = 1/θ, called a rate parameter. A random variable X that is gamma-distributed with shape φ and rate β is denoted by X ∼ Γ(φ, β) ≡ Gamma(φ, β). The corresponding probability density function in the shape-rate parameterization is

\[ f(x; \varphi, \beta) = \frac{\beta^{\varphi} x^{\varphi - 1} e^{-\beta x}}{\Gamma(\varphi)}, \quad x > 0, \; \varphi, \beta > 0 \qquad (31) \]

or, in the shape-scale parameterization,

\[ f(x; \varphi, \theta) = \frac{x^{\varphi - 1} e^{-x/\theta}}{\theta^{\varphi}\,\Gamma(\varphi)}, \quad x > 0, \; \varphi, \theta > 0 \]

where Γ(φ) is the gamma function. Both parameterizations are common because either can be more convenient depending on the situation. The cumulative distribution function is the regularized gamma function:

\[ F(x; \varphi, \beta) = \int_0^x f(u; \varphi, \beta)\, du = \frac{\gamma(\varphi, \beta x)}{\Gamma(\varphi)} \qquad (32) \]

where γ(φ, βx) is the lower incomplete gamma function. If φ is a positive integer (i.e., the distribution is an Erlang distribution), the cumulative distribution function has the following series expansion:

\[ F(x; \varphi, \beta) = 1 - e^{-\beta x}\sum_{i=0}^{\varphi - 1} \frac{(\beta x)^i}{i!} = e^{-\beta x}\sum_{i=\varphi}^{\infty} \frac{(\beta x)^i}{i!} \qquad (33) \]

The Kullback-Leibler divergence of Γ(φ_p, β_p) (the "true" distribution) from Γ(φ_q, β_q) (the "approximating" distribution) is given by

\[ D_{KL}(\varphi_p, \beta_p;\, \varphi_q, \beta_q) = (\varphi_p - \varphi_q)\psi(\varphi_p) - \log\Gamma(\varphi_p) + \log\Gamma(\varphi_q) + \varphi_q(\log\beta_p - \log\beta_q) + \varphi_p\,\frac{\beta_q - \beta_p}{\beta_p} \qquad (34) \]

Written using the (k, θ) parameterization, the KL divergence of Γ(k_p, θ_p) from Γ(k_q, θ_q) is given by

\[ D_{KL}(k_p, \theta_p;\, k_q, \theta_q) = (k_p - k_q)\psi(k_p) - \log\Gamma(k_p) + \log\Gamma(k_q) + k_q(\log\theta_q - \log\theta_p) + k_p\,\frac{\theta_p - \theta_q}{\theta_q} \qquad (35) \]

The Laplace transform of the gamma probability density function is

\[ F(s) = (1 + \theta s)^{-k} = \frac{\beta^{\varphi}}{(s + \beta)^{\varphi}} \qquad (36) \]

Parameterization 1: if X_k ∼ Γ(φ_k, θ_k) are independent, then

\[ \frac{\varphi_2 \theta_2 X_1}{\varphi_1 \theta_1 X_2} \sim F(2\varphi_1, 2\varphi_2), \quad \text{or equivalently,} \quad \frac{X_1}{X_2} \sim \beta'\!\left(\varphi_1, \varphi_2, 1, \frac{\theta_1}{\theta_2}\right) \]

Parameterization 2: if X_k ∼ Γ(φ_k, β_k) are independent, then

\[ \frac{\varphi_2 \beta_1 X_1}{\varphi_1 \beta_2 X_2} \sim F(2\varphi_1, 2\varphi_2), \quad \text{or equivalently,} \quad \frac{X_1}{X_2} \sim \beta'\!\left(\varphi_1, \varphi_2, 1, \frac{\beta_2}{\beta_1}\right) \]

If X ∼ Γ(φ, θ) and Y ∼ Γ(β, θ) are independently distributed, then X/(X + Y) has a beta distribution with parameters φ and β, and X/(X + Y) is independent of X + Y, which is Γ(φ + β, θ)-distributed.
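Equation (34) can be checked in the same way as the Weibull divergence. The following sketch (with illustrative shape/rate values) evaluates the closed form using SciPy's digamma and gammaln and compares it with a numerical integral:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln, digamma
from scipy.stats import gamma

phi_p, beta_p = 3.0, 2.0   # illustrative "true" shape and rate
phi_q, beta_q = 2.0, 1.0   # illustrative "approximating" shape and rate

# closed form, equation (34)
kl_closed = ((phi_p - phi_q) * digamma(phi_p)
             - gammaln(phi_p) + gammaln(phi_q)
             + phi_q * (np.log(beta_p) - np.log(beta_q))
             + phi_p * (beta_q - beta_p) / beta_p)

p = gamma(a=phi_p, scale=1 / beta_p)   # scipy uses shape a and scale = 1/rate
q = gamma(a=phi_q, scale=1 / beta_q)
kl_numeric, _ = quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)), 0, np.inf)
print(kl_closed, kl_numeric)   # should agree
```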
3. Methods

3.1 Generalized Gamma Distribution

The generalized gamma has three parameters, given in this case as κ > 0, ϕ > 0, and β > 0. For non-negative x, the probability density function of the generalized gamma is

\[ f(x; \kappa, \phi, \beta) = \frac{(\beta/\kappa^{\phi})\, x^{\phi - 1} e^{-(x/\kappa)^{\beta}}}{\Gamma(\phi/\beta)} \qquad (37) \]

where Γ(·) denotes the gamma function.

3.2 The Cumulative Distribution Function

The cumulative distribution function (CDF) of a random variable X, denoted by F(x), is defined as F(x) = Pr(X ≤ x). Using the identity for the probability of disjoint events, if X is a discrete random variable, then

\[ F(x) = \sum_{k=1}^{n} \Pr(X = x_k) \qquad (38) \]

where x_n is the largest possible value of X that is less than or equal to x. The cumulative distribution function for a random variable at x gives the probability that the random variable X is less than or equal to that number x. Note that in the formula for CDFs of discrete random variables, we always have n ≤ N, where N is the number of possible outcomes of X. This function, CDF(x), simply gives the probability of measuring any value up to and including x. The cumulative distribution function of a continuous random variable X is defined as

\[ F(x) = \int_{-\infty}^{x} f(t)\, dt, \quad -\infty < x < \infty \qquad (39) \]

3.3 Properties of the CDF

• 0 ≤ F(x) ≤ 1 for all x
• lim_{x→−∞} F(x) = 0
• lim_{x→∞} F(x) = 1
• F(x) is a non-decreasing function of x; a function f(x) is said to be non-decreasing if f(x₁) ≤ f(x₂) whenever x₁ < x₂.

The cumulative distribution function of the generalized gamma distribution is given by

\[ F(x; \kappa, \phi, \beta) = \frac{\gamma(\phi/\beta, (x/\kappa)^{\beta})}{\Gamma(\phi/\beta)} \qquad (40) \]

where γ(·,·) denotes the lower incomplete gamma function. The GG may also be written as a three-parameter (β, σ > 0, k) family whose survival function is given as

\[ S_{GG}(t) = 1 - \Gamma\!\left[k^{-2}\exp\!\left(k\,\frac{\log(t) - \beta}{\sigma}\right);\; k^{-2}\right] \qquad (41) \]

\[ S_{GG}(t) = 1 - \Gamma\!\left[k^{-2}\left(e^{-\beta} t\right)^{k/\sigma};\; k^{-2}\right] \qquad (42) \]

where Γ[x; a] denotes the regularized incomplete gamma function (the gamma CDF with shape a).

3.4 Kullback-Leibler Divergence

The Kullback-Leibler divergence (also called relative entropy) is a measure of how one probability distribution diverges from a second, expected probability distribution. Applications include characterizing the relative (Shannon) entropy in information systems, randomness in continuous time series, and information gain when comparing statistical models of inference. In contrast to variation of information, it is a distribution-wise asymmetric measure and thus does not qualify as a statistical metric of spread. In the simple case, a Kullback-Leibler divergence of 0 indicates that we can expect similar, if not the same, behavior of two different distributions, while a Kullback-Leibler divergence of 1 indicates that the two distributions behave in such a different manner that the expectation given the first distribution approaches zero. If f₁ and f₂ are the probability density functions of two generalized gamma distributions, then their Kullback-Leibler divergence is given by

\[ D_{KL}(f_1 \parallel f_2) = \int_0^{\infty} f_1(x; \kappa_1, \phi_1, \beta_1)\,\ln\frac{f_1(x; \kappa_1, \phi_1, \beta_1)}{f_2(x; \kappa_2, \phi_2, \beta_2)}\, dx \qquad (43) \]

\[ = \ln\frac{\beta_1\, \kappa_2^{\phi_2}\, \Gamma(\phi_2/\beta_2)}{\beta_2\, \kappa_1^{\phi_1}\, \Gamma(\phi_1/\beta_1)} + \left[\frac{\psi(\phi_1/\beta_1)}{\beta_1} + \ln\kappa_1\right](\phi_1 - \phi_2) + \frac{\Gamma\!\left((\phi_1 + \beta_2)/\beta_1\right)}{\Gamma(\phi_1/\beta_1)}\left(\frac{\kappa_1}{\kappa_2}\right)^{\beta_2} - \frac{\phi_1}{\beta_1} \qquad (44) \]
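For computations with the density (37), SciPy's generalized gamma family can be reused. Matching terms suggests that GG(κ, ϕ, β) corresponds to scipy.stats.gengamma with a = ϕ/β, c = β, and scale = κ; the sketch below (illustrative parameter values) checks this mapping numerically for both the pdf (37) and the cdf (40):

```python
import numpy as np
from scipy.special import gamma as G, gammainc
from scipy.stats import gengamma

def gg_pdf(x, kappa, phi, beta):
    """Density of equation (37)."""
    return (beta / kappa ** phi) * x ** (phi - 1) * np.exp(-(x / kappa) ** beta) / G(phi / beta)

kappa, phi, beta = 2.0, 3.0, 1.5   # illustrative values
x = np.linspace(0.1, 8, 50)

# pdf (37): gengamma with a = phi/beta, c = beta, scale = kappa
assert np.allclose(gg_pdf(x, kappa, phi, beta),
                   gengamma.pdf(x, a=phi / beta, c=beta, scale=kappa))

# cdf (40): the regularized lower incomplete gamma function gammainc(a, y)
assert np.allclose(gammainc(phi / beta, (x / kappa) ** beta),
                   gengamma.cdf(x, a=phi / beta, c=beta, scale=kappa))
```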
3.5 Moments of the Generalized Gamma Distribution

If X has a generalized gamma distribution as above, then

\[ E(X^r) = \kappa^r\, \frac{\Gamma\!\left((\phi + r)/\beta\right)}{\Gamma(\phi/\beta)} \qquad (45) \]

The mean of a generalized gamma distribution is given by

\[ \kappa\, \frac{\Gamma\!\left((\phi + 1)/\beta\right)}{\Gamma(\phi/\beta)} \qquad (46) \]

The mode of a generalized gamma distribution is given by

\[ \kappa\left(\frac{\phi - 1}{\beta}\right)^{1/\beta} \;\text{ for } \phi > 1, \text{ otherwise } 0 \qquad (47) \]

The variance of the generalized gamma distribution is given by

\[ \kappa^2\left(\frac{\Gamma\!\left((\phi + 2)/\beta\right)}{\Gamma(\phi/\beta)} - \left(\frac{\Gamma\!\left((\phi + 1)/\beta\right)}{\Gamma(\phi/\beta)}\right)^{2}\right) \qquad (48) \]

while the entropy is given by

\[ \ln\frac{\kappa\,\Gamma(\phi/\beta)}{\beta} + \frac{\phi}{\beta} + \left(\frac{1}{\beta} - \frac{\phi}{\beta}\right)\psi\!\left(\frac{\phi}{\beta}\right) \qquad (49) \]

The skewness is given by

\[ \frac{\Gamma\!\left(\frac{\phi + 3}{\beta}\right)\Gamma\!\left(\frac{\phi}{\beta}\right)^{2} - 3\,\Gamma\!\left(\frac{\phi + 2}{\beta}\right)\Gamma\!\left(\frac{\phi + 1}{\beta}\right)\Gamma\!\left(\frac{\phi}{\beta}\right) + 2\,\Gamma\!\left(\frac{\phi + 1}{\beta}\right)^{3}}{\left(\Gamma\!\left(\frac{\phi + 2}{\beta}\right)\Gamma\!\left(\frac{\phi}{\beta}\right) - \Gamma\!\left(\frac{\phi + 1}{\beta}\right)^{2}\right)^{3/2}} \qquad (50) \]

3.6 Properties of the Generalized Gamma Distribution

The generalized gamma family is flexible in that it includes several well-known models as subfamilies. The subfamilies of the generalized gamma thus far considered in the literature are the exponential (ϕ = β = 1), the gamma (β = 1), and the Weibull (ϕ = 1). The log-normal distribution is also obtained as a limiting distribution when ϕ → ∞. An important property of the generalized gamma family for information analysis is that the family is closed under power transformation. That is, if X ∼ GG(α, τ, λ), then

\[ Y = X^{s} \sim GG\!\left(\alpha, \frac{\tau}{s}, \lambda^{s}\right), \quad s > 0 \qquad (51) \]

It also has the property that Z = ηX has a GG(α, τ, ηλ) distribution. We use the probability density function of the generalized gamma distribution to show that the generalized gamma contains the exponential, Weibull, and gamma distributions as subfamilies. That is, we let the pdf of the generalized gamma distribution GG(α, τ, λ) be given by

\[ f(x; \alpha, \tau, \lambda) = \frac{\tau}{\lambda\,\Gamma(\alpha)}\left(\frac{x}{\lambda}\right)^{\alpha\tau - 1} e^{-(x/\lambda)^{\tau}} \qquad (52) \]

Then, if α = τ = 1, the subfamily is the exponential distribution.

Proof I. From the probability density function of the generalized gamma distribution given by (52), if we set α = τ = 1, then we have

\[ f(x; 1, 1, \lambda) = \frac{1}{\lambda\,\Gamma(1)}\left(\frac{x}{\lambda}\right)^{1 - 1} e^{-x/\lambda} = \frac{1}{\lambda}\, e^{-x/\lambda} \qquad (55) \]

which is the probability density function of an exponential distribution with parameter λ, hence the proof.

Proof II. Similarly, if we now let τ = 1, then from the probability density function (52) we have

\[ f(x; \alpha, 1, \lambda) = \frac{1}{\lambda\,\Gamma(\alpha)}\left(\frac{x}{\lambda}\right)^{\alpha - 1} e^{-x/\lambda} \]

which simplifies to

\[ \frac{(1/\lambda)^{\alpha}\, x^{\alpha - 1}\, e^{-x/\lambda}}{\Gamma(\alpha)} \qquad (59) \]

which is the probability density function of a gamma distribution with parameters α and λ.

Proof III. Lastly, if we now let α = 1, then from the probability density function (52) we have

\[ f(x; 1, \tau, \lambda) = \frac{\tau}{\lambda\,\Gamma(1)}\left(\frac{x}{\lambda}\right)^{\tau - 1} e^{-(x/\lambda)^{\tau}} = \frac{\tau}{\lambda}\left(\frac{x}{\lambda}\right)^{\tau - 1} e^{-(x/\lambda)^{\tau}} \qquad (64) \]

which is the probability density function of a Weibull distribution with parameters λ and τ; thus the generalized gamma distribution reduces to the Weibull distribution as a subfamily, hence the proof.
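The three reductions proved above can also be confirmed numerically. The sketch below (illustrative parameter values) checks that the generalized gamma density (52) collapses to the exponential, gamma, and Weibull densities under the stated parameter restrictions, using the mapping a = α, c = τ, scale = λ to scipy.stats.gengamma:

```python
import numpy as np
from scipy.stats import gengamma, expon, gamma, weibull_min

lam = 2.0                        # illustrative scale
x = np.linspace(0.05, 10, 200)

# Proof I: alpha = tau = 1 gives the exponential with scale lambda
assert np.allclose(gengamma.pdf(x, a=1, c=1, scale=lam), expon.pdf(x, scale=lam))

# Proof II: tau = 1 gives the gamma with shape alpha and scale lambda
alpha = 2.5
assert np.allclose(gengamma.pdf(x, a=alpha, c=1, scale=lam),
                   gamma.pdf(x, a=alpha, scale=lam))

# Proof III: alpha = 1 gives the Weibull with shape tau and scale lambda
tau = 1.7
assert np.allclose(gengamma.pdf(x, a=1, c=tau, scale=lam),
                   weibull_min.pdf(x, c=tau, scale=lam))
```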
3.7 Parameter Estimation Using Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is a method of estimating the parameters of a distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data are most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference. If the likelihood function is differentiable, the derivative test for determining maxima can be applied. In some cases, the first-order conditions of the likelihood function can be solved explicitly; for instance, the ordinary least squares estimator maximizes the likelihood of the linear regression model. Under most circumstances, however, numerical methods will be necessary to find the maximum of the likelihood function.

From a statistical standpoint, the observations y = (y₁, y₂, …, y_n) are a random sample from an unknown population. The goal is to make inferences about the population that is most likely to have generated the sample, specifically the probability distribution corresponding to the population. Associated with each probability distribution is a unique vector θ = [θ₁, θ₂, …, θ_k]ᵀ of parameters that index the probability distribution within a parametric family { f(·; θ) | θ ∈ Θ }. As θ changes in value, different probability distributions are generated. The idea of maximum likelihood is to re-express the joint probability of the sample data f(y₁, y₂, …, y_n; θ) as a likelihood function L(θ; y) that treats θ as a variable. For independent and identically distributed random variables, the likelihood function is defined as

\[ L(\theta; y) = f(y_1; \theta)\, f(y_2; \theta) \cdots f(y_n; \theta) = \prod_{i=1}^{n} f(y_i; \theta) \qquad (65) \]

and evaluated at the observed data sample. The goal is then to find the values of the model parameters that maximize the likelihood function over the parameter space Θ. Intuitively, this selects the parameter values that make the observed data most probable. The problem is thus to find the supremum value of the likelihood function by choice of the parameter,

\[ L(\hat\theta; y) = \sup_{\theta \in \Theta} L(\theta; y) \]

where the estimator θ̂ = θ̂(y) is a function of the sample. A sufficient but not necessary condition for its existence is for the likelihood function to be continuous over a parameter space Θ that is compact; for an open Θ the likelihood function may increase without ever reaching a supremum value. In practice, it is often convenient to work with the natural logarithm of the likelihood function, called the log-likelihood: ℓ(θ; y) = ln L(θ; y). Since the logarithm is a monotonic function, the maximum of ℓ(θ; y) occurs at the same value of θ as does the maximum of L. If ℓ(θ; y) is differentiable in θ, the necessary conditions for the occurrence of a maximum (or a minimum) are

\[ \frac{\partial \ell}{\partial \theta_1} = 0, \quad \frac{\partial \ell}{\partial \theta_2} = 0, \quad \ldots, \quad \frac{\partial \ell}{\partial \theta_k} = 0 \qquad (66) \]

known as the likelihood equations. For some models, these equations can be solved explicitly for θ̂, but in general no closed-form solution to the maximization problem is known or available, and an MLE can only be found via numerical optimization. Another problem is that in finite samples there may exist multiple roots of the likelihood equations.
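As noted above, the generalized gamma likelihood usually has to be maximized numerically. The following sketch (simulated data with illustrative true values; this is not the fitting routine used for the ovarian data in Section 4) minimizes the negative log-likelihood with a derivative-free optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gengamma

rng = np.random.default_rng(1)
# simulated sample; true (a, c, scale) chosen for illustration only
data = gengamma.rvs(a=2.0, c=1.5, scale=3.0, size=500, random_state=rng)

def neg_loglik(params):
    a, c, scale = params
    if a <= 0 or c <= 0 or scale <= 0:     # keep the search in the valid region
        return np.inf
    return -np.sum(gengamma.logpdf(data, a=a, c=c, scale=scale))

# numerical maximization of the likelihood (Nelder-Mead avoids derivatives)
res = minimize(neg_loglik, x0=[1.0, 1.0, 1.0], method="Nelder-Mead")
print(res.x)   # estimates should be near (2.0, 1.5, 3.0)
```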
Whether an identified root θ̂ of the likelihood equations is indeed a (local) maximum depends on whether the matrix of second-order partial and cross-partial derivatives,

\[ H(\hat\theta) = \begin{pmatrix} \dfrac{\partial^2 \ell}{\partial \theta_1^2} & \dfrac{\partial^2 \ell}{\partial \theta_1 \partial \theta_2} & \cdots & \dfrac{\partial^2 \ell}{\partial \theta_1 \partial \theta_k} \\ \dfrac{\partial^2 \ell}{\partial \theta_2 \partial \theta_1} & \dfrac{\partial^2 \ell}{\partial \theta_2^2} & \cdots & \dfrac{\partial^2 \ell}{\partial \theta_2 \partial \theta_k} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 \ell}{\partial \theta_k \partial \theta_1} & \dfrac{\partial^2 \ell}{\partial \theta_k \partial \theta_2} & \cdots & \dfrac{\partial^2 \ell}{\partial \theta_k^2} \end{pmatrix}\Bigg|_{\theta = \hat\theta} \qquad (67) \]

known as the Hessian matrix, is negative semi-definite at θ̂, which indicates local concavity. Conveniently, the most common probability distributions (in particular, the exponential family) are logarithmically concave.

Note that the probability density function in the shape-scale parameterization of the gamma distribution is given by

\[ f(x; k, \theta) = \frac{x^{k - 1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}, \quad x > 0, \; k, \theta > 0 \qquad (68) \]

where Γ(k) is the gamma function evaluated at k. The likelihood function for N iid observations (x₁, …, x_N) is L(k, θ) = ∏_{i=1}^{N} f(x_i; k, θ), from which we calculate the log-likelihood function

\[ \ell(k, \theta) = (k - 1)\sum_{i=1}^{N}\ln(x_i) - \sum_{i=1}^{N}\frac{x_i}{\theta} - Nk\ln(\theta) - N\ln(\Gamma(k)) \qquad (69) \]

Finding the maximum with respect to θ by taking the derivative and setting it equal to zero yields the maximum likelihood estimator of the θ parameter, θ̂ = (1/(kN)) Σ_{i=1}^{N} x_i. Substituting this into the log-likelihood function gives

\[ \ell = (k - 1)\sum_{i=1}^{N}\ln(x_i) - Nk - Nk\ln\!\left(\frac{\sum x_i}{kN}\right) - N\ln(\Gamma(k)) \qquad (70) \]

Finding the maximum with respect to k by taking the derivative and setting it equal to zero yields

\[ \ln(k) - \psi(k) = \ln\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) - \frac{1}{N}\sum_{i=1}^{N}\ln(x_i) \qquad (71) \]

An initial value of k can be found either using the method of moments or using the approximation

\[ \ln(k) - \psi(k) \approx \frac{1}{2k}\left(1 + \frac{1}{6k + 1}\right) \qquad (72) \]

If we let

\[ s = \ln\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) - \frac{1}{N}\sum_{i=1}^{N}\ln(x_i) \qquad (73) \]

then k is approximately

\[ k \approx \frac{3 - s + \sqrt{(s - 3)^2 + 24s}}{12s} \qquad (74) \]

Suppose the random variable X is exponentially distributed with rate parameter θ, and x₁, …, x_n are n independent samples from X, with sample mean x̄. The likelihood function for θ, given an independent and identically distributed sample x = (x₁, …, x_n) drawn from the variable, is

\[ L(\theta) = \prod_{i=1}^{n}\theta\exp(-\theta x_i) = \theta^{n}\exp\!\left(-\theta\sum_{i=1}^{n} x_i\right) = \theta^{n}\exp(-\theta n\bar{x}) \qquad (75) \]

where x̄ = (1/n) Σ_{i=1}^{n} x_i is the sample mean. The derivative of the logarithm of the likelihood function is

\[ \frac{d}{d\theta}\ln(L(\theta)) = \frac{d}{d\theta}\left(n\ln(\theta) - \theta n\bar{x}\right) = \frac{n}{\theta} - n\bar{x} \;\; \begin{cases} > 0, & 0 < \theta < \frac{1}{\bar{x}}, \\ = 0, & \theta = \frac{1}{\bar{x}}, \\ < 0, & \theta > \frac{1}{\bar{x}}. \end{cases} \qquad (76) \]

Consequently, the maximum likelihood estimate for the rate parameter is

\[ \hat\theta = \frac{1}{\bar{x}} = \frac{n}{\sum_i x_i} \qquad (77) \]

Although this is not an unbiased estimator of θ, x̄ is an unbiased MLE estimator of 1/θ = β, where β is the scale parameter and the distribution mean. The bias of θ̂_mle is equal to

\[ b \equiv E\!\left[\hat\theta_{mle} - \theta\right] = \frac{\theta}{n} \qquad (78) \]

which yields the bias-corrected maximum likelihood estimator

\[ \hat\theta^{*}_{mle} = \hat\theta_{mle} - \hat{b} \qquad (79) \]

The Fisher information, denoted I(θ), for an estimator of the rate parameter θ is given as

\[ I(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}\log f(x; \theta)\right)^{2} \,\Bigg|\, \theta\right] = \int \left(\frac{\partial}{\partial\theta}\log f(x; \theta)\right)^{2} f(x; \theta)\, dx \qquad (80) \]

Substituting in the distribution and solving gives

\[ I(\theta) = \int_0^{\infty}\left(\frac{\partial}{\partial\theta}\log\!\left(\theta e^{-\theta x}\right)\right)^{2}\theta e^{-\theta x}\, dx = \int_0^{\infty}\left(\frac{1}{\theta} - x\right)^{2}\theta e^{-\theta x}\, dx = \theta^{-2} \qquad (81) \]

This determines the amount of information each independent sample of an exponential distribution carries about the unknown rate parameter θ.
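The gamma likelihood equations (69)-(74) translate directly into code. The sketch below (simulated data, illustrative true values) uses the closed-form approximation (74) as a starting point and refines it by Newton's method on equation (71), then recovers θ̂ from the first-order condition for θ:

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import gamma

rng = np.random.default_rng(7)
x = gamma.rvs(a=2.5, scale=1.8, size=2000, random_state=rng)   # illustrative sample

# s = ln(mean) - mean(ln), equation (73)
s = np.log(x.mean()) - np.log(x).mean()

# closed-form starting value, equation (74)
k = (3 - s + np.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)

# refine by Newton's method on ln(k) - psi(k) = s, equation (71);
# the derivative of ln(k) - psi(k) is 1/k - polygamma(1, k)
for _ in range(10):
    k -= (np.log(k) - digamma(k) - s) / (1 / k - polygamma(1, k))

theta = x.mean() / k   # first-order condition for theta
print(k, theta)        # should be close to (2.5, 1.8)
```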
3.8 Joint Moments of i.i.d. Exponential Order Statistics

Let X₁, …, X_n be n independent and identically distributed exponential random variables with rate parameter θ, and let X₍₁₎, …, X₍ₙ₎ denote the corresponding order statistics. For i < j, the joint moment E[X₍ᵢ₎X₍ⱼ₎] of the order statistics X₍ᵢ₎ and X₍ⱼ₎ is given by

\[ E\!\left[X_{(i)} X_{(j)}\right] = \frac{1}{(n - j + 1)\theta}\, E\!\left[X_{(i)}\right] + E\!\left[X_{(i)}^{2}\right] \qquad (82) \]

This can be seen by invoking the law of total expectation and the memoryless property:

\[ E\!\left[X_{(i)} X_{(j)}\right] = \int_0^{\infty} E\!\left[X_{(i)} X_{(j)} \mid X_{(i)} = x\right] f_{X_{(i)}}(x)\, dx = \int_0^{\infty} x\, E\!\left[X_{(j)} \mid X_{(j)} \ge x\right] f_{X_{(i)}}(x)\, dx \]

\[ = \int_0^{\infty} x\left[\frac{1}{(n - j + 1)\theta} + x\right] f_{X_{(i)}}(x)\, dx = \frac{1}{(n - j + 1)\theta}\, E\!\left[X_{(i)}\right] + E\!\left[X_{(i)}^{2}\right] \qquad (83) \]

The first equation follows from the law of total expectation. The second equation exploits the fact that once we condition on X₍ᵢ₎ = x, it must follow that X₍ⱼ₎ ≥ x. The third equation relies on the memoryless property to replace E[X₍ⱼ₎ | X₍ⱼ₎ ≥ x] with x plus the expected residual waiting time 1/((n − j + 1)θ).

3.9 Sum of Two Independent Exponential Random Variables

The sum of two independent random variables corresponds to the convolution of their probability distributions. If X₁ and X₂ are independent exponential random variables with respective rate parameters θ₁ and θ₂, then the probability density of Z = X₁ + X₂ is given by

\[ f_Z(z) = \int_0^z f_{X_1}(x_1)\, f_{X_2}(z - x_1)\, dx_1 = \theta_1\theta_2\, e^{-\theta_2 z}\int_0^z e^{(\theta_2 - \theta_1)x_1}\, dx_1 = \frac{\theta_1\theta_2}{\theta_2 - \theta_1}\left(e^{-\theta_1 z} - e^{-\theta_2 z}\right) \qquad (84) \]

The mean and variance of Z = X₁ + X₂ are found to be

\[ E[Z] = \int z\, f_Z(z)\, dz = \theta_1^{-1} + \theta_2^{-1}, \qquad \operatorname{var}(Z) = E[Z^2] - E[Z]^2 = \theta_1^{-2} + \theta_2^{-2} \qquad (85) \]

The cumulative distribution function of the sum of two independent exponential random variables then follows as

\[ F_Z(z) = \Pr(Z \le z) = \int_0^z f_Z(t)\, dt = 1 + \frac{\theta_1}{\theta_2 - \theta_1}\, e^{-\theta_2 z} - \frac{\theta_2}{\theta_2 - \theta_1}\, e^{-\theta_1 z} \qquad (86) \]

For the Weibull distribution, the maximum likelihood estimator for the ξ parameter given ν is

\[ \hat\xi^{\nu} = \frac{1}{n}\sum_{i=1}^{n} x_i^{\nu} \qquad (87) \]

The maximum likelihood estimator for ν is the solution of the following equation:

\[ 0 = \frac{\sum_{i=1}^{n} x_i^{\nu}\ln x_i}{\sum_{i=1}^{n} x_i^{\nu}} - \frac{1}{\nu} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i \qquad (88) \]

Since this equation defines ν̂ only implicitly, one must generally solve for ν by numerical means. When x₁ > x₂ > ⋯ > x_N are the N largest observed samples from a dataset of more than N samples, the maximum likelihood estimator for the ξ parameter given ν is

\[ \hat\xi^{\nu} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^{\nu} - x_N^{\nu}\right) \qquad (89) \]

Also under that condition, the maximum likelihood estimator for ν satisfies

\[ 0 = \frac{\sum_{i=1}^{N}\left(x_i^{\nu}\ln x_i - x_N^{\nu}\ln x_N\right)}{\sum_{i=1}^{N}\left(x_i^{\nu} - x_N^{\nu}\right)} - \frac{1}{N}\sum_{i=1}^{N}\ln x_i \qquad (90) \]

Again, this being an implicit function, one must generally solve for ν by numerical means; a numerical sketch is given below.
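The sketch below (simulated, uncensored data with illustrative true values) solves the implicit equation (88) for ν by a bracketing root search and then evaluates (87):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import weibull_min

rng = np.random.default_rng(3)
x = weibull_min.rvs(c=1.4, scale=1000.0, size=500, random_state=rng)  # illustrative sample

def score(nu):
    # right-hand side of equation (88)
    xn = x ** nu
    return (xn * np.log(x)).sum() / xn.sum() - 1 / nu - np.log(x).mean()

nu_hat = brentq(score, 0.1, 20.0)              # bracketing root search for nu
xi_hat = (np.mean(x ** nu_hat)) ** (1 / nu_hat)  # equation (87)
print(nu_hat, xi_hat)   # should be near (1.4, 1000.0)
```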
4. Statistical Results and Data Analysis

An application of the distributions discussed above was carried out on the Ovarian Cancer Survival Data, in which a randomized trial was performed comparing two treatments for ovarian cancer. The specified distributions were then compared parametrically, and the best choice was based on the minimum value of the AIC results. There are different methods for selecting the most appropriate model in statistical analysis; the most commonly used include information-based and likelihood-based criteria. To compare the different sub-families of a distribution used in this study, information-based criteria are applied. The most commonly used model selection criteria are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). AIC is given by the expression

\[ AIC = -2\log(L) + 2k \qquad (91) \]

where L is the maximized likelihood value and k is the number of parameters in the model. BIC is given by the expression

\[ BIC = -2\log(L) + k\ln(N) \qquad (92) \]

where N is the total sample size. The model with the smallest AIC value is considered the better fit.

Table 1. Results when the generalized gamma is used as the distribution

Parameter   Estimate   L95      U95      SE
κ           6.426      4.984    7.868    0.736
ϕ           1.426      0.888    2.292    0.345
β           -0.766     -3.340   1.807    1.313

Log-likelihood: -96.94907    AIC: 199.8981

[Figure 1. Fitted generalized gamma distribution (curve on a 0-1 scale over times 0-1200).]

Table 2. Results when the exponential is used as the distribution

Parameter   Estimate   L95        U95        SE
rate        0.000770   0.000437   0.001356   0.000222

Log-likelihood: -98.0322    AIC: 198.0644

Table 3. Results when the Weibull is used as the distribution

Parameter   Estimate   L95       U95        SE
shape       1.108      0.674     1.822      0.281
scale       1225.419   690.421   2174.979   358.714

Log-likelihood: -97.9539    AIC: 199.9078
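The comparison in Tables 1-3 was produced by a parametric survival fit to the censored ovarian trial data. As a simplified illustration only, the sketch below fits the three candidate families to an uncensored simulated sample by maximum likelihood and ranks them by the AIC of equation (91); because it ignores censoring, it does not reproduce the tables above.

```python
import numpy as np
from scipy.stats import gengamma, weibull_min, expon

rng = np.random.default_rng(11)
# stand-in survival times; true parameters chosen for illustration only
t = gengamma.rvs(a=1.6, c=1.2, scale=600.0, size=100, random_state=rng)

def aic(dist, data, n_free, **fit_kwargs):
    params = dist.fit(data, **fit_kwargs)    # maximum likelihood fit
    loglik = dist.logpdf(data, *params).sum()
    return -2 * loglik + 2 * n_free          # equation (91)

# location fixed at zero so only the named parameters are estimated
print("gengamma    ", aic(gengamma, t, 3, floc=0))
print("weibull     ", aic(weibull_min, t, 2, floc=0))
print("exponential ", aic(expon, t, 1, floc=0))
# the smallest AIC indicates the preferred model
```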
The exponential PDF and CDF at 200 hours for the case where θ = 0.02 were computed and found as the PDF value 0.0003663128 and the CDF value 0.9816844. Because of its flexible shape and ability to model a wide range of failure rates, the Weibull has been used successfully in many applications as a purely empirical model. The Weibull model can be derived theoretically as a form of Extreme Value Distribution, governing the time to occurrence of the ”weakest link” of many competing failure processes. This may explain why it has been so successful in applications such as capacitor, ball bearing, relay and material strength failures. Another special case of the Weibull occurs when the shape parameter is 2. The distribution is called the Rayleigh Distribution and it turns out to be the theoretical probability model for the magnitude of radial error when the x andy coordinate errors are independent normals with 0 mean and the same standard deviation. In case the data follow a Weibull distribution, the points should follow a straight line. The PDF and CDF values for failure time T = 2000 were computed, using the Weibull distribution with κ = 1.8 and λ = 3000 with 200 generated values. The PDF value is 0.0002678883 and the CDF value is 0.3824452. Special Case: When κ = 1, the Weibull reduces to the Exponential Model, with λ = 1/θ = the mean time to failure (MTTF). The Weibull distribution has the ability to assume the characteristics of many different types of distributions. This has made it extremely popular among engineers and quality practitioners, who have made it the most commonly used distribution for modeling reliability data. Most researchers like incorporating the Weibull distribution into their data analysis because it is flexible enough to model a variety of data sets. The Weibull distribution can also model hazard functions that are decreasing, increasing or constant, allowing it to describe any phase of an item’s lifetime. The Weibull distribution is used extensively in reliability applications to model failure times. Acknowledgements Our appreciation goes to PAUISTI and AU as a whole for their support during the research study. I would also like to thank the supervisors for their tireless effort during the study. References Stacy, E. W., & Mihram, G. A. (1965). Parameter estimation for a generalized gamma distribution. Technometrics, 7, 349-358. https://doi.org/10.2307/1266594 Shukla, G., & Kumar, V. (2006). Use of generalized gamma type model in life failure data. Indian Journal of Applied Statistics, 10, 13-20. Lawless, J. F. (1980). Inference in the generalized gamma and log gamma distributions. Technometrics, 22(3), 409-419. https://doi.org/10.1080/00401706.1980.10486173 Cox, C., Chu, H., Schneider, M. F., & Munoz, A. (2007). Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine, 26, 4252-4374. Carroll, R. J., & Stefanski, L. A. (1990). Approximate quasilikelihood estimation in models with surrogate predictor. J. Amer. Statist. Assoc. 85, 652-663. https://doi.org/10.2307/2290000 Stacy, E. W. (1962). A generalization of the gamma distribution. Annals of Mathematical Statistics, 33, 1187-92. https://doi.org/10.1214/aoms/1177704481 Nadarajah, S., & Gupta, A. K. (2007). A generalized gamma distribution with application to drought data. Mathem. Comput. Simulation, 74, 1-7. https://doi.org/10.1016/j.matcom.2006.04.004 Dadpay, A., Soofi, E. S., & Soyer, R. (2007). Information measures for generalized gamma family. J. 
Acknowledgements

Our appreciation goes to PAUISTI and AU as a whole for their support during the research study. I would also like to thank the supervisors for their tireless effort during the study.

References

Agarwal, S. K., & Al-Saleh, J. A. (2001). Generalized gamma type distribution and its hazard rate function. Communications in Statistics - Theory and Methods, 30(2), 309-318.

Ahrens, J. H., & Dieter, U. (1974). Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing, 12(3), 223-246. https://doi.org/10.1007/BF02293108

Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912-1922. Statistical Science, 12(3), 162-176. https://doi.org/10.1214/ss/1030037906

Bauckhage, C. (2013). Computing the Kullback-Leibler divergence between two Weibull distributions.

Carroll, R. J., & Stefanski, L. A. (1990). Approximate quasi-likelihood estimation in models with surrogate predictors. Journal of the American Statistical Association, 85, 652-663. https://doi.org/10.2307/2290000

Choi, S. C., & Wette, R. (1969). Maximum likelihood estimation of the parameters of the gamma distribution and their bias. Technometrics, 11(4), 683-690. https://doi.org/10.1080/00401706.1969.10490731

Cox, C., Chu, H., Schneider, M. F., & Munoz, A. (2007). Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine, 26, 4352-4374.

Dadpay, A., Soofi, E. S., & Soyer, R. (2007). Information measures for generalized gamma family. Journal of Econometrics, 138, 568-585. https://doi.org/10.1016/j.jeconom.2006.05.010

Ghitany, M. E. (1998). On a recent generalization of gamma distribution. Communications in Statistics - Theory and Methods, 27(1), 223-233. https://doi.org/10.1080/03610929808832662

Gould, W., Pitblado, J., & Poi, B. (2010). Maximum Likelihood Estimation with Stata (4th ed., pp. 13-20). College Station: Stata Press.

Jiang, R., & Murthy, D. N. P. (2011). A study of Weibull shape parameter: Properties and significance. Reliability Engineering and System Safety, 96(12), 1619-1626. https://doi.org/10.1016/j.ress.2011.09.003

Johnson, N. L., Kotz, S., & Balakrishnan, N. (1995). Continuous Univariate Distributions (Vol. 1, Chap. 21). New York: Wiley.

Khodabin, M., & Ahmadabadi, A. R. (2010). Some properties of generalized gamma distribution. Mathematical Sciences, 4(1).

Lawless, J. F. (1980). Inference in the generalized gamma and log gamma distributions. Technometrics, 22(3), 409-419. https://doi.org/10.1080/00401706.1980.10486173

Leipnik, R. B. (1991). On lognormal random variables: I - The characteristic function. Journal of the Australian Mathematical Society, Series B, 32(3), 327-347. https://doi.org/10.1017/S0334270000006901

Marsaglia, G., & Tsang, W. W. (2000). A simple method for generating gamma variables. ACM Transactions on Mathematical Software, 26(3), 363-372.

Minka, T. P. (2002). Estimating a gamma distribution.

Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47(1), 90-100. https://doi.org/10.1016/S0022-2496(02)00028-7

Nadarajah, S., & Gupta, A. K. (2007). A generalized gamma distribution with application to drought data. Mathematics and Computers in Simulation, 74, 1-7. https://doi.org/10.1016/j.matcom.2006.04.004

Park, S. Y., & Bera, A. K. (2009). Maximum entropy autoregressive conditional heteroskedasticity model. Journal of Econometrics, 150(2), 219-230.

Pratt, J. W. (1976). F. Y. Edgeworth and R. A. Fisher on the efficiency of maximum likelihood estimation. The Annals of Statistics, 4(3), 501-514. https://doi.org/10.1214/aos/1176343457

Rossi, R. J. (2018). Mathematical Statistics: An Introduction to Likelihood Based Inference (p. 227). New York: John Wiley and Sons. https://doi.org/10.1002/9781118771075

Sharif, M. N., & Islam, M. N. (1980). The Weibull distribution as a general model for forecasting technological change. Technological Forecasting and Social Change, 18(3), 247-256. https://doi.org/10.1016/0040-1625(80)90026-8

Shukla, G., & Kumar, V. (2006). Use of generalized gamma type model in life failure data. Indian Journal of Applied Statistics, 10, 13-20.

Stacy, E. W. (1962). A generalization of the gamma distribution. Annals of Mathematical Statistics, 33, 1187-1192. https://doi.org/10.1214/aoms/1177704481

Stacy, E. W., & Mihram, G. A. (1965). Parameter estimation for a generalized gamma distribution. Technometrics, 7, 349-358. https://doi.org/10.2307/1266594

Weibull, W. (1951). A statistical distribution function of wide applicability. Journal of Applied Mechanics.

Ye, Z.-S., & Chen, N. (2017). Closed-form estimators for the gamma distribution derived from likelihood equations. The American Statistician, 71(2), 177-181.

Copyrights

Copyright for this article is retained by the author(s), with first publication rights granted to the journal. This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).