A frequent statistical problem is that of predicting a set of quantities given the values of some covariates, and the information provided by a training sample. These prediction problems are often structured with hierarchical models that make use of the similarities existing within classes of the population. Hierarchical models are typically based on a 'natural' definition of the clustering which defines the hierarchy, which is context dependent. However, there is no assurance that this 'natural' clustering is optimal in any sense for the stated prediction purposes. In this paper we explore this issue by treating the choice of the clustering which defines the hierarchy as a formal decision problem. Indeed, the methodology described may be seen as defining a large class of new clustering algorithms. The application which motivated this research is briefly described. The argument lies entirely within the Bayesian framework.
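As a rough illustration of the idea, and not of the paper's actual algorithm, one can score alternative clusterings of a training sample by the out-of-sample predictive log-score they induce and then choose the clustering with the best expected predictive performance. The toy data, candidate groupings and plug-in normal predictive model below are all hypothetical.

```python
# Sketch: choosing a clustering by expected predictive performance (log-score).
# This is NOT the paper's algorithm, only an illustration of treating the
# choice of clustering as a decision problem with a predictive utility.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=np.repeat([0.0, 3.0], 50), scale=1.0)   # toy response
x = np.arange(100)                                          # toy unit index

# Two hypothetical candidate clusterings of the training units.
clusterings = {
    "split_at_50": np.repeat([0, 1], 50),
    "odd_even":    x % 2,
}

def predictive_log_score(y, labels, n_folds=5):
    """Cross-validated log-score of a simple within-cluster normal predictive model."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    score = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        for i in test:
            same = train[labels[train] == labels[i]]         # units in the same cluster
            mu, sd = y[same].mean(), y[same].std(ddof=1) + 1e-6
            score += stats.norm.logpdf(y[i], mu, sd)         # plug-in predictive density
    return score

best = max(clusterings, key=lambda k: predictive_log_score(y, clusterings[k]))
print("clustering chosen by predictive log-score:", best)
```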
SORT: Statistics and Operations Research Transactions, 2007
Point and region estimation may both be described as specific decision problems. In point estimation, the action space is the set of possible values of the quantity of interest; in region estimation, the action space is the set of its possible credible regions. Foundations dictate that the solution to these decision problems must depend on both the utility function and the prior distribution. Estimators intended for general use should surely be invariant under one-to-one transformations, and this requires the use of an invariant loss function; moreover, an objective solution requires the use of a prior which does not introduce subjective elements. The combined use of an invariant information-theory based loss function, the intrinsic discrepancy, and an objective prior, the reference prior, produces a general solution to both point and region estimation problems. In this paper, estimation of the two parameters of univariate location-scale models is considered in detail from this point of view, with special attention to the normal model. The solutions found are compared with a range of conventional solutions.
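A minimal numerical sketch of the general recipe (choose the estimate that minimizes posterior expected loss) is given below for the simplest special case, a normal mean with known standard deviation, where the intrinsic loss between N(· | μ̃, σ) and N(· | μ, σ) based on n observations reduces to n(μ − μ̃)²/(2σ²) and the reference posterior of μ is N(μ | x̄, σ/√n). This is an illustrative assumption-laden special case, not the paper's full location-scale treatment.

```python
# Sketch: intrinsic point estimation as a decision problem for a normal mean
# with known sigma. The estimate minimizes the posterior expected intrinsic
# loss, approximated by Monte Carlo over reference-posterior draws.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
sigma, n = 2.0, 25
x = rng.normal(1.3, sigma, n)
xbar = x.mean()

# Reference posterior draws for mu (flat prior, known sigma): N(xbar, sigma/sqrt(n)).
mu_post = rng.normal(xbar, sigma / np.sqrt(n), size=50_000)

def expected_intrinsic_loss(mu_tilde):
    return np.mean(n * (mu_post - mu_tilde) ** 2 / (2 * sigma ** 2))

intrinsic_estimate = minimize_scalar(expected_intrinsic_loss,
                                     bounds=(xbar - 3, xbar + 3),
                                     method="bounded").x
print(intrinsic_estimate, xbar)  # in this symmetric case both coincide with the sample mean
```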
It is argued that hypothesis testing problems are best considered as decision problems concerning the choice of a useful probability model. Decision theory, information measures and reference analysis are combined to propose a non-subjective Bayesian approach to nested hypothesis testing, the Bayesian Reference Criterion (BRC). The results are compared both with frequentist-based procedures and with the use of Bayes factors. The theory is illustrated with stylized examples, where alternative approaches may easily be compared.
We perform a Bayesian analysis of the p-variate skew-t model, providing a new parameterization, a set of non-informative priors and a sampler specifically designed to explore the posterior density of the model parameters. Extensions, such as the multivariate regression model with skewed errors and the stochastic frontiers model, are easily accommodated. A novelty introduced in the paper is the extension of the bivariate skew-normal model given in Liseo & Parisi (2013) to a more realistic p-variate skew-t model. We also introduce the R package mvst, which allows the multivariate skew-t model to be estimated.
For any probability model M ≡ {p(x | θ, ω), θ ∈ Θ, ω ∈ Ω} assumed to describe the probabilistic behaviour of data x ∈ X, it is argued that testing whether or not the available data are compatible with the hypothesis H0 ≡ {θ = θ0} is best considered as a formal decision problem on whether to use (a0), or not to use (a1), the simpler probability model (or null model) M0 ≡ {p(x | θ0, ω), ω ∈ Ω}, where the loss difference L(a0, θ, ω) − L(a1, θ, ω) is proportional to the amount of information δ(θ0, θ, ω) which would be lost if the simplified model M0 were used as a proxy for the assumed model M. For any prior distribution π(θ, ω), the appropriate normative solution is obtained by rejecting the null model M0 whenever the corresponding posterior expectation ∫∫ δ(θ0, θ, ω) π(θ, ω | x) dθ dω is sufficiently large. Specification of a subjective prior is always difficult, and often polemical, in scientific communication. Information theory may be used to specify a prior, the reference prior, which only depends on the assumed model M, and mathematically describes a situation where no prior information is available about the quantity of interest. The reference posterior expectation, d(θ0, x) = ∫ δ π(δ | x) dδ, of the amount of information δ(θ0, θ, ω) which could be lost if the null model were used, provides an attractive non-negative test function, the intrinsic statistic, which is invariant under reparametrization. The intrinsic statistic d(θ0, x) is measured in units of information, and it is easily calibrated (for any sample size and any dimensionality) in terms of some average log-likelihood ratios. The corresponding Bayes decision rule, the Bayesian reference criterion (BRC), indicates that the null model M0 should only be rejected if the posterior expected loss of information from using the simplified model M0 is too large or, equivalently, if the associated expected average log-likelihood ratio is large enough. The BRC criterion provides a general reference Bayesian solution to hypothesis testing which does not assume a probability mass concentrated on M0 and, hence, it is immune to Lindley's paradox. The theory is illustrated within the context of multivariate normal data, where it is shown to avoid Rao's paradox on the inconsistency between univariate and multivariate frequentist hypothesis testing.
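In the simplest illustration, a normal mean θ with known variance, the intrinsic statistic is widely quoted to have the closed form d(θ0, x) = (1 + z²)/2, with z = √n (x̄ − θ0)/σ. The sketch below assumes that case and, purely for illustration, uses the log 100 ≈ 4.6 nit calibration value mentioned in the next entry as the rejection threshold; other thresholds could equally be used.

```python
# Sketch of the BRC decision rule for a normal mean with known variance,
# assuming the closed form d(theta0, x) = (1 + z^2)/2. The threshold
# log(100) ~ 4.6 nits is one of the calibration values discussed below.
import numpy as np

def intrinsic_statistic_normal_mean(xbar, theta0, sigma, n):
    z = np.sqrt(n) * (xbar - theta0) / sigma
    return 0.5 * (1.0 + z ** 2)

d = intrinsic_statistic_normal_mean(xbar=0.42, theta0=0.0, sigma=1.0, n=100)
reject_null = d > np.log(100)     # posterior expected loss of information too large?
print(f"d = {d:.2f} nits, reject H0: {reject_null}")
```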
Under conditions C, p(x | C) and π(θ | C) are, respectively, probability density (or mass) functions of observables x and parameters θ: p(x | C) ≥ 0, ∫_X p(x | C) dx = 1, E[x | C] = ∫_X x p(x | C) dx; π(θ | C) ≥ 0, ∫_Θ π(θ | C) dθ = 1, E[θ | C] = ∫_Θ θ π(θ | C) dθ. Special densities (or mass) functions use specific notation, such as N(x | µ, σ), Bi(x | n, θ), or Pn(x | λ). Other examples: Beta, Be(x | α, β) = [Γ(α+β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1), with 0 < x < 1, α > 0, β > 0; Gamma, Ga(x | α, β) = [β^α/Γ(α)] x^(α−1) e^(−βx), with x > 0, α > 0, β > 0; Student, St(x | µ, σ, α) = [Γ((α+1)/2)/Γ(α/2)] (1/(σ√(απ))) [1 + (1/α)((x − µ)/σ)²]^(−(α+1)/2), with x ∈ ℝ, µ ∈ ℝ, σ > 0, α > 0.
Statistical models. A statistical model generating x ∈ 𝒳 is {p(x | θ), x ∈ 𝒳, θ ∈ Θ}, with parameter vector θ = (θ1, …, θk) ∈ Θ, parameter space Θ ⊂ ℝ^k, data set x ∈ 𝒳, and sampling (outcome) space 𝒳 of arbitrary structure. The likelihood function of x is l(θ | x) = p(x | θ), regarded as a function of θ ∈ Θ; the maximum likelihood estimator (mle) of θ is θ̂ = θ̂(x) = arg sup_{θ ∈ Θ} l(θ | x). Data x = {x1, …, xn} form a random sample (iid) from the model if p(x | θ) = ∏_{j=1}^{n} p(xj | θ), with xj ∈ X and 𝒳 = Xⁿ. Behaviour under repeated sampling (general, not necessarily iid data) considers {x1, x2, …}, a (possibly infinite) sequence of replications of the complete data set x; x⁽ᵐ⁾ = {x1, …, xm} denotes a finite set of m such replications, and asymptotic results are obtained as m → ∞.
Interpretation and calibration of the intrinsic discrepancy. Let {p1(x | θ1), θ1 ∈ Θ1} and {p2(x | θ2), θ2 ∈ Θ2} be two alternative statistical models for x ∈ X, one of which is assumed to be true. The intrinsic discrepancy δ{θ1, θ2} = δ{p1, p2} is then the minimum expected log-likelihood ratio in favour of the true model. Indeed, if p1(x | θ1) is the true model, the expected log-likelihood ratio in its favour is E1[log{p1(x | θ1)/p2(x | θ2)}] = κ{p2 | p1}; if the true model is p2(x | θ2), the expected log-likelihood ratio in its favour is κ{p1 | p2}. The intrinsic discrepancy is δ{p1, p2} = min[κ{p2 | p1}, κ{p1 | p2}]. Calibration: δ = log 100 ≈ 4.6 nits corresponds to likelihood ratios for the true model larger than about 100, making discrimination very easy; δ = log(1 + ε) ≈ ε nits corresponds to likelihood ratios for the true model of about 1 + ε, making discrimination very hard. Calibration table:
Intrinsic discrepancy δ: 0.01, 0.69, 2.3, 4.6, 6.9
Average likelihood ratio for the true model, exp(δ): 1.01, 2, 10, 100, 1000
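The definition above is easy to compute directly for two fully specified models. The sketch below, a minimal worked example rather than anything from the original notes, evaluates the intrinsic discrepancy between two normal models as the minimum of the two directed Kullback-Leibler divergences and reads off the corresponding average likelihood ratio exp(δ).

```python
# Sketch: intrinsic discrepancy between two fully specified normal models,
# computed as the minimum of the two directed Kullback-Leibler divergences
# (in nits), together with the calibration value exp(delta).
import numpy as np

def kl_normal(mu1, s1, mu2, s2):
    """KL divergence KL( N(mu1, s1) || N(mu2, s2) ) in nits."""
    return np.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

def intrinsic_discrepancy(mu1, s1, mu2, s2):
    return min(kl_normal(mu1, s1, mu2, s2), kl_normal(mu2, s2, mu1, s1))

delta = intrinsic_discrepancy(0.0, 1.0, 2.0, 1.5)
print(f"delta = {delta:.2f} nits, average likelihood ratio ~ {np.exp(delta):.1f}")
```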
Bayesian Statistics is typically taught, if at all, separately from conventional frequentist methods. It is becoming clear, however, that the emergence of powerful objective Bayesian methods (where the result, as in frequentist statistics, only depends on the assumed model and the observed data) provides a new unifying perspective on most established methods, and may be used in situations (e.g. hierarchical structures) where frequentist methods cannot. On the other hand, frequentist procedures provide mechanisms to evaluate and calibrate any statistical method. Hence, it may be the right time to consider an integrated approach to mathematical statistics, where objective Bayesian methods provide the inferential construction elements, and frequentist methods the necessary evaluations. The emphasis of this presentation will be on undergraduate courses on mathematical statistics, but the main ideas may also be applied to more basic introductory and service courses.
A Bayesian statistical approach is introduced to assess experimental data from the analyses of radionuclide activity concentration in environmental samples (low activities). A theoretical model has been developed that allows the use of known prior information about the value of the measurand (activity), together with the experimental value determined through the measurement. The model has been applied to data from the Inter-laboratory Proficiency Test organised periodically among the Spanish environmental radioactivity laboratories that produce the radiochemical results for the Spanish radioactive monitoring network. A global improvement in laboratory performance is obtained when this prior information is taken into account. The prior information used in this methodology is an interval within which the activity is known to be contained, but the approach could be extended to any other experimental quantity with a different type of prior information available.
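The paper's model is not reproduced here; the sketch below only illustrates the kind of computation involved. If the measurement is treated as Gaussian and the prior states only that the activity lies in a known interval (a flat prior on [a, b]), the posterior is a truncated normal, whose mean can serve as the corrected estimate. The numbers and the uniform-on-an-interval prior are assumptions made for illustration.

```python
# Sketch: combining a Gaussian measurement with an interval prior on the
# activity. With a prior that is flat on [a, b] and zero outside, the
# posterior is the measurement likelihood truncated to [a, b].
from scipy import stats

a, b = 0.0, 5.0            # interval known to contain the true activity (made-up units)
y, u = 5.8, 0.9            # reported activity and its standard uncertainty (made up)

alpha, beta = (a - y) / u, (b - y) / u          # standardized truncation limits
posterior = stats.truncnorm(alpha, beta, loc=y, scale=u)

print("posterior mean activity:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```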
Consider a model {p(z | θ, λ), z ∈ Z, θ ∈ Θ, λ ∈ Λ}, where λ is some appropriately chosen nuisance parameter vector. Let π(θ, λ) = π(λ | θ) π(θ) be the assumed prior, and let π(θ | z) be the corresponding marginal posterior distribution of θ. Appreciation of the inferential contents of π(θ | z) may be enhanced by providing both point and region estimates of the vector of interest θ, and by declaring whether or not some context-suggested specific value θ0 is compatible with the observed data z (precise hypothesis testing). A large number of Bayesian estimation and hypothesis testing procedures have been proposed in the literature. We argue that their choice is better made in decision-theoretic terms.
In multi-parameter models, reference priors typically depend on the parameter or quantity of interest, and it is well known that this is necessary to produce objective posterior distributions with optimal properties. There are, however, many situations where one is simultaneously interested in all the parameters of the model or, more realistically, in functions of them that include aspects such as prediction, and it would then be useful to have a single objective prior that could safely be used to produce reasonable posterior inferences for all the quantities of interest. In this paper, we consider three methods for selecting a single objective prior and study, in a variety of problems including the multinomial problem, whether or not the resulting prior is a reasonable overall prior.
Probabilistic prediction of the value of a given observable quantity, given a random sample of past observations of that quantity, is a frequent problem in the sciences, but one which has no commonly agreed solution. In this paper, Bayesian statistical methods and information theory are used to propose a new procedure which is model-free, in that no assumption is required about an underlying statistical model, and objective, in that a reference non-subjective prior distribution is used. The proposed method may be seen as a Bayesian analogue to conventional kernel density estimation, but one with an appropriate predictive behaviour not previously available. The procedure is illustrated with the analysis of some published astronomical data.
Discussion of "Objective Priors: An Introduction for Frequentists" by M. Ghosh [arXiv:1... more Discussion of "Objective Priors: An Introduction for Frequentists" by M. Ghosh [arXiv:1108.2120]
Rejoinder to "Overall Objective Priors" by James O. Berger, Jose M. Bernardo and Dongchu Sun [arXiv:1504.02689].
Comparing the means of two normal populations is a very old problem in mathematical statistics, but there is still no consensus about its most appropriate solution. In this paper we treat the problem of comparing two normal means as a Bayesian decision problem with only two alternatives: either to accept the hypothesis that the two means are equal, or to conclude that the observed data are, under the assumed model, incompatible with that hypothesis. The combined use of an information-theory based loss function, the intrinsic discrepancy (Bernardo and Rueda, 2002), and an objective prior function, the reference prior (Bernardo, 1979; Berger and Bernardo, 1992), produces a new solution to this old problem which, for the first time, has the invariance properties one should presumably require.
We present a decision analysis approach to the problems faced by people subject to multiple-choice examinations, as often encountered in their education, in looking for a job, or in getting a driving permit. From the candidate's viewpoint, each question in this form of examination is a decision problem, where the decision space depends on the examination rules and the expected utility is some function of the expected score. We analyse this problem for the two basic situations which occur in practice, namely when the candidate wants to maximize his or her expected score, and when he or she wants to maximize the probability of obtaining the minimum grade required to pass, and we derive the corresponding optimal strategies. We argue that for multiple-choice examinations to be fair, candidates should be required to provide a probability distribution over the possible answers to each question, rather than merely marking the answers judged to be more likely; we then discuss the appropriate scoring rules and the corresponding optimal strategies. As an interesting byproduct, we deduce some illuminating consequences on the scoring procedures of multiple-choice examinations, as they are currently performed.
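A minimal sketch of the expected-score case for a single question is given below. The marking scheme (full credit for a correct answer, a fixed penalty for a wrong one, zero for omitting) and the candidate's probabilities are illustrative assumptions; actual examination rules, and the paper's full analysis of scoring rules, go beyond this.

```python
# Sketch: one multiple-choice question as a decision problem under the
# candidate's own probabilities. Marking scheme assumed: +1 for a correct
# answer, -penalty for a wrong one, 0 for omitting the question.
import numpy as np

probs = np.array([0.45, 0.30, 0.15, 0.10])   # candidate's beliefs over the k options
penalty = 1.0 / (len(probs) - 1)             # classical "formula scoring" penalty

expected_scores = probs * 1.0 - (1.0 - probs) * penalty   # expected score of marking each option
best_option = int(np.argmax(expected_scores))

if expected_scores[best_option] > 0.0:       # omitting yields an expected score of 0
    print(f"mark option {best_option}, expected score {expected_scores[best_option]:.3f}")
else:
    print("omit the question")
```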