
Marketing models of consumer heterogeneity

1999, Journal of Econometrics

Journal of Econometrics 89 (1999) 57–78

Greg M. Allenby (a)(*), Peter E. Rossi (b)

(a) Max M. Fisher College of Business, Ohio State University, 1775 College Road, Columbus, OH 43210, USA
(b) Graduate School of Business, University of Chicago, Chicago, IL 60637, USA
(*) Corresponding author.

Abstract

The distribution of consumer preferences plays a central role in many marketing activities. Pricing and product design decisions, for example, are based on an understanding of the differences among consumers in price sensitivity and valuation of product attributes. In addition, marketing activities which target specific households require household-level parameter estimates. Thus, the modeling of consumer heterogeneity is the central focus of many statistical marketing applications. In contrast, heterogeneity is often regarded as an ancillary nuisance problem in much of the applied econometrics literature, one which must be dealt with but is not the focus of the investigation; the focus is instead on estimating average effects of policy variables. In this paper, we discuss various approaches to modeling consumer heterogeneity and evaluate the utility of these approaches for marketing applications. © 1999 Elsevier Science S.A. All rights reserved.

JEL classification: C11; C33; C35

Keywords: Random effects; Heterogeneity; Probit models

The purpose of marketing is to understand consumer preferences and to help design and deliver appropriate goods and services. Marketers are interested in determining what products to offer, what prices to charge, how the products should be promoted, and how best to deliver the products to the consumer. One of the greatest challenges in marketing is to understand the diversity of preferences and sensitivities that exists in the market. Heterogeneity in preferences gives rise to differentiated product offerings, market segments and market
niches. Differing sensitivities are the basis for targeted communication programs and promotions. As consumer preferences and sensitivities become more diverse, it becomes less and less efficient to consider the market in the aggregate.

In contrast to this emphasis on individual differences, economists are often more interested in aggregate effects and regard heterogeneity as a statistical nuisance parameter problem which must be addressed but not emphasized. Econometricians frequently employ methods which do not allow for the estimation of individual-level parameters. For example, random coefficient models are often implemented through an unconditional likelihood approach in which only hyper-parameters are estimated. Furthermore, the models of heterogeneity considered in the econometrics literature often restrict heterogeneity to subsets of parameters such as model intercepts. In the marketing context, there is no reason to believe that differences should be confined to the intercepts and, as indicated above, differences in slope coefficients are critically important. Finally, economic policy evaluation is frequently based on estimated hyper-parameters, which are measured with much greater certainty than individual-level parameters. This is in contrast to marketing policies, which often attempt to respond to individual differences that are measured less precisely. The determination of optimal marketing decisions must account for the substantial uncertainty that exists in individual-level estimates, and for the impact of this uncertainty on decision criteria which often involve non-linear functions of model parameters.
Marketing practices which are designed to exploit consumer differences require flexible models of heterogeneity coupled with an inference method which adequately describes uncertainty in consumer-level estimates. The goal of this paper is to outline an approach to this problem which is the culmination of developments from a series of papers (Rossi and Allenby, 1993; McCulloch and Rossi, 1994; Rossi, McCulloch and Allenby, 1996). We advocate a continuous model of heterogeneity and employ a Bayesian approach to inference. We contrast this approach with discrete models of heterogeneity popular in the marketing literature. We also comment on non-Bayesian methods of inference which dominate the econometrics literature. An example of price promotions targeted to the household is used to illustrate the differences between models of heterogeneity and methods of inference.

The remainder of the paper is organized as follows: Section 1 examines issues in modeling heterogeneity, contrasting various approaches that appear in the econometric and marketing literatures. Section 2 introduces the discrete choice model used to explore the heterogeneity issues, and Section 3 provides an empirical application using a scanner panel dataset of household ketchup purchases. Section 4 then discusses various marketing policies related to the analysis of this type of data, and illustrates the benefit of a detailed understanding of the extent and nature of heterogeneity. Section 5 offers concluding remarks.

1. Issues in modeling heterogeneity

Data describing consumer preferences and sensitivities to variables such as price are typically obtained through surveys or household purchase histories which yield very limited individual-level information. For example, household purchases in most product categories often total less than 12 per year.
Similarly, survey respondents become fatigued and irritable when questioned for more than 20 or 30 min. As a result, the amount of data available for drawing inferences about any specific consumer is very small, although there may exist many consumers in a particular study.

The fixed-effects approach to heterogeneity has much appeal since it delivers individual household-level parameter estimates and does not require the specification of any particular probability distribution of heterogeneity. However, the sparseness of individual-level data renders this approach impractical. In many situations, incomplete household-level data causes a lack of identification at the household level. In other cases, the parameters are identified in the household-level likelihood, but the fixed-effects estimates are measured with huge uncertainty which is difficult to quantify using standard asymptotic methods.

1.1. Inference in random coefficient models

An alternative approach is to specify a random-effects model which stochastically pools data across consumers (Heckman, 1982). This approach is frequently employed in the analysis of panel data and has been extensively applied in both marketing and economic studies. Examples include Chamberlain (1984), Heckman and Singer (1984), Kamakura and Russell (1989), Chintagunta et al. (1991), and Gonul and Srinivasan (1993). In the random effects model, individual household-level parameters are viewed as draws from a super-population distribution which is termed the random effects or mixing distribution. Traditionally, the random effects distribution is viewed as being a part of the likelihood. To make this discussion more concrete, we now introduce a generic notation for this problem. The likelihood of the household-specific parameters {a_i} and the common parameters of the mixing distribution, b, can be written as

l(\{a_i\}, b) \equiv p(\mathrm{data} \mid \{a_i\}, b) = \prod_{i=1}^{N} p(\mathrm{data}_i \mid a_i)\, \pi(a_i \mid b).   (1)

Here i denotes the ith of N consumers, l denotes the likelihood, a_i is the vector of individual parameters, and π(a_i|b) is the random effects distribution indexed by the parameter b. In most econometric applications, interest focuses on the common parameters, b, of the super-population distribution. For example, in an economic analysis we might be interested in the average effects of a policy intervention. Inference about the super-population parameters is conducted with the average conditional likelihood or marginal likelihood, which is obtained by integrating out the {a_i}:

l(b) \equiv p(\mathrm{data} \mid b) = \prod_{i=1}^{N} \int p(\mathrm{data}_i \mid a_i)\, \pi(a_i \mid b)\, da_i.   (2)

Outside of normal linear models with a normal random coefficient distribution, performing the integral in Eq. (2) can be computationally challenging. Recent advances in simulation methods have considerably expanded the set of models for which it is now possible to conduct marginal likelihood inference using Eq. (2) (see Hajivassiliou and Ruud, 1994; Mariano et al., 1997).

In the econometric literature, little attention has been paid to the problem of obtaining household-level estimates. Unfortunately, many marketing problems require knowledge of the parameters of particular customers. For example, marketers are often interested in targeting their communications and offers to subpopulations who they expect will react most favorably. The identification and characterization of these consumers requires knowledge of a_i as well as b. In this and other cases, knowledge of the distribution of heterogeneity is insufficient to identify optimal marketing actions. An approximate Bayesian approach to this problem would be to estimate b by maximizing Eq. (2) and then to use π(a_i | b = b̂) as a prior in the analysis of an individual household's conditional likelihood:

p(a_i \mid \mathrm{data}) \propto p(\mathrm{data}_i \mid a_i)\, \pi(a_i \mid b = \hat{b}).   (3)

This approximation is justified when there are enough consumers in the study that b is precisely estimated and no one consumer has a large influence on its estimate. In many problems, such as the choice model application considered in Section 2 below, this analysis of household-level likelihoods would require numerical integration methods, as the prior and likelihood will not 'mix' to form a simple posterior (see Rossi and Allenby, 1993 for an illustration of this approach). In this paper, we outline a comprehensive approach to Bayesian analysis of this problem which yields both household-level and common parameter inferences without resort to approximate methods.

In theory, information about individual parameters can be obtained from the random-effects model which does not integrate the likelihood as in Eq. (2), and instead considers the joint distribution of {a_i} and b. This can be accomplished by recasting Eq. (1) as a hierarchical Bayes model:

\pi(\{a_i\}, b \mid \mathrm{data}) \propto \prod_{i=1}^{N} l(\mathrm{data}_i \mid a_i)\, \pi(a_i \mid b)\, \pi(b),   (4)

where π(b) is a prior distribution placed on b so that the joint (posterior) distribution is defined. In this Bayesian formulation, the mixing distribution is considered part of the prior distribution. Inferences about the preferences and sensitivities of a specific individual are obtained by appropriately marginalizing the joint posterior distribution:

\pi(a_j \mid \mathrm{data}) = \int \pi(\{a_i\}, b \mid \mathrm{data})\, da_{-j}\, db,   (5)

where '−j' reads 'all units except j.' The advantage of this approach is that all the data in the study are used to derive individual-level estimates. This helps ensure that the parameters are identified, and it results in more accurate household parameter estimates via parameter shrinkage, in which the random effects distribution is used as a prior and the household parameter estimates are 'shrunk' toward the mean. In addition, Eq. (5) provides information about the entire distribution of a_j and not just some measure of central tendency.

A common practice in many studies in marketing and economics is to employ point estimates of model parameters when considering the implications of the estimated parameters. For example, the effect of a general change in price can often be accurately evaluated in linear models by considering the average price sensitivity as reflected in b. However, in nonlinear models, and in situations where the average effects are not of primary interest, the impact of parameter uncertainty needs to be considered. Individual-level estimates are much less accurate than aggregate-level estimates, and this uncertainty needs to be explicitly considered when designing marketing activities which are tailored to specific individuals.

1.2. Models of heterogeneity: Continuous vs. discrete distributions

Until this point, we have not specified the form of the random coefficient or mixing distribution, π(·). It has become popular in the marketing literature to specify a discrete distribution, π(a|b) = \sum_{j=1}^{J} u_j\, I(a = b_j), often termed a finite mixture model (cf. Titterington et al., 1985; Kamakura and Russell, 1989; Chintagunta et al., 1991; Heckman and Singer, 1984). The finite mixture approach is popular in part because the marginal likelihood is easily evaluated as a sum over the J mass points. Heckman and Singer (1984) have emphasized the flexibility of the discrete distribution: with sufficient mass points, any distribution can be approximated to a high degree of accuracy. In practice, however, researchers find it difficult to estimate finite mixture models with more than a half dozen or so mass points. While the finite mixture model may approximate the central tendencies of the mixing distribution, there is a growing body of evidence which suggests that finite mixture models with
a small number of mass points may inadequately capture the full extent of heterogeneity in the data. A disadvantage of the finite mixture model is that the posterior distribution of individual-level parameters is constrained to lie within the convex hull of the b_j. As a result, these finite mixture models lead to posterior estimates of individual effects which are much less heterogeneous than those obtained from continuous mixing distributions (see Allenby and Ginter, 1995b; Lenk et al., 1996; Rossi et al., 1996). Recent advances in Markov Chain Monte Carlo (MCMC) estimation now permit the evaluation of continuous random effects models and the exact posterior distribution of individual effects, Eq. (5). In addition, MCMC methods provide a new tool for the analysis of certain discrete choice models for which the likelihood itself is computationally challenging to evaluate. To illustrate these points, we introduce a discrete choice model of household purchases in the next section and contrast it to alternative formulations found in the economics and marketing literature.

2. Approaches to incorporating heterogeneity in choice models

In this section, we discuss approaches to modeling heterogeneity in the context of discrete choice models. The basic model we will employ in this discussion is an independence probit in a household panel data setting. We observe I_{h,t}, the choices of H households, each for T_h periods. I_{h,t} is a multinomial outcome conditional on a set of covariates, X_{h,t}. In the standard random utility framework, these multinomial outcomes are thought to result from an underlying latent regression model:

y_{h,t} = X_{h,t} b_h + e_{h,t}, \qquad e_{h,t} \sim \mathrm{iid}\; N(0, \Lambda).   (6)

We assume there are m possible brand choices and that households choose the brand with the highest utility. I_{h,t} is the index of the maximum of the elements of y_{h,t}.
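The latent-utility mechanism in Eq. (6) is easy to see in simulation. The sketch below generates choices for a single household under a hypothetical coefficient vector; all variable names and numerical settings are ours, not the paper's, and the first brand's intercept is set to zero with unit error variances purely for identification in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 4        # number of brands
T = 25       # purchase occasions for household h

# Hypothetical household coefficients: m brand intercepts (first fixed at 0),
# then sensitivities to log price, display, and feature.
beta_h = np.array([0.0, -0.5, -1.3, -4.5, -4.7, 1.7, 1.5])

choices = np.empty(T, dtype=int)
for t in range(T):
    log_price = np.log(rng.uniform(0.7, 1.4, size=m))
    display = (rng.random(m) < 0.1).astype(float)
    feature = (rng.random(m) < 0.1).astype(float)
    # X_{h,t} = [I_m, p_{h,t}, d_{h,t}, f_{h,t}] as in Eq. (6)
    X = np.column_stack([np.eye(m), log_price, display, feature])
    eps = rng.normal(0.0, 1.0, size=m)   # diagonal Lambda, unit variances here
    y = X @ beta_h + eps                 # latent utilities y_{h,t}
    choices[t] = int(np.argmax(y))       # I_{h,t}: index of max utility

print(choices[:10])
```

With the strongly negative price coefficient above, simulated households gravitate toward whichever brand is cheapest on a given occasion, which is exactly the behavior the model is meant to encode.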
X_{h,t} is a matrix of choice characteristics which includes an intercept term for each of the m brands and price, display, and feature variables: X_{h,t} = [I_m, p_{h,t}, d_{h,t}, f_{h,t}], where I_m is the m × m identity matrix, p_{h,t} is an m-vector of log prices, d_{h,t} is an indicator vector of length m such that the ith element is 1 if the ith brand is on display and 0 otherwise, and f_{h,t} is the corresponding indicator vector for feature advertising. b_h is a vector representing household h's preferences and sensitivity to marketing mix variables, and e_{h,t} is an error term. Given the short length of our panel data (the typical household is observed for at most 1.5 yr), it seems reasonable to assume that the b_h parameters do not vary over time; that is, we assume that brand preferences and marketing mix sensitivities are not time-varying.

Different specifications of the error structure result in various probit (normal error) and logit (extreme value error) models. We use a diagonal covariance structure, e_{h,t} ~ iid N(0, Λ), where Λ is an m × m diagonal matrix. A scalar covariance structure (equal variances for the random utility errors of each choice alternative) yields a probit model which displays the restrictive IIA property (Hausman and Ruud, 1987; Allenby and Ginter, 1995a). Finally, we should note that if we specify a random effects model for the b_h parameters, then we have specified a complicated correlated probit model with heteroskedastic errors (see Section 2.2 for further elaboration of this point). Below we consider alternative approaches to modeling heterogeneity in household preferences and sensitivities.

2.1. Bayesian approaches

Consider a continuous random coefficient model in which the {b_h} are drawn from a multivariate normal distribution:

b_h \sim \mathrm{iid}\; N(\bar{b}, V_b).   (7)

While the normal distribution is a reasonable starting point for a continuous distribution of heterogeneity, more flexible distributions are possible. For example, Allenby et al. (1995b) and Rossi et al. (1996) introduce observable household characteristics into the specification of b̄. McCulloch and Rossi (1996) outline a strategy for using a mixture of normals to provide additional flexibility (see also Allenby et al., 1998). In Section 3 below, we consider some diagnostic evidence regarding the appropriateness of the normal random coefficient distribution in our data context.

As discussed in Section 1, we introduce priors for the parameters to ensure that the posterior distribution is defined. For convenience, we use natural conjugate priors in which the prior on b̄ is normal and the prior on V_b is inverted Wishart:

\bar{b} \sim N(\breve{b}, a V_b)   (8)

and

V_b^{-1} \sim W(\nu_0, V_0).   (9)

In our empirical analysis, we employ relatively diffuse priors for b̄ and V_b since we do not have direct information on these quantities. More specifically, we specify these priors with the following parameters: b̆ = 0, a = 10, ν_0 = k + 4, V_0 = ν_0 I. In our analysis below, we found our results to be insensitive to doubling a and ν_0, indicating that we have indeed specified a prior which is diffuse relative to the likelihood.

To complete the Bayesian model, we must formulate a prior for the elements of Λ. We use standard natural conjugate inverted gamma priors:

\Lambda = \mathrm{diag}(1, \lambda_2, \ldots, \lambda_m), \qquad \lambda_i \sim \mathrm{IG}(\nu, s_i).   (10)

Again, we employ very diffuse settings for these IG priors, letting ν = 3 and s_i = 1.0. Note that we have set λ_1 = 1 for identification purposes.

Our complete hierarchical model is specified by a series of conditional distributions. At the bottom of the hierarchy is the latent utility regression model conditional on b_h.
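A single joint draw from the second-stage priors in Eqs. (8)–(10), at the hyperparameter settings stated above, can be sketched as follows. This is an illustration only: the coefficient dimension k is hypothetical, the prior mean is taken to be zero as in the text, and scipy's Wishart `scale` convention may not line up exactly with the paper's W(ν_0, V_0) notation, so treat the precision draw as schematic.

```python
import numpy as np
from scipy.stats import invgamma, wishart

k = 7                 # hypothetical dimension of b_h
m = 4                 # number of brands
a, nu0 = 10.0, k + 4  # a = 10, nu_0 = k + 4 as in the text
V0 = nu0 * np.eye(k)  # V_0 = nu_0 * I

rng = np.random.default_rng(1)

# V_b^{-1} ~ W(nu_0, V_0): draw a precision matrix, then invert.
# scipy's convention is E[W(df, scale)] = df * scale; we pass V0^{-1} so the
# draw is on the right scale, but conventions differ across texts.
prec = wishart.rvs(df=nu0, scale=np.linalg.inv(V0), random_state=1)
V_b = np.linalg.inv(prec)

# b_bar ~ N(0, a * V_b), Eq. (8) with prior mean set to zero
b_bar = rng.multivariate_normal(np.zeros(k), a * V_b)

# lambda_i ~ IG(3, 1.0) for i = 2..m; lambda_1 = 1 for identification
lam = np.concatenate([[1.0], invgamma.rvs(a=3, scale=1.0, size=m - 1,
                                          random_state=2)])
Lambda = np.diag(lam)
```

Drawing from the prior like this is also a useful sanity check that the hyperparameter settings really are diffuse relative to the scale of the data.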
The random utility regression is followed by successively higher levels of priors which incorporate views about the distribution of the b_h coefficients. Here we use the notation that y|x is the conditional distribution of y given x:

I_{h,t} \mid y_{h,t},   (11)
y_{h,t} \mid X_{h,t}, b_h, \Lambda,   (12)
b_h \mid \bar{b}, V_b,   (13)
\bar{b} \mid \breve{b}, a V_b,   (14)
V_b \mid \nu_0, V_0,   (15)
\Lambda \mid \nu, s,   (16)

where y_h' = (y_{h,1}', …, y_{h,T_h}') and X_h' = [X_{h,1}', …, X_{h,T_h}']. Eqs. (11)–(16) give the conditional likelihood for b_h and the random coefficient model. The hierarchy is constructed by these sets of conditional distributions which combine to specify the model. From a Bayesian perspective, the random coefficient model in Eq. (4) is part of a prior in which the household coefficients are viewed as having some commonality through the mixing or random coefficient distribution.

Our goal in the analysis of household choice data is not only to describe the extent and nature of household heterogeneity but also to make inferences about specific households for the purpose of customizing various marketing actions. Fortunately, Bayesian analysis of hierarchical models with Markov Chain Monte Carlo methods has made it feasible to conduct inference about both household-level and common random coefficient parameters. Bayesian analysis of hierarchical models has been made feasible by the development of Markov chain simulation methods which directly exploit the hierarchical structure (see Gelfand and Smith, 1990; Gelfand et al., 1990; Tierney, 1991 for general discussion of these methods). The basic idea behind these methods is to construct a Markov chain which has the posterior as its stationary or invariant distribution and then simulate the chain to obtain a sequence of draws which can be used to approximate the posterior to any desired degree of accuracy.
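The logic of such a chain is easiest to see in a toy conjugate hierarchy rather than the full probit model. The sketch below, under simplifying assumptions of our own (normal observations, known variances, flat prior on the common mean), alternates draws of the household effects and the common mean; the stationary distribution of this chain is the joint posterior.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy hierarchy (ours, not the paper's MNP model):
#   y_{h,t} ~ N(b_h, 1),  b_h ~ N(b_bar, v),  flat prior on b_bar, v known.
H, T, v = 50, 10, 1.0
true_b = rng.normal(2.0, np.sqrt(v), size=H)
y = true_b[:, None] + rng.normal(size=(H, T))
ybar = y.mean(axis=1)

b_bar = 0.0
draws = []
for it in range(2000):
    # Draw each b_h | b_bar, data: conjugate normal update combining the
    # household's sample mean with the population prior.
    post_var = 1.0 / (T + 1.0 / v)
    post_mean = post_var * (T * ybar + b_bar / v)
    b_h = post_mean + np.sqrt(post_var) * rng.normal(size=H)
    # Draw b_bar | {b_h}: flat prior, so a normal centered at mean(b_h).
    b_bar = rng.normal(b_h.mean(), np.sqrt(v / H))
    if it >= 500:              # discard burn-in draws
        draws.append(b_bar)

print(np.mean(draws))          # should settle near the true mean of 2.0
```

The same two-block structure (household-level draws, then common-parameter draws) carries over to the hierarchical probit, where the blocks additionally include latent utilities and covariance parameters.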
In this paper, we use the Gibbs sampler constructed for the hierarchical MNP model by McCulloch and Rossi (1994) (Allenby and Lenk (1994) consider a logistic normal regression model applied to scanner panel data). The Gibbs sampler is implemented by drawing successively from a set of posterior conditional distributions which are based on the data, consisting of the explanatory variables X_{h,t} and the index of observed choices I_{h,t}. The exact forms of these conditional distributions are given in McCulloch et al. (1995). Our Gibbs sampler proceeds by drawing successively from each of the distributions above and iterating this procedure to obtain a long sequence of draws. These draws are then used to compute the marginal posterior distribution of various quantities of interest. There are a number of technical issues which arise in using these sorts of procedures (see McCulloch and Rossi, 1994; Gelman and Rubin, 1992 for a thorough discussion of these issues).

2.2. Classical approaches

Parametric random coefficient models have been widely used in econometrics, but usually from the classical point of view. This means that the model of heterogeneity laid out in Section 2.1 is used to average the conditional likelihood given b_h (see Eq. (1)). In the probit model with a normal random coefficient distribution, substitution of the random coefficient distribution into the latent variable regression results in a correlated probit with a heteroskedastic error structure (as in Hausman and Wise, 1978):

y_{h,t} = X_{h,t}(\bar{b} + v_h) + e_{h,t},   (17)

y_{h,t} = X_{h,t}\bar{b} + u_{h,t}.   (18)

Here u_{h,t} has the following covariance structure:

\mathrm{Var}(u_h) = \begin{pmatrix} \Lambda + X_{h,1} V_b X_{h,1}' & \cdots & X_{h,1} V_b X_{h,T_h}' \\ \vdots & \ddots & \vdots \\ X_{h,T_h} V_b X_{h,1}' & \cdots & \Lambda + X_{h,T_h} V_b X_{h,T_h}' \end{pmatrix}.   (19)

This results in a very high-dimensional correlated probit problem, with (m − 1)T_h-dimensional integrals required to evaluate the unconditional probit likelihood.
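The averaging that defines the unconditional likelihood can be sketched with a crude frequency simulator: draw coefficients from the heterogeneity distribution, draw utility errors, and count how often the observed choice sequence is reproduced. The data and settings below are hypothetical, and practical work would use a smooth GHK-type simulator rather than raw frequencies.

```python
import numpy as np

rng = np.random.default_rng(3)

m, k, R = 4, 7, 5000
b_bar = np.zeros(k)
V_b = 0.5 * np.eye(k)

X = rng.normal(size=(3, m, k))    # 3 purchase occasions (hypothetical data)
choice = np.array([0, 2, 1])      # observed choices I_{h,t}

# Average the conditional choice probabilities over R draws of b_h.
b_draws = rng.multivariate_normal(b_bar, V_b, size=R)        # (R, k)
lik = np.ones(R)
for t in range(3):
    util = X[t] @ b_draws.T + rng.normal(size=(m, R))        # latent utilities
    lik *= (util.argmax(axis=0) == choice[t])                # 1 if reproduced
sim_lik = lik.mean()    # simulated unconditional likelihood for household h
print(sim_lik)
```

Because each term is an indicator, the simulated likelihood is noisy and non-smooth in the parameters; this is precisely why the text stresses using adequate numbers of draws (and smooth simulators) for simulated maximum likelihood.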
Advances in the evaluation of these integrals via simulation methods have made the method of simulated maximum likelihood feasible for problems with small to moderate values of T_h. It is important to ensure that adequate numbers of draws are used in evaluating the likelihood for the method of simulated maximum likelihood and the method of simulated scores (see Geweke et al., 1996). Without sampling experiments such as those pursued in Geweke et al., it is hard to establish, a priori, an adequate number of simulations to assure good performance of maximum likelihood methods.

In many marketing contexts, we desire estimates of the household-level parameter vector in order to tailor marketing actions to specific households. The standard classical approach to random effects models does not provide these estimates automatically. An approximate Bayes procedure can be developed to make inferences at the household level. As shown in Eq. (3), we can use the random coefficient model, conditional on the maximum likelihood estimates of the super-population or common parameters, as a prior in the analysis of each household's likelihood function:

p(b_h \mid X_h, y_h) \propto p(y_h \mid X_h, b_h, \hat{\Lambda})\, p(b_h \mid \hat{\bar{b}}, \hat{V}_b).   (20)

This is an approximate Bayesian analysis since it is conditional on the estimated parameters, and the 'prior' on b_h is based on all of the data, including the data for this household. It should be noted that numerical integration methods will have to be used to find the posterior mean of b_h using the approximate posterior above. Thus, the classical approach to inference about household parameters is substantially more computationally demanding than the full Bayesian approach and offers only approximate answers.

3. Empirical application

The hierarchical Bayes and finite mixture models described above were fit to a scanner panel dataset of ketchup purchases by households residing in Springfield, MO.
Four brands of ketchup in 32 oz containers were considered in our analysis: Heinz, Hunt's, Del Monte and a House Brand. The data comprise 8191 purchases by 1401 households during an 18-month period beginning in 1986. Explanatory variables include price and dummy variables for the presence of display and feature advertising activity. Summary statistics are provided in Table 1. Identification of household-level parameters is informationally demanding, and it is encouraging to see the large standard deviations of the price variables, which are produced by frequent price promotions in which prices are reduced by 25% or more.

Table 1
Descriptive statistics

Brand       Choice share   Avg price (SD) ($)   % of time displayed   % of time featured
Heinz       0.429          1.239 (0.28)         0.082                 0.132
Hunt's      0.251          1.263 (0.25)         0.110                 0.057
Del Monte   0.103          1.287 (0.33)         0.056                 0.010
House       0.217          0.775 (0.10)         0.086                 0.047

3.1. Comparison with finite mixture approaches

Parameter estimates from the finite mixture model are provided in Table 2.

Table 2
Finite mixture model parameter estimates (standard errors)

Component        1             2             3             4             5             6
Mass             0.13          0.29          0.06          0.25          0.20          0.07
Hunt's          -3.19 (0.43)  -0.66 (0.08)   0.94 (0.12)   0.17 (0.07)   0.26 (0.09)  -0.46 (0.14)
Del Monte       -3.68 (0.51)  -1.48 (0.10)  -1.05 (0.29)  -0.20 (0.08)  -0.48 (0.13)   0.34 (0.10)
House           -7.61 (0.97)  -3.76 (0.20)  -3.01 (0.38)  -3.53 (0.21)   0.02 (0.14)  -1.53 (0.26)
ln Price        -3.89 (0.95)  -3.11 (0.19)  -2.02 (0.32)  -4.63 (0.28)  -3.87 (0.12)  -3.33 (0.33)
Display          0.57 (0.32)   1.03 (0.11)   0.55 (0.25)   1.59 (0.13)   1.37 (0.12)   0.46 (0.19)
Feature          0.19 (0.23)   1.51 (0.15)   0.65 (0.30)   0.76 (0.12)   1.26 (0.14)   0.16 (0.22)

Lambda (exp[λ])
Hunt's          -0.62 (0.12)
Del Monte       -0.48 (0.09)
House            0.48 (0.06)

The number of mixing components was determined by examining the Bayesian
In general, the parameter estimates differ greatly across components, indicating the existence of consumer heterogeneity. For example, preferences for Hunt’s range from !3.19 to 0.94, while sensitivity to (log) price ranges from !4.63 to !2.02. Components one, three, five and six 68 G.M. Allenby, P.E. Rossi / Journal of Econometrics 89 (1999) 57–78 describe consumers who prefer each of the four brands, Heinz, Hunt’s, House and Del Monte, respectively. Component two describes consumers who prefer either Heinz or Hunt’s and are display and feature sensitive. Component four describes consumers with approximately equal intercept estimates for the national brands and large sensitivity to price, in-store displays and, to a lesser extent, feature advertising. Consumers described by this latter component frequently switch between the national brands. Parameter estimates for the common parameters of the hierarchical Bayes model are reported in Table 3. The mean of the normal random effects distribution is reported on the left of the table, and the covariance matrix of random effects is reported on the right. The upper right portion of the matrix displays the corresponding correlations. The negative estimates for the brand names indicate that Heinz is the preferred brand, consistent with the choice shares reported in Table 1. Estimates of the covariance matrix of the random effects distribution reveal many large diagonal elements, indicating the extent of heterogeneity in brand preferences and sensitivities to price, display and feature. When compared to the range of estimates from the finite mixture model, the hierarchical Bayes estimates indicate a much larger degree of heterogeneity in the market. For example, a model-based estimate of the 95% HPD interval for price sensitivity ranges from !9.22 to !0.22, which is more than two times larger than that obtained with the finite mixture model. 
Unless the actual distribution of heterogeneity has a pronounced multi-modal shape, the low dispersion of the finite mixture distribution is an indication of an inadequate approximation. Below, we present diagnostic evidence that suggests that our assumption of a unimodal heterogeneity distribution is justified.

Table 3
Hierarchical Bayes model parameter estimates (posterior standard deviations); covariances on and below the diagonal, correlations above

Parameter   Mean           Hunt's        Del Monte     House         ln Price      Display       Feature
Hunt's      -0.50 (0.07)    1.82 (0.21)   0.67          0.47         -0.32         -0.08          0.02
Del Monte   -1.33 (0.11)    1.37 (0.18)   2.25 (0.27)   0.57         -0.39          0.13         -0.07
House       -4.52 (0.28)    2.12 (0.38)   2.81 (0.47)  10.94 (1.49)  -0.07          0.20          0.20
ln Price    -4.72 (0.25)   -0.98 (0.30)  -1.32 (0.35)  -0.49 (0.83)   5.05 (0.99)  -0.01          0.08
Display      1.69 (0.13)    0.14 (0.19)   0.25 (0.21)   0.87 (0.43)  -0.04 (0.41)   1.76 (0.39)   0.33
Feature      1.52 (0.14)    0.03 (0.18)  -0.14 (0.21)   0.87 (0.48)   0.23 (0.42)   0.57 (0.25)   1.69 (0.43)

In a Bayesian setting, we can formally compare the models by computing the posterior probability of each of a set of models. The posterior probability of a model is proportional to the marginal density of the data (we use θ to denote the parameters of the model):

p(M \mid \mathrm{Data}) \propto p(\mathrm{Data}) = \int p(\mathrm{Data} \mid \theta)\, p(\theta)\, d\theta.   (21)

We compute the marginal density of the data for the mixture model using the popular Schwarz asymptotic approximation,

\ln p(\mathrm{Data}) \approx \ln p(\mathrm{Data} \mid \theta = \hat{\theta}_{\mathrm{MLE}}) - (k/2) \ln N,

where k is the number of parameters in the model and N is the number of observations. For the hierarchical Bayes model, we can employ a more accurate approximation by using importance sampling to compute the integral in Eq. (21), as suggested by Newton and Raftery (1994, p. 21).
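The Schwarz approximation and the resulting model comparison amount to a few lines of arithmetic. The sketch below codes the approximation and converts two log marginal densities (the values reported just below for this application) into posterior model probabilities under equal prior odds, an assumption of ours.

```python
import numpy as np

def schwarz_log_marglik(loglik_at_mle: float, n_params: int, n_obs: int) -> float:
    """Schwarz (1978) approximation: ln p(Data) ~ ln L(MLE) - (k/2) ln N."""
    return loglik_at_mle - 0.5 * n_params * np.log(n_obs)

# Log marginal densities reported in the paper for the two models:
lm_finite = -5998.3      # finite mixture
lm_continuous = -3928.3  # continuous (hierarchical Bayes) mixture

# Posterior model probabilities; subtract the max before exponentiating
# for numerical stability.
lms = np.array([lm_finite, lm_continuous])
probs = np.exp(lms - lms.max())
probs /= probs.sum()
print(probs)   # essentially all posterior mass on the continuous-mixture model
```

A log marginal-density gap of roughly 2070 corresponds to overwhelming odds, which is why the comparison in the text is so one-sided.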
When we compute the marginal densities of the data for the finite mixture and continuous mixture HB models, it is clear that the HB model fits the data much better and has a much higher posterior probability: ln p(Data | M = finite mixture) = −5998.3, ln p(Data | M = continuous mixture) = −3928.3.

Disaggregate estimates of brand preference for Hunt's and price sensitivity are displayed in Fig. 1 for a selected subset of households.

[Fig. 1. Comparison of household parameter posterior distributions: continuous vs. finite mixture models.]

The MCMC estimate of the posterior marginal density of the parameter from the hierarchical Bayes model is shown by the solid line. By conditioning on the MLEs of the finite mixture model parameters, we are able to compute an approximation to the posterior distribution of household parameters by applying Bayes theorem conditional on the parameter estimates (see also Kamakura and Russell, 1989). This discrete approximate posterior distribution is depicted by the bar charts superimposed over the density plots in Fig. 1.

What stands out in Fig. 1 is the large discrepancy between the marginal distributions obtained from the continuous mixture approach and those obtained using the finite mixture approach. The continuous mixture marginals show dramatically greater dispersion, in contrast to the finite mixture distributions which, in many cases, put virtually all posterior mass on one value. It is our view that these finite mixture approximations are unrealistically tight, conveying a misleading impression about the extent of sample information regarding household-level parameters. This discrepancy becomes problematic when considering marketing policies (e.g. temporary price reductions) which are primarily driven by specific parameters (e.g. the price coefficients). Fig. 2 compares disaggregate estimates of model parameters for the hierarchical Bayes and finite mixture models.
Plotted are the posterior means of model parameters for 200 randomly selected households in the panel. The plot illustrates that the finite mixture model is very restrictive in that posterior estimates are constrained to lie within the convex hull of the estimated mass points. The finite mixture model therefore does a poor job of describing tail behavior of the distribution of heterogeneity, which is often of considerable interest in marketing problems. This issue is discussed in more detail in Section 4 below.

Fig. 2. HB vs. finite mixture household estimates (posterior means).

3.2. Diagnostic checking of the prior

In our Bayesian approach, priors play a central role in producing household-level inferences. In the hierarchical model, the prior is specified in a two-stage process:

β_h ~ N(β̄, V_β),   p(β̄); p(V_β).    (22)

In the classical literature, the normal distribution of β_h (called the random effects model) would be considered part of the likelihood rather than part of the prior. As discussed below, the assumption of a normal distribution could be problematic here. The second stage of the prior is also of potential concern. At issue is the tightness of the second-stage priors, which determines the amount of shrinkage. We use extremely diffuse (barely proper) priors here. To check prior sensitivity, we considerably tightened the second-stage priors (doubling the hyperparameters) and found that the posterior distribution of the common parameters (displayed in Table 3) does not change at all (up to simulation error in the Gibbs draws). The huge number of households (1401) used here is undoubtedly the reason why large changes in the second-stage prior make no material difference. Thus, it is the first-stage prior which is important, and it will always remain important as long as there are only a few observations available per household.
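The two-stage structure of Eq. (22) can be illustrated with a one-dimensional sketch. Everything below is hypothetical and scalar: the paper's β_h is a six-dimensional coefficient vector with an inverse-Wishart prior on V_β, whereas this sketch puts conjugate normal and inverse-gamma hyperpriors on a scalar mean and variance and performs one Gibbs sweep over the hyperparameters given the household coefficients.

```python
import math
import random
import statistics

def gibbs_hyper_update(beta_h, v, mu0=0.0, tau2=100.0, a=3.0, b=3.0):
    """One Gibbs sweep for the hyperparameters of the scalar random-effects
    prior beta_h ~ N(beta_bar, v), under the diffuse conjugate hyperpriors
    beta_bar ~ N(mu0, tau2) and v ~ InverseGamma(a, b)."""
    n = len(beta_h)
    xbar = statistics.fmean(beta_h)
    # beta_bar | beta_h, v ~ N(m, s2): precision-weighted combination of the
    # data mean and the hyperprior mean.
    s2 = 1.0 / (n / v + 1.0 / tau2)
    m = s2 * (n * xbar / v + mu0 / tau2)
    beta_bar = random.gauss(m, math.sqrt(s2))
    # v | beta_h, beta_bar ~ InverseGamma(a + n/2, b + sum of squares / 2).
    shape = a + 0.5 * n
    rate = b + 0.5 * sum((x - beta_bar) ** 2 for x in beta_h)
    v = rate / random.gammavariate(shape, 1.0)  # inverse-gamma draw
    return beta_bar, v

# Usage sketch: recover the hyperparameters from simulated coefficients.
random.seed(0)
true_mean, true_var = -4.7, 5.0
beta_h = [random.gauss(true_mean, math.sqrt(true_var)) for _ in range(1401)]
v = 1.0
draws = []
for it in range(200):
    beta_bar, v = gibbs_hyper_update(beta_h, v)
    if it >= 100:
        draws.append(beta_bar)
print(statistics.fmean(draws))  # close to the true mean of -4.7
```

With 1401 simulated "households" the posterior for the first-stage mean is tight, mirroring the paper's finding that the second-stage prior is immaterial at this sample size.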
Since the parameters of the first-stage prior are inferred from the data, the main focus of concern should be on the form of this distribution. In the econometric literature, the use of parametric distributions of heterogeneity (e.g. normal distributions) is often criticized on the grounds that their mis-specification leads to inconsistent estimates of the common model parameters (cf. Heckman and Singer, 1984). For example, if the true distribution of household parameters were skewed or bimodal, our inferences based on the symmetric, unimodal normal prior could be misleading. One simple approach would be to plot the distribution of the posterior household means {E[β_h | data]} and compare this to the implied normal distribution evaluated at the Bayes estimates of the hyperparameters, N(E[β̄ | data], E[V_β | data]). The posterior means are not constrained to follow the normal distribution since the normal distribution is only part of the prior and the posterior is influenced by household data. This simple approach is in the right spirit but could be misleading since we do not properly account for uncertainty in the household parameter estimates.

Fig. 3 provides a diagnostic check of the assumption of normality in the first stage of the prior distribution which properly accounts for parameter uncertainty.

Fig. 3. Comparison of predictive and empirical marginal parameter distributions.

To handle uncertainty in our knowledge of the common parameters of the normal distribution, we compute the predictive distribution of β_h' for a household h' selected at random from the population of households characterized by the random effects distribution. Using our data and model, we can define the predictive distribution of β_h' as follows:

β_h' | data ~ ∫ φ(β_h' | β̄, V_β) p(β̄, V_β | data) dβ̄ dV_β.    (23)

Here φ(β_h' | β̄, V_β) is the normal prior distribution.
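The composition rule behind Eq. (23), and the informal predictive-versus-empirical comparison of Fig. 3, can be sketched in one dimension as follows. The hyperparameter draws below are fabricated for illustration (loosely centered on the ln Price entries of Table 3), the pooled household draws are replaced by a stand-in normal sample, and a two-sample Kolmogorov-Smirnov statistic substitutes for the visual comparison in the figure.

```python
import math
import random

def predictive_draws(hyper_draws, rng):
    """Eq. (23) by composition: for each retained Gibbs draw of the
    hyperparameters (beta_bar, v), draw beta_h' ~ N(beta_bar, v).
    Averaging over the hyperparameter draws integrates them out."""
    return [rng.gauss(bb, math.sqrt(v)) for bb, v in hyper_draws]

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    two empirical CDFs, an informal measure of the discrepancy between the
    pooled beta_h draws and the predictive draws."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    for v in sorted(set(a + b)):
        while i < na and a[i] <= v:
            i += 1
        while j < nb and b[j] <= v:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

# Usage sketch with hypothetical hyperparameter draws (mean -4.72, variance
# near 5.05) and a stand-in for the pooled household draws:
rng = random.Random(1)
hyper = [(rng.gauss(-4.72, 0.25), 5.05 + rng.gauss(0.0, 0.5)) for _ in range(2000)]
pred = predictive_draws(hyper, rng)
pooled = [rng.gauss(-4.72, math.sqrt(5.05)) for _ in range(2000)]
print(ks_statistic(pred, pooled))  # small when the normal model is adequate
```

A pronounced skew or second mode in the pooled draws would show up as a large KS statistic, the numerical analogue of a visible gap between the dashed and solid lines in Fig. 3.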
We can use our Gibbs draws of β̄ and V_β, coupled with draws from the normal prior, to construct an estimate of this distribution, which is plotted as a solid line in Fig. 3. As an informal diagnostic tool, we can pool together all of the Gibbs draws of β_h for all H households in our dataset and compare this distribution with draws from the predictive distribution given in Eq. (23). If there were evidence of large departures from normality in these data, we would expect to see discrepancies between the predictive distribution from the model and what we call the 'empirical' marginal distribution of the household parameter vector. This 'empirical' distribution is shown in Fig. 3 as a dashed line and shows no significant departure from the predictive distribution from the model. This lends support to our assumption of normality. Our overall conclusion from the comparison of household-level inferences using the finite mixture and continuous mixture approaches is that the finite mixture approximations give a misleading impression of the level of information available for inference about household parameters. In addition, the posterior probability calculations dramatically favor the continuous mixture approach. While we will never know for sure whether the actual distribution of household parameters follows a normal continuous mixture or a finite mixture, all available evidence seems to favor the normal continuous mixture approach.

4. Marketing decisions

In this section, we consider various decisions of interest to marketers and demonstrate the value of considering disaggregate estimates of model parameters. It is rarely the case that a marketing activity is uniformly directed at an entire market. Even mass media decisions, such as the design and execution of television advertising, involve the identification of a target audience and the selection of an appropriate method of message delivery (e.g.
a specific network and television show). This requires knowledge of the preferences and the viewing habits of those individuals with whom the firm wants to communicate. Product design decisions also require detailed knowledge of the distribution of consumer preferences. There are two general strategies available to firms wishing to maintain a profitable existence in competitive markets. The first is to be a low-cost producer and design products to have the greatest possible demand. Because of competitive pressures this strategy typically results in lower margins, but profits can be obtained through higher volume. A second strategy, of more interest to marketing, is to design products for a portion of the market that places high value on particular features. For example, a manufacturer of outboard marine engines may decide to improve the durability of its product, which is of particular interest to fishermen who come in frequent contact with submerged logs and weeds. Alternatively, the manufacturer might decide to improve the acceleration of its engines, which is of interest to water skiers. These product differentiation strategies are oriented toward a much smaller customer base that is willing to pay for additional performance. Classical random effects models, which only yield aggregate estimates of the distribution of demand (β̄, V_β), do not provide sufficient information for the identification and characterization of these key customers. At a more tactical level, there are many problems in marketing which require knowledge of the distribution of response and estimates of where particular 'units' fall within the distribution. For example, marketers are often concerned with the response of individual units (e.g. stores, sales regions, customers) to marketing expenditures.
Units which display the strongest response are often used to illustrate 'best practices' while those with the lowest response are candidates for 'corrective action.' Similarly, the efficient allocation of resources requires knowledge of the expected response of individual units of analysis and not just the distribution of responses. An important characteristic of these marketing decisions is that actions frequently take place at a very disaggregate level, with different units of analysis often aggressively targeted with different programs. Furthermore, the identification of optimal actions often requires knowledge of nonlinear functions of model parameters. Because of the severe data limitations at the disaggregate level, substantial uncertainty exists about the true value of model parameters. This causes great difficulty in identifying optimal decisions whenever nonlinear rules are encountered. To illustrate, consider the following stylized example, which involves identifying optimal temporary price reductions for households in the scanner panel dataset described above. Disaggregate price reductions are now very common in many grocery stores with the advent of point-of-purchase coupons issued at the checkout counter. The current triggering mechanism for issuing most of these coupons is related to the current purchases of the household. For example, a coupon for Hunt's ketchup is issued to buyers of Heinz. While current coupons are for a fixed amount (e.g. 50 cents), there is nothing to prevent the use of coupons of differing amounts. There is also no reason that the triggering mechanism should not be based on the entire purchase history of the household rather than just the current purchase. These enhancements are currently being developed by firms such as Catalina Marketing Inc.
and frequent shopper programs which issue coupons electronically (see Rossi et al., 1996 for a comprehensive treatment of target couponing strategies). Optimal price reductions can be identified by considering the profit function:

π = Pr(i)(p_i - c_i),    (24)

where Pr(i) is the probability of purchasing alternative i with price p_i and cost c_i. The profit function in Eq. (24) is a stylized one which is meant to reflect the profits of a manufacturer with a rather passive retailer. Retailers now frequently capture information on purchase histories. Specialized firms such as Catalina Marketing have products which issue customized incentives based on purchase history data; Catalina rents exclusive access to their system. We do not consider the effects of competing firms, each with its own customized price promotion problem. We leave this and other questions for further research. The optimal price (p_i*) to charge the household is the one for which expected profits are maximized:

∂E[π]/∂p_i = 0 at p_i = p_i*,    (25)

where the expectation is taken with respect to the posterior distribution of model parameters. Fig. 4 plots the optimal price reductions for Hunt's ketchup for the first 200 households in our panel. The price reductions were determined assuming a regular price of $1.50 and a manufacturer cost of $1.00. The plot reflects the wide dispersion in optimal price reductions, ranging from zero to $0.34 with a mean of $0.15. There is a large group of households for which the optimal price reduction is zero, with wide dispersion among the non-zero price reductions. Profits increase from $0.106 to $0.117 per household when these household-specific price reductions are used. This suggests that there are substantial profit opportunities for the use of customized pricing if the costs of acquiring and using household purchase history information are sufficiently low.
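Eqs. (24) and (25) can be illustrated with a small numerical sketch. The sketch substitutes a binary logit with a ln-price term for the paper's multinomial choice model, maximizes expected profit over a price grid rather than solving the first-order condition, and uses fabricated posterior draws for a single household; it also computes the optimum both by averaging over the draws and by plugging in posterior means.

```python
import math

COST = 1.00  # manufacturer cost, matching the paper's stylized example
GRID = [round(1.00 + 0.01 * k, 2) for k in range(51)]  # candidate prices up to $1.50

def purchase_prob(p, intercept, price_coef):
    """Binary-logit stand-in for Pr(i): utility = intercept + price_coef * ln(p)."""
    return 1.0 / (1.0 + math.exp(-(intercept + price_coef * math.log(p))))

def expected_profit(p, draws):
    """Eq. (24) averaged over posterior draws of one household's parameters."""
    return (p - COST) * sum(purchase_prob(p, a, b) for a, b in draws) / len(draws)

# Hypothetical posterior draws of (intercept, ln-price coefficient),
# reflecting substantial uncertainty about the household's price sensitivity:
draws = [(0.0, -2.0), (0.0, -4.7), (0.0, -9.0)]

# Full-posterior rule (Eq. (25)): maximize the draw-averaged expected profit.
p_full = max(GRID, key=lambda p: expected_profit(p, draws))

# 'Plug-in' rule: insert the posterior means into the profit function instead.
a_bar = sum(a for a, _ in draws) / len(draws)
b_bar = sum(b for _, b in draws) / len(draws)
p_plug = max(GRID, key=lambda p: (p - COST) * purchase_prob(p, a_bar, b_bar))

print(p_full, p_plug)  # the two rules generally disagree
```

Because expected profit is a nonlinear function of the parameters, averaging the profit function over the posterior and plugging in the posterior mean give different optimal prices, which is the point developed in the discussion of the plug-in approach.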
As discussed in Section 2, optimal decision making requires not only individual-level parameter estimates but also an acknowledgment of the substantial uncertainty in those estimates. In other words, we must use the entire posterior distribution of the household parameters, rather than just point estimates, to solve the marketing problem. To illustrate the problems with a 'plug-in' approach, in which parameter estimates are simply inserted into the profit function, we solved the optimal price reduction problem conditioning on the posterior means of each household's parameters. Conditioning on the posterior means introduces over-confidence into the targeting of price reductions. We become much more certain about the benefits of large price reductions, so we tend to give larger price reductions to many households.

Fig. 4. Distribution of optimal price reductions.

The average value of the price reductions in the decision problem which conditions on parameter estimates is $0.18 (contrast with $0.15). This results in higher profit numbers as well: $0.129 vs. $0.117. In our view, conditioning on the posterior means will overestimate the value of targeting households since it does not incorporate uncertainty properly.

5. Concluding remarks

A fundamental assumption of marketing is that people are different. People differ in the products they prefer, where they shop, how they communicate and in their sensitivity to variables such as price. This diversity provides firms with the economic incentive to offer different products to different groups of people, with the products promoted, priced and placed in the market in a manner appropriate for the targeted groups. The successful execution of these marketing activities requires a detailed understanding of the distribution of consumer heterogeneity and the identification of preferences at the customer level.
For example, a firm wishing to increase sales may consider mailing coupons to price-sensitive individuals who are not currently customers. Of critical interest are the preferences and sensitivities of these individuals who, by definition, are not well represented by summary statistics which describe the distribution of heterogeneity for an entire market. In the extreme, the determination of an optimal coupon value which is unique to each individual requires knowledge of model parameters for each individual. The advantage of hierarchical Bayes models of heterogeneity is that they yield disaggregate estimates of model parameters. These estimates are of particular interest to marketers pursuing product differentiation strategies in which products are designed and offered to specific groups of individuals with specific needs. In contrast, classical approaches to modeling heterogeneity yield only aggregate summaries of heterogeneity and do not provide actionable information about specific groups. The classical approach is therefore of limited value to marketers. The disaggregate nature of many marketing decisions, coupled with the assumption of individual differences and the limited data available about any specific individual, has created the need for models of consumer heterogeneity which successfully pool data across individuals while allowing for the (posterior) analysis of individual model parameters. The development of Markov chain methods of inference has made such an analysis possible by exploiting the hierarchical structure typically present in models which incorporate distributional assumptions of heterogeneity. Individual estimates are available as a byproduct of this estimation procedure and provide a rich source of information for a wide variety of marketing decisions.

References

Allenby, G., Arora, N., Ginter, J., 1998. On the heterogeneity of demand.
Journal of Marketing Research, forthcoming.
Allenby, G., Ginter, J.L., 1995a. The effects of in-store displays and feature advertising on consideration sets. International Journal of Research in Marketing 12, 76–80.
Allenby, G., Ginter, J., 1995b. Using extremes to design products and segment markets. Journal of Marketing Research 32, 392–403.
Allenby, G., Lenk, P., 1994. Modeling household purchase behavior with logistic normal regression. JASA 89, 1218–1231.
Chamberlain, G., 1984. Panel data. In: Griliches, Intriligator (Eds.), Handbook of Econometrics. North-Holland, Amsterdam.
Chintagunta, P., Jain, D.C., Vilcassim, N.J., 1991. Investigating heterogeneity in brand preferences in logit models for panel data. Journal of Marketing Research 28, 417–428.
Gelfand, A. et al., 1990. Illustration of Bayesian inference in normal data models using Gibbs sampling. JASA 85, 972–985.
Gelfand, A., Smith, A., 1990. Sampling-based approaches to calculating marginal densities. JASA 85, 398–409.
Gelman, A., Rubin, D., 1992. A single series from the Gibbs sampler provides a false sense of security. In: Bernardo, J.M. et al. (Eds.), Bayesian Statistics 4. Oxford University Press, Oxford, pp. 625–632.
Geweke, J., Keane, M., Runkle, D., 1996. Statistical inference in the multinomial multiperiod probit model. Journal of Econometrics, forthcoming.
Gonul, F., Srinivasan, K., 1993. Modeling multiple sources of heterogeneity in multinomial logits: methodological and managerial issues. Marketing Science 12, 213–229.
Hajivassiliou, V., Ruud, P., 1994. Classical estimation methods for LDV models using simulation. In: Engle, McFadden (Eds.), Handbook of Econometrics. North-Holland, Amsterdam, pp. 2384–2438.
Hausman, J., Ruud, P., 1987. Specifying and testing econometric models for rank-ordered data. Journal of Econometrics 34, 83–104.
Hausman, J., Wise, D.A., 1978. A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences.
Econometrica 46, 393–408.
Heckman, J., 1982. Statistical models for analysis of discrete panel data. In: Manski, McFadden (Eds.), Structural Analysis of Discrete Data. MIT Press, Cambridge, pp. 114–178.
Heckman, J., Singer, B., 1984. A method for minimizing the impact of distributional assumptions in econometric models for duration data. Econometrica 52, 271–320.
Kamakura, W., Russell, G., 1989. A probabilistic choice model for market segmentation and elasticity structure. Journal of Marketing Research 26, 379–390.
Lenk, P., DeSarbo, W., Green, P., Young, M., 1996. Hierarchical Bayes conjoint analysis: recovery of part worth heterogeneity from reduced experimental designs. Marketing Science 15, 173–191.
Mariano, R., Weeks, M., Schuermann, T. (Eds.), 1997. Simulation-Based Inference in Econometrics. Cambridge University Press, Cambridge, forthcoming.
McCulloch, R., Rossi, P., 1994. An exact likelihood analysis of the multinomial probit model. Journal of Econometrics 64, 207–240.
McCulloch, R., Rossi, P., 1996. Bayesian analysis of multinomial probit model. In: Mariano, Weeks, Schuermann (Eds.), Simulation-Based Inference in Econometrics. Cambridge University Press, Cambridge.
McCulloch, R., Rossi, P., Allenby, G., 1995. Hierarchical modeling of consumer heterogeneity: an application to target marketing. In: Kass, Singpurwalla (Eds.), Case Studies in Bayesian Statistics. Springer, New York, pp. 323–350.
Newton, M., Raftery, A., 1994. Approximate Bayesian inference with the weighted likelihood bootstrap. JRSS B 56, 3–48.
Rossi, P., Allenby, G., 1993. A Bayesian approach to estimating household parameters. Journal of Marketing Research 30, 171–182.
Rossi, P., McCulloch, R., Allenby, G., 1996. On the value of household purchase history information in target marketing. Marketing Science 15, 321–340.
Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
Tierney, L., 1991. Markov chains for exploring posterior distributions. Technical Report no. 560, School of Statistics, University of Minnesota.
Titterington, D., Smith, A.F.M., Makov, U.E., 1985. Statistical Analysis of Finite Mixture Distributions. Wiley, New York.