
Industrial Marketing Management 42 (2013) 544–551

Contents lists available at SciVerse ScienceDirect

Industrial Marketing Management

Model-supported business-to-business prospect prediction based on an iterative customer acquisition framework
Jeroen D’Haen¹, Dirk Van den Poel ⁎
Ghent University, Faculty of Economics and Business Administration, Tweekerkenstraat 2, B-9000 Gent, Belgium

Article info

Article history:
Received 12 November 2011
Received in revised form 7 January 2013
Accepted 10 January 2013
Available online 29 March 2013

Keywords:
Customer acquisition
Sales funnel
Prospect
Nearest neighbor
Iterative framework

Abstract

This article discusses a model designed to help sales representatives acquire customers in a business-to-business environment. Sales representatives are often overwhelmed by available information, so they use arbitrary rules to select leads to pursue. The proposed model generates, in three phases, a high-quality list of prospects that are easier to convert into leads and, ultimately, customers. Phase 1 runs when there is only information on the current customer base and uses the nearest neighbor method to obtain predictions. As soon as there is information on companies that did not become customers, phase 2 initiates, triggering a feedback loop to optimize and stabilize the model. This phase uses logistic regression, decision trees, and neural networks. Phase 3 combines phases 1 and 2 into a weighted list of prospects. Preliminary tests indicate that the model performs well. The study makes two theoretical contributions: first, the authors offer a standardized version of the customer acquisition framework; second, they point out the iterative aspects of this process.
© 2013 Elsevier Inc. All rights reserved.

1. Introduction

The phrase customer relationship management (CRM) is often used in contemporary marketing literature. Although it has been in use since the beginning of the 1990s, researchers have reached no consensus with regard to its definition (Buttle, 2009a; Ngai, 2005; Richards & Jones, 2008). Most definitions do, however, share some core features; for example, CRM consistently deals with the acquisition and retention of customers and the maximization of long-term customer value (Jackson, 2005; Ngai, Xiu, & Chau, 2009). Prior literature also distinguishes four types of CRM: strategic, operational, analytical, and collaborative (Buttle, 2009a). This paper focuses on analytical CRM, which involves mining customer-related data for strategic purposes (Ang & Buttle, 2006; Buttle, 2009a; Ngai et al., 2009). More specifically, it centers on the process of acquiring new customers and on how data mining techniques can facilitate this process.

Most CRM literature neglects customer acquisition in favor of other topics, such as retention (Sohnchen & Albers, 2010), because retention strategies are typically cheaper than acquisition strategies (Blattberg, Kim, Kim, & Neslin, 2008a; Wilson, 2006). However important customer retention might have become, customer acquisition is and should be a crucial focus for companies and researchers for several reasons (Ang & Buttle, 2006; Buttle, 2009b; Kamakura et al., 2005). Startups and companies aiming to exploit new markets need new customers, because they lack existing customers. Even existing companies in a mature market will lose some customers and must replace them (Wilson, 2006). Acquiring new customers is a multistage process, also referred to as the “sales funnel,” in which only certain suspects (for a definition of the terms used herein, see Section 2) become actual customers (Cooper & Budd, 2007; Patterson, 2007; Yu & Cai, 2007). During this process, it is often difficult for sales representatives to cope with all available data (Yu & Cai, 2007). Monat (2011, p. 192) indicates that many companies face this issue:

“Sales leads are the lifeblood of industrial companies, yet determining which leads are likely to convert to bookings is often based upon guesswork or intuition. This results in a waste of resources, inaccurate sales forecasts, and potential loss of sales. A quantitative model that may be used to predict which leads will convert, based on information inherent in the leads themselves, would be highly valuable.”

In response, this article presents a quantitative model designed to assist sales representatives in customer acquisition, that is, a sales force automation tool. Moreover, it is designed to be implemented in a web application, which gives it certain specific characteristics and advantages. First, it should be usable regardless of specific company characteristics such as size and industry. Whether for a large company in the automotive sector or a small company in the food sector, the model should render high-quality predictions. Second, it must be fully automated and run without the need for human interference.

⁎ Corresponding author. Tel.: +32 9 264 89 80; fax: +32 9 264 42 79.
E-mail addresses: Jeroen.DHaen@UGent.be (J. D’Haen), dirk.vandenpoel@UGent.be (D. Van den Poel).
URL: http://www.crm.UGent.be (D. Van den Poel).
¹ Tel.: +32 9 264 98 30.

0019-8501/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.indmarman.2013.03.006
J. D’Haen, D. Van den Poel / Industrial Marketing Management 42 (2013) 544–551 545

Third, it must be fast and inexpensive. Because it is a web application, users typically expect results immediately.² When the algorithm is embedded into a web application, the cost to the user is limited. The user (i.e., a business-to-business [B2B] company) only needs to pay a membership fee to obtain access to the application and does not need to pay for the whole database of prospects, which can be expensive. Moreover, the company does not need in-house experts to analyze the data, as the algorithm performs this step and provides intuitive, ready-to-use output.

Sales representatives must sometimes make arbitrary decisions in selecting prospects from a list of suspects and further qualifying them into leads. Thus, time is lost pursuing bad prospects and leads, violating the famous “time is money” corporate mantra. A model with high predictive power in forecasting the right prospects to pursue can save a company time and, ipso facto, money. Research indicates that approximately 20% of a sales representative’s time is spent selecting prospects (Trailer, 2006) and depicts prospecting as the most cumbersome part of the selling process (Moncrief & Marshall, 2005). Furthermore, making ineffective decisions in the customer acquisition process decreases the overall value of the company over time (Hansotia & Wang, 1997). The proposed algorithm is designed to make the decision-making process less arbitrary by providing model-based prospects.

Although the algorithm should work regardless of the company using it or the industry in which it is situated, the proposed sales force automation tool will work best in highly saturated markets, in which market penetration is strategically crucial. We expect the highest efficiency in markets in which the pool of potential customers is large. In those markets, the selection process is often costly and arbitrary, due to information overload. In contrast, in industries in which customers are large, well-known organizations that are few in number, the proposed algorithm will not provide a significant advantage, because the selection of prospects is limited (Long, Tellefsen, & Lichtenthal, 2007). The algorithm functions in a B2B environment and uses the current customer base of a company to predict prospects. It also contains a feedback loop that iteratively improves its overall predictive performance.

There is a limited amount of research on customer acquisition (Blattberg et al., 2008a). With this research, we aim to fill this void and also stimulate further research. The theoretical contributions are twofold. First, we offer a standardized version of the customer acquisition framework. Second, we point out the iterative aspect of this process, which has been neglected in research. The remainder of this article is structured as follows: We present a literature review on customer acquisition, then describe the different stages of our model. After we elaborate on the data, we report the results of the model and finally discuss the conclusions, implications, limitations, and further research suggestions.

2. Customer acquisition framework

The sales funnel conceptualization offers a way to describe the customer acquisition process, dividing it into different stages (Ang & Buttle, 2006; Coe, 2004a; Patterson, 2007; Yu & Cai, 2007). These divisions vary from study to study, as do the definitions they use to characterize each part. A main difference, however, is where the studies place a prospect and a lead in the sales process: some put the prospect before the lead (e.g., Coe, 2004a; Metzger, 2005), whereas others put the lead before the prospect (e.g., Gillin & Schwartzman, 2011; Patterson, 2007). For the sake of clarity and as a way of creating a standardized framework, we first describe our view of the sales funnel and define each stage. The emphasis is not on where the different terms are placed but on their definitions.

² We ran the algorithm discussed herein on a 3.40 GHz Windows server containing 16 GB of RAM.

The darker portion of Fig. 1 illustrates the sales funnel. The beginning is a list of suspects. Suspects are all available potential new customers. In theory, they could include every other company in a B2B context, apart from the current customer base. In practice, they boil down to a limited list of companies (perhaps purchased from specialized vendors; Buttle, 2009b; Rygielski, Wang, & Yen, 2002; Wilson, 2006). The vast amount of information in those lists tends to overwhelm B2B marketers (Wilson, 2003). As a result, marketers often make selections using a set of arbitrary rules. The outcome of this selection is the list of prospects. Prospects are suspects who meet certain predefined characteristics. The next step is to qualify these prospects. Leads are prospects that will be contacted, after they have been qualified as the most likely to respond. This qualification is often driven by gut feeling or self-claimed competence. Finally, leads who become clients of the company are customers.

However, current theories and models fail to acknowledge the iterative nature of these stages, which implies that none of the stages is static. Yet the dynamics of this process influence the process itself. First, if customer acquisition is successful, the customer base is altered as new customers are added to it. As a result, these new customers are excluded from the next iteration of the sales funnel. Second, knowledge from a previous iteration should be incorporated in subsequent iterations. The successes and failures in each stage fine-tune the overall process. Here, we focus on the interplay between prospects and leads. The model adapts on the basis of the conversion from prospect to lead, learning from the new information generated in each iteration. Incorporating the iterative aspect will improve the quality of customer acquisition models.

The procedure we propose radically alters the shape of the sales funnel (the lighter portion of Fig. 1), forming an isosceles trapezoid. More prospects are selected, but they are of higher quality. As a result, a greater proportion will be converted into leads and ultimately customers. Furthermore, the algorithm integrates a feedback loop that, over time, further elevates the quality of the prospects. Note that Fig. 1 is an exaggerated representation; reality should lie somewhere between the two graphs, because sales representatives will most likely select a smaller proportion of leads due to time constraints. It is nearly impossible for companies to increase their number of sales calls, assuming sales representatives work close to capacity (Coe, 2004b). The only alternative is to improve the quality of these calls, which is what the proposed algorithm aims to do. It provides high-quality prospects that are easier to convert, as recommended by research showing that call productivity can be improved by the use of information technology tools (Ahearne, Hughes, & Schillewaert, 2007; Eggert & Serdaroglu, 2011).

Fig. 1. The original and transformed sales funnel.

Traditionally, the conversion rate from prospects to qualified leads is approximately 10% on average (Coe, 2004b). Thus, getting a good list of prospects saves time that can then be spent qualifying them. Moreover, better qualified leads should lead to a higher customer conversion rate. Usually, a conversion rate from prospects to customers of 1%–5% on average can be expected (Coe, 2004b). Research shows that a lower conversion rate increases the cost of customer
acquisition (Blattberg et al., 2008a). Thus, raising the conversion rate will also lower the cost of customer acquisition.

3. Proposed model

The model contains three phases that must be executed chronologically. Phase 1 runs when there are data only on the current customers. The model must uncover hidden structures in the data without the presence of feedback data (i.e., a dependent variable). Therefore, unsupervised learning is necessary. The input of phase 1 is data on a list of suspects and the current customers of a company. The output is a list of ranked prospects. As soon as there are data on which prospects were or were not qualified as leads, phase 2 initiates. The model uses these feedback data and supervised learning methods, such as logistic regression, decision trees, and neural networks. Phase 3 combines phases 1 and 2 into a weighted list of prospects. The output of phase 3 generates more feedback data, which in turn are fed into phase 2, initializing a feedback loop. That is:

Every model uses an estimation and a validation sample to prevent overfitting and to calculate the area under the receiver operating characteristic curve, also known as the AUC (Blattberg, Kim, Kim, & Neslin, 2008c,d). The AUC is a common metric to evaluate the accuracy of a model (Ballings & Van den Poel, 2012, 2013; Chen, Hsu, & Hsu, 2011). It can vary from 0.5 (random model) to 1 (perfect model) (Baecke & Van den Poel, 2011; Blattberg et al., 2008d). The data set is randomly distributed over the estimation and validation samples, with a ratio of two-thirds and one-third, respectively, as Blattberg et al. (2008d) suggest. The estimation sample is used to compute the models, whereas the validation sample tests the predictive performance of these models.

3.1. Phase 1

The key problem of customer acquisition is that, in the beginning, the current customer base in combination with a suspect list represents the only input, so no supervised learning can be applied. A solution is to build a profiling model, also known as a look-alike model (Blattberg et al., 2008a; Jackson, 2005; Setnes & Kaymak, 2001; Wilson, 2006). To acquire new customers, sales representatives must know in detail who their own customers are (Ngai et al., 2009). Profiles are created according to the current customer base, and these profiles are subsequently used to predict prospects (Bose & Chen, 2009; Chou, 2000). This method is a type of clustering in which the center of each cluster is a current customer and prospects identical to that customer are put in the same cluster. The cluster continues to expand with less similar prospects, with a measure of (dis)similarity assigned to these prospects. This procedure creates concentric circles, and in each circle, we find prospects that have the same similarity to the center (a specific current customer). The more distant a circle is, the more dissimilar the prospects on that circle are. Prospects in the same cluster or circle share comparable preferences and behaviors (Bruckhaus, 2010). As a result, we assume that prospects that are similar to the current customer base are more likely to become future clients of the company than less similar prospects, because they share the same company preferences. Kim, Street, Russel, and Menczer (2005) use this profiling method in a business-to-consumer (B2C) environment. They build a model on their current customers and use that model on potential prospects to rank them from most to least likely to respond. Here, we apply it in a B2B environment.

A profile is composed of a combination of variables (Hansotia & Wang, 1997). The profiles of the prospects are compared with those of the current customers. The technique used here to search for similar profiles is the nearest neighbor algorithm. This method is conceptually simple; it involves calculating the distance between observations using a set of variables. The more similar the cases are, the lower the distance. The advantage of this algorithm is that it is powerful yet easy to understand (Weinberger & Saul, 2009). Fig. 2 is a simplified presentation of the nearest neighbor algorithm. In the two-dimensional space shown (representing a profile of two variables), company C is closer to company A than company B is, which means that company C is more similar to company A than company B is. Reality is more complex, though, in that the space is multidimensional rather than two-dimensional. The method we apply here is a k-nearest neighbor algorithm, meaning that for each current customer, it ranks the k nearest prospects. We set k arbitrarily to 10,000. The exact size of k matters little, as long as it is set high enough: the larger k is, the longer the list of output prospects. However, this list can be reduced, for example by selecting only prospects with a similarity higher than a predefined threshold. A different and recommended strategy is to rank the list first on similarity and then on a different variable of interest (e.g., company size). More ranking variables can be added to further refine this ranking. Next, the top n prospects of the list are selected, with n being the maximum number of prospects that the sales representatives are able to handle. Thus, we advise practitioners to set k > n and refine the ranking by adding variables that are relevant to the company of interest.

Fig. 2. Nearest neighbor (companies A, B, and C in a two-variable space).

The most important element of a nearest neighbor analysis is the distance metric, which calculates how similar companies are; it is therefore crucial for the quality of the model. Distance metrics are data type specific: there is no easy way to combine categorical and numeric data types in one nearest neighbor analysis. Because most of the variables are categorical, numeric ones are converted into categories (for more information, see Section 4). The Jaccard and Hamming distance measures are two possible distance metrics for categorical data (Ichino & Yaguchi, 1994). The Jaccard similarity coefficient is obtained by dividing the number of attributes that both companies share (the intersection) by the number of attributes that at least one of them has (the union) (Charikar, 2002). The formula is as follows (where A and B signify companies):

$$S_{\mathrm{Jaccard}}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The Hamming metric is similar (Steane, 1996). However, the Jaccard metric ignores variables that have a zero for both companies, whereas the Hamming metric does not (Zytynska, Fay, Penney, & Preziosi, 2011). Because in the data used a zero usually stands for a missing value, this comparison should be ignored; thus, we prefer the Jaccard metric. Although several distance metrics exist for numeric variables, such as the Euclidean and Mahalanobis distances, there is no generally accepted preference. Aggarwal, Hinneburg, and Keim (2001) suggest that fractional distance metrics work better than others when dimensionality is high. The output of phase 1 is a list of prospects with their respective similarities (ranging from 0 to 1, with 1 being completely the same with regard to the set of variables and 0 being completely different).

3.2. Phase 2

As mentioned previously, phase 2 can only be implemented after phase 1 has rendered positive and negative feedback (see the feedback loop in Fig. 1), which is used as a dependent variable. Thus, the model in phase 2 uses the prospect list of phase 1, including the feedback data on those prospects, and the reference database (for more information, see Section 4). By adding this second phase to the algorithm, we incorporate an iterative customer acquisition process. Each time its output has been evaluated, the feedback is inserted into the algorithm, which re-estimates the model. This process gradually optimizes and stabilizes the model.

A basic model to predict customer acquisition is (logistic) regression (Bose & Chen, 2009; Gupta et al., 2006; Hansotia & Wang, 1997), the formula for which is as follows:

$$F(z) = \frac{1}{1 + e^{-z}}, \quad \text{where } z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

(Blattberg et al., 2008a; Hansotia & Wang, 1997; Pampel, 2000; Van den Poel & Buckinx, 2005). Because there is a danger of overfitting the model when using all possible independent variables, we apply stepwise selection (i.e., the combination of forward and backward selection) (Blattberg et al., 2008d; Kim et al., 2005). We also include variable transformations (e.g., taking the square of variables) to take nonlinearity and skewed distributions into account. A problem with logistic regression is that it cannot use categorical variables directly, only continuous ones (Pampel, 2000). We solve this problem using dummy variables. However, the large number of categorical variables could lead to an overload of dummies, which is a computational burden (Bose & Chen, 2009); moreover, no a priori knowledge is available about which categorical variables are likely crucial to include in the model. Thus, the logistic model only incorporates continuous variables.

Therefore, we also estimate a model using the categorical variables, including categorized versions of the continuous variables. We created the categories using equal frequency binning: the different categories of a variable have the same size and are based on the ranking of the values of this variable. This is the preferred technique for discretizing commercial data, which are often unbalanced or contain outliers (Cantu-Paz, 2001). We apply a decision tree to estimate this model, an efficient method for handling categorical input variables (Bose & Chen, 2009). A decision tree divides a data set into subsets, using the values of the independent variables as selection criteria to predict the dependent variable (Blattberg, Kim, Kim, & Neslin, 2008b). The aim is to obtain subsets that are internally homogeneous and heterogeneous to each other, while minimizing the cost of this division (Danielson & Ekenberg, 2007). The top of a decision tree is called the root node (Berk, 2008). This root node contains the full data set. The outcome of a decision at each node is called a split (Duda, Hart, & Stork, 2001). Splits after the root node are termed branches, and the final splits are the terminal nodes. All splits after the initial split imply interaction effects, unless they use the same predictor (Berk, 2008). We use pruning to find the right size of the tree and so avoid the omnipresent problem of overfitting: the bigger a tree is, the fewer cases there are in the terminal nodes and the greater the chance of an overfitted tree. Pruning a tree begins at the terminal nodes and works up to the top (Berk, 2008). It eliminates nodes that do not reduce heterogeneity enough compared with the complexity they add to the tree, in line with Occam’s razor, which prescribes that researchers should prefer the simplest model that explains the data (Baesens, Mues, Martens, & Vanthienen, 2009; Duda et al., 2001). The decision tree and its pruning method are based on Breiman, Friedman, Olshen, and Stone (1984). We use a majority voting scheme to classify cases in the decision tree and calculate the probabilities by taking the percentage of ones in each terminal node. Fig. 3 presents a simple tree.

Fig. 3. Decision tree (splits on selection criteria 1 and 2, ending in subsets 1, 2, and 3).

We calculate the AUC to determine whether to use the logistic model or the decision tree and choose the model with the highest AUC. We include both logistic regression and a decision tree because there is no a priori hypothesis about which model works best; moreover, the answer might be company or industry specific. (Recall the stipulation that the algorithm must run fully automatically without human interference.)

We incorporate a backup model, a neural network, that runs if both the logistic model and the decision tree fail to produce a model that predicts better than a random ranking of prospects (i.e., a model with an AUC of 0.5). We use it only as a backup because it is relatively slow and unstable (Rygielski et al., 2002), and the algorithm must provide fast and reliable results. A neural network is a nonlinear nonparametric regression model that mimics the structure and function of the brain (Ha, Cho, & Mela, 2005). It is a black box method, in that it provides no information on the estimated model: the input generates a certain output, and the way this output is generated remains hidden from the user. The main advantage of neural networks is that they are capable of estimating very complex relationships.

A neural network usually contains an input layer, a hidden layer, and an output layer (Fig. 4). The input layer corresponds to the independent variables, and the output layer is the dependent variable. The hidden layer represents the nonlinearity of the model. Multiple hidden layers can be introduced, but one hidden layer is deemed enough to obtain quality estimations (Ha et al., 2005). The neural network is implemented in Matlab and is a feed-forward network. For the input, hidden, and output layers, the purelin, tansig, and purelin transfer functions are applied, respectively. The hidden layer size is varied from 1 to 10 neurons, and the size rendering the highest AUC is selected.
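As a rough illustration of phase 1, the sketch below does the following in plain Python:

- It discretizes a numeric variable by equal frequency binning.
- It compares categorical profiles with the Jaccard coefficient. Each (variable, category) pair becomes one binary dummy, and a missing value produces no dummy, so joint zeros are ignored.
- It keeps the k nearest prospects per current customer.
- It ranks the pooled list first on similarity and then on a secondary variable, and keeps the top n.

It is a minimal sketch, not the authors' code. The following are all assumptions made for illustration:

- the dictionary profile format and the None-for-missing convention;
- every function name;
- the rule that a prospect close to several customers keeps its highest similarity.

The Hamming-style similarity is included only to show how counting joint zeros changes the result.

```python
"""Sketch of phase 1 (look-alike ranking); names and data format are assumptions."""
from typing import Dict, List, Optional, Sequence, Set, Tuple

Profile = Dict[str, Optional[str]]  # variable name -> category; None marks a missing value


def equal_frequency_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Bin index (0..n_bins-1) per value, based on its rank, so bins have (roughly) equal size.

    Ties may be split across neighbouring bins in this simple version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def to_dummies(profile: Profile) -> Set[Tuple[str, str]]:
    """One binary dummy per (variable, category); a missing value yields no dummy."""
    return {(var, val) for var, val in profile.items() if val is not None}


def jaccard(a: Profile, b: Profile) -> float:
    """|A ∩ B| / |A ∪ B| over the dummies; dummies that are 0 for both are ignored."""
    da, db = to_dummies(a), to_dummies(b)
    union = da | db
    return len(da & db) / len(union) if union else 0.0


def hamming_similarity(a: Profile, b: Profile, universe: Set[Tuple[str, str]]) -> float:
    """Share of all possible dummies on which a and b agree (joint zeros count as agreement)."""
    return 1 - len(to_dummies(a) ^ to_dummies(b)) / len(universe)


def phase1_ranking(customers: List[Profile], prospects: Dict[str, Profile], k: int, n: int,
                   secondary: Dict[str, float], threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Top-n prospects ranked on similarity, then on a secondary variable (e.g. sales volume)."""
    best: Dict[str, float] = {}
    for customer in customers:
        # The k nearest prospects of this customer.
        nearest = sorted(((pid, jaccard(customer, p)) for pid, p in prospects.items()),
                         key=lambda item: item[1], reverse=True)[:k]
        for pid, sim in nearest:
            # Assumption: a prospect near several customers keeps its highest similarity.
            if sim >= threshold and sim > best.get(pid, -1.0):
                best[pid] = sim
    ranked = sorted(best, key=lambda pid: (best[pid], secondary.get(pid, 0.0)), reverse=True)
    return [(pid, best[pid]) for pid in ranked[:n]]


# Hypothetical toy data.
customers = [{"industry": "telecom", "size": "large", "state": "CA"}]
prospects = {
    "p1": {"industry": "telecom", "size": "large", "state": "NY"},
    "p2": {"industry": "food", "size": "small", "state": None},
    "p3": {"industry": "telecom", "size": "large", "state": "CA"},
}
print(phase1_ranking(customers, prospects, k=10, n=2, secondary={"p1": 5.0, "p3": 1.0}))
```

Here prospect p1 shares two of the four dummies present for either company, hence a similarity of 0.5. The paper sets k to 10,000; the toy data only need a small k.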
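The phase 2 model-selection logic can be sketched as follows:

- Split the data randomly into a two-thirds estimation sample and a one-third validation sample.
- Compute the AUC on the validation sample.
- Keep the better of the logistic model and the decision tree.
- Fall back to a neural network only when neither beats a random ranking.

Fitting the stepwise logistic regression, the pruned tree, and the network is out of scope here. The sketch assumes each model has already produced validation scores. The function names, the seed, and the toy scores are hypothetical. The AUC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counting one half.

```python
"""Sketch of phase 2 model selection by validation AUC; names and data are hypothetical."""
import math
import random
from typing import Callable, Dict, Sequence, Tuple


def split_estimation_validation(rows: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Randomly assign two-thirds of the rows to estimation and one-third to validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def logistic(beta0: float, betas: Sequence[float], x: Sequence[float]) -> float:
    """F(z) = 1 / (1 + exp(-z)) with z = beta0 + sum(beta_i * x_i), written to avoid overflow."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive (lead) outranks a random negative; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def select_phase2_model(labels: Sequence[int],
                        candidate_scores: Dict[str, Sequence[float]],
                        backup: Callable[[], Sequence[float]]) -> Tuple[str, float]:
    """Keep the candidate with the highest validation AUC; call the (slow) backup model
    only if no candidate beats a random ranking (AUC <= 0.5)."""
    results = {name: auc(labels, scores) for name, scores in candidate_scores.items()}
    best = max(results, key=results.get)
    if results[best] > 0.5:
        return best, results[best]
    return "backup", auc(labels, backup())


# Hypothetical validation-sample feedback (1 = qualified as lead) and model scores.
labels = [1, 0, 1, 0, 1, 0]
candidates = {
    "logistic": [0.6, 0.4, 0.7, 0.6, 0.5, 0.5],
    "decision_tree": [0.9, 0.2, 0.8, 0.3, 0.4, 0.5],
}
print(select_phase2_model(labels, candidates, backup=lambda: [0.5] * 6))
```

With these toy scores the decision tree wins, with an AUC of 8/9 versus 7/9 for the logistic model. Passing the backup as a callable means the slow model is only trained when the rule actually calls for it.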
Fig. 4. Neural network (input, hidden, and output layers).

The output of phase 2 is the list of prospects of phase 1 with their respective predicted probabilities.

3.3. Phase 3

Prior literature indicates that predictability can be improved by weighting the predictions from different models (Gupta et al., 2006). Combining models can partially eliminate the bias inherent in each model (Bose & Chen, 2009). The AUC calculated in phase 2 (i.e., the AUC of the best model) is used to assign the weights of phases 1 and 2. We apply the following linear function to calculate the weight of phase 2:

$$\omega_{\text{Phase 2}} = (\mathrm{AUC} - 0.5) \times 2$$

The weight of phase 1 then follows naturally:

$$\omega_{\text{Phase 1}} = 1 - \omega_{\text{Phase 2}}$$

Table 1 portrays some AUCs between 0.5 and 1 and their respective weights for phases 1 and 2. The function used to calculate the weights is conservative in the sense that it requires a relatively high AUC (above 0.75) before phase 2 weighs more than phase 1. The output of phase 3 is the prospect list generated in phase 1 with the weighted similarity.

Table 1
AUC and weights.
AUC phase 2       0.5   0.6   0.7   0.8   0.9   1
Weight phase 2    0     0.2   0.4   0.6   0.8   1
Weight phase 1    1     0.8   0.6   0.4   0.2   0

In summary, phase 1 generates a list of prospects with their similarity; some prospects will be qualified as leads, while others will not; this feedback is entered into phase 2, and the algorithm calculates a new similarity (probability); phase 3 defines the weights of phases 1 and 2 and produces a final prospect list (see Fig. 5).

Fig. 5. Overview of the algorithm (phase 1, phase 2, phase 3, and prospect list).

Phase 3 combines the similarities of phase 1 and the probabilities of phase 2. Even though they are not the same measure, they represent the same idea: the higher a prospect is ranked in the list, the more likely this prospect is to become a customer. Because both measures capture the same content, combining them is justified. Furthermore, they both range between 0 and 1, making a combination simple and straightforward.

4. Data

We leased a database of more than 16 million U.S. companies from an international data provider (hereinafter referred to as the reference database). It represents the list of suspects, after excluding current customers. It contains a selection of 4 numerical and 24 categorical variables (see Appendix A, Table A.1). Moreover, we created four additional variables, representing the discretized versions of the numeric variables. Some literature exists on which variables are relevant in profiling models. Industrial demographic data are often used to prospect new potential customers (Bounsaythip & Rinta-Runsala, 2001). Two basic demographic variables of companies are industry type and company size (Coe, 2004c). However, to our knowledge, no research addresses the full range of relevant industrial demographic variables, and it is likely that these variables will be industry or even company specific. Therefore, we included as many variables as possible, because the algorithm must perform well regardless of the company using it. We used three criteria to exclude variables:

1. Redundant variables: Redundant variables are highly correlated with other variables. Including them in a nearest neighbor analysis would artificially assign them more weight. Because we make no hypotheses about variable importance, this would be detrimental to the quality of the analysis.
2. Name-based variables: Name-based variables are mainly general company variables that have no predictive power, such as the chief executive officer name and company name. For example, the fact that a company is called Apple has no predictive power as such. The connotation and familiarity of the name might influence customer acquisition, but the specific letters do not. If we were to run a nearest neighbor algorithm with Apple as a current customer, Applebee’s would be a relatively good match based on the name variable. It is, however, unlikely that Applebee’s will be evaluated as a good prospect, due to the large difference between the two companies.
3. Variables containing a high percentage of missing values: We excluded variables with more than 50% missing values. For the retained variables, we did not impute missing values, because imputation might introduce bias into the data (Han & Kamber, 2006).

The B2B company that serves as a test case for the algorithm is active in telecommunication services and was founded in 1997. It is based in the United States and is one of the leaders in its market. The platform the firm developed handles more than 1 billion calls a year and has deployed more than 750 tailored solutions for customers. The telecom company has 389 active current customers, of whom we selected 107 as input for the algorithm. We deleted companies that had a large amount of missing data or that could not be matched in the reference database. The matching with the reference database is necessary because we extracted the variables of the current customers from this database.
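The phase 3 weighting of Section 3.3 is simple enough to check directly. The sketch below does three things:

- It computes ω_Phase 2 = (AUC − 0.5) × 2 and ω_Phase 1 = 1 − ω_Phase 2.
- It reproduces Table 1.
- It combines each prospect's phase 1 similarity and phase 2 probability.

Two parts are assumptions: treating the combined score as the weighted sum of the two measures, and clipping the weight to [0, 1] for AUCs below 0.5. The function names and example values are hypothetical.

```python
"""Sketch of the phase 3 weighting; the weighted-sum combination and clipping are assumptions."""
from typing import Dict, Tuple


def phase_weights(auc_phase2: float) -> Tuple[float, float]:
    """(w_phase1, w_phase2) with w_phase2 = (AUC - 0.5) * 2 and w_phase1 = 1 - w_phase2.

    Clipping w_phase2 to [0, 1] is an added safeguard, not part of the paper's formula.
    """
    w2 = min(max((auc_phase2 - 0.5) * 2, 0.0), 1.0)
    return 1.0 - w2, w2


def phase3_scores(similarity: Dict[str, float], probability: Dict[str, float],
                  auc_phase2: float) -> Dict[str, float]:
    """Assumed combination: weighted sum of phase 1 similarity and phase 2 probability."""
    w1, w2 = phase_weights(auc_phase2)
    return {pid: w1 * similarity[pid] + w2 * probability[pid] for pid in similarity}


# Reproduce Table 1 (rounding hides float noise such as 0.19999999999999996).
for auc_value in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    w1, w2 = phase_weights(auc_value)
    print(f"AUC {auc_value}: weight phase 2 = {round(w2, 2)}, weight phase 1 = {round(w1, 2)}")

# Hypothetical prospects scored by both phases; the phase 2 AUC is 0.8.
scores = phase3_scores({"p1": 0.5, "p3": 1.0}, {"p1": 0.9, "p3": 0.4}, auc_phase2=0.8)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

ω_Phase 2 exceeds ω_Phase 1 only when AUC > 0.75, so phase 2 dominates the ranking only once it clearly beats a random model. This is the conservatism described in Section 3.3. In the example, p1 moves ahead of p3 (0.74 versus 0.64) because phase 2 rates it much higher.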
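Two of the three exclusion criteria in Section 4 are mechanical and can be sketched as a greedy filter over numeric columns:

- Drop variables with more than 50% missing values.
- Drop a variable when it is highly correlated with one already kept.

Name-based variables are simply excluded by name. The following are assumptions made for illustration:

- the 0.9 correlation threshold;
- the greedy column order;
- the use of Pearson correlation on numeric columns only;
- all names in the sketch.

The paper does not state how redundancy was measured.

```python
"""Sketch of the variable-exclusion criteria; thresholds and names are assumptions."""
import math
from typing import Dict, Iterable, List, Optional, Sequence

Column = Sequence[Optional[float]]  # None marks a missing value


def missing_share(column: Column) -> float:
    return sum(value is None for value in column) / len(column)


def pearson(x: Column, y: Column) -> float:
    """Pearson correlation on rows where both values are present (0.0 if undefined)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_variables(columns: Dict[str, Sequence], name_based: Iterable[str] = (),
                     max_missing: float = 0.5, max_abs_corr: float = 0.9) -> List[str]:
    excluded_names = set(name_based)
    kept: List[str] = []
    for name, column in columns.items():
        if name in excluded_names:                  # criterion 2: name-based variable
            continue
        if missing_share(column) > max_missing:     # criterion 3: more than 50% missing
            continue
        if any(abs(pearson(column, columns[k])) > max_abs_corr for k in kept):
            continue                                # criterion 1: redundant (assumed threshold)
        kept.append(name)
    return kept


# Hypothetical columns for four companies.
columns = {
    "sales": [1, 2, 3, 4],
    "employees": [2, 4, 6, 8],                  # perfectly correlated with sales
    "year_founded": [1990, None, None, None],   # 75% missing
    "revenue_rank": [4, 1, 3, 2],
    "ceo_name": ["A", "B", "C", "D"],           # name-based
}
print(select_variables(columns, name_based={"ceo_name"}))
```

Here employees is dropped as redundant with sales (r = 1) and year_founded is dropped for being 75% missing. revenue_rank is kept (r = -0.4).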
J. D’Haen, D. Van den Poel / Industrial Marketing Management 42 (2013) 544–551 549

In summary, we used two types of data: the reference data set and the telecom company data set. The reference data set is a database containing variables on more than 16 million U.S. companies. We used this database as a list of suspects, which is the input of the algorithm (excluding the current customers of the company). The telecom data set lists the customers of the telecom company but contains no variables describing them. We extracted the variables of the telecom company customers from the reference database.

Table 2
Results of profile searching.

Run                            1               2               3                4
Number of prospects            10              123             10,176           2
Number of profile customers    10              0               12               2
Phase 2: selected method       Only phase 1    Only phase 1    Decision tree    Decision tree
AUC                            n/a             n/a             1                0.99985
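The preparation of the two data sets described above (customer variables looked up in the reference database, unmatched customers dropped, current customers excluded from the suspect list) might look like the minimal Python sketch below. The dict-of-dicts layout and the `prepare_inputs` helper are assumptions for illustration; deleting customers with a large amount of missing data is not modeled, because no threshold is given:

```python
def prepare_inputs(reference, customer_ids):
    """Split a reference database into customer profiles and suspects.

    reference: dict mapping a company id to a dict of variables (a stand-in
        for the commercial reference database; the id scheme is hypothetical).
    customer_ids: ids of the current customers, which come without variables.
    Returns (customer_profiles, suspects, unmatched_ids).
    """
    customers = set(customer_ids)
    # Customer variables are taken from the reference database.
    profiles = {cid: reference[cid] for cid in customers if cid in reference}
    # Customers that cannot be matched are left out of the phase-1 input.
    unmatched = sorted(customers.difference(profiles))
    # Suspects: every reference company that is not already a customer.
    suspects = {cid: row for cid, row in reference.items() if cid not in customers}
    return profiles, suspects, unmatched
```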

5. Results

The sales funnel of B2B companies is more complex than that of B2C companies (Yu & Cai, 2007). More processes are needed to complete transactions, and, as a result, deals take longer to close. Thus, it is difficult to conduct an extensive real-life test in a B2B setting. We were, however, able to conduct a (limited) real-life test of the algorithm.

We entered the company's current customers into the algorithm as input for phase 1. This rendered a list of prospects that was sent to the telecom company. The company reviewed this list and qualified the prospects as "good" or "bad" leads. The list was ranked first on similarity and then on company sales volume, which the company's sales manager considered a relevant ranking variable. The sales representatives selected the top 356 prospects to evaluate. Of these, 56 companies were qualified as good leads, corresponding to a conversion rate from prospect to lead of 15.73% [= 56/(56 + 300)], higher than the average overall conversion rate of 10% (Coe, 2004b). Next, we administered two pseudo tests to determine the quality of the algorithm. Although they are not real-life tests, they use real data.

In the first test, we used the positively qualified prospects as input to the algorithm, applying a reverse logic to test the model: we used the 56 positively qualified prospects received from the telecom company as input to find the 107 original telecom company customers. Using an (arbitrary) selection rule of retaining the prospects with the highest similarity (to the 56 positively qualified prospects used as input), we retained 228 potential prospects, of which 8 were original telecom company customers. Assuming that these 8 prospects would become company clients, we obtain a conversion rate from prospect to customer of 3.5% (= 8/228), similar to what can be expected on average (Coe, 2004b). However, this rate is obtained by running only phase 1. We expect that running phases 2 and 3 will raise this conversion rate by including feedback data.

The second test assesses the combination of the three phases and their ability to find specific companies, mainly as a test of the efficiency of the feedback loop. This test does not use the telecom company data, only the reference data. We selected companies with the following random profile from the reference database (the interpretation of the variables is not relevant here): sales volume > $100,000 and ≤ $190,000; number of employees > 4 and ≤ 50; square footage estimator > 2,210 and ≤ 3,319; import export indicator = 2; population code > 4; and active in the accommodations and food services industry. This rendered a list of 36 companies. We then randomly selected 10 of these companies as current customers and ran the model to search for the other 26 profile customers. In other words, these 26 companies are "hidden" in the reference database, and the goal is to find them.

In each run, the algorithm chose the prospects that had the highest similarity, regardless of how large this selection was. The first run used only the nearest neighbor algorithm, because no feedback data were available yet (Table 2). Ten prospects had the highest similarity, and all of them were among the 26 profile customers. In the second run, again only the nearest neighbor algorithm ran, because the previous run gave only positive feedback points and no negative ones. Of the 123 prospects with the highest similarity, none were profile customers. Run 3 rendered 10,176 prospects, of which 12 were profile customers. The selected method in phase 2 was a decision tree (Appendix A, Fig. A.1), with an AUC of 1. Run 4 provided two more prospects, both of which were profile customers. A decision tree was again the selected method (Appendix A, Fig. A.2), with an AUC of 0.99985. Additional runs did not reveal the remaining two customers, most likely because run 4 added little feedback data to the model (only two feedback points).

6. Conclusions and implications

This article presents a procedure to facilitate the customer acquisition process in a B2B environment. The algorithm contains three phases, and its output is a ranked list of prospects. Sales representatives could select a top percentage of these ranked prospects to qualify further as leads to pursue. Because these prospects are of higher quality, it is easier for sales representatives to qualify them and, in turn, convert them into customers. Real-life and pseudo tests show positive results. The real-life test suggests a conversion rate from prospect to lead that is higher than average. The first pseudo test produced a conversion rate from prospect to customer similar to the average conversion rate while using only the first phase of the algorithm. The second pseudo test needed only four runs to find 24 of 26 companies in a suspect list that contains more than 16 million companies.

This study provides several managerial implications. First, the proposed sales force automation tool operates in a fully automated way, but human intervention remains possible when necessary. As a result, the tool can work in a broad range of situations. It supports sales managers from a starting position, in which there is merely a basic set of current customers and no information on the acquisition process, to a situation in which the customer base is more mature and a vast amount of data is available on the history of this process. However, human intervention might be preferable in some cases. Look-alike models tend to overlook opportunities in other segments (Blattberg et al., 2008a); this is inherent to the method, because it searches for new prospects similar to the current customers. As a result, it is not always optimal to include the full set of variables. For example, the industry (NAICS code) can be withheld from the algorithm to find prospects in other industries as well.

Second, the output of the algorithm can be used straightforwardly without any knowledge of the statistical models running in the background. Thus, its applicability does not rely on any human expertise, which lowers the threshold for sales representatives to use this tool. Furthermore, research has shown that sales force automation tools only increase the efficiency of sales representatives when they are accompanied by user training and support (Ahearne, Jelinek, & Rapp, 2005). Because this tool can be used intuitively and requires no significant training, the cost and time of such support are marginal, making it more likely that B2B sales managers will implement it and that the implementation will go smoothly.

Third, the tool could help sales managers negotiate with a data vendor to pay only for the prospects indicated by the sales force automation tool and not for the whole list of suspects. The tool can also be embedded into a web application, limiting the costs (see Section 1). Moreover, even if a data vendor is already willing to sell a selection of prospects on the basis of some arbitrary rules instead of a list of suspects, sales managers or the vendors themselves could improve the selection using the proposed algorithm.

Fourth, this study offers an explicit iterative view of the customer acquisition process. Each iteration provides useful information for the
next. Therefore, extensive documentation is needed when sales managers attempt to acquire new customers. Information on decisions made, steps taken, strategies employed, and so on must be recorded and analyzed periodically. This way, new customer acquisition can be improved incrementally.

This iterative view is also a theoretical implication. The shift from a static to a dynamic framework is a more accurate conceptualization of reality. When designing models using a customer acquisition framework, modelers should take into account the iterative aspect, which has been neglected to date. A different but related theoretical implication is the need for a standardized customer acquisition framework. We provide a personal, though literature-based, view on the flow between the different acquisition stages and their respective definitions. It is by no means meant as an ultimate framework, but rather as a tool tailored for our purposes.

Fig. A.1. Decision tree round 3. [Tree: root split on Minority owned; < 0.5 leads to leaf 1, >= 0.5 leads to leaf 0.]
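The random profile used in the second pseudo test of Section 5, and the hiding of the matching companies that are not chosen as seed customers, can be sketched as follows. This is a minimal Python illustration: the field names are hypothetical (loosely based on Table A.1), and so are the choice of the total employee count and the use of NAICS sector code "72" (Accommodation and Food Services) for the industry condition:

```python
import random


def matches_profile(company):
    """True if a company fits the random profile of the second pseudo test.

    Field names are hypothetical, loosely following Table A.1. Using the total
    employee count and coding the industry as NAICS sector "72"
    (Accommodation and Food Services) are assumptions.
    """
    return (
        100_000 < company["sales_volume"] <= 190_000
        and 4 < company["employees_total"] <= 50
        and 2_210 < company["square_footage_estimator"] <= 3_319
        and company["import_export_indicator"] == 2
        and company["population_code"] > 4
        and str(company["naics"]).startswith("72")
    )


def hide_profile_companies(reference, n_seed_customers=10, seed=0):
    """Pick seed customers among the profile companies; the rest stay hidden.

    reference: dict mapping company id -> dict of variables.
    Returns (seed_ids, hidden_ids), both sorted.
    """
    profile_ids = sorted(cid for cid, row in reference.items() if matches_profile(row))
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    seed_ids = sorted(rng.sample(profile_ids, n_seed_customers))
    hidden_ids = [cid for cid in profile_ids if cid not in seed_ids]
    return seed_ids, hidden_ids
```

With the 36 profile companies of the test, the same call would leave 26 hidden companies for the algorithm to find.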
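Two rules from the second pseudo test in Section 5 are worth making explicit: each run keeps every prospect that ties for the highest score, however many there are (10, 123, 10,176 and 2 prospects in runs 1 to 4, Table 2), and phase 2 only runs once the feedback contains both positive and negative points. A minimal Python sketch of both rules follows; the function names and the tolerance used to detect ties are hypothetical:

```python
def select_top_ties(scores, tol=1e-12):
    """Return every prospect whose score ties with the maximum (within tol).

    scores: dict mapping prospect id -> similarity or combined score.
    The size of the selection is not fixed in advance: all ties are kept.
    """
    if not scores:
        return []
    best = max(scores.values())
    return sorted(pid for pid, score in scores.items() if best - score <= tol)


def phase2_available(feedback):
    """Phase 2 can only be trained once both classes have been observed.

    feedback: dict mapping prospect id -> 1 (positive) or 0 (negative).
    """
    return {0, 1} <= set(feedback.values())
```

After run 1, the feedback held only positive points, so `phase2_available` would return False and run 2 would again rely on the nearest neighbor similarities alone; the negative points from run 2 then make phase 2 available in run 3.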
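In phase 2, a method is selected among candidate classifiers (logistic regression, decision trees, and neural networks, according to the abstract), and an AUC is reported for the selected one. The sketch below shows only the selection step. It assumes that each candidate has already produced probabilities for a set of labeled feedback prospects; which data the AUC is actually computed on is not specified here, so that part is an assumption. The AUC is computed as the rank (Mann–Whitney) statistic in plain Python, and the helper names are hypothetical:

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a random positive outscores a random negative.

    labels: list of 1/0; scores: list of predicted probabilities, aligned.
    Ties count as one half. Requires at least one positive and one negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative labels")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_phase2_model(labels, candidate_scores):
    """Pick the candidate whose scores give the highest AUC.

    candidate_scores: dict mapping model name -> list of probabilities
    aligned with labels. Returns (best_name, {name: auc}).
    """
    results = {name: auc(labels, s) for name, s in candidate_scores.items()}
    best = max(results, key=results.get)
    return best, results
```

An AUC of 1, as reported for the round-3 decision tree, means that every positive feedback point is scored above every negative one.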

7. Limitations and further research

The main limitation of this study is that it was not possible to run a full, real-life test of the algorithm. Such a test is necessary to fully validate the model. Therefore, further research should first involve an extensive, real-life test using the suggested algorithm. If these tests prove the model valid, adjustments can be made to improve it further. A possible avenue of study is to make a distinction within the current customer base between good and bad customers and give corresponding weights to them when running the algorithm. The distinction between good and bad customers might be based, for example, on profitability, because research shows that customers are not equally profitable (Jacobs, Johnston, & Kotchetova, 2001). Another possible avenue is to include other data sources in the model, because the success of a model depends partly on the data input (Baecke & Van den Poel, 2012a, b). For example, web data have proven to be strong predictors of profitable customers (D'Haen et al., 2013; Thorleuchter, Van den Poel, & Prinzie, 2012).

Appendix A

Table A.1
Variable list.

Variable name                  Type
Sales_volume                   Numeric
Employees_total                Numeric
Employees_here                 Numeric
Status_indicator_0             Categorical
Naics_1 to Naics_5             Categorical
Veteran_indicator              Categorical
Women_owned_indicator          Categorical
Minority_owned_indicator       Categorical
Minority_type                  Categorical
Cottage_indicator              Categorical
Import_export_indicator        Categorical
Manufacturing_indicator        Categorical
Public_private_indicator       Categorical
Legal_status_code              Categorical
Owns_rents_indicator           Categorical
Small_business_indicator       Categorical
Population_code                Categorical
Fortune_1000_indicator         Categorical
Non_profit_indicator           Categorical
8a_disadvantage_indicator      Categorical
Square_footage_estimator       Numeric
Franchise_indicator            Categorical
Territory_covered              Categorical
Hierarchy_code                 Categorical
Sales_cat                      Categorical
Emp_here_cat                   Categorical
Emp_total_cat                  Categorical
Square_footage_cat             Categorical

Fig. A.2. Decision tree round 4. [Tree: root split on Import-export; < 1 leads to leaf 0, >= 1 leads to leaf 1.]

References

Aggarwal, C., Hinneburg, A., & Keim, D. (2001). On the surprising behavior of distance metrics in high dimensional space. Lecture notes in computer science (pp. 420–434). London: Springer-Verlag.
Ahearne, M., Hughes, D. E., & Schillewaert, N. (2007). Why sales reps should welcome information technology: Measuring the impact of CRM-based IT on sales effectiveness. International Journal of Research in Marketing, 24, 336–349.
Ahearne, M., Jelinek, R., & Rapp, A. (2005). Moving beyond the direct effect of SFA adoption on salesperson performance: Training and support as key moderating factors. Industrial Marketing Management, 34, 379–388.
Ang, L., & Buttle, F. (2006). Managing for successful customer acquisition: An exploration. Journal of Marketing Management, 22, 295–317.
Baecke, P., & Van den Poel, D. (2011). Data augmentation by predicting spending pleasure using commercially available external data. Journal of Intelligent Information Systems, 36, 367–383.
Baecke, P., & Van den Poel, D. (2012a). Improving customer acquisition models by incorporating spatial autocorrelation at different levels of granularity. Journal of Intelligent Information Systems, 1–18. http://dx.doi.org/10.1007/s10844-012-0225-4.
Baecke, P., & Van den Poel, D. (2012b). Including spatial interdependence in customer acquisition models: A cross-category comparison. Expert Systems with Applications, 39, 12105–12113.
Baesens, B., Mues, C., Martens, D., & Vanthienen, J. (2009). 50 years of data mining and OR: Upcoming trends and challenges. Journal of the Operational Research Society, 60, S16–S23.
Ballings, M., & Van den Poel, D. (2012). Customer event history for churn prediction: How long is long enough? Expert Systems with Applications, 39, 13517–13522.
Ballings, M., & Van den Poel, D. (2013). Kernel Factory: An ensemble of kernel machines. Expert Systems with Applications, 40, 2904–2913.
Berk, R. A. (2008). Classification and regression trees (CART). Statistical learning from a regression perspective (pp. 103–166). London: Springer Verlag.
Blattberg, R. C., Kim, P., Kim, B. D., & Neslin, S. A. (2008a). Acquiring customers. Database marketing: Analyzing and managing customers (pp. 495–514). London: Springer Verlag.
Blattberg, R. C., Kim, P., Kim, B. D., & Neslin, S. A. (2008b). Decision trees. Database marketing: Analyzing and managing customers (pp. 423–441). London: Springer Verlag.
Blattberg, R. C., Kim, P., Kim, B. D., & Neslin, S. A. (2008c). The predictive modeling process. Database marketing: Analyzing and managing customers (pp. 245–286). London: Springer Verlag.
Blattberg, R. C., Kim, P., Kim, B. D., & Neslin, S. A. (2008d). Statistical issues in predictive modeling. Database marketing: Analyzing and managing customers (pp. 291–321). London: Springer Verlag.
Bose, I., & Chen, X. (2009). Quantitative models for direct marketing: A review from systems perspective. European Journal of Operational Research, 195, 1–16.
Bounsaythip, C., & Rinta-Runsala, E. (2001). Overview of data mining for customer behavior modeling. Otaniemi, Finland: VTT Information Technology.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Covington, KY: Wadsworth International Group.
Bruckhaus, T. (2010). Collective intelligence in marketing. In J. Casillas, & F. J. Martínez-López (Eds.), Marketing intelligence systems using soft computing: Managerial and research applications (pp. 131–154). London: Springer Verlag.
Buttle, F. (2009a). Introduction to customer relationship management. Customer relationship management: Concepts and technologies (pp. 1–23) (2nd ed.). London: Taylor & Francis.
Buttle, F. (2009b). Managing the customer lifecycle: Customer acquisition. Customer relationship management: Concepts and technologies (pp. 225–254) (2nd ed.). London: Taylor & Francis.
Cantu-Paz, E. (2001). Supervised and unsupervised discretization methods for evolutionary algorithms. Proceedings of the Genetic and Evolutionary Computation Conference. San Francisco: Association for Computing Machinery.
Charikar, M. S. (2002). Similarity estimation techniques from rounding algorithms. Proceedings of the thirty-fourth annual ACM symposium on theory of computing (pp. 380–388). New York: Association for Computing Machinery.
Chen, W. C., Hsu, C. C., & Hsu, J. N. (2011). Optimal selection of potential customer range through the union sequential pattern by using a response model. Expert Systems with Applications, 38, 7451–7461.
Chou, P. B. (2000). Identifying prospective customers. Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 447–456). New York: Association for Computing Machinery.
Coe, J. M. (2004a). Segmentation for communications. The fundamentals of business to business sales and marketing (pp. 71–94). New York: McGraw-Hill.
Coe, J. M. (2004b). The integration of direct marketing and field sales to form a new B2B sales coverage model. Journal of Interactive Marketing, 18, 62–77.
Coe, J. M. (2004c). The start: Profiling and targeting the market. The fundamentals of business to business sales and marketing (pp. 51–69). New York: McGraw-Hill.
Cooper, M. J., & Budd, C. S. (2007). Tying the pieces together: A normative framework for integrating sales and project operations. Industrial Marketing Management, 36, 173–182.
Danielson, M., & Ekenberg, L. (2007). Computing upper and lower bounds in interval decision trees. European Journal of Operational Research, 181, 808–816.
D'Haen, J., Van den Poel, D., & Thorleuchter, D. (2013). Predicting customer profitability during acquisition: Finding the optimal combination of data source and data mining technique. Expert Systems with Applications, 40, 2007–2012.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Non-metric methods. Pattern classification (pp. 1–66) (2nd ed.). New York: Wiley.
Eggert, A., & Serdaroglu, M. (2011). Exploring the impact of sales technology on salesperson performance: A task-based approach. Journal of Marketing Theory & Practice, 19, 169–186.
Gillin, P., & Schwartzman, E. (2011). Lead generation. Social marketing to the business customer: Listen to your B2B market, generate major account leads, and build client relationships (pp. 156–175). Hoboken, NJ: Wiley.
Gupta, S., Hanssens, D., Hardie, B., Kahn, W., Kumar, V., Lin, N., et al. (2006). Modeling customer lifetime value. Journal of Service Research, 9, 139–155.
Ha, K., Cho, S., & Mela, C. F. (2005). Response models based on bagging neural networks. Journal of Interactive Marketing, 19, 17–30.
Han, J., & Kamber, M. (2006). Data preprocessing. Data mining: Concepts and techniques (pp. 47–104) (2nd ed.). Amsterdam: Elsevier.
Hansotia, B. J., & Wang, P. (1997). Analytical challenges in customer acquisition. Journal of Direct Marketing, 11, 7–19.
Ichino, M., & Yaguchi, H. (1994). Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Transactions on Systems, Man, and Cybernetics, 24, 698–708.
Jackson, T. W. (2005). CRM: From “art to science”. Journal of Database Marketing & Customer Strategy Management, 13, 76–92.
Jacobs, F. A., Johnston, W., & Kotchetova, N. (2001). Customer profitability: Prospective vs. retrospective approaches in a business-to-business setting. Industrial Marketing Management, 30, 353–363.
Kamakura, W., Mela, C. F., Ansari, A., Bodapati, A., Fader, P., Iyengar, R., et al. (2005). Choice models and customer relationship management. Marketing Letters, 16, 279–300.
Kim, Y. S., Street, W. N., Russel, G. J., & Menczer, F. (2005). Customer targeting: A neural network approach guided by genetic algorithms. Management Science, 51, 264–276.
Long, M. M., Tellefsen, T., & Lichtenthal, J. D. (2007). Internet integration into the industrial selling process: A step-by-step approach. Industrial Marketing Management, 36, 676–689.
Metzger, M. (2005). Using water testing to convert prospects into leads and leads into customers. WC&P International, 47, 7–8.
Monat, J. P. (2011). Industrial sales lead conversion modeling. Marketing Intelligence & Planning, 29, 178–194.
Moncrief, W. C., & Marshall, G. W. (2005). The evolution of the seven steps of selling. Industrial Marketing Management, 34, 13–22.
Ngai, E. W. T. (2005). Customer relationship management research (1992–2002): An academic literature review and classification. Marketing Intelligence & Planning, 23, 582–605.
Ngai, E. W. T., Xiu, L., & Chau, D. C. K. (2009). Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with Applications, 36, 2592–2602.
Pampel, F. C. (2000). Logistic regression: A primer. Thousand Oaks, CA: Sage Publications.
Patterson, L. (2007). Marketing and sales alignment for improved effectiveness. Journal of Digital Asset Management, 3, 185–189.
Richards, K. A., & Jones, E. (2008). Customer relationship management: Finding value drivers. Industrial Marketing Management, 37, 120–130.
Rygielski, C., Wang, J., & Yen, D. C. (2002). Data mining techniques for customer relationship management. Technology in Society, 24, 483–502.
Setnes, M., & Kaymak, U. (2001). Fuzzy modeling of client preference from large data sets: An application to target selection in direct marketing. IEEE Transactions on Fuzzy Systems, 9, 153–163.
Sohnchen, F., & Albers, S. (2010). Pipeline management for the acquisition of industrial projects. Industrial Marketing Management, 39, 1356–1364.
Steane, A. M. (1996). Error correcting codes in quantum theory. Physical Review Letters, 77, 793–797.
Thorleuchter, D., Van den Poel, D., & Prinzie, A. (2012). Analyzing existing customers websites to improve the customer acquisition process as well as the profitability prediction in B-to-B marketing. Expert Systems with Applications, 39, 2597–2605.
Trailer, B. (2006). Understanding what your sales manager is up against. Harvard Business Review, 84, 48–55.
Van den Poel, D., & Buckinx, W. (2005). Predicting online-purchasing behaviour. European Journal of Operational Research, 166, 557–575.
Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10, 207–244.
Wilson, R. D. (2003). Using online databases for developing prioritized sales leads. The Journal of Business and Industrial Marketing, 18, 388–402.
Wilson, R. D. (2006). Developing new business strategies in B2B markets by combining CRM concepts and online databases. Competitiveness Review: An International Business Journal incorporating Journal of Global Competitiveness, 16, 38–43.
Yu, Y. P., & Cai, S. Q. (2007). A new approach to customer targeting under conditions of information shortage. Marketing Intelligence & Planning, 25, 343–359.
Zytynska, S. E., Fay, M. F., Penney, D., & Preziosi, R. F. (2011). Genetic variation in a tropical tree species influences the associated epiphytic plant and invertebrate communities in a complex forest ecosystem. Philosophical Transactions of the Royal Society B: Biological Sciences, 366, 1329–1336.

Jeroen D’Haen is a PhD candidate in Applied Economics at Ghent University, Belgium. His main field of interest centers on customer acquisition in combination with data and text mining. Previous research has been published in Expert Systems with Applications.

Dirk Van den Poel is a full professor of marketing analytics at the Faculty of Economics and Business Administration of Ghent University in Belgium. His main fields of interest are studying consumer behavior from a quantitative perspective (CRM), data mining (genetic algorithms, neural networks, random forests, ensemble classification), and operations research. Previous research has been published in European Journal of Operational Research, Expert Systems with Applications, Decision Support Systems, and Journal of Applied Econometrics, among others.
