Chaud Huri 2021
ARTICLE INFO

Keywords: E-commerce; Customer relationship; Deep learning; Machine learning; Online purchase behavior

ABSTRACT

A thorough understanding of online customer’s purchase behavior will directly boost e-commerce business performance. Existing studies have overtly focused on purchase intention and used sales rank as a natural proxy, which however has limited business application. Additionally, intention to purchase does not necessarily convert to actual retail purchases. We aim to further our understanding of online customer’s purchase behavior for an e-commerce platform by predicting the same using deep learning techniques, on a large multidimensional data sample of more than 50,000 unique web sessions. This study used two distinct sets of variables, i.e., platform engagement and customer characteristics, as key predictors of online purchases by retail customers. We further compared the predictive capability of our deep learning method with other widely used machine learning techniques for prediction, including Decision Tree, Random Forest, Support Vector Machines, and Artificial Neural Networks. We found that the deep learning technique outperformed the machine learning techniques when applied to the same dataset. These analyses will help platform designers plan for more platform engagements while simultaneously expanding the academic understanding of purchase prediction for online e-commerce platforms.
* Corresponding author.
E-mail addresses: neha1731992@gmail.com (N. Chaudhuri), gaurav.gupta@neoma-bs.fr (G. Gupta), vamsi@bitsindri.gmail.com (V. Vamsi), indranil_bose@yahoo.com (I. Bose).
https://doi.org/10.1016/j.dss.2021.113622
Received 1 December 2020; Received in revised form 13 May 2021; Accepted 7 June 2021
Available online 15 June 2021
0167-9236/© 2021 Elsevier B.V. All rights reserved.
Please cite this article as: Neha Chaudhuri, Decision Support Systems, https://doi.org/10.1016/j.dss.2021.113622
N. Chaudhuri et al. Decision Support Systems xxx (xxxx) xxx
opportunity to further extend our knowledge in this area.
Existing studies have further examined website usability [22], modelled the convergence of transaction convergence as task completion [23] or have predicted this convergence [5]. Close & Kukar-Kinney [24] have examined the capabilities of the online platform as a function of the overall purchase process. They have linked user motivations on the platform with their actions with a focus on their usage of the online platform features. Various studies, including those from Brown et al. [10] and Olbrich and Holsing [25], have analyzed customers’ online activity to draw insights about their purchasing behavior. However, a willingness to purchase does not necessarily translate to the same in an uncontrolled real-life setting [20]. Increasingly, studies have started adopting real world data (e.g., clickstreams or retail sales datasets) to examine online purchase behavior [35,36].

2.3. Analytical methods for prediction

In order to predict customers’ decision to purchase, analytical methods such as regression and ML have been used by researchers over the years. The most widely used methods include Stepwise Logistic Regression (SLR) [26], Decision Tree (DT) [27], Random Forest (RF) [28], Support Vector Machines (SVM) [29], and Artificial Neural Networks (ANN) [30]. DT and RF have widespread applications for prediction related problems because of their ease of use and the high interpretability of their generated results. Moreover, unlike ANN, DT and RF are both capable of directly handling categorical variables [27,28]. However, DT is less robust than RF and has been found to be highly sensitive to even small variations in data [31]. Additionally, RF is simpler to tune because it has a smaller number of hyperparameters as compared to neural network-based models [28]. However, ANN has been found to outperform DT and RF in terms of resource utilization and handling of multidimensional complex datasets [31,32]. SLR has been used in extant literature for predictions involving binary dependent variables. However, it suffers from a major limitation that makes it unfit for rigorous empirical analysis. SLR adds or removes variables during analysis in a specific order and studies have found that this order of addition or removal of variables can affect the final outcome [33]. This has prompted scholars to suggest the use of SLR for exploratory research only.
While all of these approaches have improved the ability to determine customers’ purchases, we believe that recent advances in computing, especially DL techniques, hold much promise, primarily due to their capability to improve predictions through learning. Recently, various studies have started embracing this approach for analyzing large and complex datasets. For example, a recent study by Loureiro et al. [34] has adopted DL to forecast sales in fashion retailing. Also, Korpusik et al. [35] have applied a feedback-based DNN model (i.e., Recurrent Neural Network) to a large corpus of tweets of potential customers to predict their choice of products and final purchases.
In summary, a detailed review of literature in this area, as shown in Table A1, highlights important research gaps. First, there is a lack of empirical evidence connecting platform engagement with the actual purchase decision. The focus has primarily been on purchase intention which, as theory has shown, can be different from actual purchase behavior. While purchase intention represents the will to purchase a product, purchase behavior refers to the actual purchasing process on an online platform, which is the focus of our research. Second, existing studies have examined the impact of platform engagement attributes and customer attributes on purchase intentions, but separately. There is a lack of a concerted effort to link these two streams of research. It is necessary to examine these distinctions and their combined impact when attempting to examine online customer engagement. Additionally, behavioral data analysis from clickstream and retail sales has huge untapped potential in understanding customers’ activities on online shopping platforms and the impact that these activities have on the customers’ purchase behavior. In this study, we attempt to bridge these gaps in literature by examining retail sales data to draw insights about purchase behavior of customers through meaningful engagement on an e-commerce platform. Table A1 provides a summary of the extant research in this area.

3. Data description

The challenges associated with the application of big data analytics to predict customer purchases stem from the lack of actual sales data. As a result, past literature has widely used ‘intention to purchase’ [36] as well as sales rank of products [37] as proxies for actual purchases of customers on e-commerce platforms. However, recent studies have shown that analyzing actual purchase data would yield more convincing results [38]. The dataset used in this study addresses this concern. We have collected anonymized web browsing data from an online e-commerce platform. This e-commerce platform is a multi-vendor general purpose online marketplace based in Germany and Belgium. The dataset comprised historical purchase data and other variables relevant for addressing the research questions of this study. The data consisted of 429,013 unique sessions. One or more products were purchased in 290,030 sessions (67.60%) while no product was purchased in 138,083 (32.40%) sessions. Each row of this dataset represented a single online
Table A1
Summary of related literature using ML and DL methods.

Topic studied: Role of customer characteristics and engagement with e-commerce platforms in purchase predictions

  Study [62]
    Research approach: Dataset related to frequency, time lapse and values of earlier purchases [62]
    Algorithm used: Logistic lasso regression, extreme learning machine and gradient tree boosting methods [62]
    Research contributions: Innovative pairwise comparison of time lapse and value difference between two consecutive purchases to predict future purchase [62]
    Research limitations: No distinction between high versus low involvement product categories [62]

  Study [63]
    Research approach: Online sale data of consumer goods [63]
    Algorithm used: LDA [63]
    Research contributions: Inclusion of customer heterogeneity as a predictor [63]
    Research limitations: Limited scalability of method [63]

  Study [64]
    Research approach: Clickstream dataset related to browsing of online forum by potential customers [64]
    Algorithm used: Maximum likelihood estimation followed by binary regression [64]
    Research contributions: Comparison between role of focused versus unfocused product search and browsing behavior on purchase decision [64]
    Research limitations: Clickstream data did not allow in-depth analysis and classification of different browsing behaviors [64]

Topic studied: Role of platform characteristics in purchase predictions

  Study [65]
    Research approach: Naturally obtained dataset from 4000 customers over 2 years related to multiple sources of advertisement exposure [65]
    Algorithm used: Tobit model followed by Variational Bayes ML method [65]
    Research contributions: Data linked to advertisement exposure revealed negative effects of e-mail catalogues and positive effects of paid effects and competitor catalogues [65]
    Research limitations: Examination of a single product category (i.e. clothing category) [65]

  Study [66]
    Research approach: Data from 400 respondents about website quality in an experimental setup [66]
    Algorithm used: Ordinary Least Squares (OLS) regression [66]
    Research contributions: Identification of website quality factors such as sophistication, genuineness, and unpleasantness [66]
    Research limitations: Lack of external validity due to use of student samples [66]

  Study [67]
    Research approach: Dataset of tourism e-commerce products [67]
    Algorithm used: Co-EM logistic regression [67]
    Research contributions: Combined semi-supervised and multi-view learning procedures to exploit unlabeled data [67]
    Research limitations: Lack of generalizability to other product domains [67]
For every n-level categorical variable, n variables were generated. After this preprocessing, the dataset had 25 numerical variables that acted as predictors of customers’ purchases.

4.2. Data analysis

The DL and ML techniques required initialization of hyperparameters during model building. These hyperparameters determined the training capability of the techniques [43] and needed to be tuned during the learning process. Moreover, the traditional approach of using a training dataset to train an ML technique and then using a testing dataset to evaluate the performance of the technique often suffers from overfitting and lack of robustness [44]. To overcome this problem, we employed k-fold cross validation with k = 5 for each run. This procedure involved partitioning of the training dataset into five folds. For every iteration of the procedure, one fold was treated as a validation set while the remaining four folds acted as training sets. In each iteration, we performed grid-search based hyperparameter tuning until the training and validation errors stabilized. This helped to overcome the more subtle ‘hyperparameter overfitting’ [45]. Further, following Zolbanin et al. [41], we calculated the accuracy-based importance score of variables for each run. This importance score compared the predictive ability of the independent variable based on how much it contributed to the overall accuracy of a given model. To determine the importance of a variable, we started with the full combination of variables, dropped variables one by one, trained the technique with the remaining variables, and calculated the accuracy of prediction. We observed that the absence of a relatively important variable resulted in a significant drop in the accuracy of the model. Finally, following Shen et al. [46], we used sensitivity analysis-based feature selection to extract the optimal subset of variables that most accurately predicted purchases. We also compared the training and validation errors for the predictive models and found that the validation error was within 0.048% of the training error. This confirmed insignificant overfitting.

4.3. Deep learning (DL)

The DNN is a DL technique that is an evolved variant of ANN. Unlike the network structure of a traditional ANN, a DNN has a fully connected layer as well as multiple sparsely connected intermediate layers. The greater depth of its structure, as compared to the ANN, allows it to learn multiple levels of representations in a dataset with increasing complexity. This enhanced representation learning leads to a better predictive performance of the DNN. DL has been used for various research applications, including image classification, emotion detection, and sales prediction, and has been applied to a variety of data types. As a result, different types of DL architectures have been developed for different use cases. For example, the convolutional neural network (CNN) has been used for image processing and is well-suited for multi-dimensional records. The recurrent neural network (RNN) works well with sequential data with a temporal dynamic behavior.
For this study, we used the feed-forward DNN. We chose this DL technique because the dataset used in this study consisted of one-dimensional inputs only. Our choice to apply the DNN in this context was further affirmed due to the data-driven self-adaptive approach of the DNN that was in contrast to the traditional methods that required specific assumptions about the functional form of the data. The network structure of the DNN is defined in a way such that the initial layers learn the simpler data features, while the deeper layers handle the more complex features [47], thus enabling it to capture nonlinear relationships in the dataset. Moreover, prior studies have shown that the DNN is less vulnerable to the curse of dimensionality, as compared to the ML regression-based models. This makes the DNN suitable for this study because it is expected to handle the multiple multi-class categorical as well as numerical variables of the dataset efficiently.
We used the stochastic gradient descent (SGD) method to train the DNN and initialized it with three layers and an equal number of neurons in each layer. SGD is the most preferred cost-effective optimization algorithm to train neural networks as it ensures randomness by introducing a single training sample at each iteration [61]. We investigated different configurations of the network structure, including three, five, and seven layers with 64, 128, and 256 neurons in each layer. The results are reported in the following section. Finally, since the DNN is vulnerable to overfitting, we used early stopping and a dropout layer to address this problem, as suggested by Srivastava et al. [48].
We used Keras 2.3.0, an open source Python library running on Python 3, to develop the DL and ML techniques. Keras is a programming interface which works on top of TensorFlow version 2.1.0. TensorFlow is
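The five-fold cross-validation and grid search described in Section 4.2, applied to the layer and neuron settings listed in Section 4.3, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the function names are hypothetical, the scoring function is an arbitrary placeholder rather than a trained model, and each grid point is scored by its mean validation score across the five folds, which is one common reading of the procedure.

```python
import random
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(train_and_score, n_samples, params, k=5, seed=0):
    """Mean validation score over k folds for one hyperparameter setting."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # The other k - 1 folds together form the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx, **params))
    return mean(scores)


def grid_search(train_and_score, n_samples, grid, k=5):
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_validate(train_and_score, n_samples, params, k)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_train_and_score(train_idx, val_idx, layers, neurons):
    # Hypothetical placeholder: a real version would fit a model on train_idx
    # and return its accuracy on val_idx. The formula below is arbitrary and
    # does not represent results from the study.
    return 1.0 - 0.01 * abs(layers - 5) - 0.0001 * abs(neurons - 128)


# Grid taken from the configurations named in Section 4.3.
grid = {"layers": [3, 5, 7], "neurons": [64, 128, 256]}
best_params, best_score = grid_search(toy_train_and_score, n_samples=100, grid=grid)
print(best_params, best_score)
```

In practice `toy_train_and_score` would be replaced by a function that builds and trains the network for the given setting; the fold bookkeeping and the search loop stay the same.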
time from the complete DNN model and then re-analyzing the data. The impact (i.e., reduction in accuracy) was greater for a variable which had a higher contribution to accurately predict purchase in a web session and this translated into a higher importance score for the variable. We then normalized the scores in the form of ‘relative importance’ of a variable with respect to the whole set of variables available for analysis. The value of relative importance ranges from 0 to 1, where 1 represents the variable with the highest contribution to the accuracy of the model and 0 represents the variable with the lowest contribution. Table 6 shows the relative importance of the top twelve variables for the DNN model with 5 layers and 128 neurons per layer. However, each of the five runs of the DNN (corresponding to the five-fold cross-validation) accepted different combinations of predictor variables in order to achieve the best performance. This difference can be attributed to the varying specificities of each model [34]. Therefore, the results in Table 6 denote the average impact of each predictor variable on the accuracy of the model. Also, the relative importance indices across the other four ML techniques exhibited similar trends.
Of the top twelve predictors, nine belonged to the platform engagement category. This result suggests that customers’ engagement with the online e-commerce platform heavily impacts their purchase decisions. Specifically, the variables time when the session began and day of the week when the session began were highly predictive. These results suggested that time had a significant impact on customer purchases on an online platform. At specific times of the day and on specific days, the customers had a higher propensity to purchase. In other words, there existed peak shopping times similar to offline stores, even for online shopping platforms.
Customer account lifetime and days elapsed since last purchase were also found to be universally important for all techniques. These variables captured the sensitivity of the type of customer’s association with the e-commerce platform and established the conventionally accepted belief that loyal customers had a higher propensity to purchase than new customers. Interestingly, although customer account lifetime was found to be a good predictor of actual sales, the number of payments made by the customer did not significantly contribute to purchase decision. This suggested that even though the customer might not have multiple successful orders, the duration of their association with the platform impacted their propensity to purchase.
While we are not suggesting the lack of importance of repeat customers for online purchases, we want to highlight the importance of loyal customers for all types of purchases. Such customers experienced greater familiarity and comfort with the online platform and returned to the platform repeatedly for additional purchases. The inclusion of these variables was a strong indicator of the need for ease of navigation as well as the customers’ familiarity with and trust in the platform.
The variables sum of prices of all products clicked on and lowest price of product added to cart were also identified to be among the important predictors. They indicated the existence of a relationship between platform engagement and purchase behavior. The first variable indicated a casual platform engagement session involving prospection while the second variable indicated consideration to add some products to the cart and preparation for successful checkout. The variable lowest price of product added to cart was found to be a better predictor as compared to the variable lowest price of product clicked on. This result suggested that spending time in browsing for products did not necessarily convert to actual purchases. Browsing was a phase where a potential customer often explored and compared products and their features and this did not necessarily indicate an intent to purchase.
Finally, customer score indicated the effectiveness of the retailer’s process for assigning this score to each customer. However, since this variable was proprietary and highly dependent on the product and other related characteristics, it was unlikely to exhibit importance for prediction of purchases on other online retail platforms.
Another interesting observation from the results shown in Table 5 was that the duration of a session and the number of times the user logged in on an online e-commerce platform did not significantly impact purchase. This indicated that engagement did not necessarily represent intention to purchase. This observation reinforced the need to design better recommendations and enhance platform engagement to help customers make quicker purchase decisions while limiting online resources devoted to each customer. Online e-commerce platforms needed to address the balance between higher customer loyalty through better (often longer) engagement and streamlined design to support quicker purchases (also implying lower platform engagement).
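The drop-one-variable importance score and its normalization into relative importance, as described above, can be illustrated with a short sketch. The function names are hypothetical, and the normalization rule used here (dividing each impact by the largest impact) is an assumption inferred from the values printed in Table 6, not a formula stated in the text. The impact and relative importance figures are copied from Table 6; because the printed impacts are themselves rounded, the recomputed values are only expected to agree with the table to within about 0.01.

```python
def drop_one_importance(features, evaluate):
    """Accuracy lost when each feature is removed from the full feature set.

    `evaluate` is any callable that takes a list of feature names and
    returns an accuracy; here it stands in for retraining the model.
    """
    baseline = evaluate(features)
    return {
        f: baseline - evaluate([g for g in features if g != f])
        for f in features
    }


def relative_importance(impacts):
    """Scale impacts so the most important variable scores 1 (assumed rule)."""
    top = max(impacts.values())
    return {name: value / top for name, value in impacts.items()}


# Toy check with a made-up additive accuracy function (not study data).
toy_gain = {"x1": 0.03, "x2": 0.01}
toy_drops = drop_one_importance(
    list(toy_gain), lambda feats: 0.5 + sum(toy_gain[f] for f in feats)
)

# Impact on accuracy (%) for the DNN, copied from Table 6.
table6_impact = {
    "Day of the week when session began": 0.92,
    "Time when session began": 0.42,
    "Sum of prices of all products added to cart": 0.25,
    "Sum of prices of all products clicked on": 0.11,
    "Lowest price of product added to cart": 0.10,
    "Number of products added to cart": 0.09,
    "Highest price of product added to cart": 0.07,
    "Number of products clicked on": 0.07,
    "Highest price of product clicked on": 0.06,
    "Customer account lifetime": 2.10,
    "Days elapsed since last purchase": 1.48,
    "Customer account score assigned by the online retailer": 0.15,
}

# Relative importance as printed in Table 6, in the same order.
table6_reported = dict(zip(
    table6_impact,
    [0.44, 0.20, 0.12, 0.05, 0.05, 0.04, 0.04, 0.03, 0.03, 1.00, 0.71, 0.07],
))

computed = relative_importance(table6_impact)
largest_gap = max(abs(computed[n] - table6_reported[n]) for n in computed)
for name in sorted(computed, key=computed.get, reverse=True):
    print(f"{name:55s} {computed[name]:.2f}  (Table 6: {table6_reported[name]:.2f})")
```

Under this assumed rule the least important variable receives a small positive value rather than exactly 0, which matches the printed table, where the lowest listed relative importance is 0.03.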
Table 6
Relative importance of variables (DNN).

Variable category                Variable                                                  Impact on accuracy (%)   Relative importance
Platform engagement attributes   Day of the week when session began                        0.92                     0.44
Platform engagement attributes   Time when session began                                   0.42                     0.20
Platform engagement attributes   Sum of prices of all products added to cart               0.25                     0.12
Platform engagement attributes   Sum of prices of all products clicked on                  0.11                     0.05
Platform engagement attributes   Lowest price of product added to cart                     0.10                     0.05
Platform engagement attributes   Number of products added to cart                          0.09                     0.04
Platform engagement attributes   Highest price of product added to cart                    0.07                     0.04
Platform engagement attributes   Number of products clicked on                             0.07                     0.03
Platform engagement attributes   Highest price of product clicked on                       0.06                     0.03
Customers’ attributes            Customer account lifetime                                 2.10                     1.00
Customers’ attributes            Days elapsed since last purchase                          1.48                     0.71
Customers’ attributes            Customer account score assigned by the online retailer    0.15                     0.07

6.2. Sensitivity analysis-based feature selection

Feature selection is a core component of any machine learning application. Discarding redundant variables improves the predictive accuracy of ML and DL models, speeds up the training process, and reduces the overall cost of computation [34,56]. The two frequently used feature selection methods are filter-based and wrapper-based methods. Filter-based methods (e.g., correlation coefficient) depend on properties of data and are often carried out as part of the data pre-processing stage. These methods suffer from instability due to independence from underlying predictive models [57]. The wrapper-based methods have been found to perform better than filter-based methods because they use the knowledge of the underlying learning algorithms [58]. Therefore, we adopted a wrapper-based sensitivity analysis-based feature selection method that has been used in previous studies by Zhang [58] and Shen et al. [46], to generate an optimal subset of features which can most accurately predict purchases.
For this, we re-trained the DL and ML techniques with the features shown in Table 6, in a descending order. This meant that the first input variable was the one with the highest relative importance and so on. The changes in average accuracies for the ML and DL models over five runs (corresponding to the five-fold cross-validation process) are shown in Fig. 2.
Fig. 2 indicates that the accuracy of all models peaked when trained with the top twelve features and decreased thereafter. This could be attributed to the curse of dimensionality which meant that adding new features during the training of a model, beyond an optimal number,
Fig. 2. The change in average accuracies of ML and DL techniques with feature selection.
could lead to degradation of its predictive performance [59]. However, unlike the ML techniques which exhibited sharper reduction in their accuracies after the optimal point, the relative degradation in accuracy for the DNN was lower. For all techniques, the sharp decline in accuracy could be attributed to overfitting and the noise introduced due to the addition of redundant variables. The DNN was able to handle the noise better, which made it a better choice as a predictive technique.

6.3. DL based prediction of purchase behavior

Comparing the results of our analysis, we found strong evidence of better performance of DL techniques over conventional ML techniques. The analysis using the DNN improved the accuracy over the widely used RF model by over 6% and ROC-AUC by around 4%. While these improvements might not be significantly large, their implication for businesses is enormous, given that they would directly translate into improved purchase prediction. Additionally, we found that the FPR decreased by close to 10% when comparing the DT technique with the DNN. The corresponding decrease between the best ML technique (SVM) and DT was 7%. Improvement in false positives is very important for online businesses in this context as it would help them to improve their understanding of the underlying factors affecting the purchase decision. Hence, DNN could be used as a potent tool for businesses to improve their bottom line through better prediction.
However, DNN is more resource-intensive than conventional ML techniques. A slew of methods has been proposed, for example, network pruning and deep compression [60], to reduce the resource overhead without compromising on performance. These new techniques involve encoding and removal of less important network weights to generate faster and smaller NN. The use of these techniques makes the DNN suitable for a wide range of practical applications.

7. Implications

7.1. Academic implications

There are three major academic implications of this study. First, this study addresses the acknowledged need to predict actual purchase instead of purchase intentions [38]. As mentioned earlier, proxies like sales rank [37] have limited business application, and hence it is prudent to focus on retail sales data to develop insights about purchase behavior. We use actual purchase data from a platform and show that it is possible to predict purchase with a high accuracy using DNN.
Secondly, this study compares the usage of multiple advanced analytical techniques for prediction of purchases on an online platform. Previous studies have asserted the importance of advanced analytical techniques for predictive purposes [38]. We adopted a comparative approach to examine the efficacy of DL and ML for the dataset. This examination further supports other studies that have demonstrated the higher predictive power of DL for large datasets [65]. As a result, it provides further empirical evidence to support the superior predictive capabilities of DL in such a context.
Finally, DL has been used in domains like healthcare [41] and sales prediction [34] but not in the area of online retailing. Our research has shown that the DNN was the best performing analytical technique that offered high predictive power for purchases on an online shopping platform. The popular DT technique was found to be close to it in terms of accuracy. Future research can use these insights when using multiple techniques for research in this and other similar domains.
Apart from these contributions, this study advances the extant debate on improving prediction for e-commerce sales. For example, some interesting results from this study, like the significance of user account age, rather than the number of past transactions, for consumers’ purchases, open up new debates on customer loyalty and platform engagement. This needs to be addressed through intensive studies in the future about the impact of customer loyalty (specifically relating to past purchases) on future purchases.

7.2. Managerial implications

Based on the identification of significant predictors of online purchases, we observe that platform designers should choose to design the online platform for quicker purchase when the competition from other channels and competitive options is high. In such cases, platform artefacts supporting higher engagement may not be a smart choice as they may not lead to a positive purchase decision. Our results also indicate that account age impacts the purchase decision. Hence, if the competition is not very high, awareness of the product is low, and the cost of
engagement is not very high, it will be beneficial for businesses to [2] W.W. Moe, Buying, searching, or browsing: differentiating between online
shoppers using in-store navigational clickstream, J. Consum. Psychol. 13 (2003)
develop long term relationship with all users on the platform.
29–39, https://doi.org/10.1207/S15327663JCP13-1&2_03.
Finally, our use of customers’ historical data to improve the pre [3] A.E. Schlosser, T.B. White, S.M. Lloyd, Converting web site visitors into buyers:
diction of purchases on online e-commerce platforms, also accentuates how web site investment increases consumer trusting beliefs and online purchase
the need to invest in appropriate infrastructure to control the quality and intentions, J. Mark. 70 (2006) 133–148, https://doi.org/10.1509/jmkg.70.2.133.
[4] V. Kumar, G. Ramani, T. Bohling, Customer lifetime value approaches and best
veracity of sales data. Currently, businesses capture data for a large practice applications, J. Interact. Mark. 18 (2004) 60–72, https://doi.org/
number of variables and this often raises various privacy concerns. Also, 10.1002/dir.20014.
such unorganized data collection strains the organisation’s computa [5] D. Van Den Poel, W. Buckinx, Predicting online-purchasing behaviour, Eur. J.
Oper. Res. 166 (2005) 557–575, https://doi.org/10.1016/j.ejor.2004.04.022.
tional resources. Businesses can use the results from this study to [6] L.F. Jamieson, F.M. Bass, Adjusting stated intention measures to predict trial
streamline their data collection practices and focus only on collecting purchase of new products: a comparison of models and methods, J. Mark. Res. 26
those data items that can directly predict and influence the customers’ (1989) 336, https://doi.org/10.2307/3172905.
[7] T.R. Rao, Consumer’s purchase decision process: stochastic models, J. Mark. Res. 6
purchases. (1969) 321, https://doi.org/10.2307/3150138.
[8] S. Karimi, K.N. Papamichail, C.P. Holland, The effect of prior knowledge and
8. Conclusion decision-making style on the online purchase decision-making process: a typology
of consumer shopping behaviour, Decis. Support. Syst. 77 (2015) 137–147, https://
doi.org/10.1016/j.dss.2015.06.004.
This study examined predicted the actual purchase behavior of cus [9] C.K. Prahalad, V. Ramaswamy, Co-creation experiences: the next practice in value
tomers on an online platform as a consequence of their interactions with creation, J. Interact. Mark. 18 (2004) 5–14, https://doi.org/10.1002/dir.20015.
[10] M. Brown, N. Pope, K. Voges, Buying or browsing? An exploration of shopping
the online platform. For this, it used a unique anonymized web browsing
dataset comprising historical purchase data along with data related to
customer engagement with the online e-commerce platform. The study
identified two distinct sets of variables, i.e., platform engagement and
customer characteristics, as key predictors of purchases. Out of these
two categories, we identified four variables, namely the time and day of
the week when a session began, duration of a customer's association
with the platform, and days elapsed since last purchase, as the most
significant contributors to accurate prediction of a purchase decision.
Additionally, the results showed that the DNN outperformed other ML
techniques when applied to the same dataset. Retailers and e-commerce
platform designers can use these findings and integrate them into their
existing recommendation engines to improve their predictive accuracy. One
way to use these findings in such retail IT systems would be to assign
greater weight to these variables in the existing recommendation
engines. These often-ignored platform engagement and customer
characteristic variables should improve the accuracy of predictions
without major disruption to ongoing operations. These findings
will help improve purchase prediction using DNN on similar platforms.
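As an illustration of the four variables named above, the following is a minimal sketch of how they might be derived from a single session record. The record layout (`Session`, `started_at`, `customer_since`, `last_purchase_at`) is hypothetical, since the study does not publish its schema, and measuring the duration of association in days since a customer's first contact with the platform is an assumption made here for concreteness.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Session:
    # Hypothetical fields; not taken from the study's dataset.
    started_at: datetime                  # when the web session began
    customer_since: datetime              # first recorded contact with the platform
    last_purchase_at: Optional[datetime]  # None if the customer has never purchased


def engagement_features(s: Session) -> dict:
    """Derive the four predictors discussed in the text from one session."""
    if s.last_purchase_at is None:
        days_since_purchase = None
    else:
        days_since_purchase = (s.started_at - s.last_purchase_at).days
    return {
        "start_hour": s.started_at.hour,            # time of day the session began
        "start_weekday": s.started_at.weekday(),    # day of week, Monday = 0
        "association_days": (s.started_at - s.customer_since).days,
        "days_since_last_purchase": days_since_purchase,
    }


example = Session(
    started_at=datetime(2021, 6, 15, 14, 30),
    customer_since=datetime(2020, 6, 15),
    last_purchase_at=datetime(2021, 6, 1, 9, 0),
)
features = engagement_features(example)
```

Features of this kind could then be passed to an existing recommendation engine or classifier; how much weight they should receive is a tuning decision that this sketch does not address.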
Despite these insights, this study has some limitations. First, this
study is situated in the context of retail sales on a particular e-commerce
platform. While the dataset is sufficiently large to allow the use of ML and
DL techniques for predicting purchase behavior, a much larger dataset
might require some tweaking of the model to further improve its
predictive capabilities. Second, the data represent customer behavior
in a European online shopping context in a particular product space.
Although representative of the context, the findings might not be
generalizable across different customer demographics and product
types. Further studies across different purchase contexts would be
needed to improve the generalizability of the findings. Additionally,
such future studies could allow us to analyze the findings and improve
them by statistically comparing the performance of various predictive
models. Third, this study was not able to compare the predictive
accuracy of the model in real time as the data were generated. This would
have allowed us to provide just-in-time recommendations to improve
retail sales and would have had far greater practical implications. We hope
that future studies can adopt such combined approaches to generate
real-time recommendations with strong business implications.
Finally, future research can also explore the development of a
deep learning-based rule extraction method with applications in related
situations and compare its performance with the existing benchmarks.
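One way such a statistical comparison of predictive models could be carried out is with McNemar's test, which compares two classifiers evaluated on the same test sessions. This is a minimal sketch of the exact (binomial) form of the test and is not the procedure used in this study; the function name `mcnemar_exact` and the toy labels below are illustrative assumptions.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on the same samples.

    Returns (b, c, p_value), where b counts samples that only model A
    classified correctly and c counts samples that only model B
    classified correctly.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data: model A is right on all 10 sessions, model B misses the 6 purchases.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
pred_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)  # b = 6, c = 0, p_value = 0.03125
```

Because the test uses only the sessions on which the two models disagree, it suits paired comparisons on a shared hold-out set. Comparing more than two models would require a multiple-comparison correction or a different test.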
Author Statement

The authors do not wish to include any author contribution statement
for the paper.
References [30] W.S. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous
activity, The Bulletin of Mathematical Biophysics. 5 (1943) 115–133, https://doi.
[1] W.W. Moe, P.S. Fader, Dynamic conversion behavior at e-commerce sites, Manag. org/10.1007/BF02478259.
Sci. 50 (2004) 326–335, https://doi.org/10.1287/mnsc.1040.0153.
9
N. Chaudhuri et al. Decision Support Systems xxx (xxxx) xxx
[31] S. Lessmann, B. Baesens, H.V. Seow, L.C. Thomas, Benchmarking state-of-the-art [50] S. Boughorbel, F. Jarray, M. El-Anbari, Optimal classifier for imbalanced data using
classification algorithms for credit scoring: an update of research, Eur. J. Oper. Res. Matthews Correlation Coefficient metric, PLoS ONE. 12 (2017). doi:https://doi.org
247 (2015) 124–136, https://doi.org/10.1016/j.ejor.2015.05.030. /10.1371/journal.pone.0177678.
[32] M. Chau, H. Chen, A machine learning approach to web page filtering using [51] D. Chicco, G. Jurman, The advantages of the Matthews correlation coefficient
content and structure analysis, Decis. Support. Syst. 44 (2008) 482–494, https:// (MCC) over F1 score and accuracy in binary classification evaluation, BMC
doi.org/10.1016/j.dss.2007.06.002. Genomics 21 (2020) 6, https://doi.org/10.1186/s12864-019-6413-7.
[33] S. Dreiseitl, L. Ohno-Machado, Logistic regression and artificial neural network [52] H. Larochelle, Y. Bengio, J. Louradour, P. Lamblin, Exploring strategies for training
classification models: a methodology review, J. Biomed. Inform. 35 (2002) deep neural networks, J. Mach. Learn. Res. 10 (2009) 1–40, https://doi.org/
352–359, https://doi.org/10.1016/S1532-0464(03)00034-0. 10.1145/1577069.1577070.
[34] A.L.D. Loureiro, V.L. Miguéis, L.F.M. da Silva, Exploring the use of deep neural [53] Y. Lecun, Y. Bengio, G. Hinton, Deep learning, Nature. 521 (2015) 436–444,
networks for sales forecasting in fashion retail, Decis. Support. Syst. 114 (2018) https://doi.org/10.1038/nature14539.
81–93, https://doi.org/10.1016/j.dss.2018.08.010. [54] B. Kim, J. Park, J. Suh, Transparency and accountability in AI decision support:
[35] M. Korpusik, S. Sakaki, F. Chen, Y.Y. Chen, Recurrent neural networks for customer Explaining and visualizing convolutional neural networks for text information,
purchase prediction on Twitter, in: CEUR Workshop Proceedings, 2016: pp. 47–50. Decision Support Systems. 134 (2020). doi:https://doi.org/10.1016/j.dss.2020.11
[36] M. Mousavizadeh, D.J. Kim, R. Chen, Effects of assurance mechanisms and 3302.
consumer concerns on online purchase decisions: an empirical study, Decis. [55] C. Strobl, A.L. Boulesteix, T. Kneib, T. Augustin, A. Zeileis, Conditional variable
Support. Syst. 92 (2016) 79–90, https://doi.org/10.1016/j.dss.2016.09.011. importance for random forests, BMC Bioinformatics. 9 (2008) 307, https://doi.org/
[37] C. Koçaş, C. Akkan, A system for pricing the sales distribution from blockbusters to 10.1186/1471-2105-9-307.
the long tail, Decis. Support. Syst. 89 (2016) 56–65, https://doi.org/10.1016/j. [56] I. Iguyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach.
dss.2016.06.008. Learn. Res. 3 (2003) 1157–1182, https://doi.org/10.1162/153244303322753616.
[38] X. Hu, Q. Huang, X. Zhong, R.M. Davison, D. Zhao, The influence of peer [57] J.B. Yang, K.Q. Shen, C.J. Ong, X.P. Li, Feature selection via sensitivity analysis of
characteristics and technical features of a social shopping website on a consumer’s MLP probabilistic outputs, in: Proceedings of the 2008 IEEE International
purchase intention, Int. J. Inf. Manag. 36 (2016) 1218–1230, https://doi.org/ Conference on Systems, Man and Cybernetics, 2008: pp. 774–779. doi:https://doi.
10.1016/j.ijinfomgt.2016.08.005. org/10.1109/ICSMC.2008.4811372.
[39] G.A. Morgan, K.C. Barrett, N.L. Leech, G.W. Gloeckner, SPSS for introductory [58] P. Zhang, A novel feature selection method based on global sensitivity analysis
statistics: Use and interpretation (2004), https://doi.org/10.4324/ with application in machine learning-based prediction model, Applied Soft
9780429287657. Computing Journal. 85 (2019) In Press. doi:https://doi.org/10.1016/j.asoc.20
[40] E. Zinovyeva, W.K. Härdle, S. Lessmann, Antisocial online behavior detection using 19.105859.
deep learning, Decision Support Systems. 137 (2020). doi:https://doi.org/10.101 [59] J. Nascimento, W. Powell, Dynamic programming models and algorithms for the
6/j.dss.2020.113362. mutual fund cash balance problem, Manag. Sci. 56 (2010) 801–815, https://doi.
[41] H.M. Zolbanin, B. Davazdahemami, D. Delen, A.H. Zadeh, Data analytics for the org/10.1287/mnsc.1100.1143.
sustainable use of resources in hospitals: predicting the length of stay for patients [60] Y. He, X. Zhang, J. Sun, Channel pruning for accelerating very Deep Neural
with chronic diseases, Information and Management. In Press (2020), https://doi. Networks, in: Proceedings of the IEEE International Conference on Computer
org/10.1016/j.im.2020.103282. Vision, 2017: pp. 1398–1406. doi:https://doi.org/10.1109/ICCV.2017.155.
[42] H. Ahady Dolatsara, Y.J. Chen, C. Evans, A. Gupta, F.M. Megahed, A two-stage [61] N. Chaudhuri, I. Bose, Exploring the role of deep neural networks for post-disaster
machine learning framework to predict heart transplantation survival probabilities decision support, Decision Support Systems. 130 (2020). doi:https://doi.
over time with a monotonic probability constraint, Decision Support Systems. 137 org/10.1016/j.dss.2019.113234.
(2020). doi:https://doi.org/10.1016/j.dss.2020.113363. [62] A. Martínez, C. Schmuck, S. Pereverzyev, C. Pirker, M. Haltmeier, A machine
[43] Y. Guan, Q. Wei, G. Chen, Deep learning based personalized recommendation with learning framework for customer purchase prediction in the non-contractual
multi-view information integration, Decis. Support. Syst. 118 (2019) 58–69, setting, Eur. J. Oper. Res. 281 (2020) 588–596, https://doi.org/10.1016/j.
https://doi.org/10.1016/j.dss.2019.01.003. ejor.2018.04.034.
[44] Y. Rao, H. Xie, J. Li, F. Jin, F.L. Wang, Q. Li, Social emotion classification of short [63] B.J.D. Jacobs, B. Donkers, D. Fok, Model-based purchase predictions for large
text via topic-level maximum entropy model, Inf. Manag. 53 (2016) 978–986, assortments, Mark. Sci. 35 (2016) 389–404, https://doi.org/10.1287/
https://doi.org/10.1016/j.im.2016.04.005. mksc.2016.0985.
[45] Y. Bengio, Gradient-based optimization of hyperparameters, Neural Comput. 12 [64] X. Lu, S. He, S. Lian, S. Ba, J. Wu, Is user-generated content always helpful? The
(2000) 1889–1900, https://doi.org/10.1162/089976600300015187. effects of online forum browsing on consumers’ travel purchase decisions, Decision
[46] K.Q. Shen, C.J. Ong, X.P. Li, E.P.V. Wilder-Smith, Feature selection via sensitivity Support Systems. 137 (2020). doi:https://doi.org/10.1016/j.dss.2020.113368.
analysis of SVM probabilistic outputs, Mach. Learn. 70 (2008) 1–20, https://doi. [65] P.J. Danaher, T.S. Danaher, M.S. Smith, R. Loaiza-Maya, Advertising effectiveness
org/10.1007/s10994-007-5025-7. for multiple retailer-brands in a multimedia and multichannel environment,
[47] S. Lee, J.Y. Choeh, Predicting the helpfulness of online reviews using multilayer J. Mark. Res. 57 (2020) 445–467, https://doi.org/10.1177/0022243720910104.
perceptron neural networks, Expert Syst. Appl. 41 (2014) 3041–3046, https://doi. [66] A. Poddar, N. Donthu, Y. Wei, Web site customer orientations, web site quality, and
org/10.1016/j.eswa.2013.10.034. purchase intentions: the role of web site personality, J. Bus. Res. 62 (2009)
[48] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a 441–450, https://doi.org/10.1016/j.jbusres.2008.01.036.
simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 [67] G. Zhu, Z. Wu, Y. Wang, S. Cao, J. Cao, Online purchase decisions for tourism e-
(2014) 1929–1958. commerce, Electronic Commerce Research and Applications. 38 (2019). doi:https
[49] D.M.W.D. Powers, Evaluation: from precision, recall and f-factor to ROC, ://doi.org/10.1016/j.elerap.2019.100887.
informedness, markedness & correlation, Journal of Machine Learning
Technologies. 2 (2011) 37–63. doi:10.1.1.214.9232.
10