RP3656


DOES MOBILE APP FAILURE IMPACT ONLINE AND IN-STORE SHOPPING?

Unnati Narang
Venkatesh Shankar
Sridhar Narayanan

July 2021

* Unnati Narang (unnati@illinois.edu) is Assistant Professor of Marketing, University of Illinois, Urbana-Champaign; Venkatesh Shankar (vshankar@mays.tamu.edu) is Professor of Marketing and Coleman Chair in Marketing and Director of Research, Center for Retailing Studies at the Mays Business School, Texas A&M University; and Sridhar Narayanan (sridhar.narayanan@stanford.edu) is an Associate Professor of Marketing at the Graduate School of Business, Stanford University. We thank the participants at the ISMS Marketing Science conference, the UTDFORMS conference, and research seminar participants at the University of California, Davis, the University of Toronto, the University of Illinois, Urbana-Champaign, and the University of Texas at Austin for valuable comments.

Abstract

Mobile devices account for a majority of transactions between shoppers and marketers. Branded
retailer mobile apps have been shown to increase purchases across channels (e.g., online and
brick-and-mortar stores). However, mobile apps are also prone to failures or disruptions in use.
Does a failure in an omnichannel retailer’s branded app impact shoppers’ purchases? Are there
cross-channel effects of these failures? Does the impact of failure vary across shoppers? These
questions, although important, are challenging to answer because field experiments inducing
failures are infeasible, and observational data suffer from selection issues. We identify a natural
experiment involving an exogenous two-hour failure in a large omnichannel retailer’s mobile app
during which the app became unavailable to all the shoppers. We examine the short-term impact
of app failures on purchases in both online and store channels using a difference-in-differences
approach. We investigate two potential mechanisms behind these effects, channel substitution
and brand preference dilution. We also analyze shopper heterogeneity in the effects based on
shoppers’ relationship with the retailer, past digital channel use, and other characteristics
identified using machine-learning approaches. Our analysis reveals that an app failure has a
significant overall negative effect on shoppers’ frequency, quantity, and monetary value of
purchases across channels. The effects are heterogeneous across channels and by shoppers. The
decreases in purchases across channels are driven by purchase reductions in stores and not in the
online channel. The fall in purchases in brick-and-mortar stores is consistent with the brand
preference dilution mechanism, whereas the preservation of purchases in the online channel is in
line with the channel substitution mechanism. Furthermore, shoppers with a higher monetary
value of past purchases and less recent purchases are less sensitive to app failures. The results
suggest that app failures lead to an annual revenue loss of about $0.97-$1.36 million for the
retailer in our data. About 47% of shoppers account for about 70% of the loss. We outline targeted
failure prevention and service recovery strategies that retailers could employ.

Keywords: service failure, mobile marketing, mobile app, retailing, omnichannel, difference-in-differences, natural experiment, causal effects

1. Introduction

Mobile commerce has seen tremendous growth in recent years, with mobile devices accounting for a majority of interactions between shoppers and marketers. This growth has accelerated through the rapid increase of smartphone penetration – about 6.1 billion people (41.5% of the global population) used smartphones in 2020.1 Mobile applications (henceforth, apps) have emerged as an important channel for retailers as they have been found to increase engagement and purchases across channels (e.g., Kim et al. 2015, Xu et al. 2017, Narang and Shankar 2019). Purchases made through retail apps grew by 54% during the recent COVID-19 pandemic (Retail Dive 2020). While retailers have widely embraced mobile apps, there is little understanding about how service failures in mobile apps affect shopper behavior. In this study, we empirically examine the impact of such failures on shopper behaviors in an omnichannel context, specifically how it varies across channels and shoppers. Causal measurement of the effect of app failures is challenging, but we address the identification problem by using a novel natural experiment where failures varied exogenously across users.

The study of failures in mobile apps is important because apps are highly vulnerable to failures. The diversity of mobile operating systems (e.g., iOS, Android), devices (e.g., mobile phone and tablet), and versions of hardware and software and their constant use across a variety of mobile networks often result in app failures. Failures in a retailer's mobile app have the potential to negatively affect shoppers' engagement with the app and their future purchases in the online channel. In addition, app failures may have spillover effects in other channels due to both substitution of purchases across channels and dilution of preference for the retailer brand. Understanding how failures affect purchases in different channels is important for retailers.

1 Source: Statista report on smartphone penetration (https://tinyurl.com/b6ajrmyc) last accessed June 2, 2021.

Preventing and recovering from app failures is critical for managers because more than 60% of shoppers abandon an app after experiencing failure(s) (Dimensional Research 2015). App crashes are among the leading causes of mobile failures, accounting for 65% of all iOS failures (Blancco 2016). In 2020, several major consumer and ecommerce apps reported large volumes of crashes due to software issues (Bugsnag 2020). About 2.6% of all app sessions result in a crash, suggesting about 1.5 billion app failures across 60 billion app sessions annually (Computerworld 2014). Given the extent of these app failures and their potential damage to firms' relationships with customers, determining the impact of app failures is important for formulating preventive and recovery strategies.

Despite the importance of app failures, not much is known about their impact on purchases. While app crashes in a shopper's mobile device have been shown to negatively influence app engagement (e.g., restart time, browsing duration, and activity level, Shi et al. 2017), the relationship between app failures and subsequent purchases has not been studied. Furthermore, a large proportion of shoppers use both online (desktop and mobile websites) and offline (brick-and-mortar) retail channels. However, we do not know much about the impact of app failures on purchases by channel, including spillovers across channels.

The effect of app failures on subsequent purchases is an empirical question because different mechanisms can lead to different outcomes. On the one hand, multiple shopping channels allow shoppers to substitute channels in case of a failure, mitigating any negative impact of the failure and even potentially resulting in a positive effect of the failure on purchases in other channels (channel substitution effect). On the other hand, an app failure might cause shoppers to evaluate the brand adversely, leading to a negative impact of the failure on purchases across channels (brand preference dilution effect). These two mechanisms may coexist both across and within shoppers. Therefore, the sign of the net effect of failures on purchases in other channels depends on the magnitudes of the effects induced by these two mechanisms. Retailers would benefit from a deeper understanding of these mechanisms for devising strategies to deal with app failures.

The effects of app failure may also differ across shoppers. Shoppers may be more or less negatively impacted by failures depending on factors such as shoppers' relationship with the firm (Goodman et al. 1995, Hess et al. 2003, Chandrashekaran et al. 2007, Knox and van Oest 2014, Ma et al. 2015) and shoppers' prior use of the firm's digital channels (Cleeren et al. 2013, Liu and Shankar 2015, Shi et al. 2017). It is important for managers to better understand how the effects of failure vary across shoppers so that they can devise targeted preventive and recovery strategies.

Our study quantifies and explains the impact of a failure in a retailer's branded app on the frequency, quantity, and monetary value of purchases in online and offline channels. We address four research objectives:

• What are the effects of a failure in a retailer's branded mobile app on the frequency, quantity, and monetary value of subsequent purchases by the shoppers?
• How do these effects vary by channel, i.e., in the online and in-store channels of the retailer?
• What possible mechanisms can explain the effects of an app failure?
• How do these effects vary by shoppers, i.e., by their past relationship with the firm, prior channel use, and other shopper characteristics?

Estimation of the causal effects of an app failure on shopping outcomes is challenging. The gold standard among the methods available to uncover the causal impact of service failures is a randomized field experiment. However, such an experiment would be impractical in this context because, for ethical reasons, a retailer is unlikely to deliberately induce failures in an app even for a subset of its shoppers. An observational study is a viable alternative, but it has to surmount the potential endogeneity of app failures, which may occur at different times for different app users. Although endogeneity could be addressed through an instrumental variables approach, it is hard to come up with instrumental variables that are valid and exhibit sufficient variation.

We overcome the estimation challenges and mitigate the potential endogeneity of app failures by exploiting a natural experiment involving a two-hour systemwide (affecting all app users who attempted to use the app) exogenous failure in a large omnichannel retailer's (similar to Walmart and Macy's) mobile app to estimate the short-term effect of the app failure. Conditional on signing in on the day of the failure, whether a user experienced a failure or not was a function of whether they attempted to use the app during the time window of the failure, which they could not have anticipated in advance. We verify that there are no systematic differences between users who experienced failures vs. those who did not. We take advantage of the resulting quasi-randomness in the incidence of failure to estimate the short-term effects of the app failure. We use a difference-in-differences (DID) approach that compares the pre- and post-failure outcomes for the failure experiencers with those of failure non-experiencers over the 14 days before and after the failure.

We explore the two potential mechanisms behind the effects of the app failure, channel substitution and brand preference dilution. We investigate the heterogeneity in the effects of failure on shopping behavior by exploiting the panel nature of our dataset. We test for the effects separately in online and offline channels. We also examine the moderating effects of factors such as relationship with the firm and prior digital channel use on the effects of failure. Prior research (e.g., Ma et al. 2015, Hansen et al. 2018) has explored these factors for services in general but not in the digital or mobile app contexts. In addition, we recover the heterogeneity of effects at the individual level using data-driven machine learning methods.

Our results show that a failure in the branded app of a large retailer has a significant overall negative effect on shoppers' frequency, quantity, and monetary value of purchases across channels, but the effects are heterogeneous across channels and shoppers. Purchases in stores decline significantly, but those in the online channel do not. Our analyses suggest that the fall in purchases in stores is consistent with the brand preference dilution mechanism, whereas the preservation of purchases in the online channel is in line with the channel substitution mechanism. Furthermore, shoppers with a higher monetary value of past purchases and less recent purchases are less sensitive to app failures. Finally, about 47% of these shoppers account for about 70% of the losses in annual revenues, which amount to $0.97-$1.36 million.

Our research contributes to the literature by: (1) quantifying the effects of app failure on multiple purchase outcomes such as frequency, quantity, and monetary value of purchases; (2) examining the impact of app failure in different channels, including channel spillover effects; (3) exploring the mechanisms behind the observed effects; and (4) uncovering the moderators of the effects of app failure on purchases and the heterogeneity in effects across shoppers. These novel characteristics of our study contribute to the research streams on service marketing, channel choice, and mobile apps.

In the remainder of the paper, we first discuss the related literature. Next, we discuss our data and empirical setting. Subsequently, we describe our empirical strategy, lay out and test the key identification strategy, and conduct our empirical analysis. We estimate the effect of an app failure across all channels and for each channel, examine the mechanisms underlying the effects, and assess the heterogeneity of effects by shopper characteristics. We perform several robustness checks to rule out alternative explanations. We conclude by discussing the implications of our results for managers.

2. Related Literature

2.1. Services Marketing and Service Failures

In recent years, technology-enabled services have risen in importance, leading to important shifts (Dotzel et al. 2013). First, services that can be delivered without human or interpersonal interaction have grown tremendously. Online and mobile retailing no longer require shoppers to interact with human associates to make purchases. Second, closely related to this idea is the fact that services are increasingly powered by technologies such as mobile apps that allow anytime-anywhere access and convenience. Third, recent events such as the COVID-19 pandemic have made it necessary for services to be delivered with little physical contact with sales associates, boosting consumer adoption of technology-driven solutions.

With growing reliance on technologies for service delivery and the complexity of the technology environment in which these services are delivered, service failures are attracting greater attention. A service failure can be defined as service performance that falls below customer expectations (Hoffman and Bateson 2001). Service failures are widespread and are expensive to mend. Service failures resulting from deviations between expected and actual performance damage customer satisfaction and brand preference (Smith and Bolton 1998). Post-failure satisfaction tends to be lower even after a successful recovery and is further negatively impacted by the severity of the initial failure (Andreassen 1999, McCollough et al. 2000). In interpersonal service encounters, human interactions and employee behaviors influence both failure effect and recovery (Bitner et al. 1990, Meuter et al. 2000). In technology-based encounters, such as those in e-tailing and with self-service technologies (e.g., automated teller machines [ATMs]), the opportunity for human interaction is typically small after experiencing failure (Forbes et al. 2005, Forbes 2008). However, there may be significant heterogeneity in how consumers react to service failures (Halbheer et al. 2018).

The mobile context, particularly mobile apps, differs from interpersonal or other self-service technology contexts, so it is difficult to predict the direction and extent of the impact of an app failure on shopping outcomes. First, mobile apps are accessible at any time and in any location through an individual's mobile device. On the one hand, because a shopper can tap, interact, engage, or transact multiple times at little additional cost on a mobile app, the shopper may treat any one service failure as acceptable without significantly altering her subsequent shopping outcomes. Such an experience differs from that with a self-service technological device such as an ATM, which may need the shopper to travel to a specific location or incur other hassle costs that may not exist in the mobile app context. On the other hand, the costs of switching to a competitor are also much lower in the mobile app context, where a typical shopper uses and compares multiple apps. Thus, a service failure in any one app may aggravate the shopper's frustration with the app, leading to strong negative effects on outcomes such as purchases from the relevant app provider.

Second, a mobile app is one of the many touchpoints available to shoppers in today's omnichannel shopping environment. Thus, a shopper who experiences a failure in the app could move to the web-based channel or even the offline or store channel. In such cases, the impact of a failure on the app could be zero or even positive (if the switch to the other channel leads to greater engagement of the shopper with the retailer). By contrast, if the channels act as complements (e.g., if the shopper uses one channel for researching products and another for purchasing) or if the failure impacts the preference for the retailer brand, a failure in one channel could impede the shopper's engagement in other channels. Thus, it is difficult to predict the effects of app failure, in particular, how they might spill over to other channels.

2.2. Channel Choice and Channel Migration

A shopper’s experience in one channel can influence their behavior in other channels. Prior research on cross-channel effects is mixed, showing both substitution and complementarity effects, leading to positive and negative synergies between channels (e.g., Avery et al. 2012, Pauwels and Neslin 2015). The relative benefits of channels determine whether shoppers continue using existing channels or switch to a new channel (Ansari et al. 2008, Chintagunta et al. 2012). When a bricks-and-clicks retailer opens an offline store or an online-first retailer opens an offline showroom, its offline presence drives sales in online stores (Wang and Goldfarb 2017, Bell et al. 2018).2 This is particularly true for shoppers in areas with low brand presence prior to store opening and for shoppers with an acute need for the product. However, the local shoppers may switch from purchasing online to offline after an offline store opens, even becoming less sensitive to online discounts (Forman et al. 2009). In the long run, the store channel shares a complementary relationship with the Internet and catalog channels (Avery et al. 2012).

While the relative benefits of one channel may lead shoppers to buy more in other channels, the costs associated with one channel may also have implications for purchases beyond that channel. In a truly integrated omnichannel retailing environment, the distinctions between physical and online channels blur, with the online channel representing a showroom without walls (Brynjolfsson et al. 2013). Mobile technologies are at the forefront of these shifts. More than 80% of shoppers use a mobile device while shopping even inside a store (Google M/A/R/C Study 2013). As a result, if there are substantial costs associated with using a mobile channel (e.g., those induced by app failures), such costs may spill over to other channels. If shoppers use the different channels in complementary ways, the disruption of one of those channels could negatively impact their engagement with the other channels as well. However, if shoppers treat the channels as substitutes, failures in one channel may drive the shoppers to purchase in another channel. If an app failure dilutes shoppers' preference for the retailer brand, it may lead to negative consequences across channels. Overall, the direction of the effect of app failures on outcomes in other channels such as brick-and-mortar stores and online channels depends on which of these competing and potentially co-existing mechanisms is dominant.

2 A bricks-and-clicks retailer is a retailer with both offline (“bricks”) and online (“clicks”) presence.

2.3. Mobile Apps

The nascent but evolving research on mobile apps shows positive effects of mobile app channel introduction and use on engagement and purchases in other channels (Kim et al. 2015, Xu et al. 2017, Narang and Shankar 2019) and on coupon redemptions (Fong et al. 2015, Andrews et al. 2016, Ghose et al. 2019) under different contingencies, as well as privacy tradeoffs of paid apps (Kummer et al. 2019).

To our knowledge, only one study has examined the effect of crashes in a mobile app on shoppers' app use. Shi et al. (2017) find that while crashes have a negative impact on future engagement with the app, this effect is lower for those with greater prior usage experience and for less persistent crashes. However, while they look at subsequent engagement of the shoppers with the mobile app, they do not examine purchases. Thus, our research adds to Shi et al. (2017) in several ways. First, we focus on estimating the causal effects of failure. To this end, we exploit the random variation in failures induced by systemwide failures. Second, we quantify the value of app failure's effects on subsequent purchases. The outcomes we study include the frequency, quantity, and value of purchases, while the key outcome in that study is app engagement. Third, we examine the cross-channel effects of mobile app failures, including in physical stores, while Shi et al. (2017) study subsequent engagement with the app provider only within the app. Finally, we explore the mechanisms behind the effects of failure, examine the moderating effects of relationship with the retailer and prior digital channel use, and analyze heterogeneity in shoppers' sensitivity to failures using a machine learning approach.

3. Research Setting and Data

3.1. Research Setting

We obtained the dataset for our empirical analysis from a large U.S.-based retailer. The retailer sells a variety of products, including software such as video games and hardware such as video game consoles and controllers, downloadable content, consumer electronics, and wireless services to 32 million customers. The gaming industry is large ($160 billion revenues in 2020), and the retailer is a major player in this industry, offering a rich setting. The retailer has a large offline presence similar to Walmart, PetSmart, or any other primarily brick-and-mortar chain with an omnichannel strategy. The retailer has a store network comprising 4,175 brick-and-mortar stores across the U.S. Additionally, it has a large ecommerce website and a mobile app that is the focus of our study.

The app allows shoppers to browse the retailer's product catalog, get deals, locate nearby stores, check loyalty points, learn about the latest product launches, check out product reviews, order online through a mobile browser, and make purchases through the app itself. The app is typical of mobile apps of large retailers (e.g., PetSmart, Costco) in features and shopper interactions. Figure 1 shows some screenshots from the app.

< Figure 1 about here >

The online and offline channel sales mix of the retailer in our data is typical of most large retailers. About 76% of the total sales for the top 100 largest retailers in the U.S. are from similar retailers with a store network of 1,000 or more stores (National Retail Federation 2018). Most of these large retailers have a predominant brick-and-mortar presence and a growing online presence. For example, Walmart's online revenues constitute 7.6% of all revenues, 1.3% of all PetSmart's sales come from the online channel, and Home Depot generates 6.8% of all revenues from ecommerce.3 For the retailer in our data, online sales comprised 10.2% of overall revenues, somewhat higher than that for similar large retailers. Furthermore, about 26% of the shoppers bought online in the 12 months before the failure event we study. The retailer's online sales displayed a 13% annual average growth in the last five years, similar to these retailers, who also exhibited double-digit growth (Barron's 2018). Its annual online sales revenues are also substantial at $1.1 billion. Therefore, the cross-channel effects of any app-related events or interventions are important for this retailer, similar to other retailers. Our research context offers a rich setting to examine the effects of a mobile app failure for a multi-channel retailer with a large store network.

3.2. Data and Sample

Our goal is to quantify the impact of an app failure in a retailer's branded mobile app. We leverage unique failure shocks that arise from server errors in the backend system of the firm and cause a failure, making the app unavailable for 2-5 hours. Our focal retailer's app experienced an exogenous systemwide failure about every 7-10 weeks during 2014-2018. We are able to identify and collect data on two such exogenous systemwide failure shocks, one in 2014 and the other in 2018. We use the data on the 2018 failure as our main sample because the app had a fully functional in-app shopping feature. This failure occurred on April 11, 2018 and lasted for two hours between 12pm and 2pm. The firm provided us with mobile app use data and transactional data across all channels for all the app users who accessed the app on the failure day; 70,568 experienced the failure, while 66,121 did not.

3 Source: eMarketer Retail, https://retail-index.emarketer.com/

The app data capture events that shoppers experience in the app, along with their timestamps. These data recorded the exogenous app failure event as a ‘server error’ at the time when users tried to access the app if they experienced this failure. This event represents an exogenous app breakdown for all the users, and the data allow us to identify shoppers who logged in during the failure event window and experienced the systemwide app failure. We also observe their purchases in stores and online. The online channel represents purchases at the retailer's website, including the mobile browser.

Table 1 provides the descriptive statistics for the variables of interest. Over a period of 14 days pre- and post-failure, shoppers make an average of a little less than one purchase comprising about 1.6 items for a value of about $43. In the 12 months before the failure, on average, shoppers make purchases worth $623 and buy 0.66 times in the online channel. Overall, 52% of the shoppers experience the failure during our focal failure event.

< Table 1 about here >

4. Empirical Strategy

4.1. Empirical Strategy

Our empirical strategy is to leverage the exogenous systemwide failure in the app to estimate the effect of app failure on shopping outcomes. The main idea behind our empirical approach is that conditional on signing in on the day of the failure, whether a user experienced a failure or not was a function of whether they attempted to use the app during the time window of the failure, which they could not have anticipated in advance. Furthermore, the time window is such that there is no systematic difference between users who signed in during this period vs. those who signed in at other times during the day. This exogeneity in failures allows us to compare those who experienced the failure (and thus could not use the app) with those who did not among the shoppers who attempted to use the app on the day of the failure. We examine this assumption in the data by testing for balance between shoppers who experience a failure and those who do not, using a set of pre-failure variables. To determine the treatment effect of a failure, we conduct a DID analysis, comparing the post-failure behaviors with the pre-failure behaviors of shoppers who logged in on the day of the failure and experienced it (treatment group) relative to those who logged in on that day but did not experience the failure (control group).

To analyze the treatment effects within and across channels, we repeat this analysis with the same outcome variables separately for the offline and online channels. To understand the underlying mechanisms for the effects, we examine two explanations, brand preference dilution and channel substitution, using the data on shoppers' closeness to purchase (based on their app use and location at the time of failure) and their time to next purchase to check for consistency with these mechanisms. To analyze heterogeneity in treatment effects by shopper, we first perform a moderator analysis using a priori factors identified in the literature, such as prior relationship strength and digital channel use, followed by a data-driven machine learning (causal forest) approach to fully explore all sources of heterogeneity across shoppers. Finally, we carry out multiple robustness checks.

4.2. Exogeneity of Failure Shock

To verify that there is no systematic difference between shoppers who experience the failure shock in the app and those who do not, we examine three types of evidence. First, we present plots of the behavioral trends in shopping for both failure experiencers and non-experiencers in the 14 days before the app failure. Figure 2 depicts the monetary value of daily purchases by those who experienced the failure and those who did not. The purchase trends in the pre-period are parallel and nearly identical for the two groups (p > 0.10), assuring us that these shoppers do not systematically differ in their purchase behavior. The trends are similar for the frequency and quantity of purchases and the proportion of online purchases (see Web Appendix Figure A1). Second, we compare their observed demographic variables, such as gender and membership in the retailer's loyalty program (Figure 3). We do not find any significant differences in these variables across the two groups (p > 0.10). Third, the results from the subsequent use of a propensity score matched (PSM) sample using nearest neighbor matching show similar treatment effects, suggesting that the treatment is indeed exogenous (see Web Appendix Table A1). These verification checks give us confidence in the validity of our empirical strategy.

< Figures 2 and 3 about here >
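The balance checks described above can be sketched in code. The following is a minimal illustration, not the paper's implementation: it computes Welch's two-sample t statistic for one pre-failure variable (say, pre-period spend) between failure experiencers and non-experiencers; the group values below are hypothetical.

```python
# Welch's two-sample t-test sketch for a single pre-failure covariate.
# A small |t| (well below ~2) is consistent with balance between groups.
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = statistics.fmean(sample_a), statistics.fmean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical pre-period monetary values ($) for the two groups
experiencers = [28.0, 35.5, 31.2, 29.8, 30.4, 33.1]
non_experiencers = [30.1, 29.5, 32.8, 28.9, 31.6, 30.7]
t, df = welch_t(experiencers, non_experiencers)
print(f"t = {t:.3f}, df = {df:.1f}")  # |t| well below 2 here
```

In practice, this would be repeated for each pre-failure variable (pre-period frequency, quantity, spend, demographics), with p-values computed from the t distribution with the Welch-Satterthwaite degrees of freedom.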

4.3. Econometric Model and Identification

As described in the previous section, we estimate the effects of app failure on shopping outcomes by relying on a quasi-experimental research design with a DID approach (e.g., Angrist and Pischke 2009).

Our two-way fixed effects (TWFE) linear DID regression takes the following form:

(1) Y_it = α_0 + α_1 F_i P_t + μ_i + λ_t + ϑ_it

where i indexes the shopper, t the time period (14 days pre or post failure), Y is the outcome variable (frequency, quantity, or monetary value of purchases), F is a dummy variable denoting treatment (1 if shopper i experienced the app failure and 0 otherwise), P is a dummy variable denoting the period (1 for the period after the systemwide app failure and 0 otherwise), α is a coefficient vector, μ_i is a shopper fixed effect, λ_t is a time fixed effect (post period relative to pre period), and ϑ_it is an error term. We cluster standard errors at the shopper level, following Bertrand et al. (2004). The coefficient of F_i P_t, i.e., α_1, is the treatment effect of the app failure.

The identification of this treatment effect hinges on the conditional independence of the failure, i.e., the failure is random conditional on a shopper logging into the app during the time window of the failure shock. Any unobserved time-invariant differences among shoppers are accounted for by shopper fixed effects, and any time-varying factors common to both groups are accounted for by time fixed effects in our model.

5. Empirical Analysis Results

5.1. Relationship between App Failures and Purchases

We first examine the overall differences in shopping behaviors between shoppers who

experienced failures and those who did not using model-free evidence 14 days pre and post

failure. Our goal is to estimate the short-term effects (i.e., 14 days pre and post) of this failure to

avoid overlapping with any other failures before or after. Our estimates represent the local

average treatment effect (LATE) of the failure. Using a two-week period also allows us to

include any “day of the week” effects equally in the pre and post periods.4

Table 2 reports the raw comparisons of post-failure vs. pre-failure purchase outcome

variables for both failure experiencers (70,568 treated) and non-experiencers (66,121 control)

among the set of consumers who accessed the app on the day of the failure. We find that post-

failure, shoppers who experienced the systemwide failure had 0.04 (p < 0.001) lower purchase

frequency, 0.07 (p < 0.001) lower purchase quantity, and $2.42 (p < 0.001) lower monetary value

than shoppers who did not experience the failure. A simple comparison of shopping outcomes

across the two groups shows that the average monetary value of purchases increased by 81.8%

($30.41 to $55.28) for failure-experiencers, while it increased by 87.6% ($30.75 to $57.70) for

non-failure experiencers post failure relative to the pre period (p < 0.001). Given our

4 We also estimated a model for a longer period of four weeks and found similar effects (see Figure 6 and Table D1).

identification strategy, the diminished growth in the monetary value of purchases for failure

experiencers relative to non-experiencers can be attributed to the exogenous failure shock.

< Table 2 about here >


5.2. Main DID Model Results

We first examine the effects of the app failure on the frequency, quantity, and monetary value of

subsequent purchases by the shoppers across all channels. The results from the DID model in

Table 3 show a negative and significant effect of app failure on the frequency (𝛼1 = -0.024, p <

0.01), quantity (𝛼1 = -0.057, p < 0.01), and monetary value of purchases (𝛼1 = -2.181, p < 0.01)

across channels. Relative to the pre-period for the control group, the treated group experiences a

decline in frequency of 3.2% (p < 0.01), quantity of 3.7% (p < 0.01), and monetary value of

7.1% (p < 0.01).5

< Table 3 about here >

5.3. Heterogeneity in Effects by Channel

Next, we examine the channel spillover effects of app failures in greater depth. We first split the

total purchases into offline and online purchases and repeat our regression analyses for each

channel. Table 4 reports the results for these alternative channel-based dependent variables. App

failure has a significant negative effect on the frequency (𝛼1 = -0.02, p < 0.01), quantity (𝛼1 = -

0.05, p < 0.01), and monetary value of purchases (𝛼1 = -2.09, p < 0.01) in the offline channel.

We do not find a significant (p > 0.10) effect of app failure on any of the purchase outcomes in

the online channel. Because there is no corresponding increase in the online channel and because

the overall purchases drop, we conclude that the decreases in overall purchases across channels

are largely due to declines in in-store purchases. This finding shows that for a primarily brick-

5 We calculate the percentage change by dividing the treatment coefficient by the intercept. For instance, the
treatment coefficient for value of purchases (2.18) divided by intercept (30.40) amounts to a 7.1% change.

and-mortar retailer with a growing online presence, the negative effects of a failure in the mobile app spill over to offline sales.

< Table 4 about here >

5.4. Potential Mechanisms

We next explore the potential mechanisms behind the effects of app failure using a descriptive

analysis of individual shoppers’ app usage and behavior at the time of failure. Based on when

shoppers experience the failure in their shopping journey (i.e., close to or far from purchase), we

may observe channel substitution and brand preference dilution effects of the app failure.

Shoppers who are close to purchase at the time of app failure may quickly switch channels and

complete their purchase through the mobile or desktop website forms of the online channel.

However, shoppers who are far from purchase when the app fails may be early in their shopping

journey and hence, may reduce their preference for the retailer brand and buy less than their

planned amount subsequently.

To explore the role of stage in the shopping journey in explaining the differential effects of

app failure, we utilize information in the data about the app page on which the shopper was when

the failure occurred. This information allows us to examine the effects of app failure across

shoppers based on whether they are close to or far from purchase at the time of failure. Table 5

reports the DID model results of the analysis relating to the app failure occurring on purchase

related and non-purchase related pages. Purchase-related pages in an app involve pages that are

closer to purchase, such as those relating to adding a product to shopping cart, clicking checkout,

or making payments. In contrast, non-purchase-related pages relate to activities farther from

purchase, such as browsing products and obtaining store related information. The effect of app

failure is negative and significant (p < 0.001) on all the outcome variables, and more negative for shoppers who experience failure on a non-purchase-related page than for shoppers who experience failure on a purchase-related page. Shoppers who already have a strong purchase intent and are on a

purchase-related page right before the failure are not as negatively affected as those without a

strong purchase intent or on a non-purchase related page.

< Table 5 about here >

To further explore the role of the purchase funnel, we compare the change in the value of

purchases between the post and the pre app failure time periods for two groups of shoppers,

those close to and those far from purchase based on a median split of re-login attempts during the

failure window. The median number of attempts is three. The negative effect of failure for

shoppers who make greater re-login attempts is lower (Value of purchases(post-pre, high attempt) =

28.03, Value of purchases(post-pre, control) = 26.95, p > 0.10) than for shoppers who make fewer re-

login attempts (Value of purchases(post-pre, low attempt) = 20.33, Value of purchases(post-pre, control) =

26.95, p < 0.001). The group of shoppers who are close to purchase at the time of app failure are

likely to repeatedly attempt to re-login during the failure duration to complete their intended

purchase. Such shoppers may eventually make the purchase in another channel, resulting in

channel substitution. However, the group of shoppers who are far from purchase at the time of

failure make fewer attempts to log back in during the failure time window. A greater negative

effect of app failure for such shoppers may be due to brand preference dilution.

Failure-experiencers who were close to a purchase or had purchase intent would have had to

determine whether to complete the transaction, and if so, whether to do it online or offline. For

shoppers who typically buy online, the cost of going to the retailer’s website to complete a

purchase interrupted by the app failure is smaller than that of going to the store to complete the

purchase. Therefore, these shoppers will likely complete the transaction online and not exhibit

any significant decrease in shopping outcomes in the online channel post failure. Thus, the channel

substitution effect likely explains the insignificant effects of app failure in the online channel. By

contrast, shoppers who typically buy in the retailer’s brick-and-mortar stores and who experience

the app failure, will likely have a diminished perception of the retailer with fewer incentives to

buy from the stores in the future. Thus, the brand preference dilution effect may prevail for these

shoppers after app failure. This effect is due to a negative spillover from the app channel to the

offline channel for shoppers experiencing the failure even if they are primarily offline shoppers.

Indeed, a negative message or experience can have an adverse spillover effect on attributes or

contexts outside the realm of the message or experience (Ahluwalia et al. 2001).

To further explore channel substitution toward the online channel, we examine the time

elapsed between the occurrence of the failure and subsequent purchase in the online channel.

Failure experiencers’ inter-purchase time online (Meantreated = 162.8 hours) is much shorter than

non-experiencers’ (Meancontrol = 180.7 hours) (p = 0.003). This result further suggests that after

an app failure, shoppers look to complete their intended purchases in the online channel.

Next, to understand channel substitution toward the offline channel, we examine the effect of

app failure for shoppers who were geographically close to a physical store at the time of failure

for the subsample of shoppers who allow location tracking in the app. Shoppers who are closer to

a store when they experience the app failure could more easily complete their purchase in the

store than shoppers farther from a store. Table 6 reports the DID model for the subsample of

shoppers located within two miles of the retailer’s store at the time of failure. The results show that shoppers closer to the retailer’s store are

not negatively affected by the failure. Rather surprisingly, both the basket size and the monetary

value of purchases for shoppers close to a store are significantly higher after app failure (p <

0.05). This result suggests that shoppers who experience a failure close to or at a physical store

end up buying additional items in the store. An implication is that channel substitution to a store

can lead to more purchases, but that channel substitution is less likely for shoppers who are

farther from the store. However, the proportion of shoppers close to the store at the time of

failure is very small (2.4%), so the average effects of app failure for the failure experiencers on

offline purchases and all purchases are still negative in our main model.

< Table 6 about here >

To further analyze the role of distance to the store at the time of failure, we present the

contrast analysis between shoppers who were less than two miles and those who were greater

than two miles from the nearest store at the time of failure in Table 7. The basket sizes of these

groups of shoppers do not differ post failure. However, shoppers closer to the store spend more

than those farther from the store post failure, suggesting that the app failure is associated with

channel substitution in purchases for shoppers closer to the store.

< Table 7 about here >

Much of the analysis we provide for the mechanisms is descriptive and

exploratory. Nevertheless, overall, the evidence is consistent with the asymmetry in the effect of

app failure on shopping outcomes across the two channels.

5.5. Heterogeneity by Shopper: Relationship Strength and Prior Digital Use

The literatures on relationship marketing and service recovery suggest two factors that may

moderate the impact of app failures on outcomes: relationship strength and prior digital channel

use. These variables are typically used in direct marketing for targeting and can provide useful

managerial insights about the heterogeneity in the effects of failures as well.

Relationship Strength. The service marketing literature offers mixed evidence on the

moderating role of the strength of customer relationship with the firm in the effect of service

failure on shopping outcomes. Some studies suggest that a stronger relationship may aggravate the effect of failures on product evaluation, satisfaction, and purchases (Goodman et al. 1995, Chandrashekaran et al. 2007, Gijsenberg et al. 2015). Other studies show that a stronger relationship attenuates the negative effect of service failures (Hess et al. 2003, Knox and van

Oest 2014). Consistent with the direct marketing literature (Schmittlein et al. 1987, Bolton

1998), we operationalize customer relationship using the RFM (recency, frequency, and

monetary value) dimensions. Because of high correlation between the interactions of frequency

with (failure experiencer x post shock) and value of purchases with (failure experiencer x post

shock) (r = 0.90, p < 0.001) and because value of purchases is more important for the retailer, we

drop frequency of past purchases.

Prior Digital Channel Use. The moderating role of a shopper’s prior use of the retailer’s

online channel in app failure’s impact on shopping outcomes could be positive or negative. On

the one hand, more digitally experienced app users may be less susceptible to the negative

impact of an app crash on subsequent engagement with the app than less digitally experienced

app users (Shi et al. 2017) because they are conditioned to expect some level of technology

failures, consistent with the product harm crises literature (Cleeren et al. 2013, Liu and Shankar

2015) and the expectation-confirmation theory (Oliver 1980, Tax et al. 1998, Cleeren et al.

2008). On the other hand, prior digital channel exposure may heighten shopper expectations and

make them less tolerant of failures. We operationalize this variable as the cumulative number of

purchases that the shopper made from the retailer’s website prior to experiencing a failure.

The results of the model with relationship strength and past digital channel use as moderators

appear in Table 8. Consistent with our expectations, the monetary value of past purchases has

positive and significant interaction coefficients with the DID model variable across all the

outcome variables (p < 0.001). Thus, app failures have a smaller effect on shoppers who

purchased more from the retailer, consistent with the results of Ahluwalia et al. (2001). Recency

has negative coefficients (p < 0.01), suggesting that the more recent shoppers are less tolerant of

failure. A failure shock also affects the frequency and value of purchases (p < 0.01) more strongly for shoppers with greater digital channel or online purchase experience with the retailer.

< Table 8 about here >

5.6. Heterogeneity by Shopper: Causal Forest Approach

In addition to the service marketing literature-based moderator variables examined earlier, we

also explore heterogeneity in treatment effects relating to additional managerially useful

observed variables (e.g., gender, loyalty level) not fully examined by prior research.

Unfortunately, including these variables as additional moderators in the DID analysis explodes

the number of main and interaction effects.

Recent methods of causal inference using machine learning such as the causal forest

approach allow us to recover individual-level conditional average treatment effects (CATE)

(Athey et al. 2017, Wager and Athey 2018). The causal forest is an ensemble of causal trees that

averages the treatment-effect predictions across thousands of trees. It has

been applied in marketing to model customer churn and information disclosure (Ascarza 2018,

Guo et al. 2021). A causal tree is similar to a regression tree. The typical objective of a

regression tree is to build accurate predictions of the outcome variable by recursively splitting

the data into subgroups that differ the most on the outcome variable given their covariates. A

regression tree has decision split nodes characterized by binary conditions on covariates and leaf

or terminal nodes at the bottom of the tree. The regression tree algorithm continuously partitions

the data, evaluating and re-evaluating at each node to determine (a) whether further splits would

improve prediction, and (b) the covariate and the value of the covariate on which to split. The

goodness-of-fit criterion used to evaluate the splitting decision at each node is the mean squared

error (MSE) computed as the deviation of the observed outcome from the predicted outcome.

The tree algorithm continues making further splits as long as the MSE decreases by more than a

specified threshold.

The causal tree model adapts the regression tree algorithm in several ways to make it

amenable to causal inference. First, it explicitly moves the goodness-of-fit criterion to treatment

effects rather than the MSE of the outcome measure. Second, it employs “honest” estimates, that

is, the data on which the tree is built (training data) are separate from the data on which it is

tested for prediction of heterogeneity (test data). Thus, the tree is honest if for a unit i in the

training sample, it only uses the response Yi to estimate the within-leaf treatment effect, or to

decide where to place the splits, but not both (Athey and Imbens 2016, Athey et al. 2017). To

avoid overfitting, we use cross-validation approaches in the tree-building stage.

Importantly, the goodness-of-fit criterion for causal trees is the difference between the

estimated and the actual treatment effect at each node. While this criterion ensures that all the

degrees of freedom are used well, it is challenging because we never observe the true effect.

5.6.1. Causal Tree: Goodness-of-fit Criterion

Following Wager and Athey (2018), if we have n independent and identically distributed training

examples labeled i = 1, ..., n, each of which consists of a feature vector Xi ∈ [0, 1]d, a response Yi ∈ ℝ, and a treatment indicator Wi ∈ {0, 1}, the CATE at x is:

(2) 𝜏(𝑥) = 𝔼[𝑌1𝑖 ― 𝑌0𝑖 | 𝑋𝑖 = 𝑥]



We assume unconfoundedness, i.e., conditional on Xi, the treatment Wi is independent of

outcome Yi. Because the true treatment effect is not observed, we cannot directly compute the

goodness-of-fit criterion for creating splits in a tree. This goodness-of-fit criterion is as follows.
(3) 𝑄𝑖𝑛𝑓𝑒𝑎𝑠𝑖𝑏𝑙𝑒 = 𝔼[(𝜏̂𝑖(𝑋𝑖) ― 𝜏𝑖(𝑋𝑖))²]
Because 𝜏𝑖(𝑋𝑖) is not observed, we follow Athey and Imbens’s (2016) approach to create a

transformed outcome 𝑌∗𝑖 that represents the true treatment effect. Assume that the treatment

indicator Wi is a random variable. Suppose there is a 50% probability for a unit i to be in the

treated or the control group. Then an unbiased estimate of the treatment effect can be obtained for that unit by just using its outcome Y in the following way. Let

(4) 𝑌∗𝑖 = 2𝑌𝑖 𝑖𝑓 𝑊𝑖 = 1 and 𝑌∗𝑖 = ― 2𝑌𝑖 𝑖𝑓 𝑊𝑖 = 0

It follows that:
(5) 𝔼[𝑌∗𝑖] = 2 ∙ ((1/2)𝔼[𝑌𝑖(1)] ― (1/2)𝔼[𝑌𝑖(0)]) = 𝔼[𝜏𝑖]

Therefore, we can compute the goodness-of-fit criterion for deciding node splits in a causal

tree using the expectation of the transformed outcome (Athey and Imbens 2016). Once we

generate causal trees, we can compute the treatment effect within each leaf because it has a finite

number of observations and standard asymptotics apply within a leaf. The differences in the

treated and control units’ outcomes within each leaf produce the treatment effect in that leaf.
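The transformed-outcome identity in equations (4) and (5) can be verified numerically: with a 50/50 random assignment, the sample mean of Y* recovers the average treatment effect. A small simulation sketch (the effect size of 3 and the outcome distribution are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 100_000
w = rng.integers(0, 2, n)        # W_i: treatment assigned with probability 1/2
y0 = rng.normal(10, 2, n)        # potential outcome under control, Y_i(0)
tau = 3.0                        # assumed constant treatment effect
y = y0 + tau * w                 # observed outcome

# Equation (4): Y* = 2Y for treated units, -2Y for control units
y_star = np.where(w == 1, 2 * y, -2 * y)

# Equation (5): E[Y*] = E[tau], so the sample mean of Y* estimates the ATE
ate_hat = y_star.mean()
```

Within a leaf of a causal tree, the same logic applies to the leaf's subsample, which is how the transformed outcome supports the splitting criterion.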

5.6.2. Causal Forest Ensemble

In the final step, we create an ensemble of trees using ideas from model averaging and bagging.

Specifically, we take predictions from thousands of trees and average over them (Guo et al.

2021). This step retains the unbiased and honest nature of tree-based estimates but reduces the

variance. The forest averages over the estimates from B trees in the following manner.

(6) 𝜏̂(𝑥) = (1/𝐵) ∑𝐵𝑏=1 𝜏̂𝑏(𝑥)

Because monetary value of purchases is the key outcome variable of interest to the retailer,

we estimate individual level treatment effect on value of purchases for each failure experiencer

separately using the observed covariate data. These covariates include gender and loyalty

program membership in addition to the three theory-driven moderators, namely, value of past

purchases, recency of past purchases and prior digital channel use. These individual attributes are

important for identifying individual-level effects and for developing targeting approaches (e.g.,

Neumann et al. 2019). We use a random sample of two-thirds of our data as training data and the

remaining one-third as test data for predicting CATE. We use half of the training data to

maintain honest estimates and for cross-validation to avoid overfitting.

5.6.3. Causal Forest Results

The estimates from causal forest using 1,000 trees appear in Table 9. About 96% of the shoppers

have a negative value of CATE with an average of -1.739. The distribution of CATE across

shoppers appears in Figure 4. The shopper quintiles based on CATE levels reflect this distribution in Figure 5, which shows that Segment 1, the most sensitive shoppers, exhibits higher variance than the rest.

< Table 9 and Figures 4 and 5 about here >

Next, we regress the CATE estimate on the covariate space to identify the covariates that best

explain treatment heterogeneity. The results appear in Table 10. They show that all the

covariates, including gender and loyalty, are significant (p < 0.001). Shoppers with higher value

of past purchases and more frequent online purchases are less sensitive to an app failure than

others. Shoppers who bought more recently in the past are less tolerant of an app failure. Some

of these results complement those from the moderator analysis.

< Table 10 about here >



The causal forest-derived CATE regression differs from the moderator DID regression in

important ways. First, the moderator regression uses the entire sample for estimation, while the

causal forest, the basis for the CATE regression, uses a subset of the data (the training sample)

for estimation. Second, the causal forest underlying the CATE regression splits the training data

further to estimate an honest tree, estimating from an even smaller subset of the moderator

regression sample. Third, relative to the linear moderator regression, the CATE regression can

handle a much larger number of covariates. Because of these differences, the results of the

CATE regression model may not exactly mirror those of the moderator regression model.

However, the broad contours of the results remain unchanged.

5.7. Replicating the Analysis for Another App Failure

To explore the generalizability of our results, we extend our analysis from one failure shock to

another shock since such failures are common and occur about 5-7 times each year for the focal

app. While granular data on these failures are difficult to collect even for a single failure, we

were able to collect data for another failure that had occurred on November 3, 2014 at 5:30 pm

and lasted five hours. Of the shoppers who accessed the app that day, 70,884 experienced the

failure, while 63,604 did not. We repeat the DID analyses for this sample. The results appear in

Tables 11 and 12. Consistent with the main model results, those for this failure show a negative

effect of the app failure on purchases across channels (p < 0.001). The effect size translates to a

7.5%, 5.5%, and 6.2% decrease in frequency, quantity, and monetary value, respectively. As in

the main sample, the decreases are primarily in brick-and-mortar stores.

< Tables 11 and 12 about here >



6. Robustness Checks and Ruling out Alternative Explanations

We perform several robustness checks and tests to rule out alternative explanations for the effect

of an app failure on purchases.

Alternative model specifications. Although the failure in our data is exogenous and we include shopper fixed effects, we also estimate, in addition to our proposed DID model, models with shopper covariates to estimate the treatment effect of interest. Additionally,

we estimate Poisson count data models for the frequency and quantity variables. The results from

these models replicate the findings from Tables 3 and 4 and appear in the Web Appendix Tables

B1-B2 and C1-C2, respectively. The coefficients of the treatment effect from Table B1 and C1

represent changes in outcomes due to the app failure, conditioned on covariates. These results are

substantively similar to those in Tables 3 and 4. The insensitivity of the results to control

variables suggests that the effect of unobservables relative to these observed covariates would

have to be very large to significantly change our results (Altonji et al. 2005). Similarly, the

results are robust to a Poisson specification, reported in Tables B2 and C2. We also estimate

models with weekly fixed effects. The results from these models show treatment effects consistent with those from our main model.

Outliers. We re-estimate the models by removing outliers from our data. We remove

extremely heavy spenders who are greater than three standard deviations away from the mean in

monetary value of purchases in the pre-period. Web Appendix Tables B3 and C3 report these

results. We find the results to be consistent with those reported earlier.

Existing shoppers. Another possible explanation for app failures’ effect can be that only new

or dormant shoppers are sensitive to failures, perhaps due to low switching costs. Therefore, we

remove those with no purchases in the last 12 months and re-estimate the models for

the existing shoppers. Indeed, Web Appendix Tables B4 and C4 report substantively similar

results after excluding the new or dormant shoppers.

Alternative measures of the digital channel use moderator. In lieu of past online purchase frequency as a measure of prior digital channel use, we use measures based on a median split of

the number and share of online purchases, and whether a shopper is an online buyer or not. The

results for all these alternative online purchase measures are similar to our proposed model

results. The results are shown in Web Appendix Tables B5 and C5 for digital channel use

operationalized as whether a shopper is an online buyer or not.

Regression discontinuity analysis. To ensure that there are no unobservable differences

between failure experiencers and non-experiencers based on the time of login, we carry out a

‘regression discontinuity’ (RD) style analysis around the start time of the service

failure. For the RD analysis, we consider only app users in the neighborhood of this time, using

as control group those users who logged in one hour before and after the failure period and as

treated the users who logged in during the failure period. The results are substantively similar to

our main model results and appear in Web Appendix Tables B6 and C6.

Falsification/Placebo tests. To rule out the possibility that our regression estimates

spuriously pick up variations driven by factors other than the failure, we conduct additional

falsification and placebo tests. First, we randomly reassign treatment to our sample users. The

results from these checks appear in Tables B7 and C7. As reported in these tables, we do not find

an effect for the treatment coefficient in these placebo tests. Second, we randomly reassign the

timing of treatment. The results from these checks appear in Tables B8 and C8. Again, we do not

find any treatment effects. These falsification tests mitigate concerns for spurious correlations.
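The first placebo check (random reassignment of treatment) can be sketched as follows: shuffling the treatment labels breaks the link to the outcome shift, so the DID estimate should be near zero. An illustrative simulation, not the paper's data; the -2.2 effect and sample size are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 20_000
treated = rng.random(n) < 0.5
y_pre = 30 + rng.normal(0, 5, n)
y_post = 33 - 2.2 * treated + rng.normal(0, 5, n)   # assumed effect of -2.2

def did(t):
    """Two-group, two-period difference-in-differences of means."""
    return (y_post[t] - y_pre[t]).mean() - (y_post[~t] - y_pre[~t]).mean()

real_effect = did(treated)                        # recovers roughly -2.2
placebo_effect = did(rng.permutation(treated))    # shuffled labels: near zero
```

Repeating the permutation many times yields a placebo distribution against which the estimate from the true assignment can be benchmarked.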

Longer-term effect of failures. Our main analysis shows the short-term 14-day effect of app

failures. To explore how these effects evolve over time, we examined the outcomes four weeks

pre- and post-failure. There is a steep fall in the period immediately after the failure.

However, purchases climb back to higher levels over the next three weeks. Therefore, the effect

is strongest immediately after failure. These patterns appear in Figure 6 and Web Appendix

Table D1. The table shows the coefficients of the interactions of weekly dummies with TREAT

for a DID regression. Because an app failure occurs every 7-10 weeks, we estimate the effects

four weeks pre and post so as to avoid our pre- or post- periods overlapping with any other

failure that we cannot observe.

< Figure 6 about here >

Stacked model for channel effects. The results for online and offline purchases in Table 4 do

not show the relative sizes of the effects across the two channels. To examine these relative

effects, we estimate a stacked model of online and offline outcomes that includes a channel

dummy. The results for this model appear in Web Appendix Table D2. We interpret the effects

as a proportion of the purchases within the channel and conclude that the effects in the offline

channel are more negative than those in the online channel (p < 0.01). We also estimated a DID

regression model with value of purchases in the offline channel as a proportion of total purchases

and found negative and significant effects of failure (p < 0.01).

7. Summary, Economic Significance, Managerial Implications, and Limitations

7.1. Summary

In this paper, we addressed novel research questions: What is the effect of a service failure in a

retailer’s mobile app on the frequency, quantity, and monetary value of purchases in online and

offline channels? What possible mechanisms may explain these effects? How do shoppers’

relationship strength and prior digital channel use moderate these effects? How heterogeneous is

shoppers’ sensitivity to failures? By answering these questions, our research fills an important

gap at the crossroads of three disparate streams of research in different stages of development:

the mature stream of service failures, the growing stream of omnichannel marketing, and the

nascent stream of mobile marketing. We leveraged a random systemwide failure in the app to

measure the causal effect of an app failure. To our knowledge, this is the first study to causally

estimate the effects of a digital service failure using real world data. Using unique data spanning

online and offline retail channels, we examined the spillover effects of such failures across

channels and the heterogeneity in these effects across channels and shoppers.

Our results reveal that app failures have a significant negative effect on shoppers’ frequency,

quantity, and monetary value of purchases across channels. These effects are heterogeneous

across channels and shoppers. Interestingly, the overall decreases in purchases across channels

are driven by reductions in store purchases and not in digital channels. Furthermore, we find that

shoppers with higher monetary value of past purchases are less sensitive to app failures.

Overall, our nuanced analyses of the mechanisms by which an app failure affects purchases

offer new and insightful explanations in a cross-channel context, including channel substitution

and brand preference dilution. Our findings from shopper heterogeneity analyses are consistent

with the view that some customers may be tolerant of technological failures (Meuter et al. 2000).

Finally, our study offers novel insights into the cross-channel implications of app failures.

7.2. Economic Significance

The economic effects of failures are sizeable enough for any retailer to reconsider its service failure prevention and recovery strategies. Our main model (Table 3) shows a decrease of $2.18 in the 14 days after

failure resulting in an immediate economic loss of about $153,838 in revenues from just our

sample of 70,568 failure experiencers. Based on our weekly estimates, the economic impact of

an app failure in a retailer’s branded app is a revenue loss of about $194,768 from a single failure

for five weeks.6 The retailer experiences about 5-7 failures each year, resulting in a potential loss

of $0.97-1.36 million across failures. Importantly, as the app user base grows, the loss from

failures can also grow substantially if left unmonitored.
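The revenue-loss figures above follow directly from the reported estimates; a minimal sketch using only the numbers stated in the text (the $2.18 DID estimate from Table 3, the weekly coefficients from Table D1, the sample of 70,568 failure experiencers, and the 5-7 failures per year):

```python
# Back-of-envelope revenue-loss arithmetic from the reported estimates.
N = 70_568                      # failure experiencers in the sample

# Immediate loss: $2.18 drop per shopper over the 14 days after the failure (Table 3).
immediate_loss = 2.18 * N       # about $153,838

# Five-week loss: weekly DID coefficients reported in Table D1.
weekly_effects = [0.52, 1.40, 0.28, 0.23, 0.33]
five_week_loss = sum(weekly_effects) * N   # about $194,768

# Annual loss: the retailer experiences about 5-7 failures per year.
annual_loss_low = 5 * five_week_loss       # about $0.97 million
annual_loss_high = 7 * five_week_loss      # about $1.36 million

print(round(immediate_loss), round(five_week_loss),
      round(annual_loss_low), round(annual_loss_high))
```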

The economic effect is meaningful for several reasons. First, since retailers operate on thin

margins (2-3% in many categories) and are cost-conscious, such an economic loss is impactful.

Second, the effect size of 7.1% from our results is consistent with those from other similar causal

studies. For example, exposure to banner advertising has been shown to lift purchase intention by

0.473%, worth 42 cents per click, to the firm (Goldfarb and Tucker 2011). Third, in the mobile

context, the effect of being in a crowd (of five people relative to two per square meter when

receiving a mobile promotion) results in an economically meaningful 2.2% more clicks

(Andrews et al. 2016). Fourth, Akca and Rao (2020) argue that a revenue drop of $5.32 million

is economically significant for a large company such as Orbitz. Fifth, as sales through the mobile

app and online sales are growing rapidly, this effect is only getting larger. Sixth, our estimates

are for a single two-hour app failure, measured over five weeks. Our estimates reflect the short-term

impact, but a failure may also deplete brand preference in the long term.

To gauge the potential long-term economic damage, we examined the shoppers’ purchases

over six months after the failure. We estimated the potential revenue loss from failure-

experiencing customers who substantially reduce their spending after the failure, ultimately

becoming “lost” customers. We compared the percentage of customers in the treated and control

groups who dropped their average spending to more than one standard deviation below their pre-failure

6We compute this figure by using the weekly effect coefficients in Table D1, i.e., $(0.52 + 1.40 + 0.28 + 0.23 +
0.33)*N for the first five weeks for N = 70,568 failure experiencers, totaling $194,768.

average spending. This percentage was significantly higher for failure-experiencers

(6.51%) than for non-failure-experiencers (6.09%). The incremental loss translates into a

permanent loss of $1.89 million in revenues for the retailer, based on 1,778 (70,568 × 0.42% × 6)

“lost” customers across six failures in a year, at $53.10 per customer per year over an assumed

average customer lifetime of 20 years. This estimate is consequential for retailers.
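The long-term figure follows the same logic; a sketch using only the numbers reported above (the 6.51% vs. 6.09% drop rates, six failures per year, $53.10 per customer per year, and a 20-year customer lifetime):

```python
# Long-term "lost customer" loss from the reported drop rates.
N = 70_568                          # failure experiencers per failure
incremental_drop = 0.0651 - 0.0609  # extra share of "lost" customers = 0.42%
failures_per_year = 6

lost_customers = round(N * incremental_drop * failures_per_year)   # about 1,778
annual_value = 53.10                # revenue per lost customer per year
lifetime_years = 20

long_term_loss = lost_customers * annual_value * lifetime_years    # about $1.89 million
print(lost_customers, round(long_term_loss))
```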

7.3. Managerial Implications

Service failure and low-quality service likely lead to relationship termination (Sriram et al.

2015). The insights from our research better inform executives in managing their mobile app and

channels and offer implications for service failure preventive and recovery strategies.

Preventive Strategies. Managers can use the estimate that an app failure in a branded retail

app results in a 7.1% decrease in monetary value of purchases to budget resources for their

efforts to prevent or reduce app failures. The result that the effects of an app failure vary by

online and offline channel offers guidance to managers for preventing app failures right before a

major offline event or in-store sale when more traffic is expected in-store. Similarly, managers

can identify active store visitors and plan a dedicated strategy for them.

By identifying failure-sensitive shoppers based on relationship strength, prior digital use, and

individual-level CATE estimates, managers can take proactive actions to prevent these shoppers

from reducing their shopping intensity with the firm. Figure 7 represents the loss of revenues

(spending) from each percentile of shoppers at different levels of failure sensitivity.

< Figure 7 about here >

About 70% of the losses in revenues due to failure arise from just 47% of the shoppers.

Managers can manage these shoppers’ expectations through email and app notification

messaging channels. Warning shoppers of the typical number of disruptions in the app can preempt

negative attributions and attitudes, and limit potential brand dilution and drop in revenues due to

app failure.
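The concentration-of-losses logic behind Figure 7 can be illustrated with a small simulation. The exponential loss distribution below is a made-up placeholder, not the paper's estimated CATE distribution, so the resulting concentration figure differs from the 47%/70% reported above; the point is only the mechanics of ranking shoppers by failure sensitivity and accumulating their losses:

```python
import numpy as np

# Illustrative only: simulated per-shopper dollar losses from an app failure
# (exponentially distributed), not the paper's CATE estimates.
rng = np.random.default_rng(0)
losses = rng.exponential(scale=2.0, size=10_000)

# Rank shoppers from most to least failure-sensitive and accumulate losses.
losses = np.sort(losses)[::-1]
cum_share = np.cumsum(losses) / losses.sum()

# Smallest fraction of shoppers accounting for 70% of total losses.
k = int(np.searchsorted(cum_share, 0.70)) + 1
print(f"{k / losses.size:.0%} of shoppers account for 70% of simulated losses")
```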

Recovery Strategies. The finding that an app failure in a branded retail app results in reduced

purchases across channels suggests that managers should develop interventions and recovery

strategies to mitigate the negative effects of app failures not just in the mobile channel, but also

in other channels, in particular, the offline channel. Thus, seamlessly collecting and integrating

data from a mobile app with data from its stores and websites can help an omnichannel retailer

build continuity in shoppers’ experiences and even offer recovery in multiple channels.

Immediately after a shopper experiences an app failure, the manager of the app could provide

gentle nudges and even incentives for the shopper to complete an abandoned transaction on the

app. Typically, a manager may need to provide these nudges and incentives through other

communication channels such as email, phone call, or face-to-face chat. These nudges are similar

in spirit and execution to those from firms like Fitbit and Amazon, which remind customers

through email to reconnect when they disconnect their watch and smart speaker, respectively. If

the store is a dominant channel for the retailer, the retailer should use its store associates to

reassure or incentivize shoppers. In some cases, managers can even offer incentives in other

channels to complete a transaction disrupted by an app failure.

The finding that app failure can enhance spending for shoppers experiencing the failure close

to the store offers useful cross-selling opportunities for the retailer. After a systemwide failure is

resolved, retailers can proactively promote, in the store nearest to each failure-experiencing

shopper, products selected from that shopper’s purchase history.

Managers should mitigate the negative effects of app failures for the most sensitive shoppers

first. They should proactively identify failure-sensitive shoppers and design preemptive

strategies to mitigate any adverse effects. We find that shoppers with a weaker relationship with

the retailer are more sensitive to failures. Thus, firms should address such shoppers for recovery

after a careful cost-benefit analysis. This is important because apps serve as a gateway for future

purchases for these shoppers.

Finally, our analysis of heterogeneity in shoppers’ sensitivity to app failures suggests that

managers should attend first to the shoppers with the most negative CATE estimates. Interventions

targeted at the 47% of shoppers who account for 70% of the losses could lead to higher returns.

7.4. Limitations

Our study has limitations that future research can address. First, we analyze available data on

two failures in a branded retailer’s mobile app, so we could not fully explore all the failures with

varying durations and timing. Second, our results are most informative for similar retailers that

have a large brick-and-mortar presence but growing online and in-app purchases. If data are

available, future research could replicate our analyses for app failures for primarily online

retailers with an expanding offline presence (e.g., Bonobos, Warby Parker). Third, we do not

have data on competing apps that shoppers may use. Additional research could study shoppers’

switching behavior if data on competing apps are available. Fourth, our data contain a relatively

low number of purchases in the mobile channel. For better generalizability of the extent of

spillover across channels, our analysis could be extended to contexts in which a substantial

portion of purchases are made within the app. Fifth, we do not have data on purchases made

through the app vs. mobile browser. Studying the differences between these two mobile sub-

channels is a fruitful future research avenue. Sixth, we are unable to test specific prevention and

recovery strategies for app failures. Mobile apps may be an effective way to recover from the

adverse effects of service failures (Tucker and Yu 2019). Our approach provides a way to

identify app-failure sensitive shoppers, but we do not have data on shoppers’ responses to service

recovery to recommend the best mitigation strategy. The strategies we do recommend could be

tested in ethically permissible field studies. Finally, we focus on the short-term impact of a

failure in a causal setting. If data on multiple failures over the long run are available and can be

corrected for endogeneity, researchers can study the long-term implications of multiple failures.

References

Ahluwalia R, Unnava HR, Burnkrant RE (2001) The moderating role of commitment on the
spillover effect of marketing communications. J. Marketing Res. 38(4):458–470.
Akca S, Rao A (2020) Value of aggregators. Marketing Sci. 39(5):893–922.
Altonji JG, Elder TE, Taber CR (2005) Selection on observed and unobserved variables:
Assessing the effectiveness of catholic schools. J. Political Econom. 113(1):151–184.
Andreassen TW (1999) What drives customer loyalty with complaint resolution? J. Service Res.
1(4):324–332.
Andrews M, Luo X, Fang Z, Ghose A (2016) Mobile ad effectiveness: Hyper-contextual
targeting with crowdedness. Marketing Sci. 35(2):218–233.
Angrist JD, Pischke JS (2009) Mostly Harmless Econometrics: An Empiricist’s Companion
(Princeton University Press, Princeton).
Ansari A, Mela CF, Neslin SA (2008) Customer channel migration. J. Marketing Res. 45(1):60–
76.
Athey S, Imbens G (2016) Recursive partitioning for heterogeneous causal effects. Proc. Natl.
Acad. Sci. 113(27):7353–7360.
Athey S, Imbens G, Pham T, Wager S (2017) Estimating average treatment effects:
Supplementary analyses and remaining challenges. Amer. Econom. Rev. 107(5):278–81.
Avery J, Steenburgh TJ, Deighton J, Caravella M (2012) Adding bricks to clicks: Predicting the
patterns of cross-channel elasticities over time. J. Marketing 76(3):96–111.
Barron’s (2018) Walmart: Can it meet its digital sales growth targets? Accessed November 5,
2020, https://www.barrons.com/articles/walmart-can-it-meet-its-digital-sales-growth-targets-
1519681783.
Bell DR, Gallino S, Moreno A (2018) Offline showrooms in omnichannel retail: Demand and
operational benefits. Management Sci. 64(4):1629–1651.
Bertrand M, Duflo E, Mullainathan S (2004) How much should we trust differences-in-
differences estimates? Quart. J. Econom. 119(1):249–275.
Bitner MJ, Booms BH, Tetreault MS (1990) The service encounter: diagnosing favorable and
unfavorable incidents. J. Marketing 54(1):71–84.
Blancco (2016) The state of mobile device performance and health: Q2. Accessed November 5,
2020, https://www2.blancco.com/en/research-study/state-of-mobile-device-performance-and-
health-trend-report-q2-2016.
Bolton RN (1998) A dynamic model of the duration of the customer’s relationship with a
continuous service provider: The role of satisfaction. Marketing Sci. 17(1):45–65.
Brynjolfsson E, Hu YJ, Rahman MS (2013) Competing in the Age of Omnichannel Retailing
(MIT Cambridge, MA).
Bugsnag (2020) SDKs should not crash apps — learnings from the Facebook outage. Accessed
July 20, 2021, https://www.bugsnag.com/blog/sdks-should-not-crash-apps.
Chandrashekaran M, Rotte K, Tax SS, Grewal R (2007) Satisfaction strength and customer
loyalty. J. Marketing Res. 44(1):153–163.
Chintagunta PK, Chu J, Cebollada J (2012) Quantifying transaction costs in online/off-line
grocery channel choice. Marketing Sci. 31(1):96–114.
Cleeren K, Dekimpe MG, Helsen K (2008) Weathering product-harm crises. J. Acad. Marketing
Sci. 36(2):262–270.

Cleeren K, Van Heerde HJ, Dekimpe MG (2013) Rising from the ashes: How brands and
categories can overcome product-harm crises. J. Marketing 77(2):58–77.
Computerworld (2014) iOS 8 app crash rate falls 25% since release. Accessed November 5, 2020,
https://www.computerworld.com/article/2841794/ios-8-app-crash-rate-falls-25-since-
release.html.
Dimensional Research (2015) Mobile user survey: Failing to meet user expectations. Accessed
November 5, 2020, https://techbeacon.com/resources/survey-mobile-app-users-report-failing-
meet-user-expectations.
Dotzel T, Shankar V, Berry LL (2013) Service innovativeness and firm value. J. Marketing Res.
50(2):259–276.
Fong NM, Fang Z, Luo X (2015) Geo-conquesting: Competitive locational targeting of mobile
promotions. J. Marketing Res. 52(5):726–735.
Forbes LP (2008) When something goes wrong and no one is around: non-internet self-service
technology failure and recovery. J. Services Marketing 22(4): 316–27.
Forbes LP, Kelley SW, Hoffman KD (2005) Typologies of e-commerce retail failures and
recovery strategies. J. Services Marketing 19(5): 280–92.
Forman C, Ghose A, Goldfarb A (2009) Competition between local and electronic markets: How
the benefit of buying online depends on where you live. Management Sci. 55(1):47–57.
Ghose A, Kwon HE, Lee D, Oh W (2019) Seizing the commuting moment: Contextual targeting
based on mobile transportation apps. Inform. Systems Res. 30(1):154–174.
Ghose A, Li B, Liu S (2019) Mobile targeting using customer trajectory patterns. Management
Sci. 65(11):5027–5049.
Gijsenberg MJ, Van Heerde HJ, Verhoef PC (2015) Losses loom longer than gains: Modeling the
impact of service crises on perceived service quality over time. J. Marketing Res. 52(5):642–
656.
Goldfarb A, Tucker C (2011) Online display advertising: Targeting and obtrusiveness. Marketing
Sci. 30(3):389–404.
Google M/A/R/C Study (2013) Mobile in-store research: How in-store shoppers are using mobile
devices. Accessed November 5, 2020,
https://www.thinkwithgoogle.com/_qs/documents/889/mobile-in-store_research-studies.pdf.
Guo T, Sriram S, Manchanda P (2021) The effect of information disclosure on industry payments
to physicians. J. Marketing Res. 58(1):115–140.
Halbheer D, Gärtner DL, Gerstner E, Koenigsberg O (2018) Optimizing service failure and
damage control. Internat. J. Res. Marketing 35(1):100–115.
Hansen N, Kupfer AK, Hennig-Thurau T (2018) Brand crises in the digital age: The short-and
long-term effects of social media firestorms on consumers and brands. Internat. J. Res.
Marketing 35(4):557–574.
Hess Jr RL, Ganesan S, Klein NM (2003) Service failure and recovery: The impact of relationship
factors on customer satisfaction. J Acad. Marketing Sci. 31(2):127–145.
Hoffman KD, Bateson JE (2001) Essentials of services marketing: Concepts, strategies and cases
(South-Western Pub).
Kim SJ, Wang RJH, Malthouse EC (2015) The effects of adopting and using a brand’s mobile
application on customers’ subsequent purchase behavior. J. Interactive Marketing 31:28–41.
Knox G, Van Oest R (2014) Customer complaints and recovery effectiveness: A customer base
approach. J. Marketing 78(5):42–57.

Kummer M, Schulte P (2019) When private information settles the bill: Money and privacy in
Google’s market for smartphone applications. Management Sci. 65(8):3470–3494.
Liu Y, Shankar V (2015) The dynamic impact of product-harm crises on brand preference and
advertising effectiveness: An empirical analysis of the automobile industry. Management Sci.
61(10):2514–2535.
Ma L, Sun B, Kekre S (2015) The squeaky wheel gets the grease—an empirical analysis of
customer voice and firm intervention on twitter. Marketing Sci. 34(5):627–645.
McCollough MA, Berry LL, Yadav MS (2000) An empirical investigation of customer
satisfaction after service failure and recovery. J. Service Res. 3(2):121–137.
Meuter ML, Ostrom AL, Roundtree RI, Bitner MJ (2000) Self-service technologies:
Understanding customer satisfaction with technology-based service encounters. J. Marketing
64(3):50–64.
Narang U, Shankar V (2019) Mobile app introduction and online and offline purchases and
product returns. Marketing Sci. 38(5):756–772.
National Retail Federation (2018) Top 100 retailers 2018. Accessed November 5, 2020,
https://nrf.com/resources/top-retailers/top-100-retailers/top-100-retailers-2018.
Neumann N, Tucker CE, Whitfield T (2019) Frontiers: How effective is third-party consumer
profiling? Evidence from field studies. Marketing Sci. 38(6):918–926.
Oliver RL (1980) A cognitive model of the antecedents and consequences of satisfaction
decisions. J. Marketing Res. 17(4):460–469.
Pauwels K, Neslin SA (2015) Building with bricks and mortar: The revenue impact of opening
physical stores in a multichannel environment. J. Retail 91(2):182–197.
Retail Dive (2020) Retailers see a 36% increase in mobile app downloads, and 54% growth in in-
app purchases during COVID. Accessed June 29, 2021, https://tinyurl.com/wc6bf7kc.
Schmittlein DC, Morrison DG, Colombo R (1987) Counting your customers: Who are they and
what will they do next? Management Sci. 33(1):1–24.
Shi S, Kalyanam K, Wedel M (2017) What does agile and lean mean for customers? An analysis
of mobile app crashes. Working paper, Santa Clara University, Santa Clara.
Smith AK, Bolton RN (1998) An experimental investigation of customer reactions to service
failure and recovery encounters: Paradox or peril? J. Service Res. 1(1):65–81.
Sriram S, Chintagunta PK, Manchanda P (2015) Service quality variability and termination
behavior. Management Sci. 61(11):2739–2759.
Tax SS, Brown SW, Chandrashekaran M (1998) Customer evaluations of service complaint
experiences: implications for relationship marketing. J. Marketing 62(2):60–76.
Tucker CE, Yu S (2019) Does it lead to more equal treatment? An empirical study of the effect of
smartphone use on customer complaint resolution. Working paper, Massachusetts Institute of
Technology, Cambridge.
Wager S, Athey S (2018) Estimation and inference of heterogeneous treatment effects using
random forests. J. Amer. Statist. Assoc. 113(523):1228–1242.
Wang K, Goldfarb A (2017) Can offline stores drive online sales? J. Marketing Res. 54(5):706–
719.
Xu K, Chan J, Ghose A, Han SP (2017) Battle of the channels: The impact of tablets on digital
commerce. Management Sci. 63(5):1469–1492.

Table 1. Summary Statistics


Variable Mean Std. dev.
Frequency of purchases 0.82 1.34
Quantity of purchases 1.61 3.32
Value of purchases ($) 43.31 96.42
App failure/Failure experiencer 0.52 0.50
Recency of past purchases (in days) -45.68 68.83
Value of past purchases ($) 629.60 699.38
Frequency of past online purchases 0.66 1.97
Notes: These statistics of the variables are over pre- and post- 14 days of the failure. The past purchases are computed over a one-
year period. N = 273,378.

Table 2. Model-Free Evidence: Means of Outcome Variables for Treated and Control Groups
Variable Treated (pre) Treated (post) Control (pre) Control (post)
Frequency of purchases 0.74 0.89 0.75 0.93
Quantity of purchases 1.52 1.69 1.52 1.76
Value of purchases ($) 30.41 55.28 30.75 57.70
Frequency of purchases – Online 0.03 0.04 0.03 0.04
Quantity of purchases – Online 0.05 0.06 0.05 0.07
Value of purchases – Online ($) 1.34 2.93 1.50 3.17
Frequency of purchases – Offline 0.70 0.85 0.71 0.88
Quantity of purchases – Offline 1.47 1.63 1.47 1.69
Value of purchases – Offline ($) 29.07 52.35 29.25 54.53
Notes: These statistics are based on pre- and post- 14 days of the failures. N = 273,378.

Table 3. DID Model Results of Failure Shock for Purchases Across Channels
Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer -0.024** -0.057** -2.181**
x Post shock (DID) (0.008) (0.020) (0.681)
Intercept 0.740*** 1.508*** 30.40***
(0.003) (0.005) (0.15)
R squared 0.004 0.001 0.018
Effect size -4.67% -4.93% -9.04%
Mean Y 0.82 1.61 43.31
Shopper fixed
effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.
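For readers who want to connect Table 3 to the estimator: in a balanced two-group, two-period panel, the two-way fixed-effects DID coefficient reduces to the double difference of group-period means. A minimal sketch on simulated data (the baseline, post-period lift, and -2.0 treatment effect are illustrative, not the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000                                  # shoppers per group (illustrative)

# Simulated spending: common baseline, common post-period lift,
# and a true treatment effect of -2.0 for failure experiencers post-failure.
base, post_lift, true_effect = 30.0, 25.0, -2.0

treat_pre  = base + rng.normal(0, 10, n)
treat_post = base + post_lift + true_effect + rng.normal(0, 10, n)
ctrl_pre   = base + rng.normal(0, 10, n)
ctrl_post  = base + post_lift + rng.normal(0, 10, n)

# DID estimator: (treated post - treated pre) - (control post - control pre).
did = (treat_post.mean() - treat_pre.mean()) - (ctrl_post.mean() - ctrl_pre.mean())
print(round(did, 2))   # close to the true effect of -2.0
```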

Table 4. DID Model Results of Failure Shock for Purchases by Channel


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure
experiencer x
Post shock -0.022** -0.054** -2.088** -0.001 -0.002 -0.093
(DID) (0.008) (0.019) (0.660) (0.002) (0.003) (0.154)
Intercept 0.705*** 1.457*** 28.983*** 0.034*** 0.051*** 1.413***
(0.002) (0.005) (0.660) (0.0001) (0.001) (0.038)
R squared 0.0038 0.0001 0.0169 0.0016 0.0002 0.0016
Effect size -3.08% -3.74% -7.14% - - -
Mean Y 0.78 1.56 41.08 0.04 0.06 2.23
Shopper fixed
effects YES YES YES YES YES YES
Time fixed
effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.
Table 5. DID Model Results for Failures Occurring on Purchase and Non-Purchase Related Pages
Failure on purchase related page Failure on non-purchase related page
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure
experiencer x
Post shock 0.0003 -0.016 0.907 -0.053*** -0.108*** -4.627***
(DID) (0.013) (0.038) (1.195) (0.009) (0.022) (0.762)
Intercept 0.747*** 1.520*** 30.711*** 0.734*** 1.493*** 30.282***
(0.003) (0.007) (0.226) (0.002) (0.012) (0.497)
R squared 0.004 0.001 0.019 0.004 0.001 0.018
Mean Y 0.836 1.637 44.270 0.813 1.591 42.850
Shopper fixed
effects YES YES YES YES YES YES
Time fixed
effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 160,662 for failure on purchase related page. N= 217,418 for failure on non-purchase related page.

Table 6. DID Model Results for Value of Purchases and Basket Size by Channel for Shoppers Close to a Store
(< 2 Miles) at the Time of Failure
Offline Online
Variable Value of purchases Basket size Value of purchases Basket size
Failure experiencer x 13.542* 0.134* 0.885 0.023
Post shock (DID) (5.306) (0.058) (1.178) (0.020)
Intercept 33.413*** 0.870*** 1.818*** 0.055***
(1.272) (0.014) (0.308) (0.005)
R squared 0.0395 0.0064 0.0027 0.0012
Mean Y 55.00 0.95 3.18 0.07
Shopper fixed effects YES YES YES YES
Time fixed effects YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. Two miles is the median distance from the retailer’s nearest store at the time of failure. N = 6,572.

Table 7. Contrast Analysis Based on Distance to Store at the Time of Failure for Failure Experiencers
Variable Offline value of Offline basket
purchases size
Close to store x Post 14.130* 0.083
shock (5.707) (0.065)
R squared 0.0432 0.0064
Mean Y 54.96 0.98
Shopper fixed effects YES YES
Time fixed effects YES YES
Note: Closeness to store is defined using the median distance of 2 miles. There are 1,298 failure-experiencers within 2 miles of
the store at the time of failure and 1,527 failure-experiencers who are 2 miles or farther from the store among those who opt in for
location sharing. *** p < 0.001, ** p < 0.01, * p < 0.05. N = 5,650.

Table 8. DID Model Results of Failure Shock for Purchases Across Channels:
Moderating Effects of Relationship with Retailer and Past Online Purchase Frequency
Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer x Post shock (DID) -0.208*** -0.373*** -15.113***
(0.001) (0.031) (1.058)
DID x Past value of purchases 0.0001*** 0.0002*** 0.021***
(0.000) (0.000) (0.001)
DID x Recency of purchases -0.001*** -0.003*** -0.015**
(0.000) (0.000) (0.005)
DID x Past online purchase frequency -0.015** -0.017 -1.360***
(0.004) (0.012) (0.282)
Intercept 0.739*** 1.508*** 30.396***
(0.002) (0.004) (0.169)
R squared 0.0075 0.0006 0.0348
Mean Y 0.84 1.64 44.07
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: DID = Difference-in-Differences. Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p <
0.01, * p < 0.05. N = 273,378.

Table 9. Causal Forest Results: Summary of Individual Shopper Treatment Effect for Value of Purchases
Ntest Mean SD
τ 45,563 -1.660 1.136
τ| τ < 0 43,748 -1.739 1.089
τ| τ > 0 1,815 0.239 0.198
Note: 𝜏 represents the estimated Conditional Average Treatment Effect (CATE) for each individual in the test data.

Table 10. Results of Causal Forest Post-hoc CATE Regression for Value of Purchases
Variable Coefficient (Standard Error)
Intercept -0.958***(0.012)
Past value of purchases 0.000***(0.000)
Recency of purchases -0.005***(0.000)
Past online purchase frequency 0.037***(0.002)
Gender (female) -0.190***(0.008)
Loyalty program -0.340***(0.011)
R squared 0.493
Note: *** p < 0.001. N = 45,563.
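The paper estimates individual CATEs with a causal forest (Wager and Athey 2018). The sketch below is not a causal forest; it is a much simpler "T-learner" with one binary covariate on simulated data, included only to illustrate what a CATE is: the treated-minus-control difference in expected outcomes, conditional on covariates. All names and numbers (loyalty, the -3.0 and -1.0 effects) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40_000

# Illustrative data: loyalty (binary covariate), random failure exposure,
# and spending whose treatment effect differs by loyalty status (made up).
loyalty = rng.integers(0, 2, n)             # 0 = not enrolled, 1 = enrolled
treated = rng.integers(0, 2, n)             # failure experiencer or not
effect = np.where(loyalty == 1, -3.0, -1.0)
spend = 40 + 5 * loyalty + treated * effect + rng.normal(0, 10, n)

# T-learner with one binary covariate: estimate E[Y | X, T] by cell means,
# then CATE(x) = mean(Y | x, T=1) - mean(Y | x, T=0).
def cate(x):
    t1 = spend[(loyalty == x) & (treated == 1)].mean()
    t0 = spend[(loyalty == x) & (treated == 0)].mean()
    return t1 - t0

print(round(cate(0), 1), round(cate(1), 1))  # near -1.0 and -3.0
```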

Table 11. Results for November Failure: Failure Shock and Purchases Across Channels
Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer x -0.083*** -0.138*** -5.460***
Post shock (DID) (0.009) (0.023) (1.059)
Intercept 1.005*** 2.406*** 79.357***
(0.002) (0.006) (0.262)
R squared 0.0056 0.0004 0.0024
Mean Y 1.104 2.496 87.920
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: DID = Difference-in-Differences. Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p <
0.01, * p < 0.05. N = 286,976.

Table 12. Results for November Failure: Failure Shock and Purchases by Channel
(1) Offline (2) Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure
experiencer x
Post shock -0.082*** -0.138*** -5.448*** -0.001 0.000 -0.012
(DID) (0.008) (0.023) (1.036) (0.002) (0.003) (0.205)
Intercept 0.968*** 2.360*** 76.697*** 0.035*** 0.044*** 2.528***
(0.002) (0.006) (0.257) (0.001) (0.002) (0.100)
R squared 0.0059 0.0004 0.0024 0.0000 0.0000 0.0000
Mean Y 1.069 2.450 85.270 0.210 0.298 13.760
Shopper fixed
effects YES YES YES YES YES YES
Time fixed
effects YES YES YES YES YES YES
Notes: DID = Difference-in-Differences. Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p <
0.01, * p < 0.05. N = 286,976.

Figure 1. App Screenshots

Figure 2. Comparison of Failure-Experiencers’ and Non-Experiencers’ Purchases 14 Days before Failure

Note: The red line represents failure experiencers, while the solid black line represents the failure non-experiencers.

Figure 3. Comparison of Failure-Experiencers and Non-Experiencers

Note: Loyalty program level represents whether shoppers were enrolled (=1) or not (=0) in an advanced reward program.

Figure 4. Causal Forest Results: Individual CATE

Figure 5. Causal Forest Results: Quintiles By CATE

Note: Segment 1 represents shoppers most adversely affected by failure and Segment 5 represents those least adversely affected.

Figure 6. App Failure Effects on Value of Purchases Over Four Weeks



Figure 7. Retailer’s Revenue Loss by Percentile of Shoppers Experiencing App Failure

Note: CATE = Conditional Average Treatment Effect.



Web Appendix A
Checks for Exogeneity of Failure

Table A1. Robustness of Table 3 Results to Propensity Score Matching Estimates


Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer -0.026** -0.067** -2.414**
x Post shock (DID) (0.008) (0.020) (0.699)
Intercept
0.747*** 1.521*** 30.776***
(0.002) (0.005) (0.175)
R squared 0.003 0.001 0.016
Mean Y 0.84 1.64 44.07
Shopper fixed effect YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 252,756.

Figure A1. Pre-Period Purchase Trends for Failure Experiencers and Non-Experiencers

(a) Past Frequency of Purchases

(b) Past Quantity of Purchases

(c) Past Proportion of Online Purchases

Note: The unit of X axis is number of days before the failure event.

Web Appendix B
Robustness Check for Table 3 (Main Treatment Effect) Results

In this section, we present the results for robustness checks for the main estimation in Table 3
relating to: (a) alternative models with covariates and using Poisson model (Tables B1-B2), (b)
outliers (Table B3), (c) existing shoppers (Table B4), (d) alternative measures for prior use of
digital channels (Table B5), (e) regression-discontinuity style analysis (Table B6), and (f)
falsification/placebo checks (Tables B7 and B8).
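The falsification logic of Tables B7 and B8 can be sketched on simulated data: re-assigning "treatment" at random (or shifting the failure timing) should yield a DID estimate near zero when no true effect exists. A minimal illustration with made-up numbers:

```python
import numpy as np

# Placebo check sketch: with treatment randomly re-assigned and no true effect,
# the DID estimate should be statistically indistinguishable from zero.
rng = np.random.default_rng(1)
n = 50_000                                    # shoppers per placebo group

pre = 30 + rng.normal(0, 10, 2 * n)
post = 55 + rng.normal(0, 10, 2 * n)          # common post-period lift, no effect
fake_treated = rng.permutation([0] * n + [1] * n).astype(bool)

did = ((post[fake_treated].mean() - pre[fake_treated].mean())
       - (post[~fake_treated].mean() - pre[~fake_treated].mean()))
print(round(did, 2))   # close to 0
```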

Table B1. Robustness of Table 3 Results to Inclusion of Covariates Across Channels


Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure
experiencer x
Post shock -0.024* -0.057* -2.181**
(DID) (0.010) (0.025) (0.731)
Failure -0.021** -0.030 -0.694
experiencer (0.007) (0.018) (0.517)
Gender -0.009 -0.006 -0.444
(0.007) (0.018) (0.522)
Loyalty 0.009 0.022 0.800
program (0.006) (0.015) (0.42)
Intercept 0.745*** 1.508*** 30.222***
(0.007) (0.017) (0.492)
R squared 0.004 0.001 0.018
Mean Y 0.82 1.61 43.31
Notes: Robust standard errors clustered by shoppers are in parentheses. Time fixed effects are included; *** p < 0.001, ** p <
0.01, * p < 0.05. DID = Difference-in-Differences. N = 273,378.

Table B2. DID Poisson Model Results Across Channels


Variable Frequency of Quantity of
purchases purchases
Failure experiencer x -0.021* -0.031*
Post shock (DID) (0.010) (0.012)
Log pseudo-likelihood -92,658.71 -167,663.92
Mean Y 0.82 1.61
Shopper fixed effects YES YES
Time fixed effects YES YES
Notes: Robust standard errors in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N =
273,378.

Table B3. Robustness of Table 3 Results to Outlier Spenders


Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer x Post -0.023* -0.055* -2.139**
shock (DID) (0.008) (0.019) (0.677)
Intercept 0.728*** 1.472*** 29.524***
(0.002) (0.005) (0.169)
R squared 0.0043 0.0013 0.0192
Mean Y 0.81 1.59 42.69
Shopper fixed effect YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 272,706.

Table B4. Robustness of Table 3 Results to Existing Shoppers Across Channels


Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer -0.025** -0.061** -2.283**
x Post shock (DID) (0.001) (0.020) (0.693)
Intercept 0.756*** 1.541*** 31.061***
(0.002) (0.005) (0.173)
R squared 0.0038 0.0010 0.0181
Mean Y 0.84 1.64 44.07
Shopper fixed effect YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001. DID = Difference-in-Differences. N =
267,534.

Table B5. Robustness of Table 3 Results to Alternative Measures of Digital Channel Use
Based on Past Online Purchase (or Not) Before Failure
Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer x Post shock (DID) -0.193*** -0.389*** -12.935***
(0.012) (0.03) (0.879)
DID x Value of past purchases 0.000*** 0.000*** 0.017***
(0.000) (0.000) (0.001)
DID x Recency of purchases -0.001*** -0.003*** -0.022***
(0.000) (0.000) (0.006)
DID x Past online buyer or not -0.019*** -0.029*** -1.344***
(0.003) (0.007) (0.220)
Intercept 0.640*** 1.141*** 25.034***
(0.006) (0.015) (0.446)
R squared 0.1587 0.1217 0.0932
Mean Y 0.84 1.64 44.07
Shopper fixed effect YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; Each moderator interacts with the difference-in-
differences (DID) term failure experiencers x post shock; *** p < 0.001. The observations include those of shoppers with at least
one purchase in the past for computing recency. N = 267,534.

Table B6. Robustness of Table 3 Results to Regression Discontinuity Style Analysis


Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer x Post shock (DID) -0.045*** -0.09** -3.169**
(0.012) (0.03) (1.112)
Intercept 0.725*** 1.478*** 30.218***
(0.002) (0.005) (0.195)
R squared 0.0031 0.0007 0.0160
Mean Y 0.80 1.56 42.07
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01. DID = Difference-in-Differences. N = 198,432.

Table B7. Falsification/Placebo Check (Re-assigned Treatment) for Failure Shock and Purchases Across Channels
Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer x Post shock (DID) -0.000 -0.009 0.399
(0.166) (0.019) (0.680)
Intercept 0.740*** 1.508*** 30.397***
(0.002) (0.004) (0.170)
R squared 0.0038 0.0010 0.018
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.

Table B8. Falsification/Placebo Check (Re-assigned Timing) for Failure Shock and Purchases Across Channels
Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer x Post shock (DID) 0.0003 0.005 -0.928
(0.018) (0.021) (0.660)
Intercept 0.895*** 1.815*** 38.026***
(0.002) (0.005) (0.164)
R squared 0.000 0.0005 0.0003
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.

Web Appendix C
Robustness Check for Table 4 (By Channel) Results

In this section, we present the results of robustness checks for the cross-channel estimation in
Table 4 relating to (a) alternative models with covariates and a Poisson model (Tables C1-C2),
(b) outliers (Table C3), (c) existing shoppers (Table C4), (d) alternative measures of prior use of
digital channels (Table C5), (e) regression discontinuity style analysis (Table C6), and (f)
placebo/falsification checks (Tables C7 and C8).

Table C1. Robustness of Table 4 Results to Inclusion of Covariates by Channel


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.022* -0.055* -2.088** -0.001 -0.003 -0.093
(0.010) (0.025) (0.709) (0.002) (0.004) (0.158)
Failure -0.018* -0.025 -0.527 -0.003* -0.005 -0.167
experiencer (0.007) (0.018) (0.501) (0.001) (0.003) (0.112)
Post shock 0.170*** 0.221*** 25.275*** 0.009*** 0.015*** 1.672***
(0.007) (0.018) (0.509) (0.001) (0.003) (0.113)
Gender -0.007 -0.003 -0.410 -0.002 -0.003 -0.033
(0.007) (0.018) (0.506) (0.001) (0.003) (0.113)
Loyalty program 0.011 0.026 0.889* -0.002 -0.004 -0.089
(0.006) (0.014) (0.407) (0.001) (0.002) (0.091)
Intercept 0.707*** 1.451*** 28.651*** 0.037*** 0.057*** 1.571***
(0.007) (0.017) (0.477) (0.001) (0.003) (0.106)
R squared 0.0040 0.0011 0.0169 0.0003 0.0003 0.0016
Mean Y 0.78 1.56 41.08 0.04 0.06 2.23
Notes: Robust standard errors clustered by shoppers are in parentheses. Time fixed effects are included; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N = 273,378.

Table C2. DID Poisson Model Results by Channel


Offline Online
Variable Frequency Quantity of Frequency Quantity of
of purchases purchases of purchases purchases

Failure experiencer x Post shock (DID) -0.020* -0.031* -0.019 -0.018
(0.010) (0.012) (0.043) (0.055)
Log pseudo-likelihood -88,907.82 -162,972.62 -6,395.99 -9,303.36
Mean Y 0.78 1.56 0.04 0.06
Shopper fixed effects YES YES YES YES
Time fixed effects YES YES YES YES
Notes: Robust standard errors in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N =
273,378.

Table C3. Robustness of Table 4 Results to Outlier Spenders


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.021* -0.053* -2.068** -0.001 -0.002 -0.071
(0.007) (0.019) (0.656) (0.002) (0.003) (0.152)
Intercept 0.694*** 1.423*** 28.185*** 0.033*** 0.049*** 1.339***
(0.002) (0.005) (0.316) (0.001) (0.002) (0.038)
R squared 0.0041 0.0012 0.0179 0.0003 0.0000 0.0017
Mean Y 0.78 1.53 40.51 0.04 0.06 2.18
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N = 272,706.

Table C4. Robustness of Table 4 Results to Existing Shoppers by Channel


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.024** -0.059** -2.166** -0.002 -0.003 -0.117
(0.007) (0.019) (0.672) (0.002) (0.003) (0.156)
Intercept 0.721*** 1.489*** 29.616*** 0.035*** 0.052*** 1.449***
(0.002) (0.005) (0.168) (0.000) (0.001) (0.039)
R squared 0.0037 0.0009 0.0169 0.0003 0.0002 0.0016
Mean Y 0.80 1.58 41.81 0.04 0.06 2.26
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N = 267,534.

Table C5. Robustness of Table 4 Results to Alternative Measures of Digital Channel Use Based on Past Online
Purchase (or Not) Before Failure
Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.186*** -0.373*** -12.123*** -0.007** -0.015** -0.812***
(0.011) (0.029) (0.852) (0.002) (0.005) (0.195)
DID x Value of past purchases 0.000*** 0.000*** 0.016*** 0.000*** 0.000*** 0.001***
(0.000) (0.000) (0.001) (0.000) (0.000) (0.000)
DID x Recency of purchases -0.001*** -0.003*** -0.017** 0.000*** 0.000*** -0.005***
(0.000) (0.000) (0.006) (0.000) (0.000) (0.001)
DID x Past online buyer -0.011*** -0.017* -1.089*** -0.008*** -0.012*** -0.255***
(0.003) (0.007) (0.213) (0.001) (0.001) (0.049)
Intercept 0.616*** 1.112*** 24.138*** 0.024*** 0.029*** 0.896***
(0.006) (0.015) (0.432) (0.001) (0.002) (0.099)
R squared 0.1584 0.1207 0.0927 0.0868 0.0647 0.0245
Mean Y 0.80 1.58 41.81 0.04 0.06 2.26
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. The observations include those of shoppers with at least one purchase in the past for computing recency. N = 267,534.

Table C6. Robustness of Table 4 Results to Regression Discontinuity Style Analysis


Offline Online
Variable Frequency of Quantity of Value of Frequency Quantity of Value of
purchases purchases purchases of purchases purchases purchases
Failure experiencer -0.044*** -0.087** -3.105** -0.001 -0.003 -0.064
x Post shock (DID) (0.012) (0.030) (1.074) (0.003) (0.006) (0.253)
Intercept 0.692*** 1.429*** 28.841*** 0.033*** 0.048*** 1.376***
(0.002) (0.006) (0.189) (0.000) (0.001) (0.044)
R squared 0.0030 0.0006 0.0150 0.0001 0.0002 0.0014
Mean Y 0.76 1.51 39.94 0.04 0.05 2.12
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01. DID = Difference-in-Differences. N = 198,432.

Table C7. Falsification/Placebo Check (Re-assigned Treatment) for Failure Shock and Purchases by Channel
(1) Offline (2) Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.001 -0.012 -0.349 0.001 0.003 0.050
(0.008) (0.019) (0.659) (0.002) (0.003) (0.153)
Intercept 0.705*** 1.457*** 28.983*** 0.034*** 0.051*** 1.414***
(0.002) (0.005) (0.165) (0.0004) (0.001) (0.038)
R squared 0.0037 0.0009 0.0168 0.0003 0.0002 0.0015
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.

Table C8. Falsification/Placebo Check (Re-assigned Timing) for Failure Shock and Purchases by Channel
(1) Offline (2) Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.0009 0.003 -0.916 0.001 0.001 -0.012
(0.008) (0.021) (0.640) (0.002) (0.003) (0.151)
Intercept 0.858*** 1.762*** 36.510*** 0.036*** 0.052*** 1.515***
(0.002) (0.005) (0.159) (0.0004) (0.001) (0.038)
R squared 0.0001 0.0006 0.0045 0.0001 0.0001 0.0012
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.

Table C9. Robustness of Table 4 Results to Outlier Spenders


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure
experiencer x
Post shock -0.021* -0.053* -2.068** -0.001 -0.002 -0.071
(DID) (0.007) (0.019) (0.656) (0.002) (0.003) (0.152)
Intercept 0.694*** 1.423*** 28.185*** 0.033*** 0.049*** 1.339***
(0.002) (0.005) (0.3164) (0.001) (0.002) (0.038)
R squared 0.0041 0.0012 0.0179 0.0003 0.0000 0.0017
Mean Y 0.78 1.53 40.51 0.04 0.06 2.18
Shopper fixed
effects YES YES YES YES YES YES
Time fixed
effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, * p < 0.05. DID = Difference-in-
Differences. N = 272,706.
x

Web Appendix D
Other Robustness Checks

Table D1. Effects of App Failure on Average Value of Purchases Each Week
Variable Estimate
(Standard error)
Treat x Week 0 -0.52 (0.34)
Treat x Week 1 -1.40** (0.54)
Treat x Week 2 -0.28 (0.27)
Treat x Week 3 -0.23 (0.26)
Treat x Week 4 -0.33* (0.25)
Intercept 13.19*** (0.09)
Mean Y 15.92
Shopper fixed effects YES
Time fixed effects YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. N = 1,366,890.
DID = Difference-in-Differences.

Table D2. Results of DID Model with Stacked Online and Offline Purchases and Channel Dummies
Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer x Post shock (DID) -0.001 -0.003 -0.093
(.002) (0.003) (0.154)
DID x Channel dummy -0.021** -0.052** -1.994**
(0.008) (0.019) (0.678)
Post shock x Channel dummy 0.161*** 0.206*** 23.602***
(0.006) (0.014) (0.495)
Intercept 0.037*** 0.754*** 15.198***
(0.001) (0.002) (0.085)
R squared 0.0636 0.0362 0.0644
Mean Y 0.82 1.61 43.31
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. Channel dummy is 1 for offline purchases and 0 for online purchases. N = 546,756.
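The stacked design in Table D2 can be illustrated with a small, purely hypothetical construction (not the paper's code): each shopper-week cell contributes one offline row (channel dummy = 1) and one online row (channel dummy = 0), and the DID term is interacted with the channel dummy so that the base DID coefficient captures the online effect while the interaction captures the incremental offline effect.

```python
import numpy as np

# Toy construction of the stacked online/offline design with a channel
# dummy. The data are illustrative, not from the paper.
n = 6  # shopper-week cells in this toy example
treated = np.array([1, 1, 1, 0, 0, 0], float)  # failure experiencers
post = np.array([0, 1, 1, 0, 1, 1], float)     # post-shock indicator

# Stack: first n rows are offline (channel = 1), next n online (channel = 0)
channel = np.concatenate([np.ones(n), np.zeros(n)])
treat_s = np.tile(treated, 2)
post_s = np.tile(post, 2)

did = treat_s * post_s            # base DID term (online effect)
did_x_channel = did * channel     # incremental offline DID effect
post_x_channel = post_s * channel # channel-specific post-shock shift

X = np.column_stack([did, did_x_channel, post_x_channel])
print(X.shape)  # (12, 3)
```

Regressing the stacked outcome on these columns (plus shopper and time fixed effects) reproduces the logic of Table D2, where a negative and significant "DID x Channel dummy" coefficient indicates that the failure's effect is concentrated in the offline channel.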
