RP3656


DOES MOBILE APP FAILURE IMPACT ONLINE AND IN-STORE SHOPPING?

Unnati Narang
Venkatesh Shankar
Sridhar Narayanan

July 2021

* Unnati Narang (unnati@illinois.edu) is Assistant Professor of Marketing, University of Illinois, Urbana-Champaign; Venkatesh Shankar (vshankar@mays.tamu.edu) is Professor of Marketing and Coleman Chair in Marketing and Director of Research, Center for Retailing Studies at the Mays Business School, Texas A&M University; and Sridhar Narayanan (sridhar.narayanan@stanford.edu) is an Associate Professor of Marketing at the Graduate School of Business, Stanford University. We thank the participants at the ISMS Marketing Science conference, the UTDFORMS conference, and research seminar participants at the University of California, Davis, the University of Toronto, the University of Illinois, Urbana-Champaign, and the University of Texas at Austin for valuable comments.

Abstract

Mobile devices account for a majority of transactions between shoppers and marketers. Branded
retailer mobile apps have been shown to increase purchases across channels (e.g., online and
brick-and-mortar stores). However, mobile apps are also prone to failures or disruptions in use.
Does a failure in an omnichannel retailer’s branded app impact shoppers’ purchases? Are there
cross-channel effects of these failures? Does the impact of failure vary across shoppers? These
questions, although important, are challenging to answer because field experiments inducing
failures are infeasible, and observational data suffer from selection issues. We identify a natural
experiment involving an exogenous two-hour failure in a large omnichannel retailer’s mobile app
during which the app became unavailable to all the shoppers. We examine the short-term impact
of app failures on purchases in both online and store channels using a difference-in-differences
approach. We investigate two potential mechanisms behind these effects, channel substitution
and brand preference dilution. We also analyze shopper heterogeneity in the effects based on
shoppers’ relationship with the retailer, past digital channel use, and other characteristics
identified using machine-learning approaches. Our analysis reveals that an app failure has a
significant overall negative effect on shoppers’ frequency, quantity, and monetary value of
purchases across channels. The effects are heterogeneous across channels and by shoppers. The
decreases in purchases across channels are driven by purchase reductions in stores and not in the
online channel. The fall in purchases in brick-and-mortar stores is consistent with the brand
preference dilution mechanism, whereas the preservation of purchases in the online channel is in
line with the channel substitution mechanism. Furthermore, shoppers with a higher monetary
value of past purchases and less recent purchases are less sensitive to app failures. The results
suggest that app failures lead to an annual revenue loss of about $0.97-$1.36 million for the
retailer in our data. About 47% of shoppers account for about 70% of the loss. We outline targeted
failure prevention and service recovery strategies that retailers could employ.

Keywords: service failure, mobile marketing, mobile app, retailing, omnichannel, difference-in-differences, natural experiment, causal effects

1. Introduction

Mobile commerce has seen tremendous growth in recent years, with mobile devices accounting for a majority of interactions between shoppers and marketers. This growth has accelerated through the rapid increase of smartphone penetration – about 6.1 billion people (41.5% of the global population) used smartphones in 2020.1 Mobile applications (henceforth, apps) have emerged as an important channel for retailers as they have been found to increase engagement and purchases across channels (e.g., Kim et al. 2015, Xu et al. 2017, Narang and Shankar 2019). Purchases made through retail apps grew by 54% during the recent COVID-19 pandemic (Retail Dive 2020). While retailers have widely embraced mobile apps, there is little understanding about how service failures in mobile apps affect shopper behavior. In this study, we empirically examine the impact of such failures on shopper behaviors in an omnichannel context, specifically how it varies across channels and shoppers. Causal measurement of the effect of app failures is challenging, but we address the identification problem by using a novel natural experiment where failures varied exogenously across users.

The study of failures in mobile apps is important because apps are highly vulnerable to failures. The diversity of mobile operating systems (e.g., iOS, Android), devices (e.g., mobile phone and tablet), and versions of hardware and software and their constant use across a variety of mobile networks often result in app failures. Failures in a retailer's mobile app have the potential to negatively affect shoppers' engagement with the app and their future purchases in the online channel. In addition, app failures may have spillover effects in other channels due to both substitution of purchases across channels and dilution of preference for the retailer brand. Understanding how failures affect purchases in different channels is important for retailers.

1 Source: Statista report on smartphone penetration (https://tinyurl.com/b6ajrmyc) last accessed June 2, 2021.

Preventing and recovering from app failures is critical for managers because more than 60% of shoppers abandon an app after experiencing failure(s) (Dimensional Research 2015). App crashes are among the leading causes of mobile failures, accounting for 65% of all iOS failures (Blancco 2016). In 2020, several major consumer and ecommerce apps reported large volumes of crashes due to software issues (Bugsnag 2020). About 2.6% of all app sessions result in a crash, suggesting about 1.5 billion app failures across 60 billion app sessions annually (Computerworld 2014). Given the extent of these app failures and their potential damage to firms' relationships with customers, determining the impact of app failures is important for formulating preventive and recovery strategies.

Despite the importance of app failures, not much is known about their impact on purchases. While app crashes in a shopper's mobile device have been shown to negatively influence app engagement (e.g., restart time, browsing duration, and activity level, Shi et al. 2017), the relationship between app failures and subsequent purchases has not been studied. Furthermore, a large proportion of shoppers use both online (desktop and mobile websites) and offline (brick-and-mortar) retail channels. However, we do not know much about the impact of app failures on purchases by channel, including spillovers across channels.

The effect of app failures on subsequent purchases is an empirical question because different mechanisms can lead to different outcomes. On the one hand, multiple shopping channels allow shoppers to substitute channels in case of a failure, mitigating any negative impact of the failure and even potentially resulting in a positive effect of the failure on purchases in other channels (channel substitution effect). On the other hand, an app failure might cause shoppers to evaluate the brand adversely, leading to a negative impact of the failure on purchases across channels (brand preference dilution effect). These two mechanisms may coexist both across and within shoppers. Therefore, the sign of the net effect of failures on purchases in other channels depends on the magnitudes of the effects induced by these two mechanisms. Retailers would benefit from a deeper understanding of these mechanisms for devising strategies to deal with app failures.

The effects of app failure may also differ across shoppers. Shoppers may be more or less negatively impacted by failures depending on factors such as shoppers' relationship with the firm (Goodman et al. 1995, Hess et al. 2003, Chandrashekaran et al. 2007, Knox and van Oest 2014, Ma et al. 2015) and shoppers' prior use of the firm's digital channels (Cleeren et al. 2013, Liu and Shankar 2015, Shi et al. 2017). It is important for managers to better understand how the effects of failure vary across shoppers so that they can devise targeted preventive and recovery strategies.

Our study quantifies and explains the impact of a failure in a retailer's branded app on the frequency, quantity, and monetary value of purchases in online and offline channels. We address four research objectives:

• What are the effects of a failure in a retailer's branded mobile app on the frequency, quantity, and monetary value of subsequent purchases by the shoppers?
• How do these effects vary by channel, i.e., in the online and in-store channels of the retailer?
• What possible mechanisms can explain the effects of an app failure?
• How do these effects vary by shoppers, i.e., by their past relationship with the firm, prior channel use, and other shopper characteristics?

Estimation of the causal effects of an app failure on shopping outcomes is challenging. The gold standard among the methods available to uncover the causal impact of service failures is a randomized field experiment. However, such an experiment would be impractical in this context because, for ethical reasons, a retailer is unlikely to deliberately induce failures in an app even for a subset of its shoppers. An observational study is a viable alternative, but it has to surmount the potential endogeneity of app failures, which may occur at different times for different app users. Although endogeneity could be addressed through an instrumental variables approach, it is hard to come up with instrumental variables that are valid and exhibit sufficient variation.

We overcome the estimation challenges and mitigate the potential endogeneity of app failures by exploiting a natural experiment involving a two-hour systemwide (affecting all app users who attempted to use the app) exogenous failure in a large omnichannel retailer's (similar to Walmart and Macy's) mobile app to estimate the short-term effect of the app failure. Conditional on signing in on the day of the failure, whether a user experienced a failure or not was a function of whether they attempted to use the app during the time window of the failure, which they could not have anticipated in advance. We verify that there are no systematic differences between users who experienced failures vs. those who did not. We take advantage of the resulting quasi-randomness in the incidence of failure to estimate the short-term effects of the app failure. We use a difference-in-differences (DID) approach that compares the pre- and post-failure outcomes for the failure experiencers with those of failure non-experiencers over the 14 days before and after the failure.

We explore the two potential mechanisms behind the effects of the app failure, channel substitution and brand preference dilution. We investigate the heterogeneity in the effects of failure on shopping behavior by exploiting the panel nature of our dataset. We test for the effects separately in online and offline channels. We also examine the moderating effects of factors such as relationship with the firm and prior digital channel use on the effects of failure. Prior research (e.g., Ma et al. 2015, Hansen et al. 2018) has explored these factors for services in general but not in the digital or mobile app contexts. In addition, we recover the heterogeneity of effects at the individual level using data-driven machine learning methods.

Our results show that a failure in the branded app of a large retailer has a significant overall negative effect on shoppers' frequency, quantity, and monetary value of purchases across channels, but the effects are heterogeneous across channels and shoppers. Purchases in stores decline significantly, but those in the online channel do not. Our analyses suggest that the fall in purchases in stores is consistent with the brand preference dilution mechanism, whereas the preservation of purchases in the online channel is in line with the channel substitution mechanism. Furthermore, shoppers with a higher monetary value of past purchases and less recent purchases are less sensitive to app failures. Finally, about 47% of these shoppers account for about 70% of the losses in annual revenues, which amount to $0.97-$1.36 million.

Our research contributes to the literature by: (1) quantifying the effects of app failure on multiple purchase outcomes such as frequency, quantity, and monetary value of purchases; (2) examining the impact of app failure in different channels, including channel spillover effects; (3) exploring the mechanisms behind the observed effects; and (4) uncovering the moderators of the effects of app failure on purchases and the heterogeneity in effects across shoppers. These novel characteristics of our study contribute to the research streams on service marketing, channel choice, and mobile apps.

In the remainder of the paper, we first discuss the related literature. Next, we discuss our data and empirical setting. Subsequently, we describe our empirical strategy, lay out and test the key identification strategy, and conduct our empirical analysis. We estimate the effect of an app failure across all channels and for each channel, examine the mechanisms underlying the effects, and assess the heterogeneity of effects by shopper characteristics. We perform several robustness checks to rule out alternative explanations. We conclude by discussing the implications of our results for managers.

2. Related Literature

2.1. Services Marketing and Service Failures

In recent years, technology-enabled services have risen in importance, leading to important shifts (Dotzel et al. 2013). First, services that can be delivered without human or interpersonal interaction have grown tremendously. Online and mobile retailing no longer require shoppers to interact with human associates to make purchases. Second, closely related to this idea is the fact that services are increasingly powered by technologies such as mobile apps that allow anytime-anywhere access and convenience. Third, recent events such as the COVID-19 pandemic have made it necessary for services to be delivered with little physical contact with sales associates, boosting consumer adoption of technology-driven solutions.

With growing reliance on technologies for service delivery and the complexity of the technology environment in which these services are delivered, service failures are attracting greater attention. A service failure can be defined as service performance that falls below customer expectations (Hoffman and Bateson 2001). Service failures are widespread and are expensive to mend. Service failures resulting from deviations between expected and actual performance damage customer satisfaction and brand preference (Smith and Bolton 1998). Post-failure satisfaction tends to be lower even after a successful recovery and is further negatively impacted by the severity of the initial failure (Andreassen 1999, McCollough et al. 2000). In interpersonal service encounters, human interactions and employee behaviors influence both failure effect and recovery (Bitner et al. 1990, Meuter et al. 2000). In technology-based encounters, such as those in e-tailing and with self-service technologies (e.g., automated teller machines [ATMs]), the opportunity for human interaction is typically small after experiencing failure (Forbes et al. 2005, Forbes 2008). However, there may be significant heterogeneity in how consumers react to service failures (Halbheer et al. 2018).

The mobile context, particularly mobile apps, differs from interpersonal or other self-service technology contexts, so it is difficult to predict the direction and extent of the impact of an app failure on shopping outcomes. First, mobile apps are accessible at any time and in any location through an individual's mobile device. On the one hand, because a shopper can tap, interact, engage, or transact multiple times at little additional cost on a mobile app, the shopper may treat any one service failure as acceptable without significantly altering her subsequent shopping outcomes. Such an experience differs from that with a self-service technological device such as an ATM, which may need the shopper to travel to a specific location or incur other hassle costs that may not exist in the mobile app context. On the other hand, the costs of switching to a competitor are also much lower in the mobile app context, where a typical shopper uses and compares multiple apps. Thus, a service failure in any one app may aggravate the shopper's frustration with the app, leading to strong negative effects on outcomes such as purchases from the relevant app provider.

Second, a mobile app is one of the many touchpoints available to shoppers in today's omnichannel shopping environment. Thus, a shopper who experiences a failure in the app could move to the web-based channel or even the offline or store channel. In such cases, the impact of a failure on the app could be zero or even positive (if the switch to the other channel leads to greater engagement of the shopper with the retailer). By contrast, if the channels act as complements (e.g., if the shopper uses one channel for researching products and another for purchasing) or if the failure impacts the preference for the retailer brand, a failure in one channel could impede the shopper's engagement in other channels. Thus, it is difficult to predict the effects of app failure, in particular, how they might spill over to other channels.

2.2. Channel Choice and Channel Migration

A shopper’s experience in one channel can influence their behavior in other channels. Prior research on cross-channel effects is mixed, showing both substitution and complementarity effects, leading to positive and negative synergies between channels (e.g., Avery et al. 2012, Pauwels and Neslin 2015). The relative benefits of channels determine whether shoppers continue using existing channels or switch to a new channel (Ansari et al. 2008, Chintagunta et al. 2012). When a bricks-and-clicks retailer opens an offline store or an online-first retailer opens an offline showroom, its offline presence drives sales in online stores (Wang and Goldfarb 2017, Bell et al. 2018).2 This is particularly true for shoppers in areas with low brand presence prior to store opening and for shoppers with an acute need for the product. However, the local shoppers may switch from purchasing online to offline after an offline store opens, even becoming less sensitive to online discounts (Forman et al. 2009). In the long run, the store channel shares a complementary relationship with the Internet and catalog channels (Avery et al. 2012).

While the relative benefits of one channel may lead shoppers to buy more in other channels, the costs associated with one channel may also have implications for purchases beyond that channel. In a truly integrated omnichannel retailing environment, the distinctions between physical and online channels blur, with the online channel representing a showroom without walls (Brynjolfsson et al. 2013). Mobile technologies are at the forefront of these shifts. More than 80% of shoppers use a mobile device while shopping even inside a store (Google M/A/R/C Study 2013). As a result, if there are substantial costs associated with using a mobile channel (e.g., those induced by app failures), such costs may spill over to other channels. If shoppers use the different channels in complementary ways, the disruption of one of those channels could negatively impact their engagement with the other channels as well. However, if shoppers treat the channels as substitutes, failures in one channel may drive the shoppers to purchase in another channel. If an app failure dilutes shoppers' preference for the retailer brand, it may lead to negative consequences across channels. Overall, the direction of the effect of app failures on outcomes in other channels such as brick-and-mortar stores and online channels depends on which of these competing and potentially co-existing mechanisms is dominant.

2 A bricks-and-clicks retailer is a retailer with both offline (“bricks”) and online (“clicks”) presence.

2.3. Mobile Apps

The nascent but evolving research on mobile apps shows positive effects of mobile app channel introduction and use on engagement and purchases in other channels (Kim et al. 2015, Xu et al. 2017, Narang and Shankar 2019) and on coupon redemptions (Fong et al. 2015, Andrews et al. 2016, Ghose et al. 2019) under different contingencies, as well as privacy tradeoffs of paid apps (Kummer et al. 2019).

To our knowledge, only one study has examined the effect of crashes in a mobile app on shoppers' app use. Shi et al. (2017) find that while crashes have a negative impact on future engagement with the app, this effect is lower for those with greater prior usage experience and for less persistent crashes. However, while they look at subsequent engagement of the shoppers with the mobile app, they do not examine purchases. Thus, our research adds to Shi et al. (2017) in several ways. First, we focus on estimating the causal effects of failure. To this end, we exploit the random variation in failures induced by systemwide failures. Second, we quantify the value of app failure's effects on subsequent purchases. The outcomes we study include the frequency, quantity, and value of purchases, while the key outcome in that study is app engagement. Third, we examine the cross-channel effects of mobile app failures, including in physical stores, while Shi et al. (2017) study subsequent engagement with the app provider only within the app. Finally, we explore the mechanisms behind the effects of failure, examine the moderating effects of relationship with the retailer and prior digital channel use, and analyze heterogeneity in shoppers' sensitivity to failures using a machine learning approach.

3. Research Setting and Data

3.1. Research Setting

We obtained the dataset for our empirical analysis from a large U.S.-based retailer. The retailer sells a variety of products, including software such as video games and hardware such as video game consoles and controllers, downloadable content, consumer electronics, and wireless services to 32 million customers. The gaming industry is large ($160 billion revenues in 2020), and the retailer is a major player in this industry, offering a rich setting. The retailer has a large offline presence similar to Walmart, PetSmart, or any other primarily brick-and-mortar chain with an omnichannel strategy. The retailer has a store network comprising 4,175 brick-and-mortar stores across the U.S. Additionally, it has a large ecommerce website and a mobile app that is the focus of our study.

The app allows shoppers to browse the retailer's product catalog, get deals, locate nearby stores, check loyalty points, learn about the latest product launches, check out product reviews, order online through a mobile browser, and make purchases through the app itself. The app is typical of mobile apps of large retailers (e.g., PetSmart, Costco) in features and shopper interactions. Figure 1 shows some screenshots from the app.

< Figure 1 about here >

The online and offline channel sales mix of the retailer in our data is typical of most large retailers. About 76% of the total sales for the top 100 largest retailers in the U.S. are from similar retailers with a store network of 1,000 or more stores (National Retail Federation 2018). Most of these large retailers have a predominant brick-and-mortar presence and a growing online presence. For example, Walmart's online revenues constitute 7.6% of all revenues, 1.3% of all PetSmart's sales come from the online channel, and Home Depot generates 6.8% of all revenues from ecommerce.3 For the retailer in our data, online sales comprised 10.2% of overall revenues, somewhat higher than that for similar large retailers. Furthermore, about 26% of the shoppers bought online in the 12 months before the failure event we study. The retailer's online sales displayed a 13% annual average growth in the last five years, similar to these retailers, who also exhibited double-digit growth (Barron's 2018). Its annual online sales revenues are also substantial at $1.1 billion. Therefore, the cross-channel effects of any app-related events or interventions are important for this retailer, similar to other retailers. Our research context offers a rich setting to examine the effects of a mobile app failure for a multi-channel retailer with a large store network.

3.2. Data and Sample

Our goal is to quantify the impact of an app failure in a retailer's branded mobile app. We leverage unique failure shocks that arise from server errors in the backend system of the firm and cause a failure, making the app unavailable for 2-5 hours. Our focal retailer's app experienced an exogenous systemwide failure about every 7-10 weeks during 2014-2018. We are able to identify and collect data on two such exogenous systemwide failure shocks, one in 2014 and the other in 2018. We use the data on the 2018 failure as our main sample because the app had a fully functional in-app shopping feature. This failure occurred on April 11, 2018 and lasted for two hours between 12pm and 2pm. The firm provided us with mobile app use data and transactional data across all channels for all the app users who accessed the app on the failure day; 70,568 experienced the failure, while 66,121 did not.

3 Source: eMarketer Retail, https://retail-index.emarketer.com/

The app data capture events that shoppers experience in the app, along with their timestamps. These data recorded the exogenous app failure event as a ‘server error’ at the time when users tried to access the app if they experienced this failure. This event represents an exogenous app breakdown for all the users, and the data allow us to identify shoppers who logged in during the failure event window and experienced the systemwide app failure. We also observe their purchases in stores and online. The online channel represents purchases at the retailer's website, including the mobile browser.

Table 1 provides the descriptive statistics for the variables of interest. Over a period of 14 days pre- and post-failure, shoppers make an average of a little less than one purchase comprising about 1.6 items for a value of about $43. In the 12 months before the failure, on average, shoppers make purchases worth $623 and buy 0.66 times in the online channel. Overall, 52% of the shoppers experience the failure during our focal failure event.

< Table 1 about here >

4. Empirical Strategy

4.1. Empirical Strategy

Our empirical strategy is to leverage the exogenous systemwide failure in the app to estimate the effect of app failure on shopping outcomes. The main idea behind our empirical approach is that conditional on signing in on the day of the failure, whether a user experienced a failure or not was a function of whether they attempted to use the app during the time window of the failure, which they could not have anticipated in advance. Furthermore, the time window is such that there is no systematic difference between users who signed in during this period vs. those who signed in at other times during the day. This exogeneity in failures allows us to compare those who experienced the failure (and thus could not use the app) with those who did not among the shoppers who attempted to use the app on the day of the failure. We examine this assumption in the data by testing for balance between shoppers who experience a failure and those who do not, using a set of pre-failure variables. To determine the treatment effect of a failure, we conduct a DID analysis, comparing the post-failure behaviors with the pre-failure behaviors of shoppers who logged in on the day of the failure and experienced it (treatment group) relative to those who logged in on that day but did not experience the failure (control group).

To analyze the treatment effects within and across channels, we repeat this analysis with the same outcome variables separately for the offline and online channels. To understand the underlying mechanisms for the effects, we examine two explanations, brand preference dilution and channel substitution, using the data on shoppers' closeness to purchase (based on their app use and location at the time of failure) and their time to next purchase to check for consistency with these mechanisms. To analyze heterogeneity in treatment effects by shopper, we first perform a moderator analysis using a priori factors identified in the literature, such as prior relationship strength and digital channel use, followed by a data-driven machine learning (causal forest) approach to fully explore all sources of heterogeneity across shoppers. Finally, we carry out multiple robustness checks.

4.2. Exogeneity of Failure Shock

To verify that there is no systematic difference between shoppers who experience the failure shock in the app and those who do not, we examine three types of evidence. First, we present plots of the behavioral trends in shopping for both failure experiencers and non-experiencers in the 14 days before the app failure. Figure 2 depicts the monetary value of daily purchases by those who experienced the failure and those who did not. The purchase trends in the pre-period are parallel and nearly identical for the two groups (p > 0.10), assuring us that these shoppers do not systematically differ in their purchase behavior. The trends are similar for the frequency and quantity of purchases and the proportion of online purchases (see Web Appendix Figure A1). Second, we compare their observed demographic variables, such as gender and membership in the retailer's loyalty program (Figure 3). We do not find any significant differences in these variables across the two groups (p > 0.10). Third, the results from the subsequent use of a propensity score matched (PSM) sample using nearest neighbor matching show similar treatment effects, suggesting that the treatment is indeed exogenous (see Web Appendix Table A1). These verification checks give us confidence in the validity of our empirical strategy.

< Figures 2 and 3 about here >
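The balance checks described above can be sketched in code. The following is a minimal illustration, not the paper's implementation: it computes Welch's two-sample t statistic for one pre-failure variable (say, pre-period spend) between failure experiencers and non-experiencers; the group values below are hypothetical.

```python
# Welch's two-sample t-test sketch for a single pre-failure covariate.
# A small |t| (well below ~2) is consistent with balance between groups.
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = statistics.fmean(sample_a), statistics.fmean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical pre-period monetary values ($) for the two groups
experiencers = [28.0, 35.5, 31.2, 29.8, 30.4, 33.1]
non_experiencers = [30.1, 29.5, 32.8, 28.9, 31.6, 30.7]
t, df = welch_t(experiencers, non_experiencers)
print(f"t = {t:.3f}, df = {df:.1f}")  # |t| well below 2 here
```

In practice, this would be repeated for each pre-failure variable (pre-period frequency, quantity, spend, demographics), with p-values computed from the t distribution with the Welch-Satterthwaite degrees of freedom.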

4.3. Econometric Model and Identification

As described in the previous section, we estimate the effects of app failure on shopping outcomes by relying on a quasi-experimental research design with a DID approach (e.g., Angrist and Pischke 2009).

Our two-way fixed effects (TWFE) linear DID regression takes the following form:

(1) Y_it = α_0 + α_1 F_i P_t + μ_i + λ_t + ϑ_it

where i indexes the shopper, t the time period (14 days pre or post failure), Y is the outcome variable (frequency, quantity, or monetary value of purchases), F is a dummy variable denoting treatment (1 if shopper i experienced the app failure and 0 otherwise), P is a dummy variable denoting the period (1 for the period after the systemwide app failure and 0 otherwise), α is a coefficient vector, μ_i is a shopper fixed effect, λ_t is a time fixed effect (post period relative to pre period), and ϑ_it is an error term. We cluster standard errors at the shopper level, following Bertrand et al. (2004). The coefficient of F_i P_t, i.e., α_1, is the treatment effect of the app failure.

The identification of this treatment effect hinges on the conditional independence of the failure, i.e., the failure is random conditional on a shopper logging into the app during the time window of the failure shock. Any unobserved time-invariant differences among shoppers are accounted for by shopper fixed effects, and any time-varying factors common to both groups are accounted for by time fixed effects in our model.

5. Empirical Analysis Results

5.1. Relationship between App Failures and Purchases

We first examine the overall differences in shopping behaviors between shoppers who

experienced failures and those who did not using model-free evidence 14 days pre and post

failure. Our goal is to estimate the short-term effects (i.e., 14 days pre and post) of this failure to

avoid overlapping with any other failures before or after. Our estimates represent the local

average treatment effect (LATE) of the failure. Using a two-week period also allows us to

include any “day of the week” effects equally in the pre and post periods.4

Table 2 reports the raw comparisons of post-failure vs. pre-failure purchase outcome

variables for both failure experiencers (70,568 treated) and non-experiencers (66,121 control)

among the set of consumers who accessed the app on the day of the failure. We find that post-

failure, shoppers who experienced the systemwide failure had 0.04 (p < 0.001) lower purchase

frequency, 0.07 (p < 0.001) lower purchase quantity, and $2.42 (p < 0.001) lower monetary value

than shoppers who did not experience the failure. A simple comparison of shopping outcomes

across the two groups shows that the average monetary value of purchases increased by 81.8%

($30.41 to $55.28) for failure-experiencers, while it increased by 87.6% ($30.75 to $57.70) for

non-failure experiencers post failure relative to the pre period (p < 0.001). Given our

4 We also estimated a model for a longer period of four weeks and found similar effects (see Figure 6 and Table D1).

identification strategy, the diminished growth in the monetary value of purchases for failure

experiencers relative to non-experiencers can be attributed to the exogenous failure shock.

< Table 2 about here >


5.2. Main DID Model Results

We first examine the effects of the app failure on the frequency, quantity, and monetary value of

subsequent purchases by the shoppers across all channels. The results from the DID model in

Table 3 show a negative and significant effect of app failure on the frequency (𝛼1 = -0.024, p <

0.01), quantity (𝛼1 = -0.057, p < 0.01), and monetary value of purchases (𝛼1 = -2.181, p < 0.01)

across channels. Relative to the pre-period for the control group, the treated group experiences a

decline in frequency of 3.2% (p < 0.01), quantity of 3.7% (p < 0.01), and monetary value of

7.1% (p < 0.01).5

< Table 3 about here >

5.3. Heterogeneity in Effects by Channel

Next, we examine the channel spillover effects of app failures in greater depth. We first split the

total purchases into offline and online purchases and repeat our regression analyses for each

channel. Table 4 reports the results for these alternative channel-based dependent variables. App

failure has a significant negative effect on the frequency (𝛼1 = -0.02, p < 0.01), quantity (𝛼1 = -

0.05, p < 0.01), and monetary value of purchases (𝛼1 = -2.09, p < 0.01) in the offline channel.

We do not find a significant (p > 0.10) effect of app failure on any of the purchase outcomes in

the online channel. Because there is no corresponding increase in the online channel and because

the overall purchases drop, we conclude that the decreases in overall purchases across channels

are largely due to declines in in-store purchases. This finding shows that for a primarily brick-

5 We calculate the percentage change by dividing the treatment coefficient by the intercept. For instance, the
treatment coefficient for value of purchases (2.18) divided by intercept (30.40) amounts to a 7.1% change.

and-mortar retailer with a growing online presence, the negative effects of a failure in the mobile app spill over to offline sales.

< Table 4 about here >

5.4. Potential Mechanisms

We next explore the potential mechanisms behind the effects of app failure using a descriptive

analysis of individual shoppers’ app usage and behavior at the time of failure. Based on when

shoppers experience the failure in their shopping journey (i.e., close to or far from purchase), we

may observe channel substitution and brand preference dilution effects of the app failure.

Shoppers who are close to purchase at the time of app failure may quickly switch channels and

complete their purchase through the mobile or desktop website forms of the online channel.

However, shoppers who are far from purchase when the app fails may be early in their shopping

journey and hence, may reduce their preference for the retailer brand and buy less than their

planned amount subsequently.

To explore the role of stage in the shopping journey in explaining the differential effects of

app failure, we utilize information in the data about the app page on which the shopper was when

the failure occurred. This information allows us to examine the effects of app failure across

shoppers based on whether they are close to or far from purchase at the time of failure. Table 5

reports the DID model results of the analysis relating to the app failure occurring on purchase

related and non-purchase related pages. Purchase-related pages in an app involve pages that are

closer to purchase, such as those relating to adding a product to shopping cart, clicking checkout,

or making payments. In contrast, non-purchase-related pages relate to activities farther from

purchase, such as browsing products and obtaining store related information. The effect of app

failure is negative and significant (p < 0.001) on all the outcome variables, and more negative for shoppers who experience failure on a non-purchase-related page than for shoppers who experience failure on a purchase-related page. Shoppers who already have a strong purchase intent and are on a

purchase-related page right before the failure are not as negatively affected as those without a

strong purchase intent or on a non-purchase related page.

< Table 5 about here >

To further explore the role of the purchase funnel, we compare the change in the value of

purchases between the post and the pre app failure time periods for two groups of shoppers,

those close to and those far from purchase based on a median split of re-login attempts during the

failure window. The median number of attempts is three. The negative effect of failure for

shoppers who make greater re-login attempts is lower (Value of purchases(post-pre, high attempt) =

28.03, Value of purchases(post-pre, control) = 26.95, p > 0.10) than for shoppers who make fewer re-

login attempts (Value of purchases(post-pre, low attempt) = 20.33, Value of purchases(post-pre, control) =

26.95, p < 0.001). The group of shoppers who are close to purchase at the time of app failure are

likely to repeatedly attempt to re-login during the failure duration to complete their intended

purchase. Such shoppers may eventually make the purchase in another channel, resulting in

channel substitution. However, the group of shoppers who are far from purchase at the time of

failure make fewer attempts to log back in during the failure time window. A greater negative

effect of app failure for such shoppers may be due to brand preference dilution.

Failure-experiencers who were close to a purchase or had purchase intent would have had to

determine whether to complete the transaction, and if so, whether to do it online or offline. For

shoppers who typically buy online, the cost of going to the retailer’s website to complete a

purchase interrupted by the app failure is smaller than that of going to the store to complete the

purchase. Therefore, these shoppers will likely complete the transaction online and not exhibit

any significant decrease in shopping outcomes in the online channel post failure. Thus, the channel

substitution effect likely explains the insignificant effects of app failure in the online channel. By

contrast, shoppers who typically buy in the retailer’s brick-and-mortar stores and who experience

the app failure, will likely have a diminished perception of the retailer with fewer incentives to

buy from the stores in the future. Thus, the brand preference dilution effect may prevail for these

shoppers after app failure. This effect is due to a negative spillover from the app channel to the

offline channel for shoppers experiencing the failure even if they are primarily offline shoppers.

Indeed, a negative message or experience can have an adverse spillover effect on attributes or

contexts outside the realm of the message or experience (Ahluwalia et al. 2001).

To further explore channel substitution toward the online channel, we examine the time

elapsed between the occurrence of the failure and subsequent purchase in the online channel.

Failure experiencers’ inter-purchase time online (Meantreated = 162.8 hours) is much shorter than

non-experiencers’ (Meancontrol = 180.7 hours) (p = 0.003). This result further suggests that after

an app failure, shoppers look to complete their intended purchases in the online channel.

Next, to understand channel substitution toward the offline channel, we examine the effect of

app failure for shoppers who were geographically close to a physical store at the time of failure

for the subsample of shoppers who allow location tracking in the app. Shoppers who are closer to

a store when they experience the app failure could more easily complete their purchase in the

store than shoppers farther from a store. Table 6 reports the DID model for the subsample of

shoppers located within two miles of the retailer’s store at the time of failure. The results show that shoppers closer to the retailer’s store are

not negatively affected by the failure. Rather surprisingly, both the basket size and the monetary

value of purchases for shoppers close to a store are significantly higher after app failure (p <

0.05). This result suggests that shoppers who experience a failure close to or at a physical store

end up buying additional items in the store. An implication is that channel substitution to a store

can lead to more purchases, but that channel substitution is less likely for shoppers who are

farther from the store. However, the proportion of shoppers close to the store at the time of

failure is very small (2.4%), so the average effects of app failure for the failure experiencers on

offline purchases and all purchases are still negative in our main model.

< Table 6 about here >

To further analyze the role of distance to the store at the time of failure, we present the

contrast analysis between shoppers who were less than two miles and those who were greater

than two miles from the nearest store at the time of failure in Table 7. The basket sizes of these

groups of shoppers do not differ post failure. However, shoppers closer to the store spend more

than those farther from the store post failure, suggesting that the app failure is associated with

channel substitution in purchases for shoppers closer to the store.

< Table 7 about here >

Much of the analysis we provide for the mechanisms is descriptive and

exploratory. Nevertheless, overall, the evidence is consistent with the asymmetry in the effect of

app failure on shopping outcomes across the two channels.

5.5. Heterogeneity by Shopper: Relationship Strength and Prior Digital Use

The literatures on relationship marketing and service recovery suggest two factors that may

moderate the impact of app failures on outcomes: relationship strength and prior digital channel

use. These variables are typically used in direct marketing for targeting and can provide useful

managerial insights about the heterogeneity in the effects of failures as well.

Relationship Strength. The service marketing literature offers mixed evidence on the

moderating role of the strength of customer relationship with the firm in the effect of service

failure on shopping outcomes. Some studies suggest that a stronger relationship may aggravate the effect of failures on product evaluation, satisfaction, and purchases (Goodman et al. 1995, Chandrashekaran et al. 2007, Gijsenberg et al. 2015). Other studies show that a stronger relationship attenuates the negative effect of service failures (Hess et al. 2003, Knox and van

Oest 2014). Consistent with the direct marketing literature (Schmittlein et al. 1987, Bolton

1998), we operationalize customer relationship using the RFM (recency, frequency, and

monetary value) dimensions. Because of high correlation between the interactions of frequency

with (failure experiencer x post shock) and value of purchases with (failure experiencer x post

shock) (r = 0.90, p < 0.001) and because value of purchases is more important for the retailer, we

drop frequency of past purchases.

Prior Digital Channel Use. The moderating role of a shopper’s prior use of the retailer’s

online channel in app failure’s impact on shopping outcomes could be positive or negative. On

the one hand, more digitally experienced app users may be less susceptible to the negative

impact of an app crash on subsequent engagement with the app than less digitally experienced

app users (Shi et al. 2017) because they are conditioned to expect some level of technology

failures, consistent with the product harm crises literature (Cleeren et al. 2013, Liu and Shankar

2015) and the expectation-confirmation theory (Oliver 1980, Tax et al. 1998, Cleeren et al.

2008). On the other hand, prior digital channel exposure may heighten shopper expectations and

make them less tolerant of failures. We operationalize this variable as the cumulative number of

purchases that the shopper made from the retailer’s website prior to experiencing a failure.

The results of the model with relationship strength and past digital channel use as moderators

appear in Table 8. Consistent with our expectations, the monetary value of past purchases has

positive and significant interaction coefficients with the DID model variable across all the

outcome variables (p < 0.001). Thus, app failures have a smaller effect on shoppers who

purchased more from the retailer, consistent with the results of Ahluwalia et al. (2001). Recency

has negative coefficients (p < 0.01), suggesting that the more recent shoppers are less tolerant of

failure. A failure shock also affects the frequency and value of purchases (p < 0.01) more strongly for shoppers with greater digital channel or online purchase experience with the retailer.

< Table 8 about here >

5.6. Heterogeneity by Shopper: Causal Forest Approach

In addition to the service marketing literature-based moderator variables examined earlier, we

also explore heterogeneity in treatment effects relating to additional managerially useful

observed variables (e.g., gender, loyalty level) not fully examined by prior research.

Unfortunately, including these variables as additional moderators in the DID analysis explodes

the number of main and interaction effects.

Recent methods of causal inference using machine learning such as the causal forest

approach allow us to recover individual-level conditional average treatment effects (CATE)

(Athey et al. 2017, Wager and Athey 2018). The causal forest is an ensemble of causal trees that

averages the treatment-effect predictions across thousands of trees. It has

been applied in marketing to model customer churn and information disclosure (Ascarza 2018,

Guo et al. 2021). A causal tree is similar to a regression tree. The typical objective of a

regression tree is to build accurate predictions of the outcome variable by recursively splitting

the data into subgroups that differ the most on the outcome variable given their covariates. A

regression tree has decision split nodes characterized by binary conditions on covariates and leaf

or terminal nodes at the bottom of the tree. The regression tree algorithm continuously partitions

the data, evaluating and re-evaluating at each node to determine (a) whether further splits would

improve prediction, and (b) the covariate and the value of the covariate on which to split. The

goodness-of-fit criterion used to evaluate the splitting decision at each node is the mean squared

error (MSE) computed as the deviation of the observed outcome from the predicted outcome.

The tree algorithm continues making further splits as long as the MSE decreases by more than a

specified threshold.

The causal tree model adapts the regression tree algorithm in several ways to make it

amenable to causal inference. First, it explicitly moves the goodness-of-fit criterion to treatment

effects rather than the MSE of the outcome measure. Second, it employs “honest” estimates, that

is, the data on which the tree is built (training data) are separate from the data on which it is

tested for prediction of heterogeneity (test data). Thus, the tree is honest if for a unit i in the

training sample, it only uses the response Yi to estimate the within-leaf treatment effect, or to

decide where to place the splits, but not both (Athey and Imbens 2016, Athey et al. 2017). To

avoid overfitting, we use cross-validation approaches in the tree-building stage.

Importantly, the goodness-of-fit criterion for causal trees is the difference between the

estimated and the actual treatment effect at each node. While this criterion ensures that all the

degrees of freedom are used well, it is challenging because we never observe the true effect.

5.6.1. Causal Tree: Goodness-of-fit Criterion

Following Wager and Athey (2018), if we have n independent and identically distributed training

examples labeled i = 1, ..., n, each of which consists of a feature vector Xi ∈ [0, 1]d, a response Yi ∈ ℝ, and a treatment indicator Wi ∈ {0, 1}, the CATE at x is:

(2) 𝜏(𝑥) = 𝔼[𝑌1𝑖 ― 𝑌0𝑖 | 𝑋𝑖 = 𝑥]



We assume unconfoundedness, i.e., conditional on Xi, the treatment Wi is independent of

outcome Yi. Because the true treatment effect is not observed, we cannot directly compute the

goodness-of-fit criterion for creating splits in a tree. This goodness-of-fit criterion is as follows.
(3) 𝑄𝑖𝑛𝑓𝑒𝑎𝑠𝑖𝑏𝑙𝑒 = 𝔼[(𝜏̂𝑖(𝑋𝑖) ― 𝜏𝑖(𝑋𝑖))²]
Because 𝜏𝑖(𝑋𝑖) is not observed, we follow Athey and Imbens’s (2016) approach to create a

transformed outcome 𝑌∗𝑖 that represents the true treatment effect. Assume that the treatment

indicator Wi is a random variable. Suppose there is a 50% probability for a unit i to be in the

treated or the control group. Then an unbiased estimate of the treatment effect can be obtained for that unit by just using its outcome Y in the following way. Let

(4) 𝑌∗𝑖 = 2𝑌𝑖 𝑖𝑓 𝑊𝑖 = 1 and 𝑌∗𝑖 = ― 2𝑌𝑖 𝑖𝑓 𝑊𝑖 = 0

It follows that:
(5) 𝔼[𝑌∗𝑖] = 2 ∙ ((1/2)𝔼[𝑌𝑖(1)] ― (1/2)𝔼[𝑌𝑖(0)]) = 𝔼[𝜏𝑖]

Therefore, we can compute the goodness-of-fit criterion for deciding node splits in a causal

tree using the expectation of the transformed outcome (Athey and Imbens 2016). Once we

generate causal trees, we can compute the treatment effect within each leaf because it has a finite

number of observations and standard asymptotics apply within a leaf. The differences in the

treated and control units’ outcomes within each leaf produce the treatment effect in that leaf.
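The transformed-outcome identity in equations (4) and (5) can be verified numerically: with a 50/50 random assignment, the sample mean of Y* recovers the average treatment effect. A small simulation sketch (the effect size of 3 and the outcome distribution are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 100_000
w = rng.integers(0, 2, n)        # W_i: treatment assigned with probability 1/2
y0 = rng.normal(10, 2, n)        # potential outcome under control, Y_i(0)
tau = 3.0                        # assumed constant treatment effect
y = y0 + tau * w                 # observed outcome

# Equation (4): Y* = 2Y for treated units, -2Y for control units
y_star = np.where(w == 1, 2 * y, -2 * y)

# Equation (5): E[Y*] = E[tau], so the sample mean of Y* estimates the ATE
ate_hat = y_star.mean()
```

Within a leaf of a causal tree, the same logic applies to the leaf's subsample, which is how the transformed outcome supports the splitting criterion.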

5.6.2. Causal Forest Ensemble

In the final step, we create an ensemble of trees using ideas from model averaging and bagging.

Specifically, we take predictions from thousands of trees and average over them (Guo et al.

2021). This step retains the unbiased and honest nature of tree-based estimates but reduces the

variance. The forest averages over the estimates from B trees in the following manner.

(6) 𝜏̂(𝑥) = (1/𝐵) ∑𝐵𝑏=1 𝜏̂𝑏(𝑥)

Because monetary value of purchases is the key outcome variable of interest to the retailer,

we estimate individual level treatment effect on value of purchases for each failure experiencer

separately using the observed covariate data. These covariates include gender and loyalty

program membership in addition to the three theory-driven moderators, namely, value of past

purchases, recency of past purchases and prior digital channel use. These individual attributes are

important for identifying individual-level effects and for developing targeting approaches (e.g.,

Neumann et al. 2019). We use a random sample of two-thirds of our data as training data and the

remaining one-third as test data for predicting CATE. We use half of the training data to

maintain honest estimates and for cross-validation to avoid overfitting.

5.6.3. Causal Forest Results

The estimates from causal forest using 1,000 trees appear in Table 9. About 96% of the shoppers

have a negative value of CATE with an average of -1.739. The distribution of CATE across

shoppers appears in Figure 4. The shopper quintiles based on CATE levels reflect this distribution in Figure 5, which shows that Segment 1, the most sensitive shoppers, exhibits higher variance than the rest.

< Table 9 and Figures 4 and 5 about here >

Next, we regress the CATE estimate on the covariate space to identify the covariates that best

explain treatment heterogeneity. The results appear in Table 10. They show that all the

covariates, including gender and loyalty, are significant (p < 0.001). Shoppers with higher value

of past purchases and more frequent online purchases are less sensitive to an app failure than

others. Shoppers who bought more recently in the past are less tolerant of an app failure. Some

of these results complement those from the moderator analysis.

< Table 10 about here >



The causal forest-derived CATE regression differs from the moderator DID regression in

important ways. First, the moderator regression uses the entire sample for estimation, while the

causal forest, the basis for the CATE regression, uses a subset of the data (the training sample)

for estimation. Second, the causal forest underlying the CATE regression splits the training data

further to estimate an honest tree, estimating from an even smaller subset of the moderator

regression sample. Third, relative to the linear moderator regression, the CATE regression can

handle a much larger number of covariates. Because of these differences, the results of the

CATE regression model may not exactly mirror those of the moderator regression model.

However, the broad contours of the results remain unchanged.

5.7. Replicating the Analysis for Another App Failure

To explore the generalizability of our results, we extend our analysis from one failure shock to

another shock since such failures are common and occur about 5-7 times each year for the focal

app. While granular data on these failures are difficult to collect even for a single failure, we

were able to collect data for another failure that had occurred on November 3, 2014 at 5:30 pm

and lasted five hours. Of the shoppers who accessed the app that day, 70,884 experienced the

failure, while 63,604 did not. We repeat the DID analyses for this sample. The results appear in

Tables 11 and 12. Consistent with the main model results, those for this failure show a negative

effect of the app failure on purchases across channels (p < 0.001). The effect size translates to a

7.5%, 5.5%, and 6.2% decrease in frequency, quantity, and monetary value, respectively. As in

the main sample, the decreases are primarily in brick-and-mortar stores.

< Tables 11 and 12 about here >



6. Robustness Checks and Ruling out Alternative Explanations

We perform several robustness checks and tests to rule out alternative explanations for the effect

of an app failure on purchases.

Alternative model specifications. Although the failure in our data is exogenous and we include shopper fixed effects, we also estimate, in addition to our proposed DID model, models with shopper covariates to estimate the treatment effect of interest. Additionally,

we estimate Poisson count data models for the frequency and quantity variables. The results from

these models replicate the findings from Tables 3 and 4 and appear in the Web Appendix Tables

B1-B2 and C1-C2, respectively. The coefficients of the treatment effect from Table B1 and C1

represent changes in outcomes due to the app failure, conditioned on covariates. These results are

substantively similar to those in Tables 3 and 4. The insensitivity of the results to control

variables suggests that the effect of unobservables relative to these observed covariates would

have to be very large to significantly change our results (Altonji et al. 2005). Similarly, the

results are robust to a Poisson specification, reported in Tables B2 and C2. We also estimate

models with weekly fixed effects. The results from these models show treatment effects consistent with those from our main model.

Outliers. We re-estimate the models by removing outliers from our data. We remove

extremely heavy spenders who are greater than three standard deviations away from the mean in

monetary value of purchases in the pre-period. Web Appendix Tables B3 and C3 report these

results. We find the results to be consistent with those reported earlier.

Existing shoppers. Another possible explanation for app failures’ effect can be that only new

or dormant shoppers are sensitive to failures, perhaps due to low switching costs. Therefore, we

remove those with no purchases in the last 12 months and re-estimate the models for

the existing shoppers. Indeed, Web Appendix Tables B4 and C4 report substantively similar

results after excluding the new or dormant shoppers.

Alternative measures of the digital channel use moderator. In lieu of past online purchase frequency as a measure of prior digital channel use, we use measures based on a median split of

the number and share of online purchases, and whether a shopper is an online buyer or not. The

results for all these alternative online purchase measures are similar to our proposed model

results. The results are shown in Web Appendix Tables B5 and C5 for digital channel use

operationalized as whether a shopper is an online buyer or not.

Regression discontinuity analysis. To ensure that there are no unobservable differences

between failure experiencers and non-experiencers based on the time of login, we carry out a

‘regression discontinuity’ (RD) style analysis around the start time of the service

failure. For the RD analysis, we consider only app users in the neighborhood of this time, using

as control group those users who logged in one hour before and after the failure period and as

treated the users who logged in during the failure period. The results are substantively similar to

our main model results and appear in Web Appendix Tables B6 and C6.

Falsification/Placebo tests. To rule out the possibility that our regression estimates

spuriously pick up variations driven by factors other than the failure, we conduct additional

falsification and placebo tests. First, we randomly reassign treatment to our sample users. The

results from these checks appear in Tables B7 and C7. As reported in these tables, we do not find

an effect for the treatment coefficient in these placebo tests. Second, we randomly reassign the

timing of treatment. The results from these checks appear in Tables B8 and C8. Again, we do not

find any treatment effects. These falsification tests mitigate concerns for spurious correlations.
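The first placebo check (random reassignment of treatment) can be sketched as follows: shuffling the treatment labels breaks the link to the outcome shift, so the DID estimate should be near zero. An illustrative simulation, not the paper's data; the -2.2 effect and sample size are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 20_000
treated = rng.random(n) < 0.5
y_pre = 30 + rng.normal(0, 5, n)
y_post = 33 - 2.2 * treated + rng.normal(0, 5, n)   # assumed effect of -2.2

def did(t):
    """Two-group, two-period difference-in-differences of means."""
    return (y_post[t] - y_pre[t]).mean() - (y_post[~t] - y_pre[~t]).mean()

real_effect = did(treated)                        # recovers roughly -2.2
placebo_effect = did(rng.permutation(treated))    # shuffled labels: near zero
```

Repeating the permutation many times yields a placebo distribution against which the estimate from the true assignment can be benchmarked.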

Longer-term effect of failures. Our main analysis shows the short-term 14-day effect of app

failures. To explore how these effects evolve over time, we examined the outcomes four weeks

pre- and post-failure. There is a steep fall in the period immediately after the failure.

However, purchases climb back to higher levels over the next three weeks. Therefore, the effect

is strongest immediately after failure. These patterns appear in Figure 6 and Web Appendix

Table D1. The table shows the coefficients of the interactions of weekly dummies with TREAT

for a DID regression. Because an app failure occurs every 7-10 weeks, we estimate the effects

four weeks pre and post so as to avoid our pre- or post- periods overlapping with any other

failure that we cannot observe.

< Figure 6 about here >

Stacked model for channel effects. The results for online and offline purchases in Table 4 do

not show the relative sizes of the effects across the two channels. To examine these relative

effects, we estimate a stacked model of online and offline outcomes that includes a channel

dummy. The results for this model appear in Web Appendix Table D2. We interpret the effects

as a proportion of the purchases within the channel and conclude that the effects in the offline

channel are more negative than those in the online channel (p < 0.01). We also estimated a DID

regression model with value of purchases in the offline channel as a proportion of total purchases

and found negative and significant effects of failure (p < 0.01).

7. Summary, Economic Significance, Managerial Implications, and Limitations

7.1. Summary

In this paper, we addressed novel research questions: What is the effect of a service failure in a

retailer’s mobile app on the frequency, quantity, and monetary value of purchases in online and

offline channels? What possible mechanisms may explain these effects? How do shoppers’

relationship strength and prior digital channel use moderate these effects? How heterogeneous is

shoppers’ sensitivity to failures? By answering these questions, our research fills an important

gap at the crossroads of three disparate streams of research in different stages of development:

the mature stream of service failures, the growing stream of omnichannel marketing, and the

nascent stream of mobile marketing. We leveraged a random systemwide failure in the app to

measure the causal effect of an app failure. To our knowledge, this is the first study to causally

estimate the effects of a digital service failure using real world data. Using unique data spanning

online and offline retail channels, we examined the spillover effects of such failures across

channels and the heterogeneity in these effects across channels and shoppers.

Our results reveal that app failures have a significant negative effect on shoppers’ frequency,

quantity, and monetary value of purchases across channels. These effects are heterogeneous

across channels and shoppers. Interestingly, the overall decreases in purchases across channels

are driven by reductions in store purchases and not in digital channels. Furthermore, we find that

shoppers with higher monetary value of past purchases are less sensitive to app failures.

Overall, our nuanced analyses of the mechanisms by which an app failure affects purchases

offer new and insightful explanations in a cross-channel context, including channel substitution

and brand preference dilution. Our findings from shopper heterogeneity analyses are consistent

with the view that some customers may be tolerant of technological failures (Meuter et al. 2000).

Finally, our study offers novel insights into the cross-channel implications of app failures.

7.2. Economic Significance

The economic effects of failures are sizeable enough for any retailer to reconsider its service failure prevention and recovery strategies. Our main model (Table 3) shows a decrease of $2.18 in the 14 days after

failure resulting in an immediate economic loss of about $153,838 in revenues from just our

sample of 70,568 failure experiencers. Based on our weekly estimates, the economic impact of

an app failure in a retailer’s branded app is a revenue loss of about $194,768 from a single failure

for five weeks.6 The retailer experiences about 5-7 failures each year, resulting in a potential loss

of $0.97-1.36 million across failures. Importantly, as the app user base grows, the loss from

failures can also grow substantially if left unmonitored.
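The revenue-loss figures above follow directly from the reported estimates; a minimal sketch using only the numbers stated in the text (the $2.18 DID estimate from Table 3, the weekly coefficients from Table D1, the sample of 70,568 failure experiencers, and the 5-7 failures per year):

```python
# Back-of-envelope revenue-loss arithmetic from the reported estimates.
N = 70_568                      # failure experiencers in the sample

# Immediate loss: $2.18 drop per shopper over the 14 days after the failure (Table 3).
immediate_loss = 2.18 * N       # about $153,838

# Five-week loss: weekly DID coefficients reported in Table D1.
weekly_effects = [0.52, 1.40, 0.28, 0.23, 0.33]
five_week_loss = sum(weekly_effects) * N   # about $194,768

# Annual loss: the retailer experiences about 5-7 failures per year.
annual_loss_low = 5 * five_week_loss       # about $0.97 million
annual_loss_high = 7 * five_week_loss      # about $1.36 million

print(round(immediate_loss), round(five_week_loss),
      round(annual_loss_low), round(annual_loss_high))
```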

The economic effect is meaningful for several reasons. First, since retailers operate on thin

margins (2-3% in many categories) and are cost-conscious, such an economic loss is impactful.

Second, the effect size of 7.1% from our results is consistent with those from other similar causal

studies. For example, exposure to banner advertising has been shown to lift purchase intention by

0.473%, worth 42 cents per click, to the firm (Goldfarb and Tucker 2011). Third, in the mobile

context, the effect of being in a crowd (of five people relative to two per square meter when

receiving a mobile promotion) results in an economically meaningful 2.2% more clicks

(Andrews et al. 2016). Fourth, Akca and Rao (2020) argue that a revenue drop of $5.32 million

is economically significant for a large company such as Orbitz. Fifth, as sales through the mobile

app and online sales are growing rapidly, this effect is only getting larger. Sixth, our estimates

are for a single two-hour app failure, measured over five weeks. Our estimates reflect the short-term

impact, but a failure may also deplete brand preference in the long term.

To gauge the potential long-term economic damage, we examined the shoppers’ purchases

over six months after the failure. We estimated the potential revenue loss from failure-

experiencing customers who substantially reduce their spending after the failure, ultimately

becoming “lost” customers. We compared the percentage of customers in the treated and control

groups who dropped their average spending to more than one standard deviation below their pre-failure

6We compute this figure by using the weekly effect coefficients in Table D1, i.e., $(0.52 + 1.40 + 0.28 + 0.23 +
0.33)*N for the first five weeks for N = 70,568 failure experiencers, totaling $194,768.

average spending. This percentage was significantly higher for failure-experiencers

(6.51%) than for non-failure-experiencers (6.09%). The incremental loss translates into a

permanent loss of $1.89 million in revenues for the retailer, based on 1,778 (70,568 × 0.42% × 6)

“lost” customers across six failures in a year, at $53.10 per customer per year over an assumed

average customer lifetime of 20 years. This estimate is consequential for retailers.
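The long-term figure follows the same logic; a sketch using only the numbers reported above (the 6.51% vs. 6.09% drop rates, six failures per year, $53.10 per customer per year, and a 20-year customer lifetime):

```python
# Long-term "lost customer" loss from the reported drop rates.
N = 70_568                          # failure experiencers per failure
incremental_drop = 0.0651 - 0.0609  # extra share of "lost" customers = 0.42%
failures_per_year = 6

lost_customers = round(N * incremental_drop * failures_per_year)   # about 1,778
annual_value = 53.10                # revenue per lost customer per year
lifetime_years = 20

long_term_loss = lost_customers * annual_value * lifetime_years    # about $1.89 million
print(lost_customers, round(long_term_loss))
```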

7.3. Managerial Implications

Service failure and low-quality service likely lead to relationship termination (Sriram et al.

2015). The insights from our research better inform executives in managing their mobile app and

channels and offer implications for service failure preventive and recovery strategies.

Preventive Strategies. Managers can use the estimate that an app failure in a branded retail

app results in a 7.1% decrease in monetary value of purchases to budget resources for their

efforts to prevent or reduce app failures. The result that the effects of an app failure vary by

online and offline channel offers guidance to managers for preventing app failures right before a

major offline event or in-store sale when more traffic is expected in-store. Similarly, managers

can identify active store visitors and plan a dedicated strategy for them.

By identifying failure-sensitive shoppers based on relationship strength, prior digital use, and

individual-level CATE estimates, managers can take proactive actions to prevent these shoppers

from reducing their shopping intensity with the firm. Figure 7 represents the loss of revenues

(spending) from each percentile of shoppers at different levels of failure sensitivity.

< Figure 7 about here >

About 70% of the losses in revenues due to failure arise from just 47% of the shoppers.

Managers can manage these shoppers’ expectations through email and app notification

messaging channels. Warning shoppers of the typical number of disruptions in the app can preempt

negative attributions and attitudes, and limit potential brand dilution and drop in revenues due to

app failure.
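The concentration-of-losses logic behind Figure 7 can be illustrated with a small simulation. The exponential loss distribution below is a made-up placeholder, not the paper's estimated CATE distribution, so the resulting concentration figure differs from the 47%/70% reported above; the point is only the mechanics of ranking shoppers by failure sensitivity and accumulating their losses:

```python
import numpy as np

# Illustrative only: simulated per-shopper dollar losses from an app failure
# (exponentially distributed), not the paper's CATE estimates.
rng = np.random.default_rng(0)
losses = rng.exponential(scale=2.0, size=10_000)

# Rank shoppers from most to least failure-sensitive and accumulate losses.
losses = np.sort(losses)[::-1]
cum_share = np.cumsum(losses) / losses.sum()

# Smallest fraction of shoppers accounting for 70% of total losses.
k = int(np.searchsorted(cum_share, 0.70)) + 1
print(f"{k / losses.size:.0%} of shoppers account for 70% of simulated losses")
```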

Recovery Strategies. The finding that an app failure in a branded retail app results in reduced

purchases across channels suggests that managers should develop interventions and recovery

strategies to mitigate the negative effects of app failures not just in the mobile channel, but also

in other channels, in particular, the offline channel. Thus, seamlessly collecting and integrating

data from a mobile app with data from its stores and websites can help an omnichannel retailer

build continuity in shoppers’ experiences and even offer recovery in multiple channels.

Immediately after a shopper experiences an app failure, the manager of the app could provide

gentle nudges and even incentives for the shopper to complete an abandoned transaction on the

app. Typically, a manager may need to provide these nudges and incentives through other

communication channels such as email, phone call, or face-to-face chat. These nudges are similar

in spirit and execution to those from firms like Fitbit and Amazon, which remind customers

through email to reconnect when they disconnect their watch and smart speaker, respectively. If

the store is a dominant channel for the retailer, the retailer should use its store associates to

reassure or incentivize shoppers. In some cases, managers can even offer incentives in other

channels to complete a transaction disrupted by an app failure.

The finding that app failure can enhance spending for shoppers experiencing the failure close

to the store offers useful cross-selling opportunities for the retailer. After a systemwide failure is

resolved, retailers can proactively promote, in the store nearest to each failure-experiencing

shopper, products selected from that shopper’s purchase history.

Managers should mitigate the negative effects of app failures for the most sensitive shoppers

first. They should proactively identify failure-sensitive shoppers and design preemptive

strategies to mitigate any adverse effects. We find that shoppers with a weaker relationship with

the retailer are more sensitive to failures. Thus, firms should address such shoppers for recovery

after a careful cost-benefit analysis. This is important because apps serve as a gateway for future

purchases for these shoppers.

Finally, our analysis of heterogeneity in shoppers’ sensitivity to app failures suggests that

managers should attend first to the shoppers with the most negative CATE estimates. Interventions

targeted at the 47% of shoppers who account for 70% of the losses could lead to higher returns.

7.4. Limitations

Our study has limitations that future research can address. First, we analyze available data on

two failures in a branded retailer’s mobile app, so we could not fully explore all the failures with

varying durations and timing. Second, our results are most informative for similar retailers that

have a large brick-and-mortar presence but growing online and in-app purchases. If data are

available, future research could replicate our analyses for app failures for primarily online

retailers with an expanding offline presence (e.g., Bonobos, Warby Parker). Third, we do not

have data on competing apps that shoppers may use. Additional research could study shoppers’

switching behavior if data on competing apps are available. Fourth, our data contain a relatively

low number of purchases in the mobile channel. For better generalizability of the extent of

spillover across channels, our analysis could be extended to contexts in which a substantial

portion of purchases are made within the app. Fifth, we do not have data on purchases made

through the app vs. mobile browser. Studying the differences between these two mobile sub-

channels is a fruitful future research avenue. Sixth, we are unable to test specific prevention and

recovery strategies for app failures. Mobile apps may be an effective way to recover from the

adverse effects of service failures (Tucker and Yu 2019). Our approach provides a way to

identify app-failure sensitive shoppers, but we do not have data on shoppers’ responses to service

recovery to recommend the best mitigation strategy. The strategies we do recommend could be

tested in ethically permissible field studies. Finally, we focus on the short-term impact of a

failure in a causal setting. If data on multiple failures over the long run are available and can be

corrected for endogeneity, researchers can study the long-term implications of multiple failures.

References

Ahluwalia R, Unnava HR, Burnkrant RE (2001) The moderating role of commitment on the
spillover effect of marketing communications. J. Marketing Res. 38(4):458–470.
Akca S, Rao A (2020) Value of aggregators. Marketing Sci. 39(5):893–922.
Altonji JG, Elder TE, Taber CR (2005) Selection on observed and unobserved variables:
Assessing the effectiveness of catholic schools. J. Political Econom. 113(1):151–184.
Andreassen TW (1999) What drives customer loyalty with complaint resolution? J. Service Res.
1(4):324–332.
Andrews M, Luo X, Fang Z, Ghose A (2016) Mobile ad effectiveness: Hyper-contextual
targeting with crowdedness. Marketing Sci. 35(2):218–233.
Angrist JD, Pischke JS (2009) Mostly Harmless Econometrics: An Empiricist’s Companion
(Princeton University Press, Princeton).
Ansari A, Mela CF, Neslin SA (2008) Customer channel migration. J. Marketing Res. 45(1):60–
76.
Athey S, Imbens G (2016) Recursive partitioning for heterogeneous causal effects. Proc. Natl.
Acad. Sci. 113(27):7353–7360.
Athey S, Imbens G, Pham T, Wager S (2017) Estimating average treatment effects:
Supplementary analyses and remaining challenges. Amer. Econom. Rev. 107(5):278–81.
Avery J, Steenburgh TJ, Deighton J, Caravella M (2012) Adding bricks to clicks: Predicting the
patterns of cross-channel elasticities over time. J. Marketing 76(3):96–111.
Barron’s (2018) Walmart: Can it meet its digital sales growth targets? Accessed November 5,
2020, https://www.barrons.com/articles/walmart-can-it-meet-its-digital-sales-growth-targets-
1519681783.
Bell DR, Gallino S, Moreno A (2018) Offline showrooms in omnichannel retail: Demand and
operational benefits. Management Sci. 64(4):1629–1651.
Bertrand M, Duflo E, Mullainathan S (2004) How much should we trust differences-in-
differences estimates? Quart. J. Econom. 119(1):249–275.
Bitner MJ, Booms BH, Tetreault MS (1990) The service encounter: diagnosing favorable and
unfavorable incidents. J. Marketing 54(1):71–84.
Blancco (2016) The state of mobile device performance and health: Q2. Accessed November 5,
2020, https://www2.blancco.com/en/research-study/state-of-mobile-device-performance-and-
health-trend-report-q2-2016.
Bolton RN (1998) A dynamic model of the duration of the customer’s relationship with a
continuous service provider: The role of satisfaction. Marketing Sci. 17(1):45–65.
Brynjolfsson E, Hu YJ, Rahman MS (2013) Competing in the Age of Omnichannel Retailing
(MIT Cambridge, MA).
Bugsnag (2020) SDKs should not crash apps — learnings from the Facebook outage. Accessed
July 20, 2021, https://www.bugsnag.com/blog/sdks-should-not-crash-apps.
Chandrashekaran M, Rotte K, Tax SS, Grewal R (2007) Satisfaction strength and customer
loyalty. J. Marketing Res. 44(1):153–163.
Chintagunta PK, Chu J, Cebollada J (2012) Quantifying transaction costs in online/off-line
grocery channel choice. Marketing Sci. 31(1):96–114.
Cleeren K, Dekimpe MG, Helsen K (2008) Weathering product-harm crises. J. Acad. Marketing
Sci. 36(2):262–270.

Cleeren K, Van Heerde HJ, Dekimpe MG (2013) Rising from the ashes: How brands and
categories can overcome product-harm crises. J. Marketing 77(2):58–77.
Computerworld (2014) iOS 8 app crash rate falls 25% since release. Accessed November 5, 2020,
https://www.computerworld.com/article/2841794/ios-8-app-crash-rate-falls-25-since-
release.html.
Dimensional Research (2015) Mobile user survey: Failing to meet user expectations. Accessed
November 5, 2020, https://techbeacon.com/resources/survey-mobile-app-users-report-failing-
meet-user-expectations.
Dotzel T, Shankar V, Berry LL (2013) Service innovativeness and firm value. J. Marketing Res.
50(2):259–276.
Fong NM, Fang Z, Luo X (2015) Geo-conquesting: Competitive locational targeting of mobile
promotions. J. Marketing Res. 52(5):726–735.
Forbes LP (2008) When something goes wrong and no one is around: non-internet self-service
technology failure and recovery. J. Services Marketing 22(4): 316–27.
Forbes LP, Kelley SW, Hoffman KD (2005) Typologies of e-commerce retail failures and
recovery strategies. J. Services Marketing 19(5): 280–92.
Forman C, Ghose A, Goldfarb A (2009) Competition between local and electronic markets: How
the benefit of buying online depends on where you live. Management Sci. 55(1):47–57.
Ghose A, Kwon HE, Lee D, Oh W (2019) Seizing the commuting moment: Contextual targeting
based on mobile transportation apps. Inform. Systems Res. 30(1):154–174.
Ghose A, Li B, Liu S (2019) Mobile targeting using customer trajectory patterns. Management
Sci. 65(11):5027–5049.
Gijsenberg MJ, Van Heerde HJ, Verhoef PC (2015) Losses loom longer than gains: Modeling the
impact of service crises on perceived service quality over time. J. Marketing Res. 52(5):642–
656.
Goldfarb A, Tucker C (2011) Online display advertising: Targeting and obtrusiveness. Marketing
Sci. 30(3):389–404.
Google M/A/R/C Study (2013) Mobile in-store research: How in-store shoppers are using mobile
devices. Accessed November 5, 2020,
https://www.thinkwithgoogle.com/_qs/documents/889/mobile-in-store_research-studies.pdf.
Guo T, Sriram S, Manchanda P (2021) The effect of information disclosure on industry payments
to physicians. J. Marketing Res. 58(1):115–140.
Halbheer D, Gärtner DL, Gerstner E, Koenigsberg O (2018) Optimizing service failure and
damage control. Internat. J. Res. Marketing 35(1):100–115.
Hansen N, Kupfer AK, Hennig-Thurau T (2018) Brand crises in the digital age: The short-and
long-term effects of social media firestorms on consumers and brands. Internat. J. Res.
Marketing 35(4):557–574.
Hess Jr RL, Ganesan S, Klein NM (2003) Service failure and recovery: The impact of relationship
factors on customer satisfaction. J Acad. Marketing Sci. 31(2):127–145.
Hoffman KD, Bateson JE (2001) Essentials of services marketing: Concepts, strategies and cases
(South-Western Pub).
Kim SJ, Wang RJH, Malthouse EC (2015) The effects of adopting and using a brand’s mobile
application on customers’ subsequent purchase behavior. J. Interactive Marketing 31:28–41.
Knox G, Van Oest R (2014) Customer complaints and recovery effectiveness: A customer base
approach. J. Marketing 78(5):42–57.

Kummer M, Schulte P (2019) When private information settles the bill: Money and privacy in
Google’s market for smartphone applications. Management Sci. 65(8):3470–3494.
Liu Y, Shankar V (2015) The dynamic impact of product-harm crises on brand preference and
advertising effectiveness: An empirical analysis of the automobile industry. Management Sci.
61(10):2514–2535.
Ma L, Sun B, Kekre S (2015) The squeaky wheel gets the grease—an empirical analysis of
customer voice and firm intervention on twitter. Marketing Sci. 34(5):627–645.
McCollough MA, Berry LL, Yadav MS (2000) An empirical investigation of customer
satisfaction after service failure and recovery. J. Service Res. 3(2):121–137.
Meuter ML, Ostrom AL, Roundtree RI, Bitner MJ (2000) Self-service technologies:
Understanding customer satisfaction with technology-based service encounters. J. Marketing
64(3):50–64.
Narang U, Shankar V (2019) Mobile app introduction and online and offline purchases and
product returns. Marketing Sci. 38(5):756–772.
National Retail Federation (2018) Top 100 retailers 2018. Accessed November 5, 2020,
https://nrf.com/resources/top-retailers/top-100-retailers/top-100-retailers-2018.
Neumann N, Tucker CE, Whitfield T (2019) Frontiers: How effective is third-party consumer
profiling? Evidence from field studies. Marketing Sci. 38(6):918–926.
Oliver RL (1980) A cognitive model of the antecedents and consequences of satisfaction
decisions. J. Marketing Res. 17(4):460–469.
Pauwels K, Neslin SA (2015) Building with bricks and mortar: The revenue impact of opening
physical stores in a multichannel environment. J. Retail 91(2):182–197.
Retail Dive (2020) Retailers see a 36% increase in mobile app downloads, and 54% growth in in-
app purchases during COVID. Accessed June 29, 2021, https://tinyurl.com/wc6bf7kc.
Schmittlein DC, Morrison DG, Colombo R (1987) Counting your customers: Who are they and
what will they do next? Management Sci. 33(1):1–24.
Shi S, Kalyanam K, Wedel M (2017) What does agile and lean mean for customers? An analysis
of mobile app crashes. Working paper, Santa Clara University, Santa Clara.
Smith AK, Bolton RN (1998) An experimental investigation of customer reactions to service
failure and recovery encounters: Paradox or peril? J. Service Res. 1(1):65–81.
Sriram S, Chintagunta PK, Manchanda P (2015) Service quality variability and termination
behavior. Management Sci. 61(11):2739–2759.
Tax SS, Brown SW, Chandrashekaran M (1998) Customer evaluations of service complaint
experiences: implications for relationship marketing. J. Marketing 62(2):60–76.
Tucker CE, Yu S (2019) Does it lead to more equal treatment? An empirical study of the effect of
smartphone use on customer complaint resolution. Working paper, Massachusetts Institute of
Technology, Cambridge.
Wager S, Athey S (2018) Estimation and inference of heterogeneous treatment effects using
random forests. J. Amer. Statist. Assoc. 113(523):1228–1242.
Wang K, Goldfarb A (2017) Can offline stores drive online sales? J. Marketing Res. 54(5):706–
719.
Xu K, Chan J, Ghose A, Han SP (2017) Battle of the channels: The impact of tablets on digital
commerce. Management Sci. 63(5):1469–1492.

Table 1. Summary Statistics


Variable Mean Std. dev.
Frequency of purchases 0.82 1.34
Quantity of purchases 1.61 3.32
Value of purchases ($) 43.31 96.42
App failure/Failure experiencer 0.52 0.50
Recency of past purchases (in days) -45.68 68.83
Value of past purchases ($) 629.60 699.38
Frequency of past online purchases 0.66 1.97
Notes: These statistics of the variables are over pre- and post- 14 days of the failure. The past purchases are computed over a one-
year period. N = 273,378.

Table 2. Model-Free Evidence: Means of Outcome Variables for Treated and Control Groups
Variable Treated (pre) Treated (post) Control (pre) Control (post)
Frequency of purchases 0.74 0.89 0.75 0.93
Quantity of purchases 1.52 1.69 1.52 1.76
Value of purchases ($) 30.41 55.28 30.75 57.70
Frequency of purchases – Online 0.03 0.04 0.03 0.04
Quantity of purchases – Online 0.05 0.06 0.05 0.07
Value of purchases – Online ($) 1.34 2.93 1.50 3.17
Frequency of purchases – Offline 0.70 0.85 0.71 0.88
Quantity of purchases – Offline 1.47 1.63 1.47 1.69
Value of purchases – Offline ($) 29.07 52.35 29.25 54.53
Notes: These statistics are based on pre- and post- 14 days of the failures. N = 273,378.

Table 3. DID Model Results of Failure Shock for Purchases Across Channels
Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer -0.024** -0.057** -2.181**
x Post shock (DID) (0.008) (0.020) (0.681)
Intercept 0.740*** 1.508*** 30.40***
(0.003) (0.005) (0.15)
R squared 0.004 0.001 0.018
Effect size -4.67% -4.93% -9.04%
Mean Y 0.82 1.61 43.31
Shopper fixed
effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.
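For readers who want to connect Table 3 to the estimator: in a balanced two-group, two-period panel, the two-way fixed-effects DID coefficient reduces to the double difference of group-period means. A minimal sketch on simulated data (the baseline, post-period lift, and -2.0 treatment effect are illustrative, not the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000                                  # shoppers per group (illustrative)

# Simulated spending: common baseline, common post-period lift,
# and a true treatment effect of -2.0 for failure experiencers post-failure.
base, post_lift, true_effect = 30.0, 25.0, -2.0

treat_pre  = base + rng.normal(0, 10, n)
treat_post = base + post_lift + true_effect + rng.normal(0, 10, n)
ctrl_pre   = base + rng.normal(0, 10, n)
ctrl_post  = base + post_lift + rng.normal(0, 10, n)

# DID estimator: (treated post - treated pre) - (control post - control pre).
did = (treat_post.mean() - treat_pre.mean()) - (ctrl_post.mean() - ctrl_pre.mean())
print(round(did, 2))   # close to the true effect of -2.0
```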

Table 4. DID Model Results of Failure Shock for Purchases by Channel


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure
experiencer x
Post shock -0.022** -0.054** -2.088** -0.001 -0.002 -0.093
(DID) (0.008) (0.019) (0.660) (0.002) (0.003) (0.154)
Intercept 0.705*** 1.457*** 28.983*** 0.034*** 0.051*** 1.413***
(0.002) (0.005) (0.660) (0.0001) (0.001) (0.038)
R squared 0.0038 0.0001 0.0169 0.0016 0.0002 0.0016
Effect size -3.08% -3.74% -7.14% - - -
Mean Y 0.78 1.56 41.08 0.04 0.06 2.23
Shopper fixed
effects YES YES YES YES YES YES
Time fixed
effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.
Table 5. DID Model Results for Failures Occurring on Purchase and Non-Purchase Related Pages
Failure on purchase related page Failure on non-purchase related page
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure
experiencer x
Post shock 0.0003 -0.016 0.907 -0.053*** -0.108*** -4.627***
(DID) (0.013) (0.038) (1.195) (0.009) (0.022) (0.762)
Intercept 0.747*** 1.520*** 30.711*** 0.734*** 1.493*** 30.282***
(0.003) (0.007) (0.226) (0.002) (0.012) (0.497)
R squared 0.004 0.001 0.019 0.004 0.001 0.018
Mean Y 0.836 1.637 44.270 0.813 1.591 42.850
Shopper fixed
effects YES YES YES YES YES YES
Time fixed
effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 160,662 for failure on purchase related page. N= 217,418 for failure on non-purchase related page.

Table 6. DID Model Results for Value of Purchases and Basket Size by Channel for Shoppers Close to a Store
(< 2 Miles) at the Time of Failure
Offline Online
Variable Value of purchases Basket size Value of purchases Basket size
Failure experiencer x 13.542* 0.134* 0.885 0.023
Post shock (DID) (5.306) (0.058) (1.178) (0.020)
Intercept 33.413*** 0.870*** 1.818*** 0.055***
(1.272) (0.014) (0.308) (0.005)
R squared 0.0395 0.0064 0.0027 0.0012
Mean Y 55.00 0.95 3.18 0.07
Shopper fixed effects YES YES YES YES
Time fixed effects YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. Two miles is the median distance from the retailer’s nearest store at the time of failure. N = 6,572.

Table 7. Contrast Analysis Based on Distance to Store at the Time of Failure for Failure Experiencers
Variable Offline value of Offline basket
purchases size
Close to store x Post 14.130* 0.083
shock (5.707) (0.065)
R squared 0.0432 0.0064
Mean Y 54.96 0.98
Shopper fixed effects YES YES
Time fixed effects YES YES
Note: Closeness to store is defined using the median distance of 2 miles. There are 1,298 failure-experiencers within 2 miles of
the store at the time of failure and 1,527 failure-experiencers who are 2 miles or farther from the store among those who opt in for
location sharing. *** p < 0.001, ** p < 0.01, * p < 0.05. N = 5,650.

Table 8. DID Model Results of Failure Shock for Purchases Across Channels:
Moderating Effects of Relationship with Retailer and Past Online Purchase Frequency
Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer x Post shock (DID) -0.208*** -0.373*** -15.113***
(0.001) (0.031) (1.058)
DID x Past value of purchases 0.0001*** 0.0002*** 0.021***
(0.000) (0.000) (0.001)
DID x Recency of purchases -0.001*** -0.003*** -0.015**
(0.000) (0.000) (0.005)
DID x Past online purchase frequency -0.015** -0.017 -1.360***
(0.004) (0.012) (0.282)
Intercept 0.739*** 1.508*** 30.396***
(0.002) (0.004) (0.169)
R squared 0.0075 0.0006 0.0348
Mean Y 0.84 1.64 44.07
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: DID = Difference-in-Differences. Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p <
0.01, * p < 0.05. N = 273,378.

Table 9. Causal Forest Results: Summary of Individual Shopper Treatment Effect for Value of Purchases
Ntest Mean SD
τ 45,563 -1.660 1.136
τ| τ < 0 43,748 -1.739 1.089
τ| τ > 0 1,815 0.239 0.198
Note: 𝜏 represents the estimated Conditional Average Treatment Effect (CATE) for each individual in the test data.

Table 10. Results of Causal Forest Post-hoc CATE Regression for Value of Purchases
Variable Coefficient (Standard Error)
Intercept -0.958***(0.012)
Past value of purchases 0.000***(0.000)
Recency of purchases -0.005***(0.000)
Past online purchase frequency 0.037***(0.002)
Gender (female) -0.190***(0.008)
Loyalty program -0.340***(0.011)
R squared 0.493
Note: *** p < 0.001. N = 45,563.
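The paper estimates individual CATEs with a causal forest (Wager and Athey 2018). The sketch below is not a causal forest; it is a much simpler "T-learner" with one binary covariate on simulated data, included only to illustrate what a CATE is: the treated-minus-control difference in expected outcomes, conditional on covariates. All names and numbers (loyalty, the -3.0 and -1.0 effects) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40_000

# Illustrative data: loyalty (binary covariate), random failure exposure,
# and spending whose treatment effect differs by loyalty status (made up).
loyalty = rng.integers(0, 2, n)             # 0 = not enrolled, 1 = enrolled
treated = rng.integers(0, 2, n)             # failure experiencer or not
effect = np.where(loyalty == 1, -3.0, -1.0)
spend = 40 + 5 * loyalty + treated * effect + rng.normal(0, 10, n)

# T-learner with one binary covariate: estimate E[Y | X, T] by cell means,
# then CATE(x) = mean(Y | x, T=1) - mean(Y | x, T=0).
def cate(x):
    t1 = spend[(loyalty == x) & (treated == 1)].mean()
    t0 = spend[(loyalty == x) & (treated == 0)].mean()
    return t1 - t0

print(round(cate(0), 1), round(cate(1), 1))  # near -1.0 and -3.0
```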

Table 11. Results for November Failure: Failure Shock and Purchases Across Channels
Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer x -0.083*** -0.138*** -5.460***
Post shock (DID) (0.009) (0.023) (1.059)
Intercept 1.005*** 2.406*** 79.357***
(0.002) (0.006) (0.262)
R squared 0.0056 0.0004 0.0024
Mean Y 1.104 2.496 87.920
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: DID = Difference-in-Differences. Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p <
0.01, * p < 0.05. N = 286,976.

Table 12. Results for November Failure: Failure Shock and Purchases by Channel
(1) Offline (2) Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure
experiencer x
Post shock -0.082*** -0.138*** -5.448*** -0.001 0.000 -0.012
(DID) (0.008) (0.023) (1.036) (0.002) (0.003) (0.205)
Intercept 0.968*** 2.360*** 76.697*** 0.035*** 0.044*** 2.528***
(0.002) (0.006) (0.257) (0.001) (0.002) (0.100)
R squared 0.0059 0.0004 0.0024 0.0000 0.0000 0.0000
Mean Y 1.069 2.450 85.270 0.210 0.298 13.760
Shopper fixed
effects YES YES YES YES YES YES
Time fixed
effects YES YES YES YES YES YES
Notes: DID = Difference-in-Differences. Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p <
0.01, * p < 0.05. N = 286,976.

Figure 1. App Screenshots

Figure 2. Comparison of Failure-Experiencers’ and Non-Experiencers’ Purchases 14 Days before Failure

Note: The red line represents failure experiencers, while the solid black line represents the failure non-experiencers.

Figure 3. Comparison of Failure-Experiencers and Non-Experiencers

Note: Loyalty program level represents whether shoppers were enrolled (=1) or not (=0) in an advanced reward program.

Figure 4. Causal Forest Results: Individual CATE

Figure 5. Causal Forest Results: Quintiles By CATE

Note: Segment 1 represents shoppers most adversely affected by failure and Segment 5 represents those least adversely affected.

Figure 6. App Failure Effects on Value of Purchases Over Four Weeks



Figure 7. Retailer’s Revenue Loss by Percentile of Shoppers Experiencing App Failure

Note: CATE = Conditional Average Treatment Effect.



Web Appendix A
Checks for Exogeneity of Failure

Table A1. Robustness of Table 3 Results to Propensity Score Matching Estimates


Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer -0.026** -0.067** -2.414**
x Post shock (DID) (0.008) (0.020) (0.699)
Intercept
0.747*** 1.521*** 30.776***
(0.002) (0.005) (0.175)
R squared 0.003 0.001 0.016
Mean Y 0.84 1.64 44.07
Shopper fixed effect YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 252,756.

Figure A1. Pre-Period Purchase Trends for Failure Experiencers and Non-Experiencers

(a) Past Frequency of Purchases

(b) Past Quantity of Purchases

(c) Past Proportion of Online Purchases

Note: The unit of X axis is number of days before the failure event.

Web Appendix B
Robustness Check for Table 3 (Main Treatment Effect) Results

In this section, we present the results for robustness checks for the main estimation in Table 3
relating to: (a) alternative models with covariates and using Poisson model (Tables B1-B2), (b)
outliers (Table B3), (c) existing shoppers (Table B4), (d) alternative measures for prior use of
digital channels (Table B5), (e) regression-discontinuity style analysis (Table B6), and (f)
falsification/placebo checks (Tables B7 and B8).
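The falsification logic of Tables B7 and B8 can be sketched on simulated data: re-assigning "treatment" at random (or shifting the failure timing) should yield a DID estimate near zero when no true effect exists. A minimal illustration with made-up numbers:

```python
import numpy as np

# Placebo check sketch: with treatment randomly re-assigned and no true effect,
# the DID estimate should be statistically indistinguishable from zero.
rng = np.random.default_rng(1)
n = 50_000                                    # shoppers per placebo group

pre = 30 + rng.normal(0, 10, 2 * n)
post = 55 + rng.normal(0, 10, 2 * n)          # common post-period lift, no effect
fake_treated = rng.permutation([0] * n + [1] * n).astype(bool)

did = ((post[fake_treated].mean() - pre[fake_treated].mean())
       - (post[~fake_treated].mean() - pre[~fake_treated].mean()))
print(round(did, 2))   # close to 0
```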

Table B1. Robustness of Table 3 Results to Inclusion of Covariates Across Channels


Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure
experiencer x
Post shock -0.024* -0.057* -2.181**
(DID) (0.010) (0.025) (0.731)
Failure -0.021** -0.030 -0.694
experiencer (0.007) (0.018) (0.517)
Gender -0.009 -0.006 -0.444
(0.007) (0.018) (0.522)
Loyalty 0.009 0.022 0.800
program (0.006) (0.015) (0.42)
Intercept 0.745*** 1.508*** 30.222***
(0.007) (0.017) (0.492)
R squared 0.004 0.001 0.018
Mean Y 0.82 1.61 43.31
Notes: Robust standard errors clustered by shoppers are in parentheses. Time fixed effects are included; *** p < 0.001, ** p <
0.01, * p < 0.05. DID = Difference-in-Differences. N = 273,378.

Table B2. DID Poisson Model Results Across Channels


Variable Frequency of Quantity of
purchases purchases
Failure experiencer x -0.021* -0.031*
Post shock (DID) (0.010) (0.012)
Log pseudo-likelihood -92,658.71 -167,663.92
Mean Y 0.82 1.61
Shopper fixed effects YES YES
Time fixed effects YES YES
Notes: Robust standard errors in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N =
273,378.

Table B3. Robustness of Table 3 Results to Outlier Spenders


Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer x Post -0.023* -0.055* -2.139**
shock (DID) (0.008) (0.019) (0.677)
Intercept 0.728*** 1.472*** 29.524***
(0.002) (0.005) (0.169)
R squared 0.0043 0.0013 0.0192
Mean Y 0.81 1.59 42.69
Shopper fixed effect YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 272,706.

Table B4. Robustness of Table 3 Results to Existing Shoppers Across Channels


Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer -0.025** -0.061** -2.283**
x Post shock (DID) (0.001) (0.020) (0.693)
Intercept 0.756*** 1.541*** 31.061***
(0.002) (0.005) (0.173)
R squared 0.0038 0.0010 0.0181
Mean Y 0.84 1.64 44.07
Shopper fixed effect YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001. DID = Difference-in-Differences. N =
267,534.

Table B5. Robustness of Table 3 Results to Alternative Measures of Digital Channel Use
Based on Past Online Purchase (or Not) Before Failure
Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer x Post shock (DID) -0.193*** -0.389*** -12.935***
(0.012) (0.03) (0.879)
DID x Value of past purchases 0.000*** 0.000*** 0.017***
(0.000) (0.000) (0.001)
DID x Recency of purchases -0.001*** -0.003*** -0.022***
(0.000) (0.000) (0.006)
DID x Past online buyer or not -0.019*** -0.029*** -1.344***
(0.003) (0.007) (0.220)
Intercept 0.640*** 1.141*** 25.034***
(0.006) (0.015) (0.446)
R squared 0.1587 0.1217 0.0932
Mean Y 0.84 1.64 44.07
Shopper fixed effect YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; Each moderator interacts with the difference-in-
differences (DID) term failure experiencers x post shock; *** p < 0.001. The observations include those of shoppers with at least
one purchase in the past for computing recency. N = 267,534.

Table B6. Robustness of Table 3 Results to Regression Discontinuity Style Analysis


Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer x Post shock (DID) -0.045*** -0.09** -3.169**
(0.012) (0.03) (1.112)
Intercept 0.725*** 1.478*** 30.218***
(0.002) (0.005) (0.195)
R squared 0.0031 0.0007 0.0160
Mean Y 0.80 1.56 42.07
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01. DID = Difference-in-Differences. N = 198,432.

Table B7. Falsification/Placebo Check (Re-assigned Treatment) for Failure Shock and Purchases Across Channels
Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer x Post shock (DID) -0.000 -0.009 0.399
(0.166) (0.019) (0.680)
Intercept 0.740*** 1.508*** 30.397***
(0.002) (0.004) (0.170)
R squared 0.0038 0.0010 0.018
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.

Table B8. Falsification/Placebo Check (Re-assigned Timing) for Failure Shock and Purchases Across Channels
Variable Frequency Quantity of Value of
of purchases purchases purchases
Failure experiencer x Post shock (DID) 0.0003 0.005 -0.928
(0.018) (0.021) (0.660)
Intercept 0.895*** 1.815*** 38.026***
(0.002) (0.005) (0.164)
R squared 0.000 0.0005 0.0003
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.

Web Appendix C
Robustness Check for Table 4 (By Channel) Results

In this section, we present the results of robustness checks for the cross-channel estimation in
Table 4 relating to (a) alternative models with covariates and a Poisson model (Tables C1-C2),
(b) outliers (Table C3), (c) existing shoppers (Table C4), (d) alternative measures of prior use of
digital channels (Table C5), (e) regression discontinuity style analysis (Table C6), and (f)
placebo/falsification checks (Tables C7 and C8).

Table C1. Robustness of Table 4 Results to Inclusion of Covariates by Channel


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.022* -0.055* -2.088** -0.001 -0.003 -0.093
(0.010) (0.025) (0.709) (0.002) (0.004) (0.158)
Failure -0.018* -0.025 -0.527 -0.003* -0.005 -0.167
experiencer (0.007) (0.018) (0.501) (0.001) (0.003) (0.112)
Post shock 0.170*** 0.221*** 25.275*** 0.009*** 0.015*** 1.672***
(0.007) (0.018) (0.509) (0.001) (0.003) (0.113)
Gender -0.007 -0.003 -0.410 -0.002 -0.003 -0.033
(0.007) (0.018) (0.506) (0.001) (0.003) (0.113)
Loyalty program 0.011 0.026 0.889* -0.002 -0.004 -0.089
(0.006) (0.014) (0.407) (0.001) (0.002) (0.091)
Intercept 0.707*** 1.451*** 28.651*** 0.037*** 0.057*** 1.571***
(0.007) (0.017) (0.477) (0.001) (0.003) (0.106)
R squared 0.0040 0.0011 0.0169 0.0003 0.0003 0.0016
Mean Y 0.78 1.56 41.08 0.04 0.06 2.23
Notes: Robust standard errors clustered by shoppers are in parentheses. Time fixed effects are included; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N = 273,378.

Table C2. DID Poisson Model Results by Channel


Offline Online
Variable Frequency Quantity of Frequency Quantity of
of purchases purchases of purchases purchases

Failure experiencer x Post shock (DID) -0.020* -0.031* -0.019 -0.018
(0.010) (0.012) (0.043) (0.055)
Log pseudo-likelihood -88,907.82 -162,972.62 -6,395.99 -9,303.36
Mean Y 0.78 1.56 0.04 0.06
Shopper fixed effects YES YES YES YES
Time fixed effects YES YES YES YES
Notes: Robust standard errors in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N =
273,378.

Table C3. Robustness of Table 4 Results to Outlier Spenders


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.021* -0.053* -2.068** -0.001 -0.002 -0.071
(0.007) (0.019) (0.656) (0.002) (0.003) (0.152)
Intercept 0.694*** 1.423*** 28.185*** 0.033*** 0.049*** 1.339***
(0.002) (0.005) (0.316) (0.001) (0.002) (0.038)
R squared 0.0041 0.0012 0.0179 0.0003 0.0000 0.0017
Mean Y 0.78 1.53 40.51 0.04 0.06 2.18
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N = 272,706.

Table C4. Robustness of Table 4 Results to Existing Shoppers by Channel


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.024** -0.059** -2.166** -0.002 -0.003 -0.117
(0.007) (0.019) (0.672) (0.002) (0.003) (0.156)
Intercept 0.721*** 1.489*** 29.616*** 0.035*** 0.052*** 1.449***
(0.002) (0.005) (0.168) (0.000) (0.001) (0.039)
R squared 0.0037 0.0009 0.0169 0.0003 0.0002 0.0016
Mean Y 0.80 1.58 41.81 0.04 0.06 2.26
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. N = 267,534.

Table C5. Robustness of Table 4 Results to Alternative Measures of Digital Channel Use Based on Past Online
Purchase (or Not) Before Failure
Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.186*** -0.373*** -12.123*** -0.007** -0.015** -0.812***
(0.011) (0.029) (0.852) (0.002) (0.005) (0.195)
DID x Value of past purchases 0.000*** 0.000*** 0.016*** 0.000*** 0.000*** 0.001***
(0.000) (0.000) (0.001) (0.000) (0.000) (0.000)
DID x Recency of purchases -0.001*** -0.003*** -0.017** 0.000*** 0.000*** -0.005***
(0.000) (0.000) (0.006) (0.000) (0.000) (0.001)
DID x Past online buyer -0.011*** -0.017* -1.089*** -0.008*** -0.012*** -0.255***
(0.003) (0.007) (0.213) (0.001) (0.001) (0.049)
Intercept 0.616*** 1.112*** 24.138*** 0.024*** 0.029*** 0.896***
(0.006) (0.015) (0.432) (0.001) (0.002) (0.099)
R squared 0.1584 0.1207 0.0927 0.0868 0.0647 0.0245
Mean Y 0.80 1.58 41.81 0.04 0.06 2.26
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-in-Differences. The observations include those of shoppers with at least one purchase in the past for computing recency. N = 267,534.

Table C6. Robustness of Table 4 Results to Regression Discontinuity Style Analysis


Offline Online
Variable Frequency of Quantity of Value of Frequency Quantity of Value of
purchases purchases purchases of purchases purchases purchases
Failure experiencer -0.044*** -0.087** -3.105** -0.001 -0.003 -0.064
x Post shock (DID) (0.012) (0.030) (1.074) (0.003) (0.006) (0.253)
Intercept 0.692*** 1.429*** 28.841*** 0.033*** 0.048*** 1.376***
(0.002) (0.006) (0.189) (0.000) (0.001) (0.044)
R squared 0.0030 0.0006 0.0150 0.0001 0.0002 0.0014
Mean Y 0.76 1.51 39.94 0.04 0.05 2.12
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01. DID = Difference-in-Differences. N = 198,432.

Table C7. Falsification/Placebo Check (Re-assigned Treatment) for Failure Shock and Purchases by Channel
(1) Offline (2) Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.001 -0.012 -0.349 0.001 0.003 0.050
(0.008) (0.019) (0.659) (0.002) (0.003) (0.153)
Intercept 0.705*** 1.457*** 28.983*** 0.034*** 0.051*** 1.414***
(0.002) (0.005) (0.165) (0.0004) (0.001) (0.038)
R squared 0.0037 0.0009 0.0168 0.0003 0.0002 0.0015
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.

Table C8. Falsification/Placebo Check (Re-assigned Timing) for Failure Shock and Purchases by Channel
(1) Offline (2) Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure experiencer x Post shock (DID) -0.0009 0.003 -0.916 0.001 0.001 -0.012
(0.008) (0.021) (0.640) (0.002) (0.003) (0.151)
Intercept 0.858*** 1.762*** 36.510*** 0.036*** 0.052*** 1.515***
(0.002) (0.005) (0.159) (0.0004) (0.001) (0.038)
R squared 0.0001 0.0006 0.0045 0.0001 0.0001 0.0012
Shopper fixed effects YES YES YES YES YES YES
Time fixed effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. N = 273,378.

Table C9. Robustness of Table 4 Results to Outlier Spenders


Offline Online
Variable Frequency of Quantity of Value of Frequency of Quantity of Value of
purchases purchases purchases purchases purchases purchases
Failure
experiencer x
Post shock -0.021* -0.053* -2.068** -0.001 -0.002 -0.071
(DID) (0.007) (0.019) (0.656) (0.002) (0.003) (0.152)
Intercept 0.694*** 1.423*** 28.185*** 0.033*** 0.049*** 1.339***
(0.002) (0.005) (0.3164) (0.001) (0.002) (0.038)
R squared 0.0041 0.0012 0.0179 0.0003 0.0000 0.0017
Mean Y 0.78 1.53 40.51 0.04 0.06 2.18
Shopper fixed
effects YES YES YES YES YES YES
Time fixed
effects YES YES YES YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, * p < 0.05. DID = Difference-in-
Differences. N = 272,706.
x

Web Appendix D
Other Robustness Checks

Table D1. Effects of App Failure on Average Value of Purchases Each Week
Variable Estimate
(Standard error)
Treat x Week 0 -0.52 (0.34)
Treat x Week 1 -1.40** (0.54)
Treat x Week 2 -0.28 (0.27)
Treat x Week 3 -0.23 (0.26)
Treat x Week 4 -0.33* (0.25)
Intercept 13.19*** (0.09)
Mean Y 15.92
Shopper fixed effects YES
Time fixed effects YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. N = 1,366,890.
DID = Difference-in-Differences.

Table D2. Results of DID Model with Stacked Online and Offline Purchases and Channel Dummies
Variable Frequency of Quantity of Value of
purchases purchases purchases
Failure experiencer x Post shock (DID) -0.001 -0.003 -0.093
(.002) (0.003) (0.154)
DID x Channel dummy -0.021** -0.052** -1.994**
(0.008) (0.019) (0.678)
Post shock x Channel dummy 0.161*** 0.206*** 23.602***
(0.006) (0.014) (0.495)
Intercept 0.037*** 0.754*** 15.198***
(0.001) (0.002) (0.085)
R squared 0.0636 0.0362 0.0644
Mean Y 0.82 1.61 43.31
Shopper fixed effects YES YES YES
Time fixed effects YES YES YES
Notes: Robust standard errors clustered by shoppers are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. DID = Difference-
in-Differences. Channel dummy is 1 for offline purchases and 0 for online purchases. N = 546,756.
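The stacked design in Table D2 can be illustrated with a small, purely hypothetical construction (not the paper's code): each shopper-week cell contributes one offline row (channel dummy = 1) and one online row (channel dummy = 0), and the DID term is interacted with the channel dummy so that the base DID coefficient captures the online effect while the interaction captures the incremental offline effect.

```python
import numpy as np

# Toy construction of the stacked online/offline design with a channel
# dummy. The data are illustrative, not from the paper.
n = 6  # shopper-week cells in this toy example
treated = np.array([1, 1, 1, 0, 0, 0], float)  # failure experiencers
post = np.array([0, 1, 1, 0, 1, 1], float)     # post-shock indicator

# Stack: first n rows are offline (channel = 1), next n online (channel = 0)
channel = np.concatenate([np.ones(n), np.zeros(n)])
treat_s = np.tile(treated, 2)
post_s = np.tile(post, 2)

did = treat_s * post_s            # base DID term (online effect)
did_x_channel = did * channel     # incremental offline DID effect
post_x_channel = post_s * channel # channel-specific post-shock shift

X = np.column_stack([did, did_x_channel, post_x_channel])
print(X.shape)  # (12, 3)
```

Regressing the stacked outcome on these columns (plus shopper and time fixed effects) reproduces the logic of Table D2, where a negative and significant "DID x Channel dummy" coefficient indicates that the failure's effect is concentrated in the offline channel.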
