
© 2018, American Marketing Association

Journal of Marketing
PrePrint, Unedited
All rights reserved. Cannot be reprinted without the express
permission of the American Marketing Association.

VIDEO CONTENT MARKETING: THE MAKING OF CLIPS

Xuan Liu1, Savannah Shi2, Thales Teixeira3, and Michel Wedel4

April 3, 2018

1 Senior Data Scientist, Netflix, Netflix Corporate Headquarters, 100 Winchester Circle, Los Gatos, CA 95032, alexl@netflix.com.

2 Assistant Professor of Marketing, Leavey School of Business, Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, Tel. (408) 554-4798, wshi@scu.edu.

3 Lumry Family Associate Professor, Harvard Business School, Harvard University, Boston, MA 02163, USA, Tel. (617) 495-6125, tteixeira@hbs.edu.

4 PepsiCo Professor of Consumer Science, Robert H. Smith School of Business, University of Maryland, College Park, MD 20742, USA, Tel. (301) 405-2162, mwedel@rhsmith.umd.edu.

Acknowledgement

The order of authors is alphabetical. The authors thank nViso for data collection and processing

by the web-based face tracking system, and Netflix for running the field experiment. This study

was supported by Robert H. Smith School of Business, Harvard Business School, and Leavey

School of Business.

VIDEO CONTENT MARKETING: THE MAKING OF CLIPS

ABSTRACT

Consumers have an increasingly wide variety of options available to entertain themselves. This

poses a challenge for content aggregators who want to effectively promote their video content

online via the original trailers of movies, sitcoms, and video games. Marketers are now seeking

to produce much shorter video clips to promote their content on a variety of digital channels.

This research is the first to propose an approach to produce such clips and to study their

effectiveness, focusing on comedy movies as an application. Web-based facial-expression tracking is

used to study viewers’ real-time emotional responses when watching comedy movie trailers

online. These data are used to predict viewers’ intentions to watch the movie and its box office

success. The authors then propose an optimization procedure for cutting scenes from trailers to

produce clips and test it in an online experiment and in a field experiment. The results provide

evidence that the production of short clips using the proposed methodology can be an effective

tool to market movies and other online content.

Keywords: Video Content Marketing, Trailers, Clips, Emotions, Facial-Expression Tracking



The Internet has drastically reduced barriers to the distribution of video content. This has

caused an unprecedented proliferation of sitcoms, scripted series, documentaries, and long- and

short-format movies. Online content aggregators are making this vast array of video material

readily available to consumers for on-demand streaming. For short-format user-generated video,

there is YouTube. For video games, there is Twitch. For broadcast and cable shows, there is

Hulu. And for movies and web series, there are Netflix and Amazon.

Given that consumers have such a wide variety of options available to entertain

themselves, a challenge for online content aggregators is how to effectively promote their video

content. Synopses, critics’ reviews, and viewer ratings are important, but the best way for a

consumer to evaluate the quality of video content and to determine if she wants to see it, is for

her to watch a sample. For that reason, video content producers have historically used trailers as

their main marketing tool. This started around 1920 when movie theatres produced snippets of

upcoming films with simple text overlays and showed these “trailing” a feature film to entice

viewers to return to the theatre. The National Screen Service, a company that wrote scripts and

produced trailers on behalf of movie studios, was founded soon thereafter. It developed a

template for trailer design that included a montage and music, and held a monopoly over the

creation and distribution of movie trailers that lasted into the 1950s, when more competitors

entered the market. Movie trailer production has evolved into an industry with dozens of

independent production houses charging upwards of $500,000 for a trailer (Last 2004).

Nowadays there are trailers not only for movies, but also for sitcoms, for video games, and even

for books. These trailers are typically two- to three-minute videos created by selecting and

editing scenes from the original video content, and adding music and other sound effects. Their

purpose is to elicit a sample of the emotions that viewers will experience when watching the full

content (Kerman 2004). At least a dozen websites are uniquely devoted to showing trailers (e.g.,

traileraddict.com, booktrailersforreaders.com, IMDb.com, comingsoon.net), and trailers

themselves have become some of the most popular forms of entertainment on the web.

But, given consumers’ ever-shorter attention spans, the original trailers for movies,

sitcoms, and video games are becoming less effective marketing tools on some digital channels,

particularly those that do not support sound (for example, email and social media). Therefore,

online aggregators have cut the trailers that they obtain from production studios down to “clips”

of 30, 20, or sometimes as little as 10 seconds. The newfound marketing problem for content

aggregators has become not one of creating promotional trailers, but rather one of editing down

the trailer content provided by trailer production studios into formats that are suitable for digital

marketing channels. However, according to a manager at Netflix, “The current approach of

providing just the first few seconds of a trailer for online viewing is not always effective.” Thus,

marketers need better tools to produce these short clips from original trailers. Yet, despite its

importance, there is no academic research that assists in the development of these tools, or that

even helps to understand if and how these short clips can induce consumers to experience a

sample of emotions and, ultimately, watch the full content.

This paper is the first, to our knowledge, to look at how marketers of video-based content

should edit trailers to produce (shorter) clips that help consumers decide whether to watch the

content. While marketing departments at content aggregators often have limited control over the

specific content of the trailer, they do have control over which scenes to select from the trailer in

producing a shorter clip. Our conceptual framework and methodology are therefore founded on

the notion that the scene is the basic semantic building block of video content and also the

elementary unit for the production of trailers and clips.



To illustrate our method, we focus on the creation of clips for comedy movies as an

application. We collected and analyzed four different types of data and propose an optimization

algorithm for clip production. First, in an online facial-expression-tracking experiment, viewers

were shown movie trailers and their intentions to watch the full movie were measured. These

data were used to calibrate a model that explains viewing preferences based on the audio-visual

scene structure of trailers, as well as the real-time emotional responses evoked from each viewer.

Second, we collected information on various ratings and box office sales of the movies in

question. This allows us to validate in-market the role of the scene structure, under the control of

the marketer, and assess the intermediary role of the emotions evoked. We account for content-

specific effects by controlling for movie ratings and variations of trailers for the same movie (we

do not code and insert content variables because the online aggregator marketer has no control

over them: instead, we measure consumers’ emotional response to that content). Third, after

understanding how the scene structure is associated with the emotional response, intention to

watch, and box office sales, we optimized the editing of trailers to produce short film clips for

use on digital channels in which sound is (e.g., IMDb) or is not (e.g., Facebook) supported.

Fourth, we validated the proposed approach in an online experiment, as well as in a large-scale

field experiment with one of the world’s largest video content aggregators. We show that (a) it is

superior to the currently used heuristic approach for producing clips and (b) it can be automated

and, thus, is scalable.

Online content aggregators can apply our methodology to design clips that deliver an

optimal emotional experience and consequently induce higher watching intentions and sales for

their video content. The proposed framework applies not only to the comedy movies / trailers /

clips that we examined as a prototypical case, but to all movie genres, and more generally to

other types of digital content (news, books, TV shows, etc.) that are being marketed via clips.

Apart from contributing to the literature on content marketing and movie marketing in particular

(Eliashberg, Hui, and Zhang 2007; Faber and O’Guinn 1984; Litman 1983), this paper also aims

to contribute to the literature on online advertising (Teixeira, Wedel, and Pieters 2012), and fits

the recent trend, in practice, toward shorter advertising messages.1

This paper is structured as follows. First, we review the relevant literature and provide a

general conceptual framework for how the scene structure of trailers affects consumers’ viewing

experience and watching intentions. Then we develop an empirical model of watching intentions

and propose an optimization tool to produce clips. Subsequently, we describe the empirical

results of the model estimation for data on comedy movies and the prediction of their box office

success. We then show the managerial implications of optimal clip production by testing the

optimal clips through simulation, an online experiment, and a large-scale field experiment.

Lastly, we discuss the insights obtained and the potential usage of our tools by marketers of

online content, and reflect on future developments regarding automation and personalization.

LITERATURE REVIEW AND CONCEPTUAL FRAMEWORK

The movie industry and its box office performance have seen ample research in marketing.

Litman (1983) showed the relationships between box office success and determinants such as

time of release, distributor, movie genre, production costs, and Academy Awards. Faber and

O’Guinn (1984) then confirmed that the effect of movie previews and movie excerpts (such as

trailers) on movie-going behavior is stronger than the effects of word of mouth and critics’

reviews. Reviews from critics were shown to play a role by Eliashberg and Shugan (1997).

Sharda and Delen (2006) showed that the success of a movie is determined by the number of

screens on which the movie is shown during the initial launch and the stars featured in the movie.

Eliashberg, Hui, and Zhang (2007) demonstrated further that the scripts of trailers could be used

to forecast a movie’s return on investment. More recently, Boksem and Smidts (2015) showed,

using electroencephalography (EEG) measures, that emotions are important predictors of movie

preferences and box office success. Despite significant research being conducted to explain the

causes and drivers of successful movies, less work has been done on movie marketing per se.

Since around 50 percent of a major Hollywood studio’s movie budget is spent on marketing,

with the other half going into production, this gap in the literature is rather puzzling.

Movie marketing is a big business today. According to a report by Statista, global cinema

advertising spending was $2.7 billion in 2015 and is expected to total $3.3 billion in 2020. The

main tool for movie and other video-based content marketers is currently the trailer. By

including scenes from the movie that elicit a sample of the emotions that viewers will experience

while watching the full movie, a trailer allows viewers to form expectations of the experience of

watching the entire film (Kerman 2004). Trailer music is used to support the emotional

experience elicited by the montage, but the majority of trailers use “library” music (Shannon-

Jones 2011) that is not from the movie’s soundtrack itself. Indeed, movie trailers have been

shown to be the most influential factor affecting consumers' intentions to watch a movie (Faber

and O’Guinn 1984).

Unfortunately, trailers for movies and other video content are no longer always directly

useful in their original format as marketing tools for online content aggregators and distributors,

given concerns over consumers’ ever-shorter attention spans. The present research focuses on the

scene as the elementary building block of trailers and as the basis for creating short “clips” for

online marketing. Figure 1 visualizes our conceptual framework. It reflects the fact that scenes

are the basic audio-visual building blocks of video content, and that the creative design of a

trailer involves a montage of selected scenes from that content. Marketers at online streaming

services have no influence on the plot, narrative, or script of the trailer, but rather need to edit

and cut scenes to produce an effective promotional clip.

Our framework in Figure 1, which applies not only to movies but generally to other

categories of video content as well, is therefore based on the recognition that the problem of

modern-day online content marketers is one of editing down content, as opposed to producing it

from scratch. The cutting of scenes from the trailer produces a clip that retains important

elements of the movie’s emotional experience, which aims to generate a positive intention to

watch the full content as a response to viewing the clip. Practitioners tend to view trailer

“cutting” more as an art than a science and tend to use ad-hoc methods in trailer design by, for

example, applying “a lot more cutting” or adding “an unexpected jolt of some kind or a

wonderful piece of music” (Hart 2014). Ultimately, to produce clips marketers often simply use

the first few scenes of the trailer. A more rigorous approach is called for.

[INSERT FIGURE 1 ABOUT HERE]

Emotions play an essential role in experiencing movies and shows, and thus in the trailers

and clips created to promote this content (Boksem and Smidts 2015). Movies draw audiences in

because they provide a concentrated emotional experience (Hewig et al. 2005; McGraw and

Warren 2010). Each movie genre is built around a prototypical narrative that is designed to

elicit a central emotion; for example, horror movies evoke fear, tragedies evoke sadness, and

comedies evoke happiness (Grodal 1997). In the present study, we focus on comedy movies and

happiness as their central emotion. Our conceptual framework in Figure 1 shows that the key

problem that needs to be addressed for marketers to create effective clips is to identify which

scenes of the original trailer evoke the highest level of that central emotion among viewers. Only

after knowing the intensity and timing of the central emotion can content marketers edit down

long-form trailers to shorter clips that are potentially even more effective than the original

trailers.

The psychology literature on events in film has focused on the scene as a unit of analysis

(Zacks, Speer, and Reynolds 2009), and has shown that viewers parse a film into events based on

the perceptual information that defines and delineates the scenes in the film (Cutting, Brunick,

and Candan 2012). Our framework therefore revolves around the audio-visual scene structure of

trailers (Figure 1). Marketers control the way consumers experience a trailer, and a clip in

particular, through the pacing and length of scene cuts. Consumer behavior research has shown

that pacing and sequencing (Galak, Kruger, and Loewenstein 2011; 2013; Ratner, Kahn, and

Kahneman 1999; Zauberman, Diehl, and Ariely 2006) — in the present context, the number and

length of scenes — and delays and interruptions (Nelson and Meyvis 2008; Nowlis, Mandel, and

McCabe 2004) — in the present context, scene transitions and cuts — are prime components

affecting consumption experiences. A fast-paced consumption leads to a decrease in enjoyment

due to overly fast satiation (Galak et al. 2013), whereas a slower consumption, sometimes with

an interruption, slows down satiation and leads to a more enjoyable overall experience (Nelson

and Meyvis 2008). We therefore predict that the pacing of the scenes in a comedy movie trailer

will exert a similar impact: happiness levels will generally improve across the sequence of scenes

in a trailer, but a fast-paced trailer with a larger number of scenes results in a lower level of

happiness, and consequently in a lower watching intention. Prior research also addressed the

impact of the consumption sequence. As Nowlis et al. (2004) demonstrated, a delay in



consumption will lead to greater consumption enjoyment because the utility of anticipating a

pleasant consumption outweighs the utility of waiting. Loewenstein and Prelec (1993) also

showed that people prefer anticipating the best outcome at the end of a consumption experience.

We therefore predict that if the key scene (typically the longest) is placed later in a comedy

trailer, happiness levels and watching intention will be improved.

Of particular importance in producing clips is sound, which includes voice, special

effects, and especially music. Past literature has shown that sound plays a dual role in

experiences: It orients attention and intensifies emotions. The intensity (volume) of the sound has

been shown to particularly amplify the emotional experience if it occurs in a synchronized

manner (Bradley and Lang 2000; Lang 1995; Lang, Bradley, and Cuthbert 1997). Therefore, we

expect a positive impact of moment-to-moment overall and music volume in a comedy trailer on

happiness and consequently on watching intention. Moreover, because some of the digital media

in which clips are commonly placed (e.g., email and social media) do not support sound, it is

important to be able to predict consumers’ reaction to a clip that does not include any audio or

music. In our framework, we therefore allow for the possibility that several aspects of audio and

music volume (including start, peak, trend, and end volumes) can have an impact on emotions

and viewing intentions. The aspects that exert an influence on the experience depend on the

context (Zauberman et al. 2006) and given a lack of prior literature, we do not formulate specific

predictions on the effects of these specific measures on happiness or watching intentions.

Given that the goal of trailer and clip design is to produce a representative emotional

experience, it is necessary to characterize the emotional content of trailer scenes. While the entire

emotional experience throughout trailer consumption may be significant, research has shown that

the peak and end points of the emotional experience are disproportionately more important in the

overall evaluation of the experience by consumers (Baumgartner, Sujan, and Padgett 1984;

Fredrickson and Kahneman 1993). Hence, we predict that scenes with high peak and end

happiness result in higher watching intentions. In addition, research has shown that the general

trend of the emotional experience — whether it is increasing, stable, or decreasing — also

impacts the overall enjoyment (Elpers, Wedel, and Pieters 2003). We therefore predict that a

positive trend in happiness results in higher watching intentions.

In Table 1, we summarize the predicted relationships between the scene-level factors of

the trailers that we focus on, and their downstream impact on emotions and watching intentions

in the context of comedies and happiness as their central emotion, as guided by the literature.

With respect to the visual scene structure, the number, length, and sequencing of scenes should

have a measurable impact on the viewer’s emotions. With respect to audio, total sound volume

as well as the music-only volume of scenes should have a direct impact on watching intentions,

as well as an indirect impact, because they evoke or intensify the central emotion of the genre.

Not all moments in the trailer are expected to have a significant impact, but rather the start, peak,

end, and trajectory of audio and emotions should matter.

[INSERT TABLE 1 ABOUT HERE]

In the next section, we explain the methodology employed to collect emotional reactions

to trailers in order to understand their role in consumers’ intentions to watch movies. We focus

on trailers for comedy movies as a prototypical case. Comedy has been the leading genre in the

last two decades, with a little more than 2,000 comedies produced and a market share of over 20

percent. The average gross revenue was about $20 million per movie. The main goal of comedy

movies is to elicit joy and laughter among the audience (McGraw and Warren 2010) and

effective trailers for comedy movies are designed to induce happiness as the central emotion

(Grodal 1997). We thus focus on happiness as the key emotion in the illustration of our

methodology, but also look at the role of surprise and disgust as secondary emotions. In the next

section, we explain the method used to parse out trailers into scenes and to measure emotions

moment to moment from a large sample of viewers’ reactions.

METHODOLOGY

An online experiment was conducted in collaboration with the company nViso2, in which

facial expressions and watching intentions were collected for participants watching 100 comedy

movie trailers. Each participant was asked to view a webpage that contained 12 comedy movie

trailers in a setup that mimicked what he or she would encounter on trailer websites from IMDb,

iTunes, or YouTube. Participants watched the trailer in their natural environment, at home, at

work, etcetera, which increases the external validity of data collection. Facial expressions were

recorded remotely through the webcams on participants' computers. At the end of the

experiment, the participants were asked to answer questions regarding their evaluations of the

trailers and the corresponding movies, as well as their intentions to watch the movies. To make

the study incentive-compatible, participants entered a lottery to win the DVD of the movie that

they most wanted to watch. Each participant also received $5 in the form of an Amazon gift card

if they completed the experiment.

Participants and Stimuli

A total of 122 paid participants were recruited online. The participants had a mean age of

24 and an age range from 18 to 68, with 28 percent being men. Participants had to have access to

a personal computer with a webcam and high-speed Internet connection and have near-perfect

vision without glasses or contact lenses. Male participants with a full mustache or beard were

excluded.

A total of 100 comedy movie trailers were taken from public access video channels.

Thirteen comedy subgenres were selected, including nine drama comedies, eight animation

comedies, seven action comedies, seven romantic comedies, four horror comedies, four indie

comedies, four parodies, two dark comedies, and one each from political comedy, sci-fi comedy,

slapstick, sports comedy, and late-night comedy. The trailer for a movie typically comes in

different versions with different lengths developed for different viewing situations and/or for

different audiences. Two different versions of the trailer for each movie were included in the

present study to separately identify trailer-specific features (e.g., scene usage, sound, volume)

from movie-specific features (e.g., stars, casting, plot). Taking the movie “Project X” as an

example, one trailer is one minute and 37 seconds and contains 29 scenes, while the second

trailer is two minutes and 26 seconds and contains 32 scenes and has louder sound on average.

Each participant was only exposed to one version of the trailer for the same movie. Overall, 100

trailers for 50 comedies were used in the study; these trailers were selected from a pool of 100

comedy trailers through a balanced incomplete block design. Data collection generated a massive

dataset of upwards of 1.5 million (participant × time × movie) emotional reactions to audio and

video content scenes. The design minimized spillover effects by randomizing the order of the

trailers shown to each participant. One randomly selected comedy trailer was used as a control

stimulus to form an individual-specific emotional baseline and was shown to all participants at

the beginning of the experiment.

Procedure

The participants were asked for consent to participate in the experiment and to be

recorded via their webcam. Participants needed to be in a well-lit environment and at most 60

centimeters (2 ft.) away from their webcams. Participants were requested to refrain from eating,

chewing, drinking, or talking. Although this request may have had some impact on the external

validity of the study, it was necessary to ensure accurate recording of facial expressions; compliance was checked afterwards.

Each participant was shown a random series of 12 trailers. The length of each trailer was

between one and three minutes. After each trailer, participants were asked five questions about

their previous exposure and their evaluation of the trailer and the movie. Watching intention

(WatchMovie) was measured on a scale ranging from one to seven, with seven being the highest

intention, indicating how much participants would like to watch the movie after they had been

exposed to the trailer. After all trailers were shown, participants were asked to answer questions

about their demographics and their general movie-going behavior. At the end of the experiment,

participants were entered in a raffle in which they had a one-in-10 chance to win a free DVD.

They were asked to choose one or more movies from any of the movies they had just watched in

the experiment. If they won, one movie was selected from the choices they made. The whole

experiment took up to 45 minutes.

Data Collection

The facial expressions of emotions were collected, calculated, and provided to the

researchers in raw data form by the company nViso, which provides real-time cloud computing

to measure consumers’ emotion reactions in online experiments. For each second that a

participant watched a trailer, a probability was calculated indicating the intensity of the emotion.

An emotional profile was created for each participant, containing the moment-to-moment

measures of happiness and other emotions. The original videos of participants’ expressions were

not retained because of privacy concerns, as outlined in IRB regulations for the study. There

were 122 participants in the online questionnaire data and 104 participants provided valid

emotion data. Ninety participants completed the entire questionnaire and had a valid emotion

profile, indicating full compliance with the instructions. Five participants did not provide valid

control data for the calibration trailer, and therefore the final sample consisted of 85 participants

from whom we obtained complete data, which is comparable to the sample sizes commonly used

by nViso in its online tests. For each of the 100 trailers, the data from participants who had seen

the movie previously were removed. Therefore, the number of participants per trailer varied from

three to 19, with an average of 8.43 (SD = 3.14) participants.

Emotion Measurements and Their Validity

Measuring emotions had been a long-standing problem (Mauss and Robinson 2009) until

Ekman and Friesen developed the Facial Action Coding System (FACS) to systematically

categorize emotions by coding instant facial muscular changes (Ekman and Friesen 1978). The

FACS decomposes facial movements into anatomically based “action units,” reflecting the

muscular activity that produces the facial appearance. For example, happiness is characterized by

two primary and three secondary action units. Recently, the Expression Descriptive Units that

measure the interactions among facial muscular movements (Antonini et al. 2006), as well as

Appearance Parameters that consider global facial features (Sorci et al. 2010), have been used to

augment emotion recognition. Although initially emotions had to be assessed by trained coders,

nowadays several off-the-shelf software solutions are available to provide automatic and

accurate moment-to-moment identification of emotions (Fasel and Luettin 2003).

This software has been used in previous marketing studies (Teixeira, Wedel, and Pieters

2012; Teixeira, Picard, and Kaliouby 2014). It has been proven to outperform the work of non-

expert coders and to be approximately as accurate as that of expert coders (Bartlett et al. 1999).

The automated algorithm used by nViso splits the video recording of the user’s face into separate

frames, and then uses the facial expression in each static frame to identify the probability of the

occurrence of six basic emotions (happiness, surprise, fear, disgust, sadness and anger) based on

a Multinomial logit model. The explanatory variables include the measurements from Ekman’s

FACS, the Expression Descriptive Units, and the Appearance Parameters (MacCallum and

Gordon 2011; Sorci et al. 2010; more details are provided in Web Appendix I).
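For readers unfamiliar with this model class, the sketch below illustrates the general form of a multinomial logit (softmax) mapping from facial-feature measurements to probabilities over the six emotions; the features, weights, and intercepts are hypothetical placeholders, not nViso's proprietary implementation:

```python
import numpy as np

EMOTIONS = ["happiness", "surprise", "fear", "disgust", "sadness", "anger"]

def emotion_probabilities(features, weights, intercepts):
    """Multinomial logit: map one frame's facial-feature vector to emotion probabilities.

    features:   1-D array of facial measurements for a single video frame
                (e.g., FACS action-unit activations) -- hypothetical inputs.
    weights:    array of shape (6, n_features), one row of coefficients per emotion.
    intercepts: array of shape (6,), one intercept per emotion.
    """
    utilities = intercepts + weights @ features   # linear predictor for each emotion
    utilities = utilities - utilities.max()       # subtract the max for numerical stability
    exp_u = np.exp(utilities)
    probs = exp_u / exp_u.sum()                   # softmax: probabilities sum to one
    return dict(zip(EMOTIONS, probs))
```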

The algorithm has been validated using 11-fold cross validation on a database of 1,271

images of facial expressions, manually coded for the expression of emotions by 33 human

coders. This cross-validation yielded a (normed) correlation of .76 (Sorci et al. 2010, Table 4, p.

800). This result is comparable to those for similar automated algorithms, for which

classification accuracies ranging from .78 to .88 have been reported (Brodny et al. 2016; McDuff

et al. 2013). Sorci et al. (2010) also reported other supportive evidence on the performance of the

nViso algorithm and compared it to neural networks, based on Histogram Intersection and

Kullback-Leibler measures.

Movie Trailer Video and Audio Variables

The movie trailer video and audio content of all 100 trailers was analyzed using image

and audio processing software, which yielded the following variables.

Scene cuts: Scene cuts in the movie trailers were detected automatically using the “Scene

Detector” (http://www.scene-detector.com), which is software that detects the scene boundaries

based solely on the frame image data. Based on the scene cuts, we calculated the following

variables for use in the analysis: the total number of scenes, the average length of scenes, and the

location of the longest scene in the trailer.
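As an illustration (not the software's actual output format), the following sketch derives these three scene-structure variables from a hypothetical list of detected scene-cut times:

```python
def scene_structure(cut_times, trailer_length):
    """Derive scene-structure variables from detected scene-cut timestamps.

    cut_times:      sorted scene-cut times in seconds, e.g. [4.0, 9.5, 17.2] (hypothetical).
    trailer_length: total trailer duration in seconds.
    """
    boundaries = [0.0] + list(cut_times) + [trailer_length]
    lengths = [end - start for start, end in zip(boundaries[:-1], boundaries[1:])]
    scene_num = len(lengths)                                # total number of scenes
    avg_scene_length = sum(lengths) / scene_num             # average scene length in seconds
    longest_scene_index = lengths.index(max(lengths)) + 1   # 1-based position of the longest scene
    return scene_num, avg_scene_length, longest_scene_index
```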



Audio Volume: We extracted two types of volume data. One is Total Volume: Amplitude

data were extracted every millisecond from MP3 audio files using the sound processing software

SoX (http://sox.sourceforge.net). The absolute values were averaged on a second-by-second

basis to match the video data. The other is Music Volume: By removing vocals utilizing SoX, the

music was separated from the audio files, and its volume was calculated as described above. For

both the total volume data and total music volume data, based on our conceptual framework, we

calculated the following variables to be used in the analysis: the moment-to-moment volume

across the trailer, the trend of volume over the course of the trailer, the average volume in the

start scene, the average volume in the end scene, and the scene with peak volume. Figure 2

shows an example of total volume and music volume from one movie trailer (the vertical dashed

lines indicate the scene cuts).

[INSERT FIGURE 2 ABOUT HERE]

The trailers contain, on average, 23 scenes (Min = 1, Max = 56, SD = 14.23),

with an average length of 11 seconds (SD = 17.6). The total volume is .053 dB (SD = .025),

while the peak volume is .097 dB (SD = .046). The music volume is substantially lower, .017 dB

(SD = .016), with an average peak volume of .041 dB (SD = .038).
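For illustration, the second-by-second volume series described above could be computed as in the sketch below; it assumes the audio has first been converted to WAV and uses scipy rather than SoX, which is a substitution on our part:

```python
import numpy as np
from scipy.io import wavfile

def per_second_volume(wav_path):
    """Average absolute amplitude for every second of audio, to match the 1 Hz video grid."""
    rate, samples = wavfile.read(wav_path)        # rate = samples per second
    if samples.ndim > 1:                          # average stereo channels down to mono
        samples = samples.mean(axis=1)
    samples = np.abs(samples.astype(float))
    n_seconds = len(samples) // rate              # drop the trailing partial second
    samples = samples[: n_seconds * rate]
    return samples.reshape(n_seconds, rate).mean(axis=1)
```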

Emotion Variables

Intensities of happiness were measured for each participant on a second-by-second basis.

Based on our conceptual framework, aggregate measures of this emotion for each trailer were

calculated as follows: Start, the total emotional intensity during the first scene; the Trend,

calculated via a linear fit to each emotion curve; Peak, the average happiness of the scene with

the highest average emotion level; PeakIndex, the location of that scene in the trailer; and End,

the total emotional intensity during the last scene. Figure 3 shows an example of moment-to-

moment happiness and its summary measures for the movie trailer for Men in Black 3. Several

comedy subgenres may rely on other emotions concomitant with happiness, most importantly

surprise in spoof and action comedies, and disgust in dark, satire, and horror comedies (McGraw

and Warren 2010). Therefore, we retained moment-to-moment surprise and disgust as secondary

emotions in comedy trailers and calculated the Start, Peak, Trend, and End measures for these

two emotions as well. Tests based on a random effects linear model show that there is no

significant trend in any of the emotion variables across the sequence in which the trailers are

shown.

[INSERT FIGURE 3 ABOUT HERE]
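A possible implementation of these aggregate measures, assuming a second-by-second happiness series and each second's scene index are available, is sketched below; the variable names are ours and purely illustrative:

```python
import numpy as np

def happiness_summaries(happiness, scene_index):
    """Aggregate a second-by-second happiness series into Start, Trend, Peak, PeakIndex, End.

    happiness:   1-D array with one happiness intensity per second of the trailer.
    scene_index: integer array of the same length giving each second's scene number.
    """
    scenes = np.unique(scene_index)                       # scene labels in ascending order
    scene_means = np.array([happiness[scene_index == s].mean() for s in scenes])
    return {
        "Start": happiness[scene_index == scenes[0]].sum(),               # total intensity in the first scene
        "Trend": np.polyfit(np.arange(len(happiness)), happiness, 1)[0],  # slope of a linear fit
        "Peak": scene_means.max(),                                        # mean happiness of the peak scene
        "PeakIndex": int(scenes[scene_means.argmax()]),                   # location of that scene in the trailer
        "End": happiness[scene_index == scenes[-1]].sum(),                # total intensity in the last scene
    }
```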

Control Variables

While there are countless variables that one could incorporate into the model, we chose to

incorporate the ones readily available to marketers of online video content aggregators. In that

spirit, we obtained several control variables for the movies, extracted from the online databases

“Internet Movie Database” (IMDb), owned by Amazon, “Rotten Tomatoes,” and “The

Numbers,” including MPAA Reviews, ratings (from IMDb), number of ratings above 3.5 (from

Rotten Tomatoes, log-transformed), and release time (whether the movie is released during the

summer or Christmas holiday season).

STATISTICAL MODEL

In our conceptual framework (see Figure 1 and Table 1): a) audio and video features

affect the moment-to-moment emotional experience; b) aggregate measures of emotions,

including start, peak, and end levels, affect intentions to watch the movie; c) in addition to their

indirect effects via emotions, some aggregate audio and video measures may affect watching

intentions directly; and d) finally, watching intentions, together with aggregate emotion measures

and high-level movie characteristics, such as reviews and ratings, impact box office revenue.

The statistical methodology reflects the postulated theoretical relations. Whereas prior

research has mostly used emotions as explanatory variables, here we model the moment-to-

moment emotional response jointly with the end-point behaviors of prime interest (watching

intention and box office revenues). There are three sub-models combined in this joint model, one

for the (longitudinal) happiness data, one for watching intention data, and one for box office

revenue data. The happiness and watching intention sub-models are connected through

individual-specific random effects (Tsiatis and Davidian 2004). We account for unobserved

individual differences through a hierarchical formulation. Given that there are over 40 predictor

variables, we simultaneously apply Bayesian variable selection to each of the three model

components to select the specific measures that predict watching intention.

First, the logit-transformed happiness probabilities for individual $i$ watching trailer $j$ at time $t$ are denoted as $h_{ijt}$ and are modeled as

(1) $h_{ijt} = \theta_{ijt} + e_{ijt}$.

Here, $\theta_{ijt}$ is the underlying true emotion trajectory (modeled as described below). Whereas previous research treated moment-to-moment measurements of emotions as fixed exogenous variables (for example, Teixeira, Wedel, and Pieters 2012), here measurement/classification error, denoted as $\xi_{ijt}$, is accommodated. Because $\xi_{ijt}$ is not separately identified from other sources of error, say $\varsigma_{ijt}$, it is subsumed in the model's error term: $e_{ijt} = \xi_{ijt} + \varsigma_{ijt}$. The error terms $e_{ijt}$ are assumed to be independently Normally distributed, and because the emotions are classified independently on a frame-by-frame basis, it is not unreasonable to assume them to be uncorrelated over time. In equation (1), the underlying emotion trajectory $\theta_{ijt}$ is expressed as

(2) $\theta_{ijt} = \mathbf{W}_{1i}(t) + \mathbf{X}_{1j}\boldsymbol{\beta} + \zeta_1 S_{jt} + \zeta_2 V_{jt} + \zeta_3 M_{jt}$

Here, $\mathbf{W}_{1i}(t)$ are subject-specific random effects (see below). As for the moment-to-moment audio and video features, $S_{jt}$ represents the index of the scene at time $t$; $V_{jt}$ represents the total audio volume of trailer $j$ at time $t$; and $M_{jt}$ represents the music volume of trailer $j$ at time $t$. The matrix $\mathbf{X}_{1j}$ contains the trailer-specific aggregate video and audio variables (start, peak, end, and trend) described in the previous section.

Second, an ordered logit model is developed for the watching intentions, with $y_{ij}$ representing individual $i$'s intention to watch movie $j$, modeled as a function of the latent variable $y_{ij}^{*}$ as follows:

(3) $y_{ij}^{*} = \mathbf{X}_{2ij}\boldsymbol{\alpha} + \mathbf{W}_{2i}$,

(4) $y_{ij} = \begin{cases} 1, & y_{ij}^{*} < \tau_1 \\ d, & \tau_{d-1} < y_{ij}^{*} < \tau_d, \quad d = 2, \ldots, D-1 \\ D, & y_{ij}^{*} > \tau_{D-1} \end{cases}$

Here, $D = 7$ and $\mathbf{W}_{2i}$ contains subject-specific effects, similar to $\mathbf{W}_{1i}(t)$. The threshold parameters satisfy the order constraint $\tau_1 < \tau_2 < \ldots < \tau_{D-1}$, and the first and last thresholds are fixed for identification (Lenk, Wedel, and Böckenholt 2006). The matrix $\mathbf{X}_{2ij}$ contains the predictor variables, including the aggregate measures of emotions and the aggregate video and audio variables extracted from the movie trailers, which are our main explanatory variables. This model is linked to the model for happiness through the dependence between $\mathbf{W}_{1i}(t)$ and $\mathbf{W}_{2i}$. These random effects capture unobserved individual-specific effects in the intercept and the trend of happiness, respectively. Specifically, $\mathbf{W}_{1i}(t)$ and $\mathbf{W}_{2i}$ are expressed as:

(5) $\mathbf{W}_{1i}(t) = u_{1i} + u_{2i}\, t$
    $\mathbf{W}_{2i} = \nu_1 u_{1i} + \nu_2 u_{2i} + u_{3i}$

The random effects for the intercept and the slope are denoted as $u_{1i}$ and $u_{2i}$, and together with $u_{3i}$ they are assumed to follow Normal distributions. The parameters $\nu_1$ and $\nu_2$ capture the association between the happiness and watching intention models.
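For intuition only, the sketch below shows how an ordered logit turns a latent intention into probabilities over the seven response categories; the latent utility and threshold values are illustrative, not estimates from this study:

```python
import numpy as np

def ordered_logit_probs(y_star, thresholds):
    """Probabilities of the D ordinal response categories for a given latent utility y*.

    y_star:     systematic part of the latent intention (equation 3).
    thresholds: increasing cut points tau_1 < ... < tau_{D-1}; the values used below are illustrative.
    """
    tau = np.asarray(thresholds, dtype=float)
    cdf = 1.0 / (1.0 + np.exp(-(tau - y_star)))   # logistic CDF evaluated at each threshold
    cdf = np.concatenate(([0.0], cdf, [1.0]))
    return np.diff(cdf)                            # P(y = 1), ..., P(y = D)

# Seven-point watching-intention scale (D = 7) with hypothetical thresholds
probs = ordered_logit_probs(y_star=1.2, thresholds=[-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
```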

Third, we model log-box office revenues at the movie level as a function of predicted watching intentions, as follows:

(6) $\ln(r_j) = \nu_0 + \nu\, y_{\cdot j}^{*} + \mathbf{X}_{3j}\boldsymbol{\lambda} + \varepsilon_j$,

Here, $y_{\cdot j}^{*} = \sum_i y_{ij}^{*} / N$, and $\mathbf{X}_{3j}$ contains the aggregate emotion measures and control variables described in the previous section. Note that equations (1) to (6) constitute a system of simultaneous equations that are jointly estimated. Gross box office revenue for each of the movies corresponding to the trailers used in the study was obtained from the IMDb database for the year in which the study was conducted.

To identify a parsimonious model with fewer explanatory variables, we apply the Gibbs Variable Selection (GVS) procedure developed by Dellaportas, Forster, and Ntzoufras (2000) to efficiently search for the best subset of predictor variables in $\mathbf{X}_{1j}$, $\mathbf{X}_{2ij}$, and $\mathbf{X}_{3j}$ for each of the three model components, respectively. We use a variable selection approach because it allows us to use a limited subset of predictor variables in hold-out data collection and model validation. In the GVS approach, the coefficients of the regression model are assumed to have spike-and-slab prior distributions, a mixture of a point mass at 0 and a diffuse distribution elsewhere. Specifically, an auxiliary indicator variable $I_k$ is introduced for each covariate in equations (2), (3), and (6), with $I_k = 0$ indicating the absence of covariate $k$ from the model and $I_k = 1$ indicating its presence. A (generic) regression coefficient $\beta_k$ in any of these equations is then specified as:

(7) $\beta_k = \begin{cases} 0, & \text{if } I_k = 0 \text{ (spike)} \\ \eta_k, & \text{if } I_k = 1 \text{ (slab)} \end{cases}$

The joint density is $P(I_k, \beta_k) = P(\beta_k \mid I_k) P(I_k)$. The effect-size parameter $\eta_k$ is assumed to have a mixture prior: $P(\eta_k \mid I_k) = (1 - I_k) \times N(\tilde{\mu}, \tilde{\tau}^2) + I_k \times N(0, \tilde{\sigma}^2)$, where $(\tilde{\mu}, \tilde{\tau}^2)$ is a pseudo-prior that requires tuning and $\tilde{\sigma}^2$ is the fixed prior variance of $\eta_k$. A Bernoulli prior distribution is assumed for the indicator: $I_k \sim \text{Bern}(.5)$.

The model is estimated with Markov Chain Monte Carlo (MCMC), using the JAGS

software, with the code provided in Web Appendix II.
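To connect the indicators in equation (7) to the posterior inclusion probabilities reported later (Table 3), here is a minimal post-processing sketch; the indicator draws, covariate names, and cutoff are placeholders rather than output of the actual JAGS run:

```python
import numpy as np

def inclusion_probabilities(indicator_draws, names, cutoff=0.2):
    """Posterior inclusion probabilities from Gibbs draws of the indicators I_k.

    indicator_draws: array of shape (n_draws, n_covariates) holding sampled 0/1 indicators
                     (placeholder for the MCMC output).
    names:           covariate names, length n_covariates.
    cutoff:          heuristic threshold used to retain covariates.
    """
    probs = indicator_draws.mean(axis=0)                        # share of draws with I_k = 1
    selected = [name for name, p in zip(names, probs) if p >= cutoff]
    return dict(zip(names, probs)), selected
```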

OPTIMAL DESIGN OF CLIPS

The integrated model described above is used to produce optimal short movie clips that

can be inserted in emails, messages, and social media, and in the apps, landing pages, and user

interfaces of content providers. Online advertising channels are idiosyncratic in the video

formats they accept, specifically regarding whether or not they support audio. For example,

YouTube plays videos with sound by default, while on Facebook, 85 percent of videos are

watched without sound (Patel 2016), and Netflix only allows GIF format clips without audio and

subtitles in its promotional emails. We produce clips of about 30 seconds in length (but, in

principle, any desired length is possible) and design optimal clips both with and without audio.

For the former, we use the full model described above and for the latter, we recalibrate the model

while excluding the audio variables.

Let $S_j = \{S_{j,1}, \ldots, S_{j,T_j}\}$ be the sequence of scene indicators across $T_j$, the length of trailer $j$, and let there be $K_j = n(S_j)$ scenes in trailer $j$. The criterion optimized is the mean of the posterior distribution of the predicted watching intention of the movie:

(8) $y_j(S_j^{*}) = \int y_{\cdot j}(S_j^{*} \mid \Phi)\, f(\Phi \mid \text{data})\, d\Phi$,

where $\Phi$ contains all model parameters, $f(\Phi \mid \text{data})$ denotes the posterior distribution of the parameters, and $S_j^{*} = \{S_{j,1}, \ldots, S_{j,T_j^{*}}\}$ denotes the sequence of scene indicators across the length of the clip, $T_j^{*}$. The algorithm we propose to find an optimal clip of approximately 30 seconds (or any other length) for trailer $j$ is a backward elimination algorithm that, one by one, eliminates the scenes whose removal reduces $y_j(S_j^{*})$ the least. The algorithm works as follows:

1. Start with the complete trailer, $S_j = \{S_{j,1}, \ldots, S_{j,T_j}\}$;

2. Eliminate each scene $k = 1, \ldots, K_j^{*}$ in turn, delete the corresponding elements from $V_j$, $M_j$, and $h_{ij}$, and calculate $y_j(S_j^{-k}) = \sum_{r=1}^{R} \sum_{i=1}^{N} y_{ij}(S_j^{-k} \mid \Phi^r) / NR$, with $r$ a draw from the Gibbs sampler;

3. Retain the clip without scene $\ell = \arg\min_k [y_j(S_j^{-k})]$ by removing all time periods $t$ for which $S_{j,t} = \ell$, and set $K_j^{*} = K_j - 1$; and

4. If $|T_j^{*} - \Delta| < \epsilon$, stop; if not, return to step 2. In the application, we use $\epsilon = 1$ second and set $\Delta$ to 30 seconds.

The proposed backward elimination algorithm is part of a class of “greedy” algorithms

that make a locally optimal decision at each stage of the algorithm. For example, at the first pass for trailer $j$, it eliminates the most redundant scene from the trailer to produce a clip with $K_j - 1$ scenes. This simple backward selection algorithm is computationally attractive because, for trailer $j$, it will provide a solution in fewer than $K_j$ steps. It avoids the need to enumerate all

possible configurations of scenes, which would be required to find the globally optimal solution.

In some cases, the proposed backward elimination strategy may thus not produce a globally

optimal solution, but it will yield a locally optimal approximation of that solution (Couvreur and

Bresler 2000). For online content aggregators, this approach has two benefits. It allows the

movie marketer to optimally create clips of any length shorter than the original trailer, and it can

be done very fast.
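As a schematic illustration (not the authors' implementation), the backward elimination procedure above can be rendered in Python as follows; predict_intention is a placeholder for the posterior-mean criterion in equation (8), which the estimated model would supply:

```python
def optimize_clip(scenes, scene_lengths, predict_intention, target=30.0, tol=1.0):
    """Greedy backward elimination of scenes until the clip is about `target` seconds long.

    scenes:            scene identifiers of the trailer, in order.
    scene_lengths:     dict mapping scene identifier -> scene length in seconds.
    predict_intention: function taking a list of retained scenes and returning the
                       posterior-mean predicted watching intention (stand-in for equation 8).
    """
    clip = list(scenes)
    while sum(scene_lengths[s] for s in clip) > target + tol and len(clip) > 1:
        # Predicted intention of the clip with each remaining scene removed in turn.
        candidates = {s: predict_intention([t for t in clip if t != s]) for s in clip}
        # Drop the scene whose removal hurts the predicted intention the least.
        clip.remove(max(candidates, key=candidates.get))
    return clip
```

Following the verbal description of the procedure, each pass drops the scene whose removal reduces the predicted watching intention the least, and the loop stops once the clip reaches roughly the target length.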

RESULTS

Model Comparison

We first test alternative specifications of the joint model with the purpose of investigating

the contribution of the happiness and secondary emotion measures, the video characteristics, the

audio characteristics, and the intention as a predictor of box office revenue. We calculate several

measures of model fit, including Akaike Information Criterion (AIC), the Deviance Information

Criterion (DIC; Spiegelhalter et al. 2002), and the Watanabe-Akaike Information Criterion

(WAIC). The latter was proposed by Gelman et al. (2014) as a computationally convenient

predictive measure that is based on the entire posterior distribution rather than a point estimate

(as is the case for the AIC and DIC statistics). A smaller value for these statistics indicates a

better fitting or predicting model.

We examine six models: 1) the full model, and five models for which we remove the

following sets of predictor variables from the full model: 2) the measures of the secondary

emotions surprise and disgust, 3) the measures of all emotion variables of happiness, surprise,

and disgust, 4) all video variables, 5) all audio variables, and 6) intention as a predictor of box

office sales. Table 2 shows the AIC, DIC, and WAIC statistics for each of the models. The model

without the audio variables shows the largest reduction in (predictive) fit relative to the full

model, followed by the model without the video variables. The importance of audio that this

reveals has ramifications for the design of clips for media that do not support audio. Dropping

watching intention as a predictor of box office success significantly reduces the (predictive) fit of

the model. However, while dropping all emotion measures simultaneously reduces model fit

significantly, the model without the (start, trend, peak, and end) measures of surprise and disgust

fits better than the full model. Apparently, for our sample of movie trailers, these two secondary

emotions do not play a significant role in the formation of watching intentions. We therefore

report the results of the model without the secondary emotions (model 2, main model) below.

[INSERT TABLE 2 ABOUT HERE]

Bayesian Variable Selection

Table 3 displays the posterior means of the inclusion probabilities obtained from the

Bayesian variable selection. The table shows that for the happiness model, only the music peak

has a very low inclusion probability (.046). For the watching intention model, variables that have

very low inclusion probabilities are the sequence number of the scenes (.030), the average scene

length (.053), the peak volume (.016), the end volume (.028), the music peak (.029), the music end volume (.029), and, to a lesser extent, the trend in music volume (.142). Thus, while most of the

audio and video variables affect moment-to-moment happiness, only a few (longest scene, total

and music volume in the first scene, and the trend in volume) affect watching intention directly.

Almost all emotion measures affect watching intention, but the inclusion probability of average

happiness in the watching intention model is relatively low (.196). For the box office model, only

the IMDb ratings have a low inclusion probability (.055).

In our application, a very large number of models is searched over and even the set of

most promising models may be large. We therefore use a heuristic cutoff on the inclusion

probabilities to select the “best” model and based on Table 3, a standard cutoff of .2 is employed.

We investigate the sensitivity to the cutoff by varying it from .10 to .35 in steps of .05 and re-

estimating the model with the variables included at that cutoff. All coefficients of predictor

variables in the emotion and box office models that are significant (have a credible interval that

does not cover zero) stay the same, regardless of the cutoff used, but for the watching intention

model, as the cutoff changes there are some relatively minor variations in the significance of

coefficients.3 We discuss the estimates of the final model next.

[INSERT TABLE 3 ABOUT HERE]

Interpretation of the Parameter Estimates

In Table 4, we present the estimates of the final model. The table shows that video

features directly impact momentary feelings of happiness. The level of happiness increases with

increasing scene sequences (Scene). The number of scenes in a trailer (SceneNum) has a negative

effect on happiness, confirming that fast-paced comedy trailers tend to result in a lower level of

happiness. We find that longer scenes placed later in the trailers (SceneLongestInd) increase

happiness significantly. These findings confirm our predictions (see Table 1). As for audio, we

find that its moment-to-moment volume has a significant positive instantaneous effect on

happiness, as predicted (Table 1). We did not have specific predictions on the effects of the

sound volume measures, but peak volume (VolumePeak) and increasing trend in volume

(VolumeTrend) decrease happiness. End volume (VolumeEnd) has a positive effect. Music

volume (Music) has a negative moment-to-moment effect on happiness, but louder music at the

start of the trailer improves happiness (MusicStart).

The moment-to-moment experience of happiness throughout the trailer positively affects

watching intentions. As predicted by peak-end theory (Fredrickson and Kahneman 1993), both

the peak happiness (HappinessPeak) and the happiness experienced at the end of the trailer

(HappinessEnd) have a positive effect on watching intentions (Table 1). In line with our

prediction, an increasing trend in happiness (HappinessTrend) also affects watching intentions

positively. Finally, the association between the random intercepts in the happiness and watching

intention models is significant, with a higher variation in the level of happiness being associated

with lower watching intentions. These findings confirm our predictions (Table 1).

Alongside the indirect effects of video and audio variables on watching intentions via the

happiness experienced, there are also direct effects. Longer scenes placed later in the trailers

(SceneLongestInd) increase watching intentions significantly, over and above their effect on

happiness, as predicted (Table 1). Further, increasing volume (VolumeTrend) decreases not only

happiness but also watching intentions directly. However, louder music at the start of the trailer

(MusicStart), while improving happiness, has a negative direct effect on watching intentions.

In the box office revenue model, several of the control variables have a significant

impact, including ratings from Rotten Tomatoes (NumRatingabove3.5; positive effect) and

MPAA reviews (MPAA = PG, PG13, and R; negative effect). Finally, and importantly, watching

intentions (WatchIntention), as predicted by the watching intention model, have a significant

positive impact on box office revenues.

[INSERT TABLE 4 ABOUT HERE]

MANAGERIAL IMPLICATIONS: OPTIMAL CLIP PRODUCTION AND TESTING

Optimal Clip Production and Predictions

The parameter estimates in Table 4 were used as inputs to the stepwise scene selection

algorithm to produce an optimal movie clip of about 30 seconds in length for each of the 50 pairs

of trailers. Because some media for which these clips are intended do not allow for sound, clips

were produced both with and without sound. For this purpose, two models with and without the

sound variables were used. As a benchmark for comparison, we use the current practice to

produce clips by selecting the first 30 seconds of the trailer. The results are shown in Table 5.

Optimal movie clips with audio: Table 5 shows that the optimal clips consist on average

of 3.6 scenes, while the benchmark has more and thus shorter scenes, 4.8 on average. The

predicted average watching intention of the optimal clips (7-point scale) is considerably higher

(3.83) than that of the benchmark clips (2.91). The predicted watching intention of the original

trailer is 3.32 (SD = .55), and thus the shorter optimal clip results in an even higher intention to

watch the movie than the original trailer. The average difference in watching intention between

the optimal and benchmark clips is almost a full point (.92) on the seven-point scale, and over 90

percent of the optimal clips have higher watching intentions than the corresponding benchmark

clips. These watching intentions translate to a predicted 3.17 percent improvement in expected

box office revenue for the optimal clips.
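Under the log-linear specification in equation (6), and holding the other predictors fixed, a change $\Delta y_{\cdot j}^{*}$ in the average predicted intention multiplies expected revenue by $\exp(\nu\, \Delta y_{\cdot j}^{*})$, so the implied percentage improvement is $100 \times [\exp(\nu\, \Delta y_{\cdot j}^{*}) - 1]$, with $\nu$ the estimated intention coefficient; this is one way to read the revenue improvements reported in this section.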

Recall that the data contains two versions of the trailer for each movie. For each of the

two versions of a trailer, we produced a clip using our algorithm. On average, the difference in

predicted watching intention between these two clips for the same movie was 1.29 (SD = .76).

From each pair of clips, we selected the one with the highest predicted watching intention. These

clips had an average predicted watching intention across all movies of 4.21 (SD = .92), which

translates to a 4.80 percent predicted increase in box office revenue. Thus, because the two

different trailers have a wider range of scenes from the movie, selecting the best clip from the

pair results in considerably higher watching intentions and predicted box office success.

[INSERT TABLE 5 ABOUT HERE]



Optimal silent movie clips: For clips produced without audio (“silent clips”), the optimal

clip contains 3.6 scenes, on average, while the benchmark contains 4.8 scenes. These results do not noticeably differ from those for clips with audio. The predicted watching intentions for the

optimal silent clips are 3.80 (seven-point scale), which is only somewhat lower than those for the

optimal clips with audio (Table 5). Yet, 42 percent of the silent clips result in a higher watching

intention than their counterparts with audio. Note that this does not reflect a lack of contribution

of audio to watching intentions, but reveals that it is possible to eliminate scenes from the trailer

in such a way that even the resulting silent clips still generate high watching intentions.

The optimal silent clips result in higher watching intentions than the original trailer (3.32)

and the benchmark silent clips (3.28). The average predicted difference in watching intention

between the optimal and benchmark silent clips is about a half point (.5), and over 90 percent of

the optimal silent clips have higher watching intentions than the silent benchmark clips. These

higher watching intentions result in a predicted 1.75 percent improvement in expected box office

revenue. We conducted a similar analysis using the best of the two versions of the clip for each

movie. On average, the difference in predicted watching intention between the two silent clips

for the same movie was .79 (SD = .45). For the best silent clip in each pair, the average predicted

watching intention is 4.07 (SD= .66), which translates to a 2.45 percent increase in box office

revenue.

We thus demonstrate the beneficial effects of optimizing movie clips via simulation. To

investigate consumers’ response to the actual clips in hold-out validations, we conduct two

experiments. The first is an online experiment and the second is a large-scale field experiment.

Evaluation of Optimal Clips with an Online Experiment

We selected the five best-performing clips with audio and the five best-performing silent

clips from the simulation analyses. The movie titles included “Dark Shadow,” “Mirror Mirror,”

“The Odd Life of Timothy Green,” “Project X,” “Rock of Ages,” “Some Guy Who Kills

People,” “Wanderlust,” and “What to Expect When You Are Expecting.” Two clips overlapped

between the two sets of five: “Project X” and “Mirror Mirror.” For each clip, we produced the

actual benchmark and optimal movie clips by editing the digital video file of the trailers based on

the proposed procedure. The clips were produced in GIF format and were about 30 seconds long.

One-hundred and seventy-five undergraduate and graduate students were recruited for the

experiment and participated for extra course credit. To make the study incentive-compatible,

participants were entered in a lottery for the chance to win a $50 gift card to be used to go and

see the movie they liked the most. Some platforms that do not allow for sound (such as

Facebook.com) do allow clips to show subtitles. We therefore also added subtitles to the

optimized silent clips, as this may increase comprehension of the narrative. We showed each

participant five clips in a randomized order. To avoid spillover effects, we only showed one

(randomly selected) version of a clip (optimized or benchmark) to each participant. After

watching each clip online, the participants were asked to answer three questions based on seven-

point scales to assess their evaluation of the clip (How much do you like this movie clip?), the

movie (How would you rate the movie based on this trailer?), and their intention to watch the

movie (Would you like to watch this movie?).

We obtained usable data on 169 respondents. The three evaluation
measures were analyzed jointly with MANOVA, which reveals strong evidence of the superior performance of

the optimized clips over the benchmark clips with audio (p < .001, partial eta-squared η² = .08)
and without audio (p < .001, η² = .08), and of the effect of adding subtitles to silent clips (p
< .001, η² = .13). Relative to the benchmark, the optimization procedure significantly improves
the measures for each type of movie clip, with moderate effect sizes. The results for the average

of the three evaluation measures are presented in Table 6 for each of the five movie clips

separately. For all three types of clips, improvement over the benchmark is among the largest for

“Mirror Mirror”: For the clip with audio, evaluations increase by 21.9 percent (p = .003, η²
= .16), while for silent clips without subtitles, they increase by 17.1 percent (p = .056, η² = .16),
and for silent clips with subtitles, they increase by 32.3 percent (p = .002, η² = .21). The benefits
of the proposed procedure are smallest for silent clips without subtitles; adding subtitles may thus be
important when auto-play videos are muted, something companies currently do not
seem to do. This hold-out study, in which actual clips were produced and presented to a new

sample of respondents, thus provides evidence of the effectiveness of the proposed model and

optimization procedure.

[INSERT TABLE 6 ABOUT HERE]

Evaluation of Optimal Clips with a Field Experiment

To test whether optimized clips indeed improve the viewing behavior of actual
customers, we worked with Netflix’s messaging team and conducted a field experiment to test

our approach on an email campaign for one of Netflix’s original romantic comedy movies, just

before its launch in August 2017. First, we collected data in a facial-tracking experiment

(through nViso) with a sample of 41 participants viewing the trailer of this movie, using the

procedure described in the methodology section. Using the model estimates reported in Table 4,
usable facial-expression data from 40 participants, and the scene and other characteristics
of the trailer, we produced silent optimized and benchmark clips of 19 seconds in GIF format
(without subtitles), according to Netflix’s requirements. In addition to comparing the optimized
clip to the benchmark clip, we also compared it to a static image, as static images are still frequently
used in Netflix’s email campaigns.

Using a stratified sampling procedure, Netflix users were allocated to strata based on a

unique combination of their region, device type (e.g., iPhone, Android, Apple TV, etc.),

payment type (e.g., debit, credit, etc.), tenure (1, 2, … years), and plan (basic, standard,

premium). Then, using machine-generated random numbers, the participants in each stratum

were randomly assigned to one of three conditions: 1) the baseline with a static image, 2) the

benchmark clip, and 3) the optimized clip. Each condition has an equal number of participants

from each of the strata. In total, 40,000 Netflix customers from non-U.S., English-speaking

countries were involved. Each participant received a promotional email from Netflix with the

optimized clip, benchmark clip, or static image embedded. The emails had the same subject line

and supporting text, and the clips looped. Next, we report Netflix’s standard statistics on the

variables directly related to streaming behavior, as well as effect sizes.4

First, compared to the static image, the optimized and benchmark clips perform

significantly better in terms of the average number of Streaming Hours with a moderate effect

size (an average lift of 1.60 percent, p < .001, Cohen’s h = .253), showing that customers are

more receptive to clips than to static images. In addition, while a .30 percent higher Watching

Percentage (customers who watched at least 70 percent of the movie) for the optimized clip

relative to the benchmark is not significant, the optimized clip reduces the percentage of Short

Viewers (customers who viewed less than 6 minutes of the movie) by 10.5 percent (p = .058,

Odds ratio = 1.121), and reduces the Bad Player Ratio (percentage of Short Viewers divided by

Watching Percentage) by 12.5 percent (p < .001, Odds ratio = 1.170), compared to the

benchmark clip. Although the effect sizes are relatively small, the optimized clip enhanced

streaming behavior compared to the benchmark clip.



The results of the field test, while preliminary, are encouraging. This is especially the

case because the results rely only on a single silent movie clip, because the Netflix original

comedy movie was not part of the model calibration data, because the samples came from very

different populations, and because streaming behaviors occur far downstream from exposure to

the clips. Nevertheless, the field test showed that streaming behavior was meaningfully

impacted by optimizing the clip with the proposed approach.

DISCUSSION

Movie trailers have long been regarded as the movie industry's most effective marketing

tool (Faber and O’Guinn 1984), but original two- to three-minute trailers, not only for movies

but also for sitcoms and video games, are becoming less effective in new digital media.

Marketers are therefore seeking to produce much shorter video clips to promote their content in

these media. Film clips are ads for movies, akin to those for video games and TV shows, but

different from food, car, and electronics commercials in that they are made up of samples of the

product that is being promoted. Viewers of clips thus experience a sample of the emotions that

they will experience when they go see the movie. The challenge for marketers resides in

identifying how many and which scenes of the trailer to show in a short video clip. The goal of

this research is to support movie marketers in this effort by investigating how to cut the trailers

provided by trailer production houses down to short clips that are suitable for today’s electronic

media, while eliciting an emotional experience that is representative of the movie and stimulates

people to go and watch it.

But how to optimally sample the trailer content remains an open question. The

contribution of this research lies in the development of a theoretical and methodological

framework for moment-to-moment emotions, watching intentions, and box office success

(Figure 1) to support this goal. The framework centers on the scene as the basic building block of

movies, trailers, and clips. The proposed method helps marketers select those scenes from a
trailer that render short clips most effective. The findings of the analyses, simulations, and

online and field tests show that our approach enables the design of short clips that not only

increase consumers’ intentions to watch the movie, but that also improve predicted box office

success and streaming behavior.

This research marks a first attempt to investigate the effectiveness of clips, with the

application focusing on clips for the comedy movie genre. While happiness, as the central

emotion of the genre, has strong effects, we do not find an effect of concomitant emotions, such
as surprise and disgust, which might be significant in spoof, action, dark, satire, and horror

comedies. This result might be caused by the sample of participants and movies in the present

study, and future research should further examine the role of such concomitant emotions. We do

expect the proposed approach to be directly applicable to other movie genres which elicit a

different central emotion (Grodal 1997). Further refinement of the approach for that purpose may

be useful.

In a broader context, our approach involves content marketing. In content marketing, a

sample of the product is marketed, and applications arise not only for movies, but also for

immersive games, for TV shows on HBO and Netflix, for news items shown on news sites and

news aggregators, such as Flipboard, and even for books (Arons 2013). The manner in which our

approach can be extended to support the marketing of these other types of products requires

further study. In such future research, face tracking may be combined with measures such as

those derived from EEG, which have recently been shown to be predictive of movie preferences

and box office success (Boksem and Smidts 2015).

Online ads have a short “shelf life” compared to traditional forms of advertising, such as

TV commercials. As such, online marketers need to constantly create new content for these ads

to attract and retain consumers’ attention online. The traditional production process of ads is
expensive and often slow, and marketers are therefore increasingly considering automation to

produce variations of ads as quickly as possible and with low budgets. Our approach to

advertising online content via short clips can be automated, scaled up, and personalized. Once
representative calibration data are available and the models have been trained on them,
film clips can be produced automatically using the proposed algorithm. Taking this one step

further, using customer-level data, our procedure could be utilized to customize the selection of

scenes to produce personalized clips that maximize the elicited response from each individual

customer. The pursuit of the automation and personalization of the content of movie clips holds

promise to greatly enhance marketing effectiveness (Wedel and Kannan 2016). We hope the

present study provides a starting point for these future research avenues, and consequently

improves the effectiveness of the marketing of movies and other online content.

References

Arons, Rachel (2013), “The Awkward Art of Book Trailers,” The New Yorker, (accessed

February 5, 2015), [available at http://www.newyorker.com/books/page-turner/the-

awkward-art-of-book-trailers]

Bartlett, Marian Stewart, Joseph C. Hager, Paul Ekman, and Terrence J. Sejnowski (1999),

“Measuring Facial Expressions by Computer Image Analysis,” Psychophysiology, 36 (2),

253–63.

Baumgartner, Hans, Mita Sujan, and Dan Padgett (1997), “Patterns of Affective Reactions to

Advertisements: The Integration of Moment-To-Moment Responses into Overall

Judgments,” Journal of Marketing Research, 34 (2), 219–232.

Boksem, Maarten A. S. and Ale Smidts (2015), “Brain Responses to Movie Trailers Predict

Individual Preferences for Movies and Their Population-Wide Commercial Success,”

Journal of Marketing Research, 52(4), 482-492.

Bradley, Margaret M., and Peter J. Lang (2000), “Affective Reactions to Acoustic Stimuli,”

Psychophysiology, 37(02), 204-215.

Brodny, Grzegorz, Agata Kołakowska, Agnieszka Landowska, Mariusz Szwoch, Wioleta

Szwoch, and Michał R. Wróbel (2016), “Comparison of Selected Off-The-Shelf Solutions

for Emotion Recognition Based on Facial Expressions,” In: 9th International Conference

on Human System Interactions (HSI), 397-404.

Cornfield, Jerome (1951), “A Method for Estimating Comparative Rates from Clinical Data.

Applications to Cancer of the Lung, Breast, and Cervix,” Journal of the National Cancer

Institute, 11, 1269–1275.



Couvreur, Christophe, and Yoram Bresler (2000), “On the Optimality of the Backward Greedy

Algorithm for the Subset Selection Problem,” SIAM Journal on Matrix Analysis and

Applications, 21 (3), 797-808.

Cutting, James E., Kaitlin L. Brunick, and Ayse Candan (2012), “Perceiving Event Dynamics

and Parsing Hollywood Films,” Journal of Experimental Psychology: Human Perception

and Performance, 38 (6), 1476-1490

Dellaportas, P., J. J. Forster, and Ioannis Ntzoufras (2000), “Bayesian Variable Selection Using

the Gibbs Sampler,” Biostatistics-Basel, 5 (May), 273-286.

Ekman, Paul and Wallace V. Friesen (1978), Facial Action Coding System: A Technique for the

Measurement of Facial Movement, Consulting Psychologists Press, Palo Alto.

Eliashberg, Jehoshua and Steven M. Shugan (1997), “Film Critics: Influencers or Predictors?”

Journal of Marketing, 61 (2), 68-78.

-------, Sam K. Hui, and Z. John Zhang (2007), “From Story Line to Box Office: A New

Approach for Green-Lighting Movie Scripts,” Management Science, 53 (6), 881–893.

Elpers, Josephine L.C.M. Woltman, Michel Wedel, and Rik G. M. Pieters (2003), “Why Do

Consumers Stop Viewing Television Commercials? Two Experiments on the Influence of

Moment-to-Moment Entertainment and Information Value,” Journal of Marketing

Research, 40(4), 437-453.

Faber, Ronald J. and Thomas C. O’Guinn (1984), “Effect of Media Advertising and Other

Sources on Movie Selection,” Journalism Quarterly, 61 (2), 371–377.

Fasel, Beat and Juergen Luettin (2003), “Automatic Facial Expression Analysis: A Survey,”

Pattern Recognition, 36, 259-275.



Fredrickson, Barbara L., and Daniel Kahneman (1993), “Duration Neglect in Retrospective

Evaluation of Affective Episodes,” Journal of Personality and Social Psychology, 65 (1),

44-55.

Galak, Jeff, Justin Kruger, and George Loewenstein (2011), “Is Variety the Spice of Life? It All

Depends on The Rate of Consumption,” Judgment and Decision Making, 6(3), 230-238.

-------, (2013), “Slow down! Insensitivity to Rate of Consumption Leads to Avoidable Satiation,”

Journal of Consumer Research, 39(5),993-1009.

George, Edward I., and Robert E. McCulloch (1997), “Approaches for Bayesian Variable

Selection,” Statistica Sinica, 7, 339-373

Gelman, Andrew, Jessica Hwang, and Aki Vehtari (2014), “Understanding Predictive

Information Criteria for Bayesian Models,” Statistics and Computing, 24(6), 997-1016.

Grodal, Torben (1997), “Moving Pictures: A New Theory of Film Genres, Feeling, and

Cognition,” Oxford: Clarendon Press.

Hart, Hugh (2014), “9 (Short) Storytelling Tips from A Master of Movie Trailers.” Fast

Company (accessed January 29, 2015), [available at

http://www.fastcocreate.com/3031012/9-short-storytelling-tips-from-a-master-of-movie-

trailers]

Hewig, Johannes, Dirk Hagemann, Jan Seifert, Mario Gollwitzer, Ewald Naumann, and Dieter

Bartussek (2005), “A Revised Film Set for The Induction of Basic Emotions,” Cognition

and Emotion, 19 (7), 1095–1109.

Hui, Sam K, Tom Meyvis and Henry Assael (2014), “Analyzing Moment-To-Moment Data

Using a Bayesian Functional Linear Model: Application to TV Show Pilot Testing,”

Marketing Science, 33(2), 222–240.



Kellaris, James J. and Ronald C. Rice (1993), “The Influence of Tempo, Loudness, and Gender
of Listener on Responses to Music,” Psychology and Marketing, 10, 15–29.

Kerman, Lisa (2004), “Coming Attractions: Reading American Movie Trailers,” in Texas Film

and Media Studies Series. Austin: University of Texas Press, 1st Edition.

Lang, Peter. J. (1995), “The Emotion Probe: Studies of Motivation and Attention,” American

Psychologist, 50(5), 372.

-------, Margaret M. Bradley, and Bruce N. Cuthbert (1997), “Motivated attention: Affect,

Activation, and Action,” in Attention and Orienting: Sensory and Motivational Processes,

Mahwah, NJ: Lawrence Erlbaum, 97-135.

Last, J. (2004), “Opening Soon,” Wall Street Journal, (accessed May 1, 2004), [available at
http://wwww.opinionjournal.com].

Lenk, Peter, Michel Wedel, and Ulf Böckenholt (2006), “Bayesian Estimation of Circumplex
Models Subject to Prior Theory Constraints and Scale-Usage Bias,” Psychometrika,
71(1), 33–55.

Litman, Barry R. (1983), “Predicting Success of Theatrical Movies: An Empirical Study,”

Journal of Popular Culture, 16 (9), 159–175.

Loewenstein, George F., and Dražen Prelec, (1993), “Preferences for Sequences of Outcomes,”

Psychological Review, 100(1), 91-108.

MacCallum, David and Alistair Gordon (2011), “Say It to My Face! Applying Facial Imaging to

Understanding Consumer Emotional Response,” AMSRS Conference 2011.

McDuff, Daniel, Rana El Kaliouby, David Demirdjian, and Rosalind Picard (2013), “Predicting

Online Media Effectiveness Based on Smile Responses Gathered Over the Internet,” In

Automatic Face and Gesture Recognition (FG), 10th IEEE International Conference and

Workshops, 1-7.

McGraw, A. Peter and Caleb Warren (2010), “Benign Violations: Making Immoral Behavior

Funny,” Psychological Science, 21 (8), 1141-1149.

Mauss, Iris B. and Michael D. Robinson (2009), “Measures of Emotions: A Review,” Cognition

and Emotion, 23 (2), 209–237.

Nelson, Leif D., and Tom Meyvis (2008), “Interrupted Consumption: Disrupting Adaptation to

Hedonic Experiences,” Journal of Marketing Research, 45(6), 654-664.

Newcombe, Robert G. (1998), “Interval Estimation for the Difference between Independent

Proportions: Comparison of Eleven Methods,” Statistics in Medicine,17, 873–890.

Nowlis, Stephen M., Naomi Mandel, and Deborah Brown McCabe (2004), “The Effect of a

Delay Between Choice and Consumption on Consumption Enjoyment.” Journal of

Consumer Research, 31(3), 502-510.

Patel, Sahil (2016), “85 Percent of Facebook Video Is Watched without Sound,” Digiday,

(accessed February 13, 2018), [available at https://digiday.com/media/silent-world-

facebook-video/]

Ratner, Rebecca K., Barbara E. Kahn, and Daniel Kahneman (1999), “Choosing Less-Preferred

Experiences for The Sake of Variety,” Journal of Consumer Research 26(1), 1-15.

Shannon-Jones, Samantha (2011), “Trailer Music: A Look at The Overlooked,” The Oxford

Student, (accessed October 26, 2011), [available at http://oxfordstudent.com/

2011/10/26/trailermusi/].

Sharda, Ramesh and Dursun Delen (2006), “Predicting Box-Office Success of Motion Pictures

with Neural Networks,” Expert Systems with Applications, 30 (2), 243–254.



Shiv, Baba, and Stephen M. Nowlis, (2004), “The Effect of Distractions While Tasting a Food

Sample: The Interplay of Informational and Affective Components in Subsequent

Choice,” Journal of Consumer Research, 31(3), 599-608.

Sorci, Matteo, Gianluca Antonini, Javier Cruz, Thomas Robin, Michel Bierlaire, and J-Ph Thiran

(2010), “Modelling Human Perception of Static Facial Expressions,” Image and Vision

Computing, 28, 790–806.

Spiegelhalter, David J., Nicola G. Best, Bradley P. Carlin, and Angelika Van Der Linde (2002),

“Bayesian Measures of Model Complexity and Fit,” Journal of the Royal Statistical

Society Series B-Statistical Methodology, 64, 583-616.

Teixeira, Thales, Michel Wedel, and Rik Pieters (2012), “Emotion-Induced Engagement in Internet
Video Ads,” Journal of Marketing Research, 49 (2), 144–159.

-------, Rosalind Picard and Rana El Kaliouby (2014), “Why, When, And How Much to Entertain

Consumers in Advertisements? A Web-Based Facial Tracking Field Study,” Marketing

Science, 33(6), 1– 19.

Tsiatis, Anastasios A. and Marie Davidian (2004), “Joint Modeling of Longitudinal and Time-To-

Event Data: An Overview,” Statistica Sinica, 14, 809–834.

Wedel, Michel and P.K. Kannan (2016), “Marketing Analytics for Data-Rich Environments,”

Journal of Marketing, 80 (6), 97-121.

Wilson, Edwin B. (1927), “Probable Inference, the Law of Succession, and Statistical Inference,”

Journal of the American Statistical Association, 22(158), 209-212.

Zacks, Jeffrey M., Nicole K. Speer, and Jeremy R. Reynolds (2009), “Segmentation in Reading

and Film Understanding,” Journal of Experimental Psychology: General, 138, 307–327.



Footnotes:

1. Six-Second Commercials Are Coming to N.F.L. Games on Fox, by Sapna Maheshwari,

Aug 30, 2017 (accessed Sep 21, 2017), [available at

https://www.nytimes.com/2017/08/30/business/media/nfl-six-second-commercials.html]

2. nViso: Artificial Intelligence Emotion Recognition Software, www.nViso.ch

3. The Random Intercept, Index of the Longest Scene, the Audio Trend, Happiness Peak,

and Happiness Trend are “significant” for all cutoffs. The signs of the coefficients are

stable, but Average Happiness (c.o. < .2), Happiness End (c.o. = .2), Music Start (c.o. =

.2), and Volume Start (c.o. > .3) are significant only for some of the cutoffs.

4. We calculate the p-value based on tests on the relationship between proportions of two

groups (Newcombe 1998; Wilson 1927), using the stats package in R (prop.test) and set

the alternative hypothesis as greater or less. We calculate the Odds ratio as an effect-size

measure (e.g., Cornfield 1951), and calculate Cohen’s h for the lift measure of Streaming

Hours because the Odds ratio cannot be calculated in this case.



TABLE 1: THEORETICAL PREDICTIONS FOR THE EFFECTS OF VIDEO AND


AUDIO CHARACTERISTICS ON HAPPINESS AND WATCHING INTENTIONS

Primitives | References | Variables affecting happiness | Variables affecting watching intentions

Video
Pacing and sequence | Galak et al. 2011, 2013; Ratner et al. 1999 | Number of scenes (-); scene sequence (+) | Number of scenes (-); scene sequence (+)
Delays and interruptions | Nelson and Meyvis 2008; Nowlis et al. 2004; Shiv and Nowlis 2004 | Average scene length (-); longest scene index number (+) | Average scene length (-); longest scene index number (+)

Audio
Moment-to-moment level | Bradley and Lang 2000; Lang 1995; Lang et al. 1997 | Volume level (+); music level (+) | Volume level (+); music level (+)
Start, peak, end, and trend | Zauberman et al. 2006 | Volume start, peak, end, and trend (+/-); music start, peak, end, and trend (+/-) | Volume start, peak, end, and trend (+/-); music start, peak, end, and trend (+/-)

Emotions
Happiness | Baumgartner et al. 1997; Fredrickson and Kahneman 1993; Elpers et al. 2003 | n/a | Happiness start (+/-), peak (+), end (+), and trend (+)

Note: Expected direction (+, -, or +/-) of the effects in parentheses.

TABLE 2: MODEL COMPARISON STATISTICS FOR THE FULL MODEL AND FIVE
MODELS THAT ARISE BY REMOVING VARIABLES FROM THE FULL MODEL

Model AIC DIC WAIC


1. Full Model 233519.2 233459.6 253212.9
2. Without Surprise and Disgust 233505.3 233444.4 253009.8
3. Without Emotion Variables 233592.9 233524.1 253406.2
4. Without Video Variables 233709.6 233640.5 254027.4
5. Without Audio Variables 233760.5 233675.9 257211.0
6. Without Intention 233603.9 233543.6 253536.8

TABLE 3: INCLUSION PROBABILITIES OF VARIABLES IN THE JOINT MODEL,


ESTIMATED WITH BAYESIAN VARIABLE SELECTION

Happiness Model Watch Intention Model Box Office Model


Variables Prob. Variables Prob. Variables Prob.
Scene 1.000 SceneNum .030 Holiday .481
SceneNum 1.000 SceneLengthAvg .053 MPAA=PG .637
SceneLengthAvg .395 SceneLongestInd .452 MPAA=PG13 .622
SceneLongestInd 1.000 Volume Peak .016 MPAA=R 1.000
Volume 1.000 Volume End .028 Ratings (from IMDb) .055
Volume Peak 1.000 Volume Start .416 NumRatingAbove3.5 1.000
Volume End 1.000 Volume Trend .714 Happiness Avg .265
Volume Start 1.000 Music Peak .029 Happiness Peak .226
Volume Trend 1.000 Music End .029 Happiness End .209
Music 1.000 Music Start .271 Happiness Start .225
Music Peak .046 Music Trend .142 Happiness Trend .230
Music End .346 Happiness Avg .196
Music Start .352 Happiness Peak .984
Music Trend .546 Happiness End .223
Happiness Start .898
Happiness Trend .625
Notes: The posterior means of inclusion in the model are reported.
Estimates that are greater than the cutoff of .2 are in bold.

TABLE 4: PARAMETER ESTIMATES CAPTURING THE EFFECTS OF


VARIABLES ON HAPPINESS, WATCHING INTENTION AND BOX OFFICE
PERFORMANCE IN THE JOINT MODEL
Model Component | Variables | Posterior Mean | Posterior SD | 2.50% | 97.50%
Happiness Video Scene .002 .000 .001 .003
SceneNum -.023 .004 -.030 -.016
SceneLengthAvg -.005 .002 -.010 .000
SceneLongestInd .027 .002 .022 .031
Audio Volume .210 .066 .078 .341
Volume Peak -.032 .003 -.037 -.027
Volume End .010 .003 .003 .017
Volume Start -.005 .003 -.012 .001
Volume Trend -.008 .002 -.013 -.004
Music Music -.351 .104 -.553 -.144
Music End -.007 .004 -.015 .002
Music Start .029 .005 .019 .038
Music Trend .006 .004 -.002 .014
Linkage Random Intercept -.743 .188 -1.097 -.381
Random Slope .619 9.864 -18.660 19.650
Watch Intention Video SceneLongestInd .165 .065 .037 .293
Audio Volume Start -.126 .083 -.286 .035
Volume Trend -.221 .065 -.351 -.093
Music Music Start -.161 .082 -.321 -.003
Happiness Happiness Peak .514 .124 .273 .762
Happiness End .358 .139 .083 .631
Happiness Start -.111 .122 -.354 .126
Happiness Trend .237 .090 .064 .414
τ1 -.249 .383 -.999 .502
τ2 .692 .383 -.059 1.442
Threshold τ3 1.236 .384 .483 1.989
Parameters τ4 1.944 .388 1.183 2.705
τ5 2.794 .396 2.017 3.571
τ6 4.155 .416 3.340 4.970
Box Office Intercept -2.951 1.745 -6.407 .442
WatchIntention .562 .270 .033 1.089
Holiday .930 1.068 -1.205 2.953
MPAA=PG -2.666 1.052 -4.726 -.562
MPAA=PG13 -2.914 1.049 -5.009 -.888
MPAA=R -3.792 .928 -5.615 -1.974
NumRatingAbove3.5 1.986 .097 1.802 2.186
Happiness Happiness Avg 1.645 1.550 -1.450 4.684
Happiness Peak -.589 .852 -2.265 1.099
Happiness End .397 1.184 -1.884 2.751
Happiness Start -1.004 .996 -2.958 .925
Happiness Trend -1.375 .746 -2.847 .077
Notes: Parameters for which the 95% credible interval does not cover zero are in bold.

TABLE 5: RESULTS OF OPTIMAL CLIPS RESULTING FROM STEPWISE


REMOVAL OF SCENES FROM THE TRAILER, COMPARED TO A BENCHMARK
WITH THE FIRST SCENES OF THE TRAILER

Movie clips with audio Optimal clip Benchmark clip


Average number of scenes 3.57 (2.07) 4.76 (3.20)
Predicted watching intention 3.83 (.92) 2.91 (.68)
Difference in watching intentions .92 (.77)
Percentage of clips with positive improvement 90.91%
Improvement in box office revenue 3.17%
Movie clips without audio Optimal clip Benchmark clip
Average number of scenes 3.56 (2.07) 4.76 (3.20)
Predicted watching intention 3.80 (.68) 3.28 (.55)
Difference in watching intentions .49 (.43)
Percentage of clips with positive improvement 90.91%
Improvement in box office revenue 1.75%
Average difference between clips with and without audio .04 (.53)
Percentage of clips with audio better than those without 57.58%
Notes: SD in parentheses.
One trailer had only one scene, so no optimization was performed for it.
The optimization of movie clips without audio is based on the model without sound variables.

TABLE 6: AVERAGES OF THE THREE EVALUATION MEASURES FOR THE


OPTIMIZED AND BENCHMARK CLIPS IN THE ONLINE VALIDATION
EXPERIMENT

Clips With Sound


Odd Life of Timothy | Dark Shadow | Mirror Mirror | Project X | Rock of Ages
Benchmark Average 3.86 3.61 4.16 3.58 3.24
Optimized Average 4.42 3.97 5.07 4.38 3.74
% Difference 14.70 9.94 21.92 22.55 15.48
Clips Without Sound
What to Expect | Some Guy | Mirror Mirror | Project X | Wanderlust
Benchmark Average 3.66 2.96 3.86 3.52 3.78
Optimized (no subtitles) Average 4.02 3.33 4.52 4.00 4.03
% Difference 9.75 12.49 17.10 13.65 6.38
Optimized (subtitles) Average 4.52 3.82 5.11 4.35 4.58
% Difference 23.40 29.16 32.27 23.67 20.98

FIGURE 1: CONCEPTUAL FRAMEWORK:


EMOTIONAL IMPACT OF SCENE STRUCTURE OF MOVIES, TRAILERS AND
CLIPS ON WATCHING INTENTIONS

Notes: Scenes are the basic building blocks of video content. The audio-visual scene structure of
movies elicits an intended emotional response. Scenes from the movie are selected for the trailer,
and scenes from the trailer are selected to produce a clip. The clip provides a representative
emotional experience and results in the intention to watch the full movie.

FIGURE 2: TOTAL SOUND VOLUME AND MUSIC VOLUME FOR ONE TRAILER (MEN IN BLACK 3)

[Line plot omitted in text extraction: Volume (in dB) on the y-axis against Time (in sec) on the x-axis.]

Note: The solid line indicates the total sound volume (in dB). The dotted line indicates the music volume (in dB) with vocals removed.
Vertical dashed lines indicate scene cuts.

FIGURE 3: HAPPINESS PROFILE AND HAPPINESS AND SCENE MEASURES FOR ONE INDIVIDUAL FOR A
SAMPLE TRAILER (MEN IN BLACK 3)

[Line plot omitted in text extraction: Happiness on the y-axis against Time (in sec) on the x-axis.]

Notes: The happiness measure ranges from 0 to 1 (the algorithm assigns a probability based on three sets of facial expression
measurements; details about the measurement are provided in Web Appendix I). Vertical dashed lines indicate scene cuts. The middle
shaded area is the region of the scene with the happiness peak; left and right shaded areas are start and end scenes. The horizontal
dashed line indicates 75% of the peak value; the dotted line is a linear fit used to represent the happiness trend.

Video Content Marketing: The Making of Clips

Xuan Liu, Savannah Shi, Thales Teixeira, and Michel Wedel

WEB APPENDIX

WEB APPENDIX I: FACIAL EXPRESSION RECOGNITION ALGORITHM IN SORCI ET


AL. 2010.

The facial recognition algorithm developed by Sorci et al. (2010) and used by nViso
specifies the probability of six basic emotions (Ekman and Friesen 1971), namely, “happiness”,
“surprise”, “fear”, “disgust”, “sadness”, and “anger”. Using a multinomial logit model, the
algorithm assigns a probability to each of these emotions based on three sets of facial expression
measurements: those based on the Facial Action Coding System (FACS); those based on the
Expression Descriptive Units (EDU); and those based on Appearance Parameters (AP).
The FACS developed by Ekman and Friesen (1978) is the leading standard for measuring
facial expressions. All visible movements of muscular activity on a face are categorized into
“action units” (AUs) and emotions are identified based on a unique combination of these AUs.
For example, happiness is characterized by two primary and three secondary action units. Zhang
et al. (2005) validated the classification of emotions based on the AUs and supplemented the
classification with auxiliary AUs and transient facial features, such as wrinkles and furrows.
The EDU was developed by Antonini et al. (2006), after recognizing that face
recognition also involves spatial configuration of facial features (Cabeza and Kato 2000; Farah et
al. 1998). The EDU encodes the interactions among facial features (e.g., the interactions between
eyebrows and mouth), in addition to the isolated AUs identified in FACS. Lastly, AP was
developed by Sorci et al. (2010) to provide a description of a face as a global entity.
Sorci et al. (2010) compared three different models: (1) the model with only measures
from FACS as explanatory variables; (2) the model with EDU and significant measures from
FACS in model 1 as explanatory variables; (3) the model with AP, and significant measures from
EDU and FACS in model 2 as explanatory variables. The model is specified as:
Emotion j  Intercept j   k 1 I kjK1  kjK1 FACSkK1 ,
K1
Model 1


K2 K
(W1)  I 2
k 1 kj
 kjK2 ( FACS  EDU ) kK2 , Model 2


K3 K
I 3
k 1 kj
 kjK3 ( FACS  EDU  AP) kK3 , Model 3
in which Emotion_j includes “happiness”, “surprise”, “fear”, “disgust”, “sadness”, “anger”,
“neutral”, “other”, and “I don’t know”; K1, K2, and K3 represent the total number of
measurements from FACS, from FACS and EDU, and from FACS, EDU, and AP, respectively; I is the
indicator variable, which equals 1 if the k-th measurement is included for emotion j, and 0 otherwise; and
the intercept captures the average effect of factors that are not included.
The models are estimated by maximum likelihood, and model comparison statistics show
that the full model (model 3) performs best; it is used as the final model in the facial expression
recognition algorithm.

References:

Antonini, Gianluca, Matteo Sorci, Michel Bierlaire, and Jean-Philippe Thiran (2006), "Discrete
Choice Models for Static Facial Expression Recognition," In Advanced Concepts for Intelligent
Vision Systems, 710-721. Springer Berlin/Heidelberg.

Cabeza, Roberto, and Takashi Kato (2000), "Features Are Also Important: Contributions of
Featural And Configural Processing To Face Recognition," Psychological Science 11(5), 429-
433.

Ekman, Paul, and Wallace V. Friesen (1971), "Constants across Cultures in the Face and
Emotion," Journal of Personality and Social Psychology 17(2), 124-129.

Ekman, Paul, and Wallace V. Friesen (1978), Facial Action Coding System Investigator’s Guide.
Consulting Psychologists Press, Palo Alto, CA.

Farah, Martha J., Kevin D. Wilson, Maxwell Drain, and James N. Tanaka (1998), “What Is
‘Special’ about Face Perception?,” Psychological Review, 105(3), 482-498.

Zhang, Yongmian, and Qiang Ji (2005), "Active and Dynamic Information Fusion for Facial
Expression Understanding from Image Sequences," IEEE Transactions on Pattern Analysis and
Machine Intelligence, 27(5), 699-714.

WEB APPENDIX II: JAGS CODE

model{
  for (j in 1:numTrailer){
    for (i in 1:nObsMatrix[,j]){

      ### moment-to-moment emotion (happiness) model
      y[index[i,j], 1, j] ~ dnorm(0, 0.01)
      for (t in 2:tMatrix[,j]){
        theta[index[i,j],t,j] <- zeta[1]*s[t,j] +
          zeta[2]*x4_eq1[index[i,j],7,j] + zeta[3]*x4_eq1[index[i,j],15,j] +
          zeta[4]*x4_eq1[index[i,j],16,j] + zeta[5]*Vol[t,j] +
          zeta[6]*x4_eq1[index[i,j],32,j] + zeta[7]*x4_eq1[index[i,j],34,j] +
          zeta[8]*x4_eq1[index[i,j],35,j] + zeta[9]*x4_eq1[index[i,j],14,j] +
          zeta[10]*Music[t,j] + zeta[11]*x4_eq1[index[i,j],38,j] +
          zeta[12]*x4_eq1[index[i,j],39,j] + zeta[13]*x4_eq1[index[i,j],17,j] +
          u[index[i,j], 1] + u[index[i,j], 2] * t
        y[index[i,j],t,j] ~ dnorm(theta[index[i,j],t,j], sigma)
      }

      ## end-point watch intention model (cumulative logit with thresholds tau)
      mu_c[index[i,j],j] <- alpha[1]*x4_eq1[index[i,j],16,j] +
        alpha[2]*x4_eq1[index[i,j],35,j] + alpha[3]*x4_eq1[index[i,j],14,j] +
        alpha[4]*x4_eq1[index[i,j],39,j] + alpha[5]*x4_eq1[index[i,j],21,j] +
        alpha[6]*x4_eq1[index[i,j],30,j] + alpha[7]*x4_eq1[index[i,j],12,j] +
        v[1]*u[index[i,j], 1] + v[2]*u[index[i,j], 2] + u[index[i,j], 3]

      logit(Q[index[i,j],j,1]) <- tau[1] - mu_c[index[i,j],j]
      p[index[i,j],j,1] <- Q[index[i,j],j,1]
      for (d in 2:(D-1)) {
        logit(Q[index[i,j],j,d]) <- tau[d] - mu_c[index[i,j],j]
        p[index[i,j],j,d] <- Q[index[i,j],j,d] - Q[index[i,j],j,(d-1)]
      }
      p[index[i,j],j,D] <- 1 - Q[index[i,j],j,(D-1)]
      C[index[i,j],j] ~ dcat(p[index[i,j],j,1:D])
      C_Pred[index[i,j],j] <- sum(p[index[i,j],j,1:D]*(1:D))
    } #i

    ## end-point (log) box-office revenue model
    WatchMovieTemp[j] <- sum(C_Pred[index[1:nObsMatrix[,j],j],j])/nObsMatrix[,j]
    happiness_peak_numtemp[j] <- sum(x4_eq1[index[1:nObsMatrix[,j],j],2,j])/nObsMatrix[,j]
    happiness_peak_durationtemp[j] <- sum(x4_eq1[index[1:nObsMatrix[,j],j],5,j])/nObsMatrix[,j]
    happiness_avg_temp[j] <- sum(x4_eq1[index[1:nObsMatrix[,j],j],9,j])/nObsMatrix[,j]
    happiness_coftemp[j] <- sum(x4_eq1[index[1:nObsMatrix[,j],j],12,j])/nObsMatrix[,j]
    happiness_peaktemp[j] <- sum(x4_eq1[index[1:nObsMatrix[,j],j],21,j])/nObsMatrix[,j]
    happiness_peakIndextemp[j] <- sum(x4_eq1[index[1:nObsMatrix[,j],j],24,j])/nObsMatrix[,j]
    happiness_endtemp[j] <- sum(x4_eq1[index[1:nObsMatrix[,j],j],27,j])/nObsMatrix[,j]
    happiness_starttemp[j] <- sum(x4_eq1[index[1:nObsMatrix[,j],j],30,j])/nObsMatrix[,j]

    mu_bo[j] <- vbo[1]*WatchMovieTemp[j] + vbo[2]*Holiday[j] +
      vbo[3]*mpaaPG[j] + vbo[4]*mpaaPG13[j] + vbo[5]*mpaaR[j] +
      vbo[6]*log(Numratingabove3_5[j]+1) + vbo[7]*happiness_avg_temp[j] +
      vbo[8]*happiness_peaktemp[j] + vbo[9]*happiness_endtemp[j] +
      vbo[10]*happiness_starttemp[j] + vbo[11]*happiness_coftemp[j] + vbo[12]
    LGBoxOffice[j] ~ dnorm(mu_bo[j], sigma_bo)
  } #j

  ## respondent-level random effects and their precisions
  for (i in 1:nRespondent){
    for (f in 1:3){
      u[i,f] ~ dnorm(0, sigma_m[f])
    }
  }
  for (r1 in 1:3){
    sigma_m[r1] ~ dgamma(0.01, 0.01)
  }

  ## ordered thresholds for the watch intention model
  for (d in 1:D){
    tau0[d] ~ dnorm(0, 0.001)
  }
  tau <- sort(tau0)

  ## priors for regression coefficients and precisions (JAGS dnorm uses precision)
  for (m in 1:nAlpha){
    alpha[m] ~ dnorm(0, 0.01)
  }
  for (l in 1:nV){
    v[l] ~ dnorm(0, 0.01)
  }
  sigma ~ dgamma(0.01, 0.01)
  for (ll in 1:nZeta){
    zeta[ll] ~ dnorm(0, 0.01)
  }
  sigma_bo ~ dgamma(0.01, 0.01)
  for (ll in 1:nvbo){
    vbo[ll] ~ dnorm(0, 0.01)
  }
}
