Testing Chapter 3
Traditional test automation approaches for video quality involve code that combs through every frame of a video during
playback, under various network, device, and encoder conditions, and tries to detect known failure modes from pixel patterns in the
image. Such approaches fall short in several ways, and their impact on the end user is difficult to quantify. The
failure modes of video streaming range from simple blank screens to horizontal or vertical lines, freezing frames, low frame rates,
and more. But knowing how all of these functional issues detract from the user experience is difficult. What if the issues happen
during the credits rather than during the heart of an action sequence? What if multiple failure modes occur at the same time? Automation
today can find some of the issues, but it can’t map them back to the true oracle—the end user.
Subjective quality measurement is made even more difficult because videos can be riddled with issues that go unnoticed when
they appear in noisy or less relevant areas of the screen. For example, Figure 3-1 shows a scene in a television series with people
talking. During a given scene, these videos may have issues in the furniture, windows, props, or even people in the background.
However, because the human eye is likely focused on the areas of the video where the subjects are speaking in the foreground,
the background issues may not be noticed at all and the video segment is given a perfect 10 out of 10. To be great at assessing
video quality, test automation would have to simulate all of this!
1. Gather training data in the form of video clips and their associated human MOS ratings.
2. Train AI bots to analyze the video clips, including teaching them to recognize the following:
1. Interesting or attention-grabbing objects in a scene
2. Video quality glitches within the bounds of the interesting objects
3. Train the bots to map all of the above video outputs to corresponding MOS values.
4. Leverage the bots to predict the video quality for streams it has not previously seen.
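To make this pipeline concrete, here is a minimal sketch of steps 1 through 4, with hand-engineered frame statistics standing in for the learned object and glitch detectors described above. The clips.csv file, its columns, and the chosen features are hypothetical; a production MOS predictor would learn far richer representations from the video itself.

```python
# Minimal sketch: predict MOS ratings for video clips (hypothetical data).
# clips.csv is assumed to have two columns: path (video file) and mos
# (crowd-sourced mean opinion score for that clip).
import cv2
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

def clip_features(path, max_frames=120):
    """Crude per-clip features: brightness, sharpness, and frame-to-frame
    motion, standing in for learned object- and glitch-aware features."""
    cap = cv2.VideoCapture(path)
    brightness, sharpness, motion = [], [], []
    prev = None
    while len(brightness) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        brightness.append(gray.mean())
        sharpness.append(cv2.Laplacian(gray, cv2.CV_64F).var())
        if prev is not None:
            motion.append(np.abs(gray.astype(float) - prev).mean())
        prev = gray.astype(float)
    cap.release()
    return [np.mean(brightness), np.std(brightness),
            np.mean(sharpness), np.mean(motion or [0.0])]

df = pd.read_csv("clips.csv")                        # columns: path, mos
X = np.array([clip_features(p) for p in df["path"]])
y = df["mos"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = GradientBoostingRegressor().fit(X_train, y_train)

# Step 4: predict MOS for unseen clips and compare against human ratings.
corr, _ = pearsonr(model.predict(X_test), y_test)
print(f"Predicted-versus-human MOS correlation: {corr:.2f}")
```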
By creating a set of internal video benchmarks and working with crowd-sourced human judges, my R&D team at test.ai was able
to use the aforementioned approach to implement a video quality MOS predictor. Although their first model was naive, the
training results were relatively good, with a 65% correlation between the AI-predicted scores and the human MOS values. Note
that correlations of 70% or higher are considered strong. However, after enhancing the naive model with features that accounted
for video categories and scenes, they saw the correlation rise to 78% and 89%, respectively. Being perfectionist data nerds, they
worked further to train the model on larger, independent, academic datasets of videos with MOS scores.1 Figure 3-2 shows the
results of one of the best-performing categories, Asian Fusion, which yielded a correlation of 93%.
Figure 3-2. AI predictions versus human ratings on video quality for the Asian Fusion category
With AI on its way to replicating human MOS judgment scores, perhaps engineers developing video-streaming solutions will
soon be able to get near real-time estimates of video quality that are just as good as human ratings. Although there is more work
to be done, the progress so far in this area is promising and serves as a good example of how to apply AI to today’s test
automation challenges.
AI has been playing games for about as long as we can remember, so why not have it test them too? Let's investigate the exciting
world of video game testing as a final look at how AI is bridging the automation gap.
With so little automated tool support for testing video games end to end, you'll find that many gaming companies do not even
have smoke or sanity tests in place for their gaming applications. A smoke or sanity test is a very basic test that helps you to
determine whether a new version of the application is stable enough for further testing. For example, a video game smoke test
may launch the game app, navigate through the menus to start a new game session, and interact with the gameplay elements once
the game has loaded. Recall from “Limitations of Traditional Approaches” that the reason there is a lack of support is that most
functional UI automation tools require applications to have a DOM. However, due to their highly visual and graphical nature,
many parts of gaming applications lack such a model. Fortunately, AI-powered testing tools are able to use computer vision to
interact with the game menu icons and other widgets to make this type of automation possible. Furthermore, AI for game
application testing extends beyond smoke tests, as this technology is enabling gaming companies to define complete functional
testing scenarios for locally hosted games and those streamed from remote servers.
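As a rough illustration of the computer-vision approach, the sketch below locates a menu icon in a raw game screenshot with OpenCV template matching and clicks it with pyautogui, with no DOM involved. The screenshot and icon file names are hypothetical, and real AI-powered tools generally rely on trained object detectors rather than a single fixed template.

```python
# Minimal sketch: drive a game menu by vision alone (hypothetical files).
import cv2
import pyautogui

screenshot = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
icon = cv2.imread("new_game_icon.png", cv2.IMREAD_GRAYSCALE)

# Slide the icon template over the screenshot and score every position.
result = cv2.matchTemplate(screenshot, icon, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:                                   # confidence threshold
    h, w = icon.shape
    pyautogui.click(max_loc[0] + w // 2, max_loc[1] + h // 2)  # click icon center
else:
    raise AssertionError("Smoke test failed: menu icon not found on screen")
```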
Figure 3-3. An add-on weapon rotating in the game store menu of a popular title
On the other hand, AI-based solutions work well for classifying objects under these circumstances. You can train the AI
using images captured from a video of the animated axe and, just like a human, the machine is able to recognize the axe from
almost any angle or in combination with different types of lighting and artistic effects. AI therefore makes it possible to identify
dynamically rendered items in video games—a feat that foils conventional test automation approaches. This testing capability is
particularly important for in-game stores, since they are a direct source of revenue. However, being able to classify these game
assets is just the first step. Next, we’ll learn how AI verifies game assets to make sure that gamers aren’t surprised by cosmetic
bugs after purchasing items from the store.
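Before moving on to verification, here is a minimal sketch of that first classification step under one possible setup: a pretrained vision backbone is reused as a feature extractor, and a lightweight classifier is fit on frames of the asset captured from many angles. The frames/ directory layout (one subfolder per class, such as frames/axe and frames/other) and the choice of ResNet-18 are assumptions for illustration only.

```python
# Minimal sketch: learn to recognize a game asset (e.g., the axe) from
# frames captured at many angles and under different lighting.
import torch
from torchvision import datasets, models, transforms
from sklearn.linear_model import LogisticRegression

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("frames", transform=preprocess)  # frames/axe, frames/other
loader = torch.utils.data.DataLoader(data, batch_size=32)

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()            # strip the classification head
backbone.eval()

features, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        features.append(backbone(images))    # embeddings for each frame
        labels.append(targets)

X = torch.cat(features).numpy()
y = torch.cat(labels).numpy()
classifier = LogisticRegression(max_iter=1000).fit(X, y)  # axe versus not-axe
```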
1. Visually isolate the baseline asset that represents the ground truth by removing the background. You can do this by
training an ML model yourself or using a third-party tool that does the work for you.
2. On the new version of the software, run a test that uses image-based UI element selection to navigate to the asset
in the game store and visually capture the test observation. You will have to run background removal on the test
observation since the bot will capture the raw image of the asset from the game.
3. Generate an image mask highlighting the differences between the baseline and test observation images. You can
train an ML model to do this, but first you may have to read up on how to compare two images using OpenCV and
Python.2
Figure 3-4. Detecting visual differences in game assets during AI-driven test automation
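The sketch below shows one way to implement step 3, following the structural similarity (SSIM) technique from the PyImageSearch article cited above. The file names are hypothetical, and both images are assumed to be the same size with their backgrounds already removed in steps 1 and 2.

```python
# Minimal sketch of step 3: build a mask of visual differences between the
# baseline asset image and the test observation (hypothetical file names).
import cv2
from skimage.metrics import structural_similarity

baseline = cv2.imread("axe_baseline.png", cv2.IMREAD_GRAYSCALE)
observed = cv2.imread("axe_observed.png", cv2.IMREAD_GRAYSCALE)

# SSIM returns an overall score plus a per-pixel similarity map.
score, diff = structural_similarity(baseline, observed, full=True)
diff = (diff * 255).astype("uint8")          # high value = similar pixel

# Invert and Otsu-threshold the similarity map into a difference mask.
_, mask = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

print(f"SSIM score: {score:.3f}")            # 1.0 means visually identical
cv2.imwrite("axe_diff_mask.png", mask)
```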
Aside from visual game asset verification, you may also want to test the audio assets of your game. For audio, let’s look at a
different but equally powerful way to carry out the task of comparing gaming assets during testing. In this case, instead of trying
to detect differences, you can frame the audio verification problem as one where you are looking for similarities between a
baseline audio file and one observed during testing. Figure 3-5 illustrates one way to verify gaming sound effects. It shows three
spectrograms, which are graphical representations of the amplitude and frequency of a sound wave over time. Spectrogram A on
the left of Figure 3-5 is the baseline sound of an axe being swung, while Spectrogram B on the far right is the sound of the same
axe being swung in a subsequent build of the game. The approach to automatically testing these sound effects is to compute the
cross-correlation, also called a sliding dot product, represented by Spectrogram C in the middle of the figure. The general idea is
that, after normalizing for differences in amplitude and timing, you keep one signal fixed and slide the other across it, measuring
how strongly the two align at each offset; a pronounced peak means the observed sound effect matches the baseline.
Figure 3-5. Testing sound effects in video games by using cross-correlation to determine similarity
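A minimal sketch of this audio check might look like the following, assuming mono WAV recordings at a matching sample rate and hypothetical file names; the normalized peak of the cross-correlation serves as the similarity score.

```python
# Minimal sketch: compare a baseline sound effect with one captured during
# testing by sliding one signal across the other (cross-correlation).
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

rate_a, baseline = wavfile.read("axe_swing_baseline.wav")
rate_b, observed = wavfile.read("axe_swing_observed.wav")
assert rate_a == rate_b, "resample one clip before comparing"

# Normalize amplitude so louder or quieter renderings can still match.
baseline = baseline.astype(float)
observed = observed.astype(float)
baseline /= np.abs(baseline).max() or 1.0
observed /= np.abs(observed).max() or 1.0

# Keep the baseline fixed and sweep the observed signal across it.
xcorr = correlate(observed, baseline, mode="full")
peak = xcorr.max() / np.sqrt(np.sum(baseline**2) * np.sum(observed**2))

# A peak near 1.0 means the two sounds line up at some time offset.
print(f"Peak normalized cross-correlation: {peak:.2f}")
assert peak > 0.8, "Sound effect no longer matches the baseline"
```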
Now that I’ve walked you through two asset verification approaches, one for visual assets and the other for audio, you should have a
good understanding of how this technology can be applied to game testing. Although at times it feels like magic, AI and subfields like
computer vision are rooted in mathematics, computation, logic, and other disciplines including philosophy, psychology, biology, and
evolution.3 As such, there are several other ways to leverage AI to tackle these problems.
Figure 3-6 provides a snapshot of this approach in action: test steps are described as high-level goals for an AI bot, and then,
through reinforcement learning, the bot is left to play the game until it achieves those goals. A test for an
online shooting game may be to explore a location; collect weapons, ammo, and equipment upgrades along the way; and then
score a kill. While the bot is autonomously executing the gameplay steps from the test, it is also checking for cosmetic bugs and
audio glitches. If the bot is able to accomplish the tasks without discovering any bugs, then you know that those features are
working as intended for the given scenario—all without human intervention. Furthermore, you can give the bot goals that seek to
intentionally explore the boundaries of the gameplay environment, or you can reward it for actions such as jumping on walls and
ceilings, in hopes of breaking the game physics. As the application evolves and gameplay changes, your AI-based testing
approach autonomously relearns how to accomplish the same tasks in the new environment.
Figure 3-6. Gameplay testing using goal-based, reinforcement learning and asset verification
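The following is a minimal sketch of this goal-based loop using the Gymnasium API. The ShooterEnv-v0 environment, its info fields, and the check_for_defects helper are hypothetical stand-ins for a real game harness and the asset checks described earlier, and a random policy stands in for a trained reinforcement learning agent.

```python
# Minimal sketch: goal-based gameplay testing loop (hypothetical environment).
import gymnasium as gym

env = gym.make("ShooterEnv-v0")          # hypothetical wrapper around the game

def check_for_defects(frame, audio):
    """Placeholder for the visual and audio asset checks shown earlier."""
    return []                            # return a list of detected bugs

obs, info = env.reset(seed=42)
bugs, goal_reached = [], False
for step in range(10_000):
    action = env.action_space.sample()   # a trained policy would choose here
    obs, reward, terminated, truncated, info = env.step(action)
    bugs += check_for_defects(info.get("frame"), info.get("audio"))
    if info.get("goal_reached"):         # e.g., upgrades collected, kill scored
        goal_reached = True
        break
    if terminated or truncated:
        obs, info = env.reset()

assert goal_reached and not bugs, f"Gameplay test failed: {bugs}"
```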
Conclusion
AI is enabling the automated user experience testing of graphically intensive, multimedia applications, an area that conventional
tools still do not support well. Video streaming and gaming are examples of two industries where this technology is proving to be
a game changer, no pun intended. However, several additional opportunities exist for automating multimedia applications.
Computer-aided design (CAD) software used in the automotive, shipbuilding, aerospace, architecture, and prosthetics industries,
among others, is eagerly awaiting better automated testing solutions. I truly hope that, as with video streaming and gaming, AI
will soon be giving test automation in those areas the “eyes” and “ears” it needs to succeed.
1 “Image and Video Quality Assessment at LIVE,” Laboratory for Image & Video Engineering, The University of Texas at
Austin, accessed June 3, 2021.
2 Adrian Rosebrock, “Image Difference with OpenCV and Python,” PyImageSearch, June 19, 2017.
3 John A. Bullinaria, “IAI: The Roots, Goals, and Sub-Fields of AI,” University of Birmingham, 2005.
4 “OpenAI Five: 2016–2019,” OpenAI, 2019.