
Chapter 3.

Automating Graphically Intensive Apps


Recall from Chapter 1 that one of the drawbacks of traditional UI testing tools is the lack of support for automating highly
graphical applications. In this chapter, you’ll catch a few glimpses of how AI is pushing test automation into uncharted
territories—video streaming and game testing.

AI for Video Stream Testing


Whether watching a movie or show on demand on Netflix, getting live match updates on NBA.com, logging on to a Zoom
meeting for work, or playing games via PlayStation Now, people are becoming more and more dependent on video-streaming
technologies for business and entertainment. Video streaming is now a core experience in many familiar applications and
next-generation gaming systems. Automating the testing of video streams is quite challenging. In this section, you’ll learn about
the current state of the practice and then dive into how AI is improving the level of automation.

Video Stream Testing Practices


Best practices for testing video stream quality today involve asking humans to watch videos and rate them using a mean opinion
score (MOS). The MOS is a value in the range of 1 to 5, where 1 is the lowest perceived quality and 5 is the highest perceived
quality. Not only is this manual approach an expensive and slow way to test video streaming quality, but it doesn’t scale. When
network engineers at streaming companies make changes to networks, tweak their decoders, or switch to new devices or carriers,
they often can’t do a full manual pass at testing the new or modified configurations. The traffic on the network, the variety of devices, and even the types of videos all make it difficult to measure the user experience.

Traditional test automation approaches for video quality involve code that combs through every frame of a video during
playback, under various network, device, and encoder conditions, and tries to detect known types of failure modes through the
image pixel patterns. Such approaches fall short in several ways, and their impact on the end user is difficult to quantify. The
failure modes of video streaming range from simple blank screens to horizontal or vertical lines, freezing frames, low frame rates,
and more. But knowing how all these functional issues detract from the user experience is difficult. What if the issues happen
during the credits versus the core of an action sequence? What if multiple failure modes occur at the same time? Automation
today can find some of the issues, but it can’t map them back to the true oracle—the end user.
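
To make the frame-combing idea concrete, here is a minimal sketch of the kind of pixel-pattern check described above, assuming OpenCV; the thresholds and the two failure modes it looks for (blank and frozen frames) are illustrative choices, not a complete detector:

```python
import cv2

def scan_for_basic_failures(video_path, blank_thresh=5.0, freeze_thresh=0.5):
    """Flag blank and frozen frames by inspecting pixel statistics frame by frame."""
    cap = cv2.VideoCapture(video_path)
    issues = []
    prev_gray = None
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # A near-zero standard deviation suggests a blank (solid-color) frame.
        if gray.std() < blank_thresh:
            issues.append((frame_idx, "blank"))
        # Consecutive frames that barely differ suggest a frozen stream.
        if prev_gray is not None and cv2.absdiff(gray, prev_gray).mean() < freeze_thresh:
            issues.append((frame_idx, "frozen"))
        prev_gray = gray
        frame_idx += 1
    cap.release()
    return issues
```

Checks like this can flag obvious glitches, but they still say nothing about how much a given glitch actually bothers the viewer.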

Subjective measures of quality are made even more difficult by issues that go unnoticed because they appear in noisy or less relevant areas of the screen. For example, Figure 3-1 shows a scene in a television series with people talking. During a given scene, these videos may have issues in the furniture, windows, props, or even people in the background. However, because the human eye is likely focused on the areas of the video where the subjects are speaking in the foreground, the background issues may not be noticed at all, and the video segment is given a perfect MOS of 5 out of 5. To be great at assessing video quality, test automation would have to simulate all of this!

Automating Video Quality Testing with AI


Can you already see how to leverage AI and ML for automating video stream testing? By now you’ve probably noticed two distinct patterns in AI-driven testing solutions. The first is that you use computer vision to give the bots “eyes” like humans. The second is that you transform the aspects of the problem that require human judgment into ML training exercises. Here’s what an AI-based approach for video stream test automation looks like:

1. Gather training data in the form of video clips and their associated human MOS ratings.
2. Train AI bots to analyze the video clips, including teaching them to recognize the following:
   a. Interesting or attention-grabbing objects in a scene
   b. Video quality glitches within the bounds of the interesting objects
3. Train the bots to map all of the above video outputs to corresponding MOS values.
4. Leverage the bots to predict the video quality for streams they have not previously seen. (A rough sketch of steps 3 and 4 follows this list.)
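
As a rough illustration of steps 3 and 4, the sketch below maps per-clip features (imagined here as object and glitch statistics produced by the bots in step 2) to MOS values with an off-the-shelf regressor and scores it by correlation with the human ratings; the file names and model choice are assumptions, not the team's actual pipeline:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical inputs: one feature vector per clip and its crowd-sourced MOS label.
features = np.load("clip_features.npy")   # shape: (n_clips, n_features)
mos_labels = np.load("clip_mos.npy")      # shape: (n_clips,), values 1 to 5

X_train, X_test, y_train, y_test = train_test_split(
    features, mos_labels, test_size=0.2, random_state=42)

# Step 3: learn the mapping from video analysis outputs to MOS values.
model = GradientBoostingRegressor().fit(X_train, y_train)

# Step 4: predict MOS for unseen clips and compare against human ratings.
predicted = model.predict(X_test)
correlation, _ = pearsonr(predicted, y_test)
print(f"Correlation with human MOS: {correlation:.2f}")
```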

By creating a set of internal video benchmarks and working with crowd-sourced human judges, my R&D team at test.ai was able
to use the aforementioned approach to implement a video quality MOS predictor. Although their first model was naive, the
training results were relatively good, with a 65% correlation between the AI-predicted scores and the human MOS values. Note
that correlations of 70% or higher are considered strong. However, after enhancing the naive model with features that accounted
for video categories and scenes, they saw the correlation rise to 78% and 89%, respectively. Being perfectionist data nerds, they
worked further to train the model on larger, independent, academic datasets of videos with MOS scores.1 Figure 3-2 shows the
results of one of the best-performing categories, Asian Fusion, which yielded a correlation of 93%.

Figure 3-2. AI predictions versus human ratings on video quality for the Asian Fusion category

With AI on its way to replicating human MOS judgment scores, perhaps engineers developing video-streaming solutions will
soon be able to get near real-time estimates of video quality that are just as good as human ratings. Although there is more work
to be done, the progress so far in this area is promising and serves as a good example of how to apply AI to today’s test
automation challenges.

AI for Game Testing


Testing modern video games is no small feat! Today’s video games are highly visual, graphical, customizable, updatable,
streamable, and cross-platform compatible. Game testers must cover multiple dimensions of an uncountably infinite state space in
a constantly evolving application. These dimensions include testing the functionality of the gaming application itself, verifying the visual aesthetics of graphics and animations, and trying to break the gameplay mechanics.

AI has been playing games for about as long as most of us can remember, so why not have it test them too? Let’s investigate the exciting world of video game testing as a final look at how AI is bridging the automation gap.

Video Game–Testing Practices


Writing automation to test gameplay is extremely challenging. Any approach that hardcodes assumptions will constantly break as
the underlying game engine and its elements are usually in flux. The content, characters, items, maps, and storylines of games are
continually evolving. Some games even let the user change these elements in real time as they play. Software development
toolkits for games generally come with a low-level internal test-scripting language. This scripting language enables developers to
write unit tests for verifying core game logic, and that layer of the application tends to be quite stable. However, the language
lacks the ability to drive the user interface and runtime rendering. As a result, automated testing at this level cannot catch the
issues that real-world players encounter during gameplay or when navigating the gaming application.
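
To illustrate the kind of logic-level check such scripting layers support, here is a sketch of a unit test against a hypothetical damage formula; the game code, values, and use of Python in place of a proprietary scripting language are all assumptions for illustration:

```python
import unittest

# Hypothetical core game logic of the kind an SDK-level test layer can reach.
def compute_damage(base_damage, weapon_multiplier, armor):
    """Scale base damage by the weapon, subtract armor, and never go below zero."""
    return max(0, int(base_damage * weapon_multiplier) - armor)

class DamageLogicTest(unittest.TestCase):
    def test_axe_hit_on_unarmored_player(self):
        self.assertEqual(compute_damage(20, 1.5, 0), 30)

    def test_armor_cannot_heal_the_target(self):
        self.assertEqual(compute_damage(5, 1.0, 50), 0)

if __name__ == "__main__":
    unittest.main()
```

Tests like these keep the core rules honest, but they never touch the rendered UI, which is exactly the gap described above.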

Testing Game Applications with AI

With a lack of automated tool support for testing video games end to end, you’ll find that many gaming companies do not even
have smoke or sanity tests in place for their gaming applications. A smoke or sanity test is a very basic test that helps you to
determine whether a new version of the application is stable enough for further testing. For example, a video game smoke test
may launch the game app, navigate through the menus to start a new game session, and interact with the gameplay elements once
the game has loaded. Recall from “Limitations of Traditional Approaches” that the reason there is a lack of support is that most
functional UI automation tools require applications to have a DOM. However, due to their highly visual and graphical nature,
many parts of gaming applications lack such a model. Fortunately, AI-powered testing tools are able to use computer vision to
interact with the game menu icons and other widgets to make this type of automation possible. Furthermore, AI for game
application testing extends beyond smoke tests, as this technology is enabling gaming companies to define complete functional
testing scenarios for locally hosted games and those streamed from remote servers.
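
As a simplified stand-in for the computer-vision-driven interaction just described, the sketch below walks a smoke test through a game's menus by matching reference screenshots against the live screen; the image files, fixed waits, and use of plain template matching (rather than a trained model) are illustrative assumptions, not how any particular AI testing tool works:

```python
import time

import cv2
import numpy as np
import pyautogui

def find_on_screen(template_path, threshold=0.8):
    """Locate a UI element by matching a reference image against a screenshot."""
    screen = cv2.cvtColor(np.array(pyautogui.screenshot()), cv2.COLOR_RGB2BGR)
    template = cv2.imread(template_path)
    result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return None
    h, w = template.shape[:2]
    return (max_loc[0] + w // 2, max_loc[1] + h // 2)  # center of the match

def smoke_test():
    # Launch menu -> start a new session -> confirm the in-game HUD renders.
    for step in ["play_button.png", "new_game_button.png"]:
        pos = find_on_screen(step)
        assert pos is not None, f"Could not find {step} on screen"
        pyautogui.click(pos[0], pos[1])
        time.sleep(5)  # crude wait for the next screen to load
    assert find_on_screen("hud_health_bar.png") is not None, "Game did not load"
```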

Testing Game Stores with AI


Many video games now have an in-game store that allows players to purchase add-ons in the form of downloadable content.
Although you can view this type of testing as an extension of functional UI testing, it raises some unique testing challenges that
make it an apt illustrative example. Items in the game store’s UI tend to be animated, constantly changing position and orientation
on the screen. They may also appear on the screen in several different orders, since a player can select any given item at any moment. Figure 3-3 provides three screenshots of an axe from the game store UI of a popular game. In the menu, this
particular axe rotates around the y-axis. As you can see from the illustration, as the axe rotates, it becomes more difficult to
recognize that it is the same axe or even an axe at all. Now imagine that this axe glows, pulses, and sparkles periodically as it
rotates. When games render objects like this at runtime with varying visual effects, traditional image-seeking or pixel-scanning
automation techniques fail to identify them correctly.

Figure 3-3. An add-on weapon rotating in the game store menu of a popular title

On the other hand, AI-based solutions work well for classifying objects under these circumstances. You can train the AI using images captured from a video of the animated axe, and, just like a human, the machine is able to recognize the axe from almost any angle or in combination with different types of lighting and artistic effects. AI therefore makes it possible to identify dynamically rendered items in video games—a feat that foils conventional test automation approaches. This testing capability is particularly important for in-game stores since they drive direct revenue. However, being able to classify these game assets is just the first step. Next, we’ll learn how AI verifies game assets to make sure that gamers aren’t surprised by cosmetic bugs after purchasing items from the store.
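
As a sketch of how such a classifier might be used, the code below samples frames from a capture of the rotating axe and asks a model (assumed here to be a ResNet fine-tuned on labeled frames of each store asset) which item it sees; the file names, class count, and sampling rate are placeholders:

```python
import cv2
import torch
from torchvision import models, transforms

# Assume a classifier fine-tuned on frames captured from the store animations,
# with one output class per store asset (battle axe, pickaxe, and so on).
model = models.resnet18(num_classes=10)
model.load_state_dict(torch.load("store_asset_classifier.pt"))
model.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify_store_item(frame_bgr):
    """Predict which store asset is shown, whatever its rotation or lighting."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    batch = preprocess(rgb).unsqueeze(0)
    with torch.no_grad():
        return int(model(batch).argmax(dim=1))

# Sample every 10th frame of the rotating axe and confirm the classifier keeps
# recognizing it as it spins.
cap = cv2.VideoCapture("store_axe_capture.mp4")
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 10 == 0:
        print(frame_idx, classify_store_item(frame))
    frame_idx += 1
cap.release()
```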

Testing Gaming Assets with AI


Figure 3-4 illustrates one approach to automatically detecting cosmetic bugs in gaming assets with AI. The approach involves
training ML models to visually identify and classify each type of visual asset (e.g., character model, weapon, etc.) on the screen
and compare these ground truth images to visuals from subsequent software releases. These are the key steps of the approach as
they relate to Figure 3-4:

1. Visually isolate the baseline asset that represents the ground truth by removing the background. You can do this by
training an ML model yourself or using a third-party tool that does the work for you.
2. On the new version of the software, run a test that uses image-based UI element selection to navigate to the asset
in the game store and visually capture the test observation. You will have to run background removal on the test
observation since the bot will capture the raw image of the asset from the game.
3. Generate an image mask highlighting the differences between the baseline and test observation images. You can train an ML model to do this, but first you may have to read up on how to compare two images using OpenCV and Python.2 (A rough sketch of this comparison appears after Figure 3-4.)

Figure 3-4. Detecting visual differences in game assets during AI-driven test automation
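
Here is a minimal sketch of step 3, assuming both images have already had their backgrounds removed as in steps 1 and 2; a simple absolute-difference threshold stands in for the trained comparison model mentioned above:

```python
import cv2

def diff_mask(baseline_path, observed_path, threshold=30):
    """Highlight pixels that differ between the baseline and the test observation."""
    baseline = cv2.imread(baseline_path)
    observed = cv2.imread(observed_path)
    # Align sizes; both assets are assumed to be captured at the same scale.
    observed = cv2.resize(observed, (baseline.shape[1], baseline.shape[0]))
    diff = cv2.absdiff(baseline, observed)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    return mask

mask = diff_mask("axe_baseline.png", "axe_test_observation.png")
if cv2.countNonZero(mask) > 0:
    cv2.imwrite("axe_diff_mask.png", mask)
    print("Cosmetic differences detected; see axe_diff_mask.png")
```

In practice you would tolerate small rendering differences such as anti-aliasing before flagging a bug, which is where a trained comparison model earns its keep.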

Aside from visual game asset verification, you may also want to test the audio assets of your game. For audio, let’s look at a
different but equally powerful way to carry out the task of comparing gaming assets during testing. In this case, instead of trying
to detect differences, you can frame the audio verification problem as one where you are looking for similarities between a
baseline audio file and one observed during testing. Figure 3-5 illustrates one way to verify gaming sound effects. It shows three
spectrograms, which are graphical representations of the amplitude and frequency of a sound wave over time. Spectrogram A on
the left of Figure 3-5 is the baseline sound of an axe being swung, while Spectrogram B on the far right is the sound of the same
axe being swung in a subsequent build of the game. The approach to automatically testing these sound effects is to compute the
cross-correlation, also called a sliding dot product, represented by Spectrogram C in the middle of the figure. The general idea is that, after accounting for any differences in amplitude and timing, you keep one signal fixed and slide the other across it, measuring how strongly the two match at each offset; a high peak indicates the sounds are essentially the same.

Figure 3-5. Testing sound effects in video games by using cross-correlation to determine similarity
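
A minimal sketch of this comparison, working directly on the audio samples rather than the spectrograms shown in Figure 3-5 and assuming the two sound effects are stored as WAV files at the same sample rate:

```python
from scipy.io import wavfile
from scipy.signal import correlate

def audio_similarity(baseline_path, observed_path):
    """Score how alike two sound effects are using normalized cross-correlation."""
    rate_a, a = wavfile.read(baseline_path)
    rate_b, b = wavfile.read(observed_path)
    assert rate_a == rate_b, "Sound effects should share a sample rate"
    # Collapse to mono and normalize so amplitude differences don't dominate.
    a = a.mean(axis=1) if a.ndim > 1 else a.astype(float)
    b = b.mean(axis=1) if b.ndim > 1 else b.astype(float)
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    # Slide one signal across the other; the peak is the best alignment.
    return correlate(a, b, mode="full").max()

score = audio_similarity("axe_swing_baseline.wav", "axe_swing_new_build.wav")
print(f"Similarity: {score:.2f}")  # values near 1.0 indicate a close match
```

A score near 1.0 means the new build's sound effect lines up closely with the baseline; a low peak is worth a human listen.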

Now that I’ve walked you through two asset verification approaches, one for visual assets and the other for audio, you should have a
good understanding of how this technology can be applied to game testing. Although at times it feels like magic, AI and subfields like
computer vision are rooted in mathematics, computation, logic, and other disciplines including philosophy, psychology, biology, and
evolution.3 As such, there are several other ways to leverage AI to tackle these problems.

Gameplay Testing with AI


Machines have been playing games for as long as I can remember, from brute force approaches that exhaustively search the state
space of games like tic-tac-toe and chess to the more recent deep ML techniques capable of playing online, cooperative,
multiplayer games like Dota 2.4 Neural networks like OpenAI Five leverage reinforcement learning at a large scale to train bots
that play together as a team, and they are able to win back-to-back games against Dota world champions. Bringing together
everything you’ve learned so far about functional UI and game asset testing with AI and combining it with the ability for AI to
reach expert levels of gameplay via self-play and reinforcement learning yields a powerful approach to testing modern video games.

Figure 3-6 provides a snapshot of this approach in action. The idea is to describe test steps as high-level goals for an AI bot and then, through the process of reinforcement learning, leave the AI to play the game until it achieves those goals. A test for an
online shooting game may be to explore a location; collect weapons, ammo, and equipment upgrades along the way; and then
score a kill. While the bot is autonomously executing the gameplay steps from the test, it is also checking for cosmetic bugs and
audio glitches. If the bot is able to accomplish the tasks without discovering any bugs, then you know that those features are
working as intended for the given scenario—all without human intervention. Furthermore, you can give the bot goals that seek to
intentionally explore the boundaries of the gameplay environment, or you can reward it for actions such as jumping on walls and
ceilings, in hopes of breaking the game physics. As the application evolves and gameplay changes, your AI-based testing
approach autonomously relearns how to accomplish the same tasks in the new environment.

Figure 3-6. Gameplay testing using goal-based, reinforcement learning and asset verification
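
To make the goal-based idea concrete, here is a sketch of how the reward for such a bot might be shaped, written against the Gymnasium wrapper interface; the game environment, the info keys, and the reward weights are hypothetical stand-ins for whatever a real harness exposes:

```python
import gymnasium as gym

class GoalDrivenTestWrapper(gym.Wrapper):
    """Reward test goals (reach a location, collect gear, score a kill) and log bugs."""

    def __init__(self, env, goal_location):
        super().__init__(env)
        self.goal_location = goal_location
        self.bugs = []

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Shape the reward around the test goals rather than the game's own score.
        reward = 0.0
        if info.get("player_location") == self.goal_location:
            reward += 1.0
        reward += 0.1 * info.get("items_collected_this_step", 0)
        if info.get("kill_scored", False):
            reward += 5.0
        # While playing, record any cosmetic or audio issues flagged by the
        # asset-verification checks described earlier in this chapter.
        if info.get("asset_diff_detected"):
            self.bugs.append(info["asset_diff_detected"])
        return obs, reward, terminated, truncated, info
```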

Conclusion

AI is enabling the automated user experience testing of graphically intensive, multimedia applications, an area that conventional
tools still do not support well. Video streaming and gaming are examples of two industries where this technology is proving to be
a game changer, no pun intended. However, several additional opportunities exist for automating multimedia applications.
Computer-aided design (CAD) software used in the automotive, shipbuilding, aerospace, architecture, and prosthetics industries,
among others, is eagerly awaiting better automated testing solutions. I truly hope that, as with video streaming and gaming, AI
will soon be giving test automation in those areas the “eyes” and “ears” it needs to succeed.

1 “Image and Video Quality Assessment at LIVE,” Laboratory for Image & Video Engineering, The University of Texas at
Austin, accessed June 3, 2021.
2 Adrian Rosebrock, “Image Difference with OpenCV and Python,” PyImageSearch, June 19, 2017.
3 John A. Bullinaria, “IAI: The Roots, Goals, and Sub-Fields of AI,” University of Birmingham, 2005.
4 “OpenAI Five: 2016–2019,” OpenAI, 2019.
