Project3-Arc1 1
This research discusses the use of reinforcement learning and convolutional neural networks
(CNNs) to train agents (computer algorithms) to play video games. The hit mobile game 2048 is
the example used. This research describes how reinforcement learning is used to train the neural
network and how agents interact with their environment using non-cooperative Game Theory
principles. The article also explains the use of different neural network architectures as a factor
Arc 1
The entirety of this research is based on the implementations of neural networks. A neural
network is a computational model inspired by the structure and function of the human brain. It
consists of interconnected nodes or neurons that are organized into layers and use a set of
mathematical functions to process information. By learning from data, neural networks can be
trained to recognize patterns and make predictions, and have found applications in various fields,
including computer vision, natural language processing, and speech recognition. Reinforcement
learning is the intended method used to train the neural network. Reinforcement learning is a
method of teaching a computer program, known as an agent, to learn on its own by interacting
with its environment. It's analogous to instructing a robot to do something without explicitly
telling it what to do. The agent learns by attempting different tasks in its surroundings and
receives rewards or penalties according to its performance. The agent learns from its mistakes in
the same way that organisms do, and seeks to do better in the next attempt. The agent improves
at attaining its goal over time, similarly to how humans improve at a game the more they play it.
Reinforcement learning relies on the principles of Game Theory to function. According to Dr. Başar1,
Game Theory, in the context of Machine Learning, is an agent’s interaction with its non-
function3 to ultimately approach the most optimal solution for a given scenario. (Zhang, B. J.-F.,
2020). Combining Reinforcement Learning with Game Theory means agents use action-
generating algorithms coupled with the data they collect through observations. The environment
in turn responds to these actions with ‘rewards’ that the agents receive during the decision-making process.
Reinforcement Learning itself is modeled from behaviors in nature. The general principle was
modeled by Edward Thorndike4, whose experiment began with cats placed in boxes. While
exploring their environment, they would step on a lever by chance, which opened an escape
route (a positive reward in the context of machine learning). The cats would then begin to
associate the lever with a positive reward, and increase their speed and efficiency at escaping.
Today, deep neural networks are used as the “agents” in reinforcement learning. These models
are the ones processing observations, generating reward values, and ultimately finding the
1 Dr. Tamer Başar is a Professor at the University of Illinois at Urbana-Champaign whose research focuses
on topics including applications of control and game theory in economics and mean-field game theory.
(Başar, 2020).
2 Regression algorithms estimate a function from the input variables to continuous output variables.
3 Classification algorithms estimate a function from the input variables to discrete output variables.
4 A function that maps the error or “cost” of an agent's interactions with its environment.
Video games are a very popular tool in training reinforcement learning algorithms. This is due to
the fact that each action the agent takes will generate state data from the game, thus resulting in a
dynamic dataset able to train the agent until it resets the environment. A prime example of a
game like this is 2048, a single-player stochastic sliding puzzle game. 2048 is played on a 4 by 4
grid and begins with two randomly placed tiles, the initial state. Each cell of the grid is either
empty or holds a tile numbered with a power of two (2, 4, 8, 16, 32, 64, etc.). The player selects
a direction (up, down, left, or right) in which to slide the tiles. Two tiles of the same number slid
together combine into one tile with the next value in the 2^n sequence. As seen in Figure 1, there are multiple
methods of feature extraction. By creating tuples, graduate students at National Chiao Tung
University were able to submit a series of neural networks with win rates ranging from 85% to
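The sliding and merging rule described above can be sketched for a single row. This is a minimal illustration, not the implementation used in this research: the function name `slide_row_left`, the use of 0 for an empty cell, and the rule that each tile merges at most once per move (taken from the standard 2048 game) are assumptions.

```python
def slide_row_left(row):
    """Slide one 2048 row to the left, merging equal neighbours once per move.

    Assumption: 0 represents an empty cell.
    """
    tiles = [v for v in row if v != 0]  # drop empty cells so tiles become adjacent
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)  # 2^n + 2^n = 2^(n+1)
            i += 2                       # both tiles are consumed by the merge
        else:
            merged.append(tiles[i])
            i += 1
    # pad with empty cells so the row keeps its original length
    return merged + [0] * (len(row) - len(merged))


print(slide_row_left([2, 2, 4, 0]))  # [4, 4, 0, 0]
print(slide_row_left([2, 2, 2, 2]))  # [4, 4, 0, 0]
```

Moves in the other three directions can be expressed by reversing or transposing the grid before applying the same row operation.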
The types of neural networks used in video games with a visual state, 2048 included, are called
Convolutional Neural Networks (CNNs). The network architecture of CNNs was initially
inspired by connections of neurons and synapses in the brain. The first part of the CNN is the
convolution layer. This serves as a feature extraction5 layer, using kernels to identify features in
the images such as edges. After every convolution operation, the output data is transformed using
an activation function, typically ReLU6, which is defined as y = max(0, x), that is, y = 0 for x < 0 and y = x for x ≥ 0. The next type
of layer is a pooling layer, which downsamples the output of previous layers and so reduces the
number of parameters in later layers. There are three common types of pooling methods: max, min, and average pooling. Max
pooling, in most cases, has better performance than the other two methods. Using this method,
pooling is done by finding the maximum values as shown in Figure 2. Lastly, a fully connected
layer is implemented which is connected directly to the output layers (Yamashita, 613).
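The convolution, ReLU, and max pooling steps described above can be sketched with NumPy. This is an illustrative sketch only: the function names, the 5 by 5 test image, the 2 by 2 edge kernel, and the 2 by 2 pooling window are assumptions chosen for the example, and the "convolution" is the unflipped sliding-window product commonly used in deep learning.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def relu(x):
    """y = max(0, x), applied element-wise."""
    return np.maximum(0, x)


def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2  # trim odd edges
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


# Hypothetical input: dark left half, bright right half (a vertical edge).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hypothetical kernel that responds to dark-to-bright transitions.
kernel = np.array([[-1, 1], [-1, 1]], dtype=float)

feature_map = relu(conv2d_valid(image, kernel))  # shape (4, 4)
pooled = max_pool_2x2(feature_map)               # shape (2, 2)
print(pooled)
```

The pooled output is a quarter of the size of the feature map while still marking where the edge was detected.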
For every neural network, data is used to minimize the cost function or maximize the utility function.
For a traditional CNN, a dataset of images, each with one or more labels, will be
used. The goal of the CNN is to predict the label(s) based on the image. Image preprocessing
methods, such as skewing7, cropping8, and greyscaling9 may also be used to train the model as
the data retains the same dimensionality. In reinforcement learning, there is not a static dataset.
5 A function that assigns “utility” values of the agents actions within its environment. A greater outputted
utility indicates further progress towards the agent completing its goal.
6 Rectified Linear Unit (ReLU) is an activation function that is a piecewise linear function.
7 Skewing refers to slanting an image, typically changing its viewpoint through mathematical operations.
8 Cropping is the removal of sections in an image.
9 Greyscaling is the process of converting an image from a color space to a single-channel matrix
consisting of shades of grey.
Instead, a new image is generated each time the agent takes an action. After the action is
performed, a new state representation of 2048 is returned to the agent, for it to again draw
To integrate the reinforcement learning field and the CNN, a type of model is used called a Deep
Q-Network (DQN). A DQN model has the same architecture as the CNN in that it contains
convolutional, pooling, and fully connected layers. The output layer of the DQN, however, has
size n, where n is the total number of possible actions that the agent can take. The output layer
returns Q-values, which are scalars, as the DQN is a regression algorithm10 rather than a
classification algorithm11. The action with the highest normalized scalar value becomes the
Python13. Using this library, a user can load in environments, usually video games that are
simulated through Python, and allow agent interaction. The environment returns pixel-state14 or
RAM data15, and the agent then uses this data to return an action which is passed back into the
OpenAI Gym. From this action, a new information state is returned, and the process repeats
(McElwee, 3).
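The observe, act, and repeat cycle described above can be sketched as follows. To keep the example self-contained, it does not call OpenAI Gym itself: `ToyEnv` is a hypothetical stand-in whose `reset` and `step` methods only imitate the general shape of a Gym-style environment, `q_values` is a placeholder for a trained DQN, and the reward rule is arbitrary.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (not the real OpenAI Gym API)."""

    n_actions = 4  # up, down, left, right, as in 2048

    def reset(self):
        self.steps = 0
        return [0.0] * 16  # flattened 4x4 observation

    def step(self, action):
        self.steps += 1
        observation = [random.random() for _ in range(16)]
        reward = 1.0 if action == 0 else 0.0  # arbitrary reward rule for the demo
        done = self.steps >= 10               # end the episode after 10 steps
        return observation, reward, done


def q_values(observation, n_actions):
    """Placeholder for a trained DQN: returns one scalar per possible action."""
    return [(sum(observation) * (a + 1)) % 1.0 for a in range(n_actions)]


def greedy_action(q):
    """Pick the index of the action with the highest Q-value."""
    return max(range(len(q)), key=lambda a: q[a])


def run_episode(env):
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        q = q_values(observation, env.n_actions)       # one value per action
        action = greedy_action(q)                      # choose the best-valued action
        observation, reward, done = env.step(action)   # environment returns a new state
        total_reward += reward
    return total_reward


random.seed(0)
env = ToyEnv()
total = run_episode(env)
print(total)
```

In practice an agent would also explore by sometimes choosing a random action, and the Q-values would come from the network rather than a placeholder function.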
10 Feature extraction reduces the dimensions of the data by creating new features from previously
existing information, patterns, features, etc.
11 Edward Thorndike (1874-1949) was a psychologist who spent most of his time researching and
developing reinforcement theory and behavior analysis at Columbia University.
12 Pixel State data is the pixel values of the current frame in the environment (usually a video game).
13 RAM data will output the RAM values of the Atari machine of the current step in the environment.
14 Python is a high-level programming language that will be used to implement reinforcement learning
15 OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
The question this research aims to answer is, “Will a naive approach to a Deep Neural Network
achieve a higher in-game score than a naive approach to a Convolutional Neural Network after
multiple trials in the video game 2048?” A naive approach is defined as a simple method of
implementing a neural network to interact with the raw data of the environment. The networks
will receive no prior information, algorithm, indication, or other forms of help established to
make assessing and learning the workings of the 2048 environment easier. This is implemented
to make sure that the results of the Neural Network are due solely to the network, and not any
other functions or data-preprocessing methods. The hypothesis that this research aims to test is,
“The Convolutional Neural Network will outperform the traditional Neural Network, as it has
historically performed better on matrix observations while the Neural Network is intended for
single-dimensional observations.”
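The difference between the two kinds of observation named in the hypothesis can be illustrated with the same raw 2048 board presented in two shapes. This is a sketch under assumptions: the board values are made up, 0 stands for an empty cell, and the channel-last layout for the CNN input is one common convention rather than the one confirmed for this research.

```python
import numpy as np

# Hypothetical raw 2048 board; 0 marks an empty cell.
board = np.array([
    [2, 0, 0, 2],
    [0, 4, 0, 0],
    [0, 0, 8, 0],
    [0, 0, 0, 16],
])

# Traditional (fully connected) network: a single-dimensional vector of 16 values.
dense_input = board.reshape(-1)

# Convolutional network: the 4x4 matrix is kept, with one channel added.
cnn_input = board.reshape(4, 4, 1)

print(dense_input.shape)  # (16,)
print(cnn_input.shape)    # (4, 4, 1)
```

Both inputs contain exactly the same values; only the matrix form preserves which tiles are neighbours in the grid.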
Works Cited
Başar, Tamer. “Tamer Başar: Introduction.” Tamer Basar, University of Illinois Urbana-
Champaign, 20 Feb. 2014,
Guei, Hung, et al. “Using 2048-like Games as a Pedagogical Tool for Reinforcement Learning.”
ICGA Journal, vol. 40, no. 3, 1 May 2019, pp. 281–293.,
Levine, Zachariah. “Learning 2048 with Deep Reinforcement Learning.” David R. Cheriton
School of Computer Science, University of Waterloo, 3 Mar. 2017,
Knight, Will. “Reinforcement Learning.” MIT Technology Review, MIT Technology Review, 17
Sept. 2021,
McElwee, Steven, et al. “Deep Learning for Prioritizing and Responding to Intrusion Detection
Alerts.” MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM), 11 Dec.
Yang, Yanwei, et al. “Application of Scikit and Keras Libraries for the Classification of Iron Ore
Data Acquired by Laser-Induced Breakdown Spectroscopy (LIBS).” Sensors, vol. 20, no. 5, Mar.
2020, p. 1393. Crossref,