
QASE: AN INTEGRATED API FOR IMITATION AND GENERAL AI RESEARCH IN COMMERCIAL COMPUTER GAMES

Bernard Gorman, Dublin City University, Glasnevin, Dublin 9, Rep. of Ireland, +353 1 4902714, bernard.gorman@computing.dcu.ie
Martin Fredriksson, Blekinge Institute of Technology, Ronneby, Sweden, +46 457 385825, martin.fredriksson@bth.se
Mark Humphrys, Dublin City University, Glasnevin, Dublin 9, Rep. of Ireland, +353 1 700 8059, mark.humphrys@computing.dcu.ie

KEYWORDS

Imitation, machine learning, artificial intelligence, API, game bots, intelligent agents, education.

ABSTRACT

Computer games have belatedly come to the fore as a serious platform for AI research. Through our own experiments in the fields of imitation learning and intelligent agents, it became clear that the lack of a unified, powerful yet intuitive API was a serious impediment to the adoption of commercial games in both research and education. Parallel to our own specialised work, we therefore decided to develop a general-purpose library for the creation of game agents, in the hope that the availability of such software would help stimulate further interest in the field. Though geared towards machine learning, the API would be flexible enough to facilitate multiple forms of artificial intelligence, making it suitable for application in research and in undergraduate courses centring upon traditional AI and agent-based systems.

In this paper, we present the result of our efforts: the Quake 2 Agent Simulation Environment (QASE) API. We first describe the theme of our work, the reasons for choosing Quake 2 as our testbed, and the necessity for an API of this nature. We then outline its most important features, before presenting an experiment from our own research to demonstrate QASE's practical capabilities.

INTRODUCTION

In recent years, commercial computer games have gained increasing recognition as an ideal platform for research in various fields of artificial intelligence (Laird & van Lent 2000; Nareyek 2004). The vast majority, however, still utilize AI techniques that were developed several decades ago, and which often produce mechanical, repetitive and unsatisfying game agents. Given that games provide a convenient means of recording the complex, fluent behaviors of human players, some researchers (Sklar et al 1999; Bauckhage et al 2003; Thurau et al 2004) have speculated that approaches based on the analysis and imitation of human demonstrations may produce more challenging and believable artificial agents than can be realised using traditional techniques; indeed, imitation learning is already employed quite extensively in the robotics community (Atkeson & Schaal 1997; Schaal 1999; Jenkins & Mataric 2000). Building upon this premise, the primary focus of our work lies in investigating imitation learning in games which involve cognitive agents.

In the initial stages of our research, however, it became clear that the available testbeds and resources were often scattered, frequently incomplete, and consistently ad hoc. Existing APIs were unintuitive, unreliable and lacking in functionality. Network protocol and file format specifications were generally unofficial, more often than not the result of reverse-engineering by adventurous fans (Girlich 2000). Documentation was sketchy, with even the most rudimentary information spread across several disjoint sources. Above all, it was evident that the absence of a unified, low-level yet easy-to-use development platform and experimental testbed was a major impediment to the adoption of commercial games in both academic research and education.

As a result, we decided to adopt a two-track approach. We would develop approaches to imitation learning in games, while simultaneously building a comprehensive programming interface designed to provide all the functionality necessary for others to engage in this work. This interface should be powerful enough to facilitate high-end research, while at the same time being suitable for use in undergraduate courses geared towards classic AI and agent-based systems.

Choosing a Testbed - Quake 2

Our first task was to decide which game to use as a testbed. We opted to investigate the first-person shooter genre, in which players control a single character exploring a three-dimensional environment littered with weapons, bonus items, traps and pitfalls, with the objective of defeating as many opponents as possible within a predetermined time limit. This particular genre was chosen in preference to others because it provides a comparatively direct mapping of human decisions onto agent actions; this is in contrast to many other game types, where the agent's behaviours are determined in large part by factors other than the player's decision-making process. In sports simulations, for instance, only a single character is usually under the control of the human player - the interactions of his teammates are managed from one timestep to the next by the computer. While other genres do offer many interesting challenges for AI research, as outlined by both (Laird 2001) and (Fairclough et al 2001), the attraction of first-person shooters - to researchers and gamers alike - lies in the minimal degree of abstraction they impose between the human player and his/her virtual avatar. The same qualities make them ideal for use in undergraduate courses; the student creates the AI for a single agent, which can then be deployed in competition against those written by others.

With this in mind, we chose ID Software's Quake 2 as our test environment - it was prominent in the literature, existing resources were more substantial than for other games, and thanks to Laird it had become the de facto standard for research of this nature. Figure 1 shows a typical Quake 2 environment, with various features labelled.

THE QASE API


The Quake 2 Agent Simulation Environment was created to meet the requirements identified earlier; namely, it is a fully-featured, integrated API, designed to be as intuitive, modular and transparent as possible. It is Java-based, ensuring an easily extensible object-oriented architecture and allowing it to be deployed on many different hardware platforms and operating systems. It amalgamates and improves upon the functionalities of several existing applications, removing the need to rely on ad-hoc software combinations or to comb through a multitude of different documentations; QASE consolidates all relevant information into a single source. It is geared towards machine and imitation learning, but is equally appropriate for use with more traditional forms of agent-based AI. Put simply, QASE is intended to provide all the functionality the researcher or student will require in their experiments with cognitive agents.

In the following sections we will outline the major components of the QASE architecture, highlighting its potential for application in research and education.

Figure 1 - Typical Quake 2 environment

Network Layer

Quake 2's multi-player mode is a simple client-server model. One player starts a server and other combatants connect to it, entering whatever environment (known as a map) the instigating player has selected. Every hundred milliseconds, the server transmits an update frame to all connected clients, containing information about the game world and the status of each entity; each client merges the update into its existing gamestate record, and then responds by sending its desired movement, aiming and action back to the server. Thus, in order to realize artificial agents (also known as bots), a means of handling the game's network traffic is required. QASE accomplishes this via its Proxy class, which encapsulates an implementation of the Quake 2 client-side network protocol. It is responsible for establishing game sessions with the server, receiving inbound data and converting it into a human-readable format, and transmitting the agent's subsequent actions back to the server, as shown in Figure 2 below. All this is transparent to the agent itself; at each interval, the bot is simply notified that an update has occurred, and receives a World object containing a hierarchy of component objects representing the current gamestate.

Figure 2 - The QASE API and its role in realising Quake agents

An important point to note is that, because the network layer is separated from the higher-level classes in the QASE architecture, it is highly portable. Adapting the QASE API to games with similar network protocols, such as Quake 3 and its derivatives, therefore becomes a relatively straightforward exercise; by extending the existing classes and rewriting the data-handling routines, they could conceivably be adapted to any UDP-based network game. Thus, QASE's network structures can be seen as providing a template for the development of artificial game clients in general.

Gamestate Augmentation

Rather than simply providing a bare-bones implementation of the client-side protocol, QASE also performs several behind-the-scenes operations upon receipt of each update, designed to present an augmented view of the gamestate to the agent. In other words, QASE transparently analyses the information it receives, makes deductions based on what it finds, and exposes the results to the agent. As such, it may be seen as representing a virtual extension of the standard Quake 2 network protocol.

For instance, the standard protocol has no explicit item pickup notification; when the agent collects an object, the server takes note of it but does not send a confirmation message to the client, since under normal circumstances the human player will be able to identify the item visually. QASE compensates for this by detecting the sound of an item pickup, examining which entities have just become inactive, finding the closest such entity to the player, and thereby deducing the entity number, type and inventory index of the newly-acquired item. Building on this, QASE records a full list of which items the player has collected and when they are due to respawn (reappear), automatically flagging the agent whenever such an event occurs.
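The deduction just described amounts to a nearest-neighbour match over newly-inactive entities. A minimal sketch of the idea follows; the Entity record and its fields are illustrative stand-ins, not QASE's actual gamestate classes.

```java
import java.util.List;

// Illustrative reconstruction of the pickup deduction; all names are
// stand-ins for QASE's real entity classes.
public class PickupInference {

    record Entity(int number, boolean wasActive, boolean isActive,
                  double x, double y, double z) {}

    /** After a pickup sound is heard: the newly-inactive entity nearest the
        player is deduced to be the item that was just collected. */
    static Entity inferPickup(List<Entity> entities, double px, double py, double pz) {
        Entity best = null;
        double bestDist = Double.MAX_VALUE;
        for (Entity e : entities) {
            if (e.wasActive() && !e.isActive()) {       // just became inactive
                double dx = e.x() - px, dy = e.y() - py, dz = e.z() - pz;
                double d = dx * dx + dy * dy + dz * dz; // squared distance suffices
                if (d < bestDist) { bestDist = d; best = e; }
            }
        }
        return best; // null if no candidate: the sound was something else
    }
}
```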

Similarly, recordings of Quake 2 matches (see below) do not encode the full inventory of the player at each timestep - that is, the list of how many of which items the player is currently carrying. For research models which require knowledge of the inventory, such as that outlined in the QASE and Imitation Learning section below, this is a major drawback. QASE circumvents the problem by monitoring item pickups and weapon discharges, 'manually' building up an inventory representation from each frame to the next. This can also be used to track the agent's inventory in online game sessions, removing the need to explicitly request a full inventory listing from the server on each update.

Bot Hierarchy

In order to facilitate the rapid creation of different types of game agents, QASE implements a structured hierarchy of bot classes, allowing users to develop agents from a number of levels of abstraction. These range from a simple interface class, to full-fledged bots incorporating an exhaustive range of user-accessible functions. The bot hierarchy comprises three major levels; these are summarised below.

Bot

A template which specifies the interface to which all bots must conform, but does not provide any functionality; the programmer is entirely responsible for the implementation of the agent, and may do so in any way (s)he chooses.

BasicBot

An abstract bot which provides most of the functionality required by Quake 2 agents, such as the ability to determine whether the bot has died, to respawn (re-enter the game) after the agent has been defeated, to create an agent given minimal profile information, to set the agent's movement direction, speed and aim and send these to the server, to obtain sensory information about the virtual world, and to record itself to a demo file. All that is required of the programmer is to extend the class, write the AI routine in the predefined runAI method, and to supply a means of handling the server traffic according to whatever interaction paradigm he wishes to use. The third level of the bot hierarchy provides ready-to-use implementations of two such paradigms.

ObserverBot and PollingBot

The highest level of the Bot hierarchy consists of two classes, ObserverBot and PollingBot, which represent fully-realised agents. Each of these provides a means of detecting changes to the gamestate (implemented as indicated by their names), as well as a single point of insertion - the programmer needs only to supply the AI routine in the runAI method defined by the Bot interface. Each has its own advantages; the ObserverBot allows several different objects to be attached to a single Proxy, whereas the multithreaded PollingBot offers slightly more efficient performance.

Beyond this, several convenience classes are available, which provide extended bot implementations tailored to specific purposes. The NoClipBots allow the user to 'noclip' the agent (i.e. move it through otherwise solid walls) to any arbitrary point in the environment before starting the simulation; the MatLabBot branches will be explained later. The full hierarchy is shown in Figure 3 below.
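To give a flavour of this third level, the sketch below extends PollingBot with a trivial runAI routine. Only runAI itself is prescribed by the Bot interface; the constructor profile, connect call and movement helper shown here are assumptions for illustration rather than the API's exact signatures.

```java
// Minimal sketch of a QASE agent; exact signatures in the real API may differ.
public class WanderBot extends PollingBot {

    public WanderBot() {
        super("WanderBot", "female/athena"); // assumed (name, skin) profile
    }

    // The single point of insertion: invoked on every server update.
    public void runAI(World world) {
        Player self = world.getPlayer();
        if (self.isDead())
            return; // BasicBot's inherited facilities handle respawning

        // Trivial behaviour: keep walking in the direction we are facing.
        setBotMovement(self.getOrientation(), self.getOrientation(),
                       WALK_SPEED); // assumed helper and speed constant
    }

    public static void main(String[] args) {
        new WanderBot().connect("127.0.0.1", 27910); // default Quake 2 port
    }
}
```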
The DM2 Parser and Recorder

Quake 2's inbuilt client, used by human players to connect to the game server, facilitates the recording of matches from the perspective of each individual player. These demo or DM2 files contain an edited copy of the network packet stream received by the client during the game session, capturing the player's actions and the state of all entities at each discrete time step. For the purposes of imitation learning, then, a means of parsing these files and extracting the gameplay samples is needed. QASE's DM2Parser fulfils this requirement.

The DM2Parser treats the demo file as a virtual server, "connecting" to it and reading blocks of data in exactly the same manner as it receives network packets during an online game session. A copy of the gamestate is returned for each recorded frame, and the programmer may query it to retrieve whatever information (s)he requires.
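A hypothetical usage sketch follows, assuming the DM2Parser exposes its gamestates through an open/next-frame idiom as described above; the exact method names in the released API may differ.

```java
// Sketch: replaying a recorded demo frame by frame. The method names are
// paraphrases of the description above, not guaranteed signatures.
public class DemoReader {
    public static void main(String[] args) {
        DM2Parser parser = new DM2Parser();
        parser.open("demos/match1.dm2");      // "connect" to the virtual server

        World gamestate;
        while ((gamestate = parser.getNextWorld()) != null) { // one per frame
            Player p = gamestate.getPlayer();
            System.out.println("pos = " + p.getPosition()
                             + ", health = " + p.getHealth());
        }
        parser.close();
    }
}
```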

Figure 3 - The complete QASE Bot Hierarchy. From top to bottom: Bot (interface); BasicBot (abstract); ObserverBot and PollingBot (abstract); NoClipBot, MatLabObserverBot and MatLabPollingBot (abstract); MatLabNoClipBot (abstract), MatLabGeneralObserverBot and MatLabGeneralPollingBot (concrete final); MatLabNoClipGeneralBot (concrete final).
For examples of the type of data that can be obtained and analysed, see the sections MatLab Integration and QASE and Imitation Learning below.

Furthermore, QASE incorporates a DM2Recorder, allowing the agent to automatically record a demo of itself during play; this actually improves upon Quake 2's standard recording facilities, by allowing demos spanning multiple maps to be recorded in playable format. The incoming network stream is sampled, edited as necessary, and saved to file when the agent disconnects from the server or as an intermediate step whenever the map is changed.

Environment Sensing

The network packets received by game clients from the Quake 2 server do not encode any information about the actual environment in which the agent finds itself, beyond its current state and those of the various game entities present. This information is contained in Binary Space Partition (BSP) files stored locally on each client machine; thus, in order to provide the bot with more detailed sensory information (such as determining its proximity to an obstacle, or whether an enemy is visible), a means of locating, parsing and querying these map files is required. QASE's BSPParser and PAKParser fulfil this need.

The BSP file corresponding to the active map in the current game session may be stored in the default game directory, a custom game directory, or in any of Quake 2's PAK archives; its filename may or may not match the name of the map, which is the only information possessed by the client. If the user sets an environment variable pointing to the location of the base Quake 2 folder, QASE can automatically find the relevant BSP by searching each location in order of likelihood. This is done transparently from the agent's perspective; as soon as any environment-sensing method is invoked, the map is silently located, loaded and queried.

Once loaded, the BSPParser can be used to sweep a line, box or sphere in any arbitrary direction through the game world, starting from the agent's current location; the distance and/or position at which the first collision with the environment's geometry occurs is returned. This allows the agent to "perceive" the world around it on a pseudo-visual level - line traces can be used to determine whether entities are visible from the agent's perspective, sphere traces can be used to check whether projectiles will reach a certain point if fired, and box traces can be used to determine whether the agent's in-game model will fit through an opening. Figure 4 shows the operation of each different trace type.

Figure 4 - BSP traces with line, sphere and box. Collision occurs at different points.
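The sketch below shows how such traces might be used for simple sensory queries; the trace helpers are paraphrases of the stated functionality (sweep a shape, get back the first collision), assumed for illustration rather than taken from the released API.

```java
// Hypothetical pseudo-visual sensing via BSP traces; method names assumed.
public class Senses {

    /** Line trace: the target is visible if the nearest world geometry along
        the line of sight lies at least as far away as the target itself. */
    static boolean isVisible(BSPParser bsp, Vector3f eye, Vector3f target) {
        float hit = bsp.traceLine(eye, target);   // assumed: distance to first hit
        return hit >= eye.distance(target);
    }

    /** Box trace: will the agent's bounding box fit through to the target? */
    static boolean canFitThrough(BSPParser bsp, Vector3f from, Vector3f to,
                                 float boxRadius) {
        float hit = bsp.traceBox(from, to, boxRadius); // assumed helper
        return hit >= from.distance(to);
    }
}
```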
Inbuilt Cognitive & Other Facilities

For education purposes, QASE incorporates implementations of both a neural network and a genetic algorithm generator. These are designed to be used in tandem - that is, the genetic algorithms gradually cause the neural network's weights to evolve towards a given fitness function. A KMeans calculator class is also included; aside from serving as an illustration of clustering techniques, it is also used in QASE's waypoint map generator (see below). These features are included primarily to allow students to experiment with some AI constructs commonly found in undergraduate curricula - for more demanding research applications, QASE allows MatLab to be used as a back-end.

One of QASE's most useful features, particularly from an educational point of view, is the aforementioned waypoint map generator. Drawing on concepts developed in the course of our work in imitation learning (see QASE and Imitation Learning), this requires the user to supply a prerecorded DM2 file; it will then automatically find the set of all positions occupied by the player during the game session, cluster them to produce a smaller number of indicative waypoints, and draw edges between these waypoints based on the observed movement of the demonstrator. The items collected by the player are also recorded, and Floyd's algorithm (Floyd 1962) is applied to find the matrix of distances between each pair of points. The map returned to the user at the end of the process can thus be queried to find the shortest path from the agent's current position to any needed item, to the nearest opponent, or to any random point in the level. Rather than manually building a waypoint map from scratch, then, all the student needs to do in order to create a full navigation system for their agent is to record themselves moving around the environment as necessary, collect whatever items their bot will require, and present the resulting demo file to QASE.
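The all-pairs distance step can be sketched directly; this is the standard Floyd's algorithm over the waypoint graph, shown here independently of QASE's own implementation.

```java
// Floyd's algorithm (Floyd 1962) over the generated waypoint graph.
// dist[i][j] starts as the direct edge length between adjacent waypoints
// (Double.POSITIVE_INFINITY where no edge exists) and ends as the
// shortest-path distance between every pair; next[i][j] records the first
// hop, so full paths can be read back after the run.
static void floyd(double[][] dist, int[][] next) {
    int n = dist.length;
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j]) {
                    dist[i][j] = dist[i][k] + dist[k][j];
                    next[i][j] = next[i][k];   // route via k: copy k's first hop
                }
}
```

With next initialised so that next[i][j] = j wherever a direct edge exists, the shortest path from the agent's current waypoint to any goal is recovered by repeatedly following next.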

MatLab Integration

For the purposes of our work in imitation learning, we need a way to not only obtain, but also statistically analyse the observed in-game actions of human players. Rather than hand-coding the required structures from scratch, we opted instead to integrate the API with the Mathworks™ MatLab® programming environment. Given that it provides a rich set of built-in toolboxes for neural computation, clustering and other classification techniques and is already widely used in research, MatLab seemed an ideal choice to act as an optional back-end for QASE agents.

Bots can be instantiated and controlled via MatLab in one of two ways. For simple AI routines, one of the standalone MatLabGeneralBots shown in Figure 3 is sufficient. A MatLab function is written which creates an instance of the agent, connects it to the server, and accesses the gamestate at each update, all entirely within the MatLab environment. The advantage of this approach is that it is intuitive and very straightforward; a template of the MatLab script is provided with the QASE API. In cases where a large amount of gamestate and data processing must be carried out on each frame, however, handling it exclusively through MatLab can prove somewhat inefficient.

For this reason, we developed an alternative paradigm designed to offer greater efficiency. As outlined in the Bot Hierarchy section above, QASE agents are usually created by extending either the ObserverBot or PollingBot classes, and overloading the runAI method in order to add the required behaviour. In other words, the agent's AI routines are atomic, and encapsulated entirely within the derived class. Thus, in order to facilitate MatLab, a new branch of agents - the MatLabBots - was created; each of these possesses a three-step AI routine, as follows:

1. On each server update, QASE first pre-processes the data required for the task at hand; it then flags MatLab to take over control of the AI cycle.
2. The MatLab function obtains the agent's input data, processes it using its own internal structures, passes the results back to the agent, and signals that the agent should reassume control.
3. This done, the bot applies MatLab's output in a postprocessing step.
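From the Java side, the paradigm might be sketched as below; the parameter passing shown here (plain Object arrays) is an illustrative assumption about how the MatLabBots exchange data with the MatLab script, not the API's exact convention.

```java
// Sketch of the three-step paradigm from the agent's side; names assumed.
public class UtilityAgent extends MatLabObserverBot {

    // Step 1: pre-process in Java, exposing only what MatLab needs.
    public void preMatLab(World world, Object[] matLabParams) {
        matLabParams[0] = world.getPlayer().getPosition();
        matLabParams[1] = world.getPlayer().getInventory();
        // QASE now flags MatLab, which performs step 2 in its own script.
    }

    // Step 3: apply MatLab's results once control returns to Java.
    public void postMatLab(Object[] matLabResults) {
        Vector3f next = (Vector3f) matLabResults[0];
        Vector3f dir = next.subtract(getPosition()); // steer towards the node
        setBotMovement(dir, dir, WALK_SPEED);        // assumed helper + constant
    }
}
```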

This framework is already built into QASE's MatLabBots; the programmer need only extend MatLabObserver/Polling/NoClipBot to define the handling of data in the preprocessing and postprocessing steps, and change the accompanying MatLab script as necessary. By separating the agent's body (QASE) from its brain (MatLab) in this manner, we ensure that both are modular and reusable, and that cross-environment communications are minimised. The preprocessing step filters the gamestate, presenting only the minimal required information to MatLab; QASE thus enables both MatLab and Java to process as much data as possible in their respective native environments. This has proven extremely effective, both in terms of computational efficiency and ease of development.

Figure 5 - MatLab/QASE integration. MatLab acts as a back-end in the AI cycle; the agent's body and brain are separated.

Aside from creating game agents, MatLab can also use the various supporting functions of the QASE API. From our perspective, one of the most important of these is the ability to read and process demonstrations of gameplay using the DM2Parser. Figure 8 shows an example of this; see the section QASE and Imitation Learning for details.

Of course, the fact that we integrated QASE with MatLab specifically to facilitate our work in imitation learning does not diminish its potential for use in other areas; as stated earlier, QASE is designed for broad AI research.

QASE AND IMITATION LEARNING

In this section, we outline an experiment conducted in the course of our work. While it by no means demonstrates the full extent of QASE's faculties, this example does provide a good indication of its potential in the field of research.

One of the first questions which arises when considering the problem of imitation learning is, quite simply, "what behaviours does the demonstration encode?" To this end, (Thurau et al 2004a) propose a model of in-game behaviour based closely on Hollnagel's COCOM (Hollnagel 1993), as shown in Figure 6 below.

Figure 6 - Thurau's adaptation of Hollnagel's COCOM
Strategic behaviours refer to actions the player takes with long-term goals in mind; these include maximising the number of weapons or items he possesses, controlling certain areas of the map, and so forth. Tactical behaviours are mostly concerned with localised tasks such as evading or engaging opponents. Reactive behaviours involve little or no planning; the player simply reacts to stimuli in his immediate surroundings. Motion modelling refers to the imitation of the player's movement; in theory, this should produce humanlike motion along the bot's path, and should also prevent the agent from performing actions which are impossible for the human player's mouse-and-keyboard interface (instantaneous 180° turning, perfect aim, etc).

Goal-Oriented Strategic Behaviour

The following is drawn largely from our paper "Towards Integrated Imitation of Strategic Planning and Motion Modelling in Interactive Computer Games" (Gorman & Humphrys 2005).

In order to learn long-term strategic behaviours from human demonstration, we developed a model designed to emulate the notion of program level imitation discussed in (Byrne and Russon 1998); in other words, to identify the demonstrator's intent, rather than simply reproducing his precise actions. (Thurau et al 2004a) present an approach to such behaviours based on artificial potential fields; here we consider the application of reinforcement learning and fuzzy clustering techniques.

Topology Learning

As mentioned earlier, in the context of Quake, strategic planning is mostly concerned with the efficient collection and monopolisation of items and the control of certain important areas of the map. With this in mind, we first read the set of all player locations \vec{l} = (x, y, z) from the DM2 recording into MatLab via QASE's DM2Parser, and the points are clustered to produce a reduced set of positions, called nodes. We initially employed the Neural Gas algorithm in this step, since it has been demonstrated to perform well in topology-learning tasks (Martinetz & Schulten 1991); however, we later developed a custom modification of Elkan's fast k-means (Elkan 2003) designed to treat the positions at which items were collected as immovable "anchor" centroids, thereby deriving a goal-oriented clustering of the dataset. By examining the sequence of player positions, we also construct an n x n matrix of edges E, where n is the number of clusters, and E_{ij} = 1 if the player was observed to move from node i to node j and 0 otherwise.
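The edge matrix itself is straightforward to derive once each recorded position has been assigned to its nearest cluster; a direct sketch:

```java
// Building the edge matrix E from the demonstrator's observed node sequence:
// E[i][j] = 1 exactly when the player was seen moving from node i to node j.
static int[][] buildEdgeMatrix(int[] nodeSequence, int numClusters) {
    int[][] e = new int[numClusters][numClusters];
    for (int t = 1; t < nodeSequence.length; t++) {
        int from = nodeSequence[t - 1];
        int to   = nodeSequence[t];
        if (from != to)        // frames where the player stays put add nothing
            e[from][to] = 1;
    }
    return e;
}
```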
Deriving Movement Paths

Because the environment described above may be seen as a Markov Decision Process, with the nodes corresponding to states and the edges to transitions, we chose to investigate approaches to goal-oriented movement based on concepts from reinforcement learning, in particular the value iteration algorithm.

Figure 7 - An illustration of program-level imitation; items are represented as green squares. The player (blue) descends and re-ascends a staircase, with no objective benefit. The agent (red) ignores this non-goal-oriented movement, passing the stairs and heading directly towards the final item pickup.

To do so, we first read the player's inventory from the demo at each timestep, again using QASE's DM2Parser and the inventory-tracking system described earlier. In our experiments, we construct an inventory state vector of 18 elements, specifying the player's health and armour values together with the weapons he has collected and the amount of ammo he has for each. The set of unique state vectors is then obtained; these state prototypes represent the varying situations faced by the player during the game session.

We can now construct a set of paths which the player followed while in each inventory state. These paths consist of a series of transitions between clusters:

t_i = [c_{i,1}, c_{i,2}, \ldots, c_{i,k}]

where t_i is a transition sequence (path), and c_{i,j} is a single node along that sequence. Each path begins at the point where the player enters a given state, and ends where he exits that state - in other words, when an item is collected that causes the player's inventory to shift towards a different prototype. See Figure 8 for an illustration of one such path.

Figure 8 - An example of a path followed by the player while in a particular inventory state. The path originates in the lower part of the level, and ends at the point where the player picked up an item that caused his inventory to shift towards another prototype.

Assigning Rewards

Having obtained the different paths pursued by the player in each inventory state, we turn to reinforcement learning to reproduce his behaviour. In this scenario, the MDP's actions are considered to be the choice to move to a given node from the current position. Thus, the transition probabilities are

P(s' = j \mid s = i, a = j) = E_{ij}

To guide the agent along the same routes taken by the player, we assign an increasing reward to consecutive nodes in each path taken in each prototype, such that

R(p_i, c_{i,j}) = j

where p_i is a prototype, and c_{i,j} is the jth cluster in the associated movement sequence. Each successive node along the path's length receives a reward greater than the last, until the final cluster (at which an inventory state change occurred) is assigned the highest reward. If a path loops back or crosses over itself en route to the goal, then the higher values will overwrite the previous rewards, ensuring that the agent will be guided towards the terminal node while ignoring any non-goal-oriented diversions. Thus, as mentioned above, the agent will emulate the player's program-level behaviour, instead of simply duplicating his exact actions. See Figure 7 above for an example.
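The overwrite-on-revisit behaviour falls out naturally when the rewards are assigned in path order, as in this sketch:

```java
// Ascending rewards along a single path, per R(p_i, c_{i,j}) = j. A node
// revisited later in the path simply has its reward overwritten with the
// larger value, so loops and detours are flattened out as described.
static double[] assignPathRewards(int[] path, int numNodes) {
    double[] reward = new double[numNodes];
    for (int j = 0; j < path.length; j++)
        reward[path[j]] = j + 1;   // later visits overwrite earlier, lower rewards
    return reward;
}
```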
Learning Utility Values

With the transition probabilities and rewards in place, we can now run the value iteration algorithm in order to compute the utility values for each node in the topological map under each inventory state prototype. The value iteration algorithm iteratively propagates rewards outwards from terminal nodes to all others, discounting them by distance from the reward signal; once complete, these utility values will represent the "usefulness" of being at that node while moving to the goal.

In our case, it is important that every node in the map should possess a utility value under every state prototype by the end of the learning process, thereby ensuring that the agent will always receive strong guidance towards its goal. We adopt the game value iteration approach outlined in (Hartley et al 2004) - the algorithm is applied until all nodes have been affected by a reward at least once. Figure 9 shows the results of the value iteration algorithm on a typical path.

Figure 9 - The ascending rewards assigned to this path (blue/red), and the results of the value iteration algorithm (green & magenta). The y-axis denotes the values associated with each waypoint in the topological map.
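A simplified sketch of this step follows; it implements the stopping criterion described above (sweep until every node has been touched by a reward), and should be read as an illustration of the idea rather than the exact algorithm of Hartley et al.

```java
// Value iteration over the waypoint graph for one inventory prototype: each
// node's utility is its own reward plus the discounted best utility among its
// successors in E; sweeps continue until every node carries some utility.
static double[] learnUtilities(int[][] e, double[] reward, double gamma) {
    int n = reward.length;
    double[] v = reward.clone();
    for (int sweep = 0; sweep < n; sweep++) { // n sweeps reach all connected nodes
        double[] next = new double[n];
        boolean allTouched = true;
        for (int i = 0; i < n; i++) {
            double best = 0.0;
            for (int j = 0; j < n; j++)
                if (e[i][j] == 1 && v[j] > best)
                    best = v[j];              // best reachable successor
            next[i] = reward[i] + gamma * best;
            if (next[i] == 0.0)
                allTouched = false;           // this node has no utility yet
        }
        v = next;
        if (allTouched) break;
    }
    return v;
}
```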
Multiple Weighted Objectives

Faced with a situation where several different items are of strategic benefit, a human player will intuitively weigh their respective importance before deciding on his next move. To model this, we adopt a fuzzy clustering approach. On each update, the agent's current inventory is expressed as a membership distribution across all prototype inventory states. This is computed as:

m_p(\vec{s}) = \frac{d(\vec{s}, \vec{p})^{-1}}{\sum_{i=1}^{P} d(\vec{s}, \vec{i})^{-1}}

where \vec{s} is the current inventory state, \vec{p} is a prototype inventory state, P is the number of prototypes, d^{-1} is an inverse-distance or proximity function, and m_p(\vec{s}) is the degree to which state vector \vec{s} is a member of prototype p, relative to all other prototypes. The utility configurations associated with each prototype are then weighted according to the membership distribution, and the adjusted configurations superimposed; we also apply an online discount to prevent the possibility of backtracking. The formulae used to compute the final utilities and to select the next node are thus:

U(c) = \gamma^{e(c)} \sum_{p=1}^{P} V_p(c)\, m_p(\vec{s})

c_{t+1} = \arg\max_{y} U(y), \quad y \in \{x \mid E_{c_t x} = 1\}

where U(c) is the final utility of node c, \gamma is the online discount, e(c) is the number of times the player has entered cluster c since the last state transition, V_p(c) is the original value of node c in state prototype p, and E is the edge matrix.
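A compact sketch of the weighting scheme, with the online discount omitted for clarity; the helper below is a plain Euclidean distance, standing in for whichever proximity function d is chosen.

```java
// Fuzzy weighting: memberships m_p(s) are normalised inverse distances to
// each prototype; per-prototype utilities V_p(c) are superimposed accordingly.
// The online discount gamma^e(c) is omitted to keep the example compact.
static double[] blendedUtilities(double[] s, double[][] prototypes, double[][] v) {
    int numProtos = prototypes.length;
    int numNodes  = v[0].length;

    // m_p(s) = d(s,p)^-1 / sum_i d(s,i)^-1
    double[] m = new double[numProtos];
    double total = 0.0;
    for (int p = 0; p < numProtos; p++) {
        m[p] = 1.0 / (euclidean(s, prototypes[p]) + 1e-9); // guard exact matches
        total += m[p];
    }

    // U(c) = sum_p V_p(c) * m_p(s), memberships normalised to sum to 1
    double[] u = new double[numNodes];
    for (int p = 0; p < numProtos; p++)
        for (int c = 0; c < numNodes; c++)
            u[c] += v[p][c] * (m[p] / total);
    return u;
}

static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++)
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    return Math.sqrt(sum);
}
```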
Object Transience

Another important element of planning behaviour is the human's understanding of object transience. A human player intuitively tracks which items he has collected from which areas of the map, can easily estimate when they are scheduled to reappear, and adjusts his strategy accordingly. In order to capture this, we introduce an activation variable in the computation of the membership values; inactive items are nullified, and the membership values are redistributed among those items which are still active.

m_p(\vec{s}) = \frac{a(o_p)\, d(\vec{s}, \vec{p})^{-1}}{\sum_{i=1}^{P} a(o_i)\, d(\vec{s}, \vec{i})^{-1}}

where a, the activation of an item, is 1 if the object o at the terminal node of the path associated with prototype state p is present, and 0 otherwise.
Figure 10 - The agent returns to a previously-visited point before some ammo items have respawned (1.1), and since they are inactive it initially passes by; however, their sudden re-emergence (1.2) causes the utilities to reactivate, and the agent is drawn to collect them (1.3) before continuing (1.4). Later, the agent returns once again (2.1). The items are now active, but since the agent has already collected several shotgun pickups, the relevant membership values are insignificant; as a result, the agent ignores the pickups (2.2, 2.3), and continues on towards more attractive objectives (2.4).
Deploying the Agent

With the DM2 data extracted and the required values computed, we can now deploy the agent. We extend any of the MatLabBots, overloading preMatLab to extract the player's current position and inventory and pass these to MatLab. We then rewrite the MatLab template to instantiate the agent and connect it to the server. On each update, MatLab determines the closest matching state prototype and node, extracts the relevant utility configuration, finds the set of nodes connected to the current node by examining the edge matrix, and selects the successor with the highest utility value; the position of this node is passed back to QASE. The agent's postMatLab method is also overloaded, to determine the direction between its current position and the next node, and to set the agent's movement accordingly. As the agent traverses its environment, item pickups and in-game events will cause its inventory to change, resulting in a corresponding change in the utility values and attracting the agent towards its next objective. Figure 10 shows the QASE agent in action.

CONCLUSION

In this paper, we identified the lack of a fully-featured, consolidated API as a major impediment to the adoption of commercial games in AI education and research. We then presented our QASE API, which has been developed to meet these requirements. Several of its more important features were described, and their usefulness highlighted. A practical demonstration of QASE as it has been used in our own research closed this contribution.

FUTURE WORK

Although we regard it as being very feature-rich and entirely stable at this point, QASE will continue to develop as we progress in our research. The two tracks of our work - that of investigating approaches to imitation learning and of building an accompanying API - have thus far informed each other; as mentioned earlier, QASE's waypoint generator is derived from the approach outlined in the section QASE and Imitation Learning. In this way, further developments in our research will guide future development of the API.

QASE has already attracted some attention in academia; researchers at Kyushu University in Japan expressed interest in adopting it for use in their work, and more recently a PhD student in California has contacted us with the same intent. As more individuals and institutions discover QASE, the resulting feedback will aid us in continually improving the API. We hope that this paper will help to stimulate further interest in QASE, in imitation learning, and in the potential of games in AI research and education in general.

To download the API and accompanying documentation, please visit the QASE homepage: http://qase.vze.com

REFERENCES

Bauckhage, C., Thurau, C. & Sagerer, G. (2003): Learning Humanlike Opponent Behaviour for Interactive Computer Games. Pattern Recognition, Vol. 2781.
Byrne, R.W. & Russon, A.E. (1998): Learning by Imitation: A Hierarchical Approach. Behavioral and Brain Sciences 21, 667-721.
Elkan, C. (2003): Using the Triangle Inequality to Accelerate k-Means. Proc. 20th International Conference on Machine Learning.
Fairclough, C., Fagan, M., Mac Namee, B. & Cunningham, P. (2001): Research Directions for AI in Computer Games. Technical report.
Floyd, R.W. (1962): Algorithm 97: Shortest Path. Comm. ACM 5(6), 345.
Girlich, U. (2000): Unofficial Quake 2 DM2 Format Description.
Gorman, B. & Humphrys, M. (2005): Towards Integrated Imitation of Strategic Planning and Motion Modelling in Interactive Computer Games. Proc. 3rd Intl. Conf. on Computer Game Design & Technology (GDTW05), 92-99.
Hartley, T., Mehdi, Q. & Gough, N. (2004): Applying Markov Decision Processes to 2D Real-Time Games. Proc. CGAIDE 2004, 55-59.
Hollnagel, E. (1993): Human Reliability Analysis: Context and Control. London: Academic Press.
Jenkins, O.C. & Mataric, M.J. (2002): Deriving Action and Behavior Primitives from Human Motion Data. Proc. IEEE/RSJ IROS-2002, 2551-2556.
Laird, J.E. & van Lent, M. (2000): Interactive Computer Games: Human-Level AI's Killer Application. AAAI, 1171-1178.
Laird, J.E. (2001): Using a Computer Game to Develop Advanced AI. IEEE Computer, 70-75, July 2001.
Martinetz, T. & Schulten, K. (1991): A "Neural-Gas" Network Learns Topologies. In Artificial Neural Networks, Elsevier Science Publishers.
Nareyek, A. (2004): Computer Games - Boon or Bane for AI Research. Künstliche Intelligenz, 43-44, February 2004.
Schaal, S. (1999): Is Imitation Learning the Route to Humanoid Robots? Trends in Cognitive Sciences 3(6), 233-242.
Sklar, E., Blair, A.D., Funes, P. & Pollack, J. (1999): Training Intelligent Agents Using Human Internet Data. 1st Asia-Pacific Conference on Intelligent Agent Technology (IAT).
Thurau, C., Bauckhage, C. & Sagerer, G. (2004a): Learning Humanlike Movement Behaviour for Computer Games. Proc. 8th Intl. SAB Conference.
Thurau, C., Bauckhage, C. & Sagerer, G. (2004b): Synthesising Movement for Computer Games. Pattern Recognition, Vol. 3175 of LNCS, Springer.
