Figure 3 - The complete QASE Bot Hierarchy
Figure 4 - BSP traces with line, sphere and box. Collision occurs at different points.
For examples of the type of data that can be obtained and analysed, see the sections MatLab Integration and QASE and Imitation Learning below.

Furthermore, QASE incorporates a DM2Recorder, allowing the agent to automatically record a demo of itself during play; this actually improves upon Quake 2's standard recording facilities, by allowing demos spanning multiple maps to be recorded in playable format. The incoming network stream is sampled, edited as necessary, and saved to file when the agent disconnects from the server or as an intermediate step whenever the map is changed.

Environment Sensing

The network packets received by game clients from the Quake 2 server do not encode any information about the actual environment in which the agent finds itself, beyond its current state and those of the various game entities present. This information is contained in Binary Space Partition (BSP) files stored locally on each client machine; thus, in order to provide the bot with more detailed sensory information (such as determining its proximity to an obstacle, or whether an enemy is visible), a means of locating, parsing and querying these map files is required. QASE's BSPParser and PAKParser fulfil this need.

The BSP file corresponding to the active map in the current game session may be stored in the default game directory, a custom game directory, or in any of Quake 2's PAK archives; its filename may or may not match the name of the map, which is the only information possessed by the client. If the user sets an environment variable pointing to the location of the base Quake 2 folder, QASE can automatically find the relevant BSP by searching each location in order of likelihood. This is done transparently from the agent's perspective; as soon as any environment-sensing method is invoked, the map is silently located, loaded and queried. Once loaded, the BSPParser can be used to sweep a line, box or sphere in any arbitrary direction through the game world, starting from the agent's current location; the distance and/or position at which the first collision with the environment's geometry occurs is returned. This allows the agent to "perceive" the world around it on a pseudo-visual level - line traces can be used to determine whether entities are visible from the agent's perspective, sphere traces can be used to check whether projectiles will reach a certain point if fired, and box traces can be used to determine whether the agent's in-game model will fit through an opening. Figure 4 above shows the operation of each different trace type.
Inbuilt Cognitive & Other Facilities

For educational purposes, QASE incorporates implementations of both a neural network and a genetic algorithm generator. These are designed to be used in tandem - that is, the genetic algorithms gradually cause the neural network's weights to evolve towards a given fitness function. A KMeans calculator class is also included; aside from serving as an illustration of clustering techniques, it is also used in QASE's waypoint map generator (see below). These features are included primarily to allow students to experiment with some AI constructs commonly found in undergraduate curricula - for more demanding research applications, QASE allows MatLab to be used as a back-end.
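The clustering technique itself can be illustrated independently of QASE. The following self-contained k-means sketch mirrors the kind of computation the KMeans calculator class embodies, without assuming anything about QASE's own class or method names.

```java
import java.util.Random;

/** Generic k-means over points (e.g. player positions read from a demo). */
public final class KMeansSketch
{
    /** Returns k centroids computed from the given points. */
    public static double[][] cluster(double[][] points, int k, int iterations)
    {
        Random rnd = new Random(42);
        double[][] centroids = new double[k][];
        for (int i = 0; i < k; i++)                       // random initialisation
            centroids[i] = points[rnd.nextInt(points.length)].clone();

        int[] assignment = new int[points.length];
        for (int it = 0; it < iterations; it++)
        {
            // Assignment step: attach each point to its nearest centroid.
            for (int p = 0; p < points.length; p++)
                assignment[p] = nearest(points[p], centroids);

            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][points[0].length];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++)
            {
                counts[assignment[p]]++;
                for (int d = 0; d < points[p].length; d++)
                    sums[assignment[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)
                    for (int d = 0; d < sums[c].length; d++)
                        centroids[c][d] = sums[c][d] / counts[c];
        }
        return centroids;
    }

    /** Index of the centroid closest to the given point (squared Euclidean distance). */
    private static int nearest(double[] point, double[][] centroids)
    {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++)
        {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++)
                dist += (point[d] - centroids[c][d]) * (point[d] - centroids[c][d]);
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }
}
```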
One of QASE's most useful features, particularly from an educational point of view, is the aforementioned waypoint map generator. Drawing on concepts developed in the course of our work in imitation learning (see QASE and Imitation Learning), this requires the user to supply a prerecorded DM2 file; it will then automatically find the set of all positions occupied by the player during the game session, cluster them to produce a smaller number of indicative waypoints, and draw edges between these waypoints based on the observed movement of the demonstrator. The items collected by the player are also recorded, and Floyd's algorithm (Floyd, 1962) is applied to find the matrix of distances between each pair of points. The map returned to the user at the end of the process can thus be queried to find the shortest path from the agent's current position to any needed item, to the nearest opponent, or to any random point in the level. Rather than manually building a waypoint map from scratch, then, all the student needs to do in order to create a full navigation system for their agent is to record themselves moving around the environment as necessary, collect whatever items their bot will require, and present the resulting demo file to QASE.
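The all-pairs shortest-path step mentioned above is the standard dynamic-programming form of Floyd's algorithm. The sketch below assumes only that the waypoint map is held as a distance matrix; it is not QASE's internal implementation.

```java
/**
 * Floyd's algorithm (Floyd, 1962) over a waypoint graph.
 * dist[i][j] initially holds the edge length between waypoints i and j,
 * Double.POSITIVE_INFINITY where no edge was observed, and 0 on the diagonal.
 * next[i][j] is filled so that shortest paths can be reconstructed hop by hop.
 */
public static void floydWarshall(double[][] dist, int[][] next)
{
    int n = dist.length;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            next[i][j] = (dist[i][j] < Double.POSITIVE_INFINITY) ? j : -1;

    for (int k = 0; k < n; k++)                 // allow paths routed through waypoint k
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                {
                    dist[i][j] = dist[i][k] + dist[k][j];
                    next[i][j] = next[i][k];    // first hop of the improved path
                }
}
```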
MatLab Integration

For the purposes of our work in imitation learning, we need a way to not only obtain, but also statistically analyse the observed in-game actions of human players. Rather than hand-coding the required structures from scratch, we opted instead to integrate the API with the Mathworks™ MatLab® programming environment. Given that it provides a rich set of built-in toolboxes for neural computation, clustering and other classification techniques and is already widely used in research, MatLab seemed an ideal choice to act as an optional back-end for QASE agents.
Figure 5 - MatLab/QASE integration. MatLab acts as a back-end in the AI cycle; the agent's body and brain are separated.

Bots can be instantiated and controlled via MatLab in one of two ways. For simple AI routines, one of the standalone MatLabGeneralBots shown in Figure 3 is sufficient. A MatLab function is written which creates an instance of the agent, connects it to the server, and accesses the gamestate at each update, all entirely within the MatLab environment. The advantage of this approach is that it is intuitive and very straightforward; a template of the MatLab script is provided with the QASE API. In cases where a large amount of gamestate and data processing must be carried out on each frame, however, handling it exclusively through MatLab can prove somewhat inefficient.

For this reason, we developed an alternative paradigm designed to offer greater efficiency. As outlined in the Bot Hierarchy section above, QASE agents are usually created by extending either the ObserverBot or PollingBot classes, and overloading the runAI method in order to add the required behaviour. In other words, the agent's AI routines are atomic, and encapsulated entirely within the derived class. Thus, in order to facilitate MatLab, a new branch of agents - the MatLabBots - was created; each of these possesses a three-step AI routine, as follows (a minimal skeleton is sketched after the list):

1. On each server update, QASE first pre-processes the data required for the task at hand; it then flags MatLab to take over control of the AI cycle.

2. The MatLab function obtains the agent's input data, processes it using its own internal structures, passes the results back to the agent, and signals that the agent should reassume control.

3. This done, the bot applies MatLab's output in a postprocessing step.
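To illustrate the Java side of this cycle, the skeleton below shows roughly what a MatLabBot subclass might look like. The preMatLab and postMatLab hooks are the ones referred to in the Deploying the Agent section below, but their exact signatures, the parent class name and the helper calls used here are assumptions for illustration rather than QASE's documented interface.

```java
// Skeleton only: class name, signatures and helper methods are illustrative assumptions.
public class MyMatLabBot extends MatLabObserverBot   // assumed concrete MatLabBot variant
{
    public MyMatLabBot(String botName) { super(botName); }

    // Step 1: pre-process the gamestate and hand the minimal data to MatLab.
    protected void preMatLab(World world, Object[] matLabInput)
    {
        matLabInput[0] = getPlayerPosition(world);   // e.g. the agent's current origin
        matLabInput[1] = getPlayerInventory(world);  // e.g. the current inventory vector
        // QASE now flags MatLab to take over the AI cycle (step 2 runs in MatLab).
    }

    // Step 3: apply whatever MatLab computed during step 2.
    protected void postMatLab(World world, Object[] matLabOutput)
    {
        Vector3f nextWaypoint = (Vector3f) matLabOutput[0];
        setBotMovement(nextWaypoint);                // steer towards MatLab's chosen node
    }
}
```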
In this way, the Java side of the agent filters the gamestate, presenting only the minimal required information to MatLab; QASE thus enables both MatLab and Java to process as much data as possible in their respective native environments. This has proven extremely effective, both in terms of computational efficiency and ease of development.

Aside from creating game agents, MatLab can also use the various supporting functions of the QASE API. From our perspective, one of the most important of these is the ability to read and process demonstrations of gameplay using the DM2Parser. Figure 8 shows an example of this; see the section QASE and Imitation Learning for details.

Of course, the fact that we integrated QASE with MatLab specifically to facilitate our work in imitation learning does not diminish its potential for use in other areas; as stated earlier, QASE is designed for broad AI research.

QASE AND IMITATION LEARNING

In this section, we outline an experiment conducted in the course of our work. While it by no means demonstrates the full extent of QASE's faculties, this example does provide a good indication of its potential in the field of research.

One of the first questions which arises when considering the problem of imitation learning is, quite simply, "what behaviours does the demonstration encode?" To this end, (Thurau et al. 2004a) propose a model of in-game behaviour based closely on Hollnagel's COCOM (Hollnagel 1993), as shown in Figure 6 below.
The following is drawn largely from our paper "Towards Integrated Imitation of Strategic Planning and Motion Modelling in Interactive Computer Games" (Gorman & Humphrys 2005).

In order to learn long-term strategic behaviours from human demonstration, we developed a model designed to emulate the notion of program-level imitation discussed in (Byrne and Russon 1998); in other words, to identify the demonstrator's intent, rather than simply reproducing his precise actions. (Thurau et al. 2004a) present an approach to such behaviours based on artificial potential fields; here we consider the application of reinforcement learning and fuzzy clustering techniques.

Topology Learning

As mentioned earlier, in the context of Quake, strategic planning is mostly concerned with the efficient collection and monopolisation of items and the control of certain important areas of the map. With this in mind, we first read the set of all player locations l = {x, y, z} from the DM2 file and cluster them to obtain a topological map of the environment; the player's inventory at each timestep is likewise clustered into a set of prototype states, and his movement under each prototype is recorded as a set of transition sequences t_i = {c_{i,1}, ..., c_{i,n}}, where t_i is a transition sequence (path), and c_{i,j} is a single node along that sequence. Each path begins at the point where the player enters a given state, and ends where he exits that state - in other words, when an item is collected that causes the player's inventory to shift towards a different prototype. See Figure 8 for an illustration of one such path.

Assigning Rewards

Having obtained the different paths pursued by the player in each inventory state, we turn to reinforcement learning to reproduce his behaviour. In this scenario, the MDP's actions are considered to be the choice to move to a given node from the current position. Thus, the transition probabilities are

P(s' = j | s = i, a = j) = E_{ij}

where E is the edge matrix of the topological map (E_{ij} = 1 if nodes i and j are connected). To guide the agent along the same routes taken by the player, we assign an increasing reward to consecutive nodes in each path taken in each prototype, such that

R(p_i, c_{i,j}) = j

where p_i is a prototype, and c_{i,j} is the jth cluster in the associated movement sequence. Each successive node along the path's length receives a reward greater than the last, until the final cluster (at which an inventory state change occurred) is assigned the highest reward. If a path loops back or crosses over itself en route to the goal, then the higher values will overwrite the previous rewards, ensuring that the agent will be guided towards the terminal node while ignoring any non-goal-oriented diversions. Thus, as mentioned above, the agent will emulate the player's program-level behaviour, instead of simply duplicating his exact actions. See Figure 7 above for an example.

With the rewards assigned, we can now run the value iteration algorithm in order to compute the utility values for each node in the topological map. In our case, it is important that every node in the map should possess a utility value under every state prototype by the end of the learning process, thereby ensuring that the agent will always receive strong guidance towards its goal. We adopt the game value iteration approach outlined in (Hartley et al 2004) - the algorithm is applied until all nodes have been affected by a reward at least once. Figure 9 above shows the results of the value iteration algorithm on a typical path.
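To make the reward and utility computations concrete, the self-contained sketch below assigns the rewards R(p_i, c_{i,j}) = j along one observed path and then performs a simple value iteration over the waypoint graph, sweeping until every node has been touched by a reward. The data layout and the discount parameter are our own assumptions, and the code follows the spirit of the game value iteration variant rather than reproducing (Hartley et al 2004) exactly.

```java
/**
 * Sketch: rewards and utilities for one prototype inventory state.
 * edge[i][j] is true if waypoints i and j are connected; path holds the node
 * indices of one observed transition sequence for this prototype.
 */
public static double[] computeUtilities(boolean[][] edge, int[] path, double gamma)
{
    int n = edge.length;

    // Reward assignment, R(p, c_j) = j: each successive node on the path is worth
    // more than the last, and a revisited node keeps its larger (later) value.
    double[] reward = new double[n];
    for (int j = 0; j < path.length; j++)
        reward[path[j]] = Math.max(reward[path[j]], j + 1);

    // Value iteration in the spirit of "game value iteration": keep sweeping until
    // every node has been affected by a reward at least once (capped at n sweeps,
    // enough for rewards to propagate across a connected waypoint map).
    double[] utility = reward.clone();
    for (int sweep = 0; sweep < n; sweep++)
    {
        double[] next = new double[n];
        boolean allTouched = true;
        for (int i = 0; i < n; i++)
        {
            double bestNeighbour = 0.0;
            for (int j = 0; j < n; j++)
                if (edge[i][j])
                    bestNeighbour = Math.max(bestNeighbour, utility[j]);
            next[i] = reward[i] + gamma * bestNeighbour;   // Bellman-style backup
            if (next[i] == 0.0)
                allTouched = false;                        // node not yet reached by any reward
        }
        utility = next;
        if (allTouched)
            break;
    }
    return utility;
}
```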
Multiple Weighted Objectives

Faced with a situation where several different items are of strategic benefit, a human player will intuitively weigh their respective importance before deciding on his next move. To model this, we adopt a fuzzy clustering approach. On each update, the agent's current inventory is expressed as a membership distribution across all prototype inventory states. This is computed as:

m_p(s) = d(s, p)^{-1} / Σ_{i=1}^{P} d(s, i)^{-1}

where s is the current inventory state, p is a prototype inventory state, P is the number of prototypes, d^{-1} is an inverse-distance or proximity function, and m_p(s) is the degree to which state vector s is a member of prototype p, relative to all other prototypes. The utility configurations associated with each prototype are then weighted according to the membership distribution, and the adjusted configurations superimposed; we also apply an online discount to prevent the possibility of backtracking. The agent's next node is then chosen as

c_{t+1} = argmax U(y), y ∈ {x | E_{c_t,x} = 1}

that is, the neighbour of the current node c_t (under the edge matrix E) with the highest superimposed utility.

Object Transience

Another important element of planning behaviour is the human's understanding of object transience. A human player intuitively tracks which items he has collected from which areas of the map, can easily estimate when they are scheduled to reappear, and adjusts his strategy accordingly. In order to capture this, we introduce an activation variable in the computation of the membership values; inactive items are nullified, and the membership values are redistributed among those items which are still active:

m_p(s) = a(o_p) d(s, p)^{-1} / Σ_{i=1}^{P} a(o_i) d(s, i)^{-1}

where a, the activation of an item, is 1 if the object o at the terminal node of the path associated with prototype state p is present, and 0 otherwise.
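The membership and activation formulas above translate directly into code. The sketch below is a self-contained illustration; the choice of Euclidean distance for d and the array-based data layout are assumptions rather than QASE or MatLab code.

```java
/**
 * Fuzzy membership of the current inventory s across prototype states, weighted
 * by item activation: m_p(s) = a(o_p) d(s,p)^-1 / sum_i a(o_i) d(s,i)^-1.
 * inventory  - the agent's current inventory vector s
 * prototypes - one row per prototype inventory state
 * active     - activation a(o_p): true if the item at the prototype's terminal
 *              node is currently present in the world
 */
public static double[] memberships(double[] inventory, double[][] prototypes, boolean[] active)
{
    double[] m = new double[prototypes.length];
    double total = 0.0;
    for (int p = 0; p < prototypes.length; p++)
    {
        double a = active[p] ? 1.0 : 0.0;               // activation variable
        // Inverse distance as the proximity function; the small epsilon guards
        // against division by zero when the inventory exactly matches a prototype.
        m[p] = a / (distance(inventory, prototypes[p]) + 1e-9);
        total += m[p];
    }
    for (int p = 0; p < m.length; p++)                  // normalise over all prototypes
        m[p] = (total > 0.0) ? m[p] / total : 0.0;
    return m;
}

/** Euclidean distance between two inventory vectors (one possible choice of d). */
private static double distance(double[] u, double[] v)
{
    double sum = 0.0;
    for (int i = 0; i < u.length; i++)
        sum += (u[i] - v[i]) * (u[i] - v[i]);
    return Math.sqrt(sum);
}
```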
Figure 10 - The agent returns to a previously-visited point before some ammo items have respawned (1.1), and since they are inactive it
initially passes by (1.2); however, their sudden re-emergence (1.2) causes the utilities to reactivate, and the agent is drawn to collect
them (1.3) before continuing (1.4). Later, the agent returns once again (2.1). The items are now active, but since the agent has already
collected several shotgun pickups, the relevant membership values are insignificant; as a result, the agent ignores the pickups (2.2,
2.3), and continues on towards more attractive objectives (2.4).
Deploying the Agent

With the DM2 data extracted and the required values computed, we can now deploy the agent. We extend any of the MatLabBots, overloading preMatLab to extract the player's current position and inventory and pass these to MatLab. We then rewrite the MatLab template to instantiate the agent and connect it to the server. On each update, MatLab determines the closest matching state prototype and node, extracts the relevant utility configuration, finds the set of nodes connected to the current node by examining the edge matrix, and selects the successor with the highest utility value; the position of this node is passed back to QASE. The agent's postMatLab method is also overloaded, to determine the direction between its current position and the next node, and to set the agent's movement accordingly. As the agent traverses its environment, item pickups and in-game events will cause its inventory to change, resulting in a corresponding change in the utility values and attracting the agent towards its next objective. Figure 10 shows the QASE agent in action.
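The per-update decision made on the MatLab side - selecting the neighbour of the current node with the highest superimposed utility - reduces to the argmax equation given earlier. For clarity, the sketch below expresses it in Java over an assumed edge-matrix representation; in our actual experiments this step is performed in MatLab, with the membership-weighted (superimposed) utilities computed as described above.

```java
/**
 * Choose the next waypoint: among the nodes connected to the current node
 * (edge[current][x] == true), pick the one with the highest superimposed
 * utility, i.e. c_{t+1} = argmax U(y), y in {x | E_{c_t,x} = 1}.
 */
public static int nextNode(int current, boolean[][] edge, double[] utility)
{
    int best = current;                       // stay put if no neighbour exists
    double bestUtility = Double.NEGATIVE_INFINITY;
    for (int x = 0; x < edge.length; x++)
        if (edge[current][x] && utility[x] > bestUtility)
        {
            bestUtility = utility[x];
            best = x;
        }
    return best;
}
```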
CONCLUSION

In this paper, we identified the lack of a fully-featured, consolidated API as a major impediment to the adoption of commercial games in AI education and research. We then presented our QASE API, which has been developed to meet these requirements. Several of its more important features were described, and their usefulness highlighted. A practical demonstration of QASE as it has been used in our own research closed this contribution.

FUTURE WORK

Although we regard it as being very feature-rich and entirely stable at this point, QASE will continue to develop as we progress in our research. The two tracks of our work - that of investigating approaches to imitation learning and of building an accompanying API - have thus far informed each other; as mentioned earlier, QASE's waypoint generator is derived from the approach outlined in the section QASE and Imitation Learning. In this way, further developments in our research will guide future development of the API.

QASE has already attracted some attention in academia; researchers at Kyushu University in Japan expressed interest in adopting it for use in their work, and more recently a PhD student in California has contacted us with the same intent. As more individuals and institutions discover QASE, the resulting feedback will aid us in continually improving the API. We hope that this paper will help to stimulate further interest in QASE, in imitation learning, and in the potential of games in AI research and education in general.

To download the API and accompanying documentation, please visit the QASE homepage: http://qase.vze.com

REFERENCES

Bauckhage, C., Thurau, C. and Sagerer, G. (2003). Learning Humanlike Opponent Behaviour for Interactive Computer Games. Pattern Recognition, Vol. 2781.
Byrne, R.W. and Russon, A.E. (1998). Learning by Imitation: A Hierarchical Approach. Behavioral and Brain Sciences, 21, 667-721.
Elkan, C. (2003). Using the Triangle Inequality to Accelerate k-Means. Proc. 20th International Conference on Machine Learning.
Fairclough, C., Fagan, M., MacNamee, B. and Cunningham, P. (2001). Research Directions for AI in Computer Games. Technical report.
Floyd, R.W. (1962). Algorithm 97: Shortest Path. Comm. ACM, 5(6), 345.
Girlich, U. (2000). Unofficial Quake 2 DM2 Format Description.
Gorman, B. and Humphrys, M. (2005). Towards Integrated Imitation of Strategic Planning and Motion Modelling in Interactive Computer Games. Proc. 3rd Intl. Conf. in Computer Game Design and Technology (GDTW05), pages 92-99.
Hartley, T., Mehdi, Q. and Gough, N. (2004). Applying Markov Decision Processes to 2D Real-Time Games. Proc. CGAIDE 2004, pages 55-59.
Hollnagel, E. (1993). Human Reliability Analysis: Context and Control. London: Academic Press.
Jenkins, O.C. and Mataric, M.J. (2002). Deriving Action and Behavior Primitives from Human Motion Data. Proc. IEEE/RSJ IROS-2002, pages 2551-2556.
Laird, J.E. and van Lent, M. (2000). Interactive Computer Games: Human-Level AI's Killer Application. AAAI, pages 1171-1178.
Laird, J.E. (2001). Using a Computer Game to Develop Advanced AI. IEEE Computer, pages 70-75, July 2001.
Martinez, T. and Schulten, K. (1991). A Neural Gas Network Learns Topologies. In Artificial Neural Networks. Elsevier Science Publishers.
Nareyek, A. (2004). Computer Games - Boon or Bane for AI Research. Künstliche Intelligenz, pages 43-44, February 2004.
Schaal, S. (1999). Is Imitation Learning the Route to Humanoid Robots? Trends in Cognitive Sciences, 3(6), 233-242.
Sklar, E., Blair, A.D., Funes, P. and Pollack, J. (1999). Training Intelligent Agents Using Human Internet Data. 1st Asia-Pacific Conference on Intelligent Agent Technology (IAT).
Thurau, C., Bauckhage, C. and Sagerer, G. (2004a). Learning Humanlike Movement Behaviour for Computer Games. Proc. 8th Intl. SAB Conference.
Thurau, C., Bauckhage, C. and Sagerer, G. (2004b). Synthesising Movement for Computer Games. Pattern Recognition, Vol. 3175 of LNCS, Springer.