
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Computational Results for Extensive-Form Adversarial Team Games

Andrea Celli, Nicola Gatti
Politecnico di Milano
Piazza Leonardo da Vinci, 32
Milano, Italy
{andrea.celli, nicola.gatti}@polimi.it

Abstract

We provide, to the best of our knowledge, the first computational study of extensive-form adversarial team games. These games are sequential, zero-sum games in which a team of players, sharing the same utility function, faces an adversary. We define three different scenarios according to the communication capabilities of the team. In the first, the teammates can communicate and correlate their actions both before and during the play. In the second, they can only communicate before the play. In the third, no communication is possible at all. We define the most suitable solution concepts, and we study the inefficiency caused by partial or null communication, showing that the inefficiency can be arbitrarily large in the size of the game tree. Furthermore, we study the computational complexity of the equilibrium-finding problem in the three scenarios mentioned above, and we provide, for each of the three scenarios, an exact algorithm. Finally, we empirically evaluate the scalability of the algorithms in random games and the inefficiency caused by partial or null communication.

Introduction

The design of algorithms for strategic settings has been a central problem in Artificial Intelligence for several years, with the aim of developing agents capable of behaving optimally when facing strategic opponents. Many efforts have been made for 2-player games, e.g., finding a Nash equilibrium (Lemke and Howson, Jr 1964; Gatti et al. 2012) and, more recently, finding a Stackelberg equilibrium (Conitzer and Sandholm 2006). The study of this latter problem paved the way to the field of Security Games, which is, nowadays, one of the application fields of non-cooperative game theory with the highest social impact (Tambe 2011).

Fewer results are known, instead, about games with more than 2 players—except for games with particular structures, e.g., congestion and polymatrix games (Nisan et al. 2007). An interesting class of games widely unexplored in the literature is that of adversarial team games. Here, a team of players with the same utilities faces an adversary (von Stengel and Koller 1997). These games can model many realistic security scenarios and can provide tools to coordinate teammates acting strategically. Basilico et al. (2017) study the inefficiency a team can incur in normal-form games when teammates cannot correlate their strategies w.r.t. when they can. They also provide algorithms to approximate the Team-maxmin equilibrium—introduced by von Stengel and Koller (1997)—that is the optimal solution when correlation is not possible. Furthermore, it is known that finding a Team-maxmin equilibrium is FNP-hard and inapproximable in an additive sense (Hansen et al. 2008; Borgs et al. 2010).

When extensive-form games enter the picture, adversarial team games become much more intriguing, various forms of correlation being possible. Nevertheless, to the best of our knowledge, this game class is still unexplored in the literature. In the present paper, we focus on three main forms of correlation (Forges 1986). In the first, preplay and intraplay communication is possible, corresponding to the case in which a communication device receives inputs from the teammates about the information they observe during the play, and sends them recommendations about the action to play at each information set. In the second, only preplay communication among the teammates is possible, corresponding to the case in which a correlation device communicates a plan of actions to each teammate before playing the game.¹ Finally, in the third, no communication is possible.²

Original contributions. We formally define game models capturing the three aforementioned cases and the most suitable solution concepts (trivially, the Team-maxmin equilibrium in the third setting). Furthermore, we define three

¹ With only preplay correlation, three solution concepts are known: Normal-Form, Extensive-Form, and Agent-Form Correlated Equilibrium. The spaces of players' strategies achievable with the three correlation devices are the same, while the players' incentive constraints are different (even if it is not known whether the spaces of the outcomes for the three equilibria in adversarial team games are different). The complexity of computing the equilibrium maximizing the team's utility in adversarial team games with at least 2 teammates is NP-hard for all three equilibria (von Stengel and Forges 2008). In our paper, we focus on the first one.

² This setting is frequent in practice. Consider, for example, a security problem where a set of defensive resources from different security agencies is allocated to protect an environment at risk but, due to organizational constraints, the resources are not able to share information with each other. The resources have the same goal (i.e., to secure the environment) but cannot coordinate strategy execution. The same happens when a set of resources has to operate in stealth mode.

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

inefficiency indices for the equilibria, capturing: the inefficiency caused by using a correlation device in place of a communication device, the inefficiency caused by not using any form of communication in place of a communication device, and the inefficiency caused by not using any form of communication in place of a correlation device. We provide lower bounds to the worst-case values of the inefficiency indices, showing that they can be arbitrarily large.

Furthermore, we study the computational complexity of the problems of finding the three equilibria with the different forms of correlation, and we design, for each equilibrium, an exact algorithm. We show that when a communication device is available, an equilibrium can be computed in polynomial time (even in the number of players) by a 2-stage algorithm. In the first stage, the game is cast into an auxiliary 2-player equivalent game, while, in the second stage, a solution is found by linear programming. When a correlation device is available, the problem can be easily shown to be FNP-hard. In this case, we prove that there is always an equilibrium with a small (linear) support, and we design an equilibrium-finding algorithm, based on a classical column-generation approach, that does not need to enumerate an exponential space of actions before its execution. Our algorithm exploits an original hybrid representation of the game combining both normal and sequence forms. The column-generation oracle is shown to deal with an APX-hard problem, with an upper approximation bound decreasing exponentially in the depth of the tree. We also provide an approximation algorithm for the oracle that matches certain approximation guarantees on a subset of instances. When no communication is possible, the equilibrium-finding problem can be easily shown to be FNP-hard. In this case, the problem can be formulated as a non-linear programming problem and solved by resorting to global optimization tools.

Finally, we empirically evaluate the scalability of our algorithms in random game instances. We also evaluate the inefficiency for the team of not adopting a communication device, showing that, differently from the theoretical worst-case bounds, the empirical inefficiency is extremely small.

Preliminaries

A perfect-information extensive-form game (Shoham and Leyton-Brown 2009) is a tuple (N, A, V, L, ι, ρ, χ, U), where: N is a set of n players, A is a set of actions, V is the set of nonterminal decision nodes, L is the set of terminal (leaf) nodes, ι : V → N is a function returning the player acting at a given decision node, ρ : V → 2^A is the action function—assigning to each decision node a set of available actions—, χ : V × A → V ∪ L is the successor function, and U = {U_1, U_2, ..., U_n} is the set of utility functions, in which U_i : L → ℝ specifies utilities over terminal nodes for player i. Let V_i be the inclusion-wise maximal set of decision nodes such that, for all x ∈ V_i, i = ι(x). Then, an imperfect-information extensive-form game is a tuple (N, A, V, L, ι, ρ, χ, U, H), where (N, A, V, L, ι, ρ, χ, U) is an extensive-form game with perfect information and H = {H_1, H_2, ..., H_n} is the set of information sets, in which H_i is a partition of V_i such that, for any x_1, x_2 ∈ V_i, ρ(x_1) = ρ(x_2) whenever there exists an h ∈ H_i where x_1 ∈ h and x_2 ∈ h. As usual in game theory, we assume, for each a ∈ A, there is only one h s.t. a ∈ ρ(h). We focus on games with perfect recall, where, for each player i and each h ∈ H_i, the decision nodes belonging to h share the same sequence of moves of player i on their paths from the root.

The study of extensive-form games is commonly conducted under other, equivalent representations. The normal form is a tabular representation in which player i's actions are plans p ∈ P_i, specifying a move at each information set in H_i, and player i's utility is U_i′ : P_1 × ... × P_n → ℝ s.t. U_i′(p_1, ..., p_n) = U_i(l), where l ∈ L is the terminal node reached when plan profile (p_1, ..., p_n) is played. Basically, in the normal-form representation, players decide ex ante their behavior in the whole game. The reduced normal form is obtained by deleting replicated strategies from the normal form; however, the size of the reduced normal form is still exponential in the number of information sets. A mixed strategy σ_i of player i ∈ N is a probability distribution on her set of pure strategies P_i. In the agent form—whose definition is omitted due to reasons of space, see (Selten 1975)—players play behavioral strategies, denoted by π_i(h, a), each specifying a probability distribution over the actions ρ(h) available at information set h of player i. Two strategies, even of different representations, are realization equivalent if, for any strategy profile of the opponents, they induce the same probability distribution over the outcomes. In a finite perfect-recall game, any mixed strategy can be replaced by an equivalent behavioral one (Kuhn 1953).

Both normal and agent forms suffer from computational issues that can be overcome by using the sequence form (von Stengel 1996), whose size is linear in the size of the game tree. A sequence for player i, defined by a node x of the game tree, is the subset of A specifying player i's actions on the path from the root to x. We denote the set of sequences of player i by Q_i; these are the sequence-form actions of player i. A sequence is said terminal if, together with some sequences of the other players, it leads to a terminal node. Moreover, we denote by q_∅ the fictitious sequence leading to the root node and by qa ∈ Q_i the extended sequence obtained by appending a ∈ A to sequence q ∈ Q_i. The sequence-form strategy, called realization plan, is a function r_i : Q_i → ℝ associating each sequence q ∈ Q_i with its probability of being played. A well-defined sequence-form strategy is such that, for each i ∈ N, r_i(q_∅) = 1 and, for each h and sequence q leading to h, −r_i(q) + Σ_{a∈ρ(h)} r_i(qa) = 0, with r_i(q) ≥ 0 for every q. These constraints are linear in the number of sequences and can be written as F_i r_i = f_i, where F_i is an opportune matrix and f_i an opportune vector. The utility function of player i is represented as an n-dimensional matrix defined only for profiles of terminal sequences leading to a leaf. With a slight abuse of notation, we denote it by U_i.

A Nash equilibrium (NE), whose definition does not depend on the representation of the game, is a strategy profile in which no player can improve her utility by deviating from her strategy, once the strategies of all the other players are fixed.
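The well-definedness conditions for a realization plan can be made concrete on a toy tree. The sketch below is ours, not code from the paper: the one-player tree, the information-set names h1 and h2, and the action names are hypothetical. It assembles the matrix F and vector f of the constraints F_i r_i = f_i and checks a realization plan against them.

```python
# Minimal sketch (ours) of sequence-form realization-plan constraints for one
# player with two information sets: h1 at the root (actions a, b) and h2
# reached after a (actions c, d).  Sequences are tuples of actions.
seqs = [(), ("a",), ("b",), ("a", "c"), ("a", "d")]  # q_empty first

# r(q_empty) = 1 and, for each info set h reached by sequence q,
# -r(q) + sum_{a in rho(h)} r(qa) = 0, written as F r = f.
infosets = {"h1": ((), [("a",), ("b",)]),        # (parent sequence, extensions)
            "h2": (("a",), [("a", "c"), ("a", "d")])}

F = [[0.0] * len(seqs) for _ in range(1 + len(infosets))]
f = [1.0] + [0.0] * len(infosets)
F[0][seqs.index(())] = 1.0                       # r(q_empty) = 1
for row, (parent, exts) in enumerate(infosets.values(), start=1):
    F[row][seqs.index(parent)] = -1.0
    for q in exts:
        F[row][seqs.index(q)] = 1.0

# A realization plan equivalent to the behavioral strategy playing a and b
# with probability 1/2 and, after a, playing c with probability 1:
r = [1.0, 0.5, 0.5, 0.5, 0.0]
residual = [sum(F[i][j] * r[j] for j in range(len(seqs))) - f[i]
            for i in range(len(f))]
print(residual)  # [0.0, 0.0, 0.0]: r is a well-defined realization plan
```

Note how the constraint count (one row per information set plus the root row) is linear in the tree, which is exactly the advantage of the sequence form over the normal form.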

Extensive-Form Adversarial Team Games, Equilibria, and Inefficiency

We initially provide the formal definition of a team.

Definition 1 (Team) Given an extensive-form game with imperfect information (N, A, V, L, ι, ρ, χ, U, H), a team T is an inclusion-wise maximal subset of players T ⊆ N such that, for any i, j ∈ T and for all l ∈ L, U_i(l) = U_j(l).

We denote by H_T the set ∪_{i∈T} H_i and by A_T the set of actions available at the information sets in H_T. An extensive-form team game (EF-TG) is a generic extensive-form game where at least one team is present. Von Stengel and Koller (1997) analyze zero-sum normal-form games where a single team plays against an adversary. We extend this game model to the scenario of extensive-form games.

Definition 2 (STSA-EF-TG) A zero-sum single-team single-adversary extensive-form team game (STSA-EF-TG) is a game (N, A, V, L, ι, ρ, χ, U, H) in which:
• N = T ∪ {n}, where the set T defines a team (as in Definition 1) and player n is the adversary (A);
• for each l ∈ L it holds U_A(l) = −(n−1) U_T(l), where U_T denotes the utility of the teammates and U_A that of the adversary.

When the teammates have no chance to correlate their strategies, the most appropriate solution concept is the Team-maxmin equilibrium (TME). Formally, the TME is defined as arg max_{r_1,...,r_{n−1}} min_{r_n} U_T ∏_{i=1}^{n} r_i. By the same arguments used by von Stengel and Koller (1997) for the case of normal-form games, it follows that also in extensive-form games a TME is unique except for degeneracy, and it is the NE maximizing the team's expected utility. Nevertheless, in many scenarios, teammates may exploit higher correlation capabilities. While in normal-form games these capabilities reduce to employing a correlation device as proposed by (Aumann 1974), in extensive-form games we can distinguish different forms of correlation. More precisely, the strongest correlation is achieved when teammates can communicate both before and during the execution of the game (preplay and intraplay communication), exchanging their private information by exploiting a mediator that recommends actions to them. This setting can be modeled by resorting to a communication device defined in a way similar to (Forges 1986). A weaker correlation is achieved when teammates can communicate only before the play (preplay communication). This setting can be modeled by resorting to a correlation device analogous to that for normal-form games. We formally define these two devices as follows (as customary, Δ(·) denotes the simplex over ·).

Definition 3 (Communication device) A communication device is a triple (H_T, A_T, R_Com) where H_T is the set of inputs (i.e., information sets) that teammates can communicate to the mediator, A_T is the set of outputs (i.e., actions) that the mediator can recommend to the teammates, and R_Com : 2^{H_T} × 2^{A_T} → Δ(A_T) is the recommendation function that associates each information set h ∈ H_T with a probability distribution over ρ(h), as a function of the information sets previously reported by the teammates and of the actions previously recommended by the mediator.

Definition 4 (Correlation device) A correlation device is a pair ({P_i}_{i∈T}, R_Cor), where R_Cor : ∏_{i∈T} P_i → Δ(∏_{i∈T} P_i) is the recommendation function, which returns a probability distribution over the reduced joint plans of the teammates.

Notice that, while a communication device provides its recommendations by drawing actions from probability distributions during the game, a correlation device does that only before the beginning of the game. Resorting to these definitions, we introduce the following solution concepts.

Definition 5 (Team-maxmin equilibrium variations) Given a communication device—or a correlation device—for the team, a Team-maxmin equilibrium with communication device (TMECom)—or a Team-maxmin equilibrium with correlation device (TMECor)—is a Nash equilibrium in which all teammates follow their recommendations and, only for the TMECom, report their information truthfully.

Notice that in our setting (i.e., zero-sum games), both the TMECom and the TMECor maximize the team's utility. We state the following, whose proof is straightforward.

Property 1 (Strategy space) The space of lotteries over the outcomes achievable by using a communication device includes that of the lotteries achievable by using a correlation device, which, in its turn, includes the space of the lotteries achievable without any device.

Let v_No, v_Com, v_Cor be the utility of the team at, respectively, the TME, the TMECom and the TMECor. From the property above, we can easily derive the following.

Property 2 (Equilibria utility) The game values obtained with the different solution concepts introduced above are such that v_Com ≥ v_Cor ≥ v_No.

In order to evaluate the inefficiency due to the impossibility of adopting a communication or correlation device, we resort to the concept of Price of Uncorrelation (PoU), previously introduced in (Basilico et al. 2017) as a measure of the inefficiency of the TME w.r.t. the TMECor in normal-form games. In these games, the PoU is defined as the ratio between the utility given by the TMECor and the utility given by the TME, once all the team's payoffs are normalized in [0, 1]. For extensive-form games, we propose the following variations of the PoU to capture all the possible combinations of the different forms of correlation.

Definition 6 (Inefficiency indices) PoU_Com/No = v_Com / v_No, PoU_Cor/No = v_Cor / v_No, PoU_Com/Cor = v_Com / v_Cor.

In perfect-information games all these indices assume a value of 1, the solution being unique (except for degeneracy) by backward induction. With imperfect information the indices can be larger than 1. In normal-form games, the tight upper bound to the PoU is m^(n−2), where m is the number of actions of each player and n is the number of players (Basilico et al. 2017). A definition based on m is not suitable for extensive-form games, where each player may have a different number of actions per node. Thus, we state the bounds in terms of |L| (i.e., the number of terminal nodes). The following three examples provide lower bounds to the worst-case values of the indices, showing that the inefficiency may be arbitrarily large in |L|.
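Once the three game values are known, the indices of Definition 6 are straightforward ratios. The snippet below is our own quick numerical illustration; the particular values v_Com = 1, v_Cor = 1/m, v_No = m^(2−n) are hypothetical inputs (of the kind produced by worst-case constructions), chosen to respect Property 2.

```python
# Illustrative computation (ours, not from the paper) of the inefficiency
# indices of Definition 6, for hypothetical game values with payoffs already
# normalized in [0, 1]: v_Com = 1, v_Cor = 1/m, v_No = m^(2-n).
def pou_indices(m, n):
    v_com, v_cor, v_no = 1.0, 1.0 / m, m ** (2 - n)
    return {"Com/No": v_com / v_no,
            "Cor/No": v_cor / v_no,
            "Com/Cor": v_com / v_cor}

idx = pou_indices(m=2, n=4)         # a team of three plus the adversary
print(idx)  # {'Com/No': 4.0, 'Cor/No': 2.0, 'Com/Cor': 2.0}
```

Note that PoU_Com/No = PoU_Cor/No · PoU_Com/Cor by construction, and all three indices are at least 1, consistently with v_Com ≥ v_Cor ≥ v_No.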

Figure 1: A game with a spy used in Example 1.

Initially, to ease the presentation, we define a specific type of team player that we call spy.

Definition 7 (Spy player) Player i ∈ T is said to be a spy if, for each h ∈ H_i, |ρ(h)| = 1 and h is a singleton.

A spy just observes the actual state of the game, and her contribution to the play is only due to her communication capabilities. Notice that the introduction of a spy after decision nodes of the adversary does not affect the team's utility in a TMECor (the team's joint plans are the same) but improves the team's capabilities, and final utility, in a TMECom.

Example 1 (Lower bound for worst-case PoU_Com/No) Consider a STSA-EF-TG with n players and m ≥ 2 actions for each player at every decision node, except for the first team player, who is a spy. The game tree is structured as follows (see Figure 1 for the case with n = 3).
• The adversary plays first;
• then the spy observes her move;
• each one of the other teammates is assigned one of the following levels of the game tree, and all her decision nodes are part of the same information set;
• U_T = 1 iff, for each i ∈ T\{1} and for each h ∈ H_i, the action chosen at h is equal to the one selected by A.
We have v_Com = 1, v_No = m^(2−n), and thus PoU_Com/No = m^(n−2). Since the tree structure is such that |L| = m^(n−1), we obtain PoU_Com/No = |L|^(1 − 1/(n−1)). Once |L| is fixed, the inefficiency is monotonically increasing in n, but n is upper bounded by n = log_2(|L|) + 1 (corresponding to the case in which each team player except the spy has the minimum number of actions, i.e., 2). It follows that, in the worst case w.r.t. n, PoU_Com/No = |L|/2.

Example 2 (Lower bound for worst-case PoU_Cor/No) Consider a STSA-EF-TG with n players and m actions at each of their decision nodes, in which each level of the game tree is associated with one player and forms a unique information set. U_T = 1 iff all the teammates choose the same action as the adversary, who plays first. This case corresponds to the worst case for the PoU in normal-form games. Here we formulate the bound in terms of |L|. We have v_No = m^(1−n) and v_Cor = 1/m. It follows that PoU_Cor/No = m^(n−2). This time, |L| = m^n and thus PoU_Cor/No = |L|^(1 − 2/n). The worst case w.r.t. n is reached when m = 2 and n = log_2(|L|). Therefore, PoU_Cor/No = |L|/4.

Example 3 (Lower bound for worst-case PoU_Com/Cor) Consider the game presented in Example 1. Since v_Com = 1 and v_Cor = 1/m, it follows that PoU_Com/Cor = m. The structure of the game tree is such that |L| = m^(n−1), and thus PoU_Com/Cor = |L|^(1/(n−1)). Notice that, in this case, the inefficiency is maximized when n = 3, which corresponds to having a team of two members. Thus, in the worst case w.r.t. n, PoU_Com/Cor = √|L|.

Finding a TMECom

We show that there is a polynomial-time TMECom-finding algorithm. Indeed, we prove that the problem of finding a TMECom is equivalent to finding a 2-player maxmin strategy in an auxiliary 2-player game with perfect recall, and that the auxiliary game can be built in polynomial time.

First, we define the structure of the auxiliary game we use. Let Γ be an extensive-form game and Q = ∪_{i∈N} Q_i. We define the following functions. Function lead : V ∪ L → 2^Q returns the sequence profile constituting the path from the root to a given node of the tree. Function path : V × 2^N → 2^Q is s.t., for each x ∈ V and each set of players G ⊆ N,

path(x|G) = { q ⊂ ∪_{i∈G} Q_i | ∃ q′ ⊂ ∪_{i∈N\G} Q_i such that q ∪ q′ = lead(x) }.

Intuitively, path(x|G) returns the unique profile of sequences of the players in G leading to x when combined with some sequences of the players in N\G.

The following definition describes the information structure of the auxiliary extensive-form game.

Definition 8 (G-observable game) For any game Γ = (N, A, V, L, ι, ρ, χ, H) and any set of players G ⊆ N, the G-observable game Γ̂ is a tuple (N, A, V, L, ι, ρ, χ, Ĥ), where Ĥ = (∪_{i∈G} Ĥ_i) ∪ (∪_{i∈N\G} H_i) is such that:
1. for each decision node x ∈ V, there exists one and only one ĥ ∈ Ĥ s.t. x ∈ ĥ and ι(h) = ι(ĥ), where h denotes the information set containing x in Γ;
2. for each player i ∈ G, Ĥ_i is the set with the lowest possible cardinality s.t., for each ĥ ∈ Ĥ_i and for each pair of decision nodes x, x′ ∈ ĥ, it holds: (path(x|G) = path(x′|G)) ∧ (∃h ∈ H_i | x ∈ h ∧ x′ ∈ h).

In a G-observable extensive-form game, players belonging to G are fully aware of the moves of the other players in G and share the same information on the moves taken by the players in N\G. We show that we can build Γ̂ in polynomial time.

Lemma 1 (T-observable game construction) The T-observable game Γ̂ of a generic STSA-EF-TG Γ can be computed in polynomial time.

Proof. We provide the sketch of an algorithm to build a T-observable game (i.e., a G-observable game with G = T) in time and space polynomial in the size of the game tree. The algorithm employs nested hash-tables. The first hash-table associates each joint sequence of the team with another

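The nested hash-table construction in the proof sketch of Lemma 1 can be illustrated in a few lines. This is our own sketch, not the authors' code: nodes are encoded as hypothetical `(original_infoset, joint_sequence)` pairs, standing in for what a depth-first traversal of the game tree would produce.

```python
# Illustrative sketch (ours) of the nested hash-table idea in Lemma 1: two team
# decision nodes share an information set of the T-observable game iff they
# share both the original information set and the team's joint sequence
# path(x|T).  Node encodings below are hypothetical.
def t_observable_infosets(team_nodes):
    table = {}        # path(x|T) -> {original information set -> new id}
    assignment = {}   # node -> information-set id in the T-observable game
    next_id = 0
    for node, (h, joint_seq) in team_nodes.items():
        inner = table.setdefault(joint_seq, {})
        if h not in inner:            # unseen (sequence, infoset) pair: new id
            inner[h] = next_id
            next_id += 1
        assignment[node] = inner[h]
    return assignment

# x1 and x2 share infoset h1 and the same joint team sequence, so they stay
# together; x3 also lies in h1 but was reached via a different joint sequence,
# so the T-observable game splits it into a fresh information set.
nodes = {"x1": ("h1", ("a",)), "x2": ("h1", ("a",)), "x3": ("h1", ("b",))}
print(t_observable_infosets(nodes))  # {'x1': 0, 'x2': 0, 'x3': 1}
```

With hashing, each node is processed in (expected) constant time during the traversal, which is consistent with the polynomial bound claimed by the lemma.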
hash-table, which is indexed over information sets and has as value the information-set id to be used in Γ̂. Γ is traversed in a depth-first manner while keeping track of the sequence leading to the current node. For each x ∈ V s.t. ι(x) ∈ T, a search/insertion over the first hash-table is performed by hashing path(x|T). Then, once the sequence-specific hash-table is found, the information set is assigned a new id if it is not already present as a key. Γ̂ is built by associating to each decision node of the team a new information set as specified in the hash-table. The worst-case running time is O(|V|²). □

Theorem 2 (TMECom computation) Given a STSA-EF-TG and a communication device for T, the unique (except for degeneracy) TMECom can be found in polynomial time.

Proof. Given a STSA-EF-TG Γ, the use of a communication device for the team T changes the information structure of the game, inducing a T-observable game Γ̂. A TMECom can be computed over Γ̂ as follows. Given a communication device (H_T, A_T, R_Com), R_Com enforces a probability distribution γ over the set of feedback rules. γ is chosen in order to maximize the expected utility of the team. In this setting, no incentive constraints are required because the teammates share the same utility function and therefore, under the hypothesis that γ maximizes it, it is in their best interest to follow the recommendations sent by the device and to report their information truthfully. Thus, considering the function path to be defined over information sets and Ĥ_T = ∪_{i∈T} Ĥ_i, γ reduces to a distribution over rules of type {β = (β_h)_{h∈Ĥ_T} | β_h : path(h|T) → ρ(h), ∀h ∈ Ĥ_T}. We are left with an optimization problem in which we have to choose γ s.t. the worst-case utility of the team is maximized. This is equivalent to a 2-player maxmin problem over Γ̂ between A and a player playing over the team's joint sequences. By construction, the team player has perfect recall, and thus the maxmin problem can be formulated as an LP in sequence form, requiring polynomial time. □

Finding a TMECor

We initially focus on the computational complexity of the problem of searching for a TMECor.

Theorem 3 (TMECor complexity) Finding a TMECor is FNP-hard when there are two teammates, each with an arbitrary number of information sets, or when there is an arbitrary number of teammates, each with one information set.

The first result directly follows from the reduction presented in (von Stengel and Forges 2008, Theorem 1.3), since the game instances used in the reduction are exactly STSA-EF-TGs with 2 teammates. The second result can be proved by adapting the reduction described in (Koller and Megiddo 1992, Proposition 2.6), assigning each information set of the game instances to a different teammate.

In principle, a TMECor can be found by casting the game in normal form and then searching for a Team-maxmin equilibrium with correlated strategies. This latter equilibrium can be found in polynomial time in the size of the normal form, which, however, is given by P_1 × ... × P_n, where each P_i is exponentially large in the size of the tree. We provide here a more efficient method that can also be used in an anytime fashion, without requiring any exponential enumeration before the execution of the algorithm. In our method, we use a hybrid representation that, to the best of our knowledge, has not been used in previous works.

Hybrid representation. In our representation, A's strategy is represented in sequence form, while the team plays over jointly-reduced plans, as formally defined below. Given a generic STSA-EF-TG Γ, let us denote with P_r = {P_r,1, ..., P_r,n} the set of actions of the reduced normal form of Γ, where P_r,i is the set of reduced plans for player i. Therefore, ∏_{i∈T} P_r,i is the set of joint reduced plans of the team. Let the function terminal : Q_A × ∏_{i∈T} P_r,i → L ∪ {∅} be s.t. it returns, for a given pair (q_A, p), the terminal node reached when the adversary plays q_A and the team members, at each of their information sets, play according to p. If no terminal node is reached, ∅ is returned. We define equivalence classes over ∏_{i∈T} P_r,i by the relation ∼:

Definition 9 The equivalence relation ∼ over ∏_{i∈T} P_r,i is s.t., given p_1, p_2 ∈ ∏_{i∈T} P_r,i, p_1 ∼ p_2 iff, for each q_A ∈ Q_A, terminal(q_A, p_1) = terminal(q_A, p_2).

Definition 10 (Jointly-reduced plans) The set of jointly-reduced plans P_jr ⊆ ∏_{i∈T} P_r,i is obtained by picking exactly one representative from each equivalence class of ∼.

The team's utility function is represented by the sparse |Q_A| × |P_jr| matrix U_h. Given a pair (q_A, p_jr) ∈ Q_A × P_jr, U_T(terminal(q_A, p_jr)) is stored in U_h iff terminal(q_A, p_jr) ≠ ∅. Notice that U_h is well defined since each pair (q_A, p_jr) leads to at most one terminal node.

Let σ_T denote the team's strategy over P_jr. The problem of finding a TMECor in our hybrid representation can be formulated as the following LP, named HYBRID-MAXMIN:

arg max_{σ_T, v}  Σ_{h∈H_A∪{h_∅}} f_A(h) v(h)   s.t.
  Σ_{h∈H_A∪{h_∅}} F_A(h, q_A) v(h) − Σ_{p∈P_jr} U_h(q_A, p) σ_T(p) ≤ 0   ∀q_A ∈ Q_A
  Σ_{p∈P_jr} σ_T(p) = 1
  σ_T(p) ≥ 0   ∀p ∈ P_jr

composed of |Q_A| + 1 constraints (not counting the σ_T(p) ≥ 0 constraints) and an exponential number of variables σ_T. Thus, we can state the following proposition.

Proposition 1 There exists at least one TMECor in which the number of joint plans played with strictly positive probability by the team is at most |Q_A|.

Proof. The above LP admits a basic optimal solution with at most |Q_A| + 1 variables with strictly positive values (Shapley and Snow 1950). Since v is always in the basis (indeed, we can add a constant to make the team's utility in each terminal node strictly positive without affecting equilibrium strategies), the joint plans in the basis are at most |Q_A|. □

Proposition 1 shows that the NP-hardness of the problem is merely due to guessing the jointly-reduced plans played
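In general, HYBRID-MAXMIN requires an LP solver; but in the degenerate one-shot case—where the adversary's sequence-form constraints reduce to picking one of her pure sequences and the team has only two jointly-reduced plans—the maxmin can be computed exactly by hand. The sketch below is ours, with a hypothetical toy matrix U_h; it exploits the fact that the worst-case team payoff is a concave piecewise-linear function of the single mixing weight, so its maximum is attained at an endpoint or at an intersection of two column payoff lines.

```python
# Tiny sketch (ours) of the maxmin computation behind HYBRID-MAXMIN in the
# degenerate case: sigma_T = (t, 1-t) over two jointly-reduced plans, columns
# of U are the adversary's pure sequences; maximize min over columns.
def two_plan_maxmin(U):
    def worst(t):
        return min(t * U[0][j] + (1 - t) * U[1][j] for j in range(len(U[0])))
    candidates = {0.0, 1.0}
    for j in range(len(U[0])):
        for k in range(j + 1, len(U[0])):
            # intersection of the payoff lines of columns j and k, if any
            denom = (U[0][j] - U[1][j]) - (U[0][k] - U[1][k])
            if denom != 0:
                t = (U[1][k] - U[1][j]) / denom
                if 0.0 <= t <= 1.0:
                    candidates.add(t)
    t_star = max(candidates, key=worst)
    return t_star, worst(t_star)

# The team scores 1 only when its plan "matches" the adversary's sequence:
print(two_plan_maxmin([[1, 0], [0, 1]]))  # (0.5, 0.5)
```

The general problem keeps this maxmin structure but with exponentially many plans—exactly why the paper resorts to column generation instead of enumerating P_jr.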

with strictly positive probability in a TMECor. Thus, we can avoid enumerating P_jr entirely before executing the algorithm by working with a subset of jointly-reduced plans built progressively, in a classical column-generation fashion (see, e.g., (McMahan, Gordon, and Blum 2003)).

Column-generation algorithm. The pseudocode is given in Algorithm 1. It receives in input the game tree and the sequence-form constraint matrices F_i of all the players (Line 1). Then, the algorithm is initialized, assigning a matrix of zeros to U_h, an empty set to P_cur, and 0 to v (Line 2). Notice that U_h is sparse, and therefore its representation requires space equal to the number of non-null entries. r_A is initialized as a realization plan equivalent to a uniform behavioral mixed strategy, i.e., the adversary, at each information set, randomizes uniformly over all the available actions (Line 3). Then, the algorithm calls BR-ORACLE (defined below) to find the best response of the team given the adversary's strategy r_A (Line 4). Lines 6–10 are repeated until an optimal solution is found. Initially, br is added to P_cur (Line 6), and the players' utilities at the nodes reached by (q_A, br) for every q_A are added to U_h (Line 7). Then, the algorithm solves the maxmin (HYBRID-MAXMIN) and minmax (HYBRID-MINMAX) problems restricted to P_cur (Lines 8 and 9), where the HYBRID-MINMAX problem is defined as:

arg min_{r_A, v}  v   s.t.
  v − Σ_{q∈Q_A} U_h(q, p_jr) r_A(q) ≥ 0   ∀p_jr ∈ P_jr
  Σ_{q∈Q_A} F_A(h, q) r_A(q) = f_A(h)   ∀h ∈ H_A ∪ {h_∅}
  r_A(q) ≥ 0   ∀q ∈ Q_A

Finally, the algorithm calls BR-ORACLE to find the best response to r_A (Line 10).

Algorithm 1 Hybrid Column Generation
1: function HYBRID-COL-GEN(Γ, F_1, ..., F_{n−1}, F_A)   ▷ Γ is a generic STSA-EF-TG and the F_i are sequence-form constraint matrices
2:   U_h ← 0, P_cur ← ∅, v ← 0   ▷ initialization
3:   r_A ← realization plan equivalent to a uniform behavioral mixed strategy
4:   br ← BR-ORACLE(Γ, {F_1, ..., F_{n−1}}, r_A)   ▷ call to the oracle
5:   while br ∉ P_cur do
6:     P_cur ← P_cur ∪ {br}
7:     players' utilities in (q_A, br) for every q_A are added to U_h
8:     σ_T ← solve the HYBRID-MAXMIN problem with (U_h, P_cur, F_A)
9:     r_A ← solve the HYBRID-MINMAX problem with (U_h, P_cur, F_A)
10:    br ← BR-ORACLE(Γ, {F_1, ..., F_{n−1}}, r_A)
11:  return (r_A, σ_T)

Best-response oracle. Given a generic STSA-EF-TG Γ, we denote by BR-T the problem of finding the best response of the team against a given fixed realization plan r_A of the adversary over Γ. This problem is shown to be NP-hard by the reduction used for (von Stengel and Forges 2008, Theorem 1.3), where we can interpret the initial chance move as the fixed strategy of the adversary. The column-generation oracle solving BR-T can be formulated as the following integer linear program (ILP):

arg max_{r_1,...,r_{n−1}, x}  Σ_{l∈L} U_T(l) x(l) r_A(path(l|{n}))   s.t.
  Σ_{q_i∈Q_i} F_i(h, q_i) r_i(q_i) = f_i(h)   ∀i ∈ T, ∀h ∈ H_i ∪ {h_∅}
  x(l) ≤ r_i(q_i)   ∀i ∈ T, ∀l ∈ L, ∀q_i ∈ path(l|{i})
  x(l) ∈ {0, 1}   ∀l ∈ L

where x(l) is a binary variable equal to 1 iff, for all the sequences q_i ∈ Q necessary to reach l, it holds r_i(q_i) = 1. Notice that the oracle returns a pure realization plan for each of the teammates. The team's best response is a jointly-reduced plan that can be derived as follows. Denote with Q_i^L the set of sequences played with probability one by i that are not subsets of any other sequence played with positive probability. Let p′_i be the reduced normal-form plan of player i specifying all and only the actions played in the sequences belonging to Q_i^L. The joint plan p′ = (p′_1, ..., p′_{n−1}) is s.t. p′ ∈ P_jr.

A simple approximation algorithm can be obtained by a continuous relaxation of the binary constraints x(l) ∈ {0, 1}. The resulting mathematical program is linear and therefore solvable in polynomial time. An approximate solution can then be obtained by randomized rounding (Raghavan and Tompson 1987). When considering game trees encoding MAX-SAT instances (see the proof of Theorem 4), the approximation algorithm matches the ratio guaranteed by randomized rounding for MAX-SAT (details are given in the full version of the paper).

We can strengthen the above hardness result as follows (the proofs are provided in the full version of the paper, available on the arXiv website):

Theorem 4 BR-T is APX-hard.

Let α(·) ∈ [0, 1] be the best approximation bound of the maximization problem (·).

Theorem 5 Denote with BR-T-h the problem BR-T over STSA-EF-TG instances of fixed maximum depth 3h and branching factor variable at each decision node; it holds:

Finding a TME

We recall that finding a TME is hard, since it is hard even in normal-form games (Hansen et al. 2008).
αBT-T-h ď pαMAX-SAT qh . Theorem 6 (TME complexity) Finding a TME is FNP-
hard and its value is inapproximable in additive sense even
This means that the upper bound on the approximation factor with binary payoffs.
decreases exponentially as the depth of the tree increases3 .
The problem of finding a TME can be formulated as the fol-
3
Notice that αMAX-SAT ď 7{8, see (Håstad 2001). lowing non-linear mathematical programming problem:

[Figure 2: three panels (PoU_Com/Cor, PoU_Com/No, PoU_Cor/No) plotting the average empiric inefficiency index against the tree depth for ν ∈ {0.25, 0.5, 0.75, 1.0}.]

Figure 2: Average empiric inefficiency indices with 3 players and some values of ν.
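As an aside, the control flow of the column-generation procedure of Algorithm 1 can be illustrated with a small self-contained sketch. The toy payoff table, the brute-force grid search standing in for the HYBRID-MAXMIN/HYBRID-MINMAX LPs, and all identifiers below are illustrative assumptions, not the paper's sequence-form implementation; the team strategy σ_T is omitted for brevity.

```python
# Toy illustration of the double-oracle / column-generation loop of
# Algorithm 1. The payoff table and the brute-force minmax below are
# illustrative stand-ins for the paper's sequence-form LPs.

# U[plan][a]: team utility when the team plays pure joint plan `plan`
# and the adversary plays action a (a toy game whose value is 0.6).
U = {
    "plan0": [1.0, 0.0],
    "plan1": [0.0, 1.0],
    "plan2": [0.6, 0.6],
}
N_ADV_ACTIONS = 2

def best_response(r_adv):
    """Stand-in for BR-ORACLE: pure plan maximizing expected utility."""
    return max(U, key=lambda plan: sum(p * u for p, u in zip(r_adv, U[plan])))

def restricted_minmax(plans, grid=10001, tol=1e-9):
    """Adversary strategy minimizing the best team payoff among `plans`
    (brute force over a grid; in the paper this is the HYBRID-MINMAX LP)."""
    best_r, best_v = None, float("inf")
    for i in range(grid):
        p = i / (grid - 1)
        r = [p, 1.0 - p]
        v = max(sum(pr * u for pr, u in zip(r, U[plan])) for plan in plans)
        if v < best_v - tol:  # tolerance keeps the search deterministic
            best_r, best_v = r, v
    return best_r, best_v

def column_generation():
    plans = set()
    r_adv = [1.0 / N_ADV_ACTIONS] * N_ADV_ACTIONS  # uniform initialization
    br = best_response(r_adv)                      # Line 4
    while br not in plans:                         # Lines 5-10
        plans.add(br)
        r_adv, value = restricted_minmax(plans)
        br = best_response(r_adv)
    return r_adv, value, plans                     # team strategy omitted

r_adv, value, plans = column_generation()
print(value, sorted(plans))
```

On this toy instance the loop generates the two pure plans needed to support the adversary's minmax strategy and stops as soon as the oracle returns a plan already in the current set, with the restricted minmax value converging to the game value 0.6.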

[Figure 3: compute times in seconds (log scale) of TME, TMECor, and TMECom plotted against the tree depth.]

Figure 3: Average compute times of the algorithms and their box plots with 3 players and ν “ 0.5.
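The random game generator used in these experiments (described under Experimental setting) might be sketched roughly as follows. This is a simplified reconstruction: unlike the paper's generator, it fixes binary branching inline, draws the acting player uniformly, and does not enforce perfect recall; all names are illustrative.

```python
import random

# Simplified sketch of a random game generator in the spirit of the one
# in the Experimental setting: complete binary tree of depth `depth`,
# each decision node assigned to a random player and, with probability
# `nu`, merged into one of that player's existing information sets
# (nu = 0 yields perfect information). Perfect recall is NOT enforced
# here, which the paper's generator does guarantee.

def generate_game(n_players, depth, nu, seed=0):
    rng = random.Random(seed)
    infosets = {i: [] for i in range(n_players)}  # player -> list of infosets
    leaves = []                                   # (node id, payoff) pairs

    def build(node_id, d):
        if d == depth:
            leaves.append((node_id, rng.random()))  # payoff ~ U[0, 1]
            return
        player = rng.randrange(n_players)
        if infosets[player] and rng.random() < nu:
            # join an existing information set of this player
            infosets[player][rng.randrange(len(infosets[player]))].append(node_id)
        else:
            infosets[player].append([node_id])      # open a new information set
        build(2 * node_id + 1, d + 1)
        build(2 * node_id + 2, d + 1)

    build(0, 0)
    return infosets, leaves

infosets, leaves = generate_game(n_players=3, depth=5, nu=0.5)
print(len(leaves), sum(len(s) for sets in infosets.values() for s in sets))
```

With depth 5 and binary branching the sketch yields 2^5 = 32 terminal nodes and 31 decision nodes, each belonging to exactly one information set; setting nu = 0 makes every information set a singleton.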

  max_{r_1, …, r_{n−1}} v(h_∅)  s.t.
    Σ_{h ∈ H_A ∪ {h_∅}} F_A(h, q_A) v(h) ≤ Σ_{q_T ∈ Q_T} U_T(q_T, q_A) Π_{i ∈ T} r_i(q_T(i))    ∀ q_A ∈ Q_A
    Σ_{q_i ∈ Q_i} F_i(h, q_i) r_i(q_i) = f_i(h)    ∀ i ∈ T, ∀ h ∈ H_i ∪ {h_∅}
    r_i(q_i) ≥ 0                                   ∀ i ∈ T, ∀ q_i ∈ Q_i

where Q_T is the set of the team's joint sequences and q_T(i) identifies the sequence of player i in q_T. This program can be solved exactly, within a given numerical accuracy, by means of global optimization tools in exponential time.

Experimental Evaluation

Experimental setting. Our experimental setting is based on randomly generated STSA-EF-TGs. The random game generator takes as inputs: the number n of players, a probability distribution over the number of actions available at each information set, the maximum depth d of the tree, and a parameter ν for tuning the information structure of the tree. Specifically, this parameter encodes the probability with which a newly created decision node, once it has been randomly assigned to a player, is assigned to an existing information set (thus, when it is equal to 0 the game is with perfect information), while guaranteeing perfect recall for every player. Finally, the payoffs associated with the terminal nodes are randomly drawn from a uniform distribution over [0, 1]. We generate 20 game instances for each combination of the following parameters' values: n ∈ {3, 4, 5}, d ∈ {n, …, 15} with step size 1 (i.e., for games with 5 players, d ∈ {5, 6, …, 15}), and ν ∈ {0.0, 0.25, 0.5, 0.75, 1.0}. For simplicity, we fix the branching factor to 2 (this value allows us to maximize d and it is also the worst case for the inefficiency indices). The algorithms are implemented in Python 2.7.6, adopting GUROBI 7.0 for LPs and ILPs, and AMPL 20170207 with the global optimization solver BARON 17.1.2 (Tawarmalani and Sahinidis 2005). We set a time limit of 60 minutes for the algorithms. All the algorithms are executed on a UNIX computer with a 2.33GHz CPU and 128 GB of RAM. We discuss the main experimental results with 3 players below, while the results with more players are provided in the full version of the paper. Since the computation of the TMECor from the reduced normal form is impractical for d ≥ 5, we use only Algorithm 1 employing the exact oracle (which proved very fast on every instance).

Empirical PoUs. We report in Fig. 2 the average empiric inefficiency indices with 3 players for some values of ν. We observe that, although the theoretical worst-case value increases in L, the empiric increase, if any, is negligible. For instance, the worst-case value of PoU_Com/Cor with n = 3 and L = 2^11 is greater than 45, while the average empiric value is less

than 2. We also observe that the inefficiency increases in ν, suggesting that it may be maximized in normal-form games.

Compute time. We report in Fig. 3 the average compute times of the algorithms and their box plots with 3 players and ν = 0.5 (the plot includes the instances reaching the time limit, as this does not affect the presentation of the results). As expected, the TMECom computation scales well, allowing one to solve games with more than 16,000 terminal nodes within the time limit. The performance of Algorithm 1 (TMECor) is remarkable, since it solves games with more than 2,000 terminal nodes within the time limit and presents a narrow box plot, meaning that the variance in the compute time is small. Notice that, with d ≤ 10, the compute times of TMECom and TMECor are comparable, even if computing a TMECor is hard while a TMECom can be found in polynomial time. As expected, the TME computation does not scale well and its compute time is extremely variable among different instances.

Conclusions

In this paper, we focus on extensive-form team games with a single adversary. Our main contributions include the definition of game models employing different correlation devices and of their suitable solution concepts. We study the inefficiency a team incurs when employing the various forms of correlation, providing lower bounds on the worst-case values of the inefficiency indices that are arbitrarily large in the size of the game tree. Furthermore, we study the complexity of finding the equilibria, and we provide exact algorithms. Finally, we experimentally evaluate the scalability of our algorithms and the empirical equilibrium inefficiency in random games. In the future, it would be interesting to study approximate equilibrium-finding algorithms in order to reach improved scalability in all three correlation scenarios.

References

Aumann, R. 1974. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics 1(1):67–96.

Basilico, N.; Celli, A.; De Nittis, G.; and Gatti, N. 2017. Team-maxmin equilibrium: efficiency bounds and algorithms. In AAAI.

Borgs, C.; Chayes, J. T.; Immorlica, N.; Kalai, A. T.; Mirrokni, V. S.; and Papadimitriou, C. H. 2010. The myth of the folk theorem. Games and Economic Behavior 70(1):34–43.

Conitzer, V., and Sandholm, T. 2006. Computing the optimal strategy to commit to. In ACM EC, 82–90.

Forges, F. 1986. An approach to communication equilibria. Econometrica 1375–1385.

Gatti, N.; Patrini, G.; Rocco, M.; and Sandholm, T. 2012. Combining local search techniques and path following for bimatrix games. In UAI, 286–295.

Hansen, K. A.; Hansen, T. D.; Miltersen, P. B.; and Sørensen, T. B. 2008. Approximability and parameterized complexity of minmax values. In WINE, 684–695.

Håstad, J. 2001. Some optimal inapproximability results. Journal of the ACM (JACM) 48(4):798–859.

Koller, D., and Megiddo, N. 1992. The complexity of two-person zero-sum games in extensive form. Games and Economic Behavior 4(4):528–552.

Kuhn, H. W. 1953. Extensive games and the problem of information. Princeton University Press. 193–216.

Lemke, C. E., and Howson, J. T., Jr. 1964. Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics 12(2):413–423.

McMahan, H. B.; Gordon, G. J.; and Blum, A. 2003. Planning in the presence of cost functions controlled by an adversary. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 536–543.

Nisan, N.; Roughgarden, T.; Tardos, E.; and Vazirani, V. 2007. Algorithmic Game Theory, volume 1. Cambridge University Press.

Raghavan, P., and Tompson, C. D. 1987. Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica 7(4):365–374.

Selten, R. 1975. Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory 4(1):25–55.

Shapley, L. S., and Snow, R. N. 1950. Basic solutions of discrete games. Annals of Mathematics Studies 24:27–35.

Shoham, Y., and Leyton-Brown, K. 2009. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press.

Tambe, M. 2011. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press.

Tawarmalani, M., and Sahinidis, N. V. 2005. A polyhedral branch-and-cut approach to global optimization. Mathematical Programming 103:225–249.

von Stengel, B., and Forges, F. 2008. Extensive-form correlated equilibrium: definition and computational complexity. Mathematics of Operations Research 33(4):1002–1022.

von Stengel, B., and Koller, D. 1997. Team-maxmin equilibria. Games and Economic Behavior 21(1):309–321.

von Stengel, B. 1996. Efficient computation of behavior strategies. Games and Economic Behavior 14(2):220–246.

