Ipsj GPWS2018022
Ipsj GPWS2018022
Ipsj GPWS2018022
only this terminal history and information sets regularly. Therefore, computing and storing
along this history will be updated. In external- the average strategies can be omitted, lead-
sampling MCCFR, only the actions of the op- ing to a reduction of the computational cost.
ponent and chance are sampled. Empirically, By running the implementation on a clus-
MCCFR converges faster than CFR [4]. ter of 200 nodes each with 24 cores for 68.5
days, a close approximation to a Nash equi-
3.3 PCS librium is finally reached, making Heads-up
Johanson et al. described new sampling Limit Hold’em poker weakly solved [3].
techniques to sample a subset of chance nodes,
resulting in slower but more accurate itera- 3.5 Pure CFR
tions [2]. Chance nodes are partitioned into Pure CFR is a new sampling algorithm de-
three sets by observability: observable only scribed in Gibson’s PhD dissertation [5]. Pure
by the player, observable only by the oppo- CFR samples a pure strategy profile ŝt from
nent, and observable by both players. The the current profile σ t , and a pure chance dis-
first two are considered to be private and the tribution ŝtc from chance player’s outcomes σc .
last one is considered to be public. There are For each information set I,
three variants: Self-Public Chance Sampling
(SPCS), Opponent-Public Chance Sampling Prob[ŝ(I) = a] = σ(I, a) for all a ∈ A(I)
(OPCS), and Public-Chance Sampling (PCS). (14)
In SPCS, player’s and public chance nodes Then the estimated counterfactual value
are sampled; in OPCS, opponent’s and public will be
chance nodes are sampled while in PCS only X
ŝ
public chance nodes are sampled. Empirically, v̂i (I, σ) = ui (z)π−i (z[I])π ŝ (z[I], z)
z∈ZI
Public Chance Sampling converges faster than
(15)
Chance Sampling on large games [2].
When the utilities of the game are integers,
3.4 CFR+ all computations in Pure CFR can be done
Bowling et al. introduced a new algorithm with integers. The computing procedure of
called CFR+ [3]. While CFR implementations integers can be faster than float numbers and
typically sample only a part of the game tree, storing integer values reduces memory cost by
CFR+ does exhaustive iterations over the en- half. These benefits enable Pure CFR to be
tire game tree and uses a different method for applied to games of a larger scale [5].
regret matching called regret matching+ . It 4. The Game of Geister
maintains an alternative counterfactual regret
4.1 The Original Game
value Qt (I, a) for every information set and its
Geister, which is also referred to as Ghosts
actions, which is updated as
or Phantoms vs phantoms, is a board game de-
Q0 (I, a) = 0 (10) signed by Alex Randolph for two players [6].
Qt (I, a) = (11) The game was the winner of Årets Spel Best
+ Family Game in 1986 and has been recom-
Qt−1 (I, a) + Rt (I, a) − Rt−1 (I, a) mended in Spiel des Jahres in 1982 [7]. The
(12) game takes place on a 6×6 square grid game
board. Each player has four good ghosts and
where x+ = max(x, 0).
four evil ghosts assembled in two rows, and
The strategy for time t + 1 is calculated as
their types are not revealed to the opponent
Qti (I,a)
player [6]. In each turn a player can move one
a∈A(I) Qti (I,a) ,
P
of his ghosts one step vertically or horizontally.
σit+1 (I)(a) = if a∈A(I) Qti (I, a) > 0
P
Moving into a square containing an opponent’s
1 , otherwise
|A(I)| ghost will kill the opponent’s ghost. A player
(13) will win if one of the three constraints is sat-
+
Empirically, with CFR , player’s current isfied [7]:
strategies also approximate an equilibrium • All the player’s evil ghosts are killed.
• All the opponent player’s good ghosts are 5 1600 six core@3.4GHz CPU using PyPy3
killed. v6.0.0.
• One of the player’s good ghosts is moved We assume that when playing a board
off the board from one of the opponent game, the situation of the board often con-
player’s corner squares. tains enough information for an experienced
player to decide the next move. Since Geister
4.2 The Simplified Game is a board game, for simplicity, we adopt the
As the game complexity increases exponen- players’ vision of the game board as the infor-
tially with the size of the game board, it is mation set, which includes no history memory.
hard to solve the full game of Geister with As is said in Section 4.2, board games have a
normal computing resources. Due to the limi- problem of infinite loops. We restricted play-
tation of computing resources, our research on ers’ moving turns to see the expansion of the
Geister is based on a reduced game board with game tree, i.e., the number of total informa-
small changes in the rules. Now we introduce a tion sets and the time needed to run the pro-
simplified version of Geister, Geister(2×3, 2). gram for a certain number of iterations. We
Geister(2×3, 2) is a two-player board game denote one player moving one of his ghosts as
that takes place on a 2×3 board with 1 good one move, and two players making their moves
ghost and 1 evil ghost for each player. Other respectively as one turn. We set the terminal
rules are identical to those of the full game, state utilities such that 1 for a win, -1 for a
except that: loss and 0 if maximum move limit is reached.
• The game of Geister does not have any
chance nodes. In order to implement it 5.1 Chance-sampled CFR
with CFR, we added a chance node at the Chance-sampled CFR program is imple-
beginning of the game, which decides the mented according to the pseudocode and the
positions of players’ ghosts randomly. It Java source code written in the study [8]. An
means that at the beginning of the game, iteration is to run the recursive CFR func-
every player gets a random ghost arrange- tion for one time. For chance-sampled CFR,
ment rather than arranging them by him- chance node events are sampled before each
self. iteration.
• CFR algorithm calculates all possible We first set moving turn limits from 1 to
routes until the end of the game. How- 10 (one turn contains two moves, one move
ever, in board games, infinite loops are in- for each player, so total move limits are 2 to
evitable. To avoid infinite loops, we add a 20), and ran the program for 100 iterations
constraint that if each player has played a to observe the changes in execution time and
certain number of turns, the game is ended the total number of information sets. For ev-
forcibly with a draw. ery different turn limit, the program was run
• Due to the limited size of the game board, from the beginning. Execution time is the real
the corners of the board are not removed. time between the starting and the ending of
To win the game by moving good ghosts total training iterations. Time is calculated
off the board, players must move the good in Python code by time package. Results are
ghost to the corner first and then move it shown in Figure 1.
off at the next turn. We can see from Figure 1 that when turn
5. Experiments limit is greater than or equal to 7 (14 total
moves), the number of information sets does
We implemented the game of Geister and not increase anymore. It suggests that after
variants of CFR algorithms using Python lan- 7 turns, all possible nodes of the game tree
guage. To accelerate the execution of the pro- have been visited by the program. We can
gram, we ran the program with PyPy, which is also conclude that the execution time grows
a fast interpreter of the Python language with exponentially with move limit.
Just-in-Time compiler. All our programs are Due to the limitations of computing re-
executed on a single thread of AMDr Ryzen sources, turn limit over 8 will take too much
0.575
Win rate
100
Time/s
600
75 0.550
400
50 0.525
200
25 0.500
0 0 0.475
2 4 6 8 10 12 14 16 18 20
Move limit 0.450
2 4 6 8 10 12 14 16
Execution time and number of information sets - cscfr Move limit
175 Time for 100 iterations 103 with Random with move limit=10
with move limit=2 with move limit=12
150 # of infosets with move limit=4 with move limit=14
with move limit=6 with move limit=16
125 102 with move limit=8
# of infosets
100 Time/s
Fig. 2 Win rate against other players - cscfr
75 101
50 We can observe from the figure that win
25 100 rate against random player is quite high when
0 move limit is small, and decreases when move
2 4 6 8 10 12 14 16 18 20 limit increases, and then increase again. How-
Move limit
ever, self-play between agents shows an oppo-
Fig. 1 Execution time and number of infosets - CSCFR
site trend. To find the reason for this result,
we looked into the first step strategy every
time to converge. For turn limits from 1 to agent makes. The probability of choosing to
8, we run the program for 100 000 iterations move the evil ghost for each agent is shown in
respectively. Each program was run from the Figure 3.
beginning.
According to the average strategy profile
Probability to move the evil ghost at first step - cscfr
generated under different turn limits, we made 1.0
an agent for each strategy profile and per-
formed self-plays between agents and the ran- 0.9
dom player. Self-plays do not limit turns. If
Probability
# of infosets
agent finally finds its right way to go. When
100
Time/s
move limits are too large, although informa- 150
tion sets are fully visited, the long history of 75
100
the game contains too many loops. Our im- 50
plementation of CFR does not include history 50
25
memory in the information set, so these loops Time for 10000 iterations
0 0
may confuse the agents. 2 4 6 8 10 12 14 16 18 20
Move limit
Also, limitations of hardware resources also
Fig. 4 Execution time and number of infosets - purecfr
prevented us from running more iterations or
doing experiments on larger move limits. It is We generated Pure CFR agents for turn lim-
possible that the strategy profile we get from its from 1 to 8. For every move limit, the Pure
100 000 iterations is not fully converged, which CFR algorithm was executed for 1 000 000 it-
may also affect the strategies generated by the erations. We also performed self-play between
agents. agents and the random player. Other rules are
the same as these in Section 5.1. The result is
5.2 Pure CFR
shown in Figure 5.
Pure CFR program is implemented accord-
ing to the pseudocode written in the study [5].
Win rate against random player - purecfr
An iteration is to run the recursive CFR func- 0.65
tion for both players once respectively. The
chance node events are also sampled before 0.60
each iteration.
0.55
Win rate
40000 be.
30000
Acknowledgement
20000
10000 A part of this work was supported by JSPS
0 KAKENHI Grant Number 16H02927 and by
2 4 6 8 10 12 14 16 JST, PRESTO.
Move limit
Fig. 7 Total execution time References
[1] Zinkevich, M., Johanson, M., Bowling, M. and Piccione,
C.: Regret Minimization in Games with Incomplete In-
After all, as agents battling with the ran- formation, Advances in Neural Information Processing
Systems 20, pp. 1729–1736 (2008).
dom player has a win rate of far greater than [2] Johanson, M., Bard, N., Lanctot, M., Gibson, R. and
50% and their first step strategies finally con- Bowling, M.: Efficient Nash Equilibrium Approximation
through Monte Carlo Counterfactual Regret Minimiza-
verge to ones that are corresponding to our tion, Proceedings of the Eleventh International Confer-
analysis, we propose that CFR algorithm is ence on Autonomous Agents and Multi-Agent Systems
(AAMAS) (2012).
able to generate an appropriate strategy for [3] Bowling, M., Burch, N., Johanson, M. and Tam-
melin, O.: Heads-up limit hold’em poker is solved, Sci-
Geister(2×3, 2). ence, Vol. 347, No. 6218, pp. 145–149 (online), DOI:
10.1126/science.1259433 (2015).
6. Conclusion [4] Lanctot, M., Waugh, K., Zinkevich, M. and Bowling,
M.: Monte Carlo Sampling for Regret Minimization in
In this paper, we applied chance-sampled Extensive Games, Advances in Neural Information Pro-