Rasmussen 2004
Rasmussen 2004
August 2004
Abstract
In order to make chess programs faster and thus stronger, the two approaches
of parallelizing the search and of using clever data structures have been suc-
cessful in the past. In this project it is examined how the use of a specific
data structure called a bitboard affects the performance of parallel search.
For this, a realistic bitboard chess program with important modern enhance-
ments is implemented, and several experiments are done to evaluate the
performance of the implementation. A maximum speedup of 9.2 on 22 pro-
cessors is achieved in one experiment and a maximum speedup of 9.4 on 12
processors is achieved in another experiment. These results indicate that
bitboards are a realistic choice of data structure in a parallel chess program,
although the considerable difference in the two results suggests that more
research could be done to clarify what factors affect such an implementation,
and how.
I
II
Contents
1 Introduction 1
III
4.2.1 Zobrist Hashing . . . . . . . . . . . . . . . . . . . . . . 27
4.2.2 Repetition Detection . . . . . . . . . . . . . . . . . . . 29
4.3 Improving Move Ordering . . . . . . . . . . . . . . . . . . . . 29
4.4 Iterative Deepening . . . . . . . . . . . . . . . . . . . . . . . . 31
4.5 Principal Variation Search . . . . . . . . . . . . . . . . . . . . 32
4.6 Quiescence Search . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.7 Search Extensions . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.8 Null Move Pruning . . . . . . . . . . . . . . . . . . . . . . . . 35
5 Data Structures 37
5.1 Board Representation . . . . . . . . . . . . . . . . . . . . . . . 38
5.2 Bitboards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2.1 Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.2.2 Rank Attacks . . . . . . . . . . . . . . . . . . . . . . . 43
5.2.3 Rotated Bitboards . . . . . . . . . . . . . . . . . . . . 45
5.2.4 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2.5 Operations . . . . . . . . . . . . . . . . . . . . . . . . . 47
6 Parallel Searching 51
6.1 Parallel Alpha-Beta . . . . . . . . . . . . . . . . . . . . . . . . 52
6.2 Alpha-Beta Revisited . . . . . . . . . . . . . . . . . . . . . . . 53
6.3 Young Brothers Wait Concept . . . . . . . . . . . . . . . . . . 54
6.4 Splitting With Bitboards . . . . . . . . . . . . . . . . . . . . . 55
6.5 Global Data Structures . . . . . . . . . . . . . . . . . . . . . . 55
6.5.1 The Hash Table . . . . . . . . . . . . . . . . . . . . . . 56
6.5.2 Killers And History . . . . . . . . . . . . . . . . . . . . 57
6.6 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.6.1 Inter-Thread Relationship . . . . . . . . . . . . . . . . 58
6.6.2 Parallel Operation . . . . . . . . . . . . . . . . . . . . 58
7 Experiments 61
7.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.2.1 Search Speed . . . . . . . . . . . . . . . . . . . . . . . 63
IV
7.2.2 Speedup . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.2.3 Tree Size . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.2.4 Time Usage . . . . . . . . . . . . . . . . . . . . . . . . 71
7.2.5 Hash Hits . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.2.6 Move Ordering . . . . . . . . . . . . . . . . . . . . . . 73
7.2.7 Quiescence Nodes . . . . . . . . . . . . . . . . . . . . . 73
7.3 Further Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 74
8 Conclusions 77
8.1 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
8.2 Further Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
A Data 81
A.1 Positions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
A.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
B Source Code 85
B.1 bitboard.cpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
B.2 evaluate.cpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
B.3 hash.cpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
B.4 main.cpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
B.5 moves.cpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
B.6 parallel.cpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
B.7 position.cpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
B.8 random.cpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
B.9 search.cpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
B.10 base.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
B.11 bitboard.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
B.12 enums.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
B.13 evaluate.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
B.14 hash.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
B.15 move.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
B.16 moves.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
B.17 parallel.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
V
B.18 position.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
B.19 random.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
B.20 search.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Bibliography 187
VI
Chapter 1
Introduction
Since the early 1950s when Claude E. Shannon and Alan M. Turing wrote the
first chess programs based on the idea of game tree searching, much research
has been done to make such programs better and stronger. Early on, it was
understood that search algorithmic enhancements were not enough to make
a strong program. Raw speed was a major strength factor as well. Apart
from using specialized hardware to gain speed, the two main solutions to this
problem have been parallelism and specialized data structures.
An idea for a particular data structure to represent information about the
chess game state comes from the two observations that many computers work
fast with 64-bit integers and that the chess board has exactly 64 squares.
This data structure, called a bitboard, is a 64-bit integer holding boolean
information about the chess board. Much more than one bitboard is needed
to represent the full game state, however. In fact, this way of representing the
game state requires more space than most other representations, but speed is
more important than space in this case. Even though the idea of bitboards is
not new, it is relevant now more than ever because 64-bit computers (which
work fast with 64-bit integers) are becoming more and more common.
In this project, we try to examine how well bitboards are suited for par-
allel chess programs. While the space requirements of the representation of
the game state are usually not important for sequential programs, they are
important for parallel chess programs, because we have to transfer the entire
1
game state between different processors frequently. It is unclear how the use
bitboards will affect the performance of a parallel chess program.
We implement a parallel bitboard chess program using most known mod-
ern enhancements and techniques to get a realistic test case. We perform
several experiments with the program to get some useful statistics giving us
insights into parallel chess searching in general and into the performance of
the implemented bitboard program in particular.
Our implementation achieves acceptable speedup in the experiments per-
formed, although systematic fluctuations seem to affect the performance. We
also get differing results in the various experiments we perform, indicating
that memory bandwidth is an issue. The experiments also show us that
while the parallelization makes the search speed larger as more processors
are used, it also inherently makes the amount of information to search con-
siderably larger, simultaneously.
After this introduction, Chapter 2 presents the rules and terminology of
the game of chess, which is necessary to discuss more complicated issues later.
In Chapter 3 we describe the fundamental elements of game tree searching,
from the plain minimax algorithm to a modern formulation of the alpha-beta
search algorithm, and Chapter 4 continues by explaining the most important
algorithmic improvements to the alpha-beta algorithm, as used in practice by
modern chess programs. Chapter 5 discusses the problem of data structures
in chess programs and presents the solution we are using in our implemen-
tation: bitboards. In Chapter 6 we discuss the problems of parallelizing
the alpha-beta algorithm, and describe the solutions we have implemented.
Chapter 7 describes the experiments we have performed and discusses the
results of these experiments. Chapter 8 contains our conclusions and our
suggestions for further work. Appendix A contains the input positions and
output data of our experiments and Appendix B contains the source code of
our implementation.
Throughout this paper, it is assumed that the reader has good knowledge
of data structures, algorithms, algorithmic complexity and parallelism.
I would like to thank Povl Ole Haarlev Olsen for helping with scripts for
automating the process of gathering and processing data from experiments,
2
and also for proofreading and making useful comments. I would also like to
thank Ditte Marie Johansen and Silas Frederik Johansen for proofreading and
for useful comments. Finally, I would like to thank my supervisor Professor
Jens Clausen for helping me keep focus.
3
4
Chapter 2
rmblkans
opopopop
8
0Z0Z0Z0Z
7
Z0Z0Z0Z0
6
0Z0Z0Z0Z
5
Z0Z0Z0Z0
4
POPOPOPO
3
SNAQJBMR
2
1
a b c d e f g h
Chess is a board game played by two players called “white” and “black”.
The players alternately make a move, but white always makes the first move.
It is a game of full information: there are no hidden or random elements; the
players know the full state of the game at all times. In the following, we will
briefly describe the rules of the game [27].
5
2.1 The Board
The game is played on a square board that is subdivided into 64 small squares.
The 64 squares are arranged in an 8 by 8 grid. The vertical columns of the
board are known as “files”. These are denoted by the letters a–h, from left
to right . The horizontal rows are known as “ranks”. These are denoted by
the numbers 1–8, from bottom to top. A square is denoted by the letter of
the file it is on, followed by the number of the rank it is on. For example,
the square in the lower left corner of Figure 2.1 is called “a1”. The diagonal
lines are known as “diagonals”. To make it easier to see and think about
the diagonals, the squares of the board have alternating colors called “light”
and “dark”, in such a way that any diagonal consists of squares of only a
single color, light or dark. As a fixed reference, the color of the square a1,
and hence of the diagonal it is on, is always dark. The rest of the squares
are colored accordingly.
6
8
0Z0Z0Z0Z
Z0Z0Z0Z0
8
0Z0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z 0Z0Z0Z0Z
7 7
Z0Z0Z0Z0 Z0Z0Z0Z0
6 6
0Z0A0Z0Z 0Z0S0Z0Z
5 5
Z0Z0Z0Z0 Z0Z0Z0Z0
4 4
0Z0Z0Z0Z 0Z0Z0Z0Z
3 3
Z0Z0Z0Z0 Z0Z0Z0Z0
2 2
1 1
a b c d e f g h a b c d e f g h
7
8
0Z0Z0Z0Z
Z0Z0Z0Z0
8
0Z0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z 0Z0Z0Z0Z
7 7
Z0Z0Z0Z0 Z0Z0Z0Z0
6 6
0Z0L0Z0Z 0Z0M0Z0Z
5 5
Z0Z0Z0Z0 Z0Z0Z0Z0
4 4
0Z0Z0Z0Z 0Z0Z0Z0Z
3 3
Z0Z0Z0Z0 Z0Z0Z0Z0
2 2
1 1
a b c d e f g h a b c d e f g h
8
8
0Z0Z0Z0Z
Z0Z0Z0Z0
8
0Zks0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z 0Z0Z0Z0Z
7 7
Z0Z0Z0Z0 Z0Z0Z0Z0
6 6
0Z0J0Z0Z 0Z0Z0Z0Z
5 5
Z0Z0Z0Z0 Z0Z0Z0Z0
4 4
0Z0Z0Z0Z 0Z0Z0Z0Z
3 3
Z0Z0Z0Z0 Z0Z0ZRJ0
2 2
1 1
a b c d e f g h a b c d e f g h
Figure 2.6: The King. Figure 2.7: White has castled king-
side. Black has castled queenside.
9
8
0Z0ZqZ0Z
Z0ZPZ0Z0
rZnZ0Z0Z
7
ZPZ0ZpO0
6
0Z0Z0Z0Z
5
Z0Z0Z0Z0
4
PZ0Z0Z0Z
3
Z0Z0Z0Z0
2
1
a b c d e f g h
• The last move of the opponent was to move a Pawn two squares forward
10
from its initial square
If this happens, the player on the move can choose to capture the enemy
Pawn as if it had just moved one square forward instead of two. The option
to capture en passant is only available at the move immediately following the
two-squared Pawn move. In Figure 2.8, if black’s last move was to move his
f7 Pawn to f5, white now has the option of moving his Pawn to f6 capturing
the f5 Pawn, just as if the f5 Pawn was on f6.
Checkmate
When a player’s King is in check, and the player has no legal moves
(capturing the checking piece, blocking the checking piece or evading
the checking piece), the player is said to be checkmated. The player
that checkmates his opponent wins the game.
Stalemate
When a player’s King is not in check, but the player has no legal moves,
the situation is called a stalemate. The game is a draw.
Draw by Repetition
If a position occurs three times in a game (with the same player on the
move), the game is a draw.
Figure 2.9 shows a more involved example. White has checkmated black.
Blacks King is in check because it is attacked by white’s Bishop on d8. Black
11
8
rmbA0a0s
opj0Zpop
0ZpZ0Z0Z
7
Z0Z0l0Z0
6
0Z0ZnZ0Z
5
Z0Z0Z0Z0
4
POPZ0OPO
3
Z0JRZBMR
2
1
a b c d e f g h
has no pieces that can capture the d8 Bishop. All possible squares of escape
for black’s King are attacked by the d8 Bishop and the d1 Rook.
12
Chapter 3
In this chapter we will explain the concept of a game tree and describe the
basic algorithms used by computers to search such game trees to find the
optimal line of play.
13
8
rmblkans
opopopop
0Z0Z0Z0Z
7
Z0Z0Z0Z0
6
0Z0Z0Z0Z
5
Z0Z0Z0Z0
4
POPOPOPO
3
SNAQJBMR
2
1
a b c d e f g h
8
7
rmblkans
Zpopopop
8
7
rmblkans
Zpopopop
8
7
rmblkans
Zpopopop
8
7
rmblka0s
opopopop
6
5
pZ0Z0Z0Z
Z0Z0Z0Z0
6
5
0Z0Z0Z0Z
o0Z0Z0Z0
6
5
pZ0Z0Z0Z
Z0Z0Z0Z0
6
5
0Z0Z0Z0m
Z0Z0Z0Z0
4 0Z0Z0Z0Z 4 0Z0Z0Z0Z 4 PZ0Z0Z0Z 4 0Z0Z0Z0Z
3 O0Z0Z0Z0 3 O0Z0Z0Z0 3 Z0Z0Z0Z0 3 Z0Z0Z0ZN
2
1
0OPOPOPO
SNAQJBMR
2
1
0OPOPOPO
SNAQJBMR
2
1
0OPOPOPO
SNAQJBMR
2
1
POPOPOPO
SNAQJBZR
a b d e f g h a b d e f g h a b d e f g h · · · 400 positions in total · · · a b d e f g h
14
For simple games like “tic-tac-toe”, the game tree is relatively small be-
cause of the relatively small branching factor (9 at the most) and because all
games are short: the maximum depth is 9. Solving a game like “tic-tac-toe”
(that is, finding the optimal line of play) is easy because the entire game tree
can be considered.
For chess, the case is not so easy. The average branching factor of chess
has been estimated to be about 35 [9]. It can vary wildly, though, depending
on the position. In positions where a player is in check, there are usually
only few legal moves. In a reasonably dynamic middlegame position, there
may be 40–80 legal moves. In artificial positions with lots of Queens on the
board, there may be more than 200 legal moves.
Also, a game of chess can be much longer than a game of “tic-tac-toe”.
A typical game of chess is maybe 30–80 moves (60–160 half moves) long.
The longest real game ever recorded is 269 moves long. The longest possible
game of chess is about 6000 moves long [26]. That means that some of the
branches of the chess game tree are about 12000 nodes deep. The reason
that a game of chess cannot continue forever is because of the rules of draw
by repetition and draw by the 50 moves rule.
The bottom-line is this: the chess game tree is huge, and it is utterly infea-
sible to solve chess by considering all of it, no matter how many present-day
computers we use. Even when considering the solution to interior positions
deep within the chess game tree, we can only consider a small local subtree.
Therefore, instead of considering the entire game tree or the complete
subtree of a position, we use heuristic search algorithms to search part of the
subtree to obtain an approximate solution.
15
3.2.1 Minimax
The fundamental algorithm used for searching a subtree of a node to obtain
an approximate solution, is called the minimax algorithm. The minimax
algorithm works by minimizing the maximum damage that the opponent
can do [24, 21, 23].
Imagine a position in which white is on the move. White will list all his
legal moves or more precisely all the child positions immediately resulting
from the current position by performing these legal moves, one by one. All
of these child positions are assigned a value of either loss for white (−1), draw
(0) or win for white (+1). White chooses the position with the maximum
value. Of course, the “magic” here is how the value of each child position is
determined.
In a child position, C, in which black is on the move, the value is deter-
mined as follows: If the position is a terminal game position (mate or draw),
the value is immediately known. If it is a non-terminal position, black lists
the child positions following from his legal moves, in the same way that white
did before. These child positions are assigned one of the above values. Black
then chooses the child position that has the minimum value. The value
determined for the position C is this minimum value.
The values of these new child positions, in which white is again on the
move, are of course determined in the same way as white did before. Thus,
the minimax algorithm is recursive.
As described above, it would seem that the minimax algorithm deals
with one level of the tree at a time, assigning values to all child positions of
a position, before looking at any other positions. But because the value of
a position cannot be determined until we have determined the value of its
children (and so on recursively), minimax traverses the tree in a depth-first
fashion. The first positions that are actually given a value, are the terminal
positions at the leafs of the tree. These values are then propagated back
trough the tree, minimizing or maximizing at each level.
Figure 3.2 shows an example of how the value of the root position is deter-
mined by the minimax algorithm. In this example, we have used more than
16
3
3 0 2
3 9 0 7 2 6
2 3 5 9 0 7 4 2 1 5 6
s c o r e minimax ( p o s i t i o n )
{
i f ( isTerminal ( position ))
return ( va lueO f ( p o s i t i o n ) ) ;
i f ( tur n == white )
best = − i n f i n i t y ;
else
best = i n f i n i t y ;
v a l u e = minimax ( c h i l d ) ;
17
just −1, 0 and 1 for the values. This is because it makes the example clearer
and also because it is closer to the practical case where we use a so-called
evaluation function, as described later. The boxed nodes are maximizing
nodes (white’s turn), and the circled nodes are minimizing nodes (black’s
turn). The arrows show how the algorithm traverses the tree. The enlarged
nodes are the nodes on the so-called principal variation. They represent the
optimal line of play for both players.
Figure 3.3 shows the pseudo-code for the minimax algorithm.
3.2.2 Negamax
To simplify the minimax algorithm, we can modify it so that it always max-
imizes regardless of which player is on the move. We just have to make sure
that the values we are working with are always relative to the player on the
move. For this, two changes are needed: The valueOf() function no longer
returns absolute values, but relative ones. The value −1 now means “loss
for the side on the move”, 0 means draw, +1 means “win for the side on the
move”. Also, when a value is returned from a child node, it is negated. This
formulation of the minimax algorithm is called negamax. Figure 3.4 shows
the negamax algorithm. Note that these two algorithms are identical in that
they visit exactly the same nodes and return the same answer. It is just a
practical simplification that will make later modifications less difficult.
3.2.3 Evaluation
As explained earlier, the complete subtree of a chess position may be (and
most often is) too large to search exhaustively given typical time and resource
constraints. Therefore, instead of searching every branch of the tree until a
terminal position is met, we only search a branch until a certain depth is
reached. When a position at this depth is reached, we do not really know
which player will win, unless it happens to be a terminal position. So we
use a so-called heuristic evaluation function to obtain an approximate value.
The greater the value, the more likely the player on the move is to win. Of
course, writing such an evaluation function is not easy. It is domain-specific
18
s c o r e negamax( p o s i t i o n )
{
i f ( isTerminal ( position ))
return ( va lueO f ( p o s i t i o n ) ) ;
best = − i n f i n i t y ;
v a l u e = −negamax( c h i l d ) ;
s c o r e negamax( p o s i t i o n , depth )
{
i f ( isTerminal ( position ))
return va lueO f ( p o s i t i o n ) ;
i f ( depth == 0)
return e v a l u a t e ( p o s i t i o n ) ;
best = − i n f i n i t y ;
v a l u e = −negamax( c h i l d , depth − 1 ) ;
19
and usually cannot be used for evaluating positions of any other games. For
domains complex enough to be interesting, it is usually impossible to write
a perfect evaluation function that will always give a consistent and correct
answer. After all, if we had such a function, we would not need to search at
all. We could just evaluate the children of the current position and choose
the best one.
A lot of the heuristic knowledge in a chess program lies in the evaluation
function. It typically takes into account such things as material balance,
mobility, control of important squares, development of pieces, pawn structure
and king safety. The most important of these is material balance.
Figure 3.5 shows the negamax algorithm with depth limitation and eval-
uation.
20
3
3 ≤0 ≤2
3 ≥5 0 7 2 6
2 3 5 9 0 7 4 2 1 5 6
bound designates how good a position the opponent will at most allow. The
lower and upper bounds are called alpha and beta respectively.
Figure 3.6 shows an example of how the alpha-beta algorithm works. It
is the same tree as in the minimax example given earlier in Figure 3.2. As
shown, we do not know the precise value of all nodes, just a bound on the
value. But it is enough to make the correct decision. The nodes that are
pruned are shown in dashed style.
The negamax formulation of the alpha-beta algorithm is shown in Fig-
ure 3.7.
As can be seen from Figure 3.7, the benefit of the alpha-beta algorithm is
the early return that happens when the value becomes greater than or equal
to beta. The rest of the children are pruned. It is obvious then, that if a
refuting move exists at a node, we would like to find it as early as possible.
If it is the first child searched, we can prune all the other children. If it
is the last child searched, we have examined no fewer nodes than the plain
minimax algorithm would. On the other hand, in the best case scenario where
a refuting move is always searched first if one exists, pruning saves a lot of
time. Searching an artificial tree of fixed branching factor W and depth D
with the minimax algorithm takes O(W D ) time. Searching the same tree with
D
the alpha-beta algorithm only takes O(W 2 ) time in the best case, meaning
21
s c o r e a l p h a b e t a ( p o s i t i o n , alpha , beta , depth )
{
i f ( isTerminal ( position ))
return va lueO f ( p o s i t i o n ) ;
i f ( depth == 0)
return e v a l u a t e ( p o s i t i o n ) ;
i f ( v a l u e > a lpha )
{
i f ( v a l u e >= beta )
return beta ;
a lpha = v a l u e ;
}
}
return a lpha ;
}
22
an exponential speedup [13]. In other words, the alpha-beta algorithm takes
only in the order of the square root of the time that the minimax algorithm
takes. Or put another way: given the same time, the alpha-beta algorithm
searches twice as deep as the minimax algorithm, a vast improvement. Thus,
the order in which moves are searched is very important, contrary to plain
minimax. Heuristics are used to order the moves at each node so that good
moves are generally tried before bad ones. But we do not have perfect move
ordering heuristics. If we did, we could just order the moves and pick the
best one, instead of searching.
23
24
Chapter 4
Search Algorithm
Improvements
25
smaller window than (−∞, +∞) at the root position. Basically, we “lie” to
the children nodes below and pretend that both players have already found
a reasonably good move, and that we are only interested in moves that are
better. If this bluff works and the true minimax value of the root position is
within the window given, we will get this value back from the search. We will
have pruned more nodes than we would have with a full windowed search,
though. That is, we have found the true minimax value of the position, but
we have searched a smaller tree.
If the true minimax value of the position is outside the window given, the
search will return alpha or beta depending on whether the true value is below
alpha or above beta. In this case, we re-search the position but with a new
window. If the value returned was alpha, we know that the true minimax
value is at most alpha, so the new window becomes (−∞, alpha). If the value
returned was beta, we know that the true value was at least beta, so the new
window becomes (beta, +∞).
If we choose the aspiration window to be too small, we will have too many
re-searches, searching more nodes than necessary. Choosing a reasonable
aspiration window around the expected value of the search, will give a net
gain by searching fewer nodes overall.
26
Whenever the recursive search encounters a node, instead of just doing a
recursive search of all the legal moves, it probes the hash tables first to see if
we have any known information about this node. If so, we check whether the
stored information comes from a search deep enough to replace the current
search. If the search finds useful information from a deep enough search
in the hash table, it can sometimes return an answer immediately without
performing an expensive recursive search. This happens if the hash table has
an exact value, or if the bound stored is good enough. That is, if an upper
bound is stored and it is alpha or lower, or if a lower bound is stored and it
is beta or higher.
By “deep enough” we mean this: If the stored information comes from
a 2-ply search, it is not good enough to replace a 7-ply search. On the
other hand, we can get even better information than we asked for. If the
search is asked to perform a 7-ply search, and it finds information in the
hash table from an 11-ply search, the information from the table is better
than the information that we would have gotten from just performing the
7-ply search.
The utility of this table is two-fold: We can sometimes avoid expensive
searches of positions that have been searched before. And in the cases where
the search cannot be avoided, we have a good guess about the best move of
the position, which will improve move ordering, as described later. Positions
can occur more than once in the search for several reasons. First of all
transpositions of moves can lead to the same position. If position P leads
to position Q by the move sequence a → b → c, most often P will also lead
to Q by the move sequence c → b → a. Secondly, performing re-searches of
positions is an important part of several algorithmic enhancements described
later. Doing a re-search of a position will be cheap if information about most
of this subtree is already in the hash table.
27
game playing programs. In chess, this can be applied as follows: For each of
the 64 squares, we generate a 64-bit random number for each kind of piece
that can occupy the square. There are six different pieces in two different
colors, so we generate 2 · 6 · 64 = 768 random numbers. Of course, there are
more things defining the position that we also have to put into the hashing
scheme: castling rights and en passant square. We generate random 64-bit
numbers for these too, one for each state that these can be in. The hash key
of a position is then defined as the XORing of all the 64-bit random numbers
corresponding to the state of each square, and of the numbers corresponding
the the state of the castling rights and the en passant square. The resulting
64-bit number is the hash key for the chess position. Of course, all the 64-
bit random numbers are pre-computed, and the hash key of the position is
being incrementally updated when moves are performed. This is simple: We
just XOR out the 64-bit number for the piece at the source square, XOR
in the 64-bit number for this piece at the destination square, and XOR out
any captured piece on the destination square. Special moves like castling, en
passant and promotion are dealt with trivially. We also have to account for
the side to move. We do this by having two separate hash tables, one for
positions where white is on the move, and one for positions where black is
on the move.
The hash tables are implemented as simple arrays of entries holding the
stored information and the hash key of the position. The hash key is stored
so that we can later ensure that we only use this information with the same
position; other positions will hash to the same slot. The hashing function
becomes very simple: We just take the hash key modulo the hash table size
to get the hash slot.
The quality of the random numbers used is important to keep the prob-
ability of collisions low. To ensure this, we do not use the pseudo random
number generator (PRNG) provided by the programming language. Instead,
we use the so-called “Mersenne Twister” [17], a renowned PRNG with excel-
lent properties.
28
4.2.2 Repetition Detection
To play sensible chess, we have to be able to detect the case of draw by
repetition. Hash keys are also useful for solving the problem of repetition
detection. We do this by keeping the game history as a list of hash keys.
Whenever we enter a node, we look back in the list to see if the current
hash key is in the history, indicating a repetition. This may sound tedious,
but by definition, we only have to look back until the latest irreversible move
(Pawn move or capture) for repetitions. The list of hash keys that have to be
considered, is usually very small as irreversible moves are frequent in chess.
29
a way similar to minimax, but much simpler and without recursion. This is
not perfect, but it is considerably better than the simple approach [6].
The reason for caring so much about the accuracy of the estimation, is
that it improves move ordering immensely. This is because a very large part
of the moves considered in a typical alpha-beta search, are in fact very bad
moves that immediately lose material without gaining any other advantage.
Of course, this does not mean that we can just dismiss these moves before-
hand, but when recursively considering the opponent’s answers to such a bad
move, we want to consider his obvious refutation move as early as possible,
because it often produces a cutoff immediately. If the move from the hash
table does not exist or yield a cutoff, an easy winning capture is the next
best guess. And of course, the more accurate we can estimate the net gain
of a capture, the better move ordering is. After the winning captures, the
equal captures (trades) are tried.
After all the winning and equal captures have been tried, the so-called
killer heuristic [2] is applied to the non-capture moves. If a player has an
exceptionally good non-capture move in one position, then often it will be
the case that the same move will be good in a lot of its sibling positions.
That is, even if the opponent had chosen another move at the previous ply,
our good move (our killer move) would still have been good nonetheless.
This is the idea behind the killer heuristic. Every time we try a move that
is so good that it causes a cutoff, we remember, that for this ply, this is a
good move. Then when searching sibling positions (that is, at the same ply),
we will search the killer move as the first move after the hash table move
and the winning and equal captures. In typical implementations, we keep a
prioritized list of a couple of killer moves, instead of just one killer move.
The so-called history heuristic [19] is applied next, also to the non-capture
moves. This is a general and less efficient way of remembering good moves.
Whenever a cutoff is produced, a table entry indexed by the source and
destination squares of the move, is incremented by some amount. When
all other heuristics have been applied to the move ordering of a node, the
remaining non-capture moves are sorted by the values in this history table.
In this way, moves that have often been good in other parts of the tree, will
30
be searched before moves that we know less about.
The remaining moves are the losing captures. These are tried in order of
expected loss, least loss first.
To measure how good our move ordering heuristics are, we can maintain
two counters during search: the number of nodes where a cutoff was pro-
duced, and the number of nodes where the cutoff was produced by the first
move tried. The ratio between these two can of course at most be 100%,
indicating perfect move ordering. With all of the above heuristics applied,
this ratio is typically about 90% or higher. This means that whenever there
is a move that produces a cutoff, we guess correctly in about 90% of the
cases, and search this move first. In many tactical positions where a single
move is considerably better than the rest (for instance a winning capture or
move forcing mate in number of moves), the ratio is typically around 98% or
higher.
31
remembers many of the important results of the previous shallow searches,
which means that searches will be cheaper than if the hash table was unused.
Also, a great deal of move ordering information is found; the hash table, killer
move tables and history tables are filled with useful information, making
move ordering better on subsequent searches. Finally, the resulting value
from one call of the search function can be used as a good candidate for the
expected value of the next search, which helps the efficiency of aspiration
windows, as described earlier.
32
4.6 Quiescence Search
Some of the most difficult things for the evaluation function to evaluate fairly
accurately, are the wild and complex exchanges of material that often happen
in chess. In general, it is easier to write an evaluation function that focuses on
the more static and long-term features of the position, than the more dynamic
and short-term features. When we do a fixed depth alpha-beta search, there
is no guarantee that the leafs of the trees, which are the ones that are getting
evaluated, are not wild and complex positions. Therefore, instead of just
evaluating when remaining depth is zero, we conduct a selective search, called
quiescence search, [4] that takes only “interesting” and non-quiet moves into
account. These are the moves that will potentially upset the evaluation
greatly. In chess these are mainly the captures and the promotions, moves
affecting material. Whenever the quiescence search reaches a position where
there are only quiet moves, it relies on the evaluation function.
If quiescence search is to mindlessly catch every possible interesting move,
the tree it searches will explode in size. A trivial implementation of quies-
cence search as described here, will result in a tree where much more than
half of the nodes searched in total, will be searched by the quiescence search.
This means that less time is available for the more accurate normal search.
Therefore, in good implementations of quiescence search, seemingly futile
moves are not searched. A move is futile if it seems to be a losing capture.
A move is also futile if the estimated gain from the capture is not enough to
raise the material evaluation up near alpha. Example: If alpha is zero, mean-
ing that no player has an advantage, we know that we can already achieve
such a fairly balanced situation. If we are then searching a line in quiescence
search where the opponent captures our Queen, then replying by capturing a
Rook or a Pawn normally is not enough to bring the value back near alpha.
A non-futile reply in this case would be also capturing the opponents Queen.
Removing futile moves from quiescence search will cause it to only take up
about 10–30% of the entire search tree, typically.
33
4.7 Search Extensions
Another problem with fixed depth searches is the so-called horizon effect. The
horizon effect is what happens when the search, because of limited depth,
tries to avoid something unavoidable or in general tries to achieve an un-
achievable goal. Example: White searches to a fixed depth of 5 plies, and
chooses to sacrifice a Pawn and a Knight to avoid losing a Rook. If he had
searched to a depth of 7 plies, he would have seen that the loss of the Rook
was unavoidable, and would have kept his Pawn and Knight.
Quiescence search as described above, is a way of dealing with a specific
class of problems related to the horizon effect. A more general way, that is not
limited to the leaves of the normal search tree, is to extend the search depth
a little bit in subtrees where something “interesting” or forced happens. This
shapes the tree so that more time is used on “interesting” and forced lines
of play, and less on uninteresting or nonsensical lines.
The most common extensions, and the ones we use, are:
Check extension
Whenever a checking move is played, search depth is extended in this
particular subtree [22].
Recapture extension
If the opponent just captured one of our pieces, and we then capture
his capturing piece, search depth is extended. Repeated exchanges on
the same square are common in chess and often forced: each move in
the exchange is the only reasonable move [5].
Pawn extension
Whenever a Pawn is nearing promotion, search depth is extended. Typ-
ically, extension is done when a Pawn is moved to the 7th (or 2nd) rank
[14].
34
One problem to be aware of is that the search tree can explode in size if
extending is done too often or too much. One way to avoid this is to limit
the amount of extension done at each node or in each line of play. To control
extensions in more detail, so-called fractional extensions [14] can be used.
Instead of always extending an integral number of plies, we can use fractions
of one ply instead. Example: Both the check extension and the one reply
extension could be made to extend the search 0.75 plies. Then the search is
only actually extended if more than one of these extensions happen in the
line searched, because only an integral number of plies can be searched, of
course. In this way, even if either one of the extensions are of less than one
ply, the recursive accumulation of extensions will cause extensions in lines of
play that are particularly “interesting”.
35
Null move pruning is speculative and inaccurate for at least two reasons:
Firstly, we are making a reduced search depth to establish the value of the
position after a null move. A normal reduction used is two plies. Such
a reduced search is not as accurate as non-reduced search. Secondly, the
assumption that a pass is always bad is not always true. In certain very
special positions called zugzwang (German for “compulsion to move”) posi-
tions, which typically occurs in the endgame, doing nothing would actually
be the best move. In such a case, the null move search might tell us that
the position is very good, when in fact it is a zugzwang position where every
legal move at our disposal makes our game worse. To counter this problem,
null move pruning is typically disabled in endgame situations.
The great advantage of null move pruning comes from the fact that in
many positions, the full depth searches of all the legal moves is replaced by
one much cheaper reduced depth minimal window search. In practice, null
move pruning helps immensely and allows searches on the order of two plies
deeper within the same time.
36
Chapter 5
Data Structures
Throughout computer chess history, alpha-beta search has been the dom-
inant algorithmic framework, although a lot of improvements to the plain
algorithm have been found as described earlier. But even within this al-
gorithmic framework, the choice of the fundamental data structures have
varied wildly. The choice of data structures will affect what information will
be hard to retrieve, and what will be easier. Also, it will affect how fast the
implementation of the fundamental operations of the alpha-beta algorithm
can perform. These operations are:
• Evaluation of a position
37
1. Is there a piece of type x (meaning: of a certain type and color) on
square s?
38
8
0Z0Z0Z0Z
Z0Z0Z0Z0
8
0Z0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z 0Z0Z0Z0Z
7 7
Z0Z0Z0Z0 Z0Z0Z0Z0
6 6
0Z0Z0Z0Z 0Z0Z0Z0Z
5 5
Z0Z0Z0Z0 Z0Z0Z0Z0
4 4
0Z0Z0Z0Z 0Z0Z0Z0Z
3 3
Z0Z0Z0Z0 Z0Z0Z0Z0
2 2
1 1
a b c d e f g h a b c d e f g h
Figure 5.1: The whitePawns bit- Figure 5.2: The a-file bitboard.
board.
a little harder, and can be answered in linear time. Questions no. 6, 7 and
8 are not as easily answered.
Of the fundamental operations, making and unmaking moves is fairly easy
with this implementation. Evaluation and generating legal moves is not so
easy. Many evaluation terms will be tedious to implement and will be slow to
execute. Generation of legal moves will be complicated since we essentially
have to be able to answer the above question 7 about a square being attacked,
to determine whether a move leaves the King in check, making it illegal.
Also, several problems related to systematically detecting the boundaries of
the board will occur, specifically with the Knight and the sliding pieces.
5.2 Bitboards
Many modern computers work natively or at least very fast with 64-bit in-
tegers. Bitboards [1, 22, 12, 10] are a way to represent information about
the chess board state, that utilizes this fact. A bitboard is simply a 64-bit
(unsigned) integer that holds boolean information about the board state. Ex-
ample: We might have a bitboard called whitePawns that holds information
about the white Pawns on the board. The 1st bit corresponds to the square
a1, the 2nd bit to the square b1 and so forth. Whenever a white Pawn is
39
8
0Z0Z0Z0Z
Z0Z0Z0Z0
8
0Z0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z 0Z0Z0Z0Z
7 7
Z0Z0Z0Z0 Z0Z0Z0Z0
6 6
0Z0Z0Z0Z 0Z0Z0Z0Z
5 5
Z0Z0Z0Z0 Z0Z0Z0Z0
4 4
0Z0Z0Z0Z 0Z0Z0Z0Z
3 3
Z0Z0Z0Z0 Z0Z0Z0Z0
2 2
1 1
a b c d e f g h a b c d e f g h
Figure 5.3: The Pawn on the a-file Figure 5.4: The attacked squares.
has been removed.
40
cycles. The calculation deals with all the Pawns at once instead of one Pawn
at a time, and can be regarded as a form of simple parallelism.
If we wanted to answer the same question but with the white Pawns
capturing to the right, we would have masked the Pawn(s) on the h-file out
instead, and we would have made a logical shift 9 places to the left instead.
Besides the whitePawns bitboard, similar bitboards for all the other types
of pieces, both black and white, are of course also useful. Also useful are two
bitboards holding the positions of all the white pieces and black pieces, called
whitePieces and blackPieces respectively.
If we maintain these bitboards in addition to a traditional array of 64
squares, we can now easily answer question 3. The bitboard of pieces of type
x holds exactly this information. But of course, to be really useful, we have
to be able to retrieve the answer in the form of a set of squares, not just
a bitboard. For this, a function called firstBit() is used. It returns the
position of the first 1-bit in the bitboard. To get the position of all the 1-bits
in the bitboard, firstBit() is used repeatedly to find and clear the first
1-bit in the bitboard, until it is empty. If firstBit() was a very complex
function, bitboards would lose much of their potency. Fortunately, very fast
implementations of this function exist. The simplest is based on table lookups
and is typically very fast. On many architectures, firstBit() is even a
native machine code instruction. It can be regarded as a fast constant-time
operation. Answering question 3 with only the simple array representation
takes linear time proportional to the size of the board, because all squares
must be scanned one at a time. With bitboards and the firstBit() function,
it takes linear time proportional to the number of pieces of type x, because
we get the position of each of them in constant time.
Question 4 is very easily answered: The bitboard holding information
about the pieces of type x can be used as a boolean variable. If it is zero,
it means that there are no pieces of type x. If it is non-zero, it means that
there are some.
Answering question 5 with the simple array representation takes linear
time proportional to the size of the board. With bitboards, a function called
popCount() is used to answer this specific question. Again, fast implementa-
41
8
0Z0Z0Z0Z
Z0Z0Z0Z0
8
kZ0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z popopopo
7 7
Z0Z0Z0Z0 Z0Z0Z0Z0
6 6
0Z0Z0Z0Z 0Z0M0Z0Z
5 5
Z0Z0Z0Z0 Z0Z0Z0Z0
4 4
0Z0Z0Z0Z 0Z0Z0Z0Z
3 3
Z0Z0Z0Z0 J0Z0Z0Z0
2 2
1 1
a b c d e f g h a b c d e f g h
Figure 5.5: The pre-computed Figure 5.6: White Knight and black
knightAttack[d4] bitboard. Pawns.
tions exist for this function that take linear time proportional to the number
of 1-bits in the bitboard. And it too is a native machine code instruction on
many architectures, in which case it is usually executes in constant time.
5.2.1 Attacks
Answering questions 6, 7 and 8 is where the power of bitboards is truly shown,
compared to the simple array representation. With the simple representation,
to know what squares a Queen attacks in a given board state, we would have
to iteratively try all squares in all directions that the Queen can move in,
until we hit a piece or the end of the board. This would take linear time in
the number of attacked squares. With bitboards we can use pre-computed
attack tables for all pieces.
Let us start with the Knight. We have an array called knightAttack
holding the attack patterns of the Knight. Figure 5.5 shows the bitboard
knightAttack[d4].
With this, we could quickly calculate which opponent pieces the Knight
attacks. If the position looks like Figure 5.6, we can find out which pieces
the Knight attacks by calculating
a t t a c k e d P i e c e s = ( knig ht At t a ck [ d4 ] & b l a c k P i e c e s ) ;
42
8
0Z0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z
7
Z0Z0Z0Z0
6
0Z0Z0Z0Z
5
Z0Z0Z0Z0
4
0Z0Z0Z0Z
3
Z0Z0Z0Z0
2
1
a b c d e f g h
Figure 5.7: The squares of the pieces that the Knight attacks.
43
8
0Z0ZkZ0Z
Z0ZpZ0Z0
8
0Z0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z 0Z0Z0Z0Z
7 7
Z0Z0Z0Z0 Z0Z0Z0Z0
6 6
qM0S0Zpm 0Z0Z0Z0Z
5 5
Z0L0Z0Z0 Z0Z0Z0Z0
4 4
0Z0O0Z0Z 0Z0Z0Z0Z
3 3
Z0Z0J0Z0 Z0Z0Z0Z0
2 2
1 1
a b c d e f g h a b c d e f g h
Figure 5.8: Limiting the scope of the Figure 5.9: The bitboard of occu-
Rook. pied squares.
We then perform a logical shift right of the occupied bitboard until the
rank in question (the 4th rank in this case) is in the least significant bits of
the bitboard. In this case this means shifting 24 places to the right. The
resulting bitboard can be seen in Figure 5.10.
The 8 least significant bits of this bitboard are 11010011 or in decimal:
211. This can be used as an index into a pre-computed table called rankAt-
tack. The bitboard rankAttack[d4][211], shown in Figure 5.11, contains
exactly the rank attack pattern of the Rook in the position shown earlier.
The entire calculation can be expressed as:
rankAttack [ d4 ] [ ( o c c u p i e d > > 2 4 ) & 2 5 5 ];
This last bitboard suggests that the Rook attacks both b4 and g4, even
though the Knight on b4 is white, just as the Rook itself. This is because
we do not differentiate between differently colored pieces when extracting
the state. Given this attack bitboard, we can choose which pieces we want
to actually attack, if any. To generate non-captures, we would AND out all
pieces from the attack bitboard, using the occupied bitboard. If we want
to generate captures also, we would AND out only our own pieces, using the
whitePieces bitboard in this case.
44
8
0Z0Z0Z0Z
Z0Z0Z0Z0
8
0Z0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z 0Z0Z0Z0Z
7 7
Z0Z0Z0Z0 Z0Z0Z0Z0
6 6
0Z0Z0Z0Z 0Z0Z0Z0Z
5 5
Z0Z0Z0Z0 Z0Z0Z0Z0
4 4
0Z0Z0Z0Z 0Z0Z0Z0Z
3 3
Z0Z0Z0Z0 Z0Z0Z0Z0
2 2
1 1
a b c d e f g h a b c d e f g h
45
occupied45L. It may not be exactly clear what it means to rotate a bitboard
45 degrees. But the important thing to note is that the 64 squares of the
board of course still fits into 64 bits regardless of how they are mapped and
that we want to be able to easily extract the state of a diagonal.
If we rotate the board 45 degrees clockwise, the diagonal a1–h8 and all
diagonals parallel to it are now horizontal. We can then map the “first”
diagonal at the bottom (containing only h1) to the first bit of the rotated
occupied bitboard. The next diagonal (g1–h2) can be mapped at the two
next bits. The next diagonal (f1–h3) can be mapped at the three next bits
and so on, until the last diagonal (containing only a8) which is mapped at
the last bit.
We can now use this bitboard to get the state of a diagonal by logical
shifting. But because not all diagonals have the same length, the amount of
shifting needed will not be a simple multiple of 8. The first diagonal (con-
taining only h1) will need no shifting. The next diagonal (g1–h2) will need
to be shifted 1 place, because the previous diagonal contained 1 square. The
next diagonal (f1–h3) will need to be shifted 3 places because the previous
diagonals contained 3 squares in total. This pattern continues with 6, 10, 15,
21, 28, 36, 43, 49, 54, 58, 61 and 63 places. This pattern is of course kept in
a lookup table.
To make things easy, instead of exactly extracting the right number of
bits for indexing, we always extract 8 bits. With diagonals shorter than 8
squares, this means that the most significant bits (of the 8 state bits) can
contain “garbage” from other diagonals. But if we make our diagonal attack
lookup table hold redundant information so that it has the right answer
regardless of the extra “garbage” bits, things become easy again.
The calculation of the attacks on one diagonal from a Bishop on d4 be-
comes:
46
We can now calculate the attacks of all pieces. Rook attacks are calculated
by ORing file and rank attacks. Bishop attacks are calculated by ORing the
two diagonal attacks. Queen attacks are calculated by ORing Bishop and
Rook attacks. Knight and King attacks are taken directly from lookup tables
and Pawns can be dealt with by shifting and/or by lookup tables.
5.2.4 Answers
We can now answer questions 6, 7 and 8 fairly easy.
The answer to question 6 is exactly the piece attack bitboards described
in the last section.
Answering question 7 is done as follows: For each kind of piece, we pretend
that such a piece is standing on square s, and if this imaginary piece attacks
any piece of its own type, we know that the square is attacked. Example:
If an imaginary Queen standing in d4 would attack a real Queen, then it
means that this real Queen attacks d4. To be precise, we just AND the
attack bitboard of a Queen at d4 with the bitboard of Queens. To answer
question 7, we just have to do the same for all kinds of pieces (of a certain
color), until we find a piece that attacks s, in which case we can answer yes,
or until we have tried all kinds of piece and found none attacking s, in which
case we can answer no. All this is fairly fast with bitboards.
Question 8 is dealt with in much the same way: For all kinds of pieces,
we calculate the bitboard of such pieces attacking s, as above. We then just
OR together the results together to have a bitboard of all attackers of s.
5.2.5 Operations
Bitboards affect how the fundamental operations of the alpha-beta algorithm
are performed.
Move Generation
47
each piece found, we calculate the attack bitboard. We then use firstBit()
again to extract each attacked square. This gives us both the source and
the destination square of the move. Dealing with en passant captures and
promotions is trivial. The last move is castling. To check whether castling
is legal, we have to check whether the squares between the King and the
Rook in question, are empty. This is easily done by just ANDing a pre-
computed bitboard of these squares with the occupied bitboard. We also
have to ensure that the squares that the King travels through, including the
source square and the destination square, are not attacked by the opponent.
With the simple representation, this would be difficult and time consuming.
With bitboards, it is just a matter of answering question 7 for these squares.
To really generate only legal moves, we would have to check every move
to see if it left the King in check. Since this is time consuming and can be
avoided easily, we are actually generating pseudo-legal moves: moves that
are legal if leaving the King in check is not considered illegal. If we want
to know whether a move is really legal, we can make the move, check if it
leaves the King in check, and unmake the move again. Most of the time, true
legality of moves is only important when searching. And it is cheaper to let
the search itself check for legality of moves when needed, because the search
is already making and unmaking the moves. And if a cutoff occurs, we have
not wasted time checking the remaining moves for legality.
Making and unmaking moves is a little more involved in the bitboard repre-
sentation than in the simple representation. Apart from updating the simple
array of squares, we also have to update the bitboards of the piece types
involved, and also the four versions of the occupied bitboard. Updating the
bitboards when making a move is just a matter clearing the source square by
ANDing, setting the destination square by ORing, and in the case of the ro-
tated bitboards, looking up the rotation mapping. All this is typically very
fast on modern computers. When unmaking a move, the same things are
done, but with squares in opposite roles.
48
We have to do a little bit of trivial extra work in the case of the special
moves such as en passant, promotion and castling. But this would be true
with the simple representation too.
Bitboard Evaluation
49
mobility. This, ironically, make them the most forcing attackers. If a Pawn
attacks some opponent piece worth more than a Pawn, the opponent will
be wise to respond to this threat, because otherwise he will lose a strong
piece for a lowly Pawn. On the other hand, if the mighty Queen attacks
an opponent piece, this opponent piece will most often be worth less than a
Queen. And so, the opponent need not fear a trade if his piece is protected,
since it would be advantageous to him. The second reason why Pawns are
important is because they can only make slow, irreversible moves. If a Pawn
in the Pawn shield of the King has moved forward, the safety of the King
is compromised permanently. The Pawn can never go back. If the central
Pawns of both players are blocking each other in the center of the board,
mobility will in general be limited for all pieces on the board. If all central
Pawns are traded and off the board, the game can take on a much more
dynamic and tactical nature, because mobility is not being limited in the
center. The pawn structure changes only slowly and forms the strategic
terrain of the chess board. Evaluating the pawn structure can involve many
complicated calculations. Bitboards can make a lot of these calculations
much easier.
In general, many chess program authors choose bitboards not primarily
for their speed, but for the simplicity and cleanness it gives to the design of
the entire program in general and to the evaluation function in particular.
50
Chapter 6
Parallel Searching
• We have to find some way of dividing the work that has to be done, so
that it can be done in parallel. We want to keep all available processors
as busy as possible doing useful work at all times.
• There will be some waste processor time related to the startup of the
parallel algorithm. Not all processors will be doing useful work right
from the start. We would like to minimize this waste.
51
6.1 Parallel Alpha-Beta
The alpha-beta algorithm is inherently sequential in nature; the tree is tra-
versed in a certain order and knowledge about boundaries on the real value
is propagated forward through the search. Still, opportunities for parallelism
do exist. Many subtrees have to be searched even if we have near perfect
move ordering. Searching different subtrees in parallel is one way to obtain
parallelism. We split up the searching of all the legal moves of a node into
several threads. There are problems with this approach, though, that pre-
vents us from using it trivially. Some problems are equivalent to the general
problems mentioned earlier, some are specific to this domain:
• Time spent splitting is not spent searching. If it takes a very long time
to split, average search speed will be slow. We want splitting to be as
fast as possible.
• Some global data structures, most notably the hash table, have access
issues that we have to deal with in the parallel case. If two threads
write to the same hash table entry at the same time, the result can be
an inconsistent entry.
52
6.2 Alpha-Beta Revisited
In [13], Knuth & Moore analyzed the properties of randomly and perfectly
ordered game trees. Since we can achieve near-perfect move ordering in the
case of chess, the properties of perfectly ordered trees are interesting to us.
Knuth & Moore found that nodes could be divided into three categories:
Type 1
All the nodes on the principal variation (PV) are type 1 nodes. This
means that the root node is a type 1 node. The first child of a type 1
node is also a type 1 node, because we assume a perfectly ordered tree.
The rest of the children of a type 1 node, are type 2 nodes. Because
it is a node on the principal variation, no cutoffs can occur; all of the
children of a type 1 node have to be searched.
Type 2
A type 2 node is a child of a type 1 node or a type 3 node. A type 2
node is a node where a cutoff occurs. Since we assume perfect move
ordering, we will only have to search one child of a type 2 node. The
children of type 2 nodes are type 3 nodes.
Type 3
Type 3 nodes are children of type 2 nodes. No cutoffs will occur, they
require all of their children to be searched. Move ordering is irrelevant
at these nodes because they all have to be searched, and none of their
children are good enough to raise alpha; they will all be searched with
the same bounds.
In [16], nodes of type 1, 2 and 3 are called PV, CUT and ALL nodes
respectively. These names describe in a concise manner the difference of the
node types.
The categories outline a basic strategy for doing alpha-beta search in
parallel. Firstly, it is clear that type 2 (CUT) nodes are not suited for
splitting. When the first child has been searched, a cutoff is produced, and
any searches in parallel of the rest of the children is wasted work. Secondly,
53
Type 3 nodes are excellent candidates for splitting. All their children have
to be searched, and we can gain a lot by doing this in parallel. Thirdly,
since all children of type 1 nodes must also be searched, these seem to be
good candidates for splitting too. But they differ from type 3 nodes in an
important way: In type 3 nodes, alpha and beta stays the same throughout
searching of the children. In type 1 nodes, alpha and beta bounds have to be
established by searching the first child before the rest of the children can be
searched with these bounds. This means that the first child of a type 1 node
should be searched sequentially. After this has been done and bounds have
been established, we can split the node searching the rest of the children in
parallel.
Of course, we are not searching perfectly ordered trees. But because
of the the good move ordering heuristics in general and the hash table in
particular, in type 1 nodes, we will in most cases actually search the best
move first. And, as mentioned earlier, in about 90% of the nodes where a
cutoff occurs (type 2 nodes), it happens already after the first move, as in
the perfect case. Nodes of type 3 will have to be searched thoroughly in any
case.
54
6.4 Splitting With Bitboards
The time it takes to perform a split will depend on various implementational
and architectural details. A very important part of splitting, though, is to
transfer information about the node from the splitting thread to the idle
threads that will help search the node. Necessary information to transfer
is the board state, the side to move, the en passant square, the castling
rights, the half move clock to be able to detect draw by the 50 moves rule,
and enough of the game history to enable us to detect repetitions. Memory
transfers are never free, so we want to minimize the amount of information
transferred. The bitboard implementation is in direct opposition here be-
cause it takes up much more memory to represent the board state, than the
simple representation does. In the simple case, the board state consists of
64 squares usually represented as bytes, taking a total of 64 bytes. In the
bitboard implementation, we have this array too, and in addition we have the
bitboards: 12 bitboards for the pieces, 2 occupied bitboards, one for each
color, and 3 rotated occupied bitboards. In total: 17 bitboards of 8 bytes (64
bits) each. This means that we have to transfer 136 bytes more than with
the simple representation. It is difficult to assess in advance whether bit-
boards will be an overall advance or a drawback in a parallel chess program.
Splitting will take longer, but faster search and evaluation may balance this
out. We will try to answer this question in the next chapter on experiments.
55
6.5.1 The Hash Table
There are many different ways to deal with the hash table in the parallel
case. One is to distribute the hash table among different threads so that
each thread is responsible for a part of the table. Another idea is to keep
the table global and then ensure mutually exclusive access from different
threads with mutex locks. There are problems with these solutions, though.
In the distributed case, inter-thread communication when one thread probes
an entry that belongs to another thread, is slow. The mutex solution also
introduces a performance penalty because mutex locks are slow on most
architectures.
A recent idea [11], and the idea we use, is to keep the hash table global
in shared memory and ignore the multiple simultaneous writes issue, and
instead do a cheap but efficient form of error-detection on the hash table
entries: Each hash table entry is 128 bits in size. We can think of this as
consisting of two 64-bit parts. The first part is the hash key of the node
about which information is stored, the second part is the actual information
about the node. When storing a new entry, instead of just storing both parts
normally, we store only the second 64-bit part normally. The first 64-bit
part is XORed with the second 64-bit part before storing. When the table
is later probed and this entry is found, we can check the consistency of the
entry: The second part of the entry is just read normally. The first part of
the entry is read and then XORed with the second part. This way, if the
entry is consistent, we get the original first part because XORing with the
same value twice is the same as doing nothing. The original first part was the
hash key of the position about which information is stored in the entry. If
the hash key of the position that is probing the hash table matches the hash
key stored in the entry, we have a hash hit and the information in the entry
can be used. If it does not match, it can either be because the information
was stored from a different position hashing to the same slot or because the
entry has been corrupted by simultaneous writes. In either case, we have a
hash miss and we do not use the stored information.
A potential problem could be that too many nodes are corrupted, and
56
while we do not use corrupted entries due to the error detection, the number
of useful entries could become so low that the hash table is not fulfilling
its purpose. On the other hand, corrupted entries will be overwritten with
consistent nodes sooner or later so the problem may not be serious. We will
return to this question in the next chapter.
57
serious way either.
We will return to these question of move ordering in the next chapter.
6.6 Design
The overall algorithmic design of our parallel implementation is YBWC. But
there are still many design decisions to take at the more detailed level. In this
chapter we will describe our parallel design in more detail, without bothering
too much with implementation details.
58
the node into so-called split blocks. Each idle thread gets a split block to
work on. The splitting thread is marking itself idle when splitting so it gets
a split block too. A thread that has been assigned a split block calls a special
search function that does essentially as it would in the sequential case at a
normal node. It repeatedly searches a child. The difference is that instead
of generating legal moves, it asks the original splitting node for a move to
search. It then just gets the next move in the list. All threads participating
in this split does this, until there are no more moves left.
Searching a child is done by performing the move and then calling the
regular search function. From here on, the thread acts exactly the same way
as the main thread described above. Most importantly, it itself can split at
some point, if there are idle threads. The helping threads can split too, and
so on recursively.
When a thread requests a new move to search, and finds that there are
no more, it returns its findings to the splitting node. The splitting node
remembers the best move (and its value) found yet by any helper thread.
In this way, all the children of the node gets searched in parallel and the
best move and value is found. The helping thread returns to the idle loop
waiting for new work. Note that the subtrees of the different children can
vary greatly in size so one helper thread can finish long before another. Since
it goes back into the idle loop, it can help out other helper threads that are
not finished searching, when these split.
Finally, we have to deal with the unfortunate case of finding a cutoff in
a helper thread. Ideally, we should never find a cutoff in a node that has
been split. But when it happens we want to stop the other helper threads
as quickly as possible. We do this by signaling stop to all the other helper
threads before returning. The helper threads may have split at some point
too so they tell their helper threads to stop, and so on recursively.
59
60
Chapter 7
Experiments
We have implemented a bitboard chess program that uses all the techniques
described in Chapters 3–6. In this chapter we will describe the experiments
that we have done on this implementation, and attempt to explain the results.
7.1 Description
The basic experiment was done as follows: 10 selected chess positions were
searched in an iterative deepening manner to a nominal depth of 11 plies.
This was done using from 1 and up to 24 processors.
By searching to a nominal depth of 11 plies we mean that the search
function is called with a depth parameter of 11 plies. Extensions will shape
the tree, though, such that some lines are searches more than 11 plies. Also,
quiescence search may selectively search some lines even longer. On the
other hand, alpha-beta pruning will prune some lines long before they reach
11 plies. Thus, searching to a nominal depth of 11 plies, as in our case,
is very different from searching a tree 11 plies deep without extensions and
quiescence search.
The positions were selected from the 24 positions of the well-known
Bratko-Kopec test suite [7]. Not all positions of this test-suite are suited
for experiments in parallelism. The ones containing an unambiguous short-
term tactical solution such as a mate in 5 plies, are not suited because the
61
solution is found very early on in the 11 ply iterative deepening, leaving only
little work for the rest of the search. This means that many processors will
have little or nothing to do. Such positions will be solved very fast in the
parallel case as well, but they will not show the full utility of searching in
parallel. We are not parallelizing to solve such easy positions in roughly the
same time as in the sequential case, but to solve more difficult positions faster
than in the sequential case.
From running these experiments, we get some information after searching
each position: the time used, the number of nodes visited, the hash table hit
rate, the move ordering ratio and the percentage of quiescence nodes of the
total number of nodes searched. From this information, we can obtain other
important information, most notably the search speed (nodes per second)
from which we will later obtain information about the speedup of the algo-
rithm on multiple processors. We will not be looking at the results of the
individual positions, but instead at the sum or average of results over all 10
positions.
7.2 Results
Our original plan was just to run the experiments once on a number of
dedicated processors on a dedicated machine. Preliminary testing on a non-
dedicated machine showed that the implementation worked and that speedup
was reasonable with several processors. But when the results came in from
the main experiment on the dedicated processors, it was obvious that some-
thing was not ideal. The results from the preliminary experiment and the
results from the main experiment looked rather different. Because of this,
we have decided to include both the results of the main experiment and the
results of two other experiments as well, in the discussion of some of the
measurements below.
The hardware used in the three experiments was this:
Main experiment
Sun Enterprise 15K with 384 GB RAM and 68 UltraSPARC-IIICu
62
1.8e+06
1.6e+06
1.4e+06
1.2e+06
Nodes per second
1e+06
800000
600000
400000
200000
0
0 5 10 15 20 25
Processors
Preliminary experiment
Sun Fire 6800 with 48 GB RAM and 22 UltraSPARC-IIICu processors
running at 1200 MHz.
12 Processors experiment
Sun Enterprise 4500 with 24 GB RAM and 12 UltraSPARC-II proces-
sors running at 400 MHz.
63
2.5e+06
2e+06
Nodes per second
1.5e+06
1e+06
500000
0
0 5 10 15 20 25
Processors
64
would give a smoother graph than running on set of processors shared with
other users. The graphs shows the opposite to be true in this case. A pos-
sible explanation is this: The parallel alpha-beta algorithm demands a lot
of memory bandwidth, for memory copying between threads when splitting
and for probing the shared hash table among other things. Even if we have a
number of processors dedicated exclusively to the experiment, we cannot re-
serve memory bandwidth on the machine. It is not unlikely that the intensive
scientific computations done on the rest of the processors (parallel inversion
of huge matrices etc.) uses a lot of memory bandwidth on the machine. And
while the processors used in the preliminary case were not dedicated to our
experiment, processor load was low because of the nature of the applications
running, and it is not unlikely that memory bandwidth use by these applica-
tions was low as well. Our guess is that the systematic drops in search speed
in the main experiment (at 7 processors, 10–12 processors, 14–16 processors
and 19–20 processors and maybe 23–24 processors), was due to heavy jobs
running at these times, using lots of memory bandwidth.
If this is true, it suggests that we would need an exclusively dedicated
machine (both processors and memory bandwidth) to do the experiment
in the most accurate way. Unfortunately, the only machine we could get
exclusive access to was a machine with only 12 processors. For comparison,
the results of running the experiment on this machine is shown in Figure 7.3.
Although we cannot know what would happen with more processors, it is
clear that this graph is smoother and shows better scaling as more processors
are used, than is the case with the main experiment and the preliminary
experiment.
All of these three experiments have drawbacks: The preliminary exper-
iment was only done on 22 processors and neither processors or memory
bandwidth were dedicated exclusively to the experiment. The main exper-
iment was done on 24 dedicated processors, but on a machine with a lot
of other work was done on other processors by other people, limiting the
memory bandwidth of the machine. In the last experiment, processors and
memory bandwidth was exclusively used for the experiment, but the machine
had only 12 processors. Ideally we would have liked to run the experiment
65
1e+06
900000
800000
700000
Nodes per second
600000
500000
400000
300000
200000
100000
0
0 5 10 15 20 25
Processors
66
1.8e+06
1.6e+06
1.4e+06
1.2e+06
Nodes per second
1e+06
800000
600000
400000
200000
0
0 5 10 15 20 25
Processors
7.2.2 Speedup
One typical way of assessing the success of a parallel algorithm is to look
at the speedup gained by using more and more processors, compared to the
base speed of the first processor. Using n processors, we can at most hope for
a speedup of n, giving ideal speedup. In practice, most parallel algorithms
have sub-linear speedup, and this also true in our case. Figures 7.5, 7.6 and
7.7 show the speedup for the main experiment, the preliminary experiment
and the 12 processors experiment respectively.
We have plotted the function y = x in dashed style so that we can see
how far we are from ideal speedup. Essentially, the speedup measurements
are just the search speed measurements divided by the search speed at one
processor, a constant. So the same systematical drops are of course present
here too.
One important thing that these graphs show is that even at 12 processors
the main experiment and the preliminary experiment shows poorer speedup
than the experiment on the dedicated 12 processor machine. In fact, we get a
67
25
20
15
Speedup
10
0
0 5 10 15 20 25
Processors
25
20
15
Speedup
10
0
0 5 10 15 20 25
Processors
68
25
20
15
Speedup
10
0
0 5 10 15 20 25
Processors
69
7e+07
6e+07
5e+07
4e+07
Nodes
3e+07
2e+07
1e+07
0
0 5 10 15 20 25
Processors
70
same linear trend as the measurements from 15 or fewer processors. To know
more about these fluctuations, we would have to make more experiments, on
a truly dedicated machine.
What can be seen from the graph, and what is very important, is that the
average tree size is approximately doubled at 10 processors and tripled at 20
processors, a fast rate. The average tree size at 24 processors is approximately
250% larger than at one processor. This is just as important for practical
purposes as the speedup, because the actual time used to search a position
will depend on both speedup and tree size. Since speedup was sub-linear and
tree size growth is roughly linear, it will become infeasible at some point to
use more processors.
71
Scaling
60
50
40
Seconds
30
20
10
0
0 5 10 15 20 25
Processors
100
80
60
Percentage
40
20
0
0 5 10 15 20 25
Processors
72
100
90
80
70
60
Percentage
50
40
30
20
10
0
0 5 10 15 20 25
Processors
73
100
80
60
Percentage
40
20
0
0 5 10 15 20 25
Processors
74
of these are roughly constant and the third is only affected slightly by the
parallelization, so we attribute the major part of the tree size growth to the
overhead caused by splitting type 2 (CUT) nodes. In other words, rapid
tree growth seems to be inherent in this parallelization of the alpha-beta
algorithm. Even if speedup had been more impressive, possibly by not using
the bitboard representation, the linear tree size growth would have dominated
the speed sooner or later, giving no further drop in time usage.
75
76
Chapter 8
Conclusions
77
The combined effect of the sub-linear speedup and the linear growth in
tree size in the main experiment means that the time needed to search a
particular position to a particular nominal depth, is only dropping slightly
when using more than 10 processors. We do, however, use the least amount
of time at 17 processors.
Measurements of the hash hit rate, the move ordering quality and the
quiescence search tree size (all of them major parameters affecting the overall
tree size) showed that these was not affected much by the parallelization.
Thus, the tree size growth seemed to mostly be an effect of the YBWC
parallelization itself: too many nodes containing a cutoff are searched in
parallel.
8.1 Extensions
There are some interesting statistics that were not gathered from our exper-
iments, that it would be interesting to have. It would be relatively easy to
implement and perform experiments to gather these experiments:
• We could have gathered statistics about how much time was used on
splitting. These numbers would be interesting in their own right, and
in relation to the total search time, or to the search time of a node;
or we could calculate the ratio between search time and split time in
every split node.
78
entries in the table. This would help confirm that the decline in hash
hit rate is due to inconsistent entries.
79
of testing and tweaking to find the right ratio between calculated and
transferred information.
These experiments might not uncover the truth about bitboards in par-
allel chess programs, but they would most certainly help in understanding
the problem better.
80
Appendix A
Data
A.1 Positions
The 10 positions used for the experiments are given here in standard notation
(FEN).
2q1rr1k/3bbnnp/p2p1pp1/2pPp3/PpP1P1P1/1P2BNNP/2BQ1PRK/7R b - - id "BK03"
2kr1bnr/pbpq4/2n1pp2/3p3p/3P1P1B/2N2N1Q/PPP3PP/2KR1B1R w - - id "BK09"
3rr1k1/pp3pp1/1qn2np1/8/3p4/PP1R1P2/2P1NQPP/R1B3K1 b - - id "BK10"
2r1nrk1/p2q1ppp/bp1p4/n1pPp3/P1P1P3/2PBB1N1/4QPPP/R4RK1 w - - id "BK11"
r3r1k1/ppqb1ppp/8/4p1NQ/8/2P5/PP3PPP/R3R1K1 b - - id "BK12"
r2q1rk1/4bppp/p2p4/2pP4/3pP3/3Q4/PP1B1PPP/R3R1K1 w - - id "BK13"
r2q1rk1/1ppnbppp/p2p1nb1/3Pp3/2P1P1P1/2N2N1P/PPB1QP2/R1B2RK1 b - - id "BK17"
3rr3/2pq2pk/p2p1pnp/8/2QBPP2/1P6/P5PP/4RRK1 b - - id "BK19"
r1bqk2r/pp2bppp/2p5/3pP3/P2Q1P2/2N1B3/1PP3PP/R4RK1 b kq - id "BK23"
r2qnrnk/p2b2b1/1p1p2pp/2pPpp2/1PP1P3/PRNBB3/3QNPPP/5RK1 w - - id "BK24"
A.2 Results
In Figure A.1, A.2 and A.3 is shown all of the collected data from the three
experiments discussed. The eight columns of the tables contain the number of
processors, search speed, tree size, search time, hash hit rate, move ordering
and quiescence nodes percentage respectively. All of these are averages over
all 10 positions, except of course column 1 and 3 (CPUs and Speedup).
81
CPUs NPS Speedup Nodes Time Hash Order Qn
1 183788.6 1.000 9944716.9 55.808 34.7 91.9 18.0
2 354635.9 1.930 11035105.8 31.989 35.2 91.9 18.0
3 506924.0 2.758 12355723.8 24.884 35.7 91.6 18.0
4 627200.3 3.413 13834612.9 22.572 35.2 91.5 18.0
5 752641.6 4.095 14505082.6 19.718 34.1 91.3 18.2
6 918063.8 4.995 16036321.2 17.792 34.2 91.3 18.2
7 894707.3 4.868 17400039.4 19.529 33.3 91.1 18.7
8 1071176.5 5.828 18385655.9 17.482 33.3 91.2 19.0
9 1223009.4 6.654 18912869.1 15.658 33.5 91.1 18.6
10 1116726.9 6.076 19763484.5 18.048 32.1 90.8 18.5
11 1077282.7 5.862 21325735.7 19.132 32.1 90.7 19.0
12 1084126.6 5.899 21689951.2 19.578 32.1 90.8 19.0
13 1428279.8 7.771 23015273.8 16.152 31.1 90.6 18.7
14 1393070.8 7.580 22414490.7 16.303 31.0 90.5 19.1
15 1231179.7 6.699 24741002.4 20.265 30.9 90.5 19.5
16 1026975.4 5.588 26116843.8 24.822 30.3 90.3 19.4
17 1626518.5 8.850 25292034.3 15.240 30.5 90.4 19.5
18 1633365.0 8.887 32615379.9 19.434 29.3 90.1 19.1
19 1615603.8 8.791 31944211.0 18.585 28.6 90.0 19.2
20 1534064.5 8.347 31035152.2 19.865 28.5 90.1 19.8
21 1675977.3 9.119 38191994.3 22.418 28.0 89.7 19.9
22 1688979.4 9.190 43804388.8 23.840 27.8 89.9 19.9
23 1611493.2 8.768 35752776.6 21.704 27.6 89.8 20.2
24 1460198.5 7.945 60330499.7 39.401 27.4 89.4 19.7
Figure A.1: Main experiment.
82
CPUs NPS Speedup Nodes Time Hash Order Qn
1 261581.2 1.000 9406997.9 37.082 28.4 91.4 17.7
2 499379.7 1.909 11532789.6 23.809 29.0 91.1 18.1
3 725614.4 2.774 12518008.9 17.822 28.3 91.1 18.2
4 924121.0 3.533 14030847.1 15.632 27.3 90.9 18.5
5 1107147.5 4.233 13934632.1 12.897 27.8 91.1 17.7
6 1262521.9 4.827 16015514.1 12.792 27.0 90.9 17.6
7 1460324.3 5.583 17835364.1 12.453 26.9 90.5 18.1
8 1536594.2 5.874 18083099.2 11.918 26.0 90.6 17.8
9 1745540.1 6.673 20267343.1 11.871 25.9 90.6 18.2
10 1836671.2 7.021 20688437.1 11.539 25.4 90.4 18.2
11 1963157.1 7.505 25600183.9 12.957 24.8 90.2 18.3
12 2010524.4 7.686 24955506.7 12.532 24.7 90.3 18.3
13 2165128.2 8.277 23806082.6 10.957 24.3 90.2 18.2
14 2054958.8 7.856 24417188.9 11.906 24.3 90.2 18.5
15 2077675.7 7.943 28147756.5 13.413 24.1 90.1 18.1
16 2181676.9 8.340 26484931.7 11.765 23.9 90.1 19.0
17 1953368.7 7.468 25102096.7 12.428 23.6 89.9 18.9
18 1829261.6 6.993 25323823.5 13.519 23.7 89.9 18.7
19 1961120.2 7.497 30313409.5 15.530 24.0 90.0 19.0
20 1892223.5 7.234 36279246.7 18.216 22.7 89.5 19.1
21 1761243.7 6.733 38705262.0 20.462 22.4 89.8 19.1
22 1895451.8 7.246 43013644.6 21.676 22.1 89.5 19.1
83
84
Appendix B
Source Code
B.1 bitboard.cpp
// $Id: bitboard.cpp,v 1.5 2004/02/24 12:09:20 s958547 Exp $
#include <iostream>
#include "base.h"
#include "bitboard.h"
int firstBit_[65536];
BitBoard mask_[Squareend];
void initBitBoard()
{
// Fill the firstBit table
firstBit_[0] = -1;
for (unsigned int i = 1; i < 65536; ++i)
{
int count = 0;
int j = i;
while (!(j & 0x0001))
{
j = j >> 1;
++count;
}
firstBit_[i] = count;
}
85
forall (Square,s)
mask_[s] = BitBoard(1ULL << s);
}
B.2 evaluate.cpp
// $Id: evaluate.cpp,v 1.7 2004/07/26 14:26:10 s958547 Exp $
#include <iostream>
#include "evaluate.h"
#include "base.h"
86
},
// knight
{
-120, -120, -120, -120, -120, -120, -120, -120,
-120, 25, 25, 25, 25, 25, 25, -120,
-120, 25, 50, 50, 50, 50, 25, -120,
-120, 25, 50, 100, 100, 50, 25, -120,
-120, 25, 50, 100, 100, 50, 25, -120,
-120, 25, 50, 50, 50, 50, 25, -120,
-120, 25, 25, 25, 25, 25, 25, -120,
-120, -120, -120, -120, -120, -120, -120, -120
},
// bishop
{
-40, -40, -40, -40, -40, -40, -40, -40,
-40, 20, 20, 20, 20, 20, 20, -40,
-40, 20, 30, 30, 30, 30, 20, -40,
-40, 20, 30, 45, 45, 30, 20, -40,
-40, 20, 30, 45, 45, 30, 20, -40,
-40, 20, 30, 30, 30, 30, 20, -40,
-40, 20, 20, 20, 20, 20, 20, -40,
-40, -40, -40, -40, -40, -40, -40, -40
},
// rook
{
0, 0, 10, 15, 15, 10, 0, 0,
0, 0, 10, 15, 15, 10, 0, 0,
0, 0, 10, 15, 15, 10, 0, 0,
0, 0, 10, 15, 15, 10, 0, 0,
0, 0, 10, 15, 15, 10, 0, 0,
0, 0, 10, 15, 15, 10, 0, 0,
0, 0, 10, 15, 15, 10, 0, 0,
0, 0, 10, 15, 15, 10, 0, 0
},
// queen
{
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 75, 75, 75, 75, 0, 0,
0, 0, 75, 100, 100, 75, 0, 0,
0, 0, 75, 100, 100, 75, 0, 0,
0, 0, 75, 75, 75, 75, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0
},
// king
{
-900, -900, -900, -900, -900, -900, -900, -900,
-900, -900, -900, -900, -900, -900, -900, -900,
87
-900, -900, -900, -900, -900, -900, -900, -900,
-900, -900, -900, -900, -900, -900, -900, -900,
-900, -900, -900, -900, -900, -900, -900, -900,
-700, -700, -700, -700, -700, -700, -700, -700,
-200, -200, -500, -500, -500, -500, -200, -200,
200, 300, 100, -300, 300, 100, 300, 200
}
// king (endgame)
/*
{
0, 30, 50, 200, 200, 50, 30, 0,
30, 50, 100, 300, 300, 100, 50, 30,
50, 100, 200, 400, 400, 200, 100, 50,
200, 300, 400, 500, 500, 400, 300, 200,
200, 300, 400, 500, 500, 400, 300, 200,
50, 100, 200, 400, 400, 200, 100, 50,
30, 50, 100, 300, 300, 100, 50, 30,
0, 30, 50, 200, 200, 50, 30, 0
}
*/
};
forall (Color,c)
{
BitBoard pieces = pos.occupied(c);
while (pieces)
{
Square s = firstBit(pieces);
Square rs =
c == white ? Square((7 - getRank(s))*8 + getFile(s)) : s;
if (c == white)
score += pieceSquareTable(pos[s].piece(),rs);
else
score -= pieceSquareTable(pos[s].piece(),rs);
clear(pieces,s);
}
}
return score;
}
88
Score evaluate(const Position& pos)
{
Score score = Score(pos.material(white) - pos.material(black));
score += pieceSquareTables(pos);
if (pos.turn() == white)
return Score(score);
else
return Score(-score);
}
B.3 hash.cpp
// $Id: hash.cpp,v 1.10 2004/07/26 14:26:10 s958547 Exp $
#include <algorithm>
#include "hash.h"
#include "random.h"
HashKey hashPiece_[Colorend][Pieceend][Squareend];
HashKey hashEpSquare_[Squareend];
HashKey hashCastling_[Colorend][CastlingRightend];
HashKey hashColor_[Colorend];
void initHash()
{
initRandom(68);
forall (Color,c)
forall (Piece,p)
forall (Square,s)
if (p == none)
hashPiece_[c][p][s] = 0;
else
hashPiece_[c][p][s] = random64();
forall (Square,s)
hashEpSquare_[s] = random64();
forall (Color,c)
{
forall (CastlingRight,cr)
hashCastling_[c][cr] = random64();
89
hashColor_[c] = random64();
}
}
TransRefTable::TransRefTable()
{
if (sizeof(HashEntry) != 16)
{
std::cout << "sizeof(HashEntry) == " << sizeof(HashEntry) << " != 16."
<< newline;
}
hashEnabled = true;
// hashEnabled = false;
clear();
}
bestMove = nullMove;
if (!hashEnabled)
return noHash;
HashEntry* hashTable;
if (pos.turn() == white)
hashTable = hashTableWhite;
else
hashTable = hashTableBlack;
++node.transRefProbes;
if (hashKey == pos.hashKey()) do
{
bestMove = Move(hashTable[index].data.move);
90
Score hashScore = Score(hashTable[index].data.score - maxScore);
++node.transRefHits;
switch (hashTable[index].data.type)
{
case exact:
alpha = hashScore;
if (alpha > mate - 300)
alpha = Score(alpha - ply);
else
if (alpha < -mate + 300)
alpha = Score(alpha + ply);
return exact;
case lowerBound:
if (hashScore >= beta)
{
beta = hashScore;
return lowerBound;
}
break;
case upperBound:
if (hashScore <= alpha)
{
alpha = hashScore;
return upperBound;
}
break;
}
} while (false);
if (hashKey == pos.hashKey()) do
{
bestMove = Move(hashTable[index].data.move);
++node.transRefHits;
91
switch (hashTable[index].data.type)
{
case exact:
alpha = hashScore;
if (alpha > mate - 300)
alpha = Score(alpha - ply);
else
if (alpha < -mate + 300)
alpha = Score(alpha + ply);
return exact;
case lowerBound:
if (hashScore >= beta)
{
beta = hashScore;
return lowerBound;
}
break;
case upperBound:
if (hashScore <= alpha)
{
alpha = hashScore;
return upperBound;
}
break;
}
} while (false);
return noHash;
}
HashEntry* hashTable;
if (pos.turn() == white)
hashTable = hashTableWhite;
else
hashTable = hashTableBlack;
92
static_cast<unsigned int>
(hashTable[index].hashKey % (hashTableSize - hashTableSplit))
+ hashTableSplit;
hashTable[splitIndex] = hashTable[index];
}
else
index = int(pos.hashKey() % (hashTableSize - hashTableSplit))
+ hashTableSplit;
hashTable[index].data.age = hashID;
hashTable[index].data.move = bestMove.representation();
hashTable[index].data.type = type;
hashTable[index].data.depth = depth;
if (score == illegal)
{
hashTable[index].data.score = score + maxScore;
hashTable[index].hashKey = pos.hashKey() ^ hashTable[index].dataul64;
return;
}
Score hashScore;
switch(type)
{
case exact:
if (score > mate - 300)
hashScore = Score(score + ply);
else
if (score < -mate + 300)
hashScore = Score(score - ply);
else
hashScore = score;
break;
case upperBound:
hashScore = max(score,Score(-mate + 300));
break;
case lowerBound:
hashScore = min(score,Score(mate - 300));
break;
default:
hashScore = illegal;
cout << "HashStore: type is illegal." << endl;
}
hashTable[index].data.score = hashScore + maxScore;
93
void TransRefTable::clear()
{
for (int i = 0; i < hashTableSize; ++i)
{
hashTableWhite[i].data.age = 0;
hashTableWhite[i].data.depth = 0;
hashTableWhite[i].hashKey = 0;
hashTableWhite[i].data.move = 0;
hashTableWhite[i].data.score = 0;
hashTableWhite[i].data.type = 0;
hashTableBlack[i].data.age = 0;
hashTableBlack[i].data.depth = 0;
hashTableBlack[i].hashKey = 0;
hashTableBlack[i].data.move = 0;
hashTableBlack[i].data.score = 0;
hashTableBlack[i].data.type = 0;
}
}
HashKey hashEpSquare(Square s)
{
return hashEpSquare_[s];
}
forall (Square,s)
result ^= hashPiece(pos[s].color(),pos[s].piece(),s);
result ^= hashEpSquare(pos.epSquare());
result ^= hashCastling(white,pos.castlingRight(white));
result ^= hashCastling(black,pos.castlingRight(black));
94
return result;
}
B.4 main.cpp
// $Id: main.cpp,v 1.27 2004/06/18 11:54:25 s958547 Exp $
#include <pthread.h>
#include <iostream>
#include <vector>
#include <sstream>
#include <fstream>
#include "moves.h"
#include "position.h"
#include "evaluate.h"
#include "search.h"
#include "hash.h"
#include "parallel.h"
bool done;
Color computer = black;
bool force = true;
string::size_type start = 0;
string::size_type end = 0;
return tokens;
}
95
string line;
getline(cin,line);
vector<string> tokens = tokenize(line," \n\t");
if (tokens.empty())
return;
if (tokens[0] == "!")
cout << pos << newline;
else if (tokens[0] == "quit")
done = true;
else if (tokens[0] == "perft")
{
if (tokens.size() > 1)
{
int d;
stringstream ss;
ss.str(tokens[1]);
ss >> d;
perft(pos,d);
}
}
else if (tokens[0] == "moves")
{
Moves mvs;
generateMoves(pos,mvs);
mvs.removeIllegal(pos);
cout << "Generated " << mvs.size() << " moves:" << newline;
for (int i = 0; i < mvs.size(); ++i)
{
cout << i+1 << ": " << mvs[i] << " ";
if (((i + 1) % 8) == 0)
cout << newline;
}
cout << newline;
}
else if (tokens[0] == "setboard")
{
string FEN;
for (int i = 1; i < tokens.size(); ++i)
FEN += tokens[i]+" ";
pos.setupFEN(FEN);
}
else if (tokens[0] == "openfen")
{
if (tokens.size() > 1)
{
ifstream in;
in.open(tokens[1].c_str());
if (!in)
96
{
cout << "Couldn’t open ’" << tokens[1] << "’" << newline;
}
else
{
string fen;
getline(in,fen);
pos.setupFEN(fen);
}
in.close();
}
}
else if (tokens[0] == "go")
{
computer = pos.turn();
force = false;
}
else if (tokens[0] == "force")
{
force = true;
}
else if (tokens[0] == "eval")
{
cout << "eval: " << evaluate(pos) << endl;
}
else if (tokens[0] == "see")
{
if (tokens.size() > 1)
{
stringstream ss(tokens[1]);
int no = -1;
ss >> no;
if (no != -1)
{
Moves mvs;
generateMoves(pos,mvs);
mvs.removeIllegal(pos);
97
{
Moves mvs;
generateMoves(pos,mvs);
mvs.removeIllegal(pos);
stringstream ss(tokens[0]);
int no = -1;
ss >> no;
if (no != -1)
{
if (no > 0 && no <= mvs.size())
{
int i = no - 1;
cout << mvs[i] << newline;
pos.makeMove(mvs[i]);
++pos.rootPly_;
return;
}
}
98
string fen;
bool batchMode = false;
if (argc > 1)
{
if (argc != 3)
{
cout << "Wrong number of arguments..." << endl;
exit(0);
}
ifstream in;
in.open(argv[1]);
if (!in)
cout << "Couldn’t open ’" << argv[1] << endl;
else
{
cout << "Reading ’" << argv[1] << "’" << endl;
getline(in,fen);
force = false;
batchMode = true;
}
in.close();
stringstream ss(argv[2]);
int t;
ss >> t;
if (t < 1 || t > maxThreads)
{
cout << "Illegal number of threads..." << endl;
exit(0);
}
nThreads = t;
maxThreadsPerNode = t;
}
initBitBoard();
initMoves();
initHash();
initParallel();
if (batchMode)
{
cout << "Setting up FEN: " << fen << endl;
pos.setupFEN(fen);
computer = pos.turn();
99
}
done = false;
while (!done)
{
if (pos.turn() == computer && !force)
think(rootNode);
else
processInput(pos);
if (batchMode)
break;
}
}
B.5 moves.cpp
// $Id: moves.cpp,v 1.17 2004/07/26 14:26:10 s958547 Exp $
#include <iostream>
#include "base.h"
#include "moves.h"
100
f1,g2,h3,
e1,f2,g3,h4,
d1,e2,f3,g4,h5,
c1,d2,e3,f4,g5,h6,
b1,c2,d3,e4,f5,g6,h7,
a1,b2,c3,d4,e5,f6,g7,h8,
a2,b3,c4,d5,e6,f7,g8,
a3,b4,c5,d6,e7,f8,
a4,b5,c6,d7,e8,
a5,b6,c7,d8,
a6,b7,c8,
a7,b8,
a8
};
101
};
BitBoard pawnAttack_[consts::squares][Colorend];
BitBoard knightAttack_[consts::squares];
BitBoard kingAttack_[consts::squares];
BitBoard diagonalAttack_a1h8_[consts::squares][256];
BitBoard diagonalAttack_h1a8_[consts::squares][256];
BitBoard rankAttack_[consts::squares][256];
BitBoard fileAttack_[consts::squares][256];
BitBoard minus9Direction_[consts::squares];
BitBoard plus9Direction_[consts::squares];
BitBoard minus7Direction_[consts::squares];
BitBoard plus7Direction_[consts::squares];
BitBoard minus1Direction_[consts::squares];
BitBoard plus1Direction_[consts::squares];
BitBoard minus8Direction_[consts::squares];
BitBoard plus8Direction_[consts::squares];
int directions_[consts::squares][consts::squares];
Square rotate45Left_[consts::squares];
Square rotate45Right_[consts::squares];
Square rotate90Left_[consts::squares];
102
BitBoard fileMask_[consts::files];
BitBoard rankMask_[consts::ranks];
int rankShift(Square s)
{
return rankShift_[s];
}
int fileShift(Square s)
{
return fileShift_[s];
}
int diagonalShift_a1h8(Square s)
{
return diagonalShift_a1h8_[s];
}
int diagonalShift_h1a8(Square s)
{
return diagonalShift_h1a8_[s];
}
Square unrotate45Left(Square s)
{
return unrotate45Left_[s];
}
Square unrotate45Right(Square s)
{
return unrotate45Right_[s];
}
Square unrotate90Left(Square s)
{
return unrotate90Left_[s];
}
103
os << "O-O";
else
os << "O-O-O";
return os;
}
if (m.piece() != pawn)
os << m.piece();
else if (m.capture())
os << getFile(m.from());
if (m.capture() != none)
os << "x";
os << m.to();
if (m.promote())
os << "=" << m.promote();
return os;
}
void initMoves()
{
// Generate the rotation maps for squares
forall(Square,s)
{
rotate45Left_[unrotate45Left_[s]] = s;
rotate45Right_[unrotate45Right_[s]] = s;
rotate90Left_[unrotate90Left_[s]] = s;
}
// Pawn Attacks
forall (Square,s)
{
pawnAttack_[s][white] = empty;
pawnAttack_[s][black] = empty;
104
// Knight Attacks
const int knightOffset[] = {-17,-15,-10,-6,6,10,15,17};
forall(Square,from)
{
knightAttack_[from] = empty;
for (int i = 0; i != 8; ++i)
{
Square to = Square(from + knightOffset[i]);
if (a1 <= to && to <= h8 && dist(from,to) == 2)
knightAttack_[from] |= mask(to);
}
}
// King Attacks
const int kingOffset[] = {-9,-8,-7,-1,1,7,8,9};
forall(Square,from)
{
kingAttack_[from] = empty;
for (int i = 0; i != 8; ++i)
{
Square to = Square(from + kingOffset[i]);
if (a1 <= to && to <= h8 && dist(from,to) == 1)
kingAttack_[from] |= mask(to);
}
}
forall (Square,from)
forall (Square,to)
directions_[from][to] = 0;
105
directions_[from][to] = -9;
}
}
for (Square to = Square(from + 9); getFile(to) != fileA && to <= h8;
to = Square(to + 9))
{
tmp |= mask(to);
if (mask(rotate45Right(to)) & shiftState)
break;
if (state == 0)
{
plus9Direction_[from] |= mask(to);
directions_[from][to] = 9;
}
}
diagonalAttack_a1h8_[from][int(state)] = tmp;
}
}
106
directions_[from][to] = 7;
}
}
diagonalAttack_h1a8_[from][int(state)] = tmp;
}
}
107
BitBoard tmp = empty;
BitBoard shiftState = BitBoard(state) << fileShift(from);
for (Square to = Square(from - 8); to >= a1;
to = Square(to - 8))
{
tmp |= mask(to);
if (mask(rotate90Left(to)) & shiftState)
break;
if (state == 0)
{
minus8Direction_[from] |= mask(to);
directions_[from][to] = -8;
}
}
for (Square to = Square(from + 8); to <= h8; to = Square(to + 8))
{
tmp |= mask(to);
if (mask(rotate90Left(to)) & shiftState)
break;
if (state == 0)
{
plus8Direction_[from] |= mask(to);
directions_[from][to] = 8;
}
}
fileAttack_[from][int(state)] = tmp;
}
}
forall (Rank,r)
rankMask_[r] = empty;
forall (Square,s)
{
rankMask_[getRank(s)] |= mask(s);
fileMask_[getFile(s)] |= mask(s);
}
}
Square rotate45Left(Square s)
{
return rotate45Left_[s];
}
108
Square rotate45Right(Square s)
{
return rotate45Right_[s];
}
Square rotate90Left(Square s)
{
return rotate90Left_[s];
}
109
if (bitboard & mask(s))
tmp |= mask(unrotate45Right(s));
}
return tmp;
}
/*
inline BitBoard pawnAttack(Square s, Color c)
{
BitBoard result = empty;
if (getFile(s) != fileA)
result |= mask(Square(c == white ? s + 7 : s - 9));
if (getFile(s) != fileH)
result |= mask(Square(c == white ? s + 9 : s - 7));
return result;
}
*/
110
{
return knightAttack_[s];
}
BitBoard fileMask(File f)
111
{
return fileMask_[f];
}
BitBoard rankMask(Rank r)
{
return rankMask_[r];
}
112
| pos.bitboard(black,rook) | pos.bitboard(black,queen);
return(attacks |
(fileAttack(from,pos) & pieces & plus8Direction_[from]));
case 9:
pieces = pos.bitboard(white,bishop) | pos.bitboard(white,queen)
| pos.bitboard(black,bishop) | pos.bitboard(black,queen);
return(attacks |
(diagonalAttack_a1h8(from,pos) & pieces & plus9Direction_[from]));
case -1:
pieces = pos.bitboard(white,rook) | pos.bitboard(white,queen)
| pos.bitboard(black,rook) | pos.bitboard(black,queen);
return(attacks |
(rankAttack(from,pos) & pieces & minus1Direction_[from]));
case -7:
pieces = pos.bitboard(white,bishop) | pos.bitboard(white,queen)
| pos.bitboard(black,bishop) | pos.bitboard(black,queen);
return(attacks |
(diagonalAttack_h1a8(from,pos) & pieces & minus7Direction_[from]));
case -8:
pieces = pos.bitboard(white,rook) | pos.bitboard(white,queen)
| pos.bitboard(black,rook) | pos.bitboard(black,queen);
return(attacks |
(fileAttack(from,pos) & pieces & minus8Direction_[from]));
case -9:
pieces = pos.bitboard(white,bishop) | pos.bitboard(white,queen)
| pos.bitboard(black,bishop) | pos.bitboard(black,queen);
return(attacks |
(diagonalAttack_a1h8(from,pos) & pieces & minus9Direction_[from]));
}
return(attacks);
}
int nc = 1;
Score swap_list[32]; // No more than 32 pieces on the board
113
Score attacked_piece = pieceValueSEE[pos[target].piece()];
swap_list[0] = attacked_piece;
attacked_piece = pieceValueSEE[pos[source].piece()];
clear(attacks,source);
int direction = directions_[target][source];
if (direction)
attacks = xrayAttacks(pos,attacks,source,direction);
Color color = opposite(pos.turn());
Square square;
while (attacks)
{
if (color == white)
{
if (pos.bitboard(white,pawn) & attacks)
square = firstBit(pos.bitboard(white,pawn) & attacks);
else if (pos.bitboard(white,knight) & attacks)
square = firstBit(pos.bitboard(white,knight) & attacks);
else if (pos.bitboard(white,bishop) & attacks)
square = firstBit(pos.bitboard(white,bishop) & attacks);
else if (pos.bitboard(white,rook) & attacks)
square = firstBit(pos.bitboard(white,rook) & attacks);
else if (pos.bitboard(white,queen) & attacks)
square = firstBit(pos.bitboard(white,queen) & attacks);
else if (pos.bitboard(white,king) & attacks)
square = firstBit(pos.bitboard(white,king) & attacks);
else break;
}
else
{
if (pos.bitboard(black,pawn) & attacks)
square = firstBit(pos.bitboard(black,pawn) & attacks);
else if (pos.bitboard(black,knight) & attacks)
square = firstBit(pos.bitboard(black,knight) & attacks);
else if (pos.bitboard(black,bishop) & attacks)
square = firstBit(pos.bitboard(black,bishop) & attacks);
else if (pos.bitboard(black,rook) & attacks)
square = firstBit(pos.bitboard(black,rook) & attacks);
else if (pos.bitboard(black,queen) & attacks)
square = firstBit(pos.bitboard(black,queen) & attacks);
else if (pos.bitboard(black,king) & attacks)
square = firstBit(pos.bitboard(black,king) & attacks);
else break;
}
114
direction = directions_[target][square];
if (direction)
attacks = xrayAttacks(pos,attacks,square,direction);
++nc;
color = opposite(color);
}
while(--nc)
if(swap_list[nc] > -swap_list[nc - 1])
swap_list[nc - 1] = Score(-swap_list[nc]);
return (swap_list[0]);
}
if (bishopAttack(s,pos) &
(pos.bitboard(color,bishop) | pos.bitboard(color,queen)))
return true;
if (rookAttack(s,pos) &
(pos.bitboard(color,rook) | pos.bitboard(color,queen)))
return true;
return false;
}
115
void generateMoves(Piece piece, const Position& pos,
Moves& moves, bool captures)
{
BitBoard pieces = pos.bitboard(pos.turn(),piece);
const BitBoard targets =
captures ? pos.occupied(opposite(pos.turn())) : pos.nonOccupied();
while (pieces)
{
Square from = firstBit(pieces);
BitBoard attacks = attack(piece,from,pos) & targets;
while (attacks)
{
Square to = firstBit(attacks);
moves.push_back(Move(from,to,piece,pos[to].piece(),none));
clear(attacks,to);
}
clear(pieces,from);
}
}
while (attacks)
{
Square to = firstBit(attacks);
Square from = Square(turn == white ? to - 8 : to + 8);
for (Piece p = queen; p >= knight; --p)
moves.push_back(Move(from,to,pawn,none,p));
clear(attacks,to);
}
}
if (pos.epSquare() != noEpSquare)
116
targets |= mask(pos.epSquare());
while (attacks)
{
Square to = firstBit(attacks);
Square from = Square(turn == white ? to - 7 : to + 9);
if (getRank(to) == (turn == white ? rank8 : rank1))
{
for (Piece p = queen; p >= knight; --p)
moves.push_back(Move(from,to,pawn,pos[to].piece(),p));
}
else
{
if (to == pos.epSquare())
moves.push_back(Move(from,to,pawn,pawn,none));
else
moves.push_back(Move(from,to,pawn,pos[to].piece(),none));
}
clear(attacks,to);
}
while (attacks)
{
Square to = firstBit(attacks);
Square from = Square(turn == white ? to - 9 : to + 7);
if (getRank(to) == (turn == white ? rank8 : rank1))
{
for (Piece p = queen; p >= knight; --p)
moves.push_back(Move(from,to,pawn,pos[to].piece(),p));
}
else
{
if (to == pos.epSquare())
moves.push_back(Move(from,to,pawn,pawn,none));
else
moves.push_back(Move(from,to,pawn,pos[to].piece(),none));
}
clear(attacks,to);
}
}
117
else // Non-captures
{
const BitBoard targets = pos.nonOccupied();
BitBoard attacks = turn == white ?
(pos.bitboard(white,pawn) << 8) & targets :
(pos.bitboard(black,pawn) >> 8) & targets;
while (attacks)
{
Square to = firstBit(attacks);
Square from = Square(turn == white ? to - 8 : to + 8);
if (getRank(to) == (turn == white ? rank8 : rank1))
{
for (Piece p = queen; p >= knight; --p)
moves.push_back(Move(from,to,pawn,none,p));
}
else
moves.push_back(Move(from,to,pawn,none,none));
clear(attacks,to);
}
while (attacks2)
{
Square to = firstBit(attacks2);
Square from = Square(turn == white ? to - 16: to + 16);
moves.push_back(Move(from,to,pawn,none,none));
clear(attacks2,to);
}
}
}
118
clear(attacks,to);
}
clear(pieces,from);
}
}
119
while (pieces)
{
Square from = firstBit(pieces);
BitBoard attacks = queenAttack(from,pos) & targets;
while (attacks)
{
Square to = firstBit(attacks);
moves.push_back(Move(from,to,queen,pos[to].piece(),none));
clear(attacks,to);
}
clear(pieces,from);
}
}
if (pos.turn() == white)
{
if (!captures && (pos.castlingRight(white) & kingside))
{
if (!(pos.occupied() & (mask(f1) | mask(g1))))
if (!isAttacking(black,e1,pos) &&
!isAttacking(black,f1,pos) &&
!isAttacking(black,g1,pos))
moves.push_back(Move(e1,g1,king,none,none));
}
120
moves.push_back(Move(e1,c1,king,none,none));
}
}
else
{
if (!captures && (pos.castlingRight(black) & kingside))
{
if (!(pos.occupied() & (mask(f8) | mask(g8))))
if (!isAttacking(white,e8,pos) &&
!isAttacking(white,f8,pos) &&
!isAttacking(white,g8,pos))
moves.push_back(Move(e8,g8,king,none,none));
}
// forall (Piece,p)
// generateMoves(p,pos,moves,false);
generatePawnMoves(pos,moves,true);
generateKnightMoves(pos,moves,true);
generateBishopMoves(pos,moves,true);
generateRookMoves(pos,moves,true);
generateQueenMoves(pos,moves,true);
generateKingMoves(pos,moves,true);
generatePawnMoves(pos,moves,false);
generateKnightMoves(pos,moves,false);
generateBishopMoves(pos,moves,false);
generateRookMoves(pos,moves,false);
generateQueenMoves(pos,moves,false);
generateKingMoves(pos,moves,false);
121
void generateNonQuietMoves(const Position& pos, Moves& moves)
{
generateNonCapturePromotions(pos,moves);
generatePawnMoves(pos,moves,true);
generateKnightMoves(pos,moves,true);
generateBishopMoves(pos,moves,true);
generateRookMoves(pos,moves,true);
generateQueenMoves(pos,moves,true);
generateKingMoves(pos,moves,true);
}
B.6 parallel.cpp
// $Id: parallel.cpp,v 1.23 2004/07/26 14:26:10 s958547 Exp $
#include <iostream>
#include <sstream>
#include "parallel.h"
#include "position.h"
#include "search.h"
int nThreads = 1;
int maxThreadsPerNode = 1;
pthread_mutex_t globalMutex;
pthread_mutex_t outputMutex;
Node* rNode;
parent.pValue = child.cValue;
parent.bestMove = child.bestMove;
}
122
parent.nodes += child.nodes;
parent.qNodes += child.qNodes;
parent.failHighs += child.failHighs;
parent.failHighFirsts += child.failHighFirsts;
parent.transRefProbes += child.transRefProbes;
parent.transRefHits += child.transRefHits;
child.used = 0;
}
if (from == 0)
from = 1;
int free;
for (free = from; free < to; ++free)
if (!splitBlock[free]->used)
break;
// copy stuff
child.pos = parent.pos;
123
child.nodes = 0;
child.qNodes = 0;
child.failHighs = 0;
child.failHighFirsts = 0;
child.transRefProbes = 0;
child.transRefHits = 0;
child.alpha = parent.alpha;
child.beta = parent.beta;
child.ply = parent.ply;
child.depth = parent.depth;
child.totalExtension = parent.totalExtension;
child.cValue = Score(-illegal);
return &child;
}
while (true)
{
pthread_mutex_lock(&node.parent->mutex);
if (i >= moves.size())
{
pthread_mutex_unlock(&node.parent->mutex);
break;
}
124
moves.sortFrom(i);
Move move = moves[i];
++i;
pthread_mutex_unlock(&node.parent->mutex);
pos.makeMove(move);
if (inCheck(opposite(pos.turn()),pos))
{
pos.unMakeMove(move);
continue;
}
Score value;
value = Score(-search(node, Score(-alpha - 1), Score(-alpha),
Depth(depth - onePly), ply + 1, true,
totalExtension));
if (node.stopped)
{
pos.unMakeMove(move);
break;
}
pos.unMakeMove(move);
125
{
for (int t = 0; t < nThreads; ++t)
if (node.parent->children[t] && t != node.threadNumber)
threadStop(node.parent->children[t]);
}
pthread_mutex_unlock(&node.parent->mutex);
pthread_mutex_unlock(&globalMutex);
break;
}
alpha = value;
node.pv.update(ply,node.bestMove);
}
}
}
pthread_mutex_lock(&globalMutex);
if (!thread[threadNumber])
thread[threadNumber] = waiting;
--idleThreads;
pthread_mutex_unlock(&globalMutex);
if (thread[threadNumber] == waiting)
return 0;
pthread_mutex_lock(&globalMutex);
pthread_mutex_lock(&thread[threadNumber]->parent->mutex);
childToParent(*thread[threadNumber]->parent,
*thread[threadNumber]);
thread[threadNumber]->parent->workers--;
thread[threadNumber]->parent->children[threadNumber] = 0;
pthread_mutex_unlock(&thread[threadNumber]->parent->mutex);
126
thread[threadNumber] = 0;
pthread_mutex_unlock(&globalMutex);
}
}
threadLoop(threadNumber,0);
return 0;
}
void initParallel()
{
idleThreads = 0;
splitBlock[0]->parent = 0;
splitBlock[0]->used = true;
splitBlock[0]->stopped = 0;
splitBlock[0]->workers = 0;
splitBlock[0]->threadNumber = 0;
pthread_mutex_init(&globalMutex,0);
pthread_mutex_init(&outputMutex,0);
pthread_attr_t pthread_attr;
pthread_attr_init(&pthread_attr);
pthread_attr_setdetachstate(&pthread_attr, PTHREAD_CREATE_DETACHED);
pthread_attr_setscope(&pthread_attr, PTHREAD_SCOPE_SYSTEM);
127
for (int t = 1; t < nThreads; ++t)
{
cout << "Creating worker thread " << t << endl;
thread[t] = 0;
pthread_t pthread;
pthread_create(&pthread,&pthread_attr,threadInit,(void*) t);
}
thread[0] = splitBlock[0];
rNode = splitBlock[0];
}
thread[node->threadNumber] = 0;
node->workers = 0;
int nblocks = 0;
for (t = 0; t < nThreads && nblocks < maxThreadsPerNode; ++t)
{
node->children[t] = 0;
if (thread[t] == 0)
{
Node* block = parentToChild(*node,t);
if (!block) // No available blocks
continue;
node->children[t] = block;
block->threadNumber = t;
block->pos.threadNumber = t;
block->parent = node;
++node->workers;
++nblocks;
}
}
128
node->pValue = node->alpha;
if (!nblocks)
{
thread[node->threadNumber] = node;
pthread_mutex_unlock(&globalMutex);
return false;
}
pthread_mutex_unlock(&globalMutex);
threadLoop(node->threadNumber, node);
return true;
}
B.7 position.cpp
// $Id: position.cpp,v 1.27 2004/07/26 14:26:10 s958547 Exp $
#include <iostream>
#include <sstream>
#include <cctype>
#include <ctime>
#include <sstream>
#include <string>
#include "position.h"
#include "moves.h"
#include "evaluate.h"
#include "hash.h"
BitBoard bitboard_[Colorend][Pieceend];
BitBoard occupied_[Colorend];
BitBoard occupiedRotate45Left_;
BitBoard occupiedRotate45Right_;
BitBoard occupiedRotate90Left_;
129
forall (Color,c)
{
occupied_[c] = empty;
forall (Piece,p)
bitboard_[c][p] = empty;
}
occupiedRotate45Left_ = empty;
occupiedRotate45Right_ = empty;
occupiedRotate90Left_ = empty;
forall(Square,s)
{
if (pos[s].piece() != none)
{
bitboard_[pos[s].color()][pos[s].piece()] |= mask(s);
occupied_[pos[s].color()] |= mask(s);
occupiedRotate45Left_ |= mask(rotate45Left(s));
occupiedRotate45Right_ |= mask(rotate45Right(s));
occupiedRotate90Left_ |= mask(rotate90Left(s));
}
}
forall (Color,c)
forall (Piece,p)
if (pos.bitboard(c,p) != bitboard_[c][p])
ss << "bitboard(c,p) mismatch: " << c << "," << p << endl;
forall (Color,c)
if (pos.occupied(c) != occupied_[c])
{
ss << "occupied(c) mismatch: " << c << endl;
ss << pos.occupied(c) << endl;
ss << occupied_[c] << endl;
}
if (pos.occupiedRotate45Left() != occupiedRotate45Left_)
ss << "occupied45Left(c) mismatch: " << endl;
if (pos.occupiedRotate45Right() != occupiedRotate45Right_)
ss << "occupied45Right(c) mismatch: " << endl;
if (pos.occupiedRotate90Left() != occupiedRotate90Left_)
ss << "occupied90Left(c) mismatch: " << endl;
int mat[Colorend];
forall (Color,c)
mat[c] = 0;
130
forall (Square,s)
mat[pos[s].color()] += pieceValue(pos[s].piece());
forall (Color,c)
{
if (pos.material(c) != Score(mat[c]))
{
ss << "material[" << c << "] mismatch" << endl;
ss << "material: " << pos.material(c) << endl;
ss << "calc : " << Score(mat[c]) << endl;
}
}
string s = ss.str();
if (!s.empty())
{
cout << desc << endl << "threadID: " << pos.threadNumber << endl;
cout << pos << endl;
cout << "Move: " << m << endl;
cout << s << endl;
exit(EXIT_FAILURE);
}
void Position::makeNullMove()
{
gameRecord_[gamePly_].epSquare = epSquare_;
gameRecord_[gamePly_].halfmoveClock = halfmoveClock_;
gameRecord_[gamePly_].move = nullMove;
gameRecord_[gamePly_].hashKey = hashKey_;
++gamePly_;
++halfmoveClock_;
hashKey_ ^= hashEpSquare(epSquare_);
epSquare_ = noEpSquare;
hashKey_ ^= hashEpSquare(epSquare_);
turn_ = opposite(turn_);
131
checkIntegrity(*this,nullMove,"makeNullMove");
#endif
}
void Position::unMakeNullMove()
{
turn_ = opposite(turn_);
--halfmoveClock_;
--gamePly_;
epSquare_ = gameRecord_[gamePly_].epSquare;
halfmoveClock_ = gameRecord_[gamePly_].halfmoveClock;
hashKey_ = gameRecord_[gamePly_].hashKey;
void Position::makeMove(Move m)
{
const Square from = m.from();
const Square to = m.to();
const Piece piece = m.piece();
const Piece capture = m.capture();
const Piece promote = m.promote();
gameRecord_[gamePly_].epSquare = epSquare_;
gameRecord_[gamePly_].castlingRight[white] = castlingRight_[white];
gameRecord_[gamePly_].castlingRight[black] = castlingRight_[black];
gameRecord_[gamePly_].halfmoveClock = halfmoveClock_;
gameRecord_[gamePly_].move = m;
gameRecord_[gamePly_].hashKey = hashKey_;
132
++gamePly_;
switch (piece)
{
case pawn:
if (promote)
{ // Promotion
// Put the piece on destination square
board_[to] = ColoredPiece(color,promote);
set(bitboard_[color][promote],to);
set(occupied_[color],to);
hashKey_ ^= hashPiece(color,promote,to);
if (!capture)
{
set(occupiedRotate45Left_,rotate45Left(to));
set(occupiedRotate45Right_,rotate45Right(to));
set(occupiedRotate90Left_,rotate90Left(to));
}
else
{ // Remove the captured piece from the destination square
clear(bitboard_[oppColor][capture],to);
clear(occupied_[oppColor],to);
material_[oppColor] -= pieceValue(capture);
hashKey_ ^= hashPiece(oppColor,capture,to);
}
break;
}
else if (to == epSquare_)
{ // En Passant
// Put the piece on destination square
133
board_[to] = ColoredPiece(color,piece);
set(bitboard_[color][piece],to);
set(occupied_[color],to);
set(occupiedRotate45Left_,rotate45Left(to));
set(occupiedRotate45Right_,rotate45Right(to));
set(occupiedRotate90Left_,rotate90Left(to));
hashKey_ ^= hashPiece(color,piece,to);
material_[oppColor] -= pieceValue(capture);
hashKey_ ^= hashPiece(oppColor,capture,capSquare);
break;
}
// Normal pawn move:
case knight:
case bishop:
case rook:
case queen:
case king:
// Put the piece on destination square
board_[to] = ColoredPiece(color,piece);
set(bitboard_[color][piece],to);
set(occupied_[color],to);
hashKey_ ^= hashPiece(color,piece,to);
if (!capture)
{
set(occupiedRotate45Left_,rotate45Left(to));
set(occupiedRotate45Right_,rotate45Right(to));
set(occupiedRotate90Left_,rotate90Left(to));
}
else
{ // Remove the captured piece from the destination square
clear(bitboard_[oppColor][capture],to);
clear(occupied_[oppColor],to);
material_[oppColor] -= pieceValue(capture);
134
hashKey_ ^= hashPiece(oppColor,capture,to);
}
hashKey_ ^= hashEpSquare(epSquare_);
epSquare_ = noEpSquare;
if (piece == pawn && rankDist(from,to) == 2)
{
// maybe only if opponents pawn present? probably, for hashing
epSquare_ = Square(color == white ? from + 8 : from - 8);
}
hashKey_ ^= hashEpSquare(epSquare_);
hashKey_ ^= hashCastling(white,castlingRight_[white]);
if (castlingRight_[white])
135
{
if ((mask(from) | mask(to)) & (mask(a1) | mask(e1)))
castlingRight_[white] =
CastlingRight(castlingRight_[white] & kingside);
hashKey_ ^= hashCastling(black,castlingRight_[black]);
if (castlingRight_[black])
{
if ((mask(from) | mask(to)) & (mask(a8) | mask(e8)))
castlingRight_[black] =
CastlingRight(castlingRight_[black] & kingside);
turn_ = oppColor;
void Position::unMakeMove(Move m)
{
const Square from = m.from();
const Square to = m.to();
const Piece piece = m.piece();
const Piece capture = m.capture();
const Piece promote = m.promote();
turn_ = color;
--gamePly_;
#if !defined NDEBUG
if (gamePly_ < 0)
{
cout << endl << "gamePly_ < 0!" << endl;
136
}
#endif
epSquare_ = gameRecord_[gamePly_].epSquare;
castlingRight_[white] = gameRecord_[gamePly_].castlingRight[white];
castlingRight_[black] = gameRecord_[gamePly_].castlingRight[black];
halfmoveClock_ = gameRecord_[gamePly_].halfmoveClock;
hashKey_ = gameRecord_[gamePly_].hashKey;
switch (piece)
{
case pawn:
if (promote)
{ // Promotion
// Remove the piece from destination square
clear(bitboard_[color][promote],to);
clear(occupied_[color],to);
if (!capture)
{
board_[to] = emptySquare;
clear(occupiedRotate45Left_,rotate45Left(to));
clear(occupiedRotate45Right_,rotate45Right(to));
clear(occupiedRotate90Left_,rotate90Left(to));
}
else
{ // Put the captured piece back on the destination square
board_[to] = ColoredPiece(oppColor,capture);
set(bitboard_[oppColor][capture],to);
set(occupied_[oppColor],to);
material_[oppColor] += pieceValue(capture);
137
break;
}
else if (to == epSquare_)
{ // En Passant
// Remove the piece from destination square
board_[to] = emptySquare;
clear(bitboard_[color][piece],to);
clear(occupied_[color],to);
clear(occupiedRotate45Left_,rotate45Left(to));
clear(occupiedRotate45Right_,rotate45Right(to));
clear(occupiedRotate90Left_,rotate90Left(to));
material_[oppColor] += pieceValue(capture);
break;
}
// Normal pawn move:
case knight:
case bishop:
case rook:
case queen:
case king:
// Remove the piece from destination square
clear(bitboard_[color][piece],to);
clear(occupied_[color],to);
if (!capture)
{
board_[to] = emptySquare;
clear(occupiedRotate45Left_,rotate45Left(to));
clear(occupiedRotate45Right_,rotate45Right(to));
clear(occupiedRotate90Left_,rotate90Left(to));
}
else
{ // Put the captured piece back on the destination square
board_[to] = ColoredPiece(oppColor,capture);
set(bitboard_[oppColor][capture],to);
set(occupied_[oppColor],to);
material_[oppColor] += pieceValue(capture);
138
}
139
os << " +---+---+---+---+---+---+---+---+" << newline;
}
os << " a b c d e f g h" << newline;
os << "\nFEN: " << pos.getFEN() << newline;
return os;
}
if (r != rank1)
FEN << "/";
}
if (turn_ == white)
FEN << "w ";
else
FEN << "b ";
140
FEN << " ";
if (epSquare_ == noEpSquare)
FEN << "- ";
else
FEN << epSquare_ << " ";
return FEN.str();
}
void Position::updateBitBoards()
{
forall(Color,c)
{
occupied_[c] = empty;
forall (Piece,p)
bitboard_[c][p] = empty;
}
occupiedRotate45Left_ = empty;
occupiedRotate45Right_ = empty;
occupiedRotate90Left_ = empty;
forall(Square,s)
{
if (board_[s].piece() != none)
{
bitboard_[board_[s].color()][board_[s].piece()] |= mask(s);
occupied_[board_[s].color()] |= mask(s);
occupiedRotate45Left_ |= mask(rotate45Left(s));
occupiedRotate45Right_ |= mask(rotate45Right(s));
occupiedRotate90Left_ |= mask(rotate90Left(s));
}
}
}
class ParseError
{
public:
ParseError(string ss) : s(ss) {}
string error() { return s; }
private:
string s;
};
141
void parseRank(stringstream& ss, ColoredPiece rank[])
{
char c;
File f = fileA;
while (f <= fileH)
{
ss >> c;
if (c >= ’1’ && c <= ’8’)
{
int count = c - ’0’;
if (f + count - 1 > fileH)
throw ParseError("Too many squares on rank");
for (int i = 0; i < count; ++i)
rank[f++] = emptySquare;
continue;
}
switch (c)
{
case ’P’ : rank[f++] = ColoredPiece(white,pawn); break;
case ’p’ : rank[f++] = ColoredPiece(black,pawn); break;
case ’N’ : rank[f++] = ColoredPiece(white,knight); break;
case ’n’ : rank[f++] = ColoredPiece(black,knight); break;
case ’B’ : rank[f++] = ColoredPiece(white,bishop); break;
case ’b’ : rank[f++] = ColoredPiece(black,bishop); break;
case ’R’ : rank[f++] = ColoredPiece(white,rook); break;
case ’r’ : rank[f++] = ColoredPiece(black,rook); break;
case ’Q’ : rank[f++] = ColoredPiece(white,queen); break;
case ’q’ : rank[f++] = ColoredPiece(black,queen); break;
case ’K’ : rank[f++] = ColoredPiece(white,king); break;
case ’k’ : rank[f++] = ColoredPiece(black,king); break;
142
switch (c)
{
case ’w’: color = white; break;
case ’b’: color = black; break;
default: throw ParseError("Unknown color");
}
}
char c;
ss >> c;
if (c == ’-’)
return;
if (c == ’K’) { crw = CastlingRight(crw | kingside); ss >> c; }
if (c == ’Q’) { crw = CastlingRight(crw | queenside); ss >> c; }
if (c == ’k’) { crb = CastlingRight(crb | kingside); ss >> c; }
if (c == ’q’) { crb = CastlingRight(crb | queenside); ss >> c; }
ss.putback(c);
}
ss >> c;
if (c == ’-’)
{
s = noEpSquare;
return;
}
ss >> c;
if (c >= ’1’ && c <= ’8’)
r = Rank(c - ’1’);
143
else
throw ParseError("Unknown square");
s = toSquare(f,r);
}
try
{
for (Rank r = rank8; r >= rank1; --r)
{
ColoredPiece rank[Fileend];
parseRank(ss,rank);
forall (File,f)
tmpPos.board_[toSquare(f,r)] = rank[f];
if (r > rank1)
parseChar(’/’,ss);
}
parseColor(ss,tmpPos.turn_);
parseCastling
(ss,tmpPos.castlingRight_[white],tmpPos.castlingRight_[black]);
144
parseEpSquare(ss,tmpPos.epSquare_);
parseHalfmoveClock(ss,tmpPos.halfmoveClock_);
parseFullmoveClock(ss);
}
catch (ParseError p)
{
cout << "Error: " << p.error() << endl;
return;
}
forall (Square,s)
board_[s] = tmpPos.board_[s];
turn_ = tmpPos.turn_;
epSquare_ = tmpPos.epSquare_;
forall (Color,c)
castlingRight_[c] = tmpPos.castlingRight_[c];
halfmoveClock_ = tmpPos.halfmoveClock_;
updateBitBoards();
int m[Colorend];
forall (Color,c)
m[c] = 0;
forall (Square,s)
m[board_[s].color()] += pieceValue(board_[s].piece());
forall (Color,c)
material_[c] = Score(m[c]);
gamePly_ = 0;
rootPly_ = 0;
hashKey_ = calcHashKey(*this);
checkIntegrity(*this,nullMove,"setupFEN");
}
int totalMoves;
145
return
(isAttacking(opposite(color),firstBit(pos.bitboard(color,king)),pos));
}
if (depth > 1)
generateSubtree(pos,depth - 1, ply + 1);
else
if (!inCheck(opposite(pos.turn()),pos))
++totalMoves;
pos.unMakeMove(moves[i]);
if (ply == 0)
{
mcount = totalMoves - mcount;
cout << mcount << endl;
}
}
}
totalMoves = 0;
generateSubtree(pos,depth,0);
clock_t after = clock();
double timeUsed =
146
(after - before) / static_cast<double>(CLOCKS_PER_SEC);
cout << "total moves = " << totalMoves
<< " time=" << timeUsed << endl;
}
B.8 random.cpp
// $Id: random.cpp,v 1.1 2004/04/22 12:57:11 s958547 Exp $
#include "random.h"
/* Period parameters */
#define N 624
#define M 397
#define MATRIX_A 0x9908b0df /* constant vector a */
#define UPPER_MASK 0x80000000 /* most significant w-r bits */
#define LOWER_MASK 0x7fffffff /* least significant r bits */
/* Tempering parameters */
#define TEMPERING_MASK_B 0x9d2c5680
#define TEMPERING_MASK_C 0xefc60000
#define TEMPERING_SHIFT_U(y) (y >> 11)
#define TEMPERING_SHIFT_S(y) (y << 7)
#define TEMPERING_SHIFT_T(y) (y << 15)
#define TEMPERING_SHIFT_L(y) (y >> 18)
static unsigned long mt[N]; /* the array for the state vector */
static int mti=N+1; /* mti==N+1 means mt[N] is not initialized */
147
if (mti >= N) { /* generate N words at one time */
int kk;
for (kk=0;kk<N-M;kk++) {
y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
mt[kk] = mt[kk+M] ^ (y >> 1) ^ mag01[y & 0x1];
}
for (;kk<N-1;kk++) {
y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1];
}
y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);
mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1];
mti = 0;
}
y = mt[mti++];
y ^= TEMPERING_SHIFT_U(y);
y ^= TEMPERING_SHIFT_S(y) & TEMPERING_MASK_B;
y ^= TEMPERING_SHIFT_T(y) & TEMPERING_MASK_C;
y ^= TEMPERING_SHIFT_L(y);
return y;
}
uint64 random64()
{
return ((uint64(random32()) << 32) | random32());
}
B.9 search.cpp
// $Id: search.cpp,v 1.45 2004/07/26 14:26:10 s958547 Exp $
148
#include <iostream>
#include <sys/time.h>
#include "search.h"
#include "moves.h"
#include "evaluate.h"
#include "hash.h"
#include "parallel.h"
//#define TRACE
#define TDEPTH 2
namespace shared
{
int maxDepth, maxQDepth;
int historyScore[Colorend][consts::squares][consts::squares];
Depth iterationDepth;
}
os << setfill(’0’);
os << setw(2) << hours << ":";
os << setw(2) << minutes << ":";
os << setw(2) << seconds << ".";
os << setw(2) << hundreths;
os << setfill(’ ’);
return os;
}
149
class Clock
{
public:
Clock()
{
timeval tv;
gettimeofday(&tv,0);
baseline = tv.tv_sec;
start = get();
}
void reset()
{
start = get();
}
private:
Time start;
time_t baseline;
};
Clock clck;
int treeCount = 0;
int gameCount = 0;
150
{
if (pos.gameRecord_[i].hashKey == pos.hashKey())
{
if (i > pos.rootPly_)
++treeCount; // internal tree repetition
else
++gameCount; // repetitions of a game history positions
}
}
moves.score(i,score);
}
}
shared::historyScore[c][m.from()][m.to()] += (d/onePly)*(d/onePly);
151
if (node.killer1[ply] != m)
{
node.killer2[ply] = node.killer1[ply];
node.killer1[ply] = m;
}
}
void scoreMoves(const Node& node, Moves& moves, int ply, Move hashMove)
{
const Position& pos = node.pos;
if (score >= 0)
score += 1000000;
else
score -= 1000000;
moves.score(i,score);
}
else
{
if (moves[i] == node.killer1[ply])
moves.score(i,100000);
else
if (moves[i] == node.killer2[ply])
moves.score(i,50000);
else
moves.score
(i,shared::historyScore[pos.turn()]
[moves[i].from()][moves[i].to()]);
}
}
}
152
void searchTrace(int ply,Score alpha, Score beta, Move move, string text)
{
if (ply < TDEPTH)
{
for (int t = 0; t < ply; ++t)
cout << " ";
cout << "(" << alpha << "," << beta << ") " << move << ": "
<< text << endl;
}
}
#else
#define searchTrace(ply,alpha,beta,move,text)
#endif
Variation& pv = node.pv;
pv.init(ply);
searchTrace(ply,eval,eval,nullMove,"Eval");
return eval;
}
alpha = eval;
}
int legalMoves = 0;
153
moves.sortFrom(i);
if (moves.getScore(i) < 0)
{
break;
}
++node.nodes;
++node.qNodes;
searchTrace(ply,alpha,beta,moves[i],"Qmove");
pos.makeMove(moves[i]);
if (inCheck(opposite(pos.turn()),pos))
{
pos.unMakeMove(moves[i]);
continue;
}
++legalMoves;
Score value =
(Score)(-quiescenceSearch(node, Score(-beta), Score(-alpha),
ply + 1));
pos.unMakeMove(moves[i]);
154
}
searchTrace(ply,alpha,beta,nullMove,"Qreturn");
return alpha;
}
TransRefTable transRef;
++node.nodes;
Variation& pv = node.pv;
pv.init(ply);
if (drawCheck(pos,ply))
{
searchTrace(ply,alpha,beta,nullMove,"Repetition");
return draw;
}
//
// Extensions
//
if (inChk)
{
extension += checkExtension;
Moves evasions;
generateMoves(pos,evasions);
evasions.removeIllegal(pos);
if (evasions.size() == 1)
extension += oneReplyExtension;
}
155
const Move previousMove = pos.gameRecord_[pos.gamePly_ - 2].move;
if (lastMove.piece() == pawn)
{
if (pos.turn() == white)
{
if (getRank(lastMove.to()) >= rank7)
extension += pawnExtension;
}
else
{
if (getRank(lastMove.to()) <= rank2)
extension += pawnExtension;
}
}
if (ply > 1
&& (lastMove.to() == previousMove.to())
&& (pieceValue(lastMove.capture())
== pieceValue(previousMove.capture()))
)
{
extension += recaptureExtension;
}
depth += extension;
totalExtension += extension;
//
// Probe the hashtable
//
switch(transRef.probe(node,depth,ply,alpha,beta,hashMove))
{
case exact:
return alpha;
break;
case lowerBound:
156
return beta;
break;
case upperBound:
return alpha;
break;
}
//
// Null move pruning
//
pos.makeNullMove();
Score value =
Score(-search(node, Score(-beta), Score(-beta + 1),
Depth(depth - onePly - R), ply + 1, false,
totalExtension));
pos.unMakeNullMove();
if (node.stopped)
return draw;
generateMoves(pos,moves);
scoreMoves(node,moves,ply,hashMove);
157
int legalMoves = 0;
bool isExact = false;
Move bestMove = nullMove;
for (i = 0; i < moves.size(); ++i)
{
moves.sortFrom(i);
searchTrace(ply,alpha,beta,moves[i],"Move");
pos.makeMove(moves[i]);
if (inCheck(opposite(pos.turn()),pos))
{
pos.unMakeMove(moves[i]);
continue;
}
++legalMoves;
Score value;
if (i == 0)
{
value = (Score)(-search(node, Score(-beta), Score(-alpha),
Depth(depth - onePly), ply + 1, true,
totalExtension));
}
else
{
value = Score(-search(node, Score(-alpha - 1), Score(-alpha),
Depth(depth - onePly), ply + 1, true,
totalExtension));
if (node.stopped)
{
pos.unMakeMove(moves[i]);
return draw;
}
pos.unMakeMove(moves[i]);
158
if (node.stopped)
return draw;
updateHistoryAndKillers(node,moves[i],pos.turn(),depth,ply);
transRef.store(pos,depth,ply,lowerBound,value,moves[i]);
searchTrace(ply,alpha,beta,moves[i],"Beta cut");
return value;
}
alpha = value;
pv.update(ply,moves[i]);
bestMove = moves[i];
isExact = true;
}
if (success)
{
if (node.stopped)
return draw;
value = node.pValue;
if (value > alpha)
{
bestMove = node.bestMove;
159
{
++node.failHighs;
updateHistoryAndKillers(node,bestMove,pos.turn(),depth,ply);
transRef.store(pos,depth,ply,lowerBound,value,bestMove);
searchTrace(ply,alpha,beta,bestMove,"Beta cut");
return value;
}
alpha = value;
isExact = true;
}
break;
}
}
}
if (legalMoves == 0)
{
if (inChk)
alpha = Score(-mate + ply);
else
alpha = draw;
}
if (isExact)
transRef.store(pos,depth,ply,exact,alpha,bestMove);
else
transRef.store(pos,depth,ply,upperBound,alpha,nullMove);
return alpha;
}
Variation& pv = node.pv;
pv.reset();
160
Move bestMove = nullMove;
for (int i = 0; i < moves.size(); ++i)
{
moves.sortFrom(i);
searchTrace(ply,alpha,beta,moves[i],"Rmove");
pos.makeMove(moves[i]);
Score value;
if (i == 0)
value = (Score)(-search(node, Score(-beta), Score(-alpha),
Depth(depth - onePly), ply + 1, true,
Depth(0)));
else
{
value = Score(-search(node, Score(-alpha - 1), Score(-alpha),
Depth(depth - onePly), ply + 1, true,
Depth(0)));
pos.unMakeMove(moves[i]);
updateHistoryAndKillers(node,moves[i],pos.turn(),depth,ply);
transRef.store(pos,depth,ply,lowerBound,value,moves[i]);
return value;
}
161
alpha = value;
pv.update(ply,moves[i]);
isExact = true;
bestMove = moves[i];
cout << setw(11) << clck.elapsed() << setw(14) << node.nodes << " "
<< setfill(’0’)
<< setw(2) << depth/onePly << "|"
<< setw(2) << shared::maxDepth << "|"
<< setw(2) << shared::maxQDepth
<< " " << alpha
<< " " << pv << newline;
}
}
bestmove = pv.bestMove();
if (isExact)
transRef.store(pos,depth,ply,exact,alpha,bestMove);
else
transRef.store(pos,depth,ply,upperBound,alpha,nullMove);
return alpha;
}
rootNode.nodes = 0;
rootNode.qNodes = 0;
rootNode.failHighs = 0;
rootNode.failHighFirsts = 0;
clck.reset();
162
forall (Color,c)
forall (Square,from)
forall (Square,to)
shared::historyScore[c][from][to] = 0;
rootNode.transRefProbes = 0;
rootNode.transRefHits = 0;
transRef.nextGeneration();
Moves moves;
generateMoves(pos,moves);
moves.removeIllegal(pos);
for (int i = 0; i < moves.size(); ++i)
moves.score(i,0);
Move bestMove;
Score value =
Score(pos.material(pos.turn())
- pos.material(opposite(pos.turn())));
value = rootSearch
(rootNode,alpha,beta,shared::iterationDepth,moves,bestMove);
163
<< newline;
// storePV(pos,pv);
}
cout << newline << "My move: " << bestMove << newline << newline;
pos.makeMove(bestMove);
}
B.10 base.h
// $Id: base.h,v 1.15 2004/05/15 00:30:53 s958547 Exp $
#include <iostream>
#include <cstdlib>
#include <iomanip>
#include <algorithm>
#include "enums.h"
namespace consts
164
{
const int files = 8;
const int ranks = 8;
const int squares = files*ranks;
const int max_moves = 300;
}
enum Square
{
a1,b1,c1,d1,e1,f1,g1,h1,
a2,b2,c2,d2,e2,f2,g2,h2,
a3,b3,c3,d3,e3,f3,g3,h3,
a4,b4,c4,d4,e4,f4,g4,h4,
a5,b5,c5,d5,e5,f5,g5,h5,
165
a6,b6,c6,d6,e6,f6,g6,h6,
a7,b7,c7,d7,e7,f7,g7,h7,
a8,b8,c8,d8,e8,f8,g8,h8,
Squarebegin = a1, Squareend = h8 + 1
};
DeclareEnumTricks(Square)
class ColoredPiece
166
{
public:
ColoredPiece() {}
ColoredPiece(Color c, Piece p) : color_(c), piece_(p) {}
Color color() const { return color_; }
Piece piece() const { return piece_; }
private:
Color color_;
Piece piece_;
};
enum Score
{
minScore = -131072,
draw = 0,
pawnValue = 1000,
mate = 130000,
167
illegal = 130100,
maxScore = +131072
};
168
inline std::ostream& newline(std::ostream& os)
{
return os << ’\n’;
}
#endif
B.11 bitboard.h
// $Id: bitboard.h,v 1.15 2004/07/26 14:26:10 s958547 Exp $
#include <iostream>
#include "base.h"
#else
class BitBoard
{
public:
BitBoard() { undefined = true; }
explicit BitBoard(uint64 ulli) :
bb(ulli) { undefined = false; }
169
BitBoard operator ~() const
{ c(); return BitBoard(~bb); }
BitBoard operator <<(int i) const
{ c(); return BitBoard(bb << i); }
BitBoard operator >>(int i) const
{ c(); return BitBoard(bb >> i); }
bool operator ==(const BitBoard& rhs) const
{ return (bb == rhs.bb); }
private:
void c() const
{
if (undefined)
std::cout << "Uninitialized BitBoard read from:"
<< bb << std::endl;
}
bool undefined;
uint64 bb;
};
#endif
//#define CALCMASK
#if defined CALCMASK
inline BitBoard mask(Square s)
{
return BitBoard(1ULL << s);
}
#else
extern BitBoard mask_[Squareend];
170
else if (int(b >> 48) & 0xffff)
return Square(firstBit_[int(b >> 48) & 0xffff] + 48);
return Square_end();
}
void initBitBoard();
#endif
B.12 enums.h
// $Id: enums.h,v 1.2 2004/02/15 11:20:55 s958547 Exp $
#define DeclareEnumTricks(T) \
inline T& operator++(T& e) \
{ \
e = T(e+1); \
return e; \
} \
171
\
inline T operator++(T& e, int) \
{ \
T old = e; \
e = T(e+1); \
return old;\
} \
\
inline T& operator--(T& e) \
{ \
e = T(e-1); \
return e; \
} \
\
inline T operator--(T& e, int) \
{ \
T old = e; \
e = T(e-1); \
return old; \
} \
\
inline T T##_end() \
{ \
return T##end; \
} \
\
inline T T##_begin() \
{ \
return T##begin; \
}
#endif
B.13 evaluate.h
// $Id: evaluate.h,v 1.2 2004/02/24 12:09:20 s958547 Exp $
#include "position.h"
172
{
Score(0),
pawnValue,
Score(pawnValue*3),
Score(pawnValue*3),
Score(pawnValue*5),
Score(pawnValue*9),
Score(0)
};
return pieceValue_[p];
}
#endif
B.14 hash.h
// $Id: hash.h,v 1.7 2004/06/16 11:27:44 s958547 Exp $
#include <iostream>
#include "base.h"
#include "move.h"
#include "position.h"
#include "parallel.h"
class TransRefTable
{
public:
TransRefTable();
void nextGeneration()
{
hashID = (hashID + 1) & 7;
173
if (hashID == 0)
++hashID;
}
void clear();
int getUsage()
{
int count = 0;
for (int i = 0; i < hashTableSize; ++i)
{
if (hashTableWhite[i].data.age == hashID)
++count;
if (hashTableBlack[i].data.age == hashID)
++count;
}
return count;
}
private:
// Asumption: sizeof(unsigned long) == 8 (64 bits)
struct HashEntry
{
union
{
struct
{
unsigned long score : 18;
unsigned long depth : 14;
HashKey hashKey;
};
bool hashEnabled;
int hashTableSize;
int hashTableSplit;
HashEntry* hashTableWhite;
HashEntry* hashTableBlack;
unsigned int hashID;
};
174
void initHash();
HashKey calcHashKey(const Position& pos);
HashKey hashPiece(Color c, Piece p, Square s);
HashKey hashEpSquare(Square s);
HashKey hashCastling(Color c, CastlingRight cr);
#endif
B.15 move.h
// $Id: move.h,v 1.8 2004/05/19 20:42:09 s958547 Exp $
#include "base.h"
class Move
{
public:
static Move nullMove;
Move() {}
explicit Move(unsigned int m) : move(m) {}
Move(Square from, Square to, Piece piece,
Piece capture, Piece promote) :
move ((((((((promote << 3) | capture) << 3)
| piece) << 6) | to) << 6) | from)
{
// move = ((((((((promote << 3) | capture) << 3)
// | piece) << 6) | to) << 6) | from);
}
175
#endif
B.16 moves.h
// $Id: moves.h,v 1.14 2004/07/26 14:26:10 s958547 Exp $
#include <iostream>
#include <algorithm>
#include "base.h"
#include "position.h"
#include "move.h"
#include "evaluate.h"
class Moves
{
public:
Moves() { sz = 0; }
void push_back(Move m) { mvs[sz++].move = m; }
int size() const { return sz; }
Move operator[](int i) const { return mvs[i].move; }
void score(int i, int score) { mvs[i].score = score; }
int getScore(int i) const { return mvs[i].score; }
176
}
}
}
void reset() { sz = 0; }
void printScores()
{
std::cout << sz << " moves: ";
for (int i = 0; i < sz; ++i)
std::cout << mvs[i].score << " ";
std::cout << std::endl;
}
void checkSort(int to)
{
for (int i = 0; i < to && i+1 < sz; ++i)
if (mvs[i].score < mvs[i+1].score)
std::cout << "doh!" << std::endl;
}
private:
int sz;
struct ScoredMove
{
Move move;
int score;
};
ScoredMove mvs[consts::max_moves];
};
class Variation
{
public:
Variation() {}
void reset() { length_[0] = 0; }
void init(int ply) { length_[ply] = ply; }
void update(int ply, Move m)
{
v_[ply][ply] = m;
for (int j = ply + 1; j < length_[ply + 1]; ++j)
v_[ply][j] = v_[ply + 1][j];
length_[ply] = length_[ply + 1];
}
Move bestMove() { return v_[0][0]; }
int length() const { return length_[0]; }
Move operator[](int i) const { return v_[0][i]; }
void copy(const Variation& pv, int ply)
{
for (int i = ply; i < pv.length_[ply]; ++i)
v_[ply][i] = pv.v_[ply][i];
177
length_[ply] = pv.length_[ply];
return eval;
}
void initMoves();
void generateMoves(const Position& pos, Moves& moves);
void generateNonQuietMoves(const Position& pos, Moves& moves);
Square rotate45Left(Square s);
Square rotate45Right(Square s);
Square rotate90Left(Square s);
bool isAttacking(Color color, Square s, const Position& pos);
Score SEE(const Position &pos, Move move);
178
os << v[i] << " ";
return os;
}
#endif
B.17 parallel.h
// $Id: parallel.h,v 1.18 2004/06/18 11:54:26 s958547 Exp $
#include <pthread.h>
#include "position.h"
#include "base.h"
#include "moves.h"
struct Node
{
Node* parent;
Node* children[maxThreads];
volatile bool stopped;
volatile int workers;
bool used;
int threadNumber;
pthread_mutex_t mutex;
Position pos;
Variation pv;
int nodes;
int qNodes;
int failHighs;
int failHighFirsts;
int transRefProbes;
int transRefHits;
Move killer1[maxPly],killer2[maxPly];
Score pValue;
Score cValue;
Score alpha;
Score beta;
179
Depth depth;
int ply;
Depth totalExtension;
Moves moves[maxPly];
int nextMove[maxPly];
Move bestMove;
};
void initParallel();
bool split(Node* node);
#endif
B.18 position.h
// $Id: position.h,v 1.25 2004/07/26 14:26:10 s958547 Exp $
#include <iostream>
#include <string>
#include "bitboard.h"
#include "move.h"
class Position
{
public:
Position()
{
setupFEN
("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1");
}
Position(std::string FEN)
{
setupFEN(FEN);
180
}
void initial();
181
{ return occupiedRotate90Left_; }
// data:
struct HistoryEntry
{
Square epSquare;
CastlingRight castlingRight[Colorend];
int halfmoveClock;
Move move;
HashKey hashKey;
};
int gamePly_;
HistoryEntry gameRecord_[50]; // not enough for real play,
// but enough for testing
int rootPly_;
int threadNumber;
private:
void updateBitBoards();
ColoredPiece board_[consts::squares];
BitBoard bitboard_[Colorend][Pieceend];
BitBoard occupied_[Colorend];
BitBoard occupiedRotate45Left_;
BitBoard occupiedRotate45Right_;
BitBoard occupiedRotate90Left_;
Color turn_;
Square epSquare_;
CastlingRight castlingRight_[Colorend];
int halfmoveClock_;
182
Score material_[Colorend];
HashKey hashKey_;
};
#endif
B.19 random.h
// $Id: random.h,v 1.1 2004/04/22 12:57:11 s958547 Exp $
#include "base.h"
#endif
B.20 search.h
// $Id: search.h,v 1.6 2004/05/15 22:41:45 s958547 Exp $
#include "position.h"
#include "parallel.h"
#endif
183
184
Bibliography
[8] Feldmann, R., Mysliwietz, P., and Monien, B. Game tree search
on a massively parallel system. In Advances in Computer Chess 7 (1994),
University of Limburg, pp. 203–218.
185
[9] Hartmann, D. Notions of Evaluation Functions Tested Against Grand-
master Games. In Advances in Computer Chess 5 (1989), Elsevier Sci-
ence Publishers, pp. 91–141.
186
[20] Shams, R., Kaindl, H., and Horacek, H. Using aspiration windows
for minimax algorithms. In Proceedings of the 12th IJCAI (Sidney,
Australia, 1991), pp. 192–197.
187