Evaluation & Testing
Evaluation is handled by the `Tester` class, with additional functionality for two-player environments implemented in the `TwoPlayerTester` class.
Evaluation can be run every epoch as part of a training process, or independently via `--mode=test`. Both are configured the same way. Multi-player environments are tested against any number of baselines, while single-player environments are evaluated using additional self-play episodes whose hyperparameters are separate from those of the training episodes (perhaps with temperature = 0.0, or with little to no added noise).
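To illustrate why a temperature of 0.0 suits evaluation episodes, here is a minimal sketch of temperature-based action selection over search visit counts. It follows a common AlphaZero-style convention and is an assumption for illustration, not this project's implementation; the function name `select_action` and its arguments are hypothetical.

```python
import random


def select_action(visit_counts, temperature, rng=random):
    """Pick an action index from per-action visit counts (hypothetical helper).

    temperature <= 0 gives a deterministic argmax; larger values explore more.
    """
    actions = range(len(visit_counts))
    if temperature <= 0.0:
        # Greedy choice: the most-visited action (lowest index wins ties).
        return max(actions, key=lambda a: visit_counts[a])
    # Raising counts to 1 / temperature sharpens (T < 1) or flattens (T > 1)
    # the distribution. Very small positive temperatures can overflow here;
    # this sketch does not guard against that.
    weights = [count ** (1.0 / temperature) for count in visit_counts]
    return rng.choices(actions, weights=weights, k=1)[0]
```

With temperature 0.0 the same visit counts always produce the same move, so evaluation results are not blurred by sampling randomness.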
- `algo_config`: hyperparameters/configuration for the trained algorithm; see the algorithm wiki page for more details
- `episodes_per_epoch`: number of episodes to collect for each test/baseline
- [Multi-Player envs] `baselines`: list of baseline evaluator configurations, see below
The random baseline chooses a random move from the set of legal moves in a given state.
Parameters: none
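The random baseline's behavior can be sketched in a few lines. This is an illustrative sketch only: the name `random_baseline_move` and the boolean legal-move mask are assumptions, not the project's API.

```python
import random


def random_baseline_move(legal_mask, rng=random):
    """Return a uniformly random index among legal moves (hypothetical helper).

    legal_mask: list of bools, True where the move at that index is legal.
    """
    legal_moves = [index for index, is_legal in enumerate(legal_mask) if is_legal]
    if not legal_moves:
        raise ValueError("no legal moves in this state")
    return rng.choice(legal_moves)
```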
Greedy baselines choose a move based on a specified heuristic. Environments implement specific heuristics to choose from. For example, one of the heuristics implemented by Othello is `corners`, which rates actions higher when they place a tile in a corner. Each environment wiki page lists the implemented heuristics.
- `heuristic`: the environment-specific heuristic used to evaluate the current state
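As a rough sketch of how a greedy baseline might use a corner-style heuristic, the example below scores each legal move and picks the highest-scoring one. The scoring values, the 8x8 board coordinates, and the names `corners_heuristic` and `greedy_move` are assumptions made for illustration; the actual `corners` heuristic may be defined differently.

```python
# Corner squares of an 8x8 board as (row, column) pairs (assumed layout).
CORNERS = {(0, 0), (0, 7), (7, 0), (7, 7)}


def corners_heuristic(move):
    """Hypothetical scoring: 1.0 for a corner square, 0.0 otherwise."""
    return 1.0 if move in CORNERS else 0.0


def greedy_move(legal_moves, heuristic):
    """Choose the legal move with the highest heuristic score.

    The first move in legal_moves wins ties.
    """
    return max(legal_moves, key=heuristic)
```

For instance, `greedy_move([(2, 3), (0, 7), (5, 4)], corners_heuristic)` selects `(0, 7)`, the only corner among the legal moves.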
Much like AlphaZero, Greedy MCTS uses Monte Carlo Tree Search to explore game states, but rather than using a neural network to evaluate a state, it applies a heuristic at each leaf node. Greedy MCTS relies solely on backpropagating Q-values, using a uniform distribution as the policy for any given state.
- `heuristic`: the environment-specific heuristic used to evaluate a leaf node
- `num_iters`: the search budget. One iteration of vectorized MCTS moves along a single edge from one node to the next, so a budget of size `num_iters` determines how many edges will be traversed during search. Note: this differs from other implementations of MCTS, where an iteration is defined as the process of traveling from the root node to a leaf node, and running M iterations yields a search tree with M+1 nodes.
- `max_nodes`: maximum size of the search tree
- dirichlet epsilon: proportion of the root node policy composed of Dirichlet noise. This is more useful for AlphaZero/LazyZero; I would advise setting this to 0.0 (no noise) when using Greedy MCTS.
- dirichlet alpha: magnitude of the Dirichlet noise
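The idea behind Greedy MCTS (uniform policy, heuristic leaf evaluation, backpropagated Q-values) can be sketched as follows. This is a simplified, non-vectorized sketch under several assumptions: it counts conventional root-to-leaf simulations rather than the edge-based `num_iters` budget, it uses a PUCT-style selection rule with a hypothetical `c_puct` constant, it assumes a two-player game where the heuristic scores a state from the perspective of the player to move, and it omits Dirichlet noise and a node limit. All names (`Node`, `greedy_mcts`, the callback arguments) are illustrative, not the project's API.

```python
import math


class Node:
    """Search-tree node; values are from the view of the player to move."""

    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def greedy_mcts(root_state, legal_moves, apply_move, heuristic,
                num_simulations, c_puct=1.0):
    """Return the most-visited root move after num_simulations simulations."""
    root = Node()
    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        # Selection: descend through expanded nodes using a uniform prior.
        while node.children:
            prior = 1.0 / len(node.children)
            parent_visits = node.visits
            move, node = max(
                node.children.items(),
                # A child's Q is from the opponent's view, so negate it.
                key=lambda kv: -kv[1].q()
                + c_puct * prior * math.sqrt(parent_visits) / (1 + kv[1].visits),
            )
            state = apply_move(state, move)
            path.append(node)
        # Expansion: add one child per legal move (none at terminal states).
        for move in legal_moves(state):
            node.children[move] = Node()
        # Evaluation: a heuristic replaces the neural network at the leaf.
        value = heuristic(state)
        # Backpropagation: flip the sign at each ply on the way to the root.
        for visited in reversed(path):
            visited.visits += 1
            visited.value_sum += value
            value = -value
    if not root.children:
        raise ValueError("root state has no legal moves")
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

A toy usage example: in a subtraction game where players remove 1 or 2 items from a pile, `greedy_mcts(4, lambda n: [m for m in (1, 2) if m <= n], lambda n, m: n - m, lambda n: -1.0 if n % 3 == 0 else 1.0, 50)` searches from a pile of 4 and returns one of the legal moves.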
Here are a few example Evaluation configuration files: