
Commit 088e24f

Add run individual step only option (tensorflow#4049)
* Add run individual step only option
* Fix comments and update readme
* Add validation argument
* Address comments
* Make code shorter
* Fix more lints
1 parent 5be3c06 commit 088e24f

6 files changed: +271 -160 lines changed

research/minigo/README.md

Lines changed: 69 additions & 45 deletions
@@ -1,19 +1,24 @@
 # MiniGo
 This is a simplified implementation of MiniGo based on the code provided by the authors: [MiniGo](https://github.com/tensorflow/minigo).
 
-MiniGo is a minimalist Go engine modeled after AlphaGo Zero, built on MuGo. The current implementation consists of three main modules: the DualNet model, the Monte Carlo Tree Search (MCTS), and Go domain knowledge. Currently the **model** part is our focus.
+MiniGo is a minimalist Go engine modeled after AlphaGo Zero, ["Mastering the Game of Go without Human
+Knowledge"](https://www.nature.com/articles/nature24270). A useful one-diagram overview of AlphaGo Zero can be found in the [cheat sheet](https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0).
 
-This implementation maintains the features of model training and validation, and also provides evaluation of two Go models.
+The implementation of MiniGo consists of three main components: the DualNet model, the Monte Carlo Tree Search (MCTS), and Go domain knowledge. Currently, the **DualNet model** is our focus.
 
 
-## DualNet Model
+## DualNet Architecture
+DualNet is the neural network used in MiniGo. It is based on residual blocks with two output heads. The following is a brief overview of the DualNet architecture.
+
+### Input Features
 The input to the neural network is a [board_size * board_size * 17] image stack
 comprising 17 binary feature planes. 8 feature planes consist of binary values
 indicating the presence of the current player's stones; A further 8 feature
 planes represent the corresponding features for the opponent's stones; The final
 feature plane represents the color to play, and has a constant value of either 1
-if black is to play or 0 if white to play. Check `features.py` for more details.
+if black is to play or 0 if white to play. Check [features.py](features.py) for more details.
 
+### Neural Network Structure
 In MiniGo implementation, the input features are processed by a residual tower
 that consists of a single convolutional block followed by either 9 or 19
 residual blocks.
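For orientation, the 17-plane input described above can be assembled as in the short sketch below. It is illustrative only: the exact plane ordering and history handling live in features.py and may differ from this simplified layout.

```python
import numpy as np

def make_input_planes(position_history, to_play, board_size=9):
    """Illustrative sketch of the [board_size, board_size, 17] input stack.

    position_history: up to 8 most recent board arrays, each of shape
        [board_size, board_size] with +1 for black stones and -1 for white.
    to_play: +1 if black is to move, -1 if white is to move.
    """
    planes = np.zeros((board_size, board_size, 17), dtype=np.float32)
    for t, board in enumerate(position_history[:8]):
        planes[:, :, t] = (board == to_play)        # current player's stones
        planes[:, :, 8 + t] = (board == -to_play)   # opponent's stones
    # Final plane: constant 1 if black is to play, 0 if white is to play.
    planes[:, :, 16] = 1.0 if to_play == 1 else 0.0
    return planes
```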
@@ -31,8 +36,9 @@ Each residual block applies the following modules sequentially to its input:
 6. A skip connection that adds the input to the block
 7. A rectifier non-linearity
 
-Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size.
+Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size in the MiniGo implementation.
 
+### Dual Heads Output
 The output of the residual tower is passed into two separate "heads" for
 computing the policy and value respectively. The policy head applies the
 following modules:
@@ -51,64 +57,82 @@ The value head applies the following modules:
 6. A fully connected linear layer to a scalar
 7. A tanh non-linearity outputting a scalar in the range [-1, 1]
 
-The overall network depth, in the 10 or 20 block network, is 19 or 39
+In MiniGo, the overall network depth, in the 10 or 20 block network, is 19 or 39
 parameterized layers respectively for the residual tower, plus an additional 2
 layers for the policy head and 3 layers for the value head.
 
 ## Getting Started
 This project assumes you have virtualenv, TensorFlow (>= 1.5) and two other Go-related
 packages pygtp(>=0.4) and sgf (==0.5).
 
-
 ## Training Model
-One iteration of reinforcement learning consists of the following steps:
-- Bootstrap: initializes a random model
-- Selfplay: plays games with the latest model, producing data used for training
+One iteration of reinforcement learning (RL) consists of the following steps:
+- Bootstrap: initializes a random DualNet model. If the estimator directory already exists, the model is initialized with the last checkpoint.
+- Selfplay: plays games with the latest model or the best model so far identified by evaluation, producing data used for training
 - Gather: groups games played with the same model into larger files of tfexamples.
-- Train: trains a new model with the selfplay results from the most recent N
-generations.
+- Train: trains a new model with the selfplay results from the most recent N generations.
 
-Run `minigo.py`.
+To run the RL pipeline, issue the following command:
 ```
-python minigo.py
+python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256
 ```
+Arguments:
+* `--base_dir`: Base directory for MiniGo data and models. If not specified, it is set to /tmp/minigo/ by default.
+* `--board_size`: Go board size. It can be either 9 or 19. By default, it is 9.
+* `--batch_size`: Batch size for model training. If not specified, it is calculated based on the Go board size.
+Use the `--help` or `-h` flag to get a full list of possible arguments. Besides these arguments, other parameters for the RL pipeline and the DualNet model can be found and configured in [model_params.py](model_params.py).
+
+Suppose the base directory argument `base_dir` is `$HOME/minigo/` and we use 9 as the `board_size`. After model training, the following directories are created to store models and game data:
+
+    $HOME/minigo                  # base directory
+
+    ├── 9_size                    # directory for 9x9 board size
+    │   │
+    │   ├── data
+    │   │   ├── holdout           # holdout data for model validation
+    │   │   ├── selfplay          # data generated by selfplay of each model
+    │   │   └── training_chunks   # gathered tf_examples for model training
+    │   │
+    │   ├── estimator_model_dir   # estimator working directory
+    │   │
+    │   ├── trained_models        # all the trained models
+    │   │
+    │   └── sgf                   # sgf (smart go files) folder
+    │       ├── 000000-bootstrap  # model name
+    │       │   ├── clean         # clean sgf files of model selfplay
+    │       │   └── full          # full sgf files of model selfplay
+    │       ├── ...
+    │       └── evaluate          # clean sgf files of model evaluation
+
+    └── ...
 
 ## Validating Model
-Run `minigo.py` with `--validation` argument
+To validate the trained model, issue the following command with the `--validation` argument:
 ```
-python minigo.py --validation
-```
-The `--validation` argument is to generate holdout dataset for model validation
-
-## Evaluating MiniGo Models
-Run `minigo.py` with `--evaluation` argument
+python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --validation
 ```
-python minigo.py --evaluation
-```
-The `--evaluation` argument is to invoke the evaluation between the latest model and the current best model.
-
-## Testing Pipeline
-As the whole RL pipeline may takes hours to train even for a 9x9 board size, we provide a dummy model with a `--debug` mode for testing purpose.
 
-Run `minigo.py` with `--debug` argument
-```
-python minigo.py --debug
-```
-The `--debug` argument is for testing purpose with a dummy model.
+## Evaluating Models
+The performance of two models is compared in the evaluation step. Given two models, one plays black and the other plays white. They play several games (the number of games can be configured by the parameter `eval_games` in [model_params.py](model_params.py)), and the one that wins by a margin of 55% is the winner.
 
-Validation and evaluation can also be tested with the dummy model by combing their corresponding arguments with `--debug`.
-To test validation, run the following commands:
-```
-python minigo.py --debug --validation
-```
-To test evaluation, run the following commands:
+To include the evaluation step in the RL pipeline, the `--evaluation` argument can be specified to compare the performance of the `current_trained_model` and the `best_model_so_far`. The winner is used to update `best_model_so_far`. Run the following command to include the evaluation step in the pipeline:
 ```
-python minigo.py --debug --evaluation
-```
-To test both validation and evaluation, run the following commands:
-```
-python minigo.py --debug --validation --evaluation
+python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --evaluation
 ```
 
-## MCTS and Go features (TODO)
-Code clean up on MCTS and Go features.
+## Testing Pipeline
+As the whole RL pipeline may take hours to train even for a 9x9 board size, a `--test` argument is provided to test the pipeline quickly with a dummy neural network model.
+
+To test the RL pipeline with a dummy model, issue the following command:
+```
+python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --test
+```
+
+## Running Self-play Only
+A self-play only option is provided to run the selfplay step individually to generate training data in parallel. Issue the following command to run selfplay only with the latest trained model:
+```
+python minigo.py --selfplay
+```
+Other optional arguments:
+* `--selfplay_model_name`: The name of the model used for selfplay only. If not specified, the latest trained model will be used for selfplay.
+* `--selfplay_max_games`: The maximum number of games selfplay is required to generate. If not specified, the default parameter `max_games_per_generation` is used.
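For orientation, the following is a rough tf.keras sketch of the DualNet structure the README describes (initial convolutional block, residual tower, policy and value heads). It is illustrative only and is not the repository's dualnet_model.py; the head widths (2 policy filters, 1 value filter, 256 hidden units) and the board_size * board_size + 1 move output are assumptions taken from the AlphaGo Zero design rather than from this diff.

```python
import tensorflow as tf

def residual_block(x, num_filter):
    """One residual block: conv, BN, ReLU, conv, BN, skip connection, ReLU."""
    shortcut = x
    y = tf.keras.layers.Conv2D(num_filter, 3, padding='same')(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Activation('relu')(y)
    y = tf.keras.layers.Conv2D(num_filter, 3, padding='same')(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Add()([shortcut, y])
    return tf.keras.layers.Activation('relu')(y)

def build_dual_net(board_size=9, num_filter=32, num_blocks=9):
    """Residual tower with policy and value heads, per the README description."""
    inputs = tf.keras.Input(shape=(board_size, board_size, 17))
    # Initial convolutional block.
    x = tf.keras.layers.Conv2D(num_filter, 3, padding='same')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    # Residual tower (9 or 19 blocks depending on board size).
    for _ in range(num_blocks):
        x = residual_block(x, num_filter)
    # Policy head: 1x1 conv, BN, ReLU, dense over all moves plus pass.
    p = tf.keras.layers.Conv2D(2, 1)(x)
    p = tf.keras.layers.BatchNormalization()(p)
    p = tf.keras.layers.Activation('relu')(p)
    p = tf.keras.layers.Flatten()(p)
    policy = tf.keras.layers.Dense(board_size * board_size + 1,
                                   activation='softmax', name='policy')(p)
    # Value head: 1x1 conv, BN, ReLU, dense, ReLU, dense to scalar, tanh.
    v = tf.keras.layers.Conv2D(1, 1)(x)
    v = tf.keras.layers.BatchNormalization()(v)
    v = tf.keras.layers.Activation('relu')(v)
    v = tf.keras.layers.Flatten()(v)
    v = tf.keras.layers.Dense(256, activation='relu')(v)
    value = tf.keras.layers.Dense(1, activation='tanh', name='value')(v)
    return tf.keras.Model(inputs, [policy, value])
```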

research/minigo/dualnet.py

Lines changed: 5 additions & 5 deletions
@@ -191,24 +191,24 @@ def export_model(working_dir, model_path):
     tf.gfile.Copy(filename, destination_path)
 
 
-def train(working_dir, tf_records, generation_num, params):
+def train(working_dir, tf_records, generation, params):
   """Train the model for a specific generation.
 
   Args:
     working_dir: The model working directory to save model parameters,
       drop logs, checkpoints, and so on.
     tf_records: A list of tf_record filenames for training input.
-    generation_num: The generation to be trained.
+    generation: The generation to be trained.
     params: hyperparams of the model.
 
   Raises:
-    ValueError: if generation_num is not greater than 0.
+    ValueError: if generation is not greater than 0.
   """
-  if generation_num <= 0:
+  if generation <= 0:
     raise ValueError('Model 0 is random weights')
   estimator = tf.estimator.Estimator(
       dualnet_model.model_fn, model_dir=working_dir, params=params)
-  max_steps = (generation_num * params.examples_per_generation
+  max_steps = (generation * params.examples_per_generation
                // params.batch_size)
   profiler_hook = tf.train.ProfilerHook(output_dir=working_dir, save_secs=600)
 
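The only computation touched by the rename is the `max_steps` budget shown above. A minimal, self-contained sketch of that arithmetic, using made-up values rather than the defaults in model_params.py:

```python
# Illustrative only: how the training-step budget scales with `generation`.
# The values below are placeholders, not the defaults from model_params.py.
examples_per_generation = 2000000
batch_size = 256

for generation in (1, 2, 3):
    # train() raises ValueError for generation <= 0 (model 0 is random weights).
    max_steps = generation * examples_per_generation // batch_size
    print('generation %d trains for up to %d steps' % (generation, max_steps))
```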

research/minigo/gtp_wrapper.py

Lines changed: 3 additions & 4 deletions
@@ -49,8 +49,7 @@ def __init__(self, board_size):
   def set_size(self, n):
     if n != self.board_size:
       raise ValueError((
-          '''Can't handle boardsize {n}!Restart with env var BOARD_SIZE={n}'''
-      ).format(n=n))
+          "Can't handle boardsize {}! Please check the board size.").format(n))
 
   def set_komi(self, komi):
     self.komi = komi
@@ -75,7 +74,7 @@ def accomodate_out_of_turn(self, color):
     self.position.flip_playerturn(mutate=True)
 
   def make_move(self, color, vertex):
-    c = coords.from_pygtp(vertex)
+    c = coords.from_pygtp(self.board_size, vertex)
     # let's assume this never happens for now.
     # self.accomodate_out_of_turn(color)
     return self.play_move(c)
@@ -85,7 +84,7 @@ def get_move(self, color):
     move = self.suggest_move(self.position)
     if self.should_resign():
       return gtp.RESIGN
-    return coords.to_pygtp(move)
+    return coords.to_pygtp(self.board_size, move)
 
   def final_score(self):
     return self.position.result_string()
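For context on why `board_size` now has to be passed through: GTP vertices are (column, row) pairs numbered from 1 starting at the bottom-left, while a board array is usually indexed (row, column) from the top, so the conversion depends on the board size. The sketch below illustrates that convention only; it is not the repository's coords module.

```python
# Illustrative only: converting between GTP-style vertices and array indices.
def gtp_to_array(vertex, board_size):
    """(col, row) GTP vertex, 1-based from bottom-left -> (row, col) array index."""
    col, row = vertex
    return board_size - row, col - 1

def array_to_gtp(coord, board_size):
    """(row, col) array index, 0-based from top-left -> (col, row) GTP vertex."""
    row, col = coord
    return col + 1, board_size - row

print(gtp_to_array((1, 1), 9))   # bottom-left corner -> (8, 0)
print(array_to_gtp((0, 0), 9))   # top-left corner -> (1, 9)
```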
