This is a simplified implementation of MiniGo based on the code provided by the authors: [MiniGo](https://github.com/tensorflow/minigo).
MiniGo is a minimalist Go engine modeled after AlphaGo Zero, ["Mastering the Game of Go without Human Knowledge"](https://www.nature.com/articles/nature24270). A useful one-diagram overview of AlphaGo Zero can be found in this [cheat sheet](https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0).
The implementation of MiniGo consists of three main components: the DualNet model, the Monte Carlo Tree Search (MCTS), and Go domain knowledge. Currently, the **DualNet model** is our focus.
## DualNet Architecture
DualNet is the neural network used in MiniGo. It is based on residual blocks with two output heads. The following is a brief overview of the DualNet architecture.
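As an illustration of this idea, here is a minimal NumPy sketch (not the actual MiniGo code, which uses TensorFlow) of a shared convolutional trunk of residual blocks feeding a policy head and a value head. The filter count, weight shapes, and helper names are illustrative assumptions only.

```python
# Sketch of a dual-headed residual network for a 9x9 board with 17 input
# planes. MiniGo uses many residual blocks and batch normalization; both
# are reduced/omitted here for brevity.
import numpy as np

BOARD = 9
PLANES = 17
FILTERS = 32  # illustrative; the real model uses more filters

rng = np.random.default_rng(0)

def conv3x3(x, w):
    """Naive 3x3 'same' convolution. x: [H, W, Cin], w: [3, 3, Cin, Cout]."""
    h, wd, _ = x.shape
    cout = w.shape[-1]
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, cout))
    for i in range(h):
        for j in range(wd):
            patch = xp[i:i + 3, j:j + 3, :]             # [3, 3, Cin]
            out[i, j] = np.tensordot(patch, w, axes=3)  # [Cout]
    return out

def relu(x):
    return np.maximum(x, 0)

def residual_block(x, w1, w2):
    """Two 3x3 convs with a skip connection (batch norm omitted)."""
    y = relu(conv3x3(x, w1))
    y = conv3x3(y, w2)
    return relu(x + y)

# Shared trunk: initial conv then one residual block (the real net stacks many).
w_in = rng.normal(0, 0.1, (3, 3, PLANES, FILTERS))
w_r1 = rng.normal(0, 0.1, (3, 3, FILTERS, FILTERS))
w_r2 = rng.normal(0, 0.1, (3, 3, FILTERS, FILTERS))

x = rng.random((BOARD, BOARD, PLANES))  # one input position
trunk = residual_block(relu(conv3x3(x, w_in)), w_r1, w_r2)

# Policy head: one logit per move plus one for "pass".
w_p = rng.normal(0, 0.1, (BOARD * BOARD * FILTERS, BOARD * BOARD + 1))
policy_logits = trunk.reshape(-1) @ w_p

# Value head: a single scalar in [-1, 1] via tanh.
w_v = rng.normal(0, 0.1, (BOARD * BOARD * FILTERS, 1))
value = np.tanh(trunk.reshape(-1) @ w_v)[0]

print(policy_logits.shape)  # (82,)
```

The two heads share all trunk weights, so one forward pass yields both the move probabilities and the position evaluation used by MCTS.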
### Input Features
The input to the neural network is a [board_size * board_size * 17] image stack comprising 17 binary feature planes.
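As a hedged sketch of how such a stack can be assembled (the plane ordering and the helper name `make_features` are assumptions for illustration, not the actual MiniGo API): 8 planes hold the current player's stones over the last 8 positions, 8 hold the opponent's, and the final plane encodes the colour to play.

```python
# Build the [board_size, board_size, 17] input stack described above.
import numpy as np

BOARD = 9
HISTORY = 8

def make_features(history, to_play):
    """history: list of up to 8 boards, newest first; each board is a
    [BOARD, BOARD] int array with 1 = black stone, -1 = white, 0 = empty.
    to_play: 1 if black moves next, -1 if white moves next."""
    stack = np.zeros((BOARD, BOARD, 2 * HISTORY + 1), dtype=np.float32)
    for t, board in enumerate(history[:HISTORY]):
        stack[:, :, t] = (board == to_play)             # own stones
        stack[:, :, HISTORY + t] = (board == -to_play)  # opponent stones
    stack[:, :, -1] = 1.0 if to_play == 1 else 0.0      # colour-to-play plane
    return stack

# Example: empty board, black to move.
feats = make_features([np.zeros((BOARD, BOARD), dtype=int)], to_play=1)
print(feats.shape)  # (9, 9, 17)
```

Positions older than the available history are simply left as all-zero planes, which is why a game's first move still produces a full 17-plane stack.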
* `--base_dir`: Base directory for MiniGo data and models. If not specified, it defaults to `/tmp/minigo/`.
* `--board_size`: Go board size. It can be either 9 or 19. By default, it is 9.
* `--batch_size`: Batch size for model training. If not specified, it is calculated based on the Go board size.
Use the `--help` or `-h` flag to get a full list of possible arguments. Besides these arguments, other parameters of the RL pipeline and the DualNet model can be found and configured in [model_params.py](model_params.py).
Suppose the base directory argument `base_dir` is `$HOME/minigo/` and we use 9 as the `board_size`. After model training, the following directories are created to store models and game data:
```
$HOME/minigo                        # base directory
│
├── 9_size                          # directory for 9x9 board size
│   │
│   ├── data
│   │   ├── holdout                 # holdout data for model validation
│   │   ├── selfplay                # data generated by selfplay of each model
│   │   └── training_chunks         # gathered tf_examples for model training
│   │
│   ├── estimator_model_dir         # estimator working directory
│   │
│   ├── trained_models              # all the trained models
│   │
│   └── sgf                         # sgf (smart go files) folder
│       ├── 000000-bootstrap        # model name
│       │   ├── clean               # clean sgf files of model selfplay
│       │   └── full                # full sgf files of model selfplay
│       ├── ...
│       └── evaluate                # clean sgf files of model evaluation
│
└── ...
```
## Validating Model
To validate the trained model, issue the following command with the `--validation` argument:
```
python minigo.py --validation
```
The `--validation` argument generates a holdout dataset for model validation.
## Evaluating Models
The performance of two models is compared in the evaluation step. Given two models, one plays black and the other plays white. They play several games (the number of games can be configured by the parameter `eval_games` in [model_params.py](model_params.py)), and the model that wins by a margin of 55% is declared the winner.
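The 55% rule can be sketched as follows (the function name is hypothetical; `eval_games` is the real parameter from [model_params.py](model_params.py)):

```python
# Decide whether a challenger model should replace the current best model
# after an evaluation match, per the 55% win-margin rule described above.
EVAL_GAMES = 100   # stands in for the `eval_games` parameter
WIN_MARGIN = 0.55

def pick_winner(challenger_wins, eval_games=EVAL_GAMES):
    """Return True if the challenger wins at least 55% of the games."""
    return challenger_wins / eval_games >= WIN_MARGIN

print(pick_winner(56))  # True: 56% >= 55%
print(pick_winner(54))  # False: 54% < 55%
```

Requiring a clear margin rather than a simple majority guards against promoting a model on the strength of a few lucky games.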
To include the evaluation step in the RL pipeline, the `--evaluation` argument can be specified to compare the performance of the `current_trained_model` and the `best_model_so_far`. The winner is used to update `best_model_so_far`. Run the following command to include the evaluation step in the pipeline:
```
python minigo.py --evaluation
```
## Testing Pipeline

As the whole RL pipeline may take hours to train even for a 9x9 board size, a `--test` argument is provided to test the pipeline quickly with a dummy neural network model.
To test the RL pipeline with a dummy model, issue the following command:
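The invocation below is inferred from the `--test` flag described above (the exact command may differ in the source):

```
python minigo.py --test
```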
## Selfplay

A selfplay-only option is provided to run the selfplay step individually and generate training data in parallel. Issue the following command to run selfplay only with the latest trained model:
```
python minigo.py --selfplay
```
Other optional arguments:

* `--selfplay_model_name`: The name of the model used for selfplay. If not specified, the latest trained model is used.
* `--selfplay_max_games`: The maximum number of games selfplay is required to generate. If not specified, the default parameter `max_games_per_generation` is used.
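For example, a hypothetical invocation combining both options (`000000-bootstrap` is the model name shown in the directory layout above; substitute your own trained model and game count):

```
python minigo.py --selfplay --selfplay_model_name=000000-bootstrap --selfplay_max_games=8
```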