
Commit 088e24f

Add run individual step only option (tensorflow#4049)
* Add run individual step only option
* Fix comments and update readme
* Add validation argument
* Address comments
* Make code shorter
* Fix more lints
1 parent 5be3c06 commit 088e24f

6 files changed: +271 -160 lines changed

research/minigo/README.md

Lines changed: 69 additions & 45 deletions
@@ -1,19 +1,24 @@
 # MiniGo
 This is a simplified implementation of MiniGo based on the code provided by the authors: [MiniGo](https://github.com/tensorflow/minigo).
 
-MiniGo is a minimalist Go engine modeled after AlphaGo Zero, built on MuGo. The current implementation consists of three main modules: the DualNet model, the Monte Carlo Tree Search (MCTS), and Go domain knowledge. Currently the **model** part is our focus.
+MiniGo is a minimalist Go engine modeled after AlphaGo Zero, ["Mastering the Game of Go without Human
+Knowledge"](https://www.nature.com/articles/nature24270). A useful one-diagram overview of AlphaGo Zero can be found in the [cheat sheet](https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0).
 
-This implementation maintains the features of model training and validation, and also provides evaluation of two Go models.
+The implementation of MiniGo consists of three main components: the DualNet model, the Monte Carlo Tree Search (MCTS), and Go domain knowledge. Currently, the **DualNet model** is our focus.
 
 
-## DualNet Model
+## DualNet Architecture
+DualNet is the neural network used in MiniGo. It is based on residual blocks with two output heads. The following is a brief overview of the DualNet architecture.
+
+### Input Features
 The input to the neural network is a [board_size * board_size * 17] image stack
 comprising 17 binary feature planes. 8 feature planes consist of binary values
 indicating the presence of the current player's stones; A further 8 feature
 planes represent the corresponding features for the opponent's stones; The final
 feature plane represents the color to play, and has a constant value of either 1
-if black is to play or 0 if white to play. Check `features.py` for more details.
+if black is to play or 0 if white to play. Check [features.py](features.py) for more details.
 
+### Neural Network Structure
 In MiniGo implementation, the input features are processed by a residual tower
 that consists of a single convolutional block followed by either 9 or 19
 residual blocks.
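For orientation, the 17-plane input described above can be assembled as in the short sketch below. It is illustrative only: the exact plane ordering and history handling live in features.py and may differ from this simplified layout.

```python
import numpy as np

def make_input_planes(position_history, to_play, board_size=9):
    """Illustrative sketch of the [board_size, board_size, 17] input stack.

    position_history: up to 8 most recent board arrays, each of shape
        [board_size, board_size] with +1 for black stones and -1 for white.
    to_play: +1 if black is to move, -1 if white is to move.
    """
    planes = np.zeros((board_size, board_size, 17), dtype=np.float32)
    for t, board in enumerate(position_history[:8]):
        planes[:, :, t] = (board == to_play)        # current player's stones
        planes[:, :, 8 + t] = (board == -to_play)   # opponent's stones
    # Final plane: constant 1 if black is to play, 0 if white is to play.
    planes[:, :, 16] = 1.0 if to_play == 1 else 0.0
    return planes
```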
@@ -31,8 +36,9 @@ Each residual block applies the following modules sequentially to its input:
 6. A skip connection that adds the input to the block
 7. A rectifier non-linearity
 
-Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size.
+Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size in the MiniGo implementation.
 
+### Dual Heads Output
 The output of the residual tower is passed into two separate "heads" for
 computing the policy and value respectively. The policy head applies the
 following modules:
@@ -51,64 +57,82 @@ The value head applies the following modules:
 6. A fully connected linear layer to a scalar
 7. A tanh non-linearity outputting a scalar in the range [-1, 1]
 
-The overall network depth, in the 10 or 20 block network, is 19 or 39
+In MiniGo, the overall network depth, in the 10 or 20 block network, is 19 or 39
 parameterized layers respectively for the residual tower, plus an additional 2
 layers for the policy head and 3 layers for the value head.
 
 ## Getting Started
 This project assumes you have virtualenv, TensorFlow (>= 1.5) and two other Go-related
 packages pygtp(>=0.4) and sgf (==0.5).
 
-
 ## Training Model
-One iteration of reinforcement learning consists of the following steps:
-- Bootstrap: initializes a random model
-- Selfplay: plays games with the latest model, producing data used for training
+One iteration of reinforcement learning (RL) consists of the following steps:
+- Bootstrap: initializes a random DualNet model. If the estimator directory already exists, the model is initialized with the last checkpoint.
+- Selfplay: plays games with the latest model or the best model so far identified by evaluation, producing data used for training
 - Gather: groups games played with the same model into larger files of tfexamples.
-- Train: trains a new model with the selfplay results from the most recent N
-generations.
+- Train: trains a new model with the selfplay results from the most recent N generations.
 
-Run `minigo.py`.
+To run the RL pipeline, issue the following command:
 ```
-python minigo.py
+python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256
 ```
+Arguments:
+* `--base_dir`: Base directory for MiniGo data and models. If not specified, it is set to /tmp/minigo/ by default.
+* `--board_size`: Go board size. It can be either 9 or 19. By default, it is 9.
+* `--batch_size`: Batch size for model training. If not specified, it is calculated based on the Go board size.
+Use the `--help` or `-h` flag to get a full list of possible arguments. Besides these arguments, other parameters for the RL pipeline and the DualNet model can be found and configured in [model_params.py](model_params.py).
+
+Suppose the base directory argument `base_dir` is `$HOME/minigo/` and we use 9 as the `board_size`. After model training, the following directories are created to store models and game data:
+
+    $HOME/minigo                  # base directory
+
+    ├── 9_size                    # directory for 9x9 board size
+    │   │
+    │   ├── data
+    │   │   ├── holdout           # holdout data for model validation
+    │   │   ├── selfplay          # data generated by selfplay of each model
+    │   │   └── training_chunks   # gathered tf_examples for model training
+    │   │
+    │   ├── estimator_model_dir   # estimator working directory
+    │   │
+    │   ├── trained_models        # all the trained models
+    │   │
+    │   └── sgf                   # sgf (smart go files) folder
+    │       ├── 000000-bootstrap  # model name
+    │       │   ├── clean         # clean sgf files of model selfplay
+    │       │   └── full          # full sgf files of model selfplay
+    │       ├── ...
+    │       └── evaluate          # clean sgf files of model evaluation
+
+    └── ...
 
 ## Validating Model
-Run `minigo.py` with `--validation` argument
+To validate the trained model, issue the following command with the `--validation` argument:
 ```
-python minigo.py --validation
-```
-The `--validation` argument is to generate holdout dataset for model validation
-
-## Evaluating MiniGo Models
-Run `minigo.py` with `--evaluation` argument
+python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --validation
 ```
-python minigo.py --evaluation
-```
-The `--evaluation` argument is to invoke the evaluation between the latest model and the current best model.
-
-## Testing Pipeline
-As the whole RL pipeline may takes hours to train even for a 9x9 board size, we provide a dummy model with a `--debug` mode for testing purpose.
 
-Run `minigo.py` with `--debug` argument
-```
-python minigo.py --debug
-```
-The `--debug` argument is for testing purpose with a dummy model.
+## Evaluating Models
+The performance of two models is compared in the evaluation step. Given two models, one plays black and the other plays white. They play several games (the number of games can be configured by the parameter `eval_games` in [model_params.py](model_params.py)), and the one that wins by a margin of 55% is the winner.
 
-Validation and evaluation can also be tested with the dummy model by combing their corresponding arguments with `--debug`.
-To test validation, run the following commands:
-```
-python minigo.py --debug --validation
-```
-To test evaluation, run the following commands:
+To include the evaluation step in the RL pipeline, the `--evaluation` argument can be specified to compare the performance of the `current_trained_model` and the `best_model_so_far`. The winner is used to update `best_model_so_far`. Run the following command to include the evaluation step in the pipeline:
 ```
-python minigo.py --debug --evaluation
-```
-To test both validation and evaluation, run the following commands:
-```
-python minigo.py --debug --validation --evaluation
+python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --evaluation
 ```
 
-## MCTS and Go features (TODO)
-Code clean up on MCTS and Go features.
+## Testing Pipeline
+As the whole RL pipeline may take hours to train even for a 9x9 board size, a `--test` argument is provided to test the pipeline quickly with a dummy neural network model.
+
+To test the RL pipeline with a dummy model, issue the following command:
+```
+python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --test
+```
+
+## Running Self-play Only
+A self-play only option is provided to run the selfplay step individually to generate training data in parallel. Issue the following command to run selfplay only with the latest trained model:
+```
+python minigo.py --selfplay
+```
+Other optional arguments:
+* `--selfplay_model_name`: The name of the model used for selfplay only. If not specified, the latest trained model will be used for selfplay.
+* `--selfplay_max_games`: The maximum number of games selfplay is required to generate. If not specified, the default parameter `max_games_per_generation` is used.
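For orientation, the following is a rough tf.keras sketch of the DualNet structure the README describes (initial convolutional block, residual tower, policy and value heads). It is illustrative only and is not the repository's dualnet_model.py; the head widths (2 policy filters, 1 value filter, 256 hidden units) and the board_size * board_size + 1 move output are assumptions taken from the AlphaGo Zero design rather than from this diff.

```python
import tensorflow as tf

def residual_block(x, num_filter):
    """One residual block: conv, BN, ReLU, conv, BN, skip connection, ReLU."""
    shortcut = x
    y = tf.keras.layers.Conv2D(num_filter, 3, padding='same')(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Activation('relu')(y)
    y = tf.keras.layers.Conv2D(num_filter, 3, padding='same')(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Add()([shortcut, y])
    return tf.keras.layers.Activation('relu')(y)

def build_dual_net(board_size=9, num_filter=32, num_blocks=9):
    """Residual tower with policy and value heads, per the README description."""
    inputs = tf.keras.Input(shape=(board_size, board_size, 17))
    # Initial convolutional block.
    x = tf.keras.layers.Conv2D(num_filter, 3, padding='same')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    # Residual tower (9 or 19 blocks depending on board size).
    for _ in range(num_blocks):
        x = residual_block(x, num_filter)
    # Policy head: 1x1 conv, BN, ReLU, dense over all moves plus pass.
    p = tf.keras.layers.Conv2D(2, 1)(x)
    p = tf.keras.layers.BatchNormalization()(p)
    p = tf.keras.layers.Activation('relu')(p)
    p = tf.keras.layers.Flatten()(p)
    policy = tf.keras.layers.Dense(board_size * board_size + 1,
                                   activation='softmax', name='policy')(p)
    # Value head: 1x1 conv, BN, ReLU, dense, ReLU, dense to scalar, tanh.
    v = tf.keras.layers.Conv2D(1, 1)(x)
    v = tf.keras.layers.BatchNormalization()(v)
    v = tf.keras.layers.Activation('relu')(v)
    v = tf.keras.layers.Flatten()(v)
    v = tf.keras.layers.Dense(256, activation='relu')(v)
    value = tf.keras.layers.Dense(1, activation='tanh', name='value')(v)
    return tf.keras.Model(inputs, [policy, value])
```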

research/minigo/dualnet.py

Lines changed: 5 additions & 5 deletions
@@ -191,24 +191,24 @@ def export_model(working_dir, model_path):
     tf.gfile.Copy(filename, destination_path)
 
 
-def train(working_dir, tf_records, generation_num, params):
+def train(working_dir, tf_records, generation, params):
   """Train the model for a specific generation.
 
   Args:
     working_dir: The model working directory to save model parameters,
       drop logs, checkpoints, and so on.
     tf_records: A list of tf_record filenames for training input.
-    generation_num: The generation to be trained.
+    generation: The generation to be trained.
     params: hyperparams of the model.
 
   Raises:
-    ValueError: if generation_num is not greater than 0.
+    ValueError: if generation is not greater than 0.
   """
-  if generation_num <= 0:
+  if generation <= 0:
     raise ValueError('Model 0 is random weights')
   estimator = tf.estimator.Estimator(
       dualnet_model.model_fn, model_dir=working_dir, params=params)
-  max_steps = (generation_num * params.examples_per_generation
+  max_steps = (generation * params.examples_per_generation
                // params.batch_size)
   profiler_hook = tf.train.ProfilerHook(output_dir=working_dir, save_secs=600)
 
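The only computation touched by the rename is the `max_steps` budget shown above. A minimal, self-contained sketch of that arithmetic, using made-up values rather than the defaults in model_params.py:

```python
# Illustrative only: how the training-step budget scales with `generation`.
# The values below are placeholders, not the defaults from model_params.py.
examples_per_generation = 2000000
batch_size = 256

for generation in (1, 2, 3):
    # train() raises ValueError for generation <= 0 (model 0 is random weights).
    max_steps = generation * examples_per_generation // batch_size
    print('generation %d trains for up to %d steps' % (generation, max_steps))
```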

research/minigo/gtp_wrapper.py

Lines changed: 3 additions & 4 deletions
@@ -49,8 +49,7 @@ def __init__(self, board_size):
   def set_size(self, n):
     if n != self.board_size:
       raise ValueError((
-          '''Can't handle boardsize {n}!Restart with env var BOARD_SIZE={n}'''
-      ).format(n=n))
+          "Can't handle boardsize {}! Please check the board size.").format(n))
 
   def set_komi(self, komi):
     self.komi = komi
@@ -75,7 +74,7 @@ def accomodate_out_of_turn(self, color):
     self.position.flip_playerturn(mutate=True)
 
   def make_move(self, color, vertex):
-    c = coords.from_pygtp(vertex)
+    c = coords.from_pygtp(self.board_size, vertex)
     # let's assume this never happens for now.
     # self.accomodate_out_of_turn(color)
     return self.play_move(c)
@@ -85,7 +84,7 @@ def get_move(self, color):
     move = self.suggest_move(self.position)
     if self.should_resign():
       return gtp.RESIGN
-    return coords.to_pygtp(move)
+    return coords.to_pygtp(self.board_size, move)
 
   def final_score(self):
     return self.position.result_string()
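For context on why `board_size` now has to be passed through: GTP vertices are (column, row) pairs numbered from 1 starting at the bottom-left, while a board array is usually indexed (row, column) from the top, so the conversion depends on the board size. The sketch below illustrates that convention only; it is not the repository's coords module.

```python
# Illustrative only: converting between GTP-style vertices and array indices.
def gtp_to_array(vertex, board_size):
    """(col, row) GTP vertex, 1-based from bottom-left -> (row, col) array index."""
    col, row = vertex
    return board_size - row, col - 1

def array_to_gtp(coord, board_size):
    """(row, col) array index, 0-based from top-left -> (col, row) GTP vertex."""
    row, col = coord
    return col + 1, board_size - row

print(gtp_to_array((1, 1), 9))   # bottom-left corner -> (8, 0)
print(array_to_gtp((0, 0), 9))   # top-left corner -> (1, 9)
```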
