LazyZero

Jacob Marshall edited this page Jan 21, 2024 · 1 revision

Before implementing vectorized AlphaZero, I built a simpler algorithm called LazyZero. LazyZero forgoes full Monte Carlo Tree Search because of its complexity and applies the PUCT algorithm only at the root node. Exploration from the root consists of $n$ policy-based rollouts, each of fixed depth $d$.
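
As a rough illustration of root-only PUCT, here is a minimal sketch in plain Python. It is not the LazyZero source: the function names, the `rollout` callable (which stands in for "take this root action, then follow the policy for up to $d$ steps and return a scalar value"), and the toy priors are all assumptions made for the example. The exploration term follows the AlphaZero-style formula $Q(a) + c_{puct} P(a) \sqrt{\sum_b N(b)} / (1 + N(a))$, with a `+ 1` under the square root (a common variant) so that the priors break ties on the very first selection.

```python
import math

def puct_scores(priors, visits, value_sums, c_puct):
    """Return Q(a) + U(a) for every root action."""
    total = sum(visits)
    scores = []
    for p, n, w in zip(priors, visits, value_sums):
        q = w / n if n > 0 else 0.0  # mean rollout value seen so far
        u = c_puct * p * math.sqrt(total + 1) / (1 + n)  # exploration bonus
        scores.append(q + u)
    return scores

def root_search(priors, rollout, num_policy_rollouts, puct_coeff):
    """Run n rollouts, picking each rollout's first action with PUCT at the root.

    `rollout(action)` is a hypothetical callable supplied by the caller.
    """
    visits = [0] * len(priors)
    value_sums = [0.0] * len(priors)
    for _ in range(num_policy_rollouts):
        scores = puct_scores(priors, visits, value_sums, puct_coeff)
        action = max(range(len(priors)), key=scores.__getitem__)
        value_sums[action] += rollout(action)
        visits[action] += 1
    return visits

# Toy usage: action 1 always yields value 1.0, every other action yields 0.0.
toy_priors = [0.5, 0.3, 0.2]
toy_visits = root_search(toy_priors, lambda a: 1.0 if a == 1 else 0.0,
                         num_policy_rollouts=50, puct_coeff=1.0)
```

In this toy setup the search starts with the highest-prior action, then shifts most of its rollouts to the action whose rollouts actually return value, which is the behavior PUCT is meant to produce.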

LazyZero can be effective in stochastic environments like 2048, because a sufficiently large number of policy rollouts samples enough outcomes to capture the environment's stochasticity. That said, more sophisticated techniques such as stochastic AlphaZero are much more effective.

You can find the source code for LazyZero and LazyMCTS at:

Configuration Parameters

  • num_policy_rollouts: $n$, number of policy-based rollouts from the root node
  • rollout_depth: $d$, depth of each rollout
  • puct_coeff: $c_{puct}$, coefficient governing exploration, similar to AlphaZero
  • temperature: $\tau$, temperature applied to visit counts, similar to AlphaZero
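
To make the role of $\tau$ concrete, the sketch below turns root visit counts into an action distribution the way AlphaZero does, with $\pi(a) \propto N(a)^{1/\tau}$. This is an assumption about how `temperature` is applied here, based only on the "similar to AlphaZero" description; the function name `visit_count_policy` is hypothetical, and the code assumes at least one action has been visited.

```python
def visit_count_policy(visits, temperature):
    """Convert root visit counts N(a) into probabilities pi(a) ~ N(a)^(1/tau)."""
    if temperature == 0:
        # Limit tau -> 0: put all probability on the most-visited action.
        best = max(range(len(visits)), key=visits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(visits))]
    weights = [n ** (1.0 / temperature) for n in visits]
    total = sum(weights)  # assumes at least one visit, so total > 0
    return [w / total for w in weights]

proportional = visit_count_policy([3, 46, 1], temperature=1.0)
greedy = visit_count_policy([3, 46, 1], temperature=0)
```

With $\tau = 1$ the distribution is proportional to the visit counts; smaller values of $\tau$ concentrate probability on the most-visited action, and larger values flatten it.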