LazyZero

Before implementing vectorized AlphaZero, I built a simpler implementation called LazyZero. LazyZero forgoes Monte Carlo Tree Search because of its complexity and instead applies the PUCT algorithm only at the root node. Exploration from the root node is done via $n$ policy-based rollouts, each of fixed depth $d$.
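To make the root-only search concrete, here is a minimal sketch in plain Python. It is not the LazyZero source. The `policy_value` and `step` interfaces are hypothetical, the PUCT score is the usual AlphaZero-style form $Q + c_{puct} \cdot P \cdot \sqrt{\sum_b N_b} / (1 + N_a)$, and scoring a rollout as summed rewards plus a value estimate at the final state is an assumption.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces, for illustration only:
#   policy_value(state) -> (action_probabilities, value_estimate)
#   step(state, action, rng) -> (next_state, reward, done)
PolicyValueFn = Callable[[object], Tuple[Sequence[float], float]]
StepFn = Callable[[object, int, random.Random], Tuple[object, float, bool]]


def puct_scores(q: Sequence[float], visits: Sequence[int],
                priors: Sequence[float], puct_coeff: float) -> List[float]:
    """AlphaZero-style PUCT score per root action."""
    sqrt_total = math.sqrt(sum(visits))
    return [q[a] + puct_coeff * priors[a] * sqrt_total / (1 + visits[a])
            for a in range(len(priors))]


def lazy_root_search(root_state: object, step: StepFn,
                     policy_value: PolicyValueFn, num_actions: int,
                     num_policy_rollouts: int, rollout_depth: int,
                     puct_coeff: float, rng: random.Random) -> List[int]:
    """Run n rollouts of depth d from the root; return root visit counts."""
    priors, _ = policy_value(root_state)
    visits = [0] * num_actions
    value_sums = [0.0] * num_actions

    for _ in range(num_policy_rollouts):
        # PUCT is applied at the root only; no tree is built below it.
        q = [value_sums[a] / visits[a] if visits[a] else 0.0
             for a in range(num_actions)]
        scores = puct_scores(q, visits, priors, puct_coeff)
        action = max(range(num_actions), key=scores.__getitem__)

        # The root action counts as the first step of the rollout (assumption).
        state, reward, done = step(root_state, action, rng)
        ret, depth = reward, 1

        # Remaining steps sample actions from the policy.
        while not done and depth < rollout_depth:
            probs, _ = policy_value(state)
            a = rng.choices(range(num_actions), weights=probs)[0]
            state, r, done = step(state, a, rng)
            ret += r
            depth += 1

        # Bootstrap with the value estimate if the rollout was cut off (assumption).
        if not done:
            _, v = policy_value(state)
            ret += v

        visits[action] += 1
        value_sums[action] += ret

    return visits


# Toy two-action environment: action 1 always pays 1, action 0 pays 0.
def toy_policy_value(state: object) -> Tuple[Sequence[float], float]:
    return [0.5, 0.5], 0.0


def toy_step(state: object, action: int, rng: random.Random):
    return state, float(action), False


demo_visits = lazy_root_search(None, toy_step, toy_policy_value,
                               num_actions=2, num_policy_rollouts=20,
                               rollout_depth=1, puct_coeff=1.0,
                               rng=random.Random(0))
print(demo_visits)
```

In the toy run the rewarding action collects most of the visits, which is the behavior the root visit counts are meant to capture. A real environment would supply its own `step` and a learned `policy_value`.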

LazyZero can be effective in stochastic environments like 2048, since a sufficiently large number of policy rollouts will sample the environment's random outcomes. Needless to say, more sophisticated techniques such as stochastic AlphaZero are much more effective.

You can find the source code for LazyZero and LazyMCTS at:

Configuration Parameters

num_policy_rollouts: $n$, number of policy-based rollouts from the root node
rollout_depth: $d$, depth of each rollout
puct_coeff: $c_{puct}$, coefficient governing exploration, similar to AlphaZero
temperature: $\tau$, temperature applied to visit counts, similar to AlphaZero
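As a rough illustration of how these four parameters might be grouped and how $\tau$ acts on visit counts, the sketch below uses a hypothetical `LazyZeroConfig` dataclass and a hypothetical `visit_counts_to_policy` helper. Neither is taken from the LazyZero source, and the example values are arbitrary. The temperature rule is the AlphaZero-style one, where the probability of an action is proportional to $N_a^{1/\tau}$.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LazyZeroConfig:
    """Hypothetical container mirroring the parameters listed above."""
    num_policy_rollouts: int   # n: rollouts from the root node
    rollout_depth: int         # d: depth of each rollout
    puct_coeff: float          # c_puct: exploration coefficient
    temperature: float         # tau: applied to root visit counts


def visit_counts_to_policy(visits: Sequence[int],
                           temperature: float) -> List[float]:
    """Turn root visit counts into action probabilities, pi(a) ~ N(a)^(1/tau)."""
    num_actions = len(visits)
    max_visits = max(visits)
    if max_visits == 0:
        # No information yet: fall back to a uniform distribution.
        return [1.0 / num_actions] * num_actions
    if temperature <= 0:
        # Treat tau -> 0 as greedy selection of the most-visited action.
        best = max(range(num_actions), key=visits.__getitem__)
        return [1.0 if a == best else 0.0 for a in range(num_actions)]
    # Divide by the max count first so small temperatures cannot overflow.
    scaled = [(v / max_visits) ** (1.0 / temperature) for v in visits]
    total = sum(scaled)
    return [s / total for s in scaled]


# Arbitrary example values, not defaults from the original project.
config = LazyZeroConfig(num_policy_rollouts=100, rollout_depth=5,
                        puct_coeff=1.0, temperature=1.0)
policy = visit_counts_to_policy([1, 3], config.temperature)
print(policy)
```

With $\tau = 1$ the probabilities are simply the normalized visit counts. Lowering $\tau$ concentrates probability on the most-visited action, and this sketch treats $\tau \le 0$ as a greedy choice.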
