DeepMind algorithm learns top-level chess, shogi and go in hours


DeepMind has developed AlphaZero, an algorithm that can learn games purely from their rules, without any prior knowledge. Within a few hours of training, the program can outperform human champions and the strongest existing programs.
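To give a feel for this tabula-rasa idea, here is a minimal toy sketch of learning through self-play. AlphaZero itself pairs a deep neural network with Monte Carlo tree search; the sketch below instead uses a plain value table and the simple game of Nim (take 1 to 3 stones, whoever takes the last stone wins), purely to illustrate how a program given nothing but the rules can improve by playing against itself. All names and constants are illustrative, not DeepMind's code.

import random
from collections import defaultdict

ACTIONS = (1, 2, 3)           # the only "prior knowledge": the legal moves
values = defaultdict(float)   # value of (stones_left, move) for the player to move
EPSILON, ALPHA = 0.1, 0.5     # exploration rate and learning rate

def choose(stones):
    """Pick a move: usually the best-valued one, occasionally a random one."""
    legal = [a for a in ACTIONS if a <= stones]
    if random.random() < EPSILON:
        return random.choice(legal)
    return max(legal, key=lambda a: values[(stones, a)])

def play_one_game(stones=15):
    """Both sides share the same value table; returns the sequence of moves."""
    history = []
    while stones > 0:
        move = choose(stones)
        history.append((stones, move))
        stones -= move
    return history            # whoever made the last move has won

for _ in range(20000):        # the self-play training loop
    history = play_one_game()
    # Walk back from the final move: reward the winner's moves, punish the loser's.
    for i, (stones, move) in enumerate(reversed(history)):
        target = 1.0 if i % 2 == 0 else -1.0
        values[(stones, move)] += ALPHA * (target - values[(stones, move)])

# The learned policy typically recovers the classic Nim strategy of leaving
# the opponent a multiple of four stones, wherever that is possible.
for s in range(1, 16):
    best = max((a for a in ACTIONS if a <= s), key=lambda a: values[(s, a)])
    print(s, "->", best)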

In chess, AlphaZero managed to outperform Stockfish 8 after 4 hours of training, or 400,000 steps. The open-source Stockfish is regarded as the strongest chess engine currently available. Out of 100 games, AlphaZero won 28 and lost none; the rest ended in draws. AlphaZero also searches far more efficiently: it examines about 80,000 positions per second in chess, compared with Stockfish's 70 million. DeepMind's program concentrates only on the most promising moves, an approach the team behind the program calls "more human."
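That focus on promising moves comes from the selection rule, known as PUCT, that AlphaZero's Monte Carlo tree search applies at every node: simulations are steered toward moves with a high neural-network prior or a high average evaluation, so weak moves are hardly examined at all. The sketch below, with illustrative names and an assumed exploration constant, shows that selection rule in isolation rather than DeepMind's actual implementation.

import math
from dataclasses import dataclass

@dataclass
class Edge:
    prior: float            # P(s, a): the policy network's probability for this move
    visits: int = 0         # N(s, a): how often the search has tried this move
    value_sum: float = 0.0  # running sum of evaluations found behind this move

    @property
    def q(self) -> float:   # Q(s, a): average evaluation of this move
        return self.value_sum / self.visits if self.visits else 0.0

def select_move(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move maximising Q + U, the exploration-weighted score."""
    total_visits = sum(e.visits for e in edges.values())
    def score(item):
        _, e = item
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + u
    return max(edges.items(), key=score)[0]

# After a handful of simulations, the next one goes to a promising but
# under-explored move, while the low-prior move a2a3 is barely searched.
edges = {
    "e2e4": Edge(prior=0.45, visits=10, value_sum=6.0),  # Q = 0.60, well explored
    "g1f3": Edge(prior=0.30, visits=4, value_sum=2.0),   # Q = 0.50, less explored
    "a2a3": Edge(prior=0.01, visits=1, value_sum=0.1),   # weak prior, one visit
}
print(select_move(edges))   # -> g1f3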

But the software is not only quick at learning chess from scratch. The developers also tested it on shogi, the Japanese chess variant. It took 110,000 steps, or 2 hours, to reach the level of the shogi program Elmo. Out of 100 games, AlphaZero won 90 and lost 8.

In go, too, AlphaZero establishes its dominance in short order. The training time here is somewhat longer at 8 hours, or 165,000 steps, but after that the program is stronger than the existing algorithms AlphaGo Lee and AlphaGo Zero, both also from DeepMind. AlphaGo Zero is the improved version of AlphaGo, the program with which DeepMind defeated champion Lee Sedol. Like AlphaZero, AlphaGo Zero requires no game knowledge programmed in by humans.

Even with a training time of three days, against AlphaZero's 8 hours, AlphaGo Zero lost 60 out of 100 go games; the Zero variant won the other 40. During training, AlphaZero could draw on 5,000 first-generation TPUs and 64 second-generation units. TPU stands for tensor processing unit, a chip Google developed to handle deep learning computations. The trained algorithm eventually ran on a system with four TPUs.

British AI company DeepMind, part of Google since 2014, has published the results of its research in a paper titled Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.
