DeepMind's MuZero learns visual Atari games without knowing the rules

Google has taken a new step with DeepMind's game-playing AI. A new variant called MuZero can learn games not only by playing them many times, but without being told the rules beforehand, and it can even handle visual games.

MuZero is a new machine learning algorithm created by Google subsidiary DeepMind. It is the spiritual successor to AlphaGo and AlphaZero, which learned games such as Go and chess well enough to beat world champions. MuZero can learn chess and Go, as well as more complex visual games on the Atari. In addition, Google says that MuZero works out the rules of a game itself by trying out strategies.

According to Google's researchers, MuZero uses model-based planning rather than a lookahead search over known rules. With the latter, an AI weighs the possible outcomes of its decisions by expanding a decision tree, and that is the approach AlphaGo and AlphaZero use. Such tree-search algorithms, the researchers note, work especially well when the environment comes with a predefined model and explicit rules. Games such as chess and Go have such rules, which is why AlphaGo and AlphaZero are so good at them; the algorithm must therefore be given knowledge of the problem to be solved in advance.
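To make the idea of lookahead search concrete, here is a minimal, hand-written sketch: the rules are supplied explicitly as a successor function, so the search can expand the exact decision tree. The toy game (adding 1, 2 or 3 to a counter, aiming for 10) and all function names are illustrative assumptions, not DeepMind's implementation.

```python
# Toy lookahead search over known rules: the successor function IS the
# rulebook, so the decision tree can be expanded exactly.

def successors(state):
    """Known game rules: every legal move and the state it leads to.
    Toy game: a move adds 1, 2 or 3 to a counter."""
    return [(move, state + move) for move in (1, 2, 3)]

def evaluate(state):
    """Terminal evaluation: closer to 10 is better (toy objective)."""
    return -abs(10 - state)

def lookahead(state, depth):
    """Return the move whose subtree reaches the best achievable value."""
    if depth == 0:
        return None, evaluate(state)
    best_move, best_value = None, float("-inf")
    for move, nxt in successors(state):
        _, value = lookahead(nxt, depth - 1)
        if value > best_value:
            best_move, best_value = move, value
    return best_move, best_value

move, value = lookahead(state=0, depth=3)
print(move, value)  # with three moves the counter can reach at most 9
```

Because the rules are known exactly, every branch the search considers corresponds to a real, legal game continuation; that guarantee is what MuZero gives up.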

In the 'real world', the researchers note, problems rarely have such clearly defined rules. That is why MuZero uses model-based planning, though in its own limited way: the AI first builds a model of the environment and the possible actions, and then chooses the best next step based on that model.

This is still feasible in constrained environments such as a game like Go, but in visual environments such as a computer game it becomes harder, because there are so many different aspects to take into account. "MuZero uses a different approach to get past those kinds of limits," the researchers write. "Instead of modeling the complete environment, MuZero creates a model based only on the aspects that matter for decision-making." Specifically, the AI predicts three quantities: the value of the current position, a policy indicating which action is best to take, and the reward resulting from the previous action. In this way, MuZero can also operate in an environment where it does not know the parameters and restrictions in advance.
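The three predicted quantities can be sketched as MuZero's three learned functions: one that encodes an observation into a hidden state, one that predicts the next hidden state and reward for an action, and one that predicts a policy and value. In the sketch below the neural networks are replaced by hand-written toy functions so the planning loop is runnable; the action space, numbers and function names are assumptions for illustration, not DeepMind's implementation.

```python
# Toy stand-ins for MuZero's three learned functions. Planning happens
# entirely inside this learned model; the real game rules are never used.

ACTIONS = [0, 1]  # toy action space: 0 = "wait", 1 = "move"

def representation(observation):
    """h: encode the raw observation into a hidden state (toy: identity)."""
    return float(observation)

def dynamics(hidden, action):
    """g: predicted next hidden state and predicted reward for an action."""
    next_hidden = hidden + (1.0 if action == 1 else 0.0)
    reward = 1.0 if action == 1 else 0.0
    return next_hidden, reward

def prediction(hidden):
    """f: policy prior over actions and value of the hidden state."""
    policy = {0: 0.5, 1: 0.5}
    value = hidden  # toy value estimate: a larger hidden state is better
    return policy, value

def plan(observation, depth):
    """Lookahead inside the learned model only: roll actions forward with
    the dynamics function and score leaves with the value prediction."""
    def rollout_value(hidden, depth):
        if depth == 0:
            _, value = prediction(hidden)
            return value
        return max(reward + rollout_value(nxt, depth - 1)
                   for nxt, reward in (dynamics(hidden, a) for a in ACTIONS))
    hidden = representation(observation)
    scores = {}
    for action in ACTIONS:
        nxt, reward = dynamics(hidden, action)
        scores[action] = reward + rollout_value(nxt, depth - 1)
    return max(scores, key=scores.get)

best_action = plan(observation=0, depth=3)
print(best_action)  # the model predicts "move" accumulates more reward
```

The design point the sketch illustrates: nothing in `plan` consults the real environment, so the same loop works whether or not the true rules are available, which is exactly the property the researchers describe.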

The researchers then set MuZero loose on a number of Atari's visual games, including Ms. Pac-Man, where the AI had to learn for itself which actions were best to take. The result, according to the researchers, is that the more training games MuZero plays, the better it gets at the game. The researchers had MuZero play 57 Atari games, including Defender, Alien, Space Invaders and Yars' Revenge, according to the preliminary paper published last year.

Figure: learning curves of MuZero across a selection of Atari games. Total reward is on the y-axis, millions of training steps on the x-axis. The line marks the average score over 1000 evaluation games; the shaded regions show the standard deviation.