DeepMind: Bots beat humans in Quake III Arena’s ‘capture the flag mode’


DeepMind researchers have published a paper presenting results on the bots they trained to play Quake III Arena entirely on their own. The AI proves better at playing the Capture the Flag mode than human players.

DeepMind writes that the agents have learned to play Capture the Flag to a very high standard, adapting to the dimensions of the map being played and the number of team members. The results show that the bots have very fast reaction times and are very accurate at tagging, which sends opponents back to their starting point: the agents' tagging accuracy was 80 percent, versus 48 percent for humans. The researchers initially suspected this precision explained the agents' strong performance.

The AI’s superior performance also stems from faster visual processing and motion control. However, that was not the only reason for its success, as became clear once the researchers limited the agents’ accuracy and increased their reaction time. Bots with a deliberately added delay of 267 ms, bringing their response time to about 500 ms, comparable to human players, still outperformed humans: strong human players won an average of only 21 percent of the games, and intermediate-level players just 12 percent.

Human players stood little chance against the AI on maps that neither the humans nor the bots had ever seen before. A team of two humans captured on average sixteen fewer flags per game than a team of two agents. Only a mixed team of humans and bots managed to beat a team consisting solely of bots. The researchers conclude from this that the trained agents are probably well able to collaborate with teammates they have never ‘seen’ before, such as humans. Even a team of two professional gamers with full communication between them managed to win only a quarter of their matches against the AI, after twelve hours of practice.

Incidentally, the researchers call the agents ‘FTW agents’, which stands for ‘for the win’. The name refers to the bots’ training architecture, which combines recurrent neural networks operating on slow and fast timescales with a points system. Game points are translated into internal rewards: each agent learns its own internal reward signal, which allows it to set its own internal goals, such as capturing the flag. A two-tier optimization process then tunes those internal rewards toward winning. Through reinforcement learning, the agents were taught by trial and error to perform actions that maximize their reward. A total of thirty different agents were trained over a total of 450,000 matches.
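The two-tier setup described above can be sketched in a toy way: an inner loop in which each agent acts to maximize its own internal reward, and an outer population-based loop that ranks agents by match results and lets weaker agents inherit and mutate the internal rewards of stronger ones. The snippet below is a minimal illustrative sketch under those assumptions, not DeepMind's actual code; the event names, scoring function, and mutation scheme are all invented for the example.

```python
import random

random.seed(0)

# Game events and their (hidden) true contribution to winning a match.
EVENTS = {"flag_capture": 3.0, "tag_opponent": 1.0, "flag_pickup": 0.5}

def make_agent():
    # An agent's internal reward: a learned weight per game event.
    return {e: random.uniform(0.0, 1.0) for e in EVENTS}

def evaluate(agent):
    # Stand-in for playing matches: agents whose internal rewards align
    # with what actually wins games score higher (plus match noise).
    return sum(agent[e] * w for e, w in EVENTS.items()) + random.gauss(0, 0.05)

def pbt_generation(population):
    # Outer loop: rank agents by match results; the bottom half inherits
    # (and slightly mutates) the internal rewards of the top half.
    scores = [evaluate(a) for a in population]
    order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    half = len(population) // 2
    for lo, hi in zip(order[half:], order[:half]):
        population[lo] = {
            e: min(1.0, max(0.0, v + random.gauss(0, 0.1)))
            for e, v in population[hi].items()
        }
    return max(scores)

population = [make_agent() for _ in range(30)]  # thirty agents, as in the paper
history = [pbt_generation(population) for _ in range(50)]
print(f"best fitness: gen 0 = {history[0]:.2f}, gen 49 = {history[-1]:.2f}")
```

Over the generations, the population's internal reward weights drift toward the event values that actually win matches, so the best fitness in the final generation exceeds that of the first. The real system replaces this toy scoring with thousands of actual Capture the Flag matches and replaces the weight dictionary with recurrent-network parameters.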

The research is published in the journal Science, under the title ‘Human-level performance in 3D multiplayer games with population-based reinforcement learning’. DeepMind previously published a paper on the training of the bots.
