DeepMind has trained AlphaStar Final, an AI agent that has reached Grandmaster level in the real-time strategy game StarCraft II and outperforms 99.8 percent of all active players. The agent played anonymously online via Blizzard’s Battle.net platform.
DeepMind used a combination of reinforcement learning through self-play, multi-agent play within its own league, and imitation of human strategies to bring AlphaStar to Grandmaster, the highest level achievable on the StarCraft II ladder. The agent reached that level with all three of the game’s factions: Terran, Zerg and Protoss.
Google’s subsidiary reports that the results provide strong evidence that these kinds of general-purpose learning techniques can be used to make AI systems suitable for work in complex, dynamic, multi-agent environments. In addition, the company expects the advances to help make artificial intelligence safer and more robust.
AlphaStar Final played unmodified StarCraft II matches with a camera view similar to a human player’s and with restrictions that bring its action rate down to that of human players; raw speed of response is one of the qualities that would otherwise let machines easily outperform humans. Professional StarCraft II players helped DeepMind design constraints that lead to balanced conditions. The number of actions was limited to a maximum of 22 per 5 seconds. Furthermore, AlphaStar can only act roughly 110 ms after a frame has been observed, and it may react with a delay to unexpected situations because agents decide ahead of time where to look.
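To make those constraints concrete, the combined rule can be pictured as a simple rate limiter: a sliding window capping actions per interval, plus a minimum observation-to-action latency. The sketch below is purely illustrative (it is not DeepMind’s code; the class and method names are invented), using the two numbers from the article.

```python
from collections import deque


class ActionThrottle:
    """Illustrative sketch (not DeepMind's implementation) of the
    constraints described above: at most 22 actions per 5-second
    sliding window, and no action sooner than 110 ms after the
    frame that triggered it was observed."""

    MAX_ACTIONS = 22      # cap per sliding window
    WINDOW_S = 5.0        # window length in seconds
    MIN_DELAY_S = 0.110   # minimum observation-to-action latency

    def __init__(self):
        self.history = deque()  # timestamps of recent actions

    def may_act(self, now: float, frame_observed_at: float) -> bool:
        # Enforce the minimum reaction delay after observing a frame.
        if now - frame_observed_at < self.MIN_DELAY_S:
            return False
        # Discard action timestamps that fell out of the window.
        while self.history and now - self.history[0] > self.WINDOW_S:
            self.history.popleft()
        # Enforce the per-window action cap.
        if len(self.history) >= self.MAX_ACTIONS:
            return False
        self.history.append(now)
        return True
```

Under these rules, a burst of 22 actions exhausts the window, after which the agent must wait for old actions to slide out before issuing more.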
DeepMind reports that it is difficult for AI agents to come up with winning strategies, as there are more than 10²⁶ possible actions at any point in the game. In addition, the standard learning techniques have their own drawbacks. For example, learning by playing against oneself can lead to “forgetting,” in which an agent cycles through previously seen strategies, each beating the last, without ever learning anything new.
Playing against a random mix of one’s own previous strategies, known as fictitious self-play, can help with this. However, playing purely to win can be limiting in and of itself, DeepMind argues. That is why the company developed exploiter agents that play purely to expose the vulnerabilities of another agent. Furthermore, AlphaStar became stronger through imitation: the agent analysed the strategies human opponents used against it, with the analysis of opening moves playing a role, among other things.
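The league idea above can be sketched in a few lines: rather than training only against its latest self, the main agent samples opponents from a pool of frozen past versions (the “random mix of previous strategies” of fictitious self-play) plus dedicated exploiters. This is a toy illustration under those assumptions, not DeepMind’s actual training code; all names here are invented.

```python
import random

class Agent:
    """A stand-in for a trained policy; role is 'past' (a frozen
    checkpoint of the main agent) or 'exploiter' (trained purely
    to beat the current main agent)."""
    def __init__(self, name: str, role: str):
        self.name = name
        self.role = role


class League:
    """Toy sketch of a fictitious-self-play opponent pool."""
    def __init__(self):
        self.pool = []

    def add(self, agent: Agent):
        # Frozen checkpoints stay in the pool permanently, so the
        # main agent cannot "forget" how to beat older versions.
        self.pool.append(agent)

    def sample_opponent(self, rng=random) -> Agent:
        # Uniform sampling over all pool members is the simplest
        # form of the "random mix of previous strategies".
        return rng.choice(self.pool)


league = League()
league.add(Agent("main_v1", "past"))
league.add(Agent("main_v2", "past"))
league.add(Agent("exploiter_1", "exploiter"))
opponent = league.sample_opponent()
```

In the real system the sampling is weighted (e.g. toward opponents the main agent still loses to), but uniform choice is enough to show why the pool prevents the forgetting loop: old strategies never disappear from the training distribution.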
Professional player Dario ‘TLO’ Wuensch says he didn’t feel like he was playing against a superhuman opponent, and Diego ‘Kelazhur’ Schwimer adds that playing against the AI makes for very unusual gameplay, as the agent has its own playstyles and strategies.
The finding that AlphaStar can rank among StarCraft II’s top players in real online ladder rankings follows a demonstration in January this year, in which a Team Liquid professional player lost five games to the agent but won a single live match. That demonstration was criticized because the comparison of actions per minute between human and AI was said to be unfair; the restrictions on AlphaStar have since been adjusted accordingly.
DeepMind published the results in Nature, in a paper titled Grandmaster level in StarCraft II using multi-agent reinforcement learning. The company is also making all AlphaStar replays available.