Google shows AI model RT-2 that can convert commands into robot actions
Google DeepMind has shown Robotic Transformer 2 (RT-2), an AI model intended to translate language commands into specific actions for robots. The company calls it the first ‘vision-language-action’ model.
According to Google, RT-2, like language models, uses data from the internet to understand text commands and convert them into specific robot actions. The company says that training robots has so far been very complex and has required billions of data points. Even with Google’s previous robot AI models, such as RT-1 and PaLM-E, the robot still needed a lot of task-specific data to carry out concrete actions.
With previous models, for example, a robot had to be explicitly trained to identify, pick up and throw away waste before it could do so. The new model should allow the robot to perform such actions with only a small amount of training data, even if it has never been explicitly trained on them. To do this, the model draws on images from a large corpus of web data to understand when something can be labeled as ‘waste’.
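To illustrate this kind of web-scale visual grounding, the sketch below uses an off-the-shelf vision-language model (CLIP, via the Hugging Face transformers library) to label an image zero-shot. This is not RT-2’s actual pipeline, and the image path and candidate labels are hypothetical; it only shows how a model trained on web data can decide whether something counts as ‘waste’ without task-specific training.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Zero-shot labeling with a generic vision-language model,
# standing in for the web-scale visual knowledge described above.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")  # hypothetical camera frame
candidate_labels = ["waste", "a tool", "food", "a toy"]  # illustrative set

inputs = processor(text=candidate_labels, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Similarity between the image and each text label, as probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(candidate_labels, probs):
    print(f"{label}: {p:.2f}")
```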
Google tested the new robot model on more than 6,000 different tasks. For tasks that appear explicitly in the training data, the new model performs just as well as RT-1, Google says. But in new, unseen scenarios the success rate almost doubles: from 32 percent with RT-1 to 62 percent with RT-2. According to Google, a lot of work is still required before robots can be helpful in human-centered environments.
For RT-2, a pre-trained vision-language model was fine-tuned on both robot data and web data. The model receives images from the robot and then tells it which actions to perform.
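Concretely, the control loop for such a vision-language-action model might look like the sketch below. The `model`, `camera`, and `robot` interfaces are hypothetical stand-ins; the idea that the model emits discrete tokens that are de-tokenized into motor commands follows the RT-2 paper, while the 7-value action layout and 256-bin encoding used here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RobotAction:
    # 6-DoF end-effector delta plus a gripper command; this layout is an
    # illustrative assumption, not RT-2's documented action space.
    dx: float
    dy: float
    dz: float
    droll: float
    dpitch: float
    dyaw: float
    gripper: float


def detokenize(tokens: list[int]) -> RobotAction:
    """Map discrete tokens (assumed range 0-255) back to values in [-1, 1]."""
    vals = [t / 127.5 - 1.0 for t in tokens]
    return RobotAction(*vals[:7])


def control_loop(model, camera, robot, instruction: str):
    # Hypothetical inference loop: at each step the model sees the current
    # camera frame plus the language instruction and emits action tokens,
    # which are decoded into a command the robot executes.
    while not robot.task_done():
        image = camera.capture()
        tokens = model.predict(image, instruction)  # e.g. [132, 91, 241, ...]
        robot.execute(detokenize(tokens))
```

Under this encoding, for example, a token value of 255 decodes to 1.0 and a value of 0 to -1.0, so a single string of tokens fully specifies one step of motion and gripper state.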