Google has recently demonstrated its new robotics model, RT-2, which uses state-of-the-art artificial intelligence (AI) technology to give a one-armed robot problem-solving abilities.
Until recently, robots struggled with handling unfamiliar objects and lacked the capacity to draw logical conclusions. The RT-2, however, has moved beyond these limitations, as demonstrated in a recent display where it was directed to “Pick up the extinct animal” among a set of plastic figurines and successfully identified and picked up a dinosaur.
This breakthrough is the result of Google integrating its robots with language models, the same kind of AI system that powers notable chatbots like ChatGPT and Bard. This infusion of AI has resulted in smarter robots with a heightened understanding and ability to solve problems.
The RT-2 model marks a step towards a significant transformation in how robots are built and programmed, and it has prompted Google to reconsider its entire research program. Vincent Vanhoucke, Google DeepMind’s head of robotics, stated that a lot of their previous work had been “entirely invalidated.”
Despite robots still falling short of human-level dexterity, Google’s application of AI language models to equip robots with new skills for reasoning and improvisation represents a promising advancement in robotics. Ken Goldberg, a robotics professor at the University of California, Berkeley, lauded the development, stating, “What’s very impressive is how it links semantics with robots.”
In the past, robots were programmed with a specific set of instructions for each mechanical task, an approach that was slow and labor-intensive. More recently, researchers began exploring whether AI language models, which are trained on large swaths of internet text, could help robots learn new skills. Google Research Scientist Karol Hausman mentioned that they began “connecting them to robots” about two years ago.
Google’s latest robotics model, RT-2, is referred to as a “vision-language-action” model, which can not only analyze the world around it but also instruct a robot on how to move. It translates the robot’s movements into a sequence of numbers, which are incorporated into the same training data as the language model. Just as chatbots can predict the next word in a sentence, RT-2 can guess how a robot’s arm should move to execute a particular action.
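To make the idea of turning movements into "a sequence of numbers" concrete, here is a minimal Python sketch of one way this could be done. It is an illustration and not Google's implementation: the function names, the choice of 256 bins per dimension, and the normalized range of -1 to 1 are all assumptions made for the example. Each component of an arm command is mapped to an integer bin, and the integers are written out as text so that a language model could treat them like any other tokens.

```python
from typing import List, Sequence

# Assumption for illustration: each action dimension is split into 256 bins.
NUM_BINS = 256


def encode_action(action: Sequence[float], low: float = -1.0,
                  high: float = 1.0, num_bins: int = NUM_BINS) -> List[int]:
    """Map each continuous action value to an integer bin index."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)       # keep value inside the range
        fraction = (clipped - low) / (high - low)  # rescale to 0..1
        # The top edge of the range falls into the last bin.
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def decode_action(tokens: Sequence[int], low: float = -1.0,
                  high: float = 1.0, num_bins: int = NUM_BINS) -> List[float]:
    """Map bin indices back to the center value of each bin."""
    width = (high - low) / num_bins
    return [low + (token + 0.5) * width for token in tokens]


def to_text(tokens: Sequence[int]) -> str:
    """Write the bin indices as a space-separated string of numbers."""
    return " ".join(str(token) for token in tokens)


if __name__ == "__main__":
    # Hypothetical arm command: three normalized movement values.
    command = [-1.0, 0.0, 1.0]
    tokens = encode_action(command)
    print(to_text(tokens))
    print(decode_action(tokens))
```

Decoding returns the center of each bin, so a round trip loses at most half a bin width of precision. That loss is the price of representing motion in the same discrete vocabulary a chatbot uses for words.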
The robot is capable of performing impressive tasks, following complex instructions, and even making abstract connections between related concepts. However, it is not flawless, with occasional inaccuracies observed in object identification and comprehension.