Researchers at the University of Tokyo have developed a humanoid robot, Alter3, capable of generating spontaneous motion using the Large Language Model (LLM) GPT-4. The project, led by Takahide Yoshida, Atsushi Masumori, and Takashi Ikegami, is a collaboration between the university’s Department of General Systems Science and Alternative Machine Inc.
By integrating GPT-4 with Alter3, the researchers have enabled the robot to generate movements based on verbal instructions. Unlike traditional robot control, which often requires detailed manual programming for each motion, Alter3 can map human actions described in natural language onto its body. This allows the robot to perform a variety of poses, such as taking a selfie or pretending to be a ghost, without explicit programming for each body part.
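In rough outline, this mapping can be pictured as a single prompt to GPT-4 that returns a pose over the robot's movement axes. The sketch below is illustrative only: the axis names, the prompt wording, and the `robot.set_axis()` call are assumptions for the example, not the team's published API.

```python
# Minimal sketch: map a verbal instruction onto the robot's axes via GPT-4.
# Axis names and the robot.set_axis() call are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

AXES = ["head_pitch", "head_yaw", "left_shoulder", "left_elbow",
        "right_shoulder", "right_elbow", "torso_lean"]  # illustrative subset

SYSTEM = ("You control a humanoid robot. Given an instruction, reply with "
          f"JSON mapping each of these axes to a value in [0, 1]: {', '.join(AXES)}.")

def instruction_to_pose(instruction: str) -> dict:
    """Ask GPT-4 for a pose and parse it as an axis -> value mapping."""
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": instruction}],
    )
    return json.loads(reply.choices[0].message.content)

pose = instruction_to_pose("Take a selfie with your left hand.")
# for axis, value in pose.items():
#     robot.set_axis(axis, value)  # hypothetical low-level robot call
```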
The system demonstrates zero-shot capability: the robot generates new movements directly from linguistic input, without iterative adjustment or task-specific training. Alter3 can execute complex actions, such as miming playing metal music or pretending to be a snake, from verbal descriptions alone.
Previously, controlling such robots required precise manual adjustment of each of their many movement axes. With GPT-4 integrated, Alter3 can instead be driven through Chain-of-Thought (CoT) prompting: the model first describes the requested action step by step in natural language, then translates that description into motor commands, so the robot performs actions without predefined learning cycles.
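Extending the single-prompt sketch above, a two-stage CoT flow might look like the following. The prompt wording, the axis list, and the JSON output format are again assumptions made for illustration.

```python
# Sketch of a Chain-of-Thought motion pipeline: GPT-4 first describes the
# action step by step, then a second call converts that description into
# axis values. Prompts and axis list are illustrative, not the real system.
import json
from openai import OpenAI

client = OpenAI()
AXES = "head_pitch, head_yaw, left_shoulder, left_elbow, right_shoulder, right_elbow"

def chain_of_thought_motion(action: str) -> dict:
    # Stage 1: have the model reason about the movement in plain language.
    steps = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": "Describe, step by step, how a humanoid robot "
                              f"would move its head, arms, and torso to: {action}"}],
    ).choices[0].message.content
    # Stage 2: translate the step-by-step description into concrete axis values.
    pose = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system",
                   "content": "Reply only with JSON mapping these axes to "
                              f"values in [0, 1]: {AXES}"},
                  {"role": "user", "content": steps}],
    ).choices[0].message.content
    return json.loads(pose)

print(chain_of_thought_motion("pretend to be a ghost"))
```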
Linguistic Feedback for Motion Adjustment
Alter3 also incorporates linguistic feedback to refine its movements. Although it cannot visually assess its actions, users can provide verbal suggestions, such as asking the robot to raise its arm higher when performing a task like taking a selfie. The robot processes these instructions, adjusts its motion code, and stores the improved sequence in its motion memory for future reference.
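A minimal sketch of such a feedback loop follows, assuming the motion is represented as JSON-serializable data and revised by a further GPT-4 call; the motion format and prompt wording are assumptions, not the team's implementation.

```python
# Sketch of the linguistic feedback loop: a motion is revised by GPT-4 in
# response to a verbal suggestion and written back to memory for reuse.
# The motion representation (axis/value records) is an assumed placeholder.
import json
from openai import OpenAI

client = OpenAI()
motion_memory: dict[str, list] = {}  # label -> stored motion sequence

def refine_motion(label: str, motion: list, feedback: str) -> list:
    """Apply a verbal correction such as 'raise your arm higher'."""
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system",
                   "content": "Revise this robot motion sequence according "
                              "to the user's feedback. Reply with JSON only."},
                  {"role": "user",
                   "content": json.dumps(motion) + "\nFeedback: " + feedback}],
    )
    revised = json.loads(reply.choices[0].message.content)
    motion_memory[label] = revised  # store the improved sequence for later reuse
    return revised
```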
The system is supported by a database that stores these revised motions as labeled actions, allowing the robot to retrieve them when necessary. Alter3’s ability to refine movements based on feedback provides a more efficient method of enhancing its performance over time.
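Conceptually, the database can be as simple as a label-to-motion lookup with a generation fallback; the exact-match retrieval below is an assumption, sketched for illustration.

```python
# Sketch of motion retrieval: reuse a stored (possibly refined) motion when
# its label is known; otherwise generate one and cache it. The real system's
# lookup and storage may be more sophisticated than this exact-match dict.
from typing import Callable

MotionSeq = list[dict]                 # e.g. [{"axis": "...", "value": 0.5}]
motion_db: dict[str, MotionSeq] = {}

def perform(label: str, generate: Callable[[str], MotionSeq]) -> MotionSeq:
    """Return the stored motion for `label`, generating and caching if absent."""
    if label not in motion_db:
        motion_db[label] = generate(label)  # e.g. the CoT pipeline sketched above
    return motion_db[label]
```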
The researchers evaluated the robot's movement generation by comparing nine actions produced with GPT-4 against control videos of random movements. Participants rated each motion on a five-point scale, and the GPT-4-generated actions were rated as more expressive than the controls.
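For a sense of what such a comparison involves, the sketch below scores two groups of five-point ratings. The numbers are invented placeholders, and the Mann-Whitney U test is one reasonable choice for ordinal ratings, not necessarily the analysis the team used.

```python
# Illustrative analysis of a five-point rating study. The ratings below are
# made-up placeholders, NOT the study's data; the test choice is an assumption.
from scipy.stats import mannwhitneyu

gpt4_ratings    = [4, 5, 4, 3, 5, 4, 4, 5, 3]   # placeholder values only
control_ratings = [2, 1, 3, 2, 2, 1, 3, 2, 2]   # placeholder values only

stat, p = mannwhitneyu(gpt4_ratings, control_ratings, alternative="greater")
print(f"GPT-4 mean {sum(gpt4_ratings)/len(gpt4_ratings):.2f} vs "
      f"control mean {sum(control_ratings)/len(control_ratings):.2f}; "
      f"U={stat:.1f}, p={p:.4g}")
```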
Alter3 performed both human-like actions, such as taking a selfie and drinking tea, and non-human movements, like pretending to be a ghost or a snake. The robot also reflected emotional cues, such as joy or embarrassment, conveyed in the verbal instructions.
Potential Applications
The development of Alter3 presents possibilities for improving human-robot interaction, with potential uses in various fields, including entertainment, caregiving, and service industries. The research team noted that Alter3’s architecture could be adapted for other humanoid robots, potentially broadening its applicability. This work also contributes to ongoing discussions on the role of embodiment in large language models, showing that LLMs like GPT-4 can drive humanoid robots to generate a wide range of movements without the need for additional training.