Researchers at the Massachusetts Institute of Technology (MIT) have developed a technique to train multipurpose robots more effectively by combining various sources of data using generative AI models, specifically diffusion models. This approach aims to address the challenge of integrating disparate datasets from different domains such as simulations, robotic teleoperation, and human demonstrations.
Robotic datasets vary widely in modality and environment, which makes it difficult to incorporate them into a single machine-learning model. As a result, robots are typically trained on limited task-specific data and often struggle with new tasks in unfamiliar settings.
The new technique, known as Policy Composition (PoCo), involves training a separate diffusion model on each dataset to learn a strategy, or policy, for completing a task. These individual policies are then combined into a general policy that enables the robot to perform multiple tasks across different environments. Diffusion models are generative AI models that produce output by iteratively refining it; in this context, they generate trajectories for robots rather than images.
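The idea of combining separately trained policies during iterative refinement can be illustrated with a toy sketch. This is not the researchers' implementation: the stand-in "policies" below are hand-written functions rather than trained diffusion models, the weighted average of their predictions is only one plausible way to combine them, and the refinement loop is a simplified deterministic update rather than a full diffusion sampler. All names (`make_toy_policy`, `compose_and_sample`, `target_sim`, `target_real`) are hypothetical.

```python
import numpy as np


def make_toy_policy(target):
    """Hypothetical stand-in for a trained diffusion policy.

    A real policy would be a neural network that predicts the noise in a
    trajectory. Here the prediction is simply the offset from a fixed
    target trajectory, so refinement pulls samples toward that target.
    """
    def predict_noise(x, t):
        return x - target
    return predict_noise


def compose_and_sample(policies, weights, shape, steps=50, step_size=0.2, seed=0):
    """Refine a random trajectory using a weighted mix of policy predictions."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    x = rng.standard_normal(shape)     # start from pure noise
    for t in reversed(range(steps)):
        # Combine the per-policy predictions at every refinement step.
        eps = sum(w * policy(x, t) for w, policy in zip(weights, policies))
        x = x - step_size * eps        # simplified update, assumed for illustration
    return x


# Toy trajectories: 8 time steps, 2 action dimensions.
ramp = np.linspace(0.0, 1.0, 8)[:, None]
target_sim = ramp * np.array([1.0, 0.0])   # assumed "simulation" behavior
target_real = ramp * np.array([0.0, 1.0])  # assumed "real-world" behavior

policies = [make_toy_policy(target_sim), make_toy_policy(target_real)]
traj = compose_and_sample(policies, weights=[0.5, 0.5], shape=(8, 2))
```

With equal weights, the refined trajectory in this toy setup settles between the two targets. Adding a policy for another dataset only means appending one more entry to `policies` and `weights`; the existing policies are left unchanged.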
MIT’s approach has demonstrated a 20 percent improvement in task performance compared to baseline methods, both in simulations and real-world experiments. The technique allows for the combination of policies trained on various datasets, such as those from human demonstrations and robotic teleoperation, enabling the robot to adapt to new tasks it did not encounter during training. This flexibility means that additional data from new modalities or domains can be incorporated by training a new Diffusion Policy, without starting the process from scratch.
The research, which builds on previous work on Diffusion Policy by MIT, Columbia University, and the Toyota Research Institute, highlights the potential for combining policies to achieve superior results. For example, a policy trained on real-world data might offer greater dexterity, while one trained on simulations might provide better generalization. The team tested PoCo on tasks such as using a hammer to pound a nail and flipping an object with a spatula, noting significant improvements in performance.
Looking ahead, the researchers plan to apply this technique to longer-horizon tasks where robots use multiple tools sequentially and to incorporate larger robotics datasets to further enhance performance. This work represents a step forward in the quest to effectively combine internet data, simulation data, and real robot data for improved robotics capabilities. The findings will be presented at the Robotics: Science and Systems Conference.