New Framework for Vision-Language Navigation in Legged Robots

by Pieter Werner

Researchers from UC San Diego, USC, and NVIDIA have introduced NaVILA, a Vision-Language-Action (VLA) framework designed to enable legged robots to interpret natural language commands and navigate complex environments. This system integrates high-level language processing with precise motion control, aiming to address challenges in robotic navigation across varied terrains.

Legged robots are well-suited for navigating cluttered and uneven terrains, but translating human language into precise actions has been a persistent challenge. NaVILA addresses this by introducing a two-level system. The high-level component processes visual and language inputs to produce mid-level instructions, such as “move forward 75 centimeters.” These instructions are then executed by a low-level locomotion policy that translates them into precise joint movements. This modular design allows NaVILA to be applied across different robotic platforms.
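The two-level split described above can be sketched in code. The following is a hypothetical illustration, not the paper's implementation: the class and function names, command grammar, and unit handling are all assumptions made for clarity.

```python
# Hypothetical sketch of NaVILA's two-level command structure.
# All names and interfaces here are illustrative, not taken from the paper.

from dataclasses import dataclass


@dataclass
class MidLevelCommand:
    """A discrete navigation instruction produced by the high-level VLA model."""
    action: str       # e.g. "move_forward", "turn_left"
    magnitude: float  # meters for translation, degrees for rotation


def parse_instruction(text: str) -> MidLevelCommand:
    """Parse a language-style mid-level command such as 'move forward 75 centimeters'."""
    tokens = text.lower().split()
    if tokens[:2] == ["move", "forward"]:
        value, unit = float(tokens[2]), tokens[3]
        meters = value / 100.0 if unit.startswith("centimeter") else value
        return MidLevelCommand("move_forward", meters)
    if tokens[0] == "turn" and tokens[1] in ("left", "right"):
        return MidLevelCommand(f"turn_{tokens[1]}", float(tokens[2]))
    raise ValueError(f"unrecognized command: {text!r}")


cmd = parse_instruction("move forward 75 centimeters")
print(cmd)  # MidLevelCommand(action='move_forward', magnitude=0.75)
```

In the real system, a learned locomotion policy (not a parser) would consume such commands and emit joint targets; the point of the sketch is only that the interface between the two levels is a small, discrete command vocabulary, which is what makes the design portable across robot platforms.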

A distinctive feature of NaVILA is its dual-frequency operation. The high-level visual reasoning runs at a slower rate, generating navigation commands from visual input and language instructions. Meanwhile, the low-level locomotion policy runs in real time, enabling dynamic obstacle avoidance and adaptation to the environment. This design balances deliberate high-level planning with immediate execution.
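The dual-frequency idea can be made concrete with a minimal control-loop sketch. This is an assumption-laden illustration: the specific rates (1 Hz planning, 50 Hz control) and names are invented for the example, not reported by the researchers.

```python
# Hypothetical sketch of a dual-frequency loop: a slow high-level planner
# and a fast low-level policy sharing the latest command.
# Rates and names are illustrative assumptions, not from the paper.

def run_episode(steps: int, high_level_hz: int = 1, low_level_hz: int = 50):
    """Simulate one episode: the planner refreshes the command once every
    (low_level_hz // high_level_hz) control ticks, while the locomotion
    policy acts on every tick."""
    ratio = low_level_hz // high_level_hz
    command = None
    plans, controls = 0, 0
    for t in range(steps):
        if t % ratio == 0:
            command = f"command@{t}"  # slow visual reasoning emits a new command
            plans += 1
        # fast policy executes joint-level control against the current command
        controls += 1
    return plans, controls


plans, controls = run_episode(100)
print(plans, controls)  # 2 100
```

Because the fast loop never waits on the slow one, the robot can react to obstacles between planner updates, which is the balance the paragraph above describes.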

The NaVILA system was trained using a diverse set of data sources, including real-world navigation videos, simulated environments, and datasets focused on vision-language tasks. This approach enhances the model’s ability to generalize across environments and tasks.

In tests conducted in simulated environments, NaVILA outperformed existing systems. On the newly developed VLN-CE-Isaac benchmark, designed specifically for legged robots, it achieved up to 14% higher success rates than policies relying solely on proprioception. Real-world evaluations demonstrated an 88% success rate across a variety of tasks, including those requiring the robot to follow complex, multi-step instructions. NaVILA was tested in settings ranging from indoor spaces to outdoor environments and performed effectively across these scenarios.
