OpenAI’s new o3 model has become the first AI system to surpass average human performance on the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI), scoring 76% accuracy against an average human score of slightly above 75%. The result was verified in a formal evaluation coordinated by OpenAI and François Chollet, the creator of ARC-AGI and a researcher at Google.
ARC-AGI is designed to evaluate how well an AI system adapts to novel tasks, probing abstract reasoning and pattern recognition through grid-based visual puzzles rather than natural language. OpenAI’s o3 departs from the architecture of previous GPT models, employing new techniques that give it a markedly greater ability to adapt to problems it has not seen before.
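For readers unfamiliar with the benchmark, the sketch below illustrates the task format used in the publicly released ARC dataset: each task provides a few demonstration input/output grid pairs encoding a hidden rule, plus test inputs that must be transformed the same way, scored by exact match. The specific puzzle and the `mirror_horizontally` solver are invented here purely for illustration; real ARC-AGI tasks are considerably harder.

```python
# Minimal sketch of the ARC-AGI task format (after the public ARC dataset,
# github.com/fchollet/ARC). The puzzle below is an invented toy example.

# Grids are rectangular lists of integers 0-9, each integer denoting a color.
Grid = list[list[int]]

# A task gives "train" pairs demonstrating a hidden rule, plus "test" inputs
# the solver must transform in the same way.
example_task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[0, 0], [2, 0]], "output": [[0, 0], [0, 2]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 0]], "output": [[0, 3], [0, 0]]},
    ],
}

def mirror_horizontally(grid: Grid) -> Grid:
    """Hypothetical solver for this toy task: flip each row left-to-right."""
    return [list(reversed(row)) for row in grid]

def score(task: dict, solver) -> float:
    """ARC-style scoring: a test pair counts only on an exact grid match."""
    pairs = task["test"]
    correct = sum(solver(p["input"]) == p["output"] for p in pairs)
    return correct / len(pairs)

print(score(example_task, mirror_horizontally))  # 1.0 on this toy task
```

The difficulty for AI systems is that every task hides a different rule, so a solver cannot memorize transformations; it has to infer each rule from only a handful of examples, which is exactly the kind of on-the-fly adaptation the benchmark is meant to measure.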
Chollet described o3’s performance as a “step-function increase in AI capabilities” and attributed its success to innovations in its underlying architecture, which he suggests involve sophisticated search over candidate solutions during problem-solving. He also noted that o3’s score was achieved with the aid of training on ARC-related data, raising questions about how the model would perform without such preparation.
While o3’s achievement is being hailed as a breakthrough in AI capabilities, Chollet and OpenAI emphasize that it does not represent the arrival of Artificial General Intelligence (AGI). The model still fails certain straightforward tasks within the ARC-AGI framework, underscoring its limitations compared to human cognition. OpenAI plans to release a “mini” version of o3 by the end of January 2025, with the full version to follow later.