Researchers at MIT, Harvard, and Cornell University have found that, despite their advanced capabilities, large language models (LLMs) do not construct accurate internal representations, or ‘world models’, of real-world structure and rules, which can cause them to fail unexpectedly on tasks similar to those they appear to have mastered.
The study, led by Ashesh Rambachan, an assistant professor at MIT’s Laboratory for Information and Decision Systems (LIDS), challenges the assumption that LLMs like GPT-4 understand the world simply because they produce sophisticated outputs, such as composing poetry or providing navigation directions.
In the study, researchers tested a transformer-based LLM on tasks involving city navigation and board game rules. While the model could offer nearly perfect driving directions for New York City, performance dropped significantly when minor changes, such as closed streets or added detours, were introduced. Investigations revealed that the model’s internal map of New York included fictional elements like non-existent streets and erroneous connections, indicating a flawed representation of the actual city layout.
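To make this failure mode concrete, the following is a minimal sketch, in schematic Python, of how one might check a proposed route against a street map before and after a closure. The toy graph, the route, and the helper names are hypothetical illustrations, not the researchers’ actual evaluation code.

```python
# Minimal sketch (not the authors' code): checking whether a model's proposed
# route remains valid after a street closure. The toy graph and route are
# hypothetical stand-ins for the study's New York City navigation task.

# Intersections as nodes, streets as undirected edges.
street_graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
}

def route_is_valid(graph, route):
    """Return True if every consecutive pair of intersections is joined by a real street."""
    return all(b in graph.get(a, set()) for a, b in zip(route, route[1:]))

def close_street(graph, a, b):
    """Simulate a road closure by removing the edge between two intersections."""
    graph[a].discard(b)
    graph[b].discard(a)

model_route = ["A", "B", "D"]                       # route a model might propose
print(route_is_valid(street_graph, model_route))    # True on the intact map

close_street(street_graph, "A", "B")                # introduce a detour condition
print(route_is_valid(street_graph, model_route))    # False: the route is now invalid
```

A model with an accurate internal map could reroute around the closure; one relying on memorized street sequences, or on a map containing non-existent streets, cannot.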
The research introduced two new metrics for evaluating whether transformers form accurate world models. The first, sequence distinction, assesses whether a model recognizes that two different states, such as two different board configurations in a game of Othello, are in fact different. The second, sequence compression, assesses whether the model recognizes that two identical states share the same set of possible next steps. Applying these metrics to the navigation and game tasks, the team found that transformers trained on randomly generated sequences formed more accurate world models than those trained on data produced by structured, goal-directed processes, likely because random data exposes the model to a wider variety of possible scenarios.
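One rough way to picture these metrics is sketched below. It approximates the checks by comparing which next moves a model deems legal after different move sequences; the helper names (`model_legal_moves`, `true_state`) and the toy walk task are assumptions for illustration, not the paper’s formal definitions or code.

```python
# Minimal sketch under stated assumptions: the two metrics are approximated by
# comparing the sets of next moves a model judges legal after different
# prefixes. `model_legal_moves` and `true_state` are hypothetical stand-ins for
# querying a trained model and a ground-truth simulator of the task.

def sequence_compression_check(model_legal_moves, true_state, prefix_a, prefix_b):
    """Two prefixes reaching the same true state should be allowed the same
    next moves by a model with a coherent world model."""
    assert true_state(prefix_a) == true_state(prefix_b)
    return model_legal_moves(prefix_a) == model_legal_moves(prefix_b)

def sequence_distinction_check(model_legal_moves, true_state, prefix_a, prefix_b):
    """Two prefixes reaching different true states should be treated
    differently, i.e. allowed different sets of next moves."""
    assert true_state(prefix_a) != true_state(prefix_b)
    return model_legal_moves(prefix_a) != model_legal_moves(prefix_b)

# Toy illustration: a one-dimensional walk where the true state is the net
# position, and a deliberately flawed model that conditions only on the most
# recent step rather than on the underlying position.
true_state = lambda prefix: sum(prefix)
model_legal_moves = lambda prefix: {1} if prefix[-1] == 1 else {-1, 1}

# Both prefixes end at position 0, yet the flawed model permits different
# next steps, so it fails the compression check.
print(sequence_compression_check(model_legal_moves, true_state, (1, -1), (-1, 1)))  # False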
The findings suggest that LLMs may excel at generating plausible outputs without a true grasp of the underlying rules. Rambachan emphasized the importance of critically examining these models’ capabilities before applying them to scientific research or other areas requiring precise rule-following. Going forward, the researchers plan to expand their studies to tasks involving partially understood rules and real-world scientific problems, aiming to develop models that can more accurately represent structured knowledge.
The research is supported by the Harvard Data Science Initiative, the National Science Foundation, the Vannevar Bush Faculty Fellowship, the Simons Collaboration, and the MacArthur Foundation. The work will be presented at the Conference on Neural Information Processing Systems.