Researchers at Penn State have developed a novel method, inspired by early childhood visual experience, for training artificial intelligence (AI) vision systems. The approach, which incorporates spatial position data into training, has been shown to significantly improve how efficiently and accurately AI models identify objects and navigate environments. The findings, published in the May issue of the journal *Patterns*, highlight the potential for advanced AI systems capable of exploring extreme environments or distant worlds.
The study draws parallels between the early visual experiences of children and AI training. In the first two years of life, children encounter a limited range of objects and faces, but they view them from multiple perspectives and under various lighting conditions. Leveraging this insight, the researchers devised a machine learning strategy that uses spatial information to train AI systems more effectively. AI models trained with this new method outperformed traditional models by up to 14.99%.
The team, led by Lizhen Zhu, a doctoral candidate in the College of Information Sciences and Technology at Penn State, introduced a contrastive learning algorithm. This self-supervised learning method enables AI to learn visual patterns by recognizing when two images are different versions of the same base image. Traditional algorithms often fail to relate such images when they are captured from different angles or under different lighting. The new algorithm incorporates environmental data such as location, allowing the AI to match these images correctly despite variations in camera position, lighting, and focal length.
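To make the mechanism concrete, here is a minimal sketch in PyTorch of what a location-aware contrastive objective could look like. It is an illustration, not the authors' published code: the InfoNCE-style formulation, the `spatial_positive_mask` helper, and the 0.5-meter distance threshold are all assumptions made for the example.

```python
# Sketch only: contrastive loss where positives are images taken at nearby
# camera positions, rather than augmented copies of one image. The helper
# name and the 0.5 m threshold are illustrative assumptions.
import torch
import torch.nn.functional as F

def spatial_positive_mask(positions: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Mark pairs of images captured within `threshold` meters as positives."""
    dists = torch.cdist(positions, positions)   # pairwise camera distances
    mask = dists < threshold                    # nearby views count as "same place"
    mask.fill_diagonal_(False)                  # an image is not its own positive
    return mask

def location_contrastive_loss(embeddings: torch.Tensor,
                              positions: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss that pulls together views sharing a location."""
    z = F.normalize(embeddings, dim=1)          # unit-norm embeddings
    sim = z @ z.T / temperature                 # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))           # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = spatial_positive_mask(positions)
    pos_counts = pos.sum(dim=1).clamp(min=1)
    # Average the log-probability over each image's spatial positives.
    loss = -log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos_counts
    return loss[pos.any(dim=1)].mean()          # skip rows with no positives

# Toy usage: 8 random embeddings with random 3-D camera positions.
emb = torch.randn(8, 128)
xyz = torch.rand(8, 3)
print(location_contrastive_loss(emb, xyz))
```

Because positives are defined by where the images were taken rather than by synthetic augmentations, two views of the same spot under different lighting or focal lengths are pulled together in embedding space, mirroring the infant-inspired intuition described above.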
The researchers created virtual environments using the ThreeDWorld platform to simulate the visual experiences of infants. They set up environments named House14K, House100K, and Apartment14K, where ‘14K’ and ‘100K’ indicate the number of sample images taken in each environment. The new algorithm was tested against base models in these simulations and consistently outperformed them. In recognizing rooms in a virtual apartment, for instance, the enhanced model achieved an average performance of 99.35%, a 14.99% improvement over the base model.
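The published datasets are not reproduced here, but a rough sketch of the kind of record such a simulation could emit (each frame tagged with a room label for the recognition task and a camera position for the contrastive objective) might look like the following. The `SimulatedFrame` structure and the `sample_environment` stub are hypothetical and do not use the actual ThreeDWorld API.

```python
# Sketch only: frames carry both a room label (for evaluation) and a camera
# position (for forming location-based positive pairs). All names here are
# illustrative, not part of ThreeDWorld.
import random
from dataclasses import dataclass

@dataclass
class SimulatedFrame:
    image_id: int
    room: str                              # e.g. "kitchen", the recognition label
    position: tuple[float, float, float]   # camera location within the environment

def sample_environment(rooms: list[str], n_frames: int) -> list[SimulatedFrame]:
    """Stand-in for rendering: emit frames tagged with room and position."""
    frames = []
    for i in range(n_frames):
        room = random.choice(rooms)
        pos = (random.uniform(0.0, 10.0), 1.5, random.uniform(0.0, 10.0))
        frames.append(SimulatedFrame(i, room, pos))
    return frames

# An "Apartment14K"-style dataset: 14,000 frames across a few rooms.
dataset = sample_environment(["kitchen", "bedroom", "bathroom", "living room"], 14_000)
print(len(dataset), dataset[0])
```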
James Wang, a distinguished professor of information sciences and technology at Penn State and Zhu’s advisor, emphasized the importance of this work in creating more energy-efficient and flexible AI training methods. He noted that this approach could be particularly useful for autonomous robots operating in unfamiliar environments with limited resources. The team plans to refine their model further, incorporating more diverse environments to enhance its applicability.
Image credit: the research team; tree photos (B) by Federico Adolfi and head bust photos (C) by James Z. Wang.