OpenAI has announced Sora, an advanced text-to-video model. It is designed to generate videos up to a minute long while maintaining high visual quality and adhering closely to user prompts. Sora’s capabilities include generating complex scenes featuring multiple characters, specific types of motion, and detailed environments.
The model translates user prompts into videos that not only visually represent the request but also show how these elements interact in the physical world. Despite its advanced features, Sora does have limitations. It may struggle with accurately simulating complex physical interactions and understanding specific cause-and-effect scenarios. For instance, it might not always correctly represent changes in objects over time: a person might bite a cookie, yet the cookie may later appear without a bite mark.
In terms of safety and ethical considerations, the model will undergo testing by red teamers. These experts, specializing in areas like misinformation and bias, will adversarially test the model to identify potential harms. OpenAI is also developing tools to detect misleading content generated by Sora, including a detection classifier, and plans to incorporate C2PA metadata in future deployments.
To further ensure safety, OpenAI will employ existing safety methods developed for DALL·E 3, such as text and image classifiers that review and filter content that violates usage policies.
Sora is a diffusion model that starts with a video resembling static noise and gradually refines it into a clear, coherent video. The model can generate entire videos at once or extend existing ones, maintaining consistency in subjects even when they temporarily leave the view. Sora treats videos and images as collections of data patches, similar to tokens in GPT models, allowing for training on a diverse range of visual data.
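To make the patch idea concrete, here is a minimal sketch of how a video could be cut into small blocks spanning space and time, with each block flattened into a token-like vector. This is an illustration only, not OpenAI's implementation: the `patchify` function, the patch sizes, and the single-channel toy video are all assumptions made for the example, and Sora's actual patching scheme is not described in detail here.

```python
from typing import List

# Hypothetical representation: video[t][y][x] with one channel for simplicity.
Video = List[List[List[float]]]


def patchify(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Split a (T, H, W) video into flattened patches of size pt x ph x pw."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    # Walk over the video in non-overlapping blocks of frames, rows, and columns.
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                # Flatten one block into a single vector (the "token").
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


# A toy 4-frame, 4x4 video whose pixel value encodes its position.
video = [
    [[float(t * 16 + y * 4 + x) for x in range(4)] for y in range(4)]
    for t in range(4)
]
tokens = patchify(video, pt=2, ph=2, pw=2)
print(len(tokens), len(tokens[0]))
```

Because a still image is just a video with one frame, the same function handles images by setting `pt=1`, which is one way to read the claim that videos and images can share a single training representation.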
Building upon research from DALL·E and GPT models, Sora employs the recaptioning technique from DALL·E 3 for more accurate adherence to text instructions in video generation. It can also animate still images or modify existing videos with remarkable detail and accuracy.