OpenAI launches Sora text-to-video AI model; Content creators in trouble?

OpenAI is rolling out a fresh video-generation model dubbed Sora. The AI firm claims Sora “can create realistic and imaginative scenes from text instructions.” This text-to-video model enables users to generate photorealistic videos up to a minute in length, all based on the prompts they’ve written.

According to OpenAI’s introductory blog post, Sora can produce “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.”

The company also notes that the model can understand how objects “exist in the physical world,” as well as “accurately interpret props and generate compelling characters that express vibrant emotions.”

The model can also produce a video from a single image, as well as fill in missing frames in an existing video or extend it. The Sora-generated demos showcased in OpenAI’s blog post feature an aerial view of California during the gold rush, a video that appears to be taken from inside a Tokyo train, and more.

Some of the videos show telltale signs of AI generation, such as a floor that shifts suspiciously in a video of a museum. OpenAI acknowledges that the model “may struggle with accurately simulating the physics of a complex scene,” but the overall results are quite impressive.

A few years ago, text-to-image generators like Midjourney led the way in turning words into images. Video generation has improved rapidly since then: companies like Runway and Pika have showcased impressive text-to-video models, and Google’s Lumiere is poised to be one of OpenAI’s main rivals in this field. Like Sora, Lumiere offers users text-to-video tools and also allows them to create videos from a single image.

Sora is currently accessible only to “red teamers” who are evaluating the model for potential harms and risks. OpenAI is also providing access to some visual artists, designers, and filmmakers to gather feedback. Beyond its difficulty with the physics of complex scenes, the company notes that the current model may not properly interpret certain instances of cause and effect.

Earlier this month, OpenAI revealed that it is adding watermarks to images generated by its text-to-image tool DALL-E 3, though it noted that they can “easily be removed.” As with its other AI products, OpenAI will need to deal with the fallout of fake, photorealistic AI-generated videos being mistaken for the real thing.

Rohan Sharma