Last year I deepfaked my kid’s stuffed animal to make it look like his plush deer was on vacation.
It was an experiment to see if I could recreate the events depicted in the Gemini ad that Google was running, and I never showed the videos of Buddy the deer’s adventures to my four-year-old. But it was a revealing exercise that made me think a lot about the difference between some harmless fun with generative AI and complete regression. Maybe this Venn diagram is a perfect circle! Maybe not. But what I know for sure is that the tools to create realistic videos are surprisingly good, and they require very little effort or know-how. That trend continues in Gemini’s Omni era.
Omni is a new family of generative models that will one day be able to turn any type of input – images, video, text – into anything else. But for starters, it only creates video. Omni Flash is the first of these models released by Google, and it’s now available in the company’s AI video creation and editing platform, Flow. You can still use the previous model, Veo, if you wish, but Omni improves on Veo in several ways.
With Omni, you can upload a video and use it along with a text prompt as a starting point for AI-generated creativity. Google also claims that Omni incorporates more real-world knowledge when producing videos and, as a result, does a better job of keeping characters consistent throughout a video. There was only one way to find out if these claims were really true: I had AI Buddy pack his AI-generated bags for another adventure.
The results are a puzzlingly mixed bag. Some of it was very good, and it was more consistent and truer to my prompts than it had been when I was testing Veo five months earlier. But even the best clips Omni made for me still contain some AI jump scares, like when Buddy suddenly changes direction while skydiving.
In another video, I gave Omni some artistic freedom. “Create a montage of Buddy packing for vacation and heading off on a cruise ship for a tropical vacation. The mood is cute and fun. Buddy packs something funny in his suitcase that plays later in the clip.” It had Buddy packing a jar of honey; later in the clip, he reaches for it as if it were a bottle of sunscreen. “Ah,” the character says as he sprinkles honey on his hoof.
Honestly, it’s not a bad bit. However, the honey container keeps changing throughout the video, from a jar to a clear spray bottle filled with water, then back to a squeeze bottle filled with honey. And I can’t even begin to explain how the model came up with the final frame of the video – it’s almost as if it mashed together a bunch of elements from the sequence it had just created.
You can use text prompts to suggest edits to your videos, and I’ll give Google credit: this works better with Omni than it did when I tested Veo 3. But the results were… bad with Veo, so bad that I found it easier to render a new video from scratch every time I wanted to change something. Omni will take your adjustments into account, but the results aren’t always what you’d hoped for.
I had it emphasize Buddy’s facial reactions in his vacation clips, and the results ended up looking weird. It will also give Buddy antlers from time to time, which he doesn’t have. Buddy is a baby, thank you very much. When I asked it to remove the antlers that appeared in one scene, it obliged, and then added antlers in all the other scenes.
The thing is, none of this is free. Creating a video costs anywhere from 15 to 40 credits depending on the length of the scene and the “components” you start with. One round of modifications costs 40 credits. I have the $20/month AI Pro plan, which comes with 1,000 credits every month. After about 20 clips created and some tweaking, I was down to 145 credits. If you have specific ideas about the video you want Omni to create, you may be looking at a lot of costly back-and-forth with the model to get a video that is close to your vision.
One of Omni’s purported strengths is adding AI-generated objects to real videos, so I gave Buddy a break and deepfaked myself. Starting with a selfie video with a neutral expression, I had Omni create videos of me eating a plate of spaghetti, sitting in an airplane seat, and standing in front of the Eiffel Tower taking a bite of baguette. And I can honestly say that I was not prepared for what I saw.
There are AI tells in my fake videos. The sound of the fork hitting the bowl of pasta is a little manufactured. There is a woman in the background of the plane video who appears twice. But aside from those little errors and the vaguely off feel of it all, the videos are very convincing.
I showed my husband the pasta clip; he knew I was testing an AI video tool, but I didn’t tell him what the AI had generated in the scene. He believed I was sitting in front of the camera eating pasta, and said his only clue that something was up was that the bowl looked unfamiliar. The pasta eating itself seemed real enough to convince my husband, the guy who has looked at me in real life basically every day for the past decade.
Other deepfakes reach varying levels of “good enough to fool people on social media.” Some of the Eiffel Tower clips look a bit cartoony, but one is convincing enough that you may need to rewatch it a few times to be sure it’s AI. I know it’s not me when AI me turns her head and reveals her hair pulled back into a ponytail. But I’m not sure anyone else would know the difference, and it makes me feel weird.
I’m a little exhausted by all of this, to be honest. When I tested Veo 3, I was shocked by the realism it could produce. Over and over again during the past few years, I’ve been shocked at how easy it is to put fake people in fake photos. I should probably be shocked by Omni too, and I think I am, but the edge has worn off.
It’s still not quite as easy to create an AI-generated cinematic masterpiece as Google would like you to believe. But Omni improves on Veo in some notable ways. If you have a Google account and a credit card, you can take a video of yourself sitting at home and make it look like you’re on a trip to Maui with little effort. I don’t think we’re exactly on the “foothills of the singularity,” but we’re certainly deep in the uncanny valley.
All images and videos in this story were created by Google Gemini.