Image courtesy of Sagans

Words, camera, action: Creating an AI music video with Unreal Engine 5

July 26, 2022
What if you could create an Akira-inspired cyberpunk film—using nothing but text? For the team at artist collective Sagans, this is a future worth pioneering. Using Unreal Engine’s MetaHuman technology together with AI algorithms, they recently generated most of the shots in their debut music video Coherence from written descriptions alone.

The result was a postmodern portrayal of a woman finding her place in a new city, drifting from a bright sakura tree to a New York subway through sketch-like drawings that shift in time with the accompanying music. In the process, Sagans’ artists showed that AI didn’t need to be a frightening prospect poised to take our jobs (or take over the world altogether). Instead, it could be the key to unlocking entirely new creative possibilities.
 

Disrupting the music industry

According to Aurelien, 3D Specialist at Sagans, Coherence represents exactly what the collective is trying to achieve. “We felt the music industry was afraid to take risks unless they were profit-driven,” he explains. “That’s led to a drop in authentic content, which is what prompted us to start Sagans in the first place.”
 

Made up of musicians, graphic designers, and researchers, the Sagans collective didn’t need to wait to finish a song before starting on their first music video. This gave the team more time and freedom to explore how their vision could come to life through technologies like artificial intelligence and real-time graphics. 

“The first technology we decided to use was Unreal Engine 5,” Aurelien continues. “We started by creating universes with the Quixel asset library, then very quickly adopted MetaHuman Creator for realistic characters. This changed everything for us. Suddenly, we could make something beautiful in very little time. We knew we were ready to start Coherence.”
 
Video courtesy of Sagans

Adding AI

First, the team decided to use Katsuhiro Otomo’s iconic Akira anime style as the basis of the video’s aesthetic. The next step was to create the heroine of the video. “Before working in Unreal Engine, we weren’t very familiar with the world of 3D,” says Aurelien. “From the outside, character creation always seemed to be a very complicated task. Luckily, MetaHuman Creator made things easy.”
Images courtesy of Sagans
To create a character who would feel familiar to audiences, the team blended elements of three faces from the MetaHuman database until they found a result they liked. Next, they used the Live Link Face app for iOS to animate the character’s facial expressions. At the same time, Sagans artists began using Quixel to build the video’s Unreal Engine environments, which ranged from British streets stylized to look Japanese to a New York subway train.
 
Video courtesy of Sagans

“When it came to creating environments, Lumen was crucial,” Aurelien explains. “With it, we were able to work on lighting in real time, without using a lot of hardware resources. Together with Niagara, which we used for particles like rain, we knew we could generate a render that corresponded exactly to what we saw in our UI in just seconds.”

The last step was to run the Unreal Engine output through one of three separate AI algorithms, depending on the needs of each scene, to create the final Otomo style. The first algorithm enabled Sagans artists to describe the style of a scene in text, then see it appear as a video. The second took a reference image and produced an animated video, while the third turned live-action footage into a stylized animated result.

“We used our text-to-video algorithm on the majority of the shots,” Aurelien adds. “For this, we used Disco Diffusion 5 to help generate individual frames, which the algorithm then turned into video. We had to ensure our text was extremely detailed, describing the style and filling in any gaps: if you don’t mention the sky, the machine could simply forget that there has to be a sky above the city. Once we got a good first frame, we would make sure there was enough description to generate the next frames according to the camera moves we wanted, then check the render one last time on the 20th frame.”
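That frame-chaining approach, where each generated image is warped to follow the camera and then used to seed the next one, is what keeps the animation coherent from frame to frame. The sketch below illustrates the general idea in Python; it is not Sagans’ actual pipeline. Here run_diffusion is a hypothetical stand-in for a Disco Diffusion-style text-to-image step, and pan fakes a simple 2D camera move.

    # A minimal sketch of frame-chained generation, assuming a hypothetical
    # run_diffusion backend; Disco Diffusion's real animation mode works on
    # the same principle, with far more controls.
    from PIL import Image

    def run_diffusion(prompt: str, init_image=None, strength: float = 0.5):
        # Hypothetical stand-in: a real backend returns an image matching
        # `prompt`, guided by `init_image` when given (lower strength keeps
        # the result closer to the init). Here it just echoes the init so
        # the loop runs end to end.
        if init_image is not None:
            return init_image.copy()
        return Image.new("RGB", (640, 360), "gray")

    def pan(frame: Image.Image, dx: int, dy: int) -> Image.Image:
        # Crude 2D camera move: shift the image, leaving a blank strip at
        # the edge for the next diffusion pass to repaint.
        return frame.transform(frame.size, Image.Transform.AFFINE,
                               (1, 0, -dx, 0, 1, -dy))

    # Describe everything, including the sky, or the model may leave it out.
    prompt = ("a rainy street at night in the style of Katsuhiro Otomo's "
              "Akira, neon signs, crowds with umbrellas, cloudy sky above "
              "the city")

    frames = [run_diffusion(prompt)]            # first frame: text only
    for _ in range(19):                         # sanity-check around frame 20
        warped = pan(frames[-1], dx=4, dy=0)    # slow pan to the right
        frames.append(run_diffusion(prompt, init_image=warped, strength=0.45))

    for i, frame in enumerate(frames):
        frame.save(f"frame_{i:04d}.png")

In a real pipeline, the pan would be replaced by whatever 2D or 3D camera transform the shot calls for, and the strength value tuned so each frame changes enough to follow the motion without flickering out of style.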
 

Unreal results

Thanks to their unique real-time 3D and AI workflow, it only took three Sagans creatives eight weeks to go from first shot to final render of Coherence. “We wanted a solution that was both simple to use and fast enough to get our vision realized very quickly, and Unreal Engine didn’t disappoint,” adds Aurelien. “Within a few clicks, we could easily create an extremely detailed character and city environment, and animate our heroine without a motion capture suit.”

Now that Coherence is complete, the team aims to continue using both Unreal Engine and AI for future music video projects. “It’s always nerve-wracking to release content for the first time, but the response to Coherence has been exceptional,” Aurelien concludes. “We want people to understand that artificial intelligence is there to assist and inspire us more than just take our jobs, and this video is our first step to making that happen.”
