Samsung’s AI animates paintings and photos without 3D modeling

Engineers and researchers from Samsung’s Artificial Intelligence Center in Moscow and the Skolkovo Institute of Science and Technology have created a model capable of generating realistic animated talking heads from images without resorting to traditional methods, such as 3D modeling.

Samsung opened AI research centers in Moscow, Cambridge and Toronto last year.

“In effect, the learned model serves as a realistic avatar of a person,” engineer Egor Zakharov said in a video explaining the results.

Well-known faces featured in the paper include Marilyn Monroe, Albert Einstein, Leonardo da Vinci's Mona Lisa, and RZA of the Wu-Tang Clan. The technology, which focuses on synthesizing photorealistic head images and facial landmarks, could be applied to video games, video conferencing, or digital avatars like the kind now available on Samsung's Galaxy S10. Facebook is also working on realistic avatars for its virtual reality initiatives.

Such technology could clearly also be used to create deepfakes.

Few-shot learning means the model can begin animating a face using only a few images of a person, or even a single image. The model is first meta-trained on the VoxCeleb2 video dataset before it can animate previously unseen faces.
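
To make the few-shot setup concrete, here is a rough sketch of how one meta-training episode could be sampled: a handful of frames from a single VoxCeleb2 video characterize the person, and a held-out frame from the same video becomes the target the generator must reproduce. The loader format, frame counts, and function names are illustrative assumptions, not the paper's actual pipeline.

    import random

    import torch

    # Hypothetical episodic sampler for meta-training. Each episode draws
    # frames from one person's video, so the model repeatedly practices
    # adapting to a new identity from only a few examples.
    # `videos` is assumed to be a list of tensors shaped (num_frames, 3, H, W),
    # each with at least k + 1 frames.
    def sample_episode(videos, k=8):
        video = random.choice(videos)
        idx = torch.randperm(video.shape[0])
        support = video[idx[:k]]  # K frames used to estimate the person embedding
        target = video[idx[k]]    # a held-out frame the generator must reproduce
        return support, target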

During meta-training, the system learns three neural networks: an embedder network maps face images (with their facial landmarks) to embedding vectors, a generator network maps the facial landmarks of a target pose into synthesized video frames, and a discriminator network assesses the realism and pose of the generated images.
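
The sketch below shows, in PyTorch, one way those three components could fit together. The paper's real networks are much deeper convolutional architectures; every layer, channel count, and tensor shape here is an illustrative assumption, and landmarks are assumed to be pre-rasterized into three-channel images.

    import torch
    import torch.nn as nn

    class Embedder(nn.Module):
        # Maps a face image stacked with its rasterized landmarks (6 channels)
        # to an embedding vector that characterizes the person.
        def __init__(self, embed_dim=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(6, 64, 3, stride=2, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(64, embed_dim),
            )

        def forward(self, image_with_landmarks):
            return self.net(image_with_landmarks)

    class Generator(nn.Module):
        # Maps target landmarks, conditioned on the person embedding,
        # to a synthesized frame.
        def __init__(self, embed_dim=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3 + embed_dim, 64, 3, padding=1),
                nn.ReLU(),
                nn.Conv2d(64, 3, 3, padding=1),
                nn.Tanh(),
            )

        def forward(self, landmarks, embedding):
            b, _, h, w = landmarks.shape
            cond = embedding.view(b, -1, 1, 1).expand(-1, -1, h, w)
            return self.net(torch.cat([landmarks, cond], dim=1))

    class Discriminator(nn.Module):
        # Scores whether a frame is realistic and matches the given landmarks.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(6, 64, 3, stride=2, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(64, 1),
            )

        def forward(self, image, landmarks):
            return self.net(torch.cat([image, landmarks], dim=1))

    # Wiring the pieces together on dummy data:
    embedder, generator, discriminator = Embedder(), Generator(), Discriminator()
    source = torch.randn(1, 6, 128, 128)     # source frame + its landmarks
    target_lm = torch.randn(1, 3, 128, 128)  # rasterized target landmarks
    e = embedder(source)
    fake = generator(target_lm, e)
    score = discriminator(fake, target_lm)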

“Crucially, the system is able to initialize the parameters of both the generator and the discriminator in a person-specific way, so that training can be based on just a few images and done quickly, despite the need to tune tens of millions of parameters,” the coauthors wrote in a summary of the paper on arXiv. “We show that such an approach is able to learn highly realistic and personalized talking head models of new people and even portrait paintings.”
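
That person-specific initialization can be illustrated with a toy fine-tuning loop that reuses the sketch modules above: the embedder's output, averaged over the K available frames, stands in for the person-specific state, and a few adversarial steps adapt the generator and discriminator to that one person. The losses, learning rates, and step count are assumptions; the paper's actual objective also includes perceptual and embedding-matching terms.

    import torch
    import torch.nn.functional as F

    def fine_tune(embedder, generator, discriminator, frames, landmarks, steps=40):
        # frames, landmarks: tensors shaped (K, 3, H, W) for one new person.
        with torch.no_grad():
            # Person-specific embedding, averaged over the K few-shot frames.
            e_hat = embedder(torch.cat([frames, landmarks], dim=1)).mean(0, keepdim=True)
        e_batch = e_hat.expand(frames.shape[0], -1)

        g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
        d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
        real_labels = torch.ones(frames.shape[0], 1)
        fake_labels = torch.zeros(frames.shape[0], 1)

        for _ in range(steps):
            # Generator step: fool the discriminator and match the real frames.
            fake = generator(landmarks, e_batch)
            g_loss = F.binary_cross_entropy_with_logits(
                discriminator(fake, landmarks), real_labels
            ) + F.l1_loss(fake, frames)
            g_opt.zero_grad()
            g_loss.backward()
            g_opt.step()

            # Discriminator step: real frames vs. detached fakes.
            d_loss = F.binary_cross_entropy_with_logits(
                discriminator(frames, landmarks), real_labels
            ) + F.binary_cross_entropy_with_logits(
                discriminator(fake.detach(), landmarks), fake_labels
            )
            d_opt.zero_grad()
            d_loss.backward()
            d_opt.step()

        return e_hat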

In other AI work recently aimed at mimicking human faces, researchers at the University of Washington shared last year how they created ObamaNet, a lip-syncing model based on Pix2Pix and trained on videos of the former U.S. president.

And last fall, researchers at the University of California, Berkeley introduced a model that uses YouTube videos to train AI to dance or perform acrobatic moves like backflips.
