Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

19 April 2024

A sample image from Microsoft for “VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time.” (credit: Microsoft)

On Tuesday, Microsoft Research Asia unveiled VASA-1, an AI model that can create a synchronized animated video of a person talking or singing from a single photo and an existing audio track. In the future, it could power virtual avatars that render locally and don’t require video feeds—or allow anyone with similar tools to take a photo of a person found online and make them appear to say whatever they want.

“It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors,” reads the abstract of the accompanying research paper, titled “VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time.” It’s the work of Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, and Baining Guo.

The VASA framework (short for “Visual Affective Skills Animator”) uses machine learning to analyze a static image along with a speech audio clip. It can then generate a realistic video with precise facial expressions, head movements, and lip movements synchronized to the audio. Unlike some other Microsoft research, it does not clone or simulate voices; it relies on an existing audio input that could be specially recorded or spoken for a particular purpose.
