Input a portrait photo, either of a real person or a cartoon character

Then input an audio clip, either speech or singing

Combine the two to generate an animated digital human, either realistic or cartoon-style

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation