First: Upload two images. Make sure each face is fully visible and not covered by hands, then apply a mask to each character's face.
Then: Upload an audio clip of about 3 seconds to clone the voice, and enter a line of dialogue.
Finally: Click Run to generate three videos: background, left, and right.
Import the videos into editing software. Place the background on the bottom track and the left and right videos on the tracks above it. Apply a linear mask to each, then edit them together into the final video.
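
For readers unfamiliar with linear masks, here is a rough sketch of what the editing software does when it stacks the three tracks. It is only an illustration under stated assumptions, not the workflow's actual processing: it works on a single row of grayscale pixel values, the function names (linear_mask, composite_over, composite_row) are hypothetical, and it assumes the left video is kept on the left side of the frame and the right video on the right side, each fading out toward the middle.

```python
def linear_mask(width, start, end):
    """Per-column opacity: 1.0 before `start`, 0.0 from `end` on, a linear ramp between."""
    if not 0 <= start < end <= width:
        raise ValueError("expected 0 <= start < end <= width")
    alphas = []
    for x in range(width):
        if x < start:
            alphas.append(1.0)
        elif x >= end:
            alphas.append(0.0)
        else:
            alphas.append(1.0 - (x - start) / (end - start))
    return alphas


def composite_over(bottom, top, alphas):
    """Blend one row of the `top` track over the `bottom` track."""
    return [t * a + b * (1.0 - a) for b, t, a in zip(bottom, top, alphas)]


def composite_row(background, left, right, start, end):
    """Stack the tracks: background at the bottom, then left, then right."""
    width = len(background)
    left_alpha = linear_mask(width, start, end)
    # Assumption: the right track uses the mirrored ramp, so it is opaque
    # at the right edge and fades out toward the middle.
    right_alpha = left_alpha[::-1]
    row = composite_over(background, left, left_alpha)
    return composite_over(row, right, right_alpha)


if __name__ == "__main__":
    # Toy 6-pixel row: background = 0, left track = 100, right track = 200.
    print(composite_row([0] * 6, [100] * 6, [200] * 6, start=1, end=3))
```

In a real editor the same idea applies to every row of every frame, and the position and width of the ramp correspond to the mask's position and feather settings.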

Source for the dual cartoon character generation: https://www.runninghub.cn/ai-detail/1910649876794015746