





Non-automatic version: first generate and export the required motion (skeleton) video, then load it on later runs so the motion does not have to be regenerated each time.
Update complete: the workflow is now fully automatic.
2025/6/19 update: direct high-definition output at 1920×1080
2025/7/1 update: enhanced face description (choose whether the character wears glasses)
2025/7/7 update:
Dance duration (seconds)
Skip duration (seconds)
Frame rate (if you enter 16, you can try a single 10-second clip)
Understanding intensity (improves consistency, but may cause the image to break down)
Faithfully copies the motion and the music
Currently able to run 150 frames in a single generation without frame interpolation, the highest frame count across all platforms.
The setting here is 5 seconds at 30 frames per second, which gives 150 frames.
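The relation between duration, frame rate, and frame count can be sketched as follows. This is only an illustration: the function names are hypothetical and not part of the workflow, and the reading of "skip duration" as seconds dropped from the start of the reference video is an assumption.

```python
def frame_count(duration_s: float, fps: float) -> int:
    """Number of frames generated for a clip of the given length."""
    return round(duration_s * fps)


def source_window(skip_s: float, duration_s: float) -> tuple:
    """Start and end (in seconds) of the reference-video segment used.

    Assumption: "skip duration" drops that many seconds from the start.
    """
    return (skip_s, skip_s + duration_s)


print(frame_count(5, 30))    # the default setting: 5 s at 30 fps
print(frame_count(10, 16))   # the suggested 16 fps with a single 10 s clip
print(source_window(3, 5))   # hypothetical: skip 3 s, then use 5 s
```

With the default setting this gives 150 frames; 16 fps over 10 seconds gives 160 frames.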
Operation steps (must-read)
Video version (step-by-step tutorial)
https://www.bilibili.com/video/BV18LM1zEENN/
Text version (read carefully; if anything is unclear, watch the video)
First stage: generate and export the required motion, so that later runs can load it instead of regenerating it each time.
1. Insert a reference video.
2. Enable the bypassed DWPose and video combine nodes.
3. Right-click the video combine node and execute it.
4. Save the generated skeleton video, then bypass the DWPose and video combine nodes again.
Second stage
1. Insert the character reference image.
2. Load the saved skeleton video into the bottommost video loader.
3. Run and wait. If a 24 GB GPU keeps running out of memory, try 48 GB.
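The two-stage idea above amounts to caching the skeleton video: the first stage runs only when no saved skeleton exists. A minimal sketch of that decision, assuming a hypothetical helper `plan_run` and a hypothetical file name (neither is part of the workflow itself):

```python
from pathlib import Path


def plan_run(skeleton_video: Path) -> dict:
    """Decide which stage to run, based on whether a skeleton video was saved.

    Hypothetical helper: it only mirrors the manual steps, it does not
    drive the workflow.
    """
    if skeleton_video.is_file():
        # Second stage: reuse the saved skeleton, keep pose extraction bypassed.
        return {"stage": 2, "dwpose_enabled": False, "reuse_skeleton": True}
    # First stage: enable pose extraction and export the skeleton video.
    return {"stage": 1, "dwpose_enabled": True, "reuse_skeleton": False}


# Hypothetical path; point this at wherever the skeleton video was saved.
print(plan_run(Path("skeleton.mp4")))
```

Reusing the saved file means the motion processing time is paid once per reference video rather than once per generated clip.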
Below are some tips to help you generate better results.
1. Character reference image: avoid tight framing; leave some space around the character.
2. Reference video: prefer footage that focuses on the upper body and clearly shows the limb movements, for better capture and positioning.
3. If the final video combine output is bypassed, a 150-frame clip can be kept to around 10 minutes.
Test data
Reducing the frame count can improve the other performance figures.
Video processing alone
150 frames: generating the video takes 8-13 minutes per run.
Motion processing alone
150 frames: motion processing takes 6-7 minutes per run.
Runs directly on 24 GB.
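To compare runs at other frame counts, the timings above can be turned into per-frame figures. A rough sketch with hypothetical function names; the linear scaling in `scaled_minutes` is an assumption, since real run time may not scale linearly with frame count.

```python
FRAMES = 150  # frame count used in the test data above


def seconds_per_frame(minutes: float, frames: int = FRAMES) -> float:
    """Average processing time per frame, in seconds."""
    return minutes * 60 / frames


def scaled_minutes(minutes_at_150: float, frames: int) -> float:
    """Rough estimate for another frame count (assumes linear scaling)."""
    return minutes_at_150 * frames / FRAMES


timings = {"video generation": (8, 13), "motion processing": (6, 7)}
for name, (low, high) in timings.items():
    print(f"{name}: {seconds_per_frame(low):.1f}-{seconds_per_frame(high):.1f} s/frame")
```

For the figures reported here this works out to roughly 3.2-5.2 seconds per frame for video generation and 2.4-2.8 seconds per frame for motion processing.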