





Original version: not fully automatic. You first generate and export the required motion (pose) video, then load it in later runs so it does not have to be reprocessed every time.
Update: the workflow has been adjusted to run fully automatically.
2025/6/19 update: direct HD output at 1920×1080.
Faithfully copies the motion and music of the reference video.
Currently runs 150 frames in a single generation without frame interpolation, the highest frame count on any platform.
The workflow is set to 5 seconds at 30 frames per second, which gives 150 frames.
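The relationship between clip length, frame rate, and frame count can be sketched as below. This is only an illustration of the arithmetic; the function name is hypothetical and is not part of the workflow.

```python
def frame_count(duration_s: float, fps: int) -> int:
    """Number of frames needed for a clip of the given length."""
    return round(duration_s * fps)


# The workflow's stated setting: 5 seconds at 30 frames per second.
print(frame_count(5, 30))  # 150
```

A shorter clip or a lower frame rate reduces the frame count proportionally, for example 3 seconds at 30 frames per second needs 90 frames.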
Operating steps (must read)
Video version (full walkthrough)
https://www.bilibili.com/video/BV18LM1zEENN/
Text version (read carefully; if anything is unclear, refer to the video)
First stage: generate and export the required motion video, then load it in later runs so it does not have to be reprocessed every time.
1. Insert a reference video.
2. Enable the currently ignored DW pose and video synthesis nodes.
3. Right-click the video synthesis node and execute it.
4. Save the generated skeletal video, then set the DW pose and video synthesis nodes back to ignored.
Second stage
1. Insert a character reference image.
2. Place the saved skeletal video into the bottom video loader.
3. Run and wait. If a 24 GB card keeps running out of VRAM, try a 48 GB card.
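The two-stage idea above (extract the skeletal video once, then reuse the saved file) can be sketched as a simple file cache. This is a minimal sketch under stated assumptions: `pose_video_for`, the `_pose.mp4` naming, and the `extract` callback are hypothetical stand-ins for the manual steps, not functions of the workflow itself.

```python
import tempfile
from pathlib import Path
from typing import Callable


def pose_video_for(
    reference: Path,
    cache_dir: Path,
    extract: Callable[[Path, Path], None],
) -> Path:
    """Return the saved skeletal video for `reference`, extracting it only once."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{reference.stem}_pose.mp4"  # hypothetical naming scheme
    if not cached.exists():
        # First stage: the slow pose extraction runs only when no saved file exists.
        extract(reference, cached)
    # Second stage: this saved file is what gets loaded into the video loader.
    return cached


if __name__ == "__main__":
    calls = []

    def fake_extract(src: Path, dst: Path) -> None:
        # Placeholder for the real pose extraction; writes a dummy file.
        calls.append(src)
        dst.write_bytes(b"skeleton")

    with tempfile.TemporaryDirectory() as tmp:
        ref = Path(tmp) / "dance.mp4"
        first = pose_video_for(ref, Path(tmp) / "poses", fake_extract)
        second = pose_video_for(ref, Path(tmp) / "poses", fake_extract)
        print(first == second, len(calls))
```

The second call returns the same path without running the extraction again, which is the time saving the first stage is meant to provide.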
Key points for better generation results:
1. Character reference image: avoid tight framing and leave some space around the character.
2. Reference video: half-body shots are recommended. Clearly visible body movements are captured and positioned more accurately.
3. If the final merged output is ignored, a 150-frame clip can be kept to about 10 minutes.
Test data
Reducing the frame count can improve the other performance figures.
Video generation only
150 frames: generating the video takes 8 to 13 minutes per run.
Motion processing only
150 frames: processing the motion takes 6 to 7 minutes per run.
A 24 GB card can run this directly.
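As a rough planning aid, the 150-frame timings above can be scaled to other frame counts. The sketch below assumes run time grows linearly with the number of frames, which the test data does not confirm; the function name is hypothetical.

```python
def estimate_minutes(frames: int, low_total: int, high_total: int,
                     ref_frames: int = 150) -> tuple[float, float]:
    """Scale a measured (low, high) time range from ref_frames to `frames`.

    Assumes time is proportional to frame count (an unverified assumption).
    """
    return (low_total * frames / ref_frames, high_total * frames / ref_frames)


# Measured ranges for 150 frames, taken from the test data above.
VIDEO_ONLY = (8, 13)   # minutes per run, video generation only
MOTION_ONLY = (6, 7)   # minutes per run, motion processing only

low, high = estimate_minutes(90, *VIDEO_ONLY)
print(f"90 frames, video only: about {low:.1f} to {high:.1f} minutes")
```

Treat the result as an estimate only; actual times depend on the hardware and resolution used.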