1. This is the LTX2.3 audio-driven digital human workflow. It offers several configurations to choose from (FP8 version, KJ version, GGUF version). The online version defaults to the FP8 version.

2. The original version still gives the best quality. For low-spec hardware, however, KJ's model (ltx 2.3 22b distilled_transformer_only_fp8_scaled) is the best choice: it runs fast and still produces high-quality output.

3. Landscape orientation is strongly recommended. Portrait orientation also generates output, but the results are less satisfactory.

4. Prompts for audio-driven generation can be simple. The workflow uses automatic prompts by default; to write your own prompt, disconnect the prompt link. You can also generate video prompts with prompt templates in other AI agents.

5. If you need image generation for I2V (image-to-video), turn on the switch.

6. Regarding resolution selection:

(1) LTX output video dimensions must be multiples of 32. For an exact 16:9 aspect ratio, use dimensions such as 512×288 or 1024×576.
(2) 1920×1088 is an approximate 16:9 option.
(3) Requesting 1280×720 or 1280×736 produces a 1280×704 video in both cases.
(4) 1280×768 is output at the requested size.
(5) Higher resolutions give better results but require more powerful hardware.
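
The multiple-of-32 rule above can be checked with a little arithmetic. The following is a minimal sketch in Python, not part of the workflow itself; the function names are hypothetical and chosen only for illustration. It lists the sizes that are exactly 16:9 with both sides divisible by 32, and shows which multiples of 32 sit just below and above the ideal 16:9 height for a given width. It does not reproduce how the workflow resizes a requested size internally.

```python
STEP = 32  # LTX output dimensions must be multiples of 32 (from the notes above)


def is_valid_size(width: int, height: int, step: int = STEP) -> bool:
    """Return True if both dimensions are multiples of `step`."""
    return width % step == 0 and height % step == 0


def exact_16_9_sizes(max_width: int, step: int = STEP) -> list[tuple[int, int]]:
    """List sizes that are exactly 16:9 with both sides multiples of `step`.

    If width = step*a and height = step*b with a/b = 16/9, then a = 16k and
    b = 9k, so the sizes are (16*step*k, 9*step*k) for k = 1, 2, ...
    """
    sizes = []
    k = 1
    while 16 * step * k <= max_width:
        sizes.append((16 * step * k, 9 * step * k))
        k += 1
    return sizes


def bracketing_heights(width: int, step: int = STEP) -> tuple[int, int]:
    """Return the multiples of `step` just below and above the ideal 16:9 height.

    Both values are equal when the ideal height is already a multiple of `step`.
    """
    lower = (width * 9 // (16 * step)) * step
    upper = lower if lower * 16 == width * 9 else lower + step
    return lower, upper


if __name__ == "__main__":
    print(exact_16_9_sizes(2048))    # exact 16:9 candidates up to 2048 wide
    print(bracketing_heights(1280))  # ideal height 720 is not a multiple of 32
    print(bracketing_heights(1920))  # ideal height 1080 is not a multiple of 32
    print(is_valid_size(1280, 720))  # 720 fails the multiple-of-32 check
```

For a width of 1280 the ideal 16:9 height is 720, which is not a multiple of 32; the neighbouring valid heights are 704 and 736, which is consistent with the sizes mentioned in the list above.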

7. With the default parameters (1280×736, 10 seconds), one run costs approximately 80 points and takes about 7 minutes.

Get free generation credits: click your avatar in the top right corner > Invitation Code, then enter the invitation code to receive 1000 RH coins for free. Logging in daily earns 100 coins.

Invitation Code: rh v1443

LTX2.3 Audio-visual-driven digital human workflow (automatic prompt full configuration adjustable to adapt to 12G)

D-Human

Videos

Image-to-Video