A workflow that makes the characters in a video lip-sync to an audio track. The video must be at least half as long as the audio.
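
The length requirement can be checked before running the workflow. The snippet below is only a minimal sketch of that rule; the function names are hypothetical and not part of the workflow, and both durations are assumed to be known already, in seconds.

```python
def min_video_seconds(audio_seconds: float) -> float:
    """Shortest allowed video length: half the audio length."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 2.0


def video_is_long_enough(video_seconds: float, audio_seconds: float) -> bool:
    """True when the video is at least half as long as the audio."""
    return video_seconds >= min_video_seconds(audio_seconds)
```

For example, a 20-second audio clip would need a source video of at least 10 seconds.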

Models required for local deployment (not necessarily these versions):

1. Wan2_1 I2V 14B 480P_fp8_e4m3fn

https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled

2. Wan2_1 InfiniTetalk Single_fp16.safetensors

https://huggingface.co/Kijai/WanVideo_comfy/tree/main/InfiniteTalk

3. lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16

https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Lightx2v
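
For scripted downloads, the links above can be split into a repository id and a subfolder. This is a rough sketch using only the Python standard library; `split_hf_url` is a hypothetical helper, and it assumes the links follow the `<owner>/<repo>/tree/<revision>/<folder>` layout seen above. It does not download anything and does not guess exact file names.

```python
from urllib.parse import urlparse


def split_hf_url(url: str) -> tuple[str, str]:
    """Return (repo_id, subfolder) for a huggingface.co model link."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository link: {url}")
    repo_id = "/".join(parts[:2])  # <owner>/<repo>
    # Folder links look like <owner>/<repo>/tree/<revision>/<folder...>
    if len(parts) > 4 and parts[2] == "tree":
        subfolder = "/".join(parts[4:])
    else:
        subfolder = ""
    return repo_id, subfolder


MODEL_LINKS = [
    "https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled",
    "https://huggingface.co/Kijai/WanVideo_comfy/tree/main/InfiniteTalk",
    "https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Lightx2v",
]

for link in MODEL_LINKS:
    print(split_hf_url(link))
```

The resulting pairs can then be passed to whichever download tool is preferred.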