Based on the testing results, S2V lip-sync video generation is only worth using when lip syncing is actually required, that is, when the clip needs both character action and spoken dialogue. Generating videos of non-human subjects is not recommended, and the audio should ideally be music with vocals or pure vocals. Mixed audio can easily cause interference: for example, a 5-second clip whose first 2 seconds are pure vocals and whose last 3 seconds are background music.
Models that need to be downloaded for local deployment:
1. Wan2.2 T2V high (file name: wan2.2_t2v_high_noise_14B_fp16.safetensors)
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models
Place in folder: models\diffusion_models
2. Wan2.2 S2V (file name: wan2.2_s2v_14B_bf16.safetensors)
https://hf-mirror.com/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models
Place in folder: models\diffusion_models
3. wav2vec2_large_english_fp16
https://hf-mirror.com/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/audio_encoders
Place in folder: models\audio_encoders
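
To double-check that the three files ended up in the right folders, a small script along these lines may help. It is only a sketch: the file names (including the .safetensors suffix on the audio encoder) are my reading of the list above and should be checked against the repository listing, the "resolve/main" download URL form is the usual Hugging Face pattern rather than something stated in this guide, and download_url and missing_models are hypothetical helper names.

```python
from pathlib import Path

# Hugging Face repository taken from the links in the list above.
REPO = "Comfy-Org/Wan_2.2_ComfyUI_Repackaged"

# (file name, subfolder). The subfolder name is the same under split_files/ in
# the repository and under models/ in the local install.
# ASSUMPTION: file names are read from the list above; verify them against the
# repository listing before relying on them.
MODELS = [
    ("wan2.2_t2v_high_noise_14B_fp16.safetensors", "diffusion_models"),
    ("wan2.2_s2v_14B_bf16.safetensors", "diffusion_models"),
    ("wav2vec2_large_english_fp16.safetensors", "audio_encoders"),
]


def download_url(filename, subfolder, host="https://huggingface.co"):
    """Build a direct download URL; pass host="https://hf-mirror.com" for the mirror."""
    return f"{host}/{REPO}/resolve/main/split_files/{subfolder}/{filename}"


def missing_models(comfy_root):
    """Return the expected model paths that do not exist under comfy_root/models."""
    models_dir = Path(comfy_root) / "models"
    expected = [models_dir / subfolder / name for name, subfolder in MODELS]
    return [path for path in expected if not path.is_file()]
```

Calling missing_models with the path of the local install returns the files that still need to be downloaded; an empty list means all three are in place. download_url can then be used to print a link for each missing file.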


Notes:
If WanSoundImageToVideo reports an error, update the plugin version.
If AudioSeparation reports an error, delete it and reinstall.