Producing a 15-second music video (MV) requires only one character reference image, a song longer than 15 seconds, and three MV descriptions. The workflow uses the InfiniteTalk and HuMo models.

Local deployment requires the following models to be installed:

1. HuMo (Kijai quantized version)

https://huggingface.co/Kijai/WanVideo_comfy/tree/main/HuMo

Folder placement:

whisper_large_v3_encoder_fp16.safetensors

models/audio_encoders

Wan2_1 HuMo 14B_fp16.safetensors

models/diffusion_models

2. InfiniteTalk

https://huggingface.co/MeiGen-AI/InfiniteTalk/tree/main/comfyui

Placement location: models/diffusion_models
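
The placement above can be sanity-checked with a short script before loading the workflow. The following is a minimal sketch, not part of the workflow itself: the function name find_missing and the EXPECTED_FILES table are hypothetical, the folder names assume a standard ComfyUI directory with a models subfolder, and the file names are copied exactly as listed above, so they may need adjusting to match the downloaded files. No InfiniteTalk file name is given above, so it is left out of the table and can be added by hand.

```python
from pathlib import Path

# Hypothetical table of expected files, keyed by folder relative to the
# ComfyUI root. File names are copied as written in the list above; edit
# them if your downloaded files are named differently, and add the
# InfiniteTalk file you downloaded under "models/diffusion_models".
EXPECTED_FILES = {
    "models/audio_encoders": ["whisper_large_v3_encoder_fp16.safetensors"],
    "models/diffusion_models": ["Wan2_1 HuMo 14B_fp16.safetensors"],
}


def find_missing(comfyui_root, expected=EXPECTED_FILES):
    """Return relative paths of expected model files that are not present."""
    root = Path(comfyui_root)
    missing = []
    for folder, names in expected.items():
        for name in names:
            # is_file() is False for missing paths and for directories.
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing
```

Calling find_missing with the path to the ComfyUI folder returns an empty list when every listed file is in place; any entries returned are the files still to be downloaded or moved.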