ltx2_s2v

Consistent Characters

Image-to-Video

Quirky Creations

This ComfyUI workflow takes an audio clip (such as a song or speech) and a reference image as direct inputs and generates a high-quality video that is lip-synced to the audio.

The technique needs only a reference image and a sound file to generate a video that is lip-synced to any audio track. I have included two versions of the workflow and focus mainly on the low-VRAM version, which uses GGUF models and, surprisingly, delivered better quality in my tests.
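For readers who want to drive the workflow from a script rather than the ComfyUI interface, the two inputs described above can be filled in programmatically. The following is a minimal sketch, not the author's method. It assumes the workflow has been exported in ComfyUI's API format (a JSON object mapping node ids to `class_type` and `inputs`), that a ComfyUI server is listening on the default local address, and that `LoadImage` and `LoadAudio` take their file names through inputs named `image` and `audio`. The file names and node ids in the demo are hypothetical.

```python
import json
import urllib.request


def set_media_inputs(workflow, image_name, audio_name):
    """Return a copy of an API-format workflow with the reference image
    and audio file names filled in. The original dict is left untouched."""
    patched = json.loads(json.dumps(workflow))  # simple deep copy
    for node in patched.values():
        if node.get("class_type") == "LoadImage":
            node["inputs"]["image"] = image_name  # assumed input name
        elif node.get("class_type") == "LoadAudio":
            node["inputs"]["audio"] = audio_name  # assumed input name
    return patched


def queue_workflow(workflow, server="http://127.0.0.1:8188"):
    """Submit the workflow to a running ComfyUI server via POST /prompt.
    The server address is an assumption (ComfyUI's usual local default)."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Toy stand-in for an exported workflow; node ids "1" and "2" are made up.
toy_workflow = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "placeholder.png"}},
    "2": {"class_type": "LoadAudio", "inputs": {"audio": "placeholder.wav"}},
}
patched_workflow = set_media_inputs(toy_workflow, "portrait.png", "speech.wav")
```

With a server running, `queue_workflow(patched_workflow)` would submit the job. Both files are expected to already be in ComfyUI's input folder.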

有趣的80后程序员

2026-01-15 Update

Nodes Information

Primitive Nodes (7)

CLIPTextEncode

DualCLIPLoader

LoadImage

SetLatentNoiseMask

UNETLoader

VAEDecode

VAELoader

Custom Nodes (26)

CFGGuider

EmptyLTXVLatentVideo

GetImageSize

GetNode

INTConstant

ImageResizeKJv2

KSamplerSelect

LTXVAudioVAEDecode

LTXVAudioVAEEncode

LTXVConcatAVLatent

LTXVConditioning

LTXVImgToVideoInplace

LTXVPreprocess

LTXVScheduler

LTXVSeparateAVLatent

LoadAudio

MelBandRoFormerModelLoader

MelBandRoFormerSampler

PreviewAudio

RandomNoise

SamplerCustomAdvanced

SetNode

SolidMask

TrimAudioDuration

VAELoaderKJ

VHS_VideoCombine
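Before loading the workflow, it can help to confirm that every node listed above is actually installed. The sketch below compares the list against the node classes a running ComfyUI server reports through its `/object_info` endpoint. The server address is an assumption, and helper nodes such as `SetNode` and `GetNode` may be frontend-only, in which case the server will not report them and they will appear as missing even when the node pack is installed.

```python
import json
import urllib.request

# Node class names copied from the list above (7 primitive + 26 custom).
REQUIRED_NODES = [
    "CLIPTextEncode", "DualCLIPLoader", "LoadImage", "SetLatentNoiseMask",
    "UNETLoader", "VAEDecode", "VAELoader",
    "CFGGuider", "EmptyLTXVLatentVideo", "GetImageSize", "GetNode",
    "INTConstant", "ImageResizeKJv2", "KSamplerSelect", "LTXVAudioVAEDecode",
    "LTXVAudioVAEEncode", "LTXVConcatAVLatent", "LTXVConditioning",
    "LTXVImgToVideoInplace", "LTXVPreprocess", "LTXVScheduler",
    "LTXVSeparateAVLatent", "LoadAudio", "MelBandRoFormerModelLoader",
    "MelBandRoFormerSampler", "PreviewAudio", "RandomNoise",
    "SamplerCustomAdvanced", "SetNode", "SolidMask", "TrimAudioDuration",
    "VAELoaderKJ", "VHS_VideoCombine",
]


def missing_nodes(required, available):
    """Return the required node names that are absent from `available`, sorted."""
    available = set(available)
    return sorted(name for name in required if name not in available)


def fetch_available_nodes(server="http://127.0.0.1:8188"):
    """Ask a running ComfyUI server which node classes it has registered.
    The default address is an assumption about a local install."""
    with urllib.request.urlopen(f"{server}/object_info") as response:
        return list(json.load(response).keys())
```

With a server running, `missing_nodes(REQUIRED_NODES, fetch_available_nodes())` returns the names still to be installed, subject to the frontend-only caveat above.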
