VoiceGate is a cross-language video dubbing engine built on VoxCPM2 and ComfyUI. VoxCPM2 supports 30 languages (including eight Southeast Asian languages) and 9 Chinese dialects (Cantonese, Sichuanese, Wu, Northeastern, Minnan, and others), and offers voice cloning and timbre design. Through its in-house VoiceBridge plugin, the engine aligns the TTS audio with the SRT subtitle timestamps at the frame level, keeping the dubbing synchronized with the visuals.
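To make the idea of aligning audio to SRT timestamps concrete, here is a minimal sketch of reading subtitle cues and converting their times to frame numbers. It is not the VoiceBridge implementation; the names `Cue`, `parse_srt`, and `ms_to_frame` are hypothetical, and the frame rate is assumed to be known from the input video.

```python
import re
from dataclasses import dataclass

# SRT timestamps look like 00:01:02,345 (some tools write a dot instead of a comma).
_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(stamp: str) -> int:
    """Convert one SRT timestamp to milliseconds."""
    m = _TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip malformed blocks
        start, _, end = lines[1].partition("-->")
        cues.append(Cue(int(lines[0]), _to_ms(start), _to_ms(end), "\n".join(lines[2:])))
    return cues


def ms_to_frame(ms: int, fps: float) -> int:
    """Map a time in milliseconds to the nearest video frame index."""
    return round(ms * fps / 1000)
```

Each cue then defines a time slot, `end_ms - start_ms`, that the synthesized audio for that line has to fit into.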

The complete pipeline covers ASR subtitle extraction, LLM translation, multilingual TTS, and audio alignment and merging. It is orchestrated as a visual node graph and is ready to use out of the box.



Input: Video and target language

Output: A video in the target language, dubbed in a voice cloned from the timbre of the input video, together with the corresponding subtitles. The output audio is aligned with the input video at the subtitle level.
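One common way to achieve subtitle-level alignment is to fit each synthesized clip into the time slot of its subtitle. The sketch below illustrates that idea only; it is an assumption, not the project's documented method. `FitPlan`, `plan_fit`, and the `max_speed` cap of 1.5 are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class FitPlan:
    speed: float      # playback-rate multiplier for the TTS clip (>1 means faster)
    pad_ms: int       # silence to append so the clip fills the slot
    overflow_ms: int  # how far the clip still runs past the slot after speeding up


def plan_fit(clip_ms: int, slot_ms: int, max_speed: float = 1.5) -> FitPlan:
    """Decide how to fit a clip of clip_ms into a subtitle slot of slot_ms."""
    if clip_ms <= 0 or slot_ms <= 0:
        raise ValueError("durations must be positive")
    if clip_ms <= slot_ms:
        # Short clip: keep natural speed and pad the remainder with silence.
        return FitPlan(1.0, slot_ms - clip_ms, 0)
    # Long clip: speed up, but cap the rate (assumed limit to keep speech natural).
    speed = min(clip_ms / slot_ms, max_speed)
    overflow = max(0, round(clip_ms / speed) - slot_ms)
    return FitPlan(speed, 0, overflow)
```

For example, a 2000 ms clip in a 2500 ms slot keeps its speed and gets 500 ms of padding, while a 4000 ms clip in a 2000 ms slot is capped at 1.5x and still overflows. A real pipeline would need a policy for such overflow, such as borrowing from the following gap or shortening the translation.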

Copy the target language from the list below: Arabic, Burmese, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Khmer, Korean, Lao, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Vietnamese
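Because the target language is entered as free text, a small check against the list above can catch typos before a long job starts. This is a sketch only; `normalize_language` is a hypothetical helper and not part of VoiceGate.

```python
import difflib

# The 30 target languages listed above.
SUPPORTED_LANGUAGES = [
    "Arabic", "Burmese", "Chinese", "Danish", "Dutch", "English", "Finnish",
    "French", "German", "Greek", "Hebrew", "Hindi", "Indonesian", "Italian",
    "Japanese", "Khmer", "Korean", "Lao", "Malay", "Norwegian", "Polish",
    "Portuguese", "Russian", "Spanish", "Swahili", "Swedish", "Tagalog",
    "Thai", "Turkish", "Vietnamese",
]

_BY_LOWER = {name.lower(): name for name in SUPPORTED_LANGUAGES}


def normalize_language(name: str) -> str:
    """Return the canonical spelling of a supported language, or raise ValueError."""
    key = name.strip().lower()
    if key in _BY_LOWER:
        return _BY_LOWER[key]
    # Offer near matches so a typo such as "Vietnamse" gets a useful hint.
    hints = difflib.get_close_matches(key, list(_BY_LOWER), n=3)
    suggestion = ", ".join(_BY_LOWER[h] for h in hints) or "none"
    raise ValueError(f"unsupported language {name!r}; close matches: {suggestion}")
```

Matching ignores case and surrounding whitespace, so " thai " and "THAI" both resolve to "Thai".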



GitHub project:

https://github.com/YanTianlong01/VoiceGate