


Powered By RTX 4090
wangyi AI Studio
2025-05-02 Update
Workflow introduction
Wan2.1 is the next-generation video generation model developed by Alibaba's Tongyi Wanxiang team, achieving significant breakthroughs in AI-driven visual content creation.
• Chinese and English video model: Wan2.1 is the first video model that can render both Chinese and English text inside generated videos, which makes it more practical for real-world use. It produces cinematic-quality text and animation and supports a range of font treatments, including special-effect fonts, poster fonts, and text displayed in real-world scenes.
• Multiple video tasks: Supports text-to-video and image-to-video generation, as well as video editing, video-to-audio, and other tasks.
• High-quality output: Wan2.1 is built on a hybrid Variational Autoencoder (VAE) and Diffusion Transformer (DiT) architecture that strengthens temporal modeling and scene understanding. Through multimodal fusion, it can generate high-definition video, dynamic subtitles, and multilingual dubbing together, with support for 1080p resolution and efficient encoding and decoding.

In January 2025, Wan2.1 topped the VBench leaderboard, ahead of Sora, HunyuanVideo, Minimax, Luma, Gen3, Pika, and other Chinese and international video generation models. It has consistently outperformed existing open-source models and state-of-the-art commercial solutions on multiple benchmarks.
Nodes Information (14 nodes)
LoadImage
LoadWanVideoClipTextEncoder
LoadWanVideoT5TextEncoder
Note
Note Plus (mtb)
VHS_VideoCombine
WanVideoBlockSwap
WanVideoDecode
WanVideoImageClipEncode
WanVideoModelLoader
WanVideoSampler
WanVideoTextEncode
WanVideoVAELoader
easy cleanGpuUsed
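
The node list above does not say how the nodes are connected. As a rough illustration only, the sketch below assumes a typical image-to-video wiring (loaders feed the encoders, the encoders feed WanVideoSampler, and the sampled latents are decoded and written out as a video) and uses Python's standard graphlib module to derive one valid execution order. The ASSUMED_INPUTS mapping and the execution_order helper are hypothetical and are not taken from the workflow file itself. In particular, the position of easy cleanGpuUsed at the end of the chain is a guess. The two Note nodes are left out because they only hold annotations.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring (an assumption, not read from the workflow file):
# each key is a node from the list above, each value is the set of nodes
# whose outputs it is assumed to consume.
ASSUMED_INPUTS = {
    "WanVideoBlockSwap": set(),
    "WanVideoVAELoader": set(),
    "LoadWanVideoT5TextEncoder": set(),
    "LoadWanVideoClipTextEncoder": set(),
    "LoadImage": set(),
    "WanVideoModelLoader": {"WanVideoBlockSwap"},
    "WanVideoTextEncode": {"LoadWanVideoT5TextEncoder"},
    "WanVideoImageClipEncode": {
        "LoadWanVideoClipTextEncoder",
        "LoadImage",
        "WanVideoVAELoader",
    },
    "WanVideoSampler": {
        "WanVideoModelLoader",
        "WanVideoTextEncode",
        "WanVideoImageClipEncode",
    },
    "WanVideoDecode": {"WanVideoVAELoader", "WanVideoSampler"},
    "VHS_VideoCombine": {"WanVideoDecode"},
    # Assumed to run last so memory is freed after the video is saved.
    "easy cleanGpuUsed": {"VHS_VideoCombine"},
}


def execution_order(inputs):
    """Return one valid run order in which every node follows its inputs."""
    return list(TopologicalSorter(inputs).static_order())


order = execution_order(ASSUMED_INPUTS)
for step, name in enumerate(order, start=1):
    print(f"{step:2d}. {name}")
```

Several orders are valid for the independent loader nodes, so the printed sequence may differ from the order ComfyUI actually chooses. The only guarantee is that each node appears after everything it depends on under the assumed wiring.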