Wan2.1 High-quality image-to-video

Text-to-Video

Videos

Image-to-Video

wangyi AI Studio

2025-05-02 Update

Workflow introduction

Wan2.1 is the next-generation video generation model developed by Alibaba's Tongyi Wanxiang team.

• Chinese and English text in video: Wan2.1 is described as the first video model able to render both Chinese and English text inside generated videos, including cinematic-quality titles and text animations. It handles special-effect fonts, poster fonts, and text that appears in real-world scenes.
• Multiple video tasks: Supports text-to-video and image-to-video generation, as well as video editing, video-to-audio, and other tasks.
• High-quality output: Wan2.1 combines a Variational Autoencoder (VAE) with a Diffusion Transformer (DiT), which improves temporal modeling and scene understanding. Through multimodal fusion, it can generate high-definition video, dynamic subtitles, and multilingual dubbing together, with support for 1080p resolution and efficient encoding and decoding. In January 2025, Wan2.1 topped the VBench leaderboard, ahead of Sora, HunyuanVideo, Minimax, Luma, Gen3, Pika, and other video generation models from China and abroad. It has also outperformed existing open-source models and leading commercial solutions on multiple benchmarks.

Nodes Information

Primitive Nodes (1)

LoadImage

Custom Nodes (13)

LoadWanVideoClipTextEncoder

LoadWanVideoT5TextEncoder

Note

Note Plus (mtb)

VHS_VideoCombine

WanVideoBlockSwap

WanVideoDecode

WanVideoImageClipEncode

WanVideoModelLoader

WanVideoSampler

WanVideoTextEncode

WanVideoVAELoader

easy cleanGpuUsed
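
The list above names the nodes but not how they connect. The following is a minimal Python sketch of one plausible wiring for an image-to-video graph, ordered with the standard-library `graphlib` module. The node names come from the list; every edge in `ASSUMED_INPUTS` is an assumption made for illustration, not something read from the workflow file. The two note nodes and `easy cleanGpuUsed` are left out because their position in the graph cannot be inferred from the list.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring: each key maps to the nodes whose outputs it consumes.
# Node names are taken from the list above; the edges are ASSUMED.
ASSUMED_INPUTS = {
    "LoadImage": [],
    "WanVideoBlockSwap": [],
    "WanVideoVAELoader": [],
    "LoadWanVideoT5TextEncoder": [],
    "LoadWanVideoClipTextEncoder": [],
    "WanVideoModelLoader": ["WanVideoBlockSwap"],
    "WanVideoTextEncode": ["LoadWanVideoT5TextEncoder"],
    "WanVideoImageClipEncode": [
        "LoadImage",
        "LoadWanVideoClipTextEncoder",
        "WanVideoVAELoader",
    ],
    "WanVideoSampler": [
        "WanVideoModelLoader",
        "WanVideoTextEncode",
        "WanVideoImageClipEncode",
    ],
    "WanVideoDecode": ["WanVideoVAELoader", "WanVideoSampler"],
    "VHS_VideoCombine": ["WanVideoDecode"],
}


def execution_order(inputs):
    """Return node names so that every node appears after its inputs."""
    return list(TopologicalSorter(inputs).static_order())


if __name__ == "__main__":
    for step, node in enumerate(execution_order(ASSUMED_INPUTS), start=1):
        print(f"{step:2d}. {node}")
```

Under these assumed edges, the loaders run first, the sampler runs once the model, text embedding, and image embedding are ready, and `VHS_VideoCombine` always comes last because every other node feeds into it directly or indirectly.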
