Wan: Open and Advanced Large-Scale Video Generation Model

In this repository, we present Wan2.1, a comprehensive and open video foundation model that pushes the boundaries of video generation. Wan2.1 offers the following key features:

  • 👍 SOTA Performance: Wan2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
  • 👍 Support for Consumer-Grade GPUs: The T2V 1.3B model requires only 8.19 GB of VRAM, making it compatible with almost all consumer-grade GPUs. On an RTX 4090, it can generate a 5-second 480P video in about 4 minutes without optimization techniques such as quantization. Its performance even rivals some closed-source models.
  • 👍 Multiple Tasks: Wan2.1 excels in text-to-video, image-to-video, video editing, text-to-image, and video-to-audio tasks, advancing the field of video generation.
  • 👍 Visual Text Generation: Wan2.1 is the first video model capable of generating text in both Chinese and English. Its powerful text generation capabilities enhance its practical applications.
  • 👍 Powerful Video VAE: Wan VAE offers exceptional efficiency and performance, enabling encoding and decoding of 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.
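
As a rough illustration of the consumer-GPU claim, the sketch below compares a card's memory against the 8.19 GB figure quoted for the T2V 1.3B model. It is not part of the Wan2.1 codebase: the helper names, the decimal-gigabyte convention, and the 16-bit weight precision used in the footprint estimate are all assumptions made for this example.

```python
# Hypothetical helpers for illustration only; not part of the Wan2.1 repository.

REQUIRED_VRAM_GB = 8.19  # figure quoted in this README for the T2V 1.3B model


def weights_footprint_gb(num_params: float, bytes_per_param: int) -> float:
    """Estimate the memory taken by model weights alone, in decimal GB.

    Assumption: every parameter is stored at the same precision
    (for example, 2 bytes per parameter for 16-bit weights).
    """
    return num_params * bytes_per_param / 1e9


def fits_in_vram(available_gb: float,
                 required_gb: float = REQUIRED_VRAM_GB,
                 headroom_gb: float = 0.0) -> bool:
    """Return True if the card has enough memory after reserving headroom.

    `headroom_gb` is memory set aside for the display, other processes, etc.
    """
    return available_gb - headroom_gb >= required_gb


if __name__ == "__main__":
    # Weights of a 1.3B-parameter model at an assumed 2 bytes per parameter.
    print(f"weights only: {weights_footprint_gb(1.3e9, 2):.2f} GB")
    # A 24 GB card (such as the RTX 4090 mentioned above) versus an 8 GB card.
    print("24 GB card fits:", fits_in_vram(24.0))
    print(" 8 GB card fits:", fits_in_vram(8.0))
```

Under these assumptions the weights account for only part of the quoted requirement; the remainder would be taken up by activations and the other components loaded at inference time, which this sketch does not attempt to model.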