Note: inference can easily exhaust GPU video memory (VRAM).

Tora is a trajectory-oriented DiT framework that jointly integrates text, vision, and trajectory conditions to generate videos. Specifically, Tora consists of a trajectory extractor (TE), a spatiotemporal DiT, and a motion-guided fusion module (MGF). The TE encodes arbitrary trajectories into hierarchical spatiotemporal motion blocks using a 3D video-compression network; the MGF injects these motion blocks into the DiT blocks to generate consistent videos that follow the trajectory. The design fits seamlessly with the scalability of DiT, allowing precise control over the dynamics of video content across different durations, aspect ratios, and resolutions. Extensive experiments demonstrate that Tora achieves high motion fidelity while closely simulating the motion of the physical world.
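To make the pipeline concrete, here is a minimal NumPy sketch of the idea: a trajectory is rasterized into a spatiotemporal volume, downsampled into a coarser "motion block" (standing in for the 3D compression network), and then added into DiT hidden states (standing in for the fusion step). All function names, shapes, and the additive fusion are illustrative assumptions, not Tora's actual implementation.

```python
import numpy as np

def trajectory_to_motion_volume(points, frames=8, height=16, width=16):
    # Rasterize a 2D trajectory (one (x, y) in [0, 1] per frame) into
    # a sparse spatiotemporal volume - a toy stand-in for the input
    # that a trajectory extractor would consume.
    vol = np.zeros((frames, height, width), dtype=np.float32)
    for t, (x, y) in enumerate(points[:frames]):
        vol[t, int(y * (height - 1)), int(x * (width - 1))] = 1.0
    return vol

def compress_3d(vol, factor=2):
    # Stand-in for the 3D video-compression network: average-pool
    # jointly over time, height, and width to get a coarser block.
    f, h, w = vol.shape
    return vol.reshape(f // factor, factor, h // factor, factor,
                       w // factor, factor).mean(axis=(1, 3, 5))

def motion_guided_fusion(dit_hidden, motion_block, scale=1.0):
    # Toy fusion: add the motion signal to the DiT hidden states.
    # (The real MGF conditions DiT blocks in a learned way.)
    return dit_hidden + scale * motion_block

traj = [(t / 7.0, t / 7.0) for t in range(8)]   # diagonal path
vol = trajectory_to_motion_volume(traj)          # fine level: (8, 16, 16)
coarse = compress_3d(vol)                        # coarse level: (4, 8, 8)
hidden = np.zeros_like(coarse)                   # pretend DiT activations
fused = motion_guided_fusion(hidden, coarse)
print(vol.shape, coarse.shape, fused.shape)      # (8, 16, 16) (4, 8, 8) (4, 8, 8)
```

The hierarchy here is just two levels (fine and coarse); the real system learns multi-scale motion features end to end rather than pooling.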

https://github.com/alibaba/Tora?tab=readme-ov-file#-inference