FramePack is a next-frame (partial next-frame section) prediction neural network architecture that generates videos progressively.
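
For intuition, here is a minimal sketch of the progressive generation loop; the function and model interface are illustrative placeholders, not FramePack's actual API.

```python
# Illustrative sketch only: `model` is assumed to map a stack of past frames
# to a short section of future frames. Names and shapes are hypothetical.
import torch

def generate_video(model, first_frame, num_sections, frames_per_section=9):
    """Generate a video by repeatedly predicting the next frame section,
    conditioning each step on all frames produced so far."""
    history = [first_frame]                       # list of (C, H, W) tensors
    for _ in range(num_sections):
        context = torch.stack(history, dim=0)     # (T, C, H, W) frames so far
        next_section = model(context)             # (frames_per_section, C, H, W)
        history.extend(next_section.unbind(0))
    return torch.stack(history, dim=0)            # full video, oldest frame first
```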

FramePack compresses the input context to a fixed length, so that the generation workload is invariant to the video length.
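
The idea can be sketched as packing older frames into progressively fewer tokens, so the total context stays bounded no matter how long the video gets. The pooling scheme and factor-of-2 schedule below are placeholders for demonstration, not FramePack's actual compression.

```python
# Illustrative sketch only: average pooling stands in for the real compression,
# and the per-age compression factors are assumptions for demonstration.
import torch
import torch.nn.functional as F

def pack_context(frames):
    """frames: list of (C, H, W) tensors, oldest first.
    Returns a (1, tokens, C) context whose length stays bounded regardless of
    how many frames are passed in."""
    packed = []
    for age, frame in enumerate(reversed(frames)):      # age 0 = most recent
        factor = 2 ** age                               # older -> coarser grid
        if factor > min(frame.shape[-2:]):              # too old to keep: drop
            break
        pooled = F.avg_pool2d(frame.unsqueeze(0), kernel_size=factor)
        packed.append(pooled.flatten(2).transpose(1, 2))    # (1, h*w / 4**age, C)
    # Per-frame token counts shrink geometrically (1, 1/4, 1/16, ...), so the
    # total stays below ~4/3 of a single uncompressed frame's token count.
    return torch.cat(packed, dim=1)
```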

FramePack can process a very large number of frames with a 13B model even on a laptop GPU.

FramePack can be trained with a much larger batch size than typical video diffusion models, similar to the batch sizes used in image diffusion training.

Video diffusion, but it feels like image diffusion.