lynx_lite_resampler_fp32.safetensors

WAN

Lynx

Towards High-Fidelity Personalized Video Generation

We present Lynx, a high-fidelity model for personalized video synthesis from a single input image. Built on an open-source Diffusion Transformer (DiT) foundation model, Lynx introduces two lightweight adapters to ensure identity fidelity. The ID-adapter employs a Perceiver Resampler to convert ArcFace-derived facial embeddings into compact identity tokens for conditioning, while the Ref-adapter integrates dense VAE features from a frozen reference pathway, injecting fine-grained details across all transformer layers through cross-attention. These modules collectively enable robust identity preservation while maintaining temporal coherence and visual realism. Through evaluation on a curated benchmark of 40 subjects and 20 unbiased prompts, which yielded 800 test cases, Lynx has demonstrated superior face resemblance, competitive prompt following, and strong video quality, thereby advancing the state of personalized video generation.
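The ID-adapter idea described above, a small set of learned latent queries that cross-attend over face-embedding features to produce a fixed number of identity tokens, can be illustrated with a toy sketch. This is not the Lynx implementation: the sizes, the random weights, the single attention step, and the assumption that the ArcFace embedding has already been projected into a few input tokens are all hypothetical, and the layers a full Perceiver Resampler normally stacks (multiple blocks, feed-forward layers, normalization) are omitted.

```python
import math
import random

# Toy sizes for illustration only; none of these values come from Lynx.
NUM_INPUT_TOKENS = 4   # assumed: face embedding already projected into 4 tokens
INPUT_DIM = 8          # assumed width of each input token
NUM_LATENTS = 2        # number of compact identity tokens to produce
LATENT_DIM = 6         # assumed width of each identity token


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.5):
    """Stand-in for learned parameters: a Gaussian random matrix."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def resample(inputs, latents, w_q, w_k, w_v):
    """One cross-attention step: latent queries attend over the input tokens."""
    queries = matmul(latents, w_q)   # (NUM_LATENTS, LATENT_DIM)
    keys = matmul(inputs, w_k)       # (NUM_INPUT_TOKENS, LATENT_DIM)
    values = matmul(inputs, w_v)     # (NUM_INPUT_TOKENS, LATENT_DIM)
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = [[scale * sum(q * k for q, k in zip(q_row, k_row)) for k_row in keys]
              for q_row in queries]
    weights = [softmax(row) for row in scores]
    attended = matmul(weights, values)
    # Residual connection: each latent is updated by what it attended to.
    tokens = [[l + a for l, a in zip(l_row, a_row)]
              for l_row, a_row in zip(latents, attended)]
    return tokens, weights


def demo(seed=0):
    """Run the sketch on random data and return (identity tokens, attention weights)."""
    rng = random.Random(seed)
    inputs = random_matrix(NUM_INPUT_TOKENS, INPUT_DIM, rng)
    latents = random_matrix(NUM_LATENTS, LATENT_DIM, rng)
    w_q = random_matrix(LATENT_DIM, LATENT_DIM, rng)
    w_k = random_matrix(INPUT_DIM, LATENT_DIM, rng)
    w_v = random_matrix(INPUT_DIM, LATENT_DIM, rng)
    return resample(inputs, latents, w_q, w_k, w_v)


if __name__ == "__main__":
    tokens, weights = demo()
    print(len(tokens), "identity tokens of width", len(tokens[0]))
```

The point of the design is that the output count is set by the number of latents, not by the number of input tokens, so the conditioning passed to the video model stays compact regardless of how the face features are laid out.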

This model was transferred from an external source (transfer address: 1.0). If the original author objects to this transfer, they can file an appeal. Within 24 hours, we will edit, delete, or transfer the model to the original author, according to the author's request.

Pinto 平托

Model Information

Frozen
Original author: byteaigc
Model Type: LoRA
Base Model: WAN2.1
Resource Name: models/loras/lynx_lite_resampler_fp32.safetensors
MD5: 17e1b89c2b83595c4ac5ac3f6392f4b0
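
The MD5 listed above can be used to check a downloaded copy of the file. The following is a minimal sketch using only the Python standard library; the helper names `file_md5` and `matches_listing` are illustrative, and the local path simply reuses the resource name from this listing, which may not match where the file sits on a given machine.

```python
import hashlib
from pathlib import Path

# MD5 value as shown in the Model Information block of this listing.
EXPECTED_MD5 = "17e1b89c2b83595c4ac5ac3f6392f4b0"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected digest (case-insensitive)."""
    return file_md5(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location; adjust to wherever the file was actually saved.
    local = Path("models/loras/lynx_lite_resampler_fp32.safetensors")
    if local.exists():
        print("MD5 matches listing:", matches_listing(local))
    else:
        print("file not found:", local)
```

A matching digest only shows that the download is byte-identical to the file the listing describes; MD5 is not collision-resistant, so it is not a guarantee of authenticity.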
