Janus Pro is a unified multimodal large language model (MLLM) for understanding and generation. It decouples visual encoding between the two tasks, so a single model handles both multimodal understanding and image generation. Janus Pro is built on DeepSeek LLM 1.5b base or DeepSeek LLM 7b base.

For multimodal understanding, it uses SigLIP L as the vision encoder, supporting 384 x 384 image input. For image generation, Janus Pro uses the tokenizer from here, with a downsample rate of 16.
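
The downsample rate fixes how many discrete tokens represent one image. The sketch below works through that arithmetic. It assumes the generation tokenizer operates on a square image at the same 384 x 384 resolution quoted for the understanding path, which this description does not state explicitly. The function name `image_token_grid` is hypothetical and is not part of any Janus Pro API.

```python
def image_token_grid(image_size: int, downsample_rate: int) -> tuple[int, int]:
    """Return (tokens_per_side, total_tokens) for a square image.

    Assumption: the tokenizer maps each downsample_rate x downsample_rate
    pixel patch to one token, so the token grid is image_size / downsample_rate
    on each side.
    """
    if image_size <= 0 or downsample_rate <= 0:
        raise ValueError("image_size and downsample_rate must be positive")
    if image_size % downsample_rate != 0:
        raise ValueError("image_size must be divisible by downsample_rate")
    tokens_per_side = image_size // downsample_rate
    return tokens_per_side, tokens_per_side * tokens_per_side


# Assumed 384 x 384 image with the stated downsample rate of 16.
side, total = image_token_grid(384, 16)
print(f"{side} x {side} grid, {total} tokens per image")
```

Under these assumptions, a 384 x 384 image becomes a 24 x 24 grid, that is 576 tokens per image.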
