

Janus Pro is a unified multimodal large language model (MLLM) for both understanding and generation. It decouples visual encoding, using a separate visual encoder for each task. Janus Pro is built on DeepSeek LLM 1.5b base or DeepSeek LLM 7b base.
For multimodal understanding, it uses SigLIP L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus Pro uses the tokenizer from here, with a downsample rate of 16.
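
To make the downsample rate concrete, here is a minimal sketch of the arithmetic. It assumes the generation tokenizer works on a square 384 x 384 image and emits one discrete token per 16 x 16 patch. The text above gives 384 x 384 only for the understanding input, so the generation resolution is an assumption here. The helper name `image_token_grid` is hypothetical and not part of any Janus Pro API.

```python
def image_token_grid(height: int, width: int, downsample: int = 16) -> tuple[int, int, int]:
    """Return (rows, cols, total_tokens) for a tokenizer that reduces each
    spatial dimension by `downsample`. Hypothetical helper for illustration."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsample rate")
    rows, cols = height // downsample, width // downsample
    return rows, cols, rows * cols


# Assumed generation resolution: 384 x 384 (not stated in the text for generation).
rows, cols, total = image_token_grid(384, 384)
print(rows, cols, total)  # 24 24 576
```

Under these assumptions, one image corresponds to a 24 x 24 grid, or 576 tokens for the language model to predict.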