NetaYume Lumina v3.0


Tags: Manga, Illustration, 2D, Girl


I. Introduction

NetaYume Lumina is a high-quality anime-style text-to-image model developed by Neta.art Laboratory, fine-tuned from Neta Lumina. Neta Lumina is in turn built upon Lumina Image 2.0, the open-source foundation model released by the Shanghai Artificial Intelligence Laboratory's Alpha VLLM team.

Main features:

  • High-quality anime generation: Generates detailed anime-style images with clear outlines, vibrant colors, and smooth shading.

  • Improved character understanding: Better captures characters, especially those from the Danbooru dataset, resulting in more coherent and accurate character representations.

  • Enhanced fine details: Accurately generates accessories, clothing textures, hairstyles, and background elements with greater clarity.

II. Information

For version 1.0:

  • This model is fine-tuned from Neta Lumina (version neta lumina beta 0624 raw) on a custom dataset of approximately 10 million images. Training ran for 3 weeks on 8× NVIDIA B200 GPUs.

For version 2.0:

This version has two variations:

Version 2.0:

  • I switched the base model to Neta Lumina v1 and trained this model on my custom dataset, which includes images from e621 and Danbooru. This dataset contains annotations in multiple languages: 30% of the images are annotated in Japanese, 30% in Chinese (50% with Danbooru-style tags and 50% with natural language annotations), and the remaining 40% with natural English descriptions.

  • For annotations, I used ChatGPT and other models to refine and improve tag quality. Additionally, I modified the training code to support multi-scale training rather than a fixed 1024 resolution, dynamically adjusting image sizes between 768 and 1536 during training.

  • Note: So far I have evaluated the model only with benchmark tests, so its full capabilities are yet to be determined. In my preliminary tests, however, it performs quite well when generating images at a resolution of 1312x2048 (as shown in the provided sample images).

  • Moreover, in my testing this version can generate images at sizes up to 2048x2048.
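The multi-scale training described above is commonly implemented with aspect-ratio buckets: each image is resized to the nearest allowed resolution instead of a single fixed size. The model card does not publish its bucket list, so the sketch below is hypothetical; the 64-pixel alignment and the ~1024x1024 pixel budget are assumptions, not details from the card. Only the 768-1536 side range comes from the text.

```python
# Hypothetical aspect-ratio bucketing for multi-scale training.
# Assumptions (not from the model card): sides aligned to 64 px,
# pixel budget of roughly 1024*1024 per sample.

def make_buckets(min_side=768, max_side=1536, step=64, budget=1024 * 1024):
    """Enumerate (width, height) pairs with sides in [min_side, max_side],
    aligned to `step`, whose area stays within ~10% of the pixel budget."""
    buckets = []
    for w in range(min_side, max_side + 1, step):
        for h in range(min_side, max_side + 1, step):
            if w * h <= budget * 1.1:  # allow ~10% slack over the budget
                buckets.append((w, h))
    return buckets

def nearest_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio best matches the source image."""
    ar = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ar))
```

During training, each image would be resized (and lightly cropped) to its nearest bucket, so batches mix portrait, landscape, and square resolutions instead of forcing everything to 1024x1024.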

老李猛猛画


Model Information

Status: Active
Model Type: Checkpoint
Base Model: SDXL 1.0
Resource Name: models/checkpoints/netayumeLuminaNetaLumina_v30.safetensors
MD5: 2a13446d59dcf3e7a743fca89d6f7e37
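To verify that a downloaded checkpoint matches the MD5 listed above, the hash can be recomputed locally. A minimal sketch using Python's standard library (the file path shown is the resource name from the listing):

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so multi-GB checkpoints
    are hashed without loading them fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the listed checksum, e.g.:
# md5_of_file("models/checkpoints/netayumeLuminaNetaLumina_v30.safetensors") \
#     == "2a13446d59dcf3e7a743fca89d6f7e37"
```

A mismatch usually indicates a truncated or corrupted download rather than a different model version.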
