NONE.1-sqvd-int4

This model is a full fine-tune of the Flux.1 dev model.
It is compressed with SVDQuant (svdq) 4-bit quantization, developed by Song Han's lab at MIT.
Its main purpose is to address the weak realism, the lack of a photographic lens feel, and the low detail resolution of the original Flux model.
At the same time, training aims to keep the overall model weights from drifting too far, avoiding overfitting in some weights and catastrophic forgetting of content outside the training set.
Note! This model targets straight-out-of-camera output. Most of the training set consists of unedited original images, which leaves more room for post-processing. It does not have the heavily retouched look of common models, and no specific training was done on facial data, so whether faces come out attractive is left to chance.
Training is divided into three stages:
Stage 1: Gradually unfreeze the attention q and k layers of the transformer blocks while keeping most of the model frozen. This fine-tunes how well the prompt's token sequence is aligned and adjusts how the model responds to the annotation (caption) logic.
Stage 2: Freeze the other layers and train the ff layers of the transformer blocks on image fitting to improve visual quality and composition.
Stage 3: Freeze the other layers and gradually unfreeze the single transformer blocks, testing by unfreezing 7 layers at a time until transformer blocks 15-18 are unfrozen. Subsequent tests involve pruning the proj_mlp and proj_out layers of the single transformer blocks to fine-tune the final visual result.
The merged layers are in BFL format, adapted for ComfyUI. Reminder: generation in webUI has not been tested, so use it with caution.
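The staged freezing described above can be sketched as a name filter over the model's parameters. This is only an illustration, not the author's training code: the regular expressions assume diffusers-style Flux parameter names (transformer_blocks.N.attn.to_q, single_transformer_blocks.N.proj_mlp, and so on), the names STAGE_PATTERNS and trainable_names are hypothetical, and Stage 3 is simplified to its final proj_mlp / proj_out step.

```python
import re

# Hypothetical patterns, assuming diffusers-style Flux parameter names.
# Each stage lists the only parameters that would stay trainable.
STAGE_PATTERNS = {
    1: r"^transformer_blocks\.\d+\.attn\.to_[qk]\.",                   # attention q and k
    2: r"^transformer_blocks\.\d+\.ff\.",                              # feed-forward (ff)
    3: r"^single_transformer_blocks\.\d+\.(proj_mlp|proj_out)\.",      # simplified Stage 3
}


def trainable_names(param_names, stage):
    """Return the parameter names left unfrozen in the given stage."""
    pattern = re.compile(STAGE_PATTERNS[stage])
    return [name for name in param_names if pattern.search(name)]


# A handful of made-up names in the assumed naming scheme.
example_names = [
    "transformer_blocks.0.attn.to_q.weight",
    "transformer_blocks.0.attn.to_k.weight",
    "transformer_blocks.0.attn.to_v.weight",
    "transformer_blocks.0.ff.net.0.proj.weight",
    "single_transformer_blocks.3.proj_mlp.weight",
    "single_transformer_blocks.3.proj_out.bias",
    "single_transformer_blocks.3.attn.to_q.weight",
]

for stage in (1, 2, 3):
    print(stage, trainable_names(example_names, stage))
```

With a PyTorch model, the same selection could be applied by looping over model.named_parameters() and setting requires_grad to True only for the selected names.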
Recommended parameters:
ComfyUI
Sampler: DPMpp 2s a
Sigmas: beta
Shift: 1.1/0.6
FluxGuidance: 3
Steps: 15
Recommendation: Use ControlNet (CN) for upscaling; the example images include workflows.
webUI (not recommended)
Sampler: DPMpp 2m
Sigmas: beta
CFG: 1
Steps: 20
Normal upscaling is sufficient.
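For scripted use, the recommended parameters above can be kept in one place. The sketch below is just a lookup table: the dictionary keys (sampler, sigmas, shift, flux_guidance, cfg, steps) and the helper settings_for are hypothetical labels, not actual ComfyUI or webUI field names, and the Shift value is stored as the original string because the listing does not say how the two numbers are used.

```python
# Recommended settings copied from the listing above.
# Key names are illustrative only, not real ComfyUI/webUI field names.
RECOMMENDED = {
    "ComfyUI": {
        "sampler": "DPMpp 2s a",
        "sigmas": "beta",
        "shift": "1.1/0.6",   # kept verbatim; meaning of the two values is not specified
        "flux_guidance": 3,
        "steps": 15,
    },
    "webUI": {                # not recommended by the model author
        "sampler": "DPMpp 2m",
        "sigmas": "beta",
        "cfg": 1,
        "steps": 20,
    },
}


def settings_for(frontend):
    """Return a copy of the recommended settings for 'ComfyUI' or 'webUI'."""
    return dict(RECOMMENDED[frontend])


print(settings_for("ComfyUI"))
```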
Recommended prompt structure:
1. Main subject
2. Appearance features and attire
3. Expression and posture
4. The relationship between background elements and the subject
5. Geometric and abstract shapes, colors, and relative positions
6. Light and shadow, color tone, and the resulting main atmosphere
Combine these into a natural-language description. You can use a large language model to help write it.
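As a rough illustration of the ordering above, the helper below joins six optional pieces of text in the recommended order. The key names in PROMPT_ORDER and the function build_prompt are hypothetical, and simply concatenating sentences is a stand-in for the natural-language rewriting the recommendation describes.

```python
# Hypothetical keys, one per item of the recommended prompt order.
PROMPT_ORDER = [
    "subject",               # 1. main subject
    "appearance",            # 2. appearance features and attire
    "expression_posture",    # 3. expression and posture
    "background",            # 4. background elements and their relation to the subject
    "shapes_colors",         # 5. geometric/abstract shapes, colors, relative positions
    "lighting_atmosphere",   # 6. light and shadow, color tone, atmosphere
]


def build_prompt(parts):
    """Join the supplied parts in the recommended order, skipping empty ones."""
    sentences = []
    for key in PROMPT_ORDER:
        text = parts.get(key, "").strip()
        if text:
            # Normalize to exactly one trailing period per part.
            sentences.append(text.rstrip(".") + ".")
    return " ".join(sentences)


print(build_prompt({
    "lighting_atmosphere": "Soft window light with a muted, cool tone",
    "subject": "A woman reading at a kitchen table",
    "appearance": "She wears a gray wool sweater",
}))
```

The parts can be supplied in any order; the output always follows the six-step sequence.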