README

SDXL-Turbo Model Card

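SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. A real-time demo is available here: http://clipdrop.co/stable-diffusion-turbo

Please note: for commercial use, please refer to https://stability.ai/license.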

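Model Details

Model Description

SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis. It is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps at high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal, and combines it with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.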
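
As a schematic reading of the description above, the training objective can be sketched as a weighted sum of the two terms it names. The notation (student prediction \hat{x}_\theta, noised input x_s at timestep s, weight \lambda, and the loss names) is an assumption chosen for illustration; the technical report gives the exact definitions.

```latex
\mathcal{L}_{\text{ADD}}
  = \mathcal{L}_{\text{adv}}\big(\hat{x}_\theta(x_s, s)\big)
  + \lambda \, \mathcal{L}_{\text{distill}}\big(\hat{x}_\theta(x_s, s)\big)
```

Here \mathcal{L}_{\text{adv}} stands for the adversarial loss that keeps few-step samples sharp, and \mathcal{L}_{\text{distill}} for the score distillation term that uses the pretrained diffusion model as the teacher signal.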

Developed by: Stability AI
Funded by: Stability AI
Model type: Generative text-to-image model
Finetuned from model: SDXL 1.0 Base

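Model Sources

For research purposes, we recommend our generative-models GitHub repository, which implements the most popular diffusion frameworks (both training and inference).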

Repository: https://github.com/Stability-AI/generative-models
Paper: https://stability.ai/research/adversarial-diffusion-distillation
Demo: http://clipdrop.co/stable-diffusion-turbo

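Evaluation

(Figures: comparison1, comparison2)

The charts above evaluate user preference for SDXL-Turbo over other single-step and multi-step models. SDXL-Turbo evaluated at a single step is preferred by human voters, in terms of both image quality and prompt following, over LCM-XL evaluated at four (or fewer) steps. In addition, using four steps for SDXL-Turbo further improves performance. For details on the user study, see the research paper.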

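Uses

Direct Use

The model is intended for both non-commercial and commercial usage. You can use this model for non-commercial or research purposes under this license. Possible research areas and tasks include: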

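Research on generative models.
Research on real-time applications of generative models.
Research on the impact of real-time generative models.
Safe deployment of models that have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.
Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.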

For commercial use, please refer to https://stability.ai/membership.

Excluded uses are described below.

Diffusers

pip install diffusers transformers accelerate --upgrade

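Text-to-image:

SDXL-Turbo does not make use of guidance_scale or negative_prompt, so guidance is disabled with guidance_scale=0.0. The model preferably generates images of size 512x512, but larger sizes work as well. A single step is enough to generate high-quality images.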

from diffusers import AutoPipelineForText2Image
import torch

# Load the fp16 weights and move the pipeline to the GPU.
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

# One step is enough; guidance_scale=0.0 disables classifier-free guidance.
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
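
The settings described above (no guidance, 1 to 4 steps, 512x512 by default) can be collected in one place. The following is a minimal sketch, not part of diffusers: `turbo_call_kwargs` is a hypothetical helper name, and the check that the image size is a multiple of 8 is an assumption about what the pipeline accepts.

```python
def turbo_call_kwargs(prompt, num_inference_steps=1, size=512):
    """Build keyword arguments for one SDXL-Turbo text-to-image call.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    # The model card describes sampling in 1 to 4 steps.
    if not 1 <= num_inference_steps <= 4:
        raise ValueError("num_inference_steps should be between 1 and 4")
    # Assumption: width and height must be positive multiples of 8.
    if size <= 0 or size % 8 != 0:
        raise ValueError("size should be a positive multiple of 8")
    return {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": 0.0,  # guidance is disabled for SDXL-Turbo
        "width": size,
        "height": size,
    }


if __name__ == "__main__":
    print(turbo_call_kwargs("A cinematic shot of a baby racoon"))
```

With the pipeline loaded as in the example above, the call would then read `image = pipe(**turbo_call_kwargs(prompt)).images[0]`.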

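Image-to-image:

When using SDXL-Turbo for image-to-image generation, make sure that num_inference_steps * strength is greater than or equal to 1. The image-to-image pipeline runs for int(num_inference_steps * strength) steps, e.g. 0.5 * 2.0 = 1 step in the example below.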

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Load the fp16 weights and move the pipeline to the GPU.
pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

# Download the input image and resize it to the preferred 512x512 resolution.
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

# int(2 * 0.5) = 1 denoising step; guidance_scale=0.0 disables guidance.
image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
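
The step rule stated above can be checked before calling the pipeline. The following is a minimal sketch of that arithmetic only; `effective_steps` and `min_steps_for` are hypothetical helper names, not diffusers functions, and the restriction of strength to the range 0 to 1 is an assumption.

```python
def effective_steps(num_inference_steps, strength):
    """Number of denoising steps actually run: int(num_inference_steps * strength)."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    # Assumption: strength is a fraction between 0 and 1.
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return int(num_inference_steps * strength)


def min_steps_for(strength, limit=1000):
    """Smallest num_inference_steps that yields at least one effective step."""
    for n in range(1, limit + 1):
        if effective_steps(n, strength) >= 1:
            return n
    raise ValueError("no step count up to the limit gives one effective step")


if __name__ == "__main__":
    # 2 steps at strength 0.5 give exactly one denoising step.
    print(effective_steps(2, 0.5))
    # 1 step at strength 0.5 gives zero steps, which violates the rule above.
    print(effective_steps(1, 0.5))
    print(min_steps_for(0.25))
```

A lower strength therefore needs a proportionally higher num_inference_steps to keep at least one effective step.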

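Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.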

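Limitations and Bias

Limitations

The generated images are of a fixed resolution (512x512 pixels), and the model does not achieve perfect photorealism.
The model cannot render legible text.
Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.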

Recommendations

The model is intended for both non-commercial and commercial usage.

How to Get Started with the Model

Check out https://github.com/Stability-AI/generative-models

toobakhann15/sdxl_image

Author: toobakhann15

text-to-image diffusers

Created: 2025-09-01 13:35:11+00:00

Updated: 2025-09-05 16:00:55+00:00

Files (39)

.gitattributes

LICENSE.md

README.md

image_quality_one_step.png

model_index.json

output_tile.jpg

prompt_alignment_one_step.png

scheduler/scheduler_config.json

sd_xl_turbo_1.0.safetensors

sd_xl_turbo_1.0_fp16.safetensors

text_encoder/config.json

text_encoder/model.fp16.safetensors

text_encoder/model.onnx ONNX

text_encoder/model.safetensors

text_encoder_2/config.json

text_encoder_2/model.fp16.safetensors

text_encoder_2/model.onnx ONNX

text_encoder_2/model.onnx_data

text_encoder_2/model.safetensors

tokenizer/merges.txt

tokenizer/special_tokens_map.json

tokenizer/tokenizer_config.json

tokenizer/vocab.json

tokenizer_2/merges.txt

tokenizer_2/special_tokens_map.json

tokenizer_2/tokenizer_config.json

tokenizer_2/vocab.json

unet/config.json

unet/diffusion_pytorch_model.fp16.safetensors

unet/diffusion_pytorch_model.safetensors

unet/model.onnx ONNX

unet/model.onnx_data

vae/config.json

vae/diffusion_pytorch_model.fp16.safetensors

vae/diffusion_pytorch_model.safetensors

vae_decoder/config.json

vae_decoder/model.onnx ONNX

vae_encoder/config.json

vae_encoder/model.onnx ONNX