Info

https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

Test 1 - Different seed variations

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Time: 4m 15.88s | total 494.75 pipeline 249.02 preview 238.10 decode 5.90 move 0.69 prompt 0.61 gc 0.57 post 0.27 | GPU 9470 MB 7% | RAM 2.84 GB 2%
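
The timing line above appears to be the per-batch summary logged by the app. A minimal Python sketch for pulling the per-stage seconds out of such a line, so that stage shares can be compared between tests. `parse_timing` is a hypothetical helper, and it assumes the `Time: ... | name value ... | GPU ...` layout seen in the lines on this page.

```python
import re

def parse_timing(line: str) -> dict[str, float]:
    """Parse the 'total 494.75 pipeline 249.02 ...' part of a timing line.

    Hypothetical helper: assumes the layout of the timing lines on this page.
    """
    stages = line.split("|")[1]  # second '|' field holds the per-stage seconds
    return {name: float(value) for name, value in re.findall(r"([a-z]+) ([\d.]+)", stages)}

line = ("Time: 4m 15.88s | total 494.75 pipeline 249.02 preview 238.10 "
        "decode 5.90 move 0.69 prompt 0.61 gc 0.57 post 0.27 | GPU 9470 MB 7% | RAM 2.84 GB 2%")
timing = parse_timing(line)
preview_share = timing["preview"] / timing["total"]
print(f"preview share of total: {preview_share:.0%}")
```

For this batch the preview stage accounts for roughly 48% of the reported total, which is worth keeping in mind when comparing against runs where preview time is small.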

CFG 6, 50 STEPS	2899868740	2561095516	3977700936	1099727609	1972235878
bookshop girl 768px
1024
face and hand 768px
legs and shoes 768px

Test 1 - Bookshop

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Time: 1m 26.80s | total 165.44 pipeline 83.94 preview 78.63 decode 2.59 gc 0.46 | GPU 8872 MB 7% | RAM 2.84 GB 2%

CFG / Steps	8	16	20	32	50
CFG0 CFG1
CFG2
CFG3
CFG4
CFG5
CFG6
CFG7
CFG8
CFG9
CFG12
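
The grid above crosses guidance values with step counts. As a rough sketch, the runs behind such a grid can be enumerated as follows. Reading the row label `CFG0 CFG1` as two separate guidance values is an assumption, and `Run` is a hypothetical name used only for illustration.

```python
from itertools import product
from typing import NamedTuple

class Run(NamedTuple):
    cfg: int
    steps: int

# Values read from the grid; treating "CFG0 CFG1" as two rows is an assumption.
CFG_VALUES = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12]
STEP_VALUES = [8, 16, 20, 32, 50]

runs = [Run(cfg, steps) for cfg, steps in product(CFG_VALUES, STEP_VALUES)]
print(len(runs), "images; total denoising steps:", sum(r.steps for r in runs))
```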

Test 2 - Face and hand

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Time: 55.70s | total 60.21 pipeline 52.82 preview 4.50 decode 2.63 gc 0.46 | GPU 8878 MB 7% | RAM 2.84 GB 2%

CFG / Steps	8	16	20	32	50	100
CFG0 CFG1
CFG2
CFG3
CFG4
CFG5
CFG6
CFG7
CFG8
CFG12
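
For context on what the CFG axis changes: classifier-free guidance combines an unconditional and a prompt-conditioned noise prediction at every step. A minimal sketch of the standard formula on plain Python lists is shown below. How the app treats values of 0 and 1 is not documented on this page, so this should not be read as its exact behaviour; the input numbers are made up.

```python
def apply_cfg(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Toy noise predictions, made up for illustration only.
uncond = [0.0, 1.0, -1.0]
cond = [1.0, 1.0, 0.0]

for scale in (0, 1, 6):
    print(scale, apply_cfg(uncond, cond, scale))
```

With this formula a scale of 1 returns the conditional prediction unchanged and a scale of 0 returns the unconditional one; larger values push further along the difference between the two.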

Test 3 - Legs

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

Time: 46.40s | total 87.23 pipeline 42.72 preview 39.91 decode 2.54 move 0.89 prompt 0.88 gc 0.50 | GPU 8878 MB 7% | RAM 2.86 GB 2%

CFG / Steps	16	20	32	50
CFG4
CFG6
CFG8
CFG10
CFG12
CFG14

System info

app: sdnext.git updated: 2025-07-25 hash: 0b8001c0 url: https://github.com/vladmandic/sdnext.git/tree/dev
arch: x86_64 cpu: x86_64 system: Linux release: 6.14.0-24-generic 
python: 3.12.3 Torch: 2.7.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex: 
ram: free:122.5 used:2.83 total:125.33
xformers:  diffusers: 0.35.0.dev0 transformers: 4.53.2
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: sd_xl_base_1.0 [31e35c80fc] refiner: none vae: none te: none unet: none

Config

{
  "samples_filename_pattern": "[seq]-[date]-[model_name]-[height]x[width]-STEP[steps]-CFG[cfg]-Seed[seed]",
  "diffusers_version": "1c50a5f7e0392281336e21bc3f74ba48f8819207",
  "sd_model_checkpoint": "sd_xl_base_1.0 [31e35c80fc]",
  "sd_checkpoint_hash": "31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b",
  "diffusers_to_gpu": true,
  "diffusers_offload_mode": "none",
  "civitai_token": "<redacted>"
}
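
To illustrate how `samples_filename_pattern` maps settings to file names, here is a small sketch that substitutes bracketed tokens. It is not the app's own implementation: `expand` is a hypothetical helper, and the sequence number, date format, and resolution in the example are made-up values.

```python
import re

PATTERN = "[seq]-[date]-[model_name]-[height]x[width]-STEP[steps]-CFG[cfg]-Seed[seed]"

def expand(pattern: str, values: dict[str, object]) -> str:
    """Replace each [name] token with values[name]; unknown tokens are left as-is."""
    return re.sub(r"\[(\w+)\]", lambda m: str(values.get(m.group(1), m.group(0))), pattern)

# Seed and model name come from this page; seq, date and size are made up.
name = expand(PATTERN, {
    "seq": "00001", "date": "2025-07-25", "model_name": "sd_xl_base_1.0",
    "height": 1024, "width": 1024, "steps": 50, "cfg": 6, "seed": 2899868740,
})
print(name)
```

Because steps, CFG, and seed are all part of the name, each cell of the test grids can be traced back to its settings from the file name alone.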

Model Info

Model: sd_xl_base_1.0
Type: sdxl
Class: StableDiffusionXLPipeline
Size: 6 938 078 334 bytes
Modified: 2025-07-15 13:47:33
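
As a quick sanity check, the file size can be compared with the parameter counts listed in the module table below. The sketch assumes 2 bytes per parameter (16-bit weights in the checkpoint), which this page does not state explicitly.

```python
# Parameter counts copied from the module table on this page.
PARAMS = {
    "vae": 83_653_863,
    "text_encoder": 123_060_480,
    "text_encoder_2": 694_659_840,
    "unet": 2_567_463_684,
}
FILE_SIZE = 6_938_078_334  # bytes, from the "Size" line
BYTES_PER_PARAM = 2        # assumption: 16-bit weights

total_params = sum(PARAMS.values())
weight_bytes = total_params * BYTES_PER_PARAM
print(f"{total_params:,} parameters, {weight_bytes:,} bytes of weights")
print(f"difference from file size: {FILE_SIZE - weight_bytes:,} bytes")
```

Under that assumption the weights account for all but about 0.4 MB of the file. The remainder is plausibly header and metadata, though that is not verified here.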

Module	Class	Device	DType	Params	Modules	Config
vae	AutoencoderKL	xpu:0	torch.bfloat16	83653863	243	FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 4, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 0.13025, 'shift_factor': None, 'latents_mean': None, 'latents_std': None, 'force_upcast': False, 'use_quant_conv': True, 'use_post_quant_conv': True, 'mid_block_add_attention': True, '_use_default_values': ['mid_block_add_attention', 'latents_std', 'use_quant_conv', 'use_post_quant_conv', 'latents_mean', 'shift_factor'], '_class_name': 'AutoencoderKL', '_diffusers_version': '0.20.0.dev0', '_name_or_path': '../sdxl-vae/'})
text_encoder	CLIPTextModel	xpu:0	torch.bfloat16	123060480	152	CLIPTextConfig { "architectures": [ "CLIPTextModel" ], "attention_dropout": 0.0, "bos_token_id": 0, "dropout": 0.0, "eos_token_id": 2, "hidden_act": "quick_gelu", "hidden_size": 768, "initializer_factor": 1.0, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-05, "max_position_embeddings": 77, "model_type": "clip_text_model", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 1, "projection_dim": 768, "torch_dtype": "float16", "transformers_version": "4.53.2", "vocab_size": 49408 }
text_encoder_2	CLIPTextModelWithProjection	xpu:0	torch.bfloat16	694659840	393	CLIPTextConfig { "architectures": [ "CLIPTextModelWithProjection" ], "attention_dropout": 0.0, "bos_token_id": 0, "dropout": 0.0, "eos_token_id": 2, "hidden_act": "gelu", "hidden_size": 1280, "initializer_factor": 1.0, "initializer_range": 0.02, "intermediate_size": 5120, "layer_norm_eps": 1e-05, "max_position_embeddings": 77, "model_type": "clip_text_model", "num_attention_heads": 20, "num_hidden_layers": 32, "pad_token_id": 1, "projection_dim": 1280, "torch_dtype": "float16", "transformers_version": "4.53.2", "vocab_size": 49408 }
tokenizer	CLIPTokenizer	None	None	0	0	None
tokenizer_2	CLIPTokenizer	None	None	0	0	None
unet	UNet2DConditionModel	xpu:0	torch.bfloat16	2567463684	1930	FrozenDict({'sample_size': 128, 'in_channels': 4, 'out_channels': 4, 'center_input_sample': False,
scheduler	EulerDiscreteScheduler	None	None	0	0	FrozenDict({'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'trained_betas': None, 'prediction_type': 'epsilon', 'interpolation_type': 'linear', 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'sigma_min': None, 'sigma_max': None, 'timestep_spacing': 'leading', 'timestep_type': 'discrete', 'steps_offset': 1, 'rescale_betas_zero_snr': False, 'final_sigmas_type': 'zero', '_use_default_values': ['use_exponential_sigmas', 'timestep_type', 'sigma_min', 'final_sigmas_type', 'use_beta_sigmas', 'sigma_max', 'rescale_betas_zero_snr'], '_class_name': 'EulerDiscreteScheduler', '_diffusers_version': '0.19.0.dev0', 'clip_sample': False, 'sample_max_value': 1.0, 'set_alpha_to_one': False, 'skip_prk_steps': True})
image_encoder	NoneType	None	None	0	0	None
feature_extractor	NoneType	None	None	0	0	None
force_zeros_for_empty_prompt	bool	None	None	0	0	None
_class_name	str	None	None	0	0	None
_diffusers_version	str	None	None	0	0	None
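
The scheduler row above lists `beta_schedule: scaled_linear`, `timestep_spacing: leading`, and `steps_offset: 1`. The following pure-Python sketch shows what those settings imply for the step counts used in the tests. It is a re-implementation based on my understanding of these options, not the library code, and the function names are hypothetical.

```python
import math

NUM_TRAIN_TIMESTEPS = 1000
BETA_START, BETA_END = 0.00085, 0.012
STEPS_OFFSET = 1

def scaled_linear_betas(n: int = NUM_TRAIN_TIMESTEPS) -> list[float]:
    """'scaled_linear': interpolate linearly in sqrt(beta), then square."""
    lo, hi = math.sqrt(BETA_START), math.sqrt(BETA_END)
    return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]

def sigmas_from_betas(betas: list[float]) -> list[float]:
    """sigma_t = sqrt((1 - alpha_bar_t) / alpha_bar_t), alpha_bar = cumprod(1 - beta)."""
    sigmas, alpha_bar = [], 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        sigmas.append(math.sqrt((1.0 - alpha_bar) / alpha_bar))
    return sigmas

def leading_timesteps(num_inference_steps: int) -> list[int]:
    """'leading' spacing: multiples of the step ratio, descending, plus the offset."""
    ratio = NUM_TRAIN_TIMESTEPS // num_inference_steps
    return [i * ratio + STEPS_OFFSET for i in reversed(range(num_inference_steps))]

betas = scaled_linear_betas()
sigmas = sigmas_from_betas(betas)
for steps in (8, 50):
    ts = leading_timesteps(steps)
    print(steps, "steps:", ts[:3], "...", ts[-1], "| first sigma", round(sigmas[ts[0]], 2))
```

Under these assumptions an 8-step run starts at timestep 876 and a 50-step run at 981, so low step counts begin from a noticeably lower noise level than high ones.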

{
modelspec.sai_model_spec: "1.0.0",
modelspec.architecture: "stable-diffusion-xl-v1-base",
modelspec.implementation: "https://github.com/Stability-AI/generative-models",
modelspec.title: "Stable Diffusion XL 1.0 Base",
modelspec.author: "StabilityAI",
modelspec.description: "SDXL 1.0 Base Model, compositional expert. SDXL, the most advanced development in the Stable Diffusion text-to-image suite of models. SDXL produces massively improved image and composition detail over its predecessors. The ability to generate hyper-realistic creations for films, television, music, and instructional videos, as well as offering advancements for design and industrial use, places SDXL at the forefront of real world applications for AI imagery.",
modelspec.date: "2023-07-26",
modelspec.resolution: "1024x1024",
modelspec.prediction_type: "epsilon",
modelspec.license: "CreativeML Open RAIL++-M License",
modelspec.thumbnail: "data",
modelspec.hash_sha256: "0xd7a9105a900fd52748f20725fe52fe52b507fd36bee4fc107b1550a26e6ee1d7"
}
