Info

https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo

Defaults for Turbo:

    num_inference_steps=4,
    guidance_scale=0.0,

Defaults for Large:

    num_inference_steps=28,
    guidance_scale=3.5,
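A minimal diffusers sketch of the two default configurations. It assumes the torch and diffusers packages and access to the Hugging Face repositories; DEFAULTS and generate are hypothetical helpers, not SD.Next code. StableDiffusion3Pipeline.from_pretrained, num_inference_steps and guidance_scale are the standard diffusers names used on the model cards.

```python
# Default sampling settings for the two SD 3.5 variants, taken from this page.
# DEFAULTS and generate() are hypothetical helpers, not part of SD.Next.
DEFAULTS = {
    "stabilityai/stable-diffusion-3.5-large-turbo": {"num_inference_steps": 4, "guidance_scale": 0.0},
    "stabilityai/stable-diffusion-3.5-large": {"num_inference_steps": 28, "guidance_scale": 3.5},
}


def generate(model_id: str, prompt: str, seed: int, device: str = "cpu", **overrides):
    """Run one 1024x1024 txt2img generation with the per-model defaults.

    torch and diffusers are imported lazily so the defaults can be inspected without them.
    """
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)
    kwargs = {**DEFAULTS[model_id], "height": 1024, "width": 1024, **overrides}
    generator = torch.Generator("cpu").manual_seed(seed)  # fixed seed -> reproducible initial noise
    return pipe(prompt, generator=generator, **kwargs).images[0]
```

Calling generate with the Large model id, seed 1620085323 and device "xpu" would approximate the Test 0 settings, although SD.Next may handle sampling, offloading and previews differently.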

Test 0 - Different seed variations

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Parameters: Steps: 28| Size: 1024x1024| Seed: 1620085323| CFG scale: 3.5| App: SD.Next| Version: b56d508| Pipeline: StableDiffusion3Pipeline| Operations: txt2img| Model: stable-diffusion-3.5-large
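The Parameters line is a pipe-separated list of "Key: value" fields. A minimal parser sketch, assuming each field has one colon between key and value and that Size is width x height; parse_parameters and to_diffusers_kwargs are hypothetical names, and mapping the fields onto diffusers argument names is an assumption about equivalence.

```python
def parse_parameters(line: str) -> dict[str, str]:
    """Split an SD.Next-style 'Key: value| Key: value' string into a dict."""
    params = {}
    for field in line.split("|"):
        key, sep, value = field.partition(":")
        if sep:  # skip fragments that are not 'key: value' pairs
            params[key.strip()] = value.strip()
    return params


def to_diffusers_kwargs(params: dict[str, str]) -> dict:
    """Map the parsed fields onto diffusers-style argument names (assumed equivalents)."""
    width, height = (int(v) for v in params["Size"].split("x"))
    return {
        "num_inference_steps": int(params["Steps"]),
        "guidance_scale": float(params["CFG scale"]),
        "width": width,
        "height": height,
        "seed": int(params["Seed"]),
    }
```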

Execution: Time: 25m 47.53s | total 1651.35 pipeline 1535.92 preview 90.80 decode 11.56 prompt 7.30 offload 4.78 move 0.63 gc 0.30 | GPU 13394 MB 10% | RAM 18.12 GB 14%
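The second field of the Execution line breaks the total into stages. A small sketch that extracts them: the stage figures add up to 1651.29 s, matching the 1651.35 s total to within rounding, and the live preview costs about 5.5% of it. The per-step figure assumes the pipeline time covers a single 28-step generation; if the run produced several images it should be divided further. parse_timings is a hypothetical helper.

```python
import re

EXECUTION = (
    "Execution: Time: 25m 47.53s | total 1651.35 pipeline 1535.92 preview 90.80 decode 11.56 "
    "prompt 7.30 offload 4.78 move 0.63 gc 0.30 | GPU 13394 MB 10% | RAM 18.12 GB 14%"
)


def parse_timings(line: str) -> dict[str, float]:
    """Return the 'name seconds' pairs from the second '|' field of an Execution line."""
    section = line.split("|")[1]
    return {name: float(value) for name, value in re.findall(r"([a-z]+) ([\d.]+)", section)}


timings = parse_timings(EXECUTION)
stages = sum(v for k, v in timings.items() if k != "total")  # sum of the per-stage figures
per_step = timings["pipeline"] / 28  # assumes one 28-step generation
preview_share = timings["preview"] / timings["total"]
```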

CFG 3.5, 28 steps | Seed: 1620085323 | Seed: 1931701040 | Seed: 2736029172 | Seed: 4075624134
Rows: bookshop girl, hand and face, legs and shoes

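Test 0 keeps steps and CFG fixed and changes only the seed, so differences between columns come from the initial noise alone. A hypothetical job list for that grid is sketched below; each job would then be run with its own seeded generator (in diffusers, for example, torch.Generator().manual_seed(seed)). The prompt text per row is left to the caller, and seed_grid is an illustrative name.

```python
from itertools import product

# Seeds and row labels from the Test 0 grid; the prompt text for each row is supplied separately.
SEEDS = [1620085323, 1931701040, 2736029172, 4075624134]
ROWS = ["bookshop girl", "hand and face", "legs and shoes"]


def seed_grid(rows, seeds, steps=28, cfg=3.5):
    """One job per grid cell: every row is rendered with every seed at fixed steps and CFG."""
    return [
        {"row": row, "seed": seed, "num_inference_steps": steps, "guidance_scale": cfg}
        for row, seed in product(rows, seeds)
    ]


jobs = seed_grid(ROWS, SEEDS)
```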
Test 1 - Bookshop

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Seed: 816203250 | CFG: 1, 2, 3, 4, 5, 6, 8

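The CFG scale swept in Tests 1 to 3 weights the difference between the prompt-conditioned and unconditioned predictions. A minimal sketch of the standard classifier-free guidance combination on plain lists (real pipelines apply it to latent tensors). In diffusers' StableDiffusion3Pipeline the unconditional pass only runs when guidance_scale > 1, so CFG 1 and the Turbo default of 0.0 use the conditioned prediction directly; whether SD.Next applies the same threshold is an assumption. apply_cfg and uses_uncond_pass are hypothetical names.

```python
def apply_cfg(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def uses_uncond_pass(scale: float) -> bool:
    """Mirrors the guidance_scale > 1 check in diffusers' StableDiffusion3Pipeline."""
    return scale > 1


CFG_SWEEP = [1, 2, 3, 4, 5, 6, 8]  # CFG values from Test 1
guided = [s for s in CFG_SWEEP if uses_uncond_pass(s)]
```

At scale 1 the formula reduces to the conditioned prediction, which is why skipping the unconditional pass there changes cost but not the result.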
Test 2 - Face and hand

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Seed: 816203250 | CFG: 1, 2, 3, 4, 6, 8

Test 3 - Legs

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

Seed: 816203250 | CFG: 1, 2, 3, 4, 6, 8

System info


Mon Aug  4 19:48:58 2025
app: sdnext.git updated: 2025-08-01 hash: b56d508a url: https://github.com/vladmandic/sdnext.git/tree/master
arch: x86_64 cpu: x86_64 system: Linux release: 6.14.0-27-generic 
python: 3.12.3 Torch: 2.7.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex: 
ram: free:122.51 used:2.82 total:125.33
xformers:  diffusers: 0.35.0.dev0 transformers: 4.54.1
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: Diffusers/stabilityai/stable-diffusion-3.5-large [ceddf0a7fd] refiner: none vae: none te: none unet: none
Backend: ipex Cross-attention: Scaled-Dot-Product
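The log shows the Intel Arc GPU running through the XPU backend with bfloat16 for every component. A minimal sketch of picking the same device from plain PyTorch, assuming a build that exposes torch.xpu (available natively in recent releases such as the 2.7.1+xpu build listed above); pick_device is a hypothetical helper and falls back to CPU when torch is not installed.

```python
def pick_device() -> str:
    """Prefer an Intel XPU, then CUDA, else CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    xpu = getattr(torch, "xpu", None)  # absent on older or non-XPU builds
    if xpu is not None and xpu.is_available():
        return "xpu"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"
```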


Model

Model: Diffusers/stabilityai/stable-diffusion-3.5-large
Type: sd3
Class: StableDiffusion3Pipeline
Size: 0 bytes
Modified: 2025-08-04 11:51:22


Module | Class | Device | DType | Params | Modules | Config
vae | AutoencoderKL | cpu | torch.bfloat16 | 83819683 | 241 | FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 16, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 1.5305, 'shift_factor': 0.0609, 'latents_mean': None, 'latents_std': None, 'force_upcast': True, 'use_quant_conv': False, 'use_post_quant_conv': False, 'mid_block_add_attention': True, '_class_name': 'AutoencoderKL', '_diffusers_version': '0.31.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--stabilityai--stable-diffusion-3.5-large/snapshots/ceddf0a7fdf2064ea28e2213e3b84e4afa170a0f/vae'})
text_encoder | CLIPTextModelWithProjection | xpu:0 | torch.bfloat16 | 123650304 | 153 | CLIPTextConfig { "architectures": [ "CLIPTextModelWithProjection" ], "attention_dropout": 0.0, "bos_token_id": 0, "dropout": 0.0, "eos_token_id": 2, "hidden_act": "quick_gelu", "hidden_size": 768, "initializer_factor": 1.0, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-05, "max_position_embeddings": 77, "model_type": "clip_text_model", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 1, "projection_dim": 768, "torch_dtype": "bfloat16", "transformers_version": "4.54.1", "vocab_size": 49408 }
text_encoder_2 | CLIPTextModelWithProjection | xpu:0 | torch.bfloat16 | 694659840 | 393 | CLIPTextConfig { "architectures": [ "CLIPTextModelWithProjection" ], "attention_dropout": 0.0, "bos_token_id": 0, "dropout": 0.0, "eos_token_id": 2, "hidden_act": "gelu", "hidden_size": 1280, "initializer_factor": 1.0, "initializer_range": 0.02, "intermediate_size": 5120, "layer_norm_eps": 1e-05, "max_position_embeddings": 77, "model_type": "clip_text_model", "num_attention_heads": 20, "num_hidden_layers": 32, "pad_token_id": 1, "projection_dim": 1280, "torch_dtype": "bfloat16", "transformers_version": "4.54.1", "vocab_size": 49408 }
text_encoder_3 | T5EncoderModel | xpu:0 | torch.bfloat16 | 4762310656 | 463 | T5Config { "architectures": [ "T5EncoderModel" ], "classifier_dropout": 0.0, "d_ff": 10240, "d_kv": 64, "d_model": 4096, "decoder_start_token_id": 0, "dense_act_fn": "gelu_new", "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "initializer_factor": 1.0, "is_encoder_decoder": false, "is_gated_act": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 24, "num_heads": 64, "num_layers": 24, "output_past": true, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.54.1", "use_cache": false, "vocab_size": 32128 }
tokenizer | CLIPTokenizer | None | None | 0 | 0 | None
tokenizer_2 | CLIPTokenizer | None | None | 0 | 0 | None
tokenizer_3 | T5TokenizerFast | None | None | 0 | 0 | None
transformer | SD3Transformer2DModel | xpu:0 | torch.bfloat16 | 8056627520 | 1456 | FrozenDict({'sample_size': 128, 'patch_size': 2, 'in_channels': 16, 'num_layers': 38, 'attention_head_dim': 64, 'num_attention_heads': 38, 'joint_attention_dim': 4096, 'caption_projection_dim': 2432, 'pooled_projection_dim': 2048, 'out_channels': 16, 'pos_embed_max_size': 192, 'dual_attention_layers': (), 'qk_norm': 'rms_norm', '_use_default_values': ['dual_attention_layers'], '_class_name': 'SD3Transformer2DModel', '_diffusers_version': '0.31.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--stabilityai--stable-diffusion-3.5-large/snapshots/ceddf0a7fdf2064ea28e2213e3b84e4afa170a0f/transformer'})
scheduler | FlowMatchEulerDiscreteScheduler | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'shift': 3.0, 'use_dynamic_shifting': False, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_use_default_values': ['base_image_seq_len', 'invert_sigmas', 'use_exponential_sigmas', 'use_beta_sigmas', 'stochastic_sampling', 'max_image_seq_len', 'max_shift', 'time_shift_type', 'shift_terminal', 'use_karras_sigmas', 'base_shift', 'use_dynamic_shifting'], '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.29.0.dev0'})
image_encoder | NoneType | None | None | 0 | 0 | None
feature_extractor | NoneType | None | None | 0 | 0 | None
_name_or_path | str | None | None | 0 | 0 | None
_class_name | str | None | None | 0 | 0 | None
_diffusers_version | str | None | None | 0 | 0 | None
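The Params column shows where the memory goes. A short worked sum: the five weighted modules hold 13,721,068,003 parameters, so at 2 bytes per bfloat16 value the weights alone need about 27.44 GB, before activations. The transformer is roughly 59% of that and the T5 encoder (text_encoder_3) about 35%, which is why moving components between CPU and GPU matters on a memory-constrained device. PARAMS and BYTES_PER_PARAM are illustrative names.

```python
# Parameter counts copied from the module table above.
PARAMS = {
    "vae": 83_819_683,
    "text_encoder": 123_650_304,
    "text_encoder_2": 694_659_840,
    "text_encoder_3": 4_762_310_656,
    "transformer": 8_056_627_520,
}
BYTES_PER_PARAM = 2  # torch.bfloat16

total = sum(PARAMS.values())
weights_gb = total * BYTES_PER_PARAM / 1e9  # decimal gigabytes, weights only
share = {name: count / total for name, count in PARAMS.items()}
```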