Info
https://huggingface.co/stabilityai/stable-diffusion-2-1-base
Seed variations
Check how different seeds impact the result.
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
...
| | 2899868740 | 2561095516 | 3977700936 | 1099727609 | 1972235878 |
|---|---|---|---|---|---|
| bookshop girl | | | | | |
| face and hand (CFG 6, 50 steps) | | | | | |
| legs and shoes | | | | | |
Test 1 - Bookshop
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
Parameters: Steps: 32 | Size: 768x768 | Seed: 2561095516 | CFG scale: 3 | App: SD.Next | Version: 287c360 | Pipeline: StableDiffusionPipeline | Operations: txt2img | Model: stable-diffusion-2-1
Execution: Time: 2m 16.41s | total 160.46 pipeline 132.67 preview 23.83 decode 3.48 gc 0.58 | GPU 3444 MB 3% | RAM 2.98 GB 2%
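The run above can be sketched with plain diffusers (a hedged reproduction: SD.Next layers its own preview and offload machinery on top, and the `test1_kwargs`/`run_test1` names plus the `"xpu"` default device are assumptions for this Intel Arc setup):

```python
from typing import Any, Dict

def test1_kwargs(seed: int = 2561095516) -> Dict[str, Any]:
    # Parameters copied from the Test 1 log line above.
    return {"num_inference_steps": 32, "guidance_scale": 3.0,
            "height": 768, "width": 768, "seed": seed}

def run_test1(device: str = "xpu"):
    # "xpu" targets Intel Arc via torch's XPU backend; use "cuda" or "cpu" elsewhere.
    import torch
    from diffusers import StableDiffusionPipeline

    kwargs = test1_kwargs()
    generator = torch.Generator("cpu").manual_seed(kwargs.pop("seed"))
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",
        torch_dtype=torch.bfloat16).to(device)
    prompt = ("photorealistic girl in bookshop choosing the book "
              "in romantic stories shelf. smiling")
    return pipe(prompt, generator=generator, **kwargs).images[0]
```

Seeding a CPU generator rather than the device generator keeps the noise deterministic across backends, which matters when comparing the same seed on different hardware.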
| CFG \ Steps | 8 | 16 | 20 | 32 | 50 |
|---|---|---|---|---|---|
| CFG0 | | | | | |
| CFG2 | | | | | |
| CFG3 | | | | | |
| CFG4 | | | | | |
| CFG5 | | | | | |
| CFG6 | | | | | |
| CFG7 | | | | | |
| CFG8 | | | | | |
| CFG9 | | | | | |
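The CFG × steps grid above is just the cross product of the two parameter lists. A small sketch of how such a sweep can be enumerated (the values mirror this table; the `sweep_grid` name is made up):

```python
from itertools import product

CFG_SCALES = [0, 2, 3, 4, 5, 6, 7, 8, 9]
STEP_COUNTS = [8, 16, 20, 32, 50]

def sweep_grid(cfgs=CFG_SCALES, steps=STEP_COUNTS):
    """Return (cfg, steps) pairs in table order: one row per CFG value."""
    return list(product(cfgs, steps))
```

Each pair would then be passed as `guidance_scale` / `num_inference_steps` to the pipeline, giving 9 × 5 = 45 images for this table.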
Test 2 - Face and hand
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
Parameters: Steps: 100 | Size: 768x768 | Seed: 2561095516 | CFG scale: 3 | App: SD.Next | Version: 287c360 | Pipeline: StableDiffusionPipeline | Operations: txt2img | Model: stable-diffusion-2
...
-1
Execution: Time: 6m 57.31s | total 475.28 pipeline 413.12 preview 57.39 decode 3.48 gc 0.65 move 0.40 prompt 0.38 post 0.31 | GPU 3444 MB 3% | RAM 2.97 GB 2%
...
| CFG \ Steps | 8 | 16 | 20 | 32 | 50 | 100 |
|---|---|---|---|---|---|---|
| CFG0 | | | | | | |
| CFG1 | | | | | | |
| CFG2 | | | | | | |
| CFG3 | | | | | | |
| CFG4 | | | | | | |
| CFG6 | | | | | | |
| CFG8 | | | | | | |
| CFG12 | | | | | | |
Test 3 - Legs
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
...
Execution: Time: 3m 32.10s | total 249.31 pipeline 208.00 preview 36.62 decode 3.49 gc 0.59 move 0.36 prompt 0.33 | GPU 3444 MB 3% | RAM 2.97 GB 2%
...
| CFG \ Steps | 4 | 8 | 16 | 20 | 32 | 50 |
|---|---|---|---|---|---|---|
| CFG0 | | | | | | |
| CFG1 | | | | | | |
| CFG2 | | | | | | |
| CFG3 | | | | | | |
| CFG4 | | | | | | |
| CFG6 | | | | | | |
| CFG8 | | | | | | |
| CFG12 | | | | | | |
System info
```
app: sdnext.git updated: 2025-07-20 hash: 287c3600 url: https://github.com/vladmandic/sdnext.git/tree/dev
arch: x86_64 cpu: x86_64 system: Linux release: 6.14.0-24-generic
python: 3.12.3 Torch 2.7.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex:
ram: free:122.37 used:2.96 total:125.33
xformers: diffusers: 0.35.0.dev0 transformers: 4.53.2
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
Cross-attention: Scaled-Dot-Product
```
Config
```json
{
  "sd_model_checkpoint": "stabilityai/stable-diffusion-2-1",
  "diffusers_version": "9c13f8657986e68f5f05987912c54432fd28d86f",
  "sd_checkpoint_hash": null,
  "diffusers_offload_min_gpu_memory": 0.05,
  "diffusers_offload_max_gpu_memory": 0.95,
  "diffusers_vae_tiling": true,
  "diffusers_vae_tile_size": 512,
  "dynamic_attention_slice_rate": 1,
  "dynamic_attention_trigger_rate": 2,
  "samples_filename_pattern": "[seq]-[date]-[model_name]-[height]x[width]-STEP[steps]-CFG[cfg]-Seed[seed]"
}
```
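The `samples_filename_pattern` value substitutes bracketed tokens per generated image. A naive sketch of that expansion (`expand_pattern` is a hypothetical stand-in; SD.Next's real implementation supports many more tokens):

```python
def expand_pattern(pattern: str, **fields) -> str:
    # Replace each [token] placeholder with the given field value.
    for key, value in fields.items():
        pattern = pattern.replace(f"[{key}]", str(value))
    return pattern

name = expand_pattern(
    "[seq]-[date]-[model_name]-[height]x[width]-STEP[steps]-CFG[cfg]-Seed[seed]",
    seq="00001", date="2025-07-20", model_name="stable-diffusion-2-1",
    height=768, width=768, steps=32, cfg=3, seed=2561095516)
# name == "00001-2025-07-20-stable-diffusion-2-1-768x768-STEP32-CFG3-Seed2561095516"
```

Encoding steps, CFG, and seed into the filename is what makes grids like the ones above reconstructable after the fact.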
Model Info
| Module | Class | Device | DType | Params | Modules | Config |
|---|---|---|---|---|---|---|
| tokenizer | CLIPTokenizer | None | None | 0 | 0 | None |
| unet | UNet2DConditionModel | xpu:0 | torch.bfloat16 | 865910724 | 709 | FrozenDict({'sample_size': 96, 'in_channels': 4, 'out_channels': 4, 'center_input_sample': False, 'flip_sin_to_cos': True, 'freq_shift': 0, 'down_block_types': ['CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'DownBlock2D'], 'mid_block_type': 'UNetMidBlock2DCrossAttn', 'up_block_types': ['UpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D'], 'only_cross_attention': False, 'block_out_channels': [320, 640, 1280, 1280], 'layers_per_block': 2, 'downsample_padding': 1, 'mid_block_scale_factor': 1, 'dropout': 0.0, 'act_fn': 'silu', 'norm_num_groups': 32, 'norm_eps': 1e-05, 'cross_attention_dim': 1024, 'transformer_layers_per_block': 1, 'reverse_transformer_layers_per_block': None, 'encoder_hid_dim': None, 'encoder_hid_dim_type': None, 'attention_head_dim': [5, 10, 20, 20], 'num_attention_heads': None, 'dual_cross_attention': False, 'use_linear_projection': True, 'class_embed_type': None, 'addition_embed_type': None, 'addition_time_embed_dim': None, 'num_class_embeds': None, 'upcast_attention': True, 'resnet_time_scale_shift': 'default', 'resnet_skip_time_act': False, 'resnet_out_scale_factor': 1.0, 'time_embedding_type': 'positional', 'time_embedding_dim': None, 'time_embedding_act_fn': None, 'timestep_post_act': None, 'time_cond_proj_dim': None, 'conv_in_kernel': 3, 'conv_out_kernel': 3, 'projection_class_embeddings_input_dim': None, 'attention_type': 'default', 'class_embeddings_concat': False, 'mid_block_only_cross_attention': None, 'cross_attention_norm': None, 'addition_embed_type_num_heads': 64, '_use_default_values': ['cross_attention_norm', 'transformer_layers_per_block', 'timestep_post_act', 'num_attention_heads', 'addition_time_embed_dim', 'resnet_time_scale_shift', 'reverse_transformer_layers_per_block', 'time_embedding_type', 'class_embed_type', 'time_embedding_act_fn', 'addition_embed_type', 'attention_type', 'class_embeddings_concat', 'mid_block_type', 'time_cond_proj_dim', 'encoder_hid_dim_type', 'addition_embed_type_num_heads', 'resnet_skip_time_act', 'mid_block_only_cross_attention', 'resnet_out_scale_factor', 'conv_in_kernel', 'dropout', 'conv_out_kernel', 'projection_class_embeddings_input_dim', 'encoder_hid_dim', 'time_embedding_dim'], '_class_name': 'UNet2DConditionModel', '_diffusers_version': '0.10.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/unet'}) |
| scheduler | DDIMScheduler | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'trained_betas': None, 'clip_sample': False, 'set_alpha_to_one': False, 'steps_offset': 1, 'prediction_type': 'v_prediction', 'thresholding': False, 'dynamic_thresholding_ratio': 0.995, 'clip_sample_range': 1.0, 'sample_max_value': 1.0, 'timestep_spacing': 'leading', 'rescale_betas_zero_snr': False, '_use_default_values': ['clip_sample_range', 'thresholding', 'timestep_spacing', 'dynamic_thresholding_ratio', 'rescale_betas_zero_snr', 'sample_max_value'], '_class_name': 'DDIMScheduler', '_diffusers_version': '0.8.0', 'skip_prk_steps': True}) |
| safety_checker | NoneType | None | None | 0 | 0 | None |
| feature_extractor | CLIPImageProcessor | None | None | 0 | 0 | None |
| image_encoder | NoneType | None | None | 0 | 0 | None |
| requires_safety_checker | bool | None | None | 0 | 0 | None |
| _name_or_path | str | None | None | 0 | 0 | None |
| _class_name | str | None | None | 0 | 0 | None |
| _diffusers_version | str | None | None | 0 | 0 | None |
...