Part 1 - Bookshop

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Parameters: Steps: 20| Size: 1024x1024| Seed: 3286438823| CFG scale: 7| Model: Cosmos-Predict2-14B-Text2Image| App: SD.Next| Version: bcea748| Operations: txt2img| Pipeline: Cosmos2TextToImagePipeline

Execution: Time: 23m 33.78s | total 1433.89 pipeline 1399.03 offload 14.71 move 7.36 decode 7.36 te 5.40 | GPU 41668 MB 32% | RAM 29.59 GB 24%
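
As a rough reading of the Execution line, dividing the pipeline time by the step count gives the average cost of one sampling step. The sketch below parses the line with a regular expression. The helper names (`parse_execution`, `seconds_per_step`) are hypothetical, and the log format is assumed to match the line shown here.

```python
import re

# Execution line copied from Part 1 of this page.
EXECUTION = ("Time: 23m 33.78s | total 1433.89 pipeline 1399.03 offload 14.71 "
             "move 7.36 decode 7.36 te 5.40 | GPU 41668 MB 32% | RAM 29.59 GB 24%")

def parse_execution(line: str) -> dict[str, float]:
    """Extract the 'name seconds' pairs from the timing segment of the line."""
    timing = line.split("|")[1]  # second '|'-separated field holds the per-stage timings
    return {name: float(value) for name, value in re.findall(r"([a-z]+) ([\d.]+)", timing)}

def seconds_per_step(line: str, steps: int) -> float:
    """Average pipeline time per sampling step (assumes 'pipeline' is in seconds)."""
    return parse_execution(line)["pipeline"] / steps

print(f"{seconds_per_step(EXECUTION, 20):.2f} s/step")
```

The same helper can be applied to the Part 2 line with `steps=32` to compare the two runs.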


Image grid: columns STEPS 2, 4, 6, 8, 12, 16, 20, 32; rows CFG0, CFG1 (= CFG0), CFG2, CFG3, CFG4, CFG5, CFG6, CFG7
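
The Part 1 grid pairs each step count with each CFG value. A minimal sketch of enumerating those cells follows; `sweep` is a hypothetical helper, and reusing the seed and size from the Parameters line for every cell is an assumption, since the page lists parameters for only one cell.

```python
from itertools import product

STEPS = [2, 4, 6, 8, 12, 16, 20, 32]  # grid columns
CFG_SCALES = list(range(8))           # grid rows CFG0..CFG7

def sweep(steps=STEPS, cfg_scales=CFG_SCALES):
    """Yield one settings dict per grid cell, row by row."""
    for cfg, n in product(cfg_scales, steps):
        # Seed and size are assumed constant across the grid.
        yield {"steps": n, "cfg_scale": cfg, "seed": 3286438823, "size": "1024x1024"}

cells = list(sweep())
print(len(cells), "cells")
```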

Part 2 - Face and hand

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Parameters: Steps: 32| Size: 1024x1024| Seed: 3286438823| CFG scale: 8| Model: Cosmos-Predict2-14B-Text2Image| App: SD.Next| Version: bcea748| Operations: txt2img| Pipeline: Cosmos2TextToImagePipeline

Execution: Time: 38m 26.29s | total 2319.18 pipeline 2298.66 offload 7.60 decode 7.59 te 5.28 | GPU 41668 MB 32% | RAM 29.63 GB 24%
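
For readers who want to try similar settings outside SD.Next, the sketch below parses the Parameters line and maps it to keyword arguments for the diffusers `Cosmos2TextToImagePipeline`. The helper names and the field-to-argument mapping are assumptions, as is reading `Size` as width x height. Output from plain diffusers is not expected to match SD.Next exactly, and stock diffusers may require the `cosmos_guardrail` package for its safety checker, which SD.Next replaces (see the module table further down).

```python
# Parameters line copied from Part 2 of this page.
PARAMETERS = ("Steps: 32| Size: 1024x1024| Seed: 3286438823| CFG scale: 8| "
              "Model: Cosmos-Predict2-14B-Text2Image| App: SD.Next| Version: bcea748| "
              "Operations: txt2img| Pipeline: Cosmos2TextToImagePipeline")

def parse_parameters(line: str) -> dict[str, str]:
    """Split 'Key: value| Key: value' into a dict of strings."""
    fields = (field.split(":", 1) for field in line.split("|"))
    return {key.strip(): value.strip() for key, value in fields}

def to_pipeline_kwargs(params: dict[str, str]) -> dict:
    """Map SD.Next field names to diffusers-style call arguments (assumed mapping)."""
    width, height = (int(v) for v in params["Size"].split("x"))
    return {
        "num_inference_steps": int(params["Steps"]),
        "guidance_scale": float(params["CFG scale"]),
        "width": width,
        "height": height,
        "seed": int(params["Seed"]),
    }

def generate(prompt: str, params: dict[str, str]):
    """Not executed here: needs torch, diffusers, and the 14B model weights."""
    import torch
    from diffusers import Cosmos2TextToImagePipeline

    kwargs = to_pipeline_kwargs(params)
    seed = kwargs.pop("seed")  # the seed goes into a generator, not the call itself
    pipe = Cosmos2TextToImagePipeline.from_pretrained(
        "nvidia/Cosmos-Predict2-14B-Text2Image", torch_dtype=torch.bfloat16
    )
    generator = torch.Generator().manual_seed(seed)
    return pipe(prompt=prompt, generator=generator, **kwargs).images[0]

print(to_pipeline_kwargs(parse_parameters(PARAMETERS)))
```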


Image grid: columns STEPS 8, 16, 20, 32; rows CFG=1, CFG=2, CFG=6, CFG=8

Part 3 - Legs and ribbon



Image grid: columns STEPS 8, 16, 20, 32; rows CFG=1, CFG=2, CFG=6, CFG=8

System Info

app: sdnext.git updated: 2025-06-28 hash: bcea748d url: https://github.com/vladmandic/sdnext.git/tree/dev
arch: x86_64 cpu: x86_64 system: Linux release: 6.11.0-28-generic
python: 3.12.3 Torch 2.7.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex: 
ram: free:96.45 used:28.88 total:125.33
xformers: diffusers: 0.35.0.dev0 transformers: 4.53.0
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: Diffusers/nvidia/Cosmos-Predict2-14B-Text2Image [015332720f] refiner: none vae: none te: none unet: none

Model Data

Model: Diffusers/nvidia/Cosmos-Predict2-14B-Text2Image
Type: cosmos
Class: Cosmos2TextToImagePipeline
Size: 0 bytes
Modified: 2025-06-29 12:17:45

SD.Next dev 2025-06-29

Module | Class | Device | DType | Params | Modules | Config
vae | AutoencoderKLWan | xpu:0 | torch.bfloat16 | 126892531 | 260 | FrozenDict({'base_dim': 96, 'z_dim': 16, 'dim_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_scales': [], 'temperal_downsample': [False, True, True], 'dropout': 0.0, 'latents_mean': [-0.7571, -0.7089, -0.9113, 0.1075, -0.1745, 0.9653, -0.1517, 1.5508, 0.4134, -0.0715, 0.5517, -0.3632, -0.1922, -0.9497, 0.2503, -0.2921], 'latents_std': [2.8184, 1.4541, 2.3275, 2.6558, 1.2196, 1.7708, 2.6052, 2.0743, 3.2687, 2.1526, 2.8652, 1.5579, 1.6382, 1.1253, 2.8251, 1.916], '_class_name': 'AutoencoderKLWan', '_diffusers_version': '0.34.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--nvidia--Cosmos-Predict2-14B-Text2Image/snapshots/015332720f70dd7b497c1cff9fd0c936a77f160b/vae'})
text_encoder | T5EncoderModel | xpu:0 | torch.bfloat16 | 4864791552 | 439 | T5Config { "architectures": [ "T5EncoderModel" ], "classifier_dropout": 0.0, "d_ff": 65536, "d_kv": 128, "d_model": 1024, "decoder_start_token_id": 0, "dense_act_fn": "relu", "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "relu", "initializer_factor": 1.0, "is_encoder_decoder": true, "is_gated_act": false, "layer_norm_epsilon": 1e-06, "model_type": "t5", "n_positions": 512, "num_decoder_layers": 24, "num_heads": 128, "num_layers": 24, "output_past": true, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "task_specific_params": { "summarization": { "early_stopping": true, "length_penalty": 2.0, "max_length": 200, "min_length": 30, "no_repeat_ngram_size": 3, "num_beams": 4, "prefix": "summarize: " }, "translation_en_to_de": { "early_stopping": true, "max_length": 300, "num_beams": 4, "prefix": "translate English to German: " }, "translation_en_to_fr": { "early_stopping": true, "max_length": 300, "num_beams": 4, "prefix": "translate English to French: " }, "translation_en_to_ro": { "early_stopping": true, "max_length": 300, "num_beams": 4, "prefix": "translate English to Romanian: " } }, "torch_dtype": "bfloat16", "transformers_version": "4.53.0", "use_cache": true, "vocab_size": 32128 }
tokenizer | T5TokenizerFast | None | None | 0 | 0 | None
transformer | CosmosTransformer3DModel | cpu | torch.bfloat16 | 14265265152 | 1458 | FrozenDict({'in_channels': 16, 'out_channels': 16, 'num_attention_heads': 40, 'attention_head_dim': 128, 'num_layers': 36, 'mlp_ratio': 4.0, 'text_embed_dim': 1024, 'adaln_lora_dim': 256, 'max_size': [128, 240, 240], 'patch_size': [1, 2, 2], 'rope_scale': [1.0, 4.0, 4.0], 'concat_padding_mask': True, 'extra_pos_embed_type': None, '_class_name': 'CosmosTransformer3DModel', '_diffusers_version': '0.34.0.dev0', '_name_or_path': 'nvidia/Cosmos-Predict2-14B-Text2Image'})
scheduler | FlowMatchEulerDiscreteScheduler | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'shift': 1.0, 'use_dynamic_shifting': False, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': True, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.34.0.dev0', 'final_sigmas_type': 'sigma_min', 'sigma_data': 1.0, 'sigma_max': 80.0, 'sigma_min': 0.002})
safety_checker | Fake_safety_checker | None | None | 0 | 0 | None
_name_or_path | str | None | None | 0 | 0 | None
_class_name | str | None | None | 0 | 0 | None
_diffusers_version | str | None | None | 0 | 0 | None
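
The Params column gives a back-of-the-envelope figure for the weight footprint: every listed module uses torch.bfloat16, so each parameter takes 2 bytes. The sketch below does that arithmetic; `weight_mib` is a hypothetical helper, and activations, caches, and allocator overhead are ignored.

```python
# Parameter counts copied from the Params column of the module table.
PARAMS = {
    "vae": 126_892_531,
    "text_encoder": 4_864_791_552,
    "transformer": 14_265_265_152,
}
BYTES_PER_PARAM = 2  # torch.bfloat16 is 16 bits wide

def weight_mib(n_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Size of the raw weights in MiB (2**20 bytes)."""
    return n_params * bytes_per_param / 2**20

for name, n in PARAMS.items():
    print(f"{name:>12}: {weight_mib(n):9.0f} MiB")
total_mib = weight_mib(sum(PARAMS.values()))
print(f"{'total':>12}: {total_mib:9.0f} MiB")
```

The total comes to roughly 36,730 MiB of weights, which can be set against the 41668 MB that the Execution lines report for the GPU. Whether SD.Next's "MB" means MiB or 10^6 bytes is not stated on this page.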

SD.Next master 2025-06-30

Module | Class | Device | DType | Params | Modules | Config
vae | AutoencoderKLWan | xpu:0 | torch.bfloat16 | 126892531 | 260 | FrozenDict({'base_dim': 96, 'z_dim': 16, 'dim_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_scales': [], 'temperal_downsample': [False, True, True], 'dropout': 0.0, 'latents_mean': [-0.7571, -0.7089, -0.9113, 0.1075, -0.1745, 0.9653, -0.1517, 1.5508, 0.4134, -0.0715, 0.5517, -0.3632, -0.1922, -0.9497, 0.2503, -0.2921], 'latents_std': [2.8184, 1.4541, 2.3275, 2.6558, 1.2196, 1.7708, 2.6052, 2.0743, 3.2687, 2.1526, 2.8652, 1.5579, 1.6382, 1.1253, 2.8251, 1.916], '_class_name': 'AutoencoderKLWan', '_diffusers_version': '0.34.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--nvidia--Cosmos-Predict2-14B-Text2Image/snapshots/015332720f70dd7b497c1cff9fd0c936a77f160b/vae'})
text_encoder | T5EncoderModel | xpu:0 | torch.bfloat16 | 4864791552 | 439 | T5Config { "architectures": [ "T5EncoderModel" ], "classifier_dropout": 0.0, "d_ff": 65536, "d_kv": 128, "d_model": 1024, "decoder_start_token_id": 0, "dense_act_fn": "relu", "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "relu", "initializer_factor": 1.0, "is_encoder_decoder": true, "is_gated_act": false, "layer_norm_epsilon": 1e-06, "model_type": "t5", "n_positions": 512, "num_decoder_layers": 24, "num_heads": 128, "num_layers": 24, "output_past": true, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "task_specific_params": { "summarization": { "early_stopping": true, "length_penalty": 2.0, "max_length": 200, "min_length": 30, "no_repeat_ngram_size": 3, "num_beams": 4, "prefix": "summarize: " }, "translation_en_to_de": { "early_stopping": true, "max_length": 300, "num_beams": 4, "prefix": "translate English to German: " }, "translation_en_to_fr": { "early_stopping": true, "max_length": 300, "num_beams": 4, "prefix": "translate English to French: " }, "translation_en_to_ro": { "early_stopping": true, "max_length": 300, "num_beams": 4, "prefix": "translate English to Romanian: " } }, "torch_dtype": "bfloat16", "transformers_version": "4.53.0", "use_cache": true, "vocab_size": 32128 }
tokenizer | T5TokenizerFast | None | None | 0 | 0 | None
transformer | CosmosTransformer3DModel | xpu:0 | torch.bfloat16 | 14265265152 | 1458 | FrozenDict({'in_channels': 16, 'out_channels': 16, 'num_attention_heads': 40, 'attention_head_dim': 128, 'num_layers': 36, 'mlp_ratio': 4.0, 'text_embed_dim': 1024, 'adaln_lora_dim': 256, 'max_size': [128, 240, 240], 'patch_size': [1, 2, 2], 'rope_scale': [1.0, 4.0, 4.0], 'concat_padding_mask': True, 'extra_pos_embed_type': None, '_class_name': 'CosmosTransformer3DModel', '_diffusers_version': '0.34.0.dev0', '_name_or_path': 'nvidia/Cosmos-Predict2-14B-Text2Image'})
scheduler | FlowMatchEulerDiscreteScheduler | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'shift': 1.0, 'use_dynamic_shifting': False, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': True, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.34.0.dev0', 'final_sigmas_type': 'sigma_min', 'sigma_data': 1.0, 'sigma_max': 80.0, 'sigma_min': 0.002})
safety_checker | Fake_safety_checker | None | None | 0 | 0 | None
_name_or_path | str | None | None | 0 | 0 | None
_class_name | str | None | None | 0 | 0 | None
_diffusers_version | str | None | None | 0 | 0 | None
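
The scheduler config sets `use_karras_sigmas: True` with `sigma_min: 0.002` and `sigma_max: 80.0`. As a sketch of what such a schedule looks like, the function below implements the noise-level spacing from Karras et al. (2022). The value `rho = 7.0` is an assumption (it does not appear in the config), and any further processing that SD.Next or diffusers applies to these values for flow matching is not covered here.

```python
def karras_sigmas(n: int, sigma_min: float = 0.002, sigma_max: float = 80.0,
                  rho: float = 7.0) -> list[float]:
    """Return n noise levels from sigma_max down to sigma_min (requires n >= 2).

    Interpolates linearly in sigma**(1/rho) space, then raises back to the
    power rho, which concentrates steps near the low-noise end.
    """
    if n < 2:
        raise ValueError("need at least two sigmas")
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    return [(max_inv + (i / (n - 1)) * (min_inv - max_inv)) ** rho for i in range(n)]

# 20 levels, matching the step count used in Part 1.
sigmas = karras_sigmas(20)
print([round(s, 4) for s in sigmas[:3]], "...", round(sigmas[-1], 4))
```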