...
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
Parameters: Steps: 32 | Size: 2048x2048 | Seed: 1931701040 | CFG scale: 1.5 | App: SD.Next | Version: 1aee3cc | Pipeline: HunyuanImagePipeline | Operations: txt2img | Model: HunyuanImage-2.1-Diffusers
Time: 54m 24.69s | total 3325.52 pipeline 3264.59 vae 17.23 offload 15.34 onload 14.21 te 8.61 callback 5.39 | GPU 52616 MB 41% | RAM 98.26 GB 78%
...
| CFG \ Steps | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|
| CFG1 | | | | | |
| CFG2 | | | | | |
| CFG3 | | | | | |
| CFG4 | | | | | |
| CFG5 | | | | | |
| CFG6 | | | | | |
| CFG8 | | | | | |
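A sweep like the grid above can be scripted as a loop over steps and CFG scale. This is an illustrative sketch only: `HunyuanImagePipeline` is the pipeline class named in the run metadata, but the exact diffusers API may differ by version, and generating the full grid needs the model weights and hardware comparable to the Arc system described below.

```python
from itertools import product

# Axes of the Test 1 grid: sampling steps across, CFG scale down.
STEPS = [4, 8, 16, 32, 64]
CFGS = [1, 2, 3, 4, 5, 6, 8]

def sweep_grid(cfgs=CFGS, steps=STEPS):
    """All (cfg, steps) combinations, row-major by CFG."""
    return list(product(cfgs, steps))

def run_sweep(prompt: str, seed: int = 1931701040) -> None:
    """Generate one image per grid cell. Illustrative only: requires the
    model weights and a large-memory GPU."""
    import torch
    from diffusers import HunyuanImagePipeline  # pipeline named in the run metadata

    pipe = HunyuanImagePipeline.from_pretrained(
        "hunyuanvideo-community/HunyuanImage-2.1-Diffusers",
        torch_dtype=torch.bfloat16,
    ).to("xpu")  # "cuda" on NVIDIA hardware
    for cfg, steps in sweep_grid():
        image = pipe(
            prompt,
            num_inference_steps=steps,
            guidance_scale=cfg,
            generator=torch.Generator().manual_seed(seed),  # same seed per cell
        ).images[0]
        image.save(f"cfg{cfg}_steps{steps}.png")
```

Keeping the seed fixed per cell, as in the metadata above, makes the grid compare only the steps/CFG axes.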
Test 2 - Face and hand
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
| CFG \ Steps | 8 | 16 | 20 | 32 |
|---|---|---|---|---|
| CFG3 | | | | |
Test 3 - Legs
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
| CFG \ Steps | 8 | 16 | 20 | 32 |
|---|---|---|---|---|
| CFG3 | | | | |
Test 4 - Other model Covers
512px
1024px
2048px
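The three resolutions map to small latent grids because of the VAE's aggressive compression: the VAE config in the model-info tables later on this page reports `spatial_compression_ratio: 32` and `latent_channels: 64`. A minimal sketch of the implied latent shapes:

```python
def latent_shape(height: int, width: int,
                 latent_channels: int = 64, compression: int = 32):
    """Latent grid implied by the VAE config
    (spatial_compression_ratio=32, latent_channels=64)."""
    if height % compression or width % compression:
        raise ValueError("dimensions must be multiples of the compression ratio")
    return (latent_channels, height // compression, width // compression)

# The three tested resolutions:
for side in (512, 1024, 2048):
    print(side, latent_shape(side, side))  # 512 -> (64, 16, 16), etc.
```

So even the 2048px renders work on a 64x64 latent grid, which is why the transformer's sequence length stays manageable.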
System info
```
Sat Oct 25 12:53:29 2025
app: sdnext.git updated: 2025-11-20 hash: 187943c3e url: https://github.com/liutyi/sdnext/tree/pytorch
arch: x86_64 cpu: x86_64 system: Linux release: 6.14.0-36-generic
Python: 3.12.3 Torch: 2.9.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex:
ram: free:119.21 used:6.12 total:125.33
xformers: diffusers: 0.36.0.dev0 transformers: 4.57.1
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers refiner: none vae: none te: none unet: none
Backend: ipex Pipeline: native Memory optimization: none Cross-attention: Scaled-Dot-Product
```
Config
```
"huggingface_token": "hf_..FraU",
"diffusers_version": "cd3bbe2910666880307b84729176203f5785ff7e",
"sd_model_checkpoint": "hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers",
"sd_checkpoint_hash": null,
"schedulers_shift": 4,
"diffusers_offload_mode": "none",
"diffusers_to_gpu": true,
"device_map": "gpu",
"show_progress_type": "Approximate",
"ui_request_timeout": 300000
```
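The `"schedulers_shift": 4` setting above ends up as `shift: 4.0` in the FlowMatchEulerDiscreteScheduler config. A sketch of what that static shift does, assuming the standard flow-match formula diffusers applies when `use_dynamic_shifting` is False:

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Static flow-match timestep shift: pushes each noise level sigma in
    [0, 1] toward 1, so more of the step budget is spent at high noise."""
    return shift * sigma / (1 + (shift - 1) * sigma)

# shift=4 (this run) applied to an unshifted 4-step schedule:
base = [1.0, 0.75, 0.5, 0.25]
shifted = [shift_sigma(s, 4.0) for s in base]  # sigma=1.0 stays 1.0; interior values rise
```

Spending more steps at high noise matters most for few-step distilled sampling, which is presumably why this run overrides the shift.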
Model info
hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers
| Module | Class | Device | Dtype | Quant | Params | Modules | Config |
|---|---|---|---|---|---|---|---|
| vae | AutoencoderKLHunyuanImage | xpu:0 | torch.bfloat16 | None | 405575491 | 255 | FrozenDict({'in_channels': 3, 'out_channels': 3, 'latent_channels': 64, 'block_out_channels': [128, 256, 512, 512, 1024, 1024], 'layers_per_block': 2, 'spatial_compression_ratio': 32, 'sample_size': 384, 'scaling_factor': 0.75289, 'downsample_match_channel': True, 'upsample_match_channel': True, '_class_name': 'AutoencoderKLHunyuanImage', '_diffusers_version': '0.36.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--hunyuanvideo-community--HunyuanImage-2.1-Distilled-Diffusers/snapshots/2effeb8511aee5b2ed94984d30c630203404173b/vae'}) |
| text_encoder | Qwen2_5_VLForConditionalGeneration | xpu:0 | torch.bfloat16 | None | 8292166656 | 763 | Qwen2_5_VLConfig { " |
| tokenizer | Qwen2Tokenizer | None | None | None | 0 | 0 | None |
| text_encoder_2 | T5EncoderModel | xpu:0 | torch.bfloat16 | None | 219314944 | 235 | T5Config { "architectures": [ "T5EncoderModel" ], "classifier_dropout": 0.0, "d_ff": 3584, "d_kv": 64, "d_model": 1472, "decoder_start_token_id": 0, "dense_act_fn": "gelu_new", "dropout_rate": 0.1, "dtype": "bfloat16", "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "gradient_checkpointing": false, "initializer_factor": 1.0, "is_encoder_decoder": false, "is_gated_act": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 4, "num_heads": 6, "num_layers": 12, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "tokenizer_class": "ByT5Tokenizer", "transformers_version": "4.57.1", "use_cache": false, "vocab_size": 1510 } |
| tokenizer_2 | ByT5Tokenizer | None | None | None | 0 | 0 | None |
| transformer | HunyuanImageTransformer2DModel | xpu:0 | torch.bfloat16 | None | 17453334976 | 1406 | FrozenDict({'in_channels': 64, 'out_channels': 64, 'num_attention_heads': 28, 'attention_head_dim': 128, 'num_layers': 20, 'num_single_layers': 40, 'num_refiner_layers': 2, 'mlp_ratio': 4.0, 'patch_size': [1, 1], 'qk_norm': 'rms_norm', 'guidance_embeds': True, 'text_embed_dim': 3584, 'text_embed_2_dim': 1472, 'rope_theta': 256.0, 'rope_axes_dim': [64, 64], 'use_meanflow': True, '_class_name': 'HunyuanImageTransformer2DModel', '_diffusers_version': '0.36.0.dev0', '_name_or_path': 'hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers'}) |
| scheduler | FlowMatchEulerDiscreteScheduler | None | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'shift': 4.0, 'use_dynamic_shifting': False, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.36.0.dev0'}) |
| guider | NoneType | None | None | None | 0 | 0 | None |
| ocr_guider | NoneType | None | None | None | 0 | 0 | None |
Test 2 - Face and hand
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
...
CFG3
Test 3 - Legs
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
...
CFG3
Test 4 - Other model Covers
512px
1024px
2048px
System info
```
Sat Oct 25 12:53:29 2025
app: sdnext.git updated: 2025-10-24 hash: 88ac83839 url: https://github.com/liutyi/sdnext.git/tree/pytorch
arch: x86_64 cpu: x86_64 system: Linux release: 6.14.0-33-generic
Python: 3.12.3 Torch: 2.9.0+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex:
ram: free:119.7 used:5.63 total:125.33
xformers: diffusers: 0.36.0.dev0 transformers: 4.57.1
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: Diffusers/hunyuanvideo-community/HunyuanImage-2.1-Diffusers [7e7b7a177d] refiner: none vae: none te: none unet: none
Backend: ipex Pipeline: native Memory optimization: none Cross-attention: Scaled-Dot-Product
```
Config
```
"huggingface_token": "hf_..FraU",
"diffusers_version": "7536f647e4144c7acaf9e140893ff7edb85bf9a3",
"sd_model_checkpoint": "hunyuanvideo-community/HunyuanImage-2.1-Diffusers",
"sd_checkpoint_hash": null,
"diffusers_to_gpu": true,
"device_map": "gpu",
"model_wan_stage": "combined",
"diffusers_offload_mode": "none",
"ui_request_timeout": 300000,
"show_progress_type": "Simple"
```
Model info
hunyuanvideo-community/HunyuanImage-2.1-Diffusers [7e7b7a177d]
| Module | Class | Device | Dtype | Quant | Params | Modules | Config |
|---|---|---|---|---|---|---|---|
| vae | AutoencoderKLHunyuanImage | cpu | torch.bfloat16 | None | 405575491 | 255 | FrozenDict({'in_channels': 3, 'out_channels': 3, 'latent_channels': 64, 'block_out_channels': [128, 256, 512, 512, 1024, 1024], 'layers_per_block': 2, 'spatial_compression_ratio': 32, 'sample_size': 384, 'scaling_factor': 0.75289, 'downsample_match_channel': True, 'upsample_match_channel': True, '_class_name': 'AutoencoderKLHunyuanImage', '_diffusers_version': '0.36.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--hunyuanvideo-community--HunyuanImage-2.1-Diffusers/snapshots/7e7b7a177de58591aeaffca0929f4765003d7ced/vae'}) |
| text_encoder | Qwen2_5_VLForConditionalGeneration | xpu:0 | torch.bfloat16 | None | 8292166656 | 763 | Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "_name_or_path": "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl_text", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }, "tie_word_embeddings": false, "transformers_version": "4.57.1", "use_cache": true, "use_sliding_window": false, "vision_config": { "depth": 32, "dtype": "bfloat16", "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_token_id": 151654, "vocab_size": 152064 } |
| tokenizer | Qwen2Tokenizer | None | None | None | 0 | 0 | None |
| text_encoder_2 | T5EncoderModel | xpu:0 | torch.bfloat16 | None | 219314944 | 235 | T5Config { "architectures": [ "T5EncoderModel" ], "classifier_dropout": 0.0, "d_ff": 3584, "d_kv": 64, "d_model": 1472, "decoder_start_token_id": 0, "dense_act_fn": "gelu_new", "dropout_rate": 0.1, "dtype": "bfloat16", "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "gradient_checkpointing": false, "initializer_factor": 1.0, "is_encoder_decoder": false, "is_gated_act": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 4, "num_heads": 6, "num_layers": 12, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "tokenizer_class": "ByT5Tokenizer", "transformers_version": "4.57.1", "use_cache": false, "vocab_size": 1510 } |
| tokenizer_2 | ByT5Tokenizer | None | None | None | 0 | 0 | None |
| transformer | HunyuanImageTransformer2DModel | xpu:0 | torch.bfloat16 | None | 17425795520 | 1397 | FrozenDict({'in_channels': 64, 'out_channels': 64, 'num_attention_heads': 28, 'attention_head_dim': 128, 'num_layers': 20, 'num_single_layers': 40, 'num_refiner_layers': 2, 'mlp_ratio': 4.0, 'patch_size': [1, 1], 'qk_norm': 'rms_norm', 'guidance_embeds': False, 'text_embed_dim': 3584, 'text_embed_2_dim': 1472, 'rope_theta': 256.0, 'rope_axes_dim': [64, 64], 'use_meanflow': False, '_use_default_values': ['use_meanflow'], '_class_name': 'HunyuanImageTransformer2DModel', '_diffusers_version': '0.36.0.dev0', '_name_or_path': 'hunyuanvideo-community/HunyuanImage-2.1-Diffusers'}) |
| scheduler | FlowMatchEulerDiscreteScheduler | None | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'shift': 5.0, 'use_dynamic_shifting': False, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.36.0.dev0'}) |
| guider | AdaptiveProjectedMixGuidance | None | None | None | 0 | 0 | FrozenDict({'guidance_scale': 3.5, 'guidance_rescale': 0.0, 'adaptive_projected_guidance_scale': 10.0, 'adaptive_projected_guidance_momentum': -0.5, 'adaptive_projected_guidance_rescale': 10.0, 'eta': 0.0, 'use_original_formulation': False, 'start': 0.0, 'stop': 1.0, 'adaptive_projected_guidance_start_step': 5, 'enabled': True, '_class_name': 'AdaptiveProjectedMixGuidance', '_diffusers_version': '0.36.0.dev0'}) |
| ocr_guider | AdaptiveProjectedMixGuidance | None | None | None | 0 | 0 | FrozenDict({'guidance_scale': 3, 'guidance_rescale': 0.0, 'adaptive_projected_guidance_scale': 10.0, 'adaptive_projected_guidance_momentum': -0.5, 'adaptive_projected_guidance_rescale': 10.0, 'eta': 0.0, 'use_original_formulation': False, 'start': 0.0, 'stop': 1.0, 'adaptive_projected_guidance_start_step': 38, 'enabled': True, '_class_name': 'AdaptiveProjectedMixGuidance', '_diffusers_version': '0.36.0.dev0'}) |
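The guider entries in the model-info tables (AdaptiveProjectedMixGuidance) are elaborations of plain classifier-free guidance, whose scale is the CFG axis swept in the tests above. A minimal sketch of the vanilla combination only, not the adaptive-projected variant:

```python
def cfg_combine(uncond, cond, scale):
    """Vanilla classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one. scale=1 returns cond unchanged;
    larger scales follow the prompt harder."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

This is why CFG 1 effectively disables guidance (no extrapolation), while the distilled model, with guidance baked in (`guidance_embeds: True`), ships with both guiders set to NoneType.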