| Info |
|---|
| CFG (guidance scale) appears to be ignored, and switching between flow-match samplers also gives identical results. |
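One plausible cause worth checking (an assumption on my part, not confirmed here): the upstream OmniGen2 pipeline appears to name its guidance parameters `text_guidance_scale` / `image_guidance_scale` rather than the usual `guidance_scale`, so a generically-passed `guidance_scale` kwarg could be silently dropped. A quick way to probe a pipeline's call signature; `fake_call` below is a hypothetical stand-in, not the real `OmniGen2Pipeline.__call__`:

```python
import inspect

def accepts_kwarg(fn, name: str) -> bool:
    """True if `fn` accepts keyword argument `name` (directly or via **kwargs)."""
    params = inspect.signature(fn).parameters
    return name in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

# Hypothetical stand-in for OmniGen2Pipeline.__call__ under the assumption
# that it exposes text_guidance_scale / image_guidance_scale instead of
# the conventional guidance_scale.
def fake_call(prompt, text_guidance_scale=4.0, image_guidance_scale=1.0,
              num_inference_steps=50):
    pass

print(accepts_kwarg(fake_call, "guidance_scale"))       # False
print(accepts_kwarg(fake_call, "text_guidance_scale"))  # True
```

If the real pipeline behaves like this, a UI-level "CFG scale" slider mapped to `guidance_scale` would have no effect, matching the symptom above.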
Part 1 - Bookshop
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
...
Execution: Time: 4m 53.61s | total 293.63 pipeline 293.57 | GPU 18030 MB 14% | RAM 3.21 GB 3%
| Steps | CFG=0 | CFG=1 | CFG=2 | CFG=3 | CFG=4 | CFG=5 | CFG=6 | CFG=7 | CFG=8 | CFG=9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | | | | | | | | | | |
| 8 | | | | | | | | | | |
| 12 | | | | | | | | | | |
| 16 | | | | | | | | | | |
| 20 | | | | | | | | | | |
| 35 | | | | | | | | | | |
| 50 | | | | | | | | | | |

(image cells were lost in extraction)
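For reference, the grid above covers every combination of the listed step counts and CFG values. A small plain-Python sketch (names are illustrative) that enumerates the same cells:

```python
from itertools import product

steps_values = [4, 8, 12, 16, 20, 35, 50]
cfg_values = list(range(10))  # CFG=0 … CFG=9

# one (steps, cfg) pair per cell of the grid above
grid = list(product(steps_values, cfg_values))

print(len(grid))  # 70 generations
print(grid[0])    # (4, 0)
```

With CFG genuinely ignored, all ten images in any given row of that grid come out identical, which is what the sweep was checking for.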
Part 2 - Face and hand
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
Parameters: Steps: 4 | Size: 1024x1024 | Sampler: Euler FlowMatch | Seed: 1297385681 | CFG scale: 3 | Model: OmniGen2 | App: SD.Next | Version: d5d857a | Operations: txt2img | Pipeline: OmniGen2Pipeline
Execution: Time: 2m 31.04s | total 151.05 pipeline 150.98 | GPU 18110 MB 14% | RAM 3.11 GB 2%
Prompt: (same as above)
Parameters: Steps: 32 | Size: 1024x1024 | Seed: 432887351 | Model: OmniGen2 | App: SD.Next | Version: d5d857a | Operations: txt2img | Pipeline: OmniGen2Pipeline
Execution: Time: 20m 26.24s | total 1226.25 pipeline 1226.18 | GPU 18110 MB 14% | RAM 2.76 GB 2%
| Steps | 4 | 8 | 16 | 20 | 32 | 50 |
|---|---|---|---|---|---|---|
| Euler FlowMatch, CFG=3, Seed: 1297385681 | | | | | | |
| CFG=0, Seed: 432887351 | | | | | | |

| Steps | CFG=0 | CFG=1 | CFG=2 | CFG=3 | CFG=4 | CFG=5 | CFG=6 | CFG=7 | CFG=8 | CFG=9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | | | | | | | | | | |
| 4 | | | | | | | | | | |
| 6 | | | | | | | | | | |
| 8 | | | | | | | | | | |
| 12 | | | | | | | | | | |
| 20 | | | | | | | | | | |
| 35 | | | | | | | | | | |
| 50 | | | | | | | | | | |

(image cells were lost in extraction)
Part 3 - Legs and ribbon
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
Parameters: Steps: 50 | Size: 1024x1024 | Seed: 2560750479 | Model: OmniGen2 | App: SD.Next | Version: d5d857a | Operations: txt2img | Pipeline: OmniGen2Pipeline
Execution: Time: 31m 19.53s | total 1879.55 pipeline 1879.49 | GPU 18094 MB 14% | RAM 3.16 GB 3%
| Steps | 4 | 8 | 16 | 20 | 32 | 50 |
|---|---|---|---|---|---|---|
| Seed: 2560750479 | | | | | | |
| Seed: 432887351 | | | | | | |

| Steps | CFG=0 | CFG=1 | CFG=2 | CFG=3 | CFG=4 | CFG=5 | CFG=6 | CFG=7 | CFG=8 | CFG=9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | | | | | | | | | | |
| 4 | | | | | | | | | | |
| 6 | | | | | | | | | | |
| 8 | | | | | | | | | | |
| 12 | | | | | | | | | | |
| 16 | | | | | | | | | | |
| 20 | | | | | | | | | | |
| 35 | | | | | | | | | | |
| 50 | | | | | | | | | | |

(image cells were lost in extraction)
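Across the timed runs, the pipeline total scales almost linearly with step count, at roughly 38 s per step on this Arc GPU. A quick sanity check on the totals logged in the Execution lines above:

```python
# (pipeline total seconds, steps) taken from the Execution lines above
runs = {
    "Part 2, 4 steps": (151.05, 4),
    "Part 2, 32 steps": (1226.25, 32),
    "Part 3, 50 steps": (1879.55, 50),
}

for label, (total_s, steps) in runs.items():
    print(f"{label}: {total_s / steps:.1f} s/step")
# Part 2, 4 steps: 37.8 s/step
# Part 2, 32 steps: 38.3 s/step
# Part 3, 50 steps: 37.6 s/step
```

The near-constant per-step cost suggests fixed overhead (text encoding, VAE decode) is negligible next to the denoising loop at these step counts.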
System
```
app: sdnext.git updated: 2025-06-30 hash: 0d7c025a url: https://github.com/vladmandic/sdnext.git/tree/master
arch: x86_64 cpu: x86_64 system: Linux release: 6.11.0-29-generic python: 3.12.3
Torch 2.7.1+xpu device: Intel(R) Arc(TM) Graphics (1) ipex: ram: free:122.13 used:3.2 total:125.33 xformers:
diffusers: 0.35.0.dev0 transformers: 4.53.0
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: Diffusers/OmniGen2/OmniGen2 [453419871d] refiner: none vae: none te: none unet: none
```
...
```
Model: Diffusers/OmniGen2/OmniGen2 Type: omnigen2 Class: OmniGen2Pipeline Size: 0 bytes Modified: 2025-06-29 07:50:29
```
| Module | Class | Device | DType | Params | Modules | Config |
|---|---|---|---|---|---|---|
transformer | OmniGen2Transformer2DModel | xpu:0 | torch.bfloat16 | 3967161400 | 852 | FrozenDict({'patch_size': 2, 'in_channels': 16, 'out_channels': None, 'hidden_size': 2520, 'num_layers': 32, 'num_refiner_layers': 2, 'num_attention_heads': 21, 'num_kv_heads': 7, 'multiple_of': 256, 'ffn_dim_multiplier': None, 'norm_eps': 1e-05, 'axes_dim_rope': [40, 40, 40], 'axes_lens': [1024, 1664, 1664], 'text_feat_dim': 2048, 'timestep_scale': 1000.0, '_class_name': 'OmniGen2Transformer2DModel', '_diffusers_version': '0.33.1', '_name_or_path': 'OmniGen2/OmniGen2'}) |
vae | AutoencoderKL | xpu:0 | torch.bfloat16 | 83819683 | 241 | FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 16, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 0.3611, 'shift_factor': 0.1159, 'latents_mean': None, 'latents_std': None, 'force_upcast': True, 'use_quant_conv': False, 'use_post_quant_conv': False, 'mid_block_add_attention': True, '_class_name': 'AutoencoderKL', '_diffusers_version': '0.33.1', '_name_or_path': '/mnt/models/Diffusers/models--OmniGen2--OmniGen2/snapshots/72b7402a1ff562d16409f60d4f3bdf0e13279b5e/vae'}) |
scheduler | FlowMatchEulerDiscreteScheduler | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'dynamic_time_shift': True, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.33.1'}) |
mllm | Qwen2_5_VLForConditionalGeneration | xpu:0 | torch.bfloat16 | 3754622976 | 875 | Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 2048, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 128000, "max_window_layers": 70, "model_type": "qwen2_5_vl", "num_attention_heads": 16, "num_hidden_layers": 36, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 2048, "image_token_id": null, "initializer_range": 0.02, "intermediate_size": 11008, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 70, "model_type": "qwen2_5_vl_text", "num_attention_heads": 16, "num_hidden_layers": 36, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "use_cache": true, "use_sliding_window": false, "video_token_id": null, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 151936 }, "torch_dtype": "bfloat16", "transformers_version": "4.53.0", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "depth": 32, "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 2048, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "torch_dtype": "bfloat16", "window_size": 112 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 151936 } |
processor | Qwen2_5_VLProcessor | None | None | 0 | 0 | None |
_name_or_path | str | None | None | 0 | 0 | None |
_class_name | str | None | None | 0 | 0 | None |
_diffusers_version | str | None | None | 0 | 0 | None |