Info

CFG (Guidance scale) appears to be ignored, and switching between the different flow match samplers also produces the same result.

Part 1 - Bookshop

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

...

Execution: Time: 4m 53.61s | total 293.63 pipeline 293.57 | GPU 18030 MB 14% | RAM 3.21 GB 3%
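To double-check the CFG observation outside the UI, the sweep can be reproduced directly against the diffusers pipeline with a fixed seed; if guidance really is ignored, every column of the grid below should come out identical. This is a minimal sketch only: the repo id OmniGen2/OmniGen2 comes from the system info at the bottom of this page, but the loading details and the guidance_scale argument name are assumptions and may differ for OmniGen2Pipeline.

Code Block
# Minimal sketch (not the SD.Next code path): sweep guidance values with a fixed seed.
# Assumes the pipeline accepts a standard `guidance_scale` argument; OmniGen2Pipeline
# may name this parameter differently.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "OmniGen2/OmniGen2",            # repo id taken from the system info below
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,         # may not be needed once the pipeline ships in diffusers proper
).to("xpu")                         # Intel Arc; use "cuda" on NVIDIA

prompt = "photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling"

for cfg in range(10):               # CFG 0..9, matching the grid below
    generator = torch.Generator("cpu").manual_seed(42)  # any fixed seed, reused for every run
    image = pipe(
        prompt,
        num_inference_steps=4,
        guidance_scale=cfg,         # assumed argument name
        generator=generator,
    ).images[0]
    image.save(f"bookshop_cfg{cfg}.png")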



[Image grid: Steps 4, 8, 12, 16, 20, 35, 50 down the rows × CFG 0–9 across the columns]
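A quick way to verify that two CFG settings really produced identical output, rather than merely similar images, is to diff the saved files pixel by pixel. The file names below are the hypothetical ones from the sketch above.

Code Block
# Compare two outputs from the sweep above pixel by pixel.
import numpy as np
from PIL import Image

a = np.asarray(Image.open("bookshop_cfg0.png")).astype(np.int16)
b = np.asarray(Image.open("bookshop_cfg7.png")).astype(np.int16)

diff = np.abs(a - b)
print("max abs pixel difference:", diff.max())
print("mean abs pixel difference:", float(diff.mean()))
# max == 0 means CFG had no effect at all for this seed; small non-zero values
# would point to numerical noise rather than real guidance.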

Part 2 - Face and hand

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Parameters: Steps: 4 | Size: 1024x1024 | Sampler: Euler FlowMatch | Seed: 1297385681 | CFG scale: 3 | Model: OmniGen2 | App: SD.Next | Version: d5d857a | Operations: txt2img | Pipeline: OmniGen2Pipeline

Execution: Time: 2m 31.04s | total 151.05 pipeline 150.98 | GPU 18110 MB 14% | RAM 3.11 GB 2%


Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Parameters: Steps: 32 | Size: 1024x1024 | Seed: 432887351 | Model: OmniGen2 | App: SD.Next | Version: d5d857a | Operations: txt2img | Pipeline: OmniGen2Pipeline

Execution: Time: 20m 26.24s | total 1226.25 pipeline 1226.18 | GPU 18110 MB 14% | RAM 2.76 GB 2%


[Image grid: Steps 4, 8, 16, 20, 32, 50 across the columns; samplers Euler FlowMatch, Heun FlowMatch, UniPC FlowMatch, DPM2++ SDE FlowMatch; rows CFG=3 (Seed 1297385681) and CFG=0 (Seed 432887351)]
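The sampler rows above correspond to swapping the scheduler before generation. With plain diffusers the usual pattern looks like the sketch below. FlowMatchEulerDiscreteScheduler and FlowMatchHeunDiscreteScheduler are real diffusers classes; SD.Next's "UniPC FlowMatch" and "DPM2++ SDE FlowMatch" labels map onto other schedulers configured for flow matching, and that mapping is not shown here.

Code Block
# Minimal sketch: re-run the same prompt and seed with different flow-match schedulers.
# `pipe` is the pipeline loaded in the Part 1 sketch.
import torch
from diffusers import FlowMatchEulerDiscreteScheduler, FlowMatchHeunDiscreteScheduler

schedulers = {
    "euler_fm": FlowMatchEulerDiscreteScheduler,
    "heun_fm": FlowMatchHeunDiscreteScheduler,
}

prompt = "Create a close-up photograph of a woman's face and hand, ..."  # full Part 2 prompt

for name, cls in schedulers.items():
    pipe.scheduler = cls.from_config(pipe.scheduler.config)  # keep the trained sigma config
    generator = torch.Generator("cpu").manual_seed(1297385681)
    image = pipe(prompt, num_inference_steps=4, generator=generator).images[0]
    image.save(f"face_hand_{name}.png")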

Part 3 - Legs and ribbon

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

Parameters: Steps: 50 | Size: 1024x1024 | Seed: 2560750479 | Model: OmniGen2 | App: SD.Next | Version: d5d857a | Operations: txt2img | Pipeline: OmniGen2Pipeline

Execution: Time: 31m 19.53s | total 1879.55 pipeline 1879.49 | GPU 18094 MB 14% | RAM 3.16 GB 3%

[Image grid: Steps 4, 8, 16, 20, 32, 50 across the columns; rows Seed 2560750479 and Seed 432887351]
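One thing the Execution lines do show consistently is that wall-clock time scales almost linearly with the step count across the runs reported above. The numbers below are taken straight from those lines.

Code Block
# Per-step wall-clock time from the pipeline totals reported above.
runs = {
    "Part 2, 4 steps":  (151.05, 4),
    "Part 2, 32 steps": (1226.25, 32),
    "Part 3, 50 steps": (1879.55, 50),
}
for name, (seconds, steps) in runs.items():
    print(f"{name}: {seconds / steps:.1f} s/step")
# All three land at roughly 37.6-38.3 s/step on this Intel Arc GPU.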

System

Code Block
app: sdnext.git updated: 2025-06-30 hash: 0d7c025a url: https://github.com/vladmandic/sdnext.git/tree/master
arch: x86_64 cpu: x86_64 system: Linux release: 6.11.0-29-generic 
python: 3.12.3 Torch 2.7.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex: 
ram: free:122.13 used:3.2 total:125.33
xformers: diffusers: 0.35.0.dev0 transformers: 4.53.0
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: Diffusers/OmniGen2/OmniGen2 [453419871d] refiner: none vae: none te: none unet: none

...

Code Block
Model: Diffusers/OmniGen2/OmniGen2
Type: omnigen2
Class: OmniGen2Pipeline
Size: 0 bytes
Modified: 2025-06-29 07:50:29


Module: transformer | Class: OmniGen2Transformer2DModel | Device: xpu:0 | DType: torch.bfloat16 | Params: 3967161400 | Modules: 852
Config:
FrozenDict({'patch_size': 2, 'in_channels': 16, 'out_channels': None, 'hidden_size': 2520, 'num_layers': 32, 'num_refiner_layers': 2, 'num_attention_heads': 21, 'num_kv_heads': 7, 'multiple_of': 256, 'ffn_dim_multiplier': None, 'norm_eps': 1e-05, 'axes_dim_rope': [40, 40, 40], 'axes_lens': [1024, 1664, 1664], 'text_feat_dim': 2048, 'timestep_scale': 1000.0, '_class_name': 'OmniGen2Transformer2DModel', '_diffusers_version': '0.33.1', '_name_or_path': 'OmniGen2/OmniGen2'})

Module: vae | Class: AutoencoderKL | Device: xpu:0 | DType: torch.bfloat16 | Params: 83819683 | Modules: 241
Config:
FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 16, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 0.3611, 'shift_factor': 0.1159, 'latents_mean': None, 'latents_std': None, 'force_upcast': True, 'use_quant_conv': False, 'use_post_quant_conv': False, 'mid_block_add_attention': True, '_class_name': 'AutoencoderKL', '_diffusers_version': '0.33.1', '_name_or_path': '/mnt/models/Diffusers/models--OmniGen2--OmniGen2/snapshots/72b7402a1ff562d16409f60d4f3bdf0e13279b5e/vae'})

Module: scheduler | Class: FlowMatchEulerDiscreteScheduler | Device: None | DType: None | Params: 0 | Modules: 0
Config:
FrozenDict({'num_train_timesteps': 1000, 'dynamic_time_shift': True, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.33.1'})

Module: mllm | Class: Qwen2_5_VLForConditionalGeneration | Device: xpu:0 | DType: torch.bfloat16 | Params: 3754622976 | Modules: 875
Config:
Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 2048, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 128000, "max_window_layers": 70, "model_type": "qwen2_5_vl", "num_attention_heads": 16, "num_hidden_layers": 36, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 2048, "image_token_id": null, "initializer_range": 0.02, "intermediate_size": 11008, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 70, "model_type": "qwen2_5_vl_text", "num_attention_heads": 16, "num_hidden_layers": 36, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "use_cache": true, "use_sliding_window": false, "video_token_id": null, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 151936 }, "torch_dtype": "bfloat16", "transformers_version": "4.53.0", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "depth": 32, "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 2048, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "torch_dtype": "bfloat16", "window_size": 112 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 151936 }

Module: processor | Class: Qwen2_5_VLProcessor | Device: None | DType: None | Params: 0 | Modules: 0 | Config: None
Module: _name_or_path | Class: str | Device: None | DType: None | Params: 0 | Modules: 0 | Config: None
Module: _class_name | Class: str | Device: None | DType: None | Params: 0 | Modules: 0 | Config: None
Module: _diffusers_version | Class: str | Device: None | DType: None | Params: 0 | Modules: 0 | Config: None
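The module table above can be regenerated from any loaded pipeline; the sketch below walks pipe.components and counts parameters and sub-modules, which is roughly what the report columns contain. It assumes the `pipe` object from the earlier sketches.

Code Block
# Minimal sketch: list pipeline components with class, device, dtype, parameter and module counts.
import torch

for name, component in pipe.components.items():
    if isinstance(component, torch.nn.Module):
        params = sum(p.numel() for p in component.parameters())
        device = next(component.parameters()).device if params else "None"
        dtype = next(component.parameters()).dtype if params else "None"
        modules = sum(1 for _ in component.modules())
        print(f"{name} | {component.__class__.__name__} | {device} | {dtype} | {params} | {modules}")
    else:
        # schedulers, processors and plain strings carry no parameters
        print(f"{name} | {component.__class__.__name__} | None | None | 0 | 0")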