...

Execution: Time: 4m 53.61s | total 293.63 pipeline 293.57 | GPU 18030 MB 14% | RAM 3.21 GB 3%
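The timing and memory figures above are reported by the tool itself. For reference, a rough sketch of how comparable wall-clock, peak GPU memory, and process RAM numbers could be collected around a pipeline call is shown below; the run_pipeline callable is a placeholder, and the torch.xpu memory-stats calls are assumed to mirror the torch.cuda API on PyTorch builds with Intel GPU support.

Code Block
import time
import psutil
import torch

def measure(run_pipeline):
    # Pick an accelerator backend if one is available. torch.xpu is assumed to mirror
    # the torch.cuda memory-stats API on recent PyTorch builds with XPU support.
    backend = None
    for name in ("xpu", "cuda"):
        candidate = getattr(torch, name, None)
        if candidate is not None and candidate.is_available():
            backend = candidate
            break
    if backend is not None and hasattr(backend, "reset_peak_memory_stats"):
        backend.reset_peak_memory_stats()

    start = time.perf_counter()
    result = run_pipeline()  # placeholder for the actual pipeline call
    elapsed = time.perf_counter() - start

    gpu_str = "n/a"
    if backend is not None and hasattr(backend, "max_memory_allocated"):
        gpu_str = f"{backend.max_memory_allocated() / 2**20:.0f} MB"
    ram_gb = psutil.Process().memory_info().rss / 2**30
    print(f"time {elapsed:.2f}s | GPU {gpu_str} | RAM {ram_gb:.2f} GB")
    return result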


[Image grid: OmniGen2 outputs swept over guidance scale (columns CFG=0 through CFG=9) and sampling steps (rows 4, 8, 12, 16, 20, 35, 50); the individual images are not reproduced here.]
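The grid above is a plain two-way sweep over guidance scale and step count. A minimal sketch of how such a sweep could be scripted is shown below, assuming pipe is the already-loaded OmniGen2 pipeline (OmniGen2Pipeline, see the model info further down) and prompt is the test prompt for this part; the keyword names follow the common diffusers convention, and the real OmniGen2 call may name its guidance arguments differently (for example, separate text and image guidance scales).

Code Block
import itertools

# Assumptions: `pipe` is the loaded OmniGen2 pipeline and `prompt` is the test prompt for this part.
# A fixed seed/generator would normally be passed as well, so cells differ only in CFG and steps.
cfg_values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
step_values = [4, 8, 12, 16, 20, 35, 50]

for steps, cfg in itertools.product(step_values, cfg_values):
    image = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        guidance_scale=cfg,  # assumed kwarg name
    ).images[0]
    image.save(f"omnigen2_steps{steps:02d}_cfg{cfg}.png")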

Part 2 - Face and hand

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

...

Code Block
Model: Diffusers/OmniGen2/OmniGen2
Type: omnigen2
Class: OmniGen2Pipeline
Size: 0 bytes
Modified: 2025-06-29 07:50:29


Module: transformer
Class: OmniGen2Transformer2DModel
Device: xpu:0
DType: torch.bfloat16
Params: 3967161400
Modules: 852
Config:
FrozenDict({'patch_size': 2, 'in_channels': 16, 'out_channels': None, 'hidden_size': 2520, 'num_layers': 32, 'num_refiner_layers': 2, 'num_attention_heads': 21, 'num_kv_heads': 7, 'multiple_of': 256, 'ffn_dim_multiplier': None, 'norm_eps': 1e-05, 'axes_dim_rope': [40, 40, 40], 'axes_lens': [1024, 1664, 1664], 'text_feat_dim': 2048, 'timestep_scale': 1000.0, '_class_name': 'OmniGen2Transformer2DModel', '_diffusers_version': '0.33.1', '_name_or_path': 'OmniGen2/OmniGen2'})

Module: vae
Class: AutoencoderKL
Device: xpu:0
DType: torch.bfloat16
Params: 83819683
Modules: 241
Config:
FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 16, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 0.3611, 'shift_factor': 0.1159, 'latents_mean': None, 'latents_std': None, 'force_upcast': True, 'use_quant_conv': False, 'use_post_quant_conv': False, 'mid_block_add_attention': True, '_class_name': 'AutoencoderKL', '_diffusers_version': '0.33.1', '_name_or_path': '/mnt/models/Diffusers/models--OmniGen2--OmniGen2/snapshots/72b7402a1ff562d16409f60d4f3bdf0e13279b5e/vae'})

Module: scheduler
Class: FlowMatchEulerDiscreteScheduler
Device: None
DType: None
Params: 0
Modules: 0
Config:
FrozenDict({'num_train_timesteps': 1000, 'dynamic_time_shift': True, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.33.1'})

Module: mllm
Class: Qwen2_5_VLForConditionalGeneration
Device: xpu:0
DType: torch.bfloat16
Params: 3754622976
Modules: 875
Config:
Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 2048, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 128000, "max_window_layers": 70, "model_type": "qwen2_5_vl", "num_attention_heads": 16, "num_hidden_layers": 36, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 2048, "image_token_id": null, "initializer_range": 0.02, "intermediate_size": 11008, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 70, "model_type": "qwen2_5_vl_text", "num_attention_heads": 16, "num_hidden_layers": 36, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "use_cache": true, "use_sliding_window": false, "video_token_id": null, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 151936 }, "torch_dtype": "bfloat16", "transformers_version": "4.53.0", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "depth": 32, "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 2048, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "torch_dtype": "bfloat16", "window_size": 112 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 151936 }

Module: processor
Class: Qwen2_5_VLProcessor
Device: None
DType: None
Params: 0
Modules: 0
Config: None

Module: _name_or_path
Class: str
Device: None
DType: None
Params: 0
Modules: 0
Config: None

Module: _class_name
Class: str
Device: None
DType: None
Params: 0
Modules: 0
Config: None

Module: _diffusers_version
Class: str
Device: None
DType: None
Params: 0
Modules: 0
Config: None
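The per-module breakdown above (class, device, dtype, parameter and module counts, config) can be reproduced for any diffusers-style pipeline by walking its components. Below is a minimal sketch, assuming pipe is the loaded OmniGen2 pipeline; note that the trailing string entries (_name_or_path, _class_name, _diffusers_version) come from the pipeline config rather than from pipe.components.

Code Block
import torch

# Assumption: `pipe` is the loaded OmniGen2 pipeline; `pipe.components` is the standard
# diffusers mapping of component name -> object (models, scheduler, processor, ...).
for name, comp in pipe.components.items():
    if isinstance(comp, torch.nn.Module):
        params = sum(p.numel() for p in comp.parameters())
        n_modules = sum(1 for _ in comp.modules())
        first = next(comp.parameters(), None)
        device = first.device if first is not None else None
        dtype = first.dtype if first is not None else None
    else:
        params, n_modules, device, dtype = 0, 0, None, None
    config = getattr(comp, "config", None)
    print(f"{name}: {type(comp).__name__} | {device} | {dtype} | params={params} | modules={n_modules}")
    print(f"  config: {config}")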