Info
https://huggingface.co/Qwen/Qwen-Image-2512
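from diffusers import DiffusionPipeline
import torch
model_name = "Qwen/Qwen-Image-2512"
# Load the pipeline (assumes a CUDA device, matching the generator used below)
pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.bfloat16).to("cuda")
# Generate image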
prompt = '''A 20-year-old East Asian girl with delicate, charming features and large, bright brown eyes—expressive and lively, with a cheerful or subtly smiling expression. Her naturally wavy long hair is either loose or tied in twin ponytails. She has fair skin and light makeup accentuating her youthful freshness. She wears a modern, cute dress or relaxed outfit in bright, soft colors—lightweight fabric, minimalist cut. She stands indoors at an anime convention, surrounded by banners, posters, or stalls. Lighting is typical indoor illumination—no staged lighting—and the image resembles a casual iPhone snapshot: unpretentious composition, yet brimming with vivid, fresh, youthful charm.'''
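# Chinese negative prompt. In English: "Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text."
negative_prompt = "低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。"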
# Generate with different aspect ratios
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1104),
    "3:4": (1104, 1472),
    "3:2": (1584, 1056),
    "2:3": (1056, 1584),
}
width, height = aspect_ratios["16:9"]
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
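Some of the resolutions in `aspect_ratios` only approximate their nominal ratio: 1664x928 works out to roughly 1.79, while 16:9 is about 1.78. The sketch below is not part of the original snippet. It reports each bucket's deviation from its label and its pixel count. The helper name `describe_bucket` is hypothetical, and the divisible-by-16 check is an assumption (8x VAE downsampling times a transformer `patch_size` of 2), not a documented requirement.

```python
from fractions import Fraction

# Resolution buckets copied from the snippet above.
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1104),
    "3:4": (1104, 1472),
    "3:2": (1584, 1056),
    "2:3": (1056, 1584),
}


def describe_bucket(label, size, multiple=16):
    """Return (relative ratio error, megapixels, divisible) for one bucket.

    `multiple=16` is an assumption, not a documented requirement.
    """
    width, height = size
    num, den = (int(part) for part in label.split(":"))
    nominal = Fraction(num, den)
    actual = Fraction(width, height)
    rel_error = float(abs(actual - nominal) / nominal)
    megapixels = width * height / 1e6
    divisible = width % multiple == 0 and height % multiple == 0
    return rel_error, megapixels, divisible


report = {label: describe_bucket(label, size) for label, size in aspect_ratios.items()}

for label, (err, mp, ok) in report.items():
    print(f"{label:>5}: ratio error {err:.2%}, {mp:.2f} MP, divisible by 16: {ok}")
```

Under these assumptions, all seven buckets pass the divisibility check, and only the 16:9 and 9:16 buckets deviate from their labels.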
Negative prompt impact
Prompt (shared by all three images): Bright graffiti reads "QWEN IMAGE" on a brick wall with a stream of glowing snowflakes painted behind the text. Graffiti text done in 2 lines using clearly readable and bold font. Dark grey asphalt on a sidewalk below. Graffiti main colors are lime and mint. Large back-lighted building number sign reads "20B" is at upper left side above the graffiti. Graffiti authors signatures are "Alibaba". Young asian stay close to camera in front of the graffiti at the right. Road sign with text "Attention guidance 4"
| w/o negative | Chinese negative | English negative |
|---|---|---|
| Time: 24m 10.44s (total 1503.58, pipeline 1450.39, preview 47.38, callback 4.02, te 1.34, vae 0.40); GPU 64566 MB 51%; RAM 68.15 GB 55% | Negative: 低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。 Parameters: Steps: 50, Size: 1328x1328, Seed: 2025, CFG scale: 4, CFG true: 4, App: SD.Next, Version: 097b6ab, Pipeline: QwenImagePipeline, Operations: txt2img, Model: Qwen-Image-2512 | Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text. Parameters: Steps: 50, Size: 1328x1328, Seed: 2025, CFG scale: 4, CFG true: 4, App: SD.Next, Version: 097b6ab, Pipeline: QwenImagePipeline, Operations: txt2img, Model: Qwen-Image-2512 |
Test 0 - Different seed variations
Parameters: Steps: 50| Size: 1328x1328| CFG scale: 4| CFG true: 4
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.
| Seed: 1620085323 | Seed: 1931701040 | Seed: 4075624134 | Seed: 2736029172 |
|---|---|---|---|
Test 1 - Bookshop
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.
Parameters: Steps: 32| Size: 1328x1328| Seed: 1620085323| CFG scale: 12| CFG true: 12| App: SD.Next| Version: 097b6ab| Pipeline: QwenImagePipeline| Operations: txt2img| Model: Qwen-Image-2512
| CFG scale \ Steps | 4 | 8 | 16 | 32 |
|---|---|---|---|---|
| 1 | | | | |
| 2 | | | | |
| 3 | | | | |
| 4 | | | | |
| 6 | | | | |
| 8 | | | | |
| 10 | | | | |
| 12 | | | | |
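The grid above appears to vary sampling steps across columns and CFG scale down rows. A minimal sketch of enumerating such a sweep follows. `sweep_grid` is a hypothetical helper, and the keyword names `num_inference_steps` and `true_cfg_scale` are taken from the `pipe(...)` call in the Info section. The code only builds argument dictionaries and does not run the model.

```python
from itertools import product

# Values read off the Test 1 table: columns (steps) and rows (CFG scale).
STEPS = [4, 8, 16, 32]
CFG_SCALES = [1, 2, 3, 4, 6, 8, 10, 12]
SEED = 1620085323  # seed listed in the Test 1 parameters


def sweep_grid(steps_values, cfg_values, seed, width=1328, height=1328):
    """Build one kwargs dict per grid cell, ordered row by row (CFG outer, steps inner)."""
    return [
        {
            "num_inference_steps": steps,
            "true_cfg_scale": float(cfg),
            "width": width,
            "height": height,
            "seed": seed,
        }
        for cfg, steps in product(cfg_values, steps_values)
    ]


grid = sweep_grid(STEPS, CFG_SCALES, SEED)
print(len(grid), "runs; last cell:", grid[-1])
```

To render a cell, the `seed` entry would first be turned into a generator, for example `torch.Generator(device="cuda").manual_seed(seed)` as in the Info snippet, before the remaining keys are passed to `pipe`.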
Test 2 - Face and hand
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.
Parameters: Steps: 64| Size: 1328x1328| Seed: 2736029172| CFG scale: 6| CFG true: 6| App: SD.Next| Version: 097b6ab| Pipeline: QwenImagePipeline| Operations: txt2img| Model: Qwen-Image-2512
| AG \ Steps | 8 | 16 | 32 | 64 |
|---|---|---|---|---|
| AG1 | | | | |
| AG2 | | | | |
| AG3 | | | | |
| AG4 | | | | |
| AG5 | | | | |
| AG6 | | | | |
Test 3 - Legs
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.
Parameters: Steps: 64| Size: 1328x1328| Seed: 2736029172| CFG scale: 8| CFG true: 8| App: SD.Next| Version: 097b6ab| Pipeline: QwenImagePipeline| Operations: txt2img| Model: Qwen-Image-2512
| CFG scale \ Steps | 8 | 16 | 32 | 64 |
|---|---|---|---|---|
| 1 | | | | |
| 2 | | | | |
| 3 | | | | |
| 4 | | | | |
| 5 | | | | |
| 6 | | | | |
| 8 | | | | |
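The `Parameters:` lines in the tests above share a simple `Key: value| Key: value` layout. A small sketch of turning one into a dictionary is shown below. `parse_parameters` and `parse_size` are hypothetical helpers written for the format as it appears in this document. They are not an SD.Next API, and real metadata may contain cases this does not handle.

```python
def parse_parameters(line):
    """Parse 'Key: value| Key: value' metadata into a dict of strings."""
    text = line.strip()
    if text.startswith("Parameters:"):
        text = text[len("Parameters:"):]
    result = {}
    for field in text.split("|"):
        field = field.strip()
        if not field:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = field.partition(":")
        result[key.strip()] = value.strip()
    return result


def parse_size(size):
    """Convert a 'WIDTHxHEIGHT' string into an (int, int) tuple."""
    width, height = size.lower().split("x")
    return int(width), int(height)


# Parameters line copied from Test 3.
example = (
    "Parameters: Steps: 64| Size: 1328x1328| Seed: 2736029172| CFG scale: 8| "
    "CFG true: 8| App: SD.Next| Version: 097b6ab| Pipeline: QwenImagePipeline| "
    "Operations: txt2img| Model: Qwen-Image-2512"
)
params = parse_parameters(example)
print(params["Steps"], parse_size(params["Size"]), params["Model"])
```

All values are kept as strings. Numeric fields such as `Steps` or `Seed` need an explicit `int(...)` before reuse.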
Test 4 - Other model covers
Test 5 - Some other images
System info
Config
Model info
Qwen/Qwen-Image-2512 [25468b98e3]
| Module | Class | Device | Dtype | Quant | Params | Modules | Config |
|---|---|---|---|---|---|---|---|
| vae | AutoencoderKLQwenImage | xpu:0 | torch.bfloat16 | None | 126892531 | 260 | FrozenDict({'base_dim': 96, 'z_dim': 16, 'dim_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_scales': [], 'temperal_downsample': [False, True, True], 'dropout': 0.0, 'input_channels': 3, 'latents_mean': [-0.7571, -0.7089, -0.9113, 0.1075, -0.1745, 0.9653, -0.1517, 1.5508, 0.4134, -0.0715, 0.5517, -0.3632, -0.1922, -0.9497, 0.2503, -0.2921], 'latents_std': [2.8184, 1.4541, 2.3275, 2.6558, 1.2196, 1.7708, 2.6052, 2.0743, 3.2687, 2.1526, 2.8652, 1.5579, 1.6382, 1.1253, 2.8251, 1.916], '_use_default_values': ['input_channels'], '_class_name': 'AutoencoderKLQwenImage', '_diffusers_version': '0.36.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Qwen--Qwen-Image-2512/snapshots/25468b98e3276ca6700de15c6628e51b7de54a26/vae'}) |
| text_encoder | Qwen2_5_VLForConditionalGeneration | xpu:0 | torch.bfloat16 | None | 8292166656 | 763 | Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "_name_or_path": "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl_text", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "use_cache": true, "use_sliding_window": false, 
"video_token_id": 151656, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }, "tie_word_embeddings": false, "transformers_version": "4.57.3", "use_cache": true, "use_sliding_window": false, "vision_config": { "depth": 32, "dtype": "bfloat16", "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_token_id": 151654, "vocab_size": 152064 } |
| tokenizer | Qwen2Tokenizer | None | None | None | 0 | 0 | None |
| transformer | QwenImageTransformer2DModel | xpu:0 | torch.bfloat16 | None | 20430401088 | 2297 | FrozenDict({'patch_size': 2, 'in_channels': 64, 'out_channels': 16, 'num_layers': 60, 'attention_head_dim': 128, 'num_attention_heads': 24, 'joint_attention_dim': 3584, 'guidance_embeds': False, 'axes_dims_rope': [16, 56, 56], 'zero_cond_t': False, 'use_additional_t_cond': False, 'use_layer3d_rope': False, '_use_default_values': ['use_layer3d_rope', 'use_additional_t_cond', 'zero_cond_t'], '_class_name': 'QwenImageTransformer2DModel', '_diffusers_version': '0.36.0.dev0', '_name_or_path': 'Qwen/Qwen-Image-2512'}) |
| scheduler | FlowMatchEulerDiscreteScheduler | None | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'shift': 1.0, 'use_dynamic_shifting': True, 'base_shift': 0.5, 'max_shift': 0.9, 'base_image_seq_len': 256, 'max_image_seq_len': 8192, 'invert_sigmas': False, 'shift_terminal': 0.02, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.36.0.dev0'}) |
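The Params and Dtype columns above are enough for a rough estimate of how much memory the weights alone occupy. The sketch below sums the three listed parameter counts and assumes 2 bytes per parameter for `torch.bfloat16`. `weight_footprint` is a hypothetical helper, and the result excludes activations, latents, and any framework overhead, so it is a lower bound rather than a measured figure.

```python
# Parameter counts copied from the "Params" column of the table above.
PARAMS = {
    "vae": 126_892_531,
    "text_encoder": 8_292_166_656,
    "transformer": 20_430_401_088,
}
BYTES_PER_PARAM = 2  # torch.bfloat16 is 16 bits wide


def weight_footprint(params, bytes_per_param=BYTES_PER_PARAM):
    """Return (total parameter count, weight size in GiB)."""
    total = sum(params.values())
    return total, total * bytes_per_param / 2**30


total_params, total_gib = weight_footprint(PARAMS)
print(f"{total_params:,} parameters, about {total_gib:.1f} GiB of weights")
```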


