Info

https://huggingface.co/Qwen/Qwen-Image-2512

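```python
import torch
from diffusers import DiffusionPipeline

model_name = "Qwen/Qwen-Image-2512"

# Load the pipeline: bfloat16 on CUDA, float32 fallback on CPU
if torch.cuda.is_available():
    torch_dtype = torch.bfloat16
    device = "cuda"
else:
    torch_dtype = torch.float32
    device = "cpu"

pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype).to(device)

# Generate image
prompt = '''A 20-year-old East Asian girl with delicate, charming features and large, bright brown eyes—expressive and lively, with a cheerful or subtly smiling expression. Her naturally wavy long hair is either loose or tied in twin ponytails. She has fair skin and light makeup accentuating her youthful freshness. She wears a modern, cute dress or relaxed outfit in bright, soft colors—lightweight fabric, minimalist cut. She stands indoors at an anime convention, surrounded by banners, posters, or stalls. Lighting is typical indoor illumination—no staged lighting—and the image resembles a casual iPhone snapshot: unpretentious composition, yet brimming with vivid, fresh, youthful charm.'''

# Chinese negative prompt. English translation: "Low resolution, low image quality,
# deformed limbs, deformed fingers, oversaturated image, waxy look, faces lacking detail,
# excessive smoothing, AI-looking image. Chaotic composition. Blurry, distorted text."
negative_prompt = "低分辨率，低画质，肢体畸形，手指畸形，画面过饱和，蜡像感，人脸无细节，过度光滑，画面具有AI感。构图混乱。文字模糊，扭曲。"

# Supported aspect ratios as (width, height) in pixels
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1104),
    "3:4": (1104, 1472),
    "3:2": (1584, 1056),
    "2:3": (1056, 1584),
}

width, height = aspect_ratios["16:9"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(42),
).images[0]

image.save("example.png")
```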
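The `true_cfg_scale=4.0` argument above enables classifier-free guidance against the negative prompt. This description comes from a reading of the diffusers `QwenImagePipeline` source and is worth re-checking against the installed version. On that reading, guidance is only applied when a negative prompt is passed and `true_cfg_scale > 1`. The guided prediction is then rescaled, token by token, to the norm of the positive-prompt prediction.

The sketch below reproduces that arithmetic in plain Python on toy vectors. `true_cfg_combine` is a hypothetical helper written for illustration, not a library function.

```python
import math


def true_cfg_combine(cond, uncond, scale):
    """Guided prediction with per-token norm rescaling (illustrative sketch).

    cond, uncond: lists of per-token feature vectors (lists of floats) predicted
    with the positive and the negative prompt respectively.
    """
    guided = []
    for c_vec, u_vec in zip(cond, uncond):
        # Classifier-free guidance: push away from the negative-prompt prediction
        comb = [u + scale * (c - u) for c, u in zip(c_vec, u_vec)]
        cond_norm = math.sqrt(sum(c * c for c in c_vec))
        comb_norm = math.sqrt(sum(x * x for x in comb))
        # Rescale the guided vector back to the positive prediction's norm
        ratio = cond_norm / comb_norm if comb_norm > 0 else 0.0
        guided.append([x * ratio for x in comb])
    return guided


# Toy example: one token with two features
result = true_cfg_combine([[1.0, 0.0]], [[0.0, 1.0]], scale=4.0)
print(result)
```

With scale 4 the raw guided vector is [4, -3], which the rescaling shrinks back to unit length. In this formulation the negative prompt changes the direction of the prediction but not its magnitude.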

Negative prompt impact

Three runs with the same prompt and settings, differing only in the negative prompt: none, the Chinese negative prompt from the Info example, and its English translation.

Prompt (all runs): Bright graffiti reads "QWEN IMAGE" on a brick wall with a stream of glowing snowflakes painted behind the text. Graffiti text done in 2 lines using clearly readable and bold font. Dark grey asphalt on a sidewalk below. Graffiti main colors are lime and mint. Large back-lighted building number sign reads "20B" is at upper left side above the graffiti. Graffiti authors signatures are "Alibaba". Young asian stay close to camera in front of the graffiti at the right. Road sign with text "Attention guidance 4"

Parameters (all runs): Steps: 50 | Size: 1328x1328 | Seed: 2025 | CFG scale: 4 | CFG true: 4 | App: SD.Next | Version: 097b6ab | Pipeline: QwenImagePipeline | Operations: txt2img | Model: Qwen-Image-2512

Without negative: Sampler: Euler FlowMatch | Time: 24m 10.44s | total 1503.58 pipeline 1450.39 preview 47.38 callback 4.02 te 1.34 vae 0.40 | GPU 64566 MB 51% | RAM 68.15 GB 55%

Chinese negative: Negative: 低分辨率，低画质，肢体畸形，手指畸形，画面过饱和，蜡像感，人脸无细节，过度光滑，画面具有AI感。构图混乱。文字模糊，扭曲。 | Time: 24m 5.53s | total 1459.56 pipeline 1445.49 preview 8.28 callback 4.03 te 1.32 vae 0.40 | GPU 64566 MB 51% | RAM 68.29 GB 55%

English negative: Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.
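The run metadata reports GPU 64566 MB. That figure can be compared with the weight footprint implied by the Params column of the Model info table.

A back-of-the-envelope sketch follows. It assumes all three modules are held unquantized in bfloat16 (2 bytes per parameter) and are resident on the device at the same time, as the Device column (`xpu:0` for each) suggests. `weight_bytes` is a hypothetical helper.

```python
# Parameter counts copied from the Params column of the Model info table
PARAMS = {
    "vae": 126_892_531,
    "text_encoder": 8_292_166_656,
    "transformer": 20_430_401_088,
}
BYTES_PER_PARAM = 2  # torch.bfloat16, no quantization (Quant column: None)


def weight_bytes(params, bytes_per_param=BYTES_PER_PARAM):
    """Memory needed just to hold the weights, ignoring activations and buffers."""
    return sum(params.values()) * bytes_per_param


total_params = sum(PARAMS.values())
total_bytes = weight_bytes(PARAMS)
print(f"{total_params:,} parameters")
print(f"{total_bytes / 2**20:,.0f} MiB of bf16 weights")
```

This gives 28,849,460,275 parameters, or about 55,026 MiB for the weights alone. If SD.Next's "MB" figure is actually MiB, roughly 9,500 MiB of the reported 64566 would remain for activations, buffers and runtime overhead. That split is an inference, not something the logs state.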

Test 0 - Different seed variations

Parameters: Steps: 50| Size: 1328x1328| CFG scale: 4| CFG true: 4

Prompt 1: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Prompt 2: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Prompt 3: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

Negative (all prompts): Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.

Seeds (grid columns): 1620085323 | 1931701040 | 4075624134 | 2736029172

Test 1 - Bookshop

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.

Image grid: columns 4, 8, 16, 32; rows 1, 2, 3, 4, 6, 8, 10, 12

Test 2 - Face and hand

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.

Image grid: columns 8, 16, 32, 64; rows AG1, AG2, AG3, AG4, AG5, AG6

Test 3 - Legs

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.

Image grid: columns 8, 16, 32, 64; rows 1, 2, 3, 4, 5, 6, 8

Test 4 - Other model covers

Test 5 - Some other images

System info

Config

Model info

Qwen/Qwen-Image-2512 [25468b98e3]

Module	Class	Device	Dtype	Quant	Params	Modules	Config
vae	AutoencoderKLQwenImage	xpu:0	torch.bfloat16	None	126892531	260	FrozenDict({'base_dim': 96, 'z_dim': 16, 'dim_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_scales': [], 'temperal_downsample': [False, True, True], 'dropout': 0.0, 'input_channels': 3, 'latents_mean': [-0.7571, -0.7089, -0.9113, 0.1075, -0.1745, 0.9653, -0.1517, 1.5508, 0.4134, -0.0715, 0.5517, -0.3632, -0.1922, -0.9497, 0.2503, -0.2921], 'latents_std': [2.8184, 1.4541, 2.3275, 2.6558, 1.2196, 1.7708, 2.6052, 2.0743, 3.2687, 2.1526, 2.8652, 1.5579, 1.6382, 1.1253, 2.8251, 1.916], '_use_default_values': ['input_channels'], '_class_name': 'AutoencoderKLQwenImage', '_diffusers_version': '0.36.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Qwen--Qwen-Image-2512/snapshots/25468b98e3276ca6700de15c6628e51b7de54a26/vae'})
text_encoder	Qwen2_5_VLForConditionalGeneration	xpu:0	torch.bfloat16	None	8292166656	763	Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "_name_or_path": "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl_text", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, 
"vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }, "tie_word_embeddings": false, "transformers_version": "4.57.3", "use_cache": true, "use_sliding_window": false, "vision_config": { "depth": 32, "dtype": "bfloat16", "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_token_id": 151654, "vocab_size": 152064 }
tokenizer	Qwen2Tokenizer	None	None	None	0	0	None
transformer	QwenImageTransformer2DModel	xpu:0	torch.bfloat16	None	20430401088	2297	FrozenDict({'patch_size': 2, 'in_channels': 64, 'out_channels': 16, 'num_layers': 60, 'attention_head_dim': 128, 'num_attention_heads': 24, 'joint_attention_dim': 3584, 'guidance_embeds': False, 'axes_dims_rope': [16, 56, 56], 'zero_cond_t': False, 'use_additional_t_cond': False, 'use_layer3d_rope': False, '_use_default_values': ['use_layer3d_rope', 'use_additional_t_cond', 'zero_cond_t'], '_class_name': 'QwenImageTransformer2DModel', '_diffusers_version': '0.36.0.dev0', '_name_or_path': 'Qwen/Qwen-Image-2512'})
scheduler	FlowMatchEulerDiscreteScheduler	None	None	None	0	0	FrozenDict({'num_train_timesteps': 1000, 'shift': 1.0, 'use_dynamic_shifting': True, 'base_shift': 0.5, 'max_shift': 0.9, 'base_image_seq_len': 256, 'max_image_seq_len': 8192, 'invert_sigmas': False, 'shift_terminal': 0.02, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.36.0.dev0'})
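The scheduler row has `use_dynamic_shifting: True`, so the noise schedule depends on the output resolution. The sketch below is a plain-Python reimplementation for illustration, not the diffusers code, and is worth checking against the installed version. It rests on these assumptions, drawn from the diffusers `QwenImagePipeline` and `FlowMatchEulerDiscreteScheduler` behaviour:

- **Tokens per image.** There is one transformer token per 16x16 pixel block. This combines 8x spatial compression from the VAE's three downsampling stages with `patch_size: 2` in the transformer config.
- **Starting sigmas.** The pipeline starts from linear sigmas running from 1 down to 1/steps.
- **Shift and stretch.** The scheduler applies the exponential time shift and then stretches the schedule so that it ends at `shift_terminal`.

All function names are hypothetical helpers.

```python
import math

# Values from the FlowMatchEulerDiscreteScheduler row of the Model info table
BASE_SHIFT, MAX_SHIFT = 0.5, 0.9
BASE_SEQ_LEN, MAX_SEQ_LEN = 256, 8192
SHIFT_TERMINAL = 0.02


def image_seq_len(width, height, vae_scale=8, patch=2):
    """Transformer tokens per image: 8x VAE downsampling, then 2x2 latent patches."""
    step = vae_scale * patch
    return (width // step) * (height // step)


def calculate_mu(seq_len):
    """Linearly interpolate the shift between the base and max sequence lengths."""
    m = (MAX_SHIFT - BASE_SHIFT) / (MAX_SEQ_LEN - BASE_SEQ_LEN)
    b = BASE_SHIFT - m * BASE_SEQ_LEN
    return seq_len * m + b


def shifted_sigmas(num_steps, mu):
    """Sigma schedule after exponential time shift and terminal stretch."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    # Linear sigmas from 1.0 down to 1/num_steps (same values as np.linspace)
    delta = (1.0 - 1.0 / num_steps) / (num_steps - 1)
    sigmas = [1.0 - i * delta for i in range(num_steps)]
    # Exponential time shift: a larger mu keeps sigmas high for longer
    shifted = [math.exp(mu) / (math.exp(mu) + (1.0 / s - 1.0)) for s in sigmas]
    # Stretch so that the final sigma equals shift_terminal exactly
    scale = (1.0 - shifted[-1]) / (1.0 - SHIFT_TERMINAL)
    return [1.0 - (1.0 - s) / scale for s in shifted]


seq_len = image_seq_len(1328, 1328)
mu = calculate_mu(seq_len)
sigmas = shifted_sigmas(50, mu)
print(seq_len, round(mu, 4), [round(s, 3) for s in sigmas[:3]], round(sigmas[-1], 3))
```

For the 1328x1328 tests this gives 6889 tokens and mu of about 0.834. The 50 sigmas therefore stay above the linear schedule before ending at 0.02. Every width/height pair in the `aspect_ratios` dictionary of the Info example is a multiple of 16, so the token count is exact for each of them.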

Test 73 - Qwen Image 2512
