
https://huggingface.co/Qwen/Qwen-Image-2512

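import torch
from diffusers import DiffusionPipeline

model_name = "Qwen/Qwen-Image-2512"

# Load the pipeline in bfloat16 and move it to the GPU
pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")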

# Prompt
prompt = '''A 20-year-old East Asian girl with delicate, charming features and large, bright brown eyes—expressive and lively, with a cheerful or subtly smiling expression. Her naturally wavy long hair is either loose or tied in twin ponytails. She has fair skin and light makeup accentuating her youthful freshness. She wears a modern, cute dress or relaxed outfit in bright, soft colors—lightweight fabric, minimalist cut. She stands indoors at an anime convention, surrounded by banners, posters, or stalls. Lighting is typical indoor illumination—no staged lighting—and the image resembles a casual iPhone snapshot: unpretentious composition, yet brimming with vivid, fresh, youthful charm.'''

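# Negative prompt in Chinese. English equivalent: "Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text."
negative_prompt = "低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。"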

# Supported resolutions (width, height) for each aspect ratio
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1104),
    "3:4": (1104, 1472),
    "3:2": (1584, 1056),
    "2:3": (1056, 1584),
}

width, height = aspect_ratios["16:9"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42)
).images[0]

# Save the result (the file name is arbitrary)
image.save("qwen-image-2512.png")
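Every preset in `aspect_ratios` is a multiple of 16 on both sides. The sketch below checks this and reports the latent size and transformer sequence length for each preset. It assumes an 8x spatial VAE downsampling factor and 2x2 transformer patches (`patch_size: 2` in the transformer config under Model info); `latent_info` is a hypothetical helper, not part of diffusers.

```python
# Sketch: map each preset resolution onto the latent grid and token sequence.
# Assumptions: 8x spatial VAE downsampling and 2x2 transformer patches.
VAE_SCALE = 8  # assumed spatial downsampling factor of AutoencoderKLQwenImage
PATCH = 2      # 'patch_size' from the QwenImageTransformer2DModel config

aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1104),
    "3:4": (1104, 1472),
    "3:2": (1584, 1056),
    "2:3": (1056, 1584),
}


def latent_info(width: int, height: int) -> dict:
    """Hypothetical helper: latent size, token count and megapixels."""
    step = VAE_SCALE * PATCH
    if width % step or height % step:
        raise ValueError(f"{width}x{height} is not a multiple of {step}")
    latent_h, latent_w = height // VAE_SCALE, width // VAE_SCALE
    return {
        "latent": (latent_h, latent_w),
        "tokens": (latent_h // PATCH) * (latent_w // PATCH),
        "megapixels": round(width * height / 1e6, 2),
    }


for name, (w, h) in aspect_ratios.items():
    print(name, f"{w}x{h}", latent_info(w, h))
```

With the 1:1 preset (1328x1328) this gives a 166x166 latent and 6889 tokens, which stays below the scheduler's `max_image_seq_len` of 8192.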

Negative prompt impact

Columns: without negative prompt | Chinese negative prompt | English negative prompt

Prompt: Bright graffiti reads "QWEN IMAGE" on a brick wall with a stream of glowing snowflakes painted behind the text. Graffiti text done in 2 lines using clearly readable and bold font. Dark grey asphalt on a sidewalk below. Graffiti main colors are lime and mint. Large back-lighted building number sign reads "20B" is at upper left side above the graffiti. Graffiti authors signatures are "Alibaba". Young asian stay close to camera in front of the graffiti at the right. Road sign with text "Attention guidance 4"




Parameters: Steps: 50| Size: 1328x1328| Sampler: Euler FlowMatch| Seed: 2025| CFG scale: 4| CFG true: 4| App: SD.Next| Version: 097b6ab| Pipeline: QwenImagePipeline| Operations: txt2img| Model: Qwen-Image-2512

Time: 24m 10.44s | total 1503.58 pipeline 1450.39 preview 47.38 callback 4.02 te 1.34 vae 0.40 | GPU 64566 MB 51% | RAM 68.15 GB 55%

Chinese negative prompt, same prompt as above:

Negative: 低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。 (Chinese wording of the English negative prompt used in the next run)

Parameters: Steps: 50| Size: 1328x1328| Seed: 2025| CFG scale: 4| CFG true: 4| App: SD.Next| Version: 097b6ab| Pipeline: QwenImagePipeline| Operations: txt2img| Model: Qwen-Image-2512


Time: 24m 5.53s | total 1459.56 pipeline 1445.49 preview 8.28 callback 4.03 te 1.32 vae 0.40 | GPU 64566 MB 51% | RAM 68.29 GB 55%

 

English negative prompt, same prompt as above:

Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.

Parameters: Steps: 50| Size: 1328x1328| Seed: 2025| CFG scale: 4| CFG true: 4| App: SD.Next| Version: 097b6ab| Pipeline: QwenImagePipeline| Operations: txt2img| Model: Qwen-Image-2512
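The comparison above depends on how the negative prompt enters sampling. In diffusers' `QwenImagePipeline`, when `true_cfg_scale` is greater than 1 and a negative prompt is supplied, each step runs the transformer once for the prompt and once for the negative prompt, combines the two predictions with classifier-free guidance, and rescales the result to the norm of the conditional prediction. The NumPy sketch below reproduces only that combination step; random arrays stand in for real transformer outputs, and `true_cfg` is a hypothetical name.

```python
import numpy as np


def true_cfg(cond: np.ndarray, uncond: np.ndarray, scale: float = 4.0) -> np.ndarray:
    """Sketch of true CFG with norm rescaling (modelled on QwenImagePipeline).

    cond:   prediction for the positive prompt, shape (batch, tokens, channels)
    uncond: prediction for the negative prompt, same shape
    """
    # Standard classifier-free guidance: move away from the negative prediction.
    combined = uncond + scale * (cond - uncond)
    # Rescale each token vector so it keeps the norm of the conditional prediction.
    cond_norm = np.linalg.norm(cond, axis=-1, keepdims=True)
    comb_norm = np.linalg.norm(combined, axis=-1, keepdims=True)
    return combined * (cond_norm / comb_norm)


rng = np.random.default_rng(0)
cond = rng.standard_normal((1, 6889, 64))    # 6889 tokens at 1328x1328, 64 packed channels
uncond = rng.standard_normal((1, 6889, 64))
guided = true_cfg(cond, uncond, scale=4.0)
print(guided.shape)
```

Because of the rescale, each token's guided vector keeps the length of the conditional prediction, so the negative prompt and `true_cfg_scale` mainly change the direction of the update rather than its magnitude.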

Test 0 - Different seed variations

Parameters: Steps: 50| Size: 1328x1328| CFG scale: 4| CFG true: 4

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

Negative: Low resolution, low image quality, distorted limbs and fingers, oversaturated image, wax figure appearance, lack of facial detail, excessive smoothing, AI-like appearance. Chaotic composition. Blurry and distorted text.

Seeds (columns): 1620085323 | 1931701040 | 4075624134 | 2736029172
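SD.Next records generation settings on this page as a `Parameters:` line of `key: value` pairs separated by `|`, and timings as a `Time:` line. A minimal parsing sketch, assuming that layout holds for every such line here; `parse_parameters` and `seconds_per_step` are hypothetical helpers, not SD.Next functions.

```python
import re


def parse_parameters(line: str) -> dict:
    """Split an SD.Next 'Parameters:' line into a {key: value} dict."""
    body = line.split(":", 1)[1] if line.startswith("Parameters:") else line
    fields = {}
    for chunk in body.split("|"):
        key, sep, value = chunk.partition(":")
        if sep:  # skip fragments without a 'key: value' shape
            fields[key.strip()] = value.strip()
    return fields


def seconds_per_step(time_line: str, steps: int) -> float:
    """Average 'pipeline' seconds per step from an SD.Next 'Time:' line."""
    match = re.search(r"\bpipeline ([\d.]+)", time_line)
    if match is None:
        raise ValueError("no 'pipeline' timing found")
    return float(match.group(1)) / steps


params = parse_parameters("Parameters: Steps: 50| Size: 1328x1328| CFG scale: 4| CFG true: 4")
width, height = map(int, params["Size"].split("x"))
time_line = "Time: 24m 10.44s | total 1503.58 pipeline 1450.39 preview 47.38 callback 4.02 te 1.34 vae 0.40"
print(params, width, height, round(seconds_per_step(time_line, int(params["Steps"])), 2))
```

For the first negative-prompt run this works out to about 29 s of pipeline time per step (1450.39 s over 50 steps).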

Test 1 - Bookshop

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling


Test 2 - Face and hand 

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.


Image grid: columns 8 | 16 | 32 | 64; rows AG1 to AG6


Test 3 - Legs

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.



Image grid: columns 8 | 16 | 32 | 64; rows 1 to 6 and 8



Test 4 - Other model covers

Test 5 - Some other images


System info


Config


Model info

Qwen/Qwen-Image-2512 [25468b98e3]
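The Params column in the table below gives a rough lower bound on memory for the weights alone. A back-of-the-envelope sketch, assuming 2 bytes per bfloat16 parameter and ignoring activations, attention buffers and framework overhead; `weight_gib` is a hypothetical helper.

```python
# Parameter counts from the Params column of the model-info table.
PARAMS = {
    "vae": 126_892_531,
    "text_encoder": 8_292_166_656,
    "transformer": 20_430_401_088,
}
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_gib(params: int, dtype: str = "bfloat16") -> float:
    """Size of the raw weights in GiB for a given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


for name, count in PARAMS.items():
    print(f"{name:>12}: {weight_gib(count):6.2f} GiB")
total = sum(PARAMS.values())
print(f"{'total':>12}: {weight_gib(total):6.2f} GiB")
```

This comes to roughly 54 GiB of bfloat16 weights, about 38 GiB of it in the transformer, which is below the 64566 MB of GPU memory reported in the timing lines; the difference is presumably activations and other runtime buffers.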

Module | Class | Device | Dtype | Quant | Params | Modules | Config
vae | AutoencoderKLQwenImage | xpu:0 | torch.bfloat16 | None | 126892531 | 260

FrozenDict({'base_dim': 96, 'z_dim': 16, 'dim_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_scales': [], 'temperal_downsample': [False, True, True], 'dropout': 0.0, 'input_channels': 3, 'latents_mean': [-0.7571, -0.7089, -0.9113, 0.1075, -0.1745, 0.9653, -0.1517, 1.5508, 0.4134, -0.0715, 0.5517, -0.3632, -0.1922, -0.9497, 0.2503, -0.2921], 'latents_std': [2.8184, 1.4541, 2.3275, 2.6558, 1.2196, 1.7708, 2.6052, 2.0743, 3.2687, 2.1526, 2.8652, 1.5579, 1.6382, 1.1253, 2.8251, 1.916], '_use_default_values': ['input_channels'], '_class_name': 'AutoencoderKLQwenImage', '_diffusers_version': '0.36.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Qwen--Qwen-Image-2512/snapshots/25468b98e3276ca6700de15c6628e51b7de54a26/vae'})

text_encoder | Qwen2_5_VLForConditionalGeneration | xpu:0 | torch.bfloat16 | None | 8292166656 | 763

Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "_name_or_path": "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl_text", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, 
"vocab_size": 152064 }, "tie_word_embeddings": false, "transformers_version": "4.57.3", "use_cache": true, "use_sliding_window": false, "vision_config": { "depth": 32, "dtype": "bfloat16", "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_token_id": 151654, "vocab_size": 152064 }

tokenizer | Qwen2Tokenizer | None | None | None | 0 | 0

None

transformer | QwenImageTransformer2DModel | xpu:0 | torch.bfloat16 | None | 20430401088 | 2297

FrozenDict({'patch_size': 2, 'in_channels': 64, 'out_channels': 16, 'num_layers': 60, 'attention_head_dim': 128, 'num_attention_heads': 24, 'joint_attention_dim': 3584, 'guidance_embeds': False, 'axes_dims_rope': [16, 56, 56], 'zero_cond_t': False, 'use_additional_t_cond': False, 'use_layer3d_rope': False, '_use_default_values': ['use_layer3d_rope', 'use_additional_t_cond', 'zero_cond_t'], '_class_name': 'QwenImageTransformer2DModel', '_diffusers_version': '0.36.0.dev0', '_name_or_path': 'Qwen/Qwen-Image-2512'})

scheduler | FlowMatchEulerDiscreteScheduler | None | None | None | 0 | 0

FrozenDict({'num_train_timesteps': 1000, 'shift': 1.0, 'use_dynamic_shifting': True, 'base_shift': 0.5, 'max_shift': 0.9, 'base_image_seq_len': 256, 'max_image_seq_len': 8192, 'invert_sigmas': False, 'shift_terminal': 0.02, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.36.0.dev0'})
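The scheduler config above enables dynamic shifting (`use_dynamic_shifting: True`, `time_shift_type: 'exponential'`, `shift_terminal: 0.02`), so the sigma schedule depends on image size. The sketch below reproduces that schedule under the assumption that SD.Next's Euler FlowMatch sampler follows diffusers' `QwenImagePipeline`: start from `linspace(1, 1/steps, steps)`, compute `mu` by linear interpolation between `base_shift` and `max_shift` over the image sequence length, apply the exponential time shift, then stretch so the last sigma equals `shift_terminal`. The function names are illustrative, not the library API.

```python
import math

import numpy as np

# Values copied from the FlowMatchEulerDiscreteScheduler config above.
BASE_SEQ_LEN, MAX_SEQ_LEN = 256, 8192
BASE_SHIFT, MAX_SHIFT = 0.5, 0.9
SHIFT_TERMINAL = 0.02


def calculate_mu(image_seq_len: int) -> float:
    """Linearly interpolate the shift parameter mu over sequence length."""
    slope = (MAX_SHIFT - BASE_SHIFT) / (MAX_SEQ_LEN - BASE_SEQ_LEN)
    intercept = BASE_SHIFT - slope * BASE_SEQ_LEN
    return image_seq_len * slope + intercept


def shifted_sigmas(num_steps: int, image_seq_len: int) -> np.ndarray:
    """Sigma schedule with exponential time shift and terminal stretching."""
    mu = calculate_mu(image_seq_len)
    sigmas = np.linspace(1.0, 1.0 / num_steps, num_steps)
    # Exponential time shift: exp(mu) / (exp(mu) + (1/t - 1)).
    sigmas = math.exp(mu) / (math.exp(mu) + (1.0 / sigmas - 1.0))
    # Stretch so the final sigma lands exactly on shift_terminal.
    one_minus = 1.0 - sigmas
    scale = one_minus[-1] / (1.0 - SHIFT_TERMINAL)
    sigmas = 1.0 - one_minus / scale
    # A final sigma of 0 is appended for the last Euler step.
    return np.append(sigmas, 0.0)


sigmas = shifted_sigmas(50, 6889)  # 50 steps at 1328x1328 (6889 tokens)
print(round(calculate_mu(6889), 4), sigmas[:3].round(4), sigmas[-3:].round(4))
```

At 1328x1328, `mu` comes out at about 0.83; since `exp(mu)` is greater than 1, the shifted sigmas stay above the unshifted linear ones, so more of the 50 steps are spent at high noise levels.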
