Model Info and links

z-image-base (civitai), Z-Image (huggingface), 52 images test card of Z Image

import torch
from diffusers import ZImagePipeline

# Load the pipeline
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# Generate image
# English translation of the Chinese prompt below (for reference only; the tests use the original text):
# Two young Asian women stand close together against a plain gray textured wall, possibly indoors on a
# carpeted floor. The woman on the left has long curly hair and wears a navy sweater with cream ruffle
# trim on the left sleeve over a white stand-collar shirt, with white trousers; she wears small gold stud
# earrings and has her arms crossed behind her back. The woman on the right has straight shoulder-length
# hair and wears a cream sweatshirt printed with "Tun the tables" across the chest and "New ideas" below
# it, with white trousers; she wears small silver hoop earrings and has her arms crossed over her chest.
# Both are smiling and looking directly at the camera. Photo, natural lighting, soft shadows, neutral
# palette dominated by navy and cream white, casual fashion photography, medium depth of field, face and
# upper body in sharp focus, relaxed posture, friendly expression, indoor setting, carpeted floor,
# solid-color background.
prompt = "两名年轻亚裔女性紧密站在一起,背景为朴素的灰色纹理墙面,可能是室内地毯地面。左侧女性留着长卷发,身穿藏青色毛衣,左袖有奶油色褶皱装饰,内搭白色立领衬衫,下身白色裤子;佩戴小巧金色耳钉,双臂交叉于背后。右侧女性留直肩长发,身穿奶油色卫衣,胸前印有“Tun the tables”字样,下方为“New ideas”,搭配白色裤子;佩戴银色小环耳环,双臂交叉于胸前。两人均面带微笑直视镜头。照片,自然光照明,柔和阴影,以藏青、奶油白为主的中性色调,休闲时尚摄影,中等景深,面部和上半身对焦清晰,姿态放松,表情友好,室内环境,地毯地面,纯色背景。"
negative_prompt = ""  # Optional; useful for suppressing unwanted content

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1280,
    width=720,
    cfg_normalization=False,
    num_inference_steps=50,
    guidance_scale=4,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")
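The sample above targets "cuda", while the System Info section of this page reports an Intel Arc GPU driven through PyTorch's XPU backend. Below is a minimal sketch of device selection for that case. `pick_device` is a hypothetical helper, not part of diffusers, and it assumes the installed torch build exposes `torch.xpu` when Intel GPU support is present.

```python
from types import SimpleNamespace


def pick_device(torch_module=None) -> str:
    """Return "cuda", "xpu", or "cpu" depending on what the torch build reports."""
    if torch_module is None:
        # Imported lazily so the helper can be exercised without torch installed.
        import torch as torch_module

    if torch_module.cuda.is_available():
        return "cuda"
    # torch.xpu only exists in builds with Intel GPU support, so probe defensively.
    xpu = getattr(torch_module, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


# Stand-in for a torch build without CUDA but with XPU, for illustration only.
fake_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False),
    xpu=SimpleNamespace(is_available=lambda: True),
)
print(pick_device(fake_torch))  # xpu
```

With a real torch install, `pipe.to(pick_device())` would replace the hard-coded `pipe.to("cuda")`; the generator device may need to be changed to match.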


Test 0 - Seed and guidance

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

CFG 4, 50 steps | Seed: 1620085323 | Seed: 1931701040 | Seed: 4075624134 | Seed: 2736029172
Bookshop girl



Face and hand



Legs and shoes
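The seed comparison above (CFG 4, 50 steps, four seeds per prompt) could be reproduced with a loop along these lines. This is a sketch, not the script used for this page: `seed_sweep_jobs` and `run_jobs` are hypothetical helpers, the output file names are made up, and image size is left at the pipeline defaults because the page does not state it. Only the first prompt is listed; the other two would be added to `PROMPTS` the same way.

```python
PROMPTS = {
    "bookshop_girl": (
        "photorealistic girl in bookshop choosing the book in "
        "romantic stories shelf. smiling"
    ),
}
SEEDS = [1620085323, 1931701040, 4075624134, 2736029172]


def seed_sweep_jobs(prompts, seeds, guidance_scale=4, num_inference_steps=50):
    """Build one job per (prompt, seed) pair with fixed CFG and step count."""
    jobs = []
    for name, prompt in prompts.items():
        for seed in seeds:
            jobs.append({
                "seed": seed,
                "filename": f"{name}_seed{seed}.png",  # hypothetical naming scheme
                "kwargs": {
                    "prompt": prompt,
                    "guidance_scale": guidance_scale,
                    "num_inference_steps": num_inference_steps,
                },
            })
    return jobs


def run_jobs(pipe, jobs, device="cuda"):
    """Render each job with a pipeline loaded as in the sample code."""
    import torch  # only needed when images are actually generated

    for job in jobs:
        generator = torch.Generator(device).manual_seed(job["seed"])
        image = pipe(**job["kwargs"], generator=generator).images[0]
        image.save(job["filename"])


jobs = seed_sweep_jobs(PROMPTS, SEEDS)
print(len(jobs))  # 4
```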



Test 1 - Bookstore

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling



Steps: 8 | 16 | 32 | 64
CFG1



CFG2



CFG3



CFG4



CFG5



CFG6
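The grid above varies guidance scale by row and step count by column. A short sketch of enumerating those cells follows, reading the column header as 8, 16, 32 and 64 steps. `grid_cells` and the file naming are hypothetical, and the seed used for the grid is not stated on this page, so none is set here.

```python
from itertools import product

CFG_VALUES = [1, 2, 3, 4, 5, 6]   # rows of the table
STEP_VALUES = [8, 16, 32, 64]     # columns of the table (assumed reading of the header)


def grid_cells(cfg_values, step_values):
    """Return one dict of pipeline kwargs per table cell, in row-major order."""
    return [
        {
            "guidance_scale": cfg,
            "num_inference_steps": steps,
            "filename": f"cfg{cfg}_steps{steps:02d}.png",  # hypothetical naming scheme
        }
        for cfg, steps in product(cfg_values, step_values)
    ]


cells = grid_cells(CFG_VALUES, STEP_VALUES)
print(len(cells))                                     # 24
print(sum(c["num_inference_steps"] for c in cells))   # 720
```

Each cell's `guidance_scale` and `num_inference_steps` can be passed to the pipeline call from the sample code; the 720 total denoising steps give a rough sense of the cost of one full grid per prompt.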



Test 2 - Face and hands

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.


Steps: 8 | 16 | 32 | 64
CFG1



CFG2



CFG3



CFG4



CFG5



CFG6



Test 3 - Legs

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.


Steps: 8 | 16 | 32 | 64
CFG1



CFG2



CFG3



CFG4



CFG5



CFG6



Test 4 - Other model covers


Test 5 - Other prompts

Z Image

Test 6 - Optional find the cover

Test 7 - Empty prompts


seed: 1 | seed: 2 | seed: 3 | seed: 4 | seed: 5





seed: 6 | seed: 7 | seed: 8 | seed: 9 | seed: 10





seed: 21 | seed: 42 | seed: 68 | seed: 324 | seed: 2026






System Info

Wed Jan 28 09:25:42 2026
app: sdnext.git updated: 2026-01-27 hash: 45c898cbc url: https://github.com/liutyi/sdnext/tree/pytorch
arch: x86_64 cpu: x86_64 system: Linux release: 6.17.0-8-generic 
python: 3.12.3 Torch 2.10.0+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex: 
ram: free:117.0 used:6.07 total:123.07
xformers: diffusers: 0.37.0.dev0 transformers: 4.57.5 
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16 
base: Tongyi-MAI/Z-Image refiner: None vae: Automatic te: Default unet: Default
ipex native none Scaled-Dot-Product


App config

.


Model metadata

epicsoraXL_01 [c6fcb16341]

Module | Class | Device | Dtype | Quant | Params | Modules | Config
vae | AutoencoderKL | xpu:0 | torch.bfloat16 | None | 83819683 | 241

FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 16, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 0.3611, 'shift_factor': 0.1159, 'latents_mean': None, 'latents_std': None, 'force_upcast': True, 'use_quant_conv': False, 'use_post_quant_conv': False, 'mid_block_add_attention': True, '_class_name': 'AutoencoderKL', '_diffusers_version': '0.30.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Tongyi-MAI--Z-Image/snapshots/e8fa147e7413241c5aa5146a8ae60dc38ade08ae/vae'})

text_encoder | Qwen3ForCausalLM | xpu:0 | torch.bfloat16 | None | 4022468096 | 547

Qwen3Config { "architectures": [ "Qwen3ForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "head_dim": 128, "hidden_act": "silu", "hidden_size": 2560, "initializer_range": 0.02, "intermediate_size": 9728, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 40960, "max_window_layers": 36, "model_type": "qwen3", "num_attention_heads": 32, "num_hidden_layers": 36, "num_key_value_heads": 8, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 1000000, "sliding_window": null, "tie_word_embeddings": true, "transformers_version": "4.57.5", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 }

tokenizer | Qwen2Tokenizer | None | None | None | 0 | 0

None

scheduler | FlowMatchEulerDiscreteScheduler | None | None | None | 0 | 0

FrozenDict({'num_train_timesteps': 1000, 'shift': 6.0, 'use_dynamic_shifting': False, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_use_default_values': ['shift_terminal', 'base_image_seq_len', 'use_beta_sigmas', 'stochastic_sampling', 'time_shift_type', 'base_shift', 'invert_sigmas', 'use_karras_sigmas', 'use_exponential_sigmas', 'max_image_seq_len', 'max_shift'], '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.37.0.dev0'})
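The scheduler config above uses a static `shift` of 6.0 with `use_dynamic_shifting` disabled. As a rough illustration of what that value does, the sketch below applies the static shift formula used by flow-matching schedulers, sigma' = shift * sigma / (1 + (shift - 1) * sigma), to a uniform grid of noise levels. It is a simplification: the real scheduler builds its timesteps with additional details, and `shift_sigma` is a hypothetical helper name.

```python
def shift_sigma(sigma: float, shift: float = 6.0) -> float:
    """Map a noise level in [0, 1] through the static time shift."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Uniformly spaced noise levels for a 4-step illustration (not the real schedule).
uniform = [1.0, 0.75, 0.5, 0.25]
shifted = [shift_sigma(s) for s in uniform]

print(round(shift_sigma(0.5), 4))  # 0.8571
```

With a shift above 1, the midpoint 0.5 moves up to about 0.857, so more of the sampling steps are spent at high noise levels, while the endpoints 0 and 1 stay fixed.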

transformer | ZImageTransformer2DModel | xpu:0 | torch.bfloat16 | None | 6154908736 | 697

FrozenDict({'all_patch_size': [2], 'all_f_patch_size': [1], 'in_channels': 16, 'dim': 3840, 'n_layers': 30, 'n_refiner_layers': 2, 'n_heads': 30, 'n_kv_heads': 30, 'norm_eps': 1e-05, 'qk_norm': True, 'cap_feat_dim': 2560, 'siglip_feat_dim': None, 'rope_theta': 256.0, 't_scale': 1000.0, 'axes_dims': [32, 48, 48], 'axes_lens': [1536, 512, 512], '_class_name': 'ZImageTransformer2DModel', '_diffusers_version': '0.37.0.dev0', '_name_or_path': 'Tongyi-MAI/Z-Image'})
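A few values in the configs above can be cross-checked against each other. The sketch below copies the relevant numbers by hand and checks how they appear to fit together. The interpretations are assumptions drawn from reading the configs, not documented guarantees: that the rotary `axes_dims` sum to the per-head dimension, that `cap_feat_dim` matches the text encoder's hidden size, and that the VAE downsamples by 2 for every block after the first.

```python
# Values copied from the transformer, VAE and text encoder configs on this page.
transformer = {
    "dim": 3840, "n_heads": 30, "in_channels": 16,
    "cap_feat_dim": 2560, "axes_dims": [32, 48, 48], "patch_size": 2,
}
vae = {"latent_channels": 16, "block_out_channels": [128, 256, 512, 512]}
text_encoder = {"hidden_size": 2560}

head_dim = transformer["dim"] // transformer["n_heads"]
assert sum(transformer["axes_dims"]) == head_dim                   # rotary dims fill one head
assert transformer["in_channels"] == vae["latent_channels"]        # latents feed the transformer
assert transformer["cap_feat_dim"] == text_encoder["hidden_size"]  # caption features match

# Assumed downsampling factor: one halving per VAE block after the first.
vae_scale = 2 ** (len(vae["block_out_channels"]) - 1)


def image_tokens(height: int, width: int) -> int:
    """Number of image patches the transformer would see for a given output size."""
    latent_h, latent_w = height // vae_scale, width // vae_scale
    patch = transformer["patch_size"]
    return (latent_h // patch) * (latent_w // patch)


print(head_dim)                 # 128
print(vae_scale)                # 8
print(image_tokens(1280, 720))  # 3600
```

Under these assumptions, the 1280x720 image from the sample code corresponds to a 160x90 latent and 3600 image tokens.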
