Info

https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers
Note: 
If you are using the T2V-1.3B model, we recommend setting the parameter --sample_guide_scale 6. 
The --sample_shift parameter can be adjusted within the range of 8 to 12 based on the performance.

model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
height = 704
width = 1280
num_frames = 121
num_inference_steps = 50
guidance_scale = 5.0

prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
negative_prompt = "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走"
#negative_prompt = "Vibrant colors, overexposed, static, blurry details, subtitles, style, artwork, painting, image, still, overall grayish, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, distorted limbs, fingers fused together, static image, cluttered background, three legs, many people in the background, walking backwards."
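The height, width, and frame count above determine how large the latent video is. The sketch below estimates that size. It assumes the VAE values reported under Model info (16x spatial and 4x temporal compression, transformer patch size [1, 2, 2]) and a 48-channel latent for the TI2V-5B model. `latent_grid` and `LatentGrid` are hypothetical names for illustration, not part of diffusers or SD.Next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentGrid:
    channels: int
    frames: int
    height: int
    width: int
    tokens: int


def latent_grid(height, width, num_frames,
                z_dim=48, spatial=16, temporal=4, patch=(1, 2, 2)):
    """Estimate latent and token sizes for one video (illustrative helper).

    The default factors are assumptions taken from the Model info table.
    """
    if height % spatial or width % spatial:
        raise ValueError("height and width should be multiples of the spatial factor")
    if (num_frames - 1) % temporal:
        raise ValueError("num_frames should have the form temporal * k + 1")
    frames = (num_frames - 1) // temporal + 1
    lat_h, lat_w = height // spatial, width // spatial
    pt, ph, pw = patch
    # Each transformer token covers one pt x ph x pw patch of the latent.
    tokens = (frames // pt) * (lat_h // ph) * (lat_w // pw)
    return LatentGrid(z_dim, frames, lat_h, lat_w, tokens)


grid = latent_grid(height=704, width=1280, num_frames=121)
print(grid)
```

Under these assumptions the 704 x 1280, 121-frame setting maps to a 31 x 44 x 80 latent, which gives a rough sense of why memory use and time grow quickly with resolution.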

Test 0 - Different seed variations and resolutions

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Negative: Vibrant colors, overexposed, static, blurry details, subtitles, style, artwork, painting, image, still, overall grayish, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, distorted limbs, fingers fused together, static image, cluttered background, three legs, many people in the background, walking backwards.

Time: 1m 36.82s | total 99.75 pipeline 96.79 te 1.37 vae 1.36 | GPU 29768 MB 24% | RAM 38.62 GB 31%
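Timing lines like the one above are easier to compare across tests once they are parsed. This is a minimal sketch that assumes the line layout seen on this page (a `Time:` field, name/value stage pairs, then `GPU` and `RAM` fields separated by `|`). `parse_timing` is a hypothetical helper, not an SD.Next function.

```python
import re


def parse_timing(line):
    """Split a timing line into wall time (seconds), stage timings, and memory use."""
    result = {"stages": {}}
    wall = re.search(r"Time:\s*(?:(\d+)m\s*)?([\d.]+)s", line)
    if wall:
        result["wall_seconds"] = int(wall.group(1) or 0) * 60 + float(wall.group(2))
    for part in line.split("|"):
        part = part.strip()
        mem = re.fullmatch(r"(GPU|RAM)\s+([\d.]+)\s+(MB|GB)\s+(\d+)%", part)
        if mem:
            result[mem.group(1).lower()] = {
                "used": float(mem.group(2)),
                "unit": mem.group(3),
                "percent": int(mem.group(4)),
            }
        elif "Time:" not in part:
            # Stage timings are "name value" pairs; a name without a value is skipped.
            for name, value in re.findall(r"([a-z_]+)\s+([\d.]+)", part):
                result["stages"][name] = float(value)
    return result


sample = ("Time: 1m 36.82s | total 99.75 pipeline 96.79 te 1.37 vae 1.36 "
          "| GPU 29768 MB 24% | RAM 38.62 GB 31%")
print(parse_timing(sample))
```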

CFG5, STEP20	Seed: 1620085323	Seed: 1931701040	Seed: 4075624134	Seed: 2736029172
bookshop girl
hand and face
legs and shoes

Test 1 - Bookshop

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

...

Execution: Time: 3m 54.34s | total 439.88 pipeline 233.25 preview 194.33 te 5.20 offload 5.07 vae 1.20 decode 0.80 post 0.27 gc

	8	16	20	32
CFG4
CFG6
CFG8
CFG10
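The grid above crosses four step counts with four CFG values. A sweep like this can be enumerated as follows. The sketch only builds the list of settings; how each run is submitted (UI, API, or script) is left out, and the dictionary keys mirror the parameter names used in the Info section.

```python
from itertools import product

steps_values = [8, 16, 20, 32]   # table columns
cfg_values = [4, 6, 8, 10]       # table rows

# One entry per table cell, ordered row by row (all step counts for CFG4 first).
runs = [
    {"guidance_scale": float(cfg), "num_inference_steps": steps}
    for cfg, steps in product(cfg_values, steps_values)
]

for run in runs:
    print(run)
```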

Test 4 - Different seed variations and resolutions

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Time: 45.33s | total 48.41 pipeline 45.30 vae 1.66 te 1.36 | GPU 24466 MB 19% | RAM 34.07 GB 28%

1280px

CFG6, STEP20

Seed: 1620085323

Seed: 1931701040

Seed: 4075624134

Seed: 2736029172

512px

combined

768px

1024px

System info

app: sdnext.git updated: 2025-12-06 hash: 34031f54764443213 url: https://github.com/liutyi/sdnext.git/tree/pytorch
arch: x86_64 cpu: x86_64 system: Linux release: 6.17.0-7-generic
python: 3.12.3 Torch 2.9.1+xpu
ram: free:114.91 used:8.17 total:123.07
device: Intel(R) Arc(TM) Graphics (1) ipex:  
xformers:  diffusers: 0.36.0.dev0 transformers: 4.57.1
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: Diffusers/Wan-AI/Wan2.2-TI2V-5B-Diffusers [b8fff7315c] refiner: none vae: none te: none unet: none

Config

{
  "sd_model_checkpoint": "Diffusers/Wan-AI/Wan2.1-T2V-1.3B-Diffusers [0fad780a53]",
  "diffusers_version": "9c13f8657986e68f5f05987912c54432fd28d86f",
  "sd_checkpoint_hash": null,
  "diffusers_offload_min_gpu_memory": 0.05,
  "diffusers_offload_max_gpu_memory": 0.95,
  "diffusers_vae_tiling": true,
  "diffusers_vae_tile_size": 512,
  "dynamic_attention_slice_rate": 1,
  "dynamic_attention_trigger_rate": 2,
  "samples_filename_pattern": "[seq]-[date]-[model_name]-[height]x[width]-STEP[steps]-CFG[cfg]-Seed[seed]"
}
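The `samples_filename_pattern` setting uses bracketed placeholders. The sketch below shows how such a pattern could expand into a file name. It is an illustration only: `expand_pattern` is a hypothetical helper, not SD.Next's implementation, and the sequence number and date formats are assumed.

```python
import re


def expand_pattern(pattern, values):
    """Replace each [name] placeholder with its value; unknown names are left as-is."""
    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return re.sub(r"\[(\w+)\]", substitute, pattern)


pattern = "[seq]-[date]-[model_name]-[height]x[width]-STEP[steps]-CFG[cfg]-Seed[seed]"
values = {
    "seq": "00001",            # assumed zero-padded counter
    "date": "2025-01-31",      # assumed date format, hypothetical value
    "model_name": "Wan2.2-TI2V-5B-Diffusers",
    "height": 704,
    "width": 1280,
    "steps": 20,
    "cfg": 5,
    "seed": 1620085323,
}
print(expand_pattern(pattern, values))
```

Putting steps, CFG, and seed into the file name keeps each output traceable to its cell in the comparison tables.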

Model info

DType

Module	Class	Device	Dtype	Quant	Params	Modules	Config
vae	AutoencoderKLWan	xpu:0	torch.bfloat16	None	704688668	272	FrozenDict({'base_dim': 160, 'decoder_base_dim': 256, 'z_dim': 48, 'dim_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_scales': [], 'temperal_downsample': [False, True, True], 'dropout': 0.0, 'latents_mean': [-0.2289, -0.0052, -0.1323, -0.2339, -0.2799, 0.0174, 0.1838, 0.1557, -0.1382, 0.0542, 0.2813, 0.0891, 0.157, -0.0098, 0.0375, -0.1825, -0.2246, -0.1207, -0.0698, 0.5109, 0.2665, -0.2108, -0.2158, 0.2502, -0.2055, -0.0322, 0.1109, 0.1567, -0.0729, 0.0899, -0.2799, -0.123, -0.0313, -0.1649, 0.0117, 0.0723, -0.2839, -0.2083, -0.052, 0.3748, 0.0152, 0.1957, 0.1433, -0.2944, 0.3573, -0.0548, -0.1681, -0.0667], 'latents_std': [0.4765, 1.0364, 0.4514, 1.1677, 0.5313, 0.499, 0.4818, 0.5013, 0.8158, 1.0344, 0.5894, 1.0901, 0.6885, 0.6165, 0.8454, 0.4978, 0.5759, 0.3523, 0.7135, 0.6804, 0.5833, 1.4146, 0.8986, 0.5659, 0.7069, 0.5338, 0.4889, 0.4917, 0.4069, 0.4999, 0.6866, 0.4093, 0.5709, 0.6065, 0.6415, 0.4944, 0.5726, 1.2042, 0.5458, 1.6887, 0.3971, 1.06, 0.3943, 0.5537, 0.5444, 0.4089, 0.7468, 0.7744], 'is_residual': True, 'in_channels': 12, 'out_channels': 12, 'patch_size': 2, 'scale_factor_temporal': 4, 'scale_factor_spatial': 16, '_class_name': 'AutoencoderKLWan', '_diffusers_version': '0.35.0.dev0', 'clip_output': False, '_name_or_path': '/mnt/models/Diffusers/models--Wan-AI--Wan2.2-TI2V-5B-Diffusers/snapshots/b8fff7315c768468a5333511427288870b2e9635/vae'})
text_encoder	UMT5EncoderModel	xpu:0	torch.bfloat16	None	5680910336	486	UMT5Config { "architectures": [ "UMT5EncoderModel" ], "classifier_dropout": 0.0, "d_ff": 10240, "d_kv": 64, "d_model": 4096, "decoder_start_token_id": 0, "dense_act_fn": "gelu_new", "dropout_rate": 0.1, "dtype": "bfloat16", "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "initializer_factor": 1.0, "is_encoder_decoder": true, "is_gated_act": true, "layer_norm_epsilon": 1e-06, "model_type": "umt5", "num_decoder_layers": 24, "num_heads": 64, "num_layers": 24, "output_past": true, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "scalable_attention": true, "tie_word_embeddings": false, "tokenizer_class": "T5Tokenizer", "torch_dtype": "bfloat16", "transformers_version": "4.57.1", "use_cache": true, "vocab_size": 256384 }
tokenizer	T5TokenizerFast	None	None	None	0	0	None
transformer	WanTransformer3DModel	xpu:0	torch.bfloat16	None	4999787712	858	FrozenDict({'patch_size': [1, 2, 2], 'num_attention_heads': 24, 'attention_head_dim': 128, 'in_channels': 48, 'out_channels': 48, 'text_dim': 4096, 'freq_dim': 256, 'ffn_dim': 14336, 'num_layers': 30, 'cross_attn_norm': True, 'qk_norm': 'rms_norm_across_heads', 'eps': 1e-06, 'image_dim': None, 'added_kv_proj_dim': None, 'rope_max_seq_len': 1024, 'pos_embed_seq_len': None, '_use_default_values': ['pos_embed_seq_len'], '_class_name': 'WanTransformer3DModel', '_diffusers_version': '0.35.0.dev0', '_name_or_path': 'Wan-AI/Wan2.2-TI2V-5B-Diffusers'})
scheduler	UniPCMultistepScheduler	None	None	None	0	0	FrozenDict({'num_train_timesteps': 1000, 'beta_start': 0.0001, 'beta_end': 0.02, 'beta_schedule': 'linear', 'trained_betas': None, 'solver_order': 2, 'prediction_type': 'flow_prediction', 'thresholding': False, 'dynamic_thresholding_ratio': 0.995, 'sample_max_value': 1.0, 'predict_x0': True, 'solver_type': 'bh2', 'lower_order_final': True, 'disable_corrector': [], 'solver_p': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'use_flow_sigmas': True, 'flow_shift': 5.0, 'timestep_spacing': 'linspace', 'steps_offset': 0, 'final_sigmas_type': 'zero', 'rescale_betas_zero_snr': False, 'use_dynamic_shifting': False, 'time_shift_type': 'exponential', '_use_default_values': ['use_dynamic_shifting', 'time_shift_type'], '_class_name': 'UniPCMultistepScheduler', '_diffusers_version': '0.35.0.dev0'})
transformer_2	NoneType	None	None	None	0	0	None
boundary_ratio	NoneType	None	None	None	0	0	None
expand_timesteps	bool	None	None	None	0	0	None
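The scheduler row lists `use_flow_sigmas` and `flow_shift`, and the note in the Info section mentions tuning `--sample_shift`. The sketch below shows a commonly used form of that time shift, which to my understanding is what the scheduler applies when flow sigmas are enabled; treat the exact formula and the evenly spaced starting grid as assumptions. `shift_sigma` and `shifted_sigmas` are hypothetical names.

```python
def shift_sigma(sigma: float, flow_shift: float) -> float:
    """Map a noise level in [0, 1] through the time shift; flow_shift=1 leaves it unchanged."""
    return flow_shift * sigma / (1.0 + (flow_shift - 1.0) * sigma)


def shifted_sigmas(num_steps: int, flow_shift: float) -> list[float]:
    """Evenly spaced noise levels from high to low, then shifted (illustrative spacing)."""
    uniform = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift_sigma(s, flow_shift) for s in uniform]


# Compare a few shift values on a short 5-step schedule.
for shift in (1.0, 3.0, 5.0):
    print(shift, [round(s, 3) for s in shifted_sigmas(5, shift)])
```

A larger shift keeps the noise level high for more of the schedule, so more steps are spent on coarse structure and fewer on fine detail.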
