Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Info

https://github.com/NVlabs/Sana/tree/main

Code Block
# Render one image from a text prompt with the Sana pipeline.
prompt = 'a cyberpunk cat with a neon sign that says "Sana"'

# A seeded CUDA generator keeps the run reproducible across invocations.
generator = torch.Generator(device="cuda").manual_seed(42)

# The pipeline output is indexed with [0] to unpack the list of images.
images = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=generator,
)[0]
images[0].save('sana.png')


Test

Prompt 1: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

...

Code Block
Model: Diffusers/Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers
Type: sana
Class: SanaPipeline
Size: 0 bytes
Modified: 2025-07-07 16:03:42


Module

Class

Device

DType

Params

Modules

Config

tokenizer

GemmaTokenizerFast

None

None

0

0

None

text_encoder

Gemma2Model

xpu:0

torch.bfloat16

2614341888

395

Gemma2Config { "architectures": [ "Gemma2Model" ], "attention_bias": false, "attention_dropout": 0.0, "attn_logit_softcapping": 50.0, "bos_token_id": 2, "cache_implementation": "hybrid", "eos_token_id": [ 1, 107 ], "final_logit_softcapping": 30.0, "head_dim": 256, "hidden_act": "gelu_pytorch_tanh", "hidden_activation": "gelu_pytorch_tanh", "hidden_size": 2304, "initializer_range": 0.02, "intermediate_size": 9216, "layer_types": [ "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention" ], "max_position_embeddings": 8192, "model_type": "gemma2", "num_attention_heads": 8, "num_hidden_layers": 26, "num_key_value_heads": 4, "pad_token_id": 0, "query_pre_attn_scalar": 256, "rms_norm_eps": 1e-06, "rope_theta": 10000.0, "sliding_window": 4096, "torch_dtype": "bfloat16", "transformers_version": "4.53.1", "use_cache": true, "vocab_size": 256000 }

vae

AutoencoderDC

xpu:0

torch.bfloat16

312250275

442

FrozenDict({'in_channels': 3, 'latent_channels': 32, 'attention_head_dim': 32, 'encoder_block_types': ['ResBlock', 'ResBlock', 'ResBlock', 'EfficientViTBlock', 'EfficientViTBlock', 'EfficientViTBlock'], 'decoder_block_types': ['ResBlock', 'ResBlock', 'ResBlock', 'EfficientViTBlock', 'EfficientViTBlock', 'EfficientViTBlock'], 'encoder_block_out_channels': [128, 256, 512, 512, 1024, 1024], 'decoder_block_out_channels': [128, 256, 512, 512, 1024, 1024], 'encoder_layers_per_block': [2, 2, 2, 3, 3, 3], 'decoder_layers_per_block': [3, 3, 3, 3, 3, 3], 'encoder_qkv_multiscales': [[], [], [], [5], [5], [5]], 'decoder_qkv_multiscales': [[], [], [], [5], [5], [5]], 'upsample_block_type': 'interpolate', 'downsample_block_type': 'Conv', 'decoder_norm_types': 'rms_norm', 'decoder_act_fns': 'silu', 'scaling_factor': 0.41407, '_class_name': 'AutoencoderDC', '_diffusers_version': '0.33.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Efficient-Large-Model--SANA1.5_4.8B_1024px_diffusers/snapshots/231ba75b89215c82dc070562d00efda1801171dc/vae'})

transformer

SanaTransformer2DModel

xpu:0

torch.bfloat16

4721825952

1581

FrozenDict({'in_channels': 32, 'out_channels': 32, 'num_attention_heads': 70, 'attention_head_dim': 32, 'num_layers': 60, 'num_cross_attention_heads': 20, 'cross_attention_head_dim': 112, 'cross_attention_dim': 2240, 'caption_channels': 2304, 'mlp_ratio': 2.5, 'dropout': 0.0, 'attention_bias': False, 'sample_size': 32, 'patch_size': 1, 'norm_elementwise_affine': False, 'norm_eps': 1e-06, 'interpolation_scale': None, 'guidance_embeds': False, 'guidance_embeds_scale': 0.1, 'qk_norm': 'rms_norm_across_heads', 'timestep_scale': 1.0, '_use_default_values': ['timestep_scale', 'guidance_embeds_scale'], '_class_name': 'SanaTransformer2DModel', '_diffusers_version': '0.33.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Efficient-Large-Model--SANA1.5_4.8B_1024px_diffusers/snapshots/231ba75b89215c82dc070562d00efda1801171dc/transformer'})

scheduler

DPMSolverMultistepScheduler

None

None

0

0

FrozenDict({'num_train_timesteps': 1000, 'beta_start': 0.0001, 'beta_end': 0.02, 'beta_schedule': 'linear', 'trained_betas': None, 'solver_order': 2, 'prediction_type': 'flow_prediction', 'thresholding': False, 'dynamic_thresholding_ratio': 0.995, 'sample_max_value': 1.0, 'algorithm_type': 'dpmsolver++', 'solver_type': 'midpoint', 'lower_order_final': True, 'euler_at_final': False, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'use_lu_lambdas': False, 'use_flow_sigmas': True, 'flow_shift': 3.0, 'final_sigmas_type': 'zero', 'lambda_min_clipped': -inf, 'variance_type': None, 'timestep_spacing': 'linspace', 'steps_offset': 0, 'rescale_betas_zero_snr': False, '_class_name': 'DPMSolverMultistepScheduler', '_diffusers_version': '0.33.0.dev0'})

_name_or_path

str

None

None

0

0

None

_class_name

str

None

None

0

0

None

_diffusers_version

str

None

None

0

0

None