## Info
https://github.com/NVlabs/Sana/tree/main
The upstream example below (adapted from the NVlabs/Sana README) needs its imports and pipeline setup to run. Note that `pag_scale` is only accepted by the PAG pipeline variant, so `SanaPAGPipeline` is used here; the model id and CUDA device are the upstream defaults, not the ones tested below.

```python
import torch
from diffusers import SanaPAGPipeline  # pag_scale needs the PAG variant

# Model id and CUDA device follow the upstream README; adjust as needed
# (the test system below runs on Intel XPU).
pipe = SanaPAGPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")
prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save("sana.png")
```
## Test
Prompt 1: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
Parameters: Pipeline: SanaPipeline | Steps: 48 | Size: 1024x1024 | Sampler: DPM2 FlowMatch | Seed: 2754978897 | CFG scale: 6 | App: SD.Next | Version: 9700cc7 | Operations: txt2img | Model: SANA1.5_4.8B_1024px_diffusers | CHI: True
Execution: Time: 5m 26.09s | total 326.12 pipeline 320.52 decode 5.53 | GPU 15602 MB 12% | RAM 4.25 GB 3%
Prompt 2: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
Prompt 3: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
Parameters: Pipeline: SanaPipeline | Steps: 18 | Size: 1024x1024 | Sampler: DPM2 FlowMatch | Seed: 2754978897 | CFG scale: 6 | App: SD.Next | Version: 9700cc7 | Operations: txt2img | Model: SANA1.5_4.8B_1024px_diffusers | CHI: True
Execution: Time: 2m 2.34s | total 122.37 pipeline 117.51 decode 4.80 | GPU 15602 MB 12% | RAM 4.19 GB 3%
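The two runs above differ only in step count, which gives a quick sanity check on how generation time scales; a minimal calculation from the logged pipeline times:

```python
# Pipeline seconds per run, taken from the two Execution lines above.
runs = {48: 320.52, 18: 117.51}  # steps -> pipeline time (s)
per_step = {steps: t / steps for steps, t in runs.items()}
# ~6.7 s/step at 48 steps vs ~6.5 s/step at 18: time scales near-linearly
# with step count, so the denoising loop dominates over fixed overhead.
```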
| Steps | Bookshop | Face and hand | Shoes with ribbons |
|---|---|---|---|
| 2 | | | |
| 4 | | | |
| 6 | | | |
| 8 | | | |
| 10 | | | |
| 12 | | | |
| 14 | | | |
| 16 | | | |
| 18 | | | |
| 20 | | | |
| 24 | | | |
| 28 | | | |
| 32 | | | |
| 48 | | | |
| 64 | | | |
| 100 | | | |
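The grid above renders each prompt at every step count with the same seed. A sketch of the sweep harness, where `grid_filenames` and the prompt tags are hypothetical naming helpers (the actual image generation is the pipeline call from the Info section):

```python
# Sketch of the sweep behind the table above: same prompt and seed,
# varying only num_inference_steps. The tags and filename scheme are
# hypothetical; swap in the real SanaPipeline call to render each cell.
STEP_COUNTS = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 28, 32, 48, 64, 100]
PROMPT_TAGS = ["bookshop", "face-and-hand", "shoes-with-ribbons"]

def grid_filenames(tags, steps_list):
    """One output name per (prompt, step count) cell of the grid."""
    return [f"sana_{tag}_{steps:03d}.png" for tag in tags for steps in steps_list]

names = grid_filenames(PROMPT_TAGS, STEP_COUNTS)
# 3 prompts x 16 step counts -> 48 images to fill the table
```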
## System
- app: sdnext.git | updated: 2025-07-07 | hash: 9700cc76 | url: https://github.com/vladmandic/sdnext.git/tree/dev
- arch: x86_64 | cpu: x86_64 | system: Linux | release: 6.11.0-29-generic | python: 3.12.3
- torch: 2.7.1+xpu | device: Intel(R) Arc(TM) Graphics (1) | ipex: | active: xpu
- ram: free: 122.13 | used: 3.2 | total: 125.33
- xformers: | diffusers: 0.35.0.dev0 | transformers: 4.53.1
- dtype: torch.bfloat16 | vae: torch.bfloat16 | unet: torch.bfloat16
## Model
Model: Diffusers/Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers Type: sana Class: SanaPipeline Size: 0 bytes Modified: 2025-07-07 16:03:42
| Module | Class | Device | DType | Params | Modules | Config |
|---|---|---|---|---|---|---|
tokenizer | GemmaTokenizerFast | None | None | 0 | 0 | None |
text_encoder | Gemma2Model | xpu:0 | torch.bfloat16 | 2614341888 | 395 | Gemma2Config { "architectures": [ "Gemma2Model" ], "attention_bias": false, "attention_dropout": 0.0, "attn_logit_softcapping": 50.0, "bos_token_id": 2, "cache_implementation": "hybrid", "eos_token_id": [ 1, 107 ], "final_logit_softcapping": 30.0, "head_dim": 256, "hidden_act": "gelu_pytorch_tanh", "hidden_activation": "gelu_pytorch_tanh", "hidden_size": 2304, "initializer_range": 0.02, "intermediate_size": 9216, "layer_types": [ "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention" ], "max_position_embeddings": 8192, "model_type": "gemma2", "num_attention_heads": 8, "num_hidden_layers": 26, "num_key_value_heads": 4, "pad_token_id": 0, "query_pre_attn_scalar": 256, "rms_norm_eps": 1e-06, "rope_theta": 10000.0, "sliding_window": 4096, "torch_dtype": "bfloat16", "transformers_version": "4.53.1", "use_cache": true, "vocab_size": 256000 } |
vae | AutoencoderDC | xpu:0 | torch.bfloat16 | 312250275 | 442 | FrozenDict({'in_channels': 3, 'latent_channels': 32, 'attention_head_dim': 32, 'encoder_block_types': ['ResBlock', 'ResBlock', 'ResBlock', 'EfficientViTBlock', 'EfficientViTBlock', 'EfficientViTBlock'], 'decoder_block_types': ['ResBlock', 'ResBlock', 'ResBlock', 'EfficientViTBlock', 'EfficientViTBlock', 'EfficientViTBlock'], 'encoder_block_out_channels': [128, 256, 512, 512, 1024, 1024], 'decoder_block_out_channels': [128, 256, 512, 512, 1024, 1024], 'encoder_layers_per_block': [2, 2, 2, 3, 3, 3], 'decoder_layers_per_block': [3, 3, 3, 3, 3, 3], 'encoder_qkv_multiscales': [[], [], [], [5], [5], [5]], 'decoder_qkv_multiscales': [[], [], [], [5], [5], [5]], 'upsample_block_type': 'interpolate', 'downsample_block_type': 'Conv', 'decoder_norm_types': 'rms_norm', 'decoder_act_fns': 'silu', 'scaling_factor': 0.41407, '_class_name': 'AutoencoderDC', '_diffusers_version': '0.33.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Efficient-Large-Model--SANA1.5_4.8B_1024px_diffusers/snapshots/231ba75b89215c82dc070562d00efda1801171dc/vae'}) |
transformer | SanaTransformer2DModel | xpu:0 | torch.bfloat16 | 4721825952 | 1581 | FrozenDict({'in_channels': 32, 'out_channels': 32, 'num_attention_heads': 70, 'attention_head_dim': 32, 'num_layers': 60, 'num_cross_attention_heads': 20, 'cross_attention_head_dim': 112, 'cross_attention_dim': 2240, 'caption_channels': 2304, 'mlp_ratio': 2.5, 'dropout': 0.0, 'attention_bias': False, 'sample_size': 32, 'patch_size': 1, 'norm_elementwise_affine': False, 'norm_eps': 1e-06, 'interpolation_scale': None, 'guidance_embeds': False, 'guidance_embeds_scale': 0.1, 'qk_norm': 'rms_norm_across_heads', 'timestep_scale': 1.0, '_use_default_values': ['timestep_scale', 'guidance_embeds_scale'], '_class_name': 'SanaTransformer2DModel', '_diffusers_version': '0.33.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Efficient-Large-Model--SANA1.5_4.8B_1024px_diffusers/snapshots/231ba75b89215c82dc070562d00efda1801171dc/transformer'}) |
scheduler | DPMSolverMultistepScheduler | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'beta_start': 0.0001, 'beta_end': 0.02, 'beta_schedule': 'linear', 'trained_betas': None, 'solver_order': 2, 'prediction_type': 'flow_prediction', 'thresholding': False, 'dynamic_thresholding_ratio': 0.995, 'sample_max_value': 1.0, 'algorithm_type': 'dpmsolver++', 'solver_type': 'midpoint', 'lower_order_final': True, 'euler_at_final': False, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'use_lu_lambdas': False, 'use_flow_sigmas': True, 'flow_shift': 3.0, 'final_sigmas_type': 'zero', 'lambda_min_clipped': -inf, 'variance_type': None, 'timestep_spacing': 'linspace', 'steps_offset': 0, 'rescale_betas_zero_snr': False, '_class_name': 'DPMSolverMultistepScheduler', '_diffusers_version': '0.33.0.dev0'}) |
_name_or_path | str | None | None | 0 | 0 | None |
_class_name | str | None | None | 0 | 0 | None |
_diffusers_version | str | None | None | 0 | 0 | None |
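The scheduler row above shows what SD.Next's "DPM2 FlowMatch" sampler resolves to. A sketch of reproducing that config in plain diffusers, assuming a loaded `pipe` (all override values come straight from the logged FrozenDict):

```python
from diffusers import DPMSolverMultistepScheduler

# Mirrors the scheduler config logged above: second-order DPM-Solver++
# on flow-matching sigmas with shift 3.0.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="dpmsolver++",
    solver_order=2,
    use_flow_sigmas=True,
    flow_shift=3.0,
    final_sigmas_type="zero",
)
```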