Info

Upstream repository: https://github.com/NVlabs/Sana/tree/main

import torch
from diffusers import SanaPAGPipeline  # pag_scale requires the PAG pipeline variant

pipe = SanaPAGPipeline.from_pretrained(  # setup per the upstream README
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    torch_dtype=torch.float16,
    pag_applied_layers="transformer_blocks.8",
).to("cuda")

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]  # indexing the pipeline output yields the list of images
image[0].save('sana.png')


Test

Prompt 1: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Parameters: Pipeline: SanaPipeline| Steps: 48| Size: 1024x1024| Sampler: DPM2 FlowMatch| Seed: 2754978897| CFG scale: 6| App: SD.Next| Version: 9700cc7| Operations: txt2img| Model: SANA1.5_4.8B_1024px_diffusers| CHI: True

Execution: Time: 5m 26.09s | total 326.12 pipeline 320.52 decode 5.53 | GPU 15602 MB 12% | RAM 4.25 GB 3%
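
A minimal sketch of reproducing this run with plain diffusers rather than SD.Next. The scheduler overrides mirror the DPMSolverMultistepScheduler FrozenDict in the Module section below (SD.Next's "DPM2 FlowMatch" presumably maps to this config); the output filename is made up:

import torch
from diffusers import SanaPipeline, DPMSolverMultistepScheduler

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
).to("xpu")  # "cuda" on NVIDIA hardware
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="dpmsolver++",
    solver_order=2,
    use_flow_sigmas=True,
    flow_shift=3.0,
)
image = pipe(
    prompt="photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling",
    height=1024,
    width=1024,
    num_inference_steps=48,
    guidance_scale=6.0,
    generator=torch.Generator(device="cpu").manual_seed(2754978897),
)[0][0]
image.save("bookshop.png")  # hypothetical filename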


Prompt 2: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.


Prompt 3: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

Parameters: Pipeline: SanaPipeline| Steps: 18| Size: 1024x1024| Sampler: DPM2 FlowMatch| Seed: 2754978897| CFG scale: 6| App: SD.Next| Version: 9700cc7| Operations: txt2img| Model: SANA1.5_4.8B_1024px_diffusers| CHI: True

Execution: Time: 2m 2.34s | total 122.37 pipeline 117.51 decode 4.80 | GPU 15602 MB 12% | RAM 4.19 GB 3%
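
Per-step cost is essentially constant across the two runs: 320.52 s / 48 steps is about 6.7 s per step, versus 117.51 s / 18 steps at about 6.5 s per step, with VAE decode adding a roughly fixed 5 s on top.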



[Image gallery tabs: "Bookshop", "face and hand", "shoes with ribbons"; generations at 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 28, 32, 48, 64 and 100 steps]

System

app: sdnext.git updated: 2025-07-07 hash: 9700cc76 url: https://github.com/vladmandic/sdnext.git/tree/dev
arch: x86_64 cpu: x86_64 system: Linux release: 6.11.0-29-generic
python: 3.12.3 Torch 2.7.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex: 
ram: free:122.13 used:3.2 total:125.33
xformers: diffusers: 0.35.0.dev0 transformers: 4.53.1
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16

Model

Model: Diffusers/Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers
Type: sana
Class: SanaPipeline
Size: 0 bytes
Modified: 2025-07-07 16:03:42


Module

tokenizer | Class: GemmaTokenizerFast | Device: None | DType: None | Params: 0 | Modules: 0 | Config: None

text_encoder | Class: Gemma2Model | Device: xpu:0 | DType: torch.bfloat16 | Params: 2614341888 | Modules: 395
Config: Gemma2Config { "architectures": [ "Gemma2Model" ], "attention_bias": false, "attention_dropout": 0.0, "attn_logit_softcapping": 50.0, "bos_token_id": 2, "cache_implementation": "hybrid", "eos_token_id": [ 1, 107 ], "final_logit_softcapping": 30.0, "head_dim": 256, "hidden_act": "gelu_pytorch_tanh", "hidden_activation": "gelu_pytorch_tanh", "hidden_size": 2304, "initializer_range": 0.02, "intermediate_size": 9216, "layer_types": [ "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention" ], "max_position_embeddings": 8192, "model_type": "gemma2", "num_attention_heads": 8, "num_hidden_layers": 26, "num_key_value_heads": 4, "pad_token_id": 0, "query_pre_attn_scalar": 256, "rms_norm_eps": 1e-06, "rope_theta": 10000.0, "sliding_window": 4096, "torch_dtype": "bfloat16", "transformers_version": "4.53.1", "use_cache": true, "vocab_size": 256000 }
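
As a sanity check, the reported Params value follows from the Gemma2Config above, assuming the standard Gemma2 layer layout (GQA attention projections, gated MLP, four RMSNorms per block) with embeddings counted once, since Gemma2Model carries no LM head:

# parameter count implied by the Gemma2Config above
hidden, inter, layers, vocab = 2304, 9216, 26, 256000
head_dim, q_heads, kv_heads = 256, 8, 4
attn = hidden * head_dim * (2 * q_heads + 2 * kv_heads)  # q/o plus k/v projections
mlp = 3 * hidden * inter                                 # gate, up and down projections
norms = 4 * hidden                                       # RMSNorms around attention and MLP
total = layers * (attn + mlp + norms) + vocab * hidden + hidden  # + embeddings + final norm
print(total)  # 2614341888, matching Params above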

vae | Class: AutoencoderDC | Device: xpu:0 | DType: torch.bfloat16 | Params: 312250275 | Modules: 442
Config: FrozenDict({'in_channels': 3, 'latent_channels': 32, 'attention_head_dim': 32, 'encoder_block_types': ['ResBlock', 'ResBlock', 'ResBlock', 'EfficientViTBlock', 'EfficientViTBlock', 'EfficientViTBlock'], 'decoder_block_types': ['ResBlock', 'ResBlock', 'ResBlock', 'EfficientViTBlock', 'EfficientViTBlock', 'EfficientViTBlock'], 'encoder_block_out_channels': [128, 256, 512, 512, 1024, 1024], 'decoder_block_out_channels': [128, 256, 512, 512, 1024, 1024], 'encoder_layers_per_block': [2, 2, 2, 3, 3, 3], 'decoder_layers_per_block': [3, 3, 3, 3, 3, 3], 'encoder_qkv_multiscales': [[], [], [], [5], [5], [5]], 'decoder_qkv_multiscales': [[], [], [], [5], [5], [5]], 'upsample_block_type': 'interpolate', 'downsample_block_type': 'Conv', 'decoder_norm_types': 'rms_norm', 'decoder_act_fns': 'silu', 'scaling_factor': 0.41407, '_class_name': 'AutoencoderDC', '_diffusers_version': '0.33.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Efficient-Large-Model--SANA1.5_4.8B_1024px_diffusers/snapshots/231ba75b89215c82dc070562d00efda1801171dc/vae'})
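
The DC-AE can be exercised on its own; a minimal decode sketch, assuming the vae subfolder of the model repo above (random latents, so the output is noise):

import torch
from diffusers import AutoencoderDC

vae = AutoencoderDC.from_pretrained(
    "Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers",
    subfolder="vae",
    torch_dtype=torch.bfloat16,
)
# 32x spatial downsampling: a 1024x1024 image maps to a 32-channel 32x32 latent
latents = torch.randn(1, 32, 32, 32, dtype=torch.bfloat16)
with torch.no_grad():
    image = vae.decode(latents / vae.config.scaling_factor).sample
print(image.shape)  # torch.Size([1, 3, 1024, 1024])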

transformer | Class: SanaTransformer2DModel | Device: xpu:0 | DType: torch.bfloat16 | Params: 4721825952 | Modules: 1581
Config: FrozenDict({'in_channels': 32, 'out_channels': 32, 'num_attention_heads': 70, 'attention_head_dim': 32, 'num_layers': 60, 'num_cross_attention_heads': 20, 'cross_attention_head_dim': 112, 'cross_attention_dim': 2240, 'caption_channels': 2304, 'mlp_ratio': 2.5, 'dropout': 0.0, 'attention_bias': False, 'sample_size': 32, 'patch_size': 1, 'norm_elementwise_affine': False, 'norm_eps': 1e-06, 'interpolation_scale': None, 'guidance_embeds': False, 'guidance_embeds_scale': 0.1, 'qk_norm': 'rms_norm_across_heads', 'timestep_scale': 1.0, '_use_default_values': ['timestep_scale', 'guidance_embeds_scale'], '_class_name': 'SanaTransformer2DModel', '_diffusers_version': '0.33.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Efficient-Large-Model--SANA1.5_4.8B_1024px_diffusers/snapshots/231ba75b89215c82dc070562d00efda1801171dc/transformer'})
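
Note the geometry this config implies: with sample_size 32 and patch_size 1, the transformer processes only 32 x 32 = 1024 tokens for a 1024x1024 image, which is what the DC-AE's 32x downsampling buys.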

scheduler | Class: DPMSolverMultistepScheduler | Device: None | DType: None | Params: 0 | Modules: 0
Config: FrozenDict({'num_train_timesteps': 1000, 'beta_start': 0.0001, 'beta_end': 0.02, 'beta_schedule': 'linear', 'trained_betas': None, 'solver_order': 2, 'prediction_type': 'flow_prediction', 'thresholding': False, 'dynamic_thresholding_ratio': 0.995, 'sample_max_value': 1.0, 'algorithm_type': 'dpmsolver++', 'solver_type': 'midpoint', 'lower_order_final': True, 'euler_at_final': False, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'use_lu_lambdas': False, 'use_flow_sigmas': True, 'flow_shift': 3.0, 'final_sigmas_type': 'zero', 'lambda_min_clipped': -inf, 'variance_type': None, 'timestep_spacing': 'linspace', 'steps_offset': 0, 'rescale_betas_zero_snr': False, '_class_name': 'DPMSolverMultistepScheduler', '_diffusers_version': '0.33.0.dev0'})

_name_or_path | Class: str | Device: None | DType: None | Params: 0 | Modules: 0 | Config: None
_class_name | Class: str | Device: None | DType: None | Params: 0 | Modules: 0 | Config: None
_diffusers_version | Class: str | Device: None | DType: None | Params: 0 | Modules: 0 | Config: None

