Info

https://huggingface.co/Photoroom/prx-1024-t2i-beta

Resolution: 1024 pixels
Architecture: PRX (MMDiT-like diffusion transformer variant) 
Latent backbone: Flux's VAE
Text encoder: T5-Gemma-2B-2B-UL2
Training stage: Supervised fine-tuning (SFT)

import torch
from diffusers.pipelines.prx import PRXPipeline

# Load the checkpoint in bfloat16 and move it to the GPU
pipe = PRXPipeline.from_pretrained(
    "Photoroom/prx-1024-t2i-beta",
    torch_dtype=torch.bfloat16
).to("cuda")

# Generate one image with 28 steps at guidance scale 5.0
prompt = "A front-facing portrait of a lion in the golden savanna at sunset"
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("lion.png")

Test 0 - Different seed variations

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

CFG5, STEP28	Seed: 1620085323	Seed: 1931701040	Seed: 4075624134	Seed: 2736029172
bookshop girl
hand and face
legs and shoes
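
The seed comparison above could be reproduced with a loop like the one below. This is a minimal sketch, not the script used for these tests: generate_seed_variations and torch_generator are hypothetical helper names, and it assumes PRXPipeline accepts the usual diffusers generator argument. The seeds, step count and CFG value are the ones listed in the table.

```python
SEEDS = [1620085323, 1931701040, 4075624134, 2736029172]


def torch_generator(seed, device="cpu"):
    """Build a seeded torch.Generator (torch is imported lazily)."""
    import torch

    return torch.Generator(device=device).manual_seed(seed)


def generate_seed_variations(pipe, prompt, seeds=SEEDS, steps=28, cfg=5.0,
                             make_generator=torch_generator):
    """Run the pipeline once per seed and return {seed: image}.

    Assumption: `pipe` follows the common diffusers call signature
    (num_inference_steps, guidance_scale, generator) and returns an
    object with an `images` list.
    """
    images = {}
    for seed in seeds:
        result = pipe(
            prompt,
            num_inference_steps=steps,
            guidance_scale=cfg,
            generator=make_generator(seed),
        )
        images[seed] = result.images[0]
    return images
```

With a loaded pipeline, generate_seed_variations(pipe, prompt) returns one image per seed, so each row of the table corresponds to one call with a different prompt.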

Test 1 - Bookshop

Prompt: masterpiece, best quality, photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

CFG \ Steps	4	8	16	32	64
CFG1
CFG2
CFG3
CFG4
CFG5
CFG6
CFG7
CFG8
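
A grid like the one above (CFG rows, step-count columns) can be enumerated programmatically. The sketch below is an illustration under assumptions, not the tooling used here: sweep_settings and run_sweep are hypothetical names, and the pipeline is assumed to follow the standard diffusers call signature. Seeding is left to the caller and is not shown.

```python
from itertools import product

CFG_SCALES = [1, 2, 3, 4, 5, 6, 7, 8]   # table rows
STEP_COUNTS = [4, 8, 16, 32, 64]        # table columns


def sweep_settings(cfg_scales=CFG_SCALES, step_counts=STEP_COUNTS):
    """Return every (cfg, steps) pair, row by row."""
    return list(product(cfg_scales, step_counts))


def run_sweep(pipe, prompt, settings=None, **pipe_kwargs):
    """Generate one image per grid cell and return {(cfg, steps): image}.

    Assumption: `pipe` accepts num_inference_steps and guidance_scale
    and returns an object with an `images` list.
    """
    grid = {}
    for cfg, steps in (settings or sweep_settings()):
        result = pipe(
            prompt,
            num_inference_steps=steps,
            guidance_scale=float(cfg),
            **pipe_kwargs,
        )
        grid[(cfg, steps)] = result.images[0]
    return grid
```

The same helper covers the later tests by passing their own row and column values to sweep_settings, for example step counts 8, 16, 25 and 50.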

Test 2 - Face and hand

Prompt: masterpiece, best quality, Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

CFG \ Steps	8	16	25	50
CFG1
CFG2
CFG3
CFG4
CFG5
CFG6
CFG7
CFG8
CFG10

Test 3 - Legs

Prompt: masterpiece, best quality, Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

CFG \ Steps	8	16	25	50
CFG1
CFG2
CFG3
CFG4
CFG5
CFG6
CFG8
CFG10

Test 4 - Other model covers

Test 5 - Art collection

https://civitai.com/collections/10716337

Test 6 - Model's own example prompts

System info

Sat Nov 15 10:36:54 2025
app: sdnext.git updated: 2025-11-14 hash: c02192870 url: https://github.com/liutyi/sdnext/tree/pytorch
arch: x86_64 cpu: x86_64 system: Linux release: 6.14.0-35-generic 
python: 3.12.3 torch 2.9.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex: 
ram: free:121.86 used:3.47 total:125.33
xformers: diffusers: 0.36.0.dev0 transformers: 4.57.1
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: Diffusers/Photoroom/prx-1024-t2i-beta [318a05beb7] refiner: none vae: none te: none unet: none

Config

{
 
}

Model info

Diffusers/Photoroom/prx-1024-t2i-beta [318a05beb7]

Module	Class	Device	Dtype	Quant	Params	Modules	Config
transformer	PRXTransformer2DModel	xpu:0	torch.bfloat16	None	1170691648	303	FrozenDict({'in_channels': 16, 'patch_size': 2, 'context_in_dim': 2304, 'hidden_size': 1792, 'mlp_ratio': 3.5, 'num_heads': 28, 'depth': 16, 'axes_dim': [32, 32], 'theta': 10000, 'time_factor': 1000.0, 'time_max_period': 10000, '_name_or_path': 'Photoroom/prx-1024-t2i-beta'})
scheduler	FlowMatchEulerDiscreteScheduler	None	None	None	0	0	FrozenDict({'num_train_timesteps': 1000, 'shift': 3.0, 'use_dynamic_shifting': False, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_use_default_values': ['use_karras_sigmas', 'use_exponential_sigmas', 'max_image_seq_len', 'use_beta_sigmas', 'time_shift_type', 'shift_terminal', 'use_dynamic_shifting',
text_encoder	T5GemmaEncoder	xpu:0	torch.bfloat16	None	2614341888	448	T5GemmaConfig { "architectures": [ "T5GemmaEncoder" ], "attention_bias": false, "attention_dropout": 0.0, "attn_logit_softcapping": 50.0, "bos_token_id": 2, "classifier_dropout_rate": 0.0, "decoder": { "attention_bias": false, "attention_dropout": 0.0, "attn_logit_softcapping": 50.0, "cross_attention_hidden_size": 2304, "dropout_rate": 0.0, "final_logit_softcapping": 30.0, "head_dim": 256, "hidden_activation": "gelu_pytorch_tanh", "hidden_size": 2304, "initializer_range": 0.02, "intermediate_size": 9216, "is_decoder": true, "layer_types": [ "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention" ], "max_position_embeddings": 8192, "model_type": "t5_gemma_module", "num_attention_heads": 8, "num_hidden_layers": 26, "num_key_value_heads": 4, "query_pre_attn_scalar": 256, "rms_norm_eps": 1e-06, "rope_theta": 10000.0, "sliding_window": 4096, "use_cache": true, "vocab_size": 256000 }, "dropout_rate": 0.0, "dtype": "bfloat16", "encoder": { "attention_bias": false, "attention_dropout": 0.0, "attn_logit_softcapping": 50.0, "dropout_rate": 0.0, "final_logit_softcapping": 30.0, "head_dim": 256, "hidden_activation": "gelu_pytorch_tanh", "hidden_size": 2304, "initializer_range": 0.02, "intermediate_size": 9216, "layer_types": [ "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", 
"sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention" ], "max_position_embeddings": 8192, "model_type": "t5_gemma_module", "num_attention_heads": 8, "num_hidden_layers": 26, "num_key_value_heads": 4, "query_pre_attn_scalar": 256, "rms_norm_eps": 1e-06, "rope_theta": 10000.0, "sliding_window": 4096, "use_cache": true, "vocab_size": 256000 }, "eos_token_id": 1, "final_logit_softcapping": 30.0, "head_dim": 256, "hidden_activation": "gelu_pytorch_tanh", "hidden_size": 2304, "initializer_range": 0.02, "intermediate_size": 9216, "is_encoder_decoder": true, "layer_types": [ "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention" ], "max_position_embeddings": 8192, "model_type": "t5gemma", "num_attention_heads": 8, "num_hidden_layers": 26, "num_key_value_heads": 4, "pad_token_id": 0, "query_pre_attn_scalar": 256, "rms_norm_eps": 1e-06, "rope_theta": 10000.0, "sliding_window": 4096, "transformers_version": "4.57.1", "use_cache": true, "vocab_size": 256000 }
tokenizer	GemmaTokenizerFast	None	None	None	0	0	None
vae	AutoencoderKL	xpu:0	torch.bfloat16	None	83819683	241	FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 16, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 0.3611, 'shift_factor': 0.1159, 'latents_mean': None, 'latents_std': None, 'force_upcast': True, 'use_quant_conv': False, 'use_post_quant_conv': False, 'mid_block_add_attention': True, '_class_name': 'AutoencoderKL', '_diffusers_version': '0.36.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--Photoroom--prx-1024-t2i-beta/snapshots/318a05beb7d65d55616e8fc17b325055be0e4756/vae'})
default_sample_size	int	None	None	None	0	0	None
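
A few figures follow directly from the table above. The sketch below is a back-of-the-envelope calculation using only the reported parameter counts and config values. It counts weights only (no activations or framework overhead), and it assumes the usual diffusers convention that an AutoencoderKL downsamples by 2 ** (len(block_out_channels) - 1).

```python
# Parameter counts as reported in the Params column
PARAMS = {
    "transformer": 1_170_691_648,
    "text_encoder": 2_614_341_888,
    "vae": 83_819_683,
}
BYTES_PER_PARAM = 2  # every module is loaded as torch.bfloat16

total_params = sum(PARAMS.values())
weights_gib = total_params * BYTES_PER_PARAM / 2**30

# Latent geometry for a 1024 x 1024 image
image_size = 1024
vae_downsample = 2 ** (4 - 1)                # 4 entries in block_out_channels
latent_size = image_size // vae_downsample   # side length of the 16-channel latent
patch_size = 2                               # transformer 'patch_size'
image_tokens = (latent_size // patch_size) ** 2

# Attention head width of the transformer
head_dim = 1792 // 28                        # hidden_size / num_heads

print(f"{total_params:,} parameters, ~{weights_gib:.2f} GiB of weights in bfloat16")
print(f"latent {latent_size}x{latent_size}, {image_tokens} image tokens, head_dim {head_dim}")
```

This gives 3,868,853,219 parameters (about 7.21 GiB of weights), a 128x128 latent, 4096 image tokens per 1024-pixel image, and a head dimension of 64, which equals the sum of axes_dim [32, 32].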
