Info

https://github.com/meituan-longcat/LongCat-Image

...

Code Block

import torch
from diffusers import LongCatImagePipeline

if __name__ == '__main__':
    device = torch.device('cuda')

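    pipe = LongCatImagePipeline.from_pretrained("meituan-longcat/LongCat-Image", torch_dtype=torch.bfloat16)
    # pipe.to(device, torch.bfloat16)  # Uncomment on high-VRAM devices (faster inference)
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (requires ~17 GB); slower but prevents OOM

    # English gloss of the prompt below: a young Asian woman in a yellow knit sweater with a white
    # necklace, hands resting on her knees, serene expression; rough brick wall background, warm
    # afternoon sunlight; medium-distance shot, soft light on her face, simple composition.
    prompt = '一个年轻的亚裔女性，身穿黄色针织衫，搭配白色项链。她的双手放在膝盖上，表情恬静。背景是一堵粗糙的砖墙，午后的阳光温暖地洒在她身上，营造出一种宁静而温馨的氛围。镜头采用中距离视角，突出她的神态和服饰的细节。光线柔和地打在她的脸上，强调她的五官和饰品的质感，增加画面的层次感与亲和力。整个画面构图简洁，砖墙的纹理与阳光的光影效果相得益彰，突显出人物的优雅与从容。'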
    
    image = pipe(
        prompt,
        height=768,
        width=1344,
        guidance_scale=4.0,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
        enable_cfg_renorm=True,
        enable_prompt_rewrite=True
    ).images[0]

    image.save('./t2i_example.png')

Test 0 - Different seed variations

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

...

CFG 4, 50 steps	Seed: 1620085323	Seed: 1931701040	Seed: 4075624134	Seed: 2736029172
bookshop girl	Image Added	Image Added	Image Added	Image Added
hand and face	Image Added
legs and shoes
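
A seed sweep like the one in this table could be scripted roughly as follows. This is a minimal sketch, not the script used for these tests: `run_seed_sweep` and `make_generator` are hypothetical names, and `pipe` is assumed to behave like the pipeline from the Info section (callable with a prompt and a `generator`, returning an object with `.images`).

```python
SEEDS = [1620085323, 1931701040, 4075624134, 2736029172]  # seeds listed in the table


def run_seed_sweep(pipe, prompt, seeds, make_generator, **pipe_kwargs):
    """Call `pipe` once per seed and return {seed: first generated image}.

    `make_generator` is a hypothetical factory that turns a seed into whatever
    the pipeline accepts as `generator`; it is injected so the loop itself
    does not depend on torch.
    """
    images = {}
    for seed in seeds:
        result = pipe(prompt, generator=make_generator(seed), **pipe_kwargs)
        images[seed] = result.images[0]
    return images
```

With the pipeline from the Info section, `make_generator` could be `lambda s: torch.Generator("cpu").manual_seed(s)`, and the keyword arguments for this table would be `guidance_scale=4.0, num_inference_steps=50`.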

Test 1 - Bookshop

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling

...

285H Time: 16m 59.25s | total 1019.55 pipeline 1017.69 decode 1.53 gc 0.29 | GPU 31792 MB 25% | RAM 49.0 GB 40%
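
For comparing runs, a status line in this format can be split into numeric fields. The sketch below assumes the values are seconds, MB and GB as their labels suggest; `parse_timing` is a hypothetical helper and not part of SD.Next.

```python
import re

LINE = ("285H Time: 16m 59.25s | total 1019.55 pipeline 1017.69 decode 1.53 gc 0.29 "
        "| GPU 31792 MB 25% | RAM 49.0 GB 40%")


def parse_timing(line):
    """Extract wall time, per-stage times and memory figures from one status line."""
    wall = re.search(r"Time:\s*(?:(\d+)m\s*)?([\d.]+)s", line)
    minutes = int(wall.group(1) or 0)
    stats = {"wall_s": minutes * 60 + float(wall.group(2))}
    for key in ("total", "pipeline", "decode", "gc"):
        stats[key + "_s"] = float(re.search(rf"\b{key}\s+([\d.]+)", line).group(1))
    stats["gpu_mb"] = int(re.search(r"GPU\s+(\d+)\s*MB", line).group(1))
    stats["ram_gb"] = float(re.search(r"RAM\s+([\d.]+)\s*GB", line).group(1))
    return stats


stats = parse_timing(LINE)
# Fraction of the total time spent inside the diffusion pipeline itself.
pipeline_share = stats["pipeline_s"] / stats["total_s"]
print(stats, round(pipeline_share, 4))
```

For this line, nearly all of the total time is spent in the pipeline stage; decode and garbage collection together account for under two seconds.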

CFG \ Steps	4	8	16	32	64
CFG1	Image Added	Image Added	Image Added	Image Added	Image Added
CFG2	Image Added	Image Added	Image Added	Image Added	Image Added
CFG3	Image Added	Image Added	Image Added	Image Added	Image Added
CFG4	Image Added	Image Added	Image Added	Image Added	Image Added
CFG5	Image Added	Image Added	Image Added	Image Added	Image Added
CFG6	Image Added	Image Added	Image Added	Image Added	Image Added
CFG8	Image Added	Image Added	Image Added	Image Added	Image Added
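
The CFG and step combinations in this grid can be enumerated programmatically. This is a sketch under assumptions: `build_grid` is a hypothetical helper, and the keyword names `guidance_scale` and `num_inference_steps` are taken from the script in the Info section.

```python
from itertools import product

CFG_VALUES = [1, 2, 3, 4, 5, 6, 8]   # row labels of the table
STEP_VALUES = [4, 8, 16, 32, 64]     # column labels of the table


def build_grid(cfg_values, step_values):
    """Return one kwargs dict per table cell, in row-major order."""
    return [
        {"guidance_scale": float(cfg), "num_inference_steps": steps}
        for cfg, steps in product(cfg_values, step_values)
    ]


grid = build_grid(CFG_VALUES, STEP_VALUES)
# Total number of denoising steps needed to fill the whole table.
total_steps = sum(cell["num_inference_steps"] for cell in grid)
print(len(grid), total_steps)
```

Each dict can then be passed to the pipeline as `pipe(prompt, **cell)` along with a fixed seed, so that only CFG and step count vary between cells.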

Test 2 - Face and hand

Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

CFG \ Steps	8	16	20	32	50
CFG1	Image Added	Image Added	Image Added	Image Added	Image Added
CFG2	Image Added	Image Added	Image Added	Image Added	Image Added
CFG3	Image Added	Image Added	Image Added	Image Added	Image Added
CFG4	Image Added	Image Added	Image Added	Image Added	Image Added
CFG5	Image Added	Image Added	Image Added	Image Added	Image Added
CFG6	Image Added	Image Added	Image Added	Image Added	Image Added
CFG7	Image Added	Image Added	Image Added	Image Added	Image Added

Test 3 - Legs

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

CFG \ Steps	8	16	32	64
CFG2	Image Added	Image Added	Image Added	Image Added
CFG4	Image Added	Image Added	Image Added	Image Added
CFG6	Image Added	Image Added	Image Added	Image Added
CFG8	Image Added	Image Added	Image Added	Image Added
CFG10	Image Added	Image Added	Image Added	Image Added

Test 4 - Empty prompts

1024x1024, Steps 50

Image Added seed 1	Image Added seed 2	Image Added seed 3	Image Added seed 4	Image Added seed 5
Image Added seed 6	Image Added seed 7	Image Added seed 8	Image Added seed 9	Image Added seed 10
Image Added seed 21	Image Added seed 38	Image Added seed 42	Image Added seed 68	Image Added seed 2025

Test 5 - Other Models cover

Prompts are in Test 42 - All models cover image

[Image gallery]

Test 6 - Art Prompts

[Image gallery]

Test 7 - Finding the Cover

small mirror

[Image gallery]

mirror wall graffiti

[Image gallery]

neon sign

[Image gallery]

System info

Code Block

Fri Dec 19 06:59:12 2025
app: sdnext.git updated: 2025-12-16 hash: c53ebcac5 url: https://github.com/liutyi/sdnext/tree/pytorch
arch: x86_64 cpu: x86_64 system: Linux release: 6.17.0-8-generic
python: 3.12.3 PyTorch 2.9.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex: 
ram: free:118.03 used:5.05 total:123.07
xformers: diffusers: 0.36.0.dev0 transformers: 4.57.3
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: meituan-longcat/LongCat-Image refiner: none vae: none te: none unet: none
ipex native none Scaled-Dot-Product

Config

Code Block

{
  "sd_model_checkpoint": "meituan-longcat/LongCat-Image",
  "diffusers_offload_mode": "none",
  "huggingface_token": "hf_..FraU",
  "diffusers_version": "a748a839add5fe9f45a66e45dd93d8db0b45ce0f",
  "sd_checkpoint_hash": null,
  "queue_paused": true
}

Model info

meituan-longcat/LongCat-Image

Module	Class	Device	Dtype	Quant	Params	Modules	Config
vae	AutoencoderKL	xpu:0	torch.bfloat16	None	83819683	241	FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 16, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 0.3611, 'shift_factor': 0.1159, 'latents_mean': None, 'latents_std': None, 'force_upcast': True, 'use_quant_conv': False, 'use_post_quant_conv': False, 'mid_block_add_attention': True, '_class_name': 'AutoencoderKL', '_diffusers_version': '0.30.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--meituan-longcat--LongCat-Image/snapshots/d2ea50b79a930074c37b9b97ce45e3b2ea8cf4d8/vae'})
text_encoder	Qwen2_5_VLForConditionalGeneration	xpu:0	torch.bfloat16	None	8292166656	763	Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "_name_or_path": "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl_text", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, 
"vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }, "tie_word_embeddings": false, "transformers_version": "4.57.3", "use_cache": true, "use_sliding_window": false, "vision_config": { "depth": 32, "dtype": "bfloat16", "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_token_id": 151654, "vocab_size": 152064 }
tokenizer	Qwen2Tokenizer	None	None	None	0	0	None
transformer	LongCatImageTransformer2DModel	xpu:0	torch.bfloat16	None	6270668864	677	FrozenDict({'patch_size': 1, 'in_channels': 64, 'num_layers': 10, 'num_single_layers': 20, 'attention_head_dim': 128, 'num_attention_heads': 24, 'joint_attention_dim': 3584, 'pooled_projection_dim': 3584, 'axes_dims_rope': [16, 56, 56], '_use_default_values': ['axes_dims_rope'], '_class_name': 'LongCatImageTransformer2DModel', '_diffusers_version': '0.30.0.dev0', 'guidance_embeds': False, '_name_or_path': 'meituan-longcat/LongCat-Image'})
scheduler	FlowMatchEulerDiscreteScheduler	None	None	None	0	0	FrozenDict({'num_train_timesteps': 1000, 'shift': 3.0, 'use_dynamic_shifting': True, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_use_default_values': ['shift_terminal', 'stochastic_sampling', 'time_shift_type', 'use_exponential_sigmas', 'use_karras_sigmas', 'use_beta_sigmas', 'invert_sigmas'], '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.30.0.dev0'})
text_processor	Qwen2VLProcessor	None	None	None	0	0	None
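
The parameter counts in this table give a rough lower bound on the memory needed to hold the weights. The sketch below assumes 2 bytes per parameter, since all three modules are listed as torch.bfloat16, and it ignores activations, latents and framework overhead.

```python
# Parameter counts copied from the Params column of the module table.
PARAMS = {
    "vae": 83_819_683,
    "text_encoder": 8_292_166_656,
    "transformer": 6_270_668_864,
}
BYTES_PER_PARAM = 2  # torch.bfloat16 stores each parameter in 2 bytes

total_params = sum(PARAMS.values())
weights_gib = total_params * BYTES_PER_PARAM / 2**30

for name, count in PARAMS.items():
    print(f"{name:13s} {count * BYTES_PER_PARAM / 2**30:6.2f} GiB")
print(f"{'total':13s} {weights_gib:6.2f} GiB for {total_params:,} parameters")
```

The weights alone come to roughly 27.3 GiB, which sits below the 31792 MB of GPU memory reported in the timing line when every module is kept on the device without offloading.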
