https://huggingface.co/tencent/HunyuanImage-2.1
https://github.com/Tencent-Hunyuan/HunyuanImage-2.1/tree/main
```python
# Recommended call from the upstream HunyuanImage-2.1 README (original repository pipeline, not the Diffusers port).
# Assumes `pipe` was loaded as described in that README and `model_name` holds the checkpoint name
# (the distilled checkpoint's name contains "distilled").
prompt = "..."  # your prompt here
image = pipe(
    prompt=prompt,
    # Examples of supported resolutions and aspect ratios for HunyuanImage-2.1:
    # 16:9 -> width=2560, height=1536
    # 4:3 -> width=2304, height=1792
    # 1:1 -> width=2048, height=2048
    # 3:4 -> width=1792, height=2304
    # 9:16 -> width=1536, height=2560
    # Please use one of the above width/height pairs for best results.
    width=2048,
    height=2048,
    use_reprompt=False,  # Set True to enable prompt enhancement (may increase GPU memory usage)
    use_refiner=True,  # Enable the refiner model
    # For the distilled model, use 8 steps for faster inference.
    # For the non-distilled model, use 50 steps for better quality.
    num_inference_steps=8 if "distilled" in model_name else 50,
    guidance_scale=3.25 if "distilled" in model_name else 3.5,
    shift=4 if "distilled" in model_name else 5,
    seed=649151,
)
```
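The conditional defaults in the call above can be captured in a small helper. This is a minimal sketch in plain Python with no HunyuanImage dependency; `SamplingSettings`, `recommended_settings`, `closest_supported_resolution` and `shifted_sigmas` are hypothetical names. The shift formula is the static shift applied by diffusers' `FlowMatchEulerDiscreteScheduler` (the scheduler listed in the module table at the end of this page); that the upstream `shift` argument behaves the same way is an assumption, and the evenly spaced starting sigmas are a simplification of the real schedule.

```python
from dataclasses import dataclass

# Width/height pairs from the example above (keys are the labels used there).
SUPPORTED_RESOLUTIONS = {
    "16:9": (2560, 1536),
    "4:3": (2304, 1792),
    "1:1": (2048, 2048),
    "3:4": (1792, 2304),
    "9:16": (1536, 2560),
}


@dataclass(frozen=True)
class SamplingSettings:
    num_inference_steps: int
    guidance_scale: float
    shift: float


def recommended_settings(model_name: str) -> SamplingSettings:
    """Case-insensitive variant of the example's rule: names containing 'distilled' get distilled defaults."""
    if "distilled" in model_name.lower():
        return SamplingSettings(num_inference_steps=8, guidance_scale=3.25, shift=4.0)
    return SamplingSettings(num_inference_steps=50, guidance_scale=3.5, shift=5.0)


def closest_supported_resolution(width: int, height: int) -> tuple[int, int]:
    """Pick the listed (width, height) pair whose aspect ratio is nearest to width / height."""
    target = width / height
    return min(SUPPORTED_RESOLUTIONS.values(), key=lambda wh: abs(wh[0] / wh[1] - target))


def shifted_sigmas(num_steps: int, shift: float) -> list[float]:
    """Simplified flow-matching schedule: evenly spaced sigmas in (0, 1], then the static shift
    sigma' = shift * sigma / (1 + (shift - 1) * sigma). A larger shift keeps more steps at high noise."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift * s / (1 + (shift - 1) * s) for s in sigmas]


settings = recommended_settings("HunyuanImage-2.1-Distilled-Diffusers")
print(settings)
print(closest_supported_resolution(1920, 1080))  # (2560, 1536)
print([round(s, 3) for s in shifted_sigmas(settings.num_inference_steps, settings.shift)])
```

The `.lower()` is a deliberate deviation from the README: its check is case-sensitive, so a name like `HunyuanImage-2.1-Distilled-Diffusers` (the Diffusers checkpoint used further down this page) would not match as written. With shift 5, a sigma of 0.5 maps to about 0.83, which is why a larger shift spends more of the step budget at high noise.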
Prompt: A cute, cartoon-style anthropomorphic penguin plush toy with fluffy fur, standing in a painting studio, wearing a red knitted scarf and a red beret with the word “Hunyuan Image” on it, holding a paintbrush with a focused expression as it paints an oil painting of the Mona Lisa, rendered in a photorealistic photographic style. Number 17B is handwritten over the image in the top left corner.

| | 1024x1024 | 2048x2048 |
|---|---|---|
| Steps | 50 | 50 |
| Seed | 32 | 32 |
| CFG scale | 3.5 | 3.5 |
| Time | 23m 2.80s | 66m 58.09s |
| Timing breakdown (s) | total 1507.64 pipeline 1382.77 callback 117.19 te 6.66 vae 0.99 | total 4316.31 pipeline 4017.98 callback 285.59 te 9.64 vae 2.63 move 0.37 |
| GPU | 52508 MB 41% | 58322 MB 45% |
| RAM | 62.13 GB 50% | 70.51 GB 56% |

Parameters (both runs): App: SD.Next, Version: 88ac838, Pipeline: HunyuanImagePipeline, Operations: txt2img, Model: HunyuanImage-2.1-Diffusers
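The timing breakdowns logged for the 1024 and 2048 runs are stage/seconds pairs, and the `pipeline` entry matches the reported Time to within a fraction of a second. A minimal sketch that parses such a string and derives seconds per step; `parse_timing` and `seconds_per_step` are hypothetical helpers, and attributing the whole `pipeline` stage to the denoising steps is an assumption.

```python
def parse_timing(breakdown: str) -> dict[str, float]:
    """Parse an SD.Next timing breakdown like 'total 1507.64 pipeline 1382.77 ...' into {stage: seconds}."""
    tokens = breakdown.split()
    if len(tokens) % 2:
        raise ValueError(f"expected stage/value pairs, got {breakdown!r}")
    return {stage: float(value) for stage, value in zip(tokens[0::2], tokens[1::2])}


def seconds_per_step(breakdown: str, steps: int) -> float:
    """Average seconds per denoising step, attributing the whole 'pipeline' stage to the steps."""
    return parse_timing(breakdown)["pipeline"] / steps


run_1024 = "total 1507.64 pipeline 1382.77 callback 117.19 te 6.66 vae 0.99"
run_2048 = "total 4316.31 pipeline 4017.98 callback 285.59 te 9.64 vae 2.63 move 0.37"

t1024 = seconds_per_step(run_1024, steps=50)
t2048 = seconds_per_step(run_2048, steps=50)
print(f"{t1024:.1f} s/step at 1024x1024, {t2048:.1f} s/step at 2048x2048, ratio {t2048 / t1024:.2f}")
# 27.7 s/step at 1024x1024, 80.4 s/step at 2048x2048, ratio 2.91
```

In these two runs the 4x larger image cost roughly 2.9x more per step. The same parser also accepts the breakdown logged for the 32-step bookshop run further down.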
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

| CFG 3.5, 50 steps | Seed: 1620085323 | Seed: 1931701040 | Seed: 4075624134 | Seed: 2736029172 |
|---|---|---|---|---|
| bookshop girl | | | | |
| hand and face | | | | |
| legs and shoes | | | | |

Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
Parameters: Steps: 32| Size: 2048x2048| Seed: 1931701040| CFG scale: 1.5| App: SD.Next| Version: 1aee3cc| Pipeline: HunyuanImagePipeline| Operations: txt2img| Model: HunyuanImage-2.1-Diffusers
Time: 54m 24.69s | total 3325.52 pipeline 3264.59 vae 17.23 offload 15.34 onload 14.21 te 8.61 callback 5.39 | GPU 52616 MB 41% | RAM 98.26 GB 78%

| CFG / Steps | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|
| CFG1 | | | | | |
| CFG2 | | | | | |
| CFG3 | | | | | |
| CFG4 | | | | | |
| CFG5 | | | | | |
| CFG6 | | | | | |
| CFG8 | | | | | |

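The CFG x steps grid above is a 7 x 5 sweep that holds the prompt and seed fixed, so only guidance and step count vary between cells. A minimal sketch of enumerating such a sweep as a job list; `build_sweep` and its dict keys are hypothetical (the keys borrow the argument names from the example call at the top of this page), and how SD.Next's own grid feature schedules these jobs is not shown.

```python
from itertools import product

# Axes of the grid above: CFG scale rows x step-count columns.
CFG_SCALES = [1, 2, 3, 4, 5, 6, 8]
STEP_COUNTS = [4, 8, 16, 32, 64]


def build_sweep(prompt: str, seed: int, width: int = 2048, height: int = 2048) -> list[dict]:
    """Return one job per (cfg, steps) cell, row by row, all sharing the same prompt and seed.

    The dict keys are illustrative only.
    """
    return [
        {
            "prompt": prompt,
            "seed": seed,
            "width": width,
            "height": height,
            "guidance_scale": cfg,
            "num_inference_steps": steps,
        }
        for cfg, steps in product(CFG_SCALES, STEP_COUNTS)
    ]


jobs = build_sweep(
    "photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling",
    seed=1931701040,
)
total_steps = sum(job["num_inference_steps"] for job in jobs)
print(len(jobs), total_steps)  # 35 868
```

Counting total denoising steps up front is a cheap way to budget a sweep like this one, since each step at 2048x2048 took on the order of a minute or more in the runs logged on this page.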
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.

| CFG / Steps | 8 | 16 | 20 | 32 |
|---|---|---|---|---|
| CFG3 | | | | |

Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.

| CFG / Steps | 8 | 16 | 20 | 32 |
|---|---|---|---|---|
| CFG3 | | | | |

Settings common to every pair below: Size: 2048x2048, App: SD.Next, Version: ded5afc, Pipeline: HunyuanImagePipeline, Operations: txt2img. Models: HunyuanImage-2.1-Diffusers (non-Distilled) and HunyuanImage-2.1-Distilled-Diffusers (Distilled).

| Prompt | Seed | non-Distilled | Distilled |
|---|---|---|---|
| 1. fashion portrait | 426578498 | Steps: 50, CFG scale: 3.5 | Steps: 8, CFG scale: 3.25, CFG true: 1.4 |
| 2. office corridor | 1866289189 | Steps: 50, CFG scale: 3.5 | Steps: 16, CFG scale: 3.25, CFG true: 1.4 |
| 3. train station | 38 | Steps: 50, CFG scale: 3.5 | Steps: 16, CFG scale: 3.25 |
| 3. train station | 2025 | Steps: 50, CFG scale: 3.5 | Steps: 16, CFG scale: 3.25 |
| 4. colorful powder portrait | 42 | Steps: 50, CFG scale: 3.5 | Steps: 8, CFG scale: 3.25 |
| 5. sin city | 42 | Steps: 50, CFG scale: 3.5 | Steps: 8, CFG scale: 3.25 |
| 6. rainy window | 1873624607 | Steps: 50, CFG scale: 3.5 | Steps: 8, CFG scale: 3.25 |

Prompt 1 (fashion portrait): score_9, style_cluster_1679, professional photo, fashion portrait of a beautiful attractive gorgeous pretty sexy woman in a champagne thin silk V-Neck sleeveless backless dress, Instagram influencer photo-model face, intricate details of skin, natural breasts, luxury vibes, dramatic light, photorealism, advertising poster, blurry dim neon light "17" is on the background top left corner Negative: score_1, score_2, score_3, score_4 anime, ugly, bad, wrong, weird, low quality, noisy, grainy, blurry, distorted, deformed, mutated, mutilated, plastic, smooth, text, signature, username, watermark

Prompt 2 (office corridor): Photorealistic, ultra-detailed marble reflections, 85mm lens bokeh, soft directional overhead light; empty after-hours office corridor with marble floors and frosted-glass walls; a Korean woman leaning against the wall, one leg bent so her bare foot presses on the cool stone; her high heels lie beside her, coat lifted to reveal her ankle and arch; she holds a stack of documents in one hand, the other teasingly lifting her coat hem; clandestine, charged tension in a corporate space; detailed toes fingers and face details Negative: watermark, cartoon, extra limbs

Prompt 3 (train station): Sunlight streams through the arched roof, casting dramatic beams and long shadows across a vintage train station platform lined with ornate iron columns. A stationary dark-colored passenger train occupies the right track; its windows reflect light subtly. The left side features an empty tunnel entrance framed by stone walls. Figures: Two individuals stand near each other on the lower part of the frame. one standing upright holding something red (possibly luggage), facing away from the viewer towards distant tracks or signage marked "17B" in blue text above them under bright sunlight; another figure sits slightly bent forward next to some equipment or bags close behind this person's legs, head bowed as if resting hands together between knees while gazing downward at ground level where shadow meets sunlit area creating high contrast effect due to strong backlighting conditions. Style: Photorealistic illustration mimicking photographic depth but enhanced for artistic drama. Color Palette: Dominated by deep browns, blacks, grays contrasting sharply with golden-yellow warm tones illuminating upper portions especially around central beam of direct sunshine piercing darkness below. Lighting: Strong directional backlit rays create intense highlights on metallic surfaces like railings and pillars' tops while leaving bases cloaked entirely within shade enhancing three-dimensionality via stark luminosity gradients. Texture: Rough stonework visible along tunnels and building facades juxtaposed against smooth polished metalwork defining structural elements such as support posts which bear intricate decorative carvings atop their capitals adding historical architectural charm reminiscent classic European railway stations.
Medium: Digital art rendered closely resembling hyper-detailed photography employing selective focus techniques similar portrait photography emphasizing foreground subjects amidst vast background expanse filled soft diffused glow emanating filtered daylight passing overhead arches forming dreamlike ethereal ambiance suggestive quiet contemplation solitude amid bustling transit hub momentarily paused time travel experience before journey commences mood evokes nostalgia tranquility anticipation transition life phases represented physical stillness motion implied awaiting departure arrival potential connections fleeting moments captured suspended narrative tension balance serenity melancholy wonder Stylistic keywords: Hyperrealism, chiaroscuro lighting, atmospheric perspective, textured architecture, cinematic composition, emotional storytelling, interplay of natural vs. artificial structures, timeless setting

Prompt 4 (colorful powder portrait): A surreal, artistic portrait of a serene young woman with closed eyes, surrounded by an explosion of colorful powder and paint splashes. The composition has a dreamlike, ethereal atmosphere with vivid bursts of neon blue, green, orange, pink, and yellow around her face and hair. Her expression is peaceful, as if meditating or lost in thought. The background is dark, almost black, which makes the vibrant colors stand out dramatically. Above her head, the word “HUNYUAN” appears in modern, minimalist typography. Cinematic lighting, ultra-detailed, high resolution, digital art style.

Prompt 5 (sin city): nsfw scene from sin city, photorealistic wet sexy young woman holding umbrella and walking in the city square, long hair, transparent wet shirt, mini skirt, buildings with neon signs, during night, visible raindrops falling, reflections on the floor, perfect face, extreme detailed, dslr, leika

Prompt 6 (rainy window): A close-up scene showing the text "Hunyuan Image 2.1" meticulously formed by a womans cherry lipstick on the outer surface of a rainy windowpane. The letters are also framed using individual raindrops that gather and slide along the glass, forming clear, sharp characters with slight reflections. The woman’s subtle touch a corner of the window glass with one hand, her face is partially blurred at the edges, visible only faintly through the wet glass - her features soft and indistinct, eyes downcast, hair wet, damp and clinging to her shoulders. Behind the window, a dense urban skyline stretches into the distance, featuring towering skyscrapers with reflective surfaces catching the dim twilight glow; fog curls around some buildings, adding depth and atmospheric haze. In the upper-left corner of the composition, written in loose, uneven cursive handwriting in faded blue ink, the number 17B appears subtly beneath the edge of the frame. Rain streaks cascade diagonally across the glass, enhancing texture and motion, creating a sense of quiet intensity and isolation within the city.

| Key | Value |
|---|---|
| date | Sat Oct 25 12:53:29 2025 |
| app | sdnext.git |
| updated | 2025-10-24 |
| hash | 88ac83839 |
| url | https://github.com/liutyi/sdnext.git/tree/pytorch |
| arch | x86_64 |
| cpu | x86_64 |
| system | Linux |
| release | 6.14.0-33-generic |
| python | 3.12.3 |
| Torch | 2.9.0+xpu |
| device | Intel(R) Arc(TM) Graphics (1) |
| ipex | |
| ram | free: 119.7, used: 5.63, total: 125.33 |
| xformers | |
| diffusers | 0.36.0.dev0 |
| transformers | 4.57.1 |
| active | xpu |
| dtype / vae / unet | torch.bfloat16 |
| base | Diffusers/hunyuanvideo-community/HunyuanImage-2.1-Diffusers [7e7b7a177d] |
| refiner / vae / te / unet | none |
| Backend | ipex |
| Pipeline | native |
| Memory optimization | none |
| Cross-attention | Scaled-Dot-Product |

"huggingface_token": "hf_..FraU", "diffusers_version": "7536f647e4144c7acaf9e140893ff7edb85bf9a3", "sd_model_checkpoint": "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", "sd_checkpoint_hash": null, "diffusers_to_gpu": true, "device_map": "gpu", "model_wan_stage": "combined", "diffusers_offload_mode": "none", "ui_request_timeout": 300000, "show_progress_type": "Simple" |
hunyuanvideo-community/HunyuanImage-2.1-Diffusers [7e7b7a177d]
| Module | Class | Device | Dtype | Quant | Params | Modules | Config |
|---|---|---|---|---|---|---|---|
| vae | AutoencoderKLHunyuanImage | cpu | torch.bfloat16 | None | 405575491 | 255 | FrozenDict({'in_channels': 3, 'out_channels': 3, 'latent_channels': 64, 'block_out_channels': [128, 256, 512, 512, 1024, 1024], 'layers_per_block': 2, 'spatial_compression_ratio': 32, 'sample_size': 384, 'scaling_factor': 0.75289, 'downsample_match_channel': True, 'upsample_match_channel': True, '_class_name': 'AutoencoderKLHunyuanImage', '_diffusers_version': '0.36.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--hunyuanvideo-community--HunyuanImage-2.1-Diffusers/snapshots/7e7b7a177de58591aeaffca0929f4765003d7ced/vae'}) |
| text_encoder | Qwen2_5_VLForConditionalGeneration | xpu:0 | torch.bfloat16 | None | 8292166656 | 763 | Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "_name_or_path": "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl_text", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "use_cache": true, "use_sliding_window": false, 
"video_token_id": 151656, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }, "tie_word_embeddings": false, "transformers_version": "4.57.1", "use_cache": true, "use_sliding_window": false, "vision_config": { "depth": 32, "dtype": "bfloat16", "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_token_id": 151654, "vocab_size": 152064 } |
| tokenizer | Qwen2Tokenizer | None | None | None | 0 | 0 | None |
| text_encoder_2 | T5EncoderModel | xpu:0 | torch.bfloat16 | None | 219314944 | 235 | T5Config { "architectures": [ "T5EncoderModel" ], "classifier_dropout": 0.0, "d_ff": 3584, "d_kv": 64, "d_model": 1472, "decoder_start_token_id": 0, "dense_act_fn": "gelu_new", "dropout_rate": 0.1, "dtype": "bfloat16", "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "gradient_checkpointing": false, "initializer_factor": 1.0, "is_encoder_decoder": false, "is_gated_act": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 4, "num_heads": 6, "num_layers": 12, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "tokenizer_class": "ByT5Tokenizer", "transformers_version": "4.57.1", "use_cache": false, "vocab_size": 1510 } |
| tokenizer_2 | ByT5Tokenizer | None | None | None | 0 | 0 | None |
| transformer | HunyuanImageTransformer2DModel | xpu:0 | torch.bfloat16 | None | 17425795520 | 1397 | FrozenDict({'in_channels': 64, 'out_channels': 64, 'num_attention_heads': 28, 'attention_head_dim': 128, 'num_layers': 20, 'num_single_layers': 40, 'num_refiner_layers': 2, 'mlp_ratio': 4.0, 'patch_size': [1, 1], 'qk_norm': 'rms_norm', 'guidance_embeds': False, 'text_embed_dim': 3584, 'text_embed_2_dim': 1472, 'rope_theta': 256.0, 'rope_axes_dim': [64, 64], 'use_meanflow': False, '_use_default_values': ['use_meanflow'], '_class_name': 'HunyuanImageTransformer2DModel', '_diffusers_version': '0.36.0.dev0', '_name_or_path': 'hunyuanvideo-community/HunyuanImage-2.1-Diffusers'}) |
| scheduler | FlowMatchEulerDiscreteScheduler | None | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'shift': 5.0, 'use_dynamic_shifting': False, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.36.0.dev0'}) |
| guider | AdaptiveProjectedMixGuidance | None | None | None | 0 | 0 | FrozenDict({'guidance_scale': 3.5, 'guidance_rescale': 0.0, 'adaptive_projected_guidance_scale': 10.0, 'adaptive_projected_guidance_momentum': -0.5, 'adaptive_projected_guidance_rescale': 10.0, 'eta': 0.0, 'use_original_formulation': False, 'start': 0.0, 'stop': 1.0, 'adaptive_projected_guidance_start_step': 5, 'enabled': True, '_class_name': 'AdaptiveProjectedMixGuidance', '_diffusers_version': '0.36.0.dev0'}) |
| ocr_guider | AdaptiveProjectedMixGuidance | None | None | None | 0 | 0 | FrozenDict({'guidance_scale': 3.5, 'guidance_rescale': 0.0, 'adaptive_projected_guidance_scale': 10.0, 'adaptive_projected_guidance_momentum': -0.5, 'adaptive_projected_guidance_rescale': 10.0, 'eta': 0.0, 'use_original_formulation': False, 'start': 0.0, 'stop': 1.0, 'adaptive_projected_guidance_start_step': 38, 'enabled': True, '_class_name': 'AdaptiveProjectedMixGuidance', '_diffusers_version': '0.36.0.dev0'}) |
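The module table gives enough to estimate the weight-memory floor and the transformer's image sequence length. A back-of-the-envelope sketch: parameter counts, dtype and devices are copied from the table, and the token count uses `spatial_compression_ratio: 32` from the VAE config and `patch_size: [1, 1]` from the transformer config. The helper names are hypothetical, and only weights are counted (no activations, attention buffers or VAE decode).

```python
# Parameter counts copied from the module table above (torch.bfloat16 = 2 bytes per parameter).
PARAMS = {
    "vae": 405_575_491,
    "text_encoder": 8_292_166_656,
    "text_encoder_2": 219_314_944,
    "transformer": 17_425_795_520,
}
BYTES_PER_PARAM = 2  # torch.bfloat16
ON_XPU = ("text_encoder", "text_encoder_2", "transformer")  # the vae row lists device cpu


def weight_mib(modules) -> float:
    """Size of the listed modules' weights in MiB (weights only, no activations)."""
    return sum(PARAMS[m] for m in modules) * BYTES_PER_PARAM / 2**20


def latent_tokens(width: int, height: int, compression: int = 32, patch: int = 1) -> int:
    """Image tokens the transformer sees: VAE downsampling by `compression`, then patch x patch patches."""
    stride = compression * patch
    if width % stride or height % stride:
        raise ValueError(f"width and height must be multiples of {stride}")
    return (width // stride) * (height // stride)


print(f"{sum(PARAMS.values()) / 1e9:.2f}B parameters")  # 26.34B parameters
print(f"{weight_mib(ON_XPU):,.0f} MiB of bf16 weights on xpu")  # 49,471 MiB of bf16 weights on xpu
print(latent_tokens(1024, 1024), latent_tokens(2048, 2048))  # 1024 4096
```

Assuming SD.Next's GPU figure is in MiB, roughly 49,471 MiB of resident weights accounts for most of the 52508 MB reported for the 1024x1024 penguin run (same SD.Next version, no offload). Going from 1024x1024 to 2048x2048 quadruples the image token count from 1024 to 4096.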