Model Info and links
https://huggingface.co/black-forest-labs/FLUX.2-klein-4B
| Code Block |
|---|
import torch
from diffusers import Flux2KleinPipeline

device = "cuda"
dtype = torch.bfloat16

pipe = Flux2KleinPipeline.from_pretrained("black-forest-labs/FLUX.2-klein-4B", torch_dtype=dtype)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading model components to the CPU

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
image.save("flux-klein.png") |
Test 0 - Seed and guidance
...
| CFG 1, STEP 4 | Seed: 1620085323 | Seed: 1931701040 | Seed: 4075624134 | Seed: 2736029172 |
|---|---|---|---|---|
| Bookshop girl | | | | |
| Face and hand | | | | |
| Legs and shoes | | | | |
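Test 0 keeps guidance and step count fixed and varies only the seed. A minimal sketch of that sweep, assuming the `pipe` from the code block at the top; `seed_sweep` and the stub `render` callable are hypothetical helpers used here only to show the loop shape:

```python
# Sketch of the Test 0 sweep: guidance and step count fixed, only the seed varies.
# `render` stands in for the real pipe(...) call from the code block above.
def seed_sweep(render, seeds, guidance_scale=1.0, steps=4):
    """Call `render` once per seed and key the results by seed."""
    return {
        seed: render(seed=seed, guidance_scale=guidance_scale, num_inference_steps=steps)
        for seed in seeds
    }

# Stub renderer that just returns the filename it would save:
images = seed_sweep(
    lambda **kw: f"flux-klein-seed{kw['seed']}.png",
    seeds=[1620085323, 1931701040, 4075624134, 2736029172],
)
```

With the real pipeline, the stub would instead call `pipe(prompt, ..., generator=torch.Generator(device=device).manual_seed(seed))`.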
Test 1 - Bookstore
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
| Steps | 2 | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|---|
| CFG 1 | | | | | | |
| CFG 2 | | | | | | |
| CFG 5 | | | | | | |
Test 2 - Face and hands
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
| Steps | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|
| CFG 1 | | | | | |
| CFG 2 | | | | | |
Test 3 - Legs
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
| Steps | 2 | 4 | 6 | 8 | 10 | 12 |
|---|---|---|---|---|---|---|
| CFG 1 | | | | | | |
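Tests 1–3 sweep CFG and step count over a fixed prompt, so each table cell corresponds to one pipeline call. A generic sketch of the grid (Test 1's values shown; `cfg_step_grid` is a hypothetical helper, not part of diffusers):

```python
from itertools import product

def cfg_step_grid(cfgs, steps):
    """Enumerate every (cfg, steps) cell of a sweep table, row-major."""
    return list(product(cfgs, steps))

# Values from the Test 1 table header and row labels:
grid = cfg_step_grid(cfgs=[1, 2, 5], steps=[2, 4, 8, 16, 32, 64])
# One pipeline call per cell, e.g.:
# for cfg, n in grid:
#     pipe(prompt, guidance_scale=cfg, num_inference_steps=n, ...)
```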
Test 4 - Other model covers
Test 5 - Other prompts
Test 6 - Optional find the cover
Test 7 - Empty prompts
| seed:1 | seed:2 | seed:3 | seed:4 | seed:5 |
|---|---|---|---|---|
| seed:6 | seed:7 | seed:8 | seed:9 | seed:10 |
| seed:21 | seed:42 | seed:68 | seed:324 | seed:2026 |
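Test 7 renders an empty prompt across fixed seeds, showing what the model falls back to with no conditioning. A sketch of the loop, with seeds taken from the table above; the commented pipeline call assumes `pipe`, `torch`, and `device` from the code block at the top of the page:

```python
# Empty-prompt sweep: seeds from the Test 7 table.
seeds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 21, 42, 68, 324, 2026]

# Deterministic output filenames, one per seed:
filenames = [f"flux-klein-empty-seed{s}.png" for s in seeds]

# for s, name in zip(seeds, filenames):
#     image = pipe("", height=1024, width=1024, guidance_scale=1.0,
#                  num_inference_steps=4,
#                  generator=torch.Generator(device=device).manual_seed(s)).images[0]
#     image.save(name)
```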
System Info
| Code Block |
|---|
Mon Feb 2 07:12:09 2026 app: sdnext.git updated: 2026-01-31 hash: f79c96be5 tags: url: https://github.com/liutyi/sdnext/tree/pytorch arch: x86_64 cpu: x86_64 system: Linux release: 6.14.0-37-generic python: 3.12.3 Torch: 2.10.10+xpu device: Intel(R) Arc(TM) Graphics (1) ipex: 2.7.10+xpu ram: free:50.0 used:12.33 total:62.33 gpu: free:47.68 used:10.47 total:58.15 gpu-active: current:6.77 peak:8.02 gpu-allocated: current:6.77 peak:8.02 gpu-reserved: current:10.47 peak:10.47 gpu-inactive: current:0.44 peak:0.92 events: retries:0 oom:0 utilization: 0 xformers: diffusers: 0.37.0.dev0 transformers: 4.57.5 active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16 base: Diffusers/black-forest-labs/FLUX.2-klein-4B [5e67da950f] refiner: none vae: none te: none unet: none Backend: ipex Pipeline: native Memory optimization: none Cross-attention: Scaled-Dot-Product |
...
| Code Block |
|---|
. |
Model metadata
Diffusers/black-forest-labs/FLUX.2-klein-4B [5e67da950f]
| Module | Class | Device | Dtype | Quant | Params | Modules | Config |
|---|---|---|---|---|---|---|---|
| vae | AutoencoderKLFlux2 | xpu:0 | torch.bfloat16 | None | 84046115 | 244 | FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 32, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 0.13025, 'shift_factor': None, 'latents_mean': None, 'latents_std': None, 'force_upcast': True, 'use_quant_conv': True, 'use_post_quant_conv': True, 'mid_block_add_attention': True, 'batch_norm_eps': 0.0001, 'batch_norm_momentum': 0.1, 'patch_size': [2, 2], '_class_name': 'AutoencoderKLFlux2', '_diffusers_version': '0.37.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--black-forest-labs--FLUX.2-klein-4B/snapshots/5e67da950fce4a097bc150c22958a05716994cea/vae'}) |
| text_encoder | Qwen3ForCausalLM | xpu:0 | torch.bfloat16 | None | 4022468096 | 547 | Qwen3Config { "architectures": [ "Qwen3ForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "head_dim": 128, "hidden_act": "silu", "hidden_size": 2560, "initializer_range": 0.02, "intermediate_size": 9728, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 40960, "max_window_layers": 36, "model_type": "qwen3", "num_attention_heads": 32, "num_hidden_layers": 36, "num_key_value_heads": 8, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 1000000, "sliding_window": null, "tie_word_embeddings": true, "transformers_version": "4.57.5", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 } |
| tokenizer | Qwen2TokenizerFast | None | None | None | 0 | 0 | None |
| scheduler | FlowMatchEulerDiscreteScheduler | None | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'shift': 3.0, 'use_dynamic_shifting': True, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.37.0.dev0'}) |
| transformer | Flux2Transformer2DModel | xpu:0 | torch.bfloat16 | None | 3875544576 | 356 | FrozenDict({'patch_size': 1, 'in_channels': 128, 'out_channels': None, 'num_layers': 5, 'num_single_layers': 20, 'attention_head_dim': 128, 'num_attention_heads': 24, 'joint_attention_dim': 7680, 'guidance_channels': 256, 'mlp_ratio': 3.0, 'axes_dims_rope': [32, 32, 32, 32], 'rope_theta': 2000, 'eps': 1e-06, 'guidance_embeds': False, '_class_name': 'Flux2Transformer2DModel', '_diffusers_version': '0.37.0.dev0', '_name_or_path': 'black-forest-labs/FLUX.2-klein-4B'}) |
| image_encoder | NoneType | None | None | None | 0 | 0 | None |
| feature_extractor | NoneType | None | None | None | 0 | 0 | None |
| force_zeros_for_empty_prompt | bool | None | None | None | 0 | 0 | None |
