Info
https://huggingface.co/AIDC-AI/Ovis-Image-7B
...
Test 0 - Different seed variations
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
Parameters: Steps: 50| Size: 1024x1024| Seed: 1620085323| CFG scale: 5| App: SD.Next| Version: d7eb90e| Pipeline: OvisImagePipeline| Operations: txt2img| Model: Ovis-Image-7B
285H Time: 9m 19.20s | total 561.76 pipeline 556.42 decode 2.72 callback 2.18 gc 0.38 | GPU 21078 MB 17% | RAM 30.89 GB 25%
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
| CFG4, STEP50 | Seed: 1620085323 | Seed: 1931701040 | Seed: 4075624134 | Seed: 2736029172 |
|---|---|---|---|---|
| bookshop girl | | | | |
| hand and face | | | | |
| legs and shoes | | | | |
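
The `Parameters:` lines on this page all use the same `Key: value|` layout, so they can be read back into a dictionary when comparing runs. The following is a minimal sketch; the helper name `parse_parameters` is hypothetical and is not part of SD.Next.

```python
def parse_parameters(line: str) -> dict[str, str]:
    """Split a 'Parameters: Key: value| Key: value' line into a dict of strings."""
    line = line.removeprefix("Parameters:").strip()
    fields = {}
    for chunk in line.split("|"):
        key, sep, value = chunk.partition(":")
        if sep:  # ignore chunks that do not look like 'key: value'
            fields[key.strip()] = value.strip()
    return fields


# Parameters line copied from Test 0 on this page
line = (
    "Parameters: Steps: 50| Size: 1024x1024| Seed: 1620085323| CFG scale: 5| "
    "App: SD.Next| Version: d7eb90e| Pipeline: OvisImagePipeline| "
    "Operations: txt2img| Model: Ovis-Image-7B"
)
params = parse_parameters(line)
print(params["Steps"], params["Seed"], params["CFG scale"])
```

All values stay as strings; convert `Steps` or `Seed` with `int()` where numbers are needed.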
Test 1 - Bookshop
Prompt: photorealistic girl in bookshop choosing the book in romantic stories shelf. smiling
Parameters: Steps: 50| Size: 1024x1024| Seed: 2736029172| CFG scale: 5| App: SD.Next| Version: d7eb90e| Pipeline: OvisImagePipeline| Operations: txt2img| Model: Ovis-Image-7B
285H Time: 9m 16.08s | total 558.72 pipeline 554.54 callback 2.35 decode 1.52 gc 0.29 | GPU 21078 MB 17% | RAM 31.24 GB 25%
| CFG / Steps | 8 | 16 | 20 | 32 | 50 |
|---|---|---|---|---|---|
| CFG1 | | | | | |
| CFG2 | | | | | |
| CFG3 | | | | | |
| CFG4 | | | | | |
| CFG5 | | | | | |
| CFG6 | | | | | |
| CFG7 | | | | | |
| CFG8 | | | | | |
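
A grid like the one above is expensive on this hardware. The sketch below gives a rough estimate of the cost from the Test 1 timing line, assuming generation time grows linearly with the step count and does not depend on the CFG value. Both are assumptions: some pipelines skip the negative-prompt pass at CFG 1, which would make that row cheaper.

```python
from itertools import product

steps_axis = [8, 16, 20, 32, 50]        # table columns
cfg_axis = [1, 2, 3, 4, 5, 6, 7, 8]     # table rows
seconds_per_step = 554.54 / 50          # Test 1: pipeline seconds / steps

grid = list(product(cfg_axis, steps_axis))
total_steps = sum(steps for _, steps in grid)
estimate_hours = total_steps * seconds_per_step / 3600

print(f"{len(grid)} images, {total_steps} steps, about {estimate_hours:.1f} h")
```

This counts only the pipeline time and leaves out decode, callbacks and model loading.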
Test 2 - Face and hand
Prompt: Create a close-up photograph of a woman's face and hand, with her hand raised to her chin. She is wearing a white blazer and has a gold ring on her finger. Her nails are neatly manicured and her hair is pulled back into a low bun. She is smiling and has a radiant expression on her face. The background is a plain light gray color. The overall mood of the photo is elegant and sophisticated. The photo should have a soft, natural light and a slight warmth to it. The woman's hair is dark brown and pulled back into a low bun, with a few loose strands framing her face.
Parameters: Steps: 16| Size: 1024x1024| Seed: 4075624134| CFG scale: 5| App: SD.Next| Version: d7eb90e| Pipeline: OvisImagePipeline| Operations: txt2img| Model: Ovis-Image-7B
285H Time: 2m 58.98s | total 180.04 pipeline 177.45 decode 1.51 callback 0.78 gc 0.28 | GPU 21078 MB 17% | RAM 31.23 GB 25%
| CFG / Steps | 8 | 16 | 32 | 64 |
|---|---|---|---|---|
| CFG1 | | | | |
| CFG3 | | | | |
| CFG5 | | | | |
| CFG7 | | | | |
| CFG9 | | | | |
Test 3 - Legs
Prompt: Generate a photo of a woman's legs, with her feet crossed and wearing white high-heeled shoes with ribbons tied around her ankles. The shoes should have a pointed toe and a stiletto heel. The woman's legs should be smooth and tanned, with a slight sheen to them. The background should be a light gray color. The photo should be taken from a low angle, looking up at the woman's legs. The ribbons should be tied in a bow shape around the ankles. The shoes should have a red sole. The woman's legs should be slightly bent at the knee.
Parameters: Steps: 32| Size: 1024x1024| Seed: 1931701040| CFG scale: 5| App: SD.Next| Version: d7eb90e| Pipeline: OvisImagePipeline| Operations: txt2img| Model: Ovis-Image-7B
285H Time: 5m 56.50s | total 358.27 pipeline 354.97 decode 1.52 callback 1.48 gc 0.29 | GPU 21078 MB 17% | RAM 31.29 GB 25%
| CFG / Steps | 8 | 16 | 32 | 64 |
|---|---|---|---|---|
| CFG1 | | | | |
| CFG3 | | | | |
| CFG5 | | | | |
| CFG7 | | | | |
| CFG11 | | | | |
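
The `Time:` lines recorded for Tests 0 to 3 allow a quick check of how generation time scales with the step count. The short calculation below uses only the `pipeline` seconds and step counts copied from those lines; the dictionary labels are just for display.

```python
# (steps, pipeline seconds) copied from the "Time:" lines on this page
runs = {
    "Test 0 (seed variations)": (50, 556.42),
    "Test 1 (bookshop)": (50, 554.54),
    "Test 2 (face and hand)": (16, 177.45),
    "Test 3 (legs)": (32, 354.97),
}

per_step = {name: secs / steps for name, (steps, secs) in runs.items()}
for name, value in per_step.items():
    print(f"{name}: {value:.2f} s/step")
```

All four runs come out at roughly 11.1 seconds per step at 1024x1024, so on this Intel 285H setup the pipeline time appears to scale close to linearly with the number of steps.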
Test 4 - Civitai profile cover generation
...
Test 5 - Empty prompts
1024x1024, Steps 50
seed 1 | seed 2 | seed 3 | seed 4 | seed 5 |
seed 6 | seed 7 | seed 8 | seed 9 | seed 10 |
seed 21 | seed 38 | seed 42 |
seed 68 | seed 2025 |
Test 6 - Other Models cover
Test 7 - Art Prompts
Test 8 - Search for cover image
Art comparison with Z Image Turbo
| Ovis Image (7B) | Z Image Turbo (6B) - 90 sec on Intel 285H for 9 steps |
|---|---|
System info
Sat Dec 27 16:49:13 2025
app: sdnext.git updated: 2025-12-26 hash: d7eb90eb8 url: https://github.com/liutyi/sdnext/tree/pytorch
arch: x86_64 cpu: x86_64 system: Linux release: 6.17.0-8-generic
python: 3.12.3 Torch: 2.9.1+xpu
device: Intel(R) Arc(TM) Graphics (1) ipex:
ram: free:114.21 used:8.86 total:123.07
xformers: diffusers: 0.36.0.dev0 transformers: 4.57.3
active: xpu dtype: torch.bfloat16 vae: torch.bfloat16 unet: torch.bfloat16
base: Diffusers/AIDC-AI/Ovis-Image-7B [ac8fb1056c] refiner: none vae: none te: none unet: none
ipex native none Scaled-Dot-Product
Config
{
"sd_model_checkpoint": "Diffusers/AIDC-AI/Ovis-Image-7B [ac8fb1056c]",
"diffusers_offload_mode": "none",
"huggingface_token": "hf_..FraU",
"diffusers_version": "f6b6a7181eb44f0120b29cd897c129275f366c2a",
"sd_checkpoint_hash": null,
"model_qwen_layers": 6
}
Model info
AIDC-AI/Ovis-Image-7B [ac8fb1056c]

| Module | Class | Device | Dtype | Quant | Params | Modules | Config |
|---|---|---|---|---|---|---|---|
| vae | AutoencoderKL | xpu:0 | torch.bfloat16 | None | 83819683 | 241 | FrozenDict({'in_channels': 3, 'out_channels': 3, 'down_block_types': ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'], 'up_block_types': ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'], 'block_out_channels': [128, 256, 512, 512], 'layers_per_block': 2, 'act_fn': 'silu', 'latent_channels': 16, 'norm_num_groups': 32, 'sample_size': 1024, 'scaling_factor': 0.3611, 'shift_factor': 0.1159, 'latents_mean': None, 'latents_std': None, 'force_upcast': True, 'use_quant_conv': False, 'use_post_quant_conv': False, 'mid_block_add_attention': True, '_class_name': 'AutoencoderKL', '_diffusers_version': '0.30.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--AIDC-AI--Ovis-Image-7B/snapshots/ac8fb1056c6df0b22901ddcabc965336eb9bdc41/vae'}) |
| text_encoder | Qwen3Model | xpu:0 | torch.bfloat16 | None | 1720574976 | 425 | Qwen3Config { "_attn_implementation_autoset": true, "architectures": [ "Qwen3Model" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 151643, "dtype": "bfloat16", "eos_token_id": 151645, "head_dim": 128, "hidden_act": "silu", "hidden_size": 2048, "initializer_range": 0.02, "intermediate_size": 6144, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 40960, "max_window_layers": 28, "model_type": "qwen3", "num_attention_heads": 16, "num_hidden_layers": 28, "num_key_value_heads": 8, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 1000000, "sliding_window": null, "tie_word_embeddings": true, "transformers_version": "4.57.3", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 } |
| tokenizer | Qwen2TokenizerFast | None | None | None | 0 | 0 | None |
| transformer | OvisImageTransformer2DModel | xpu:0 | torch.bfloat16 | None | 7370449728 | 635 | FrozenDict({'patch_size': 1, 'in_channels': 64, 'out_channels': None, 'num_layers': 6, 'num_single_layers': 27, 'attention_head_dim': 128, 'num_attention_heads': 24, 'joint_attention_dim': 2048, 'axes_dims_rope': [16, 56, 56], '_class_name': 'OvisImageTransformer2DModel', '_diffusers_version': '0.36.0.dev0', '_name_or_path': '/mnt/models/Diffusers/models--AIDC-AI--Ovis-Image-7B/snapshots/ac8fb1056c6df0b22901ddcabc965336eb9bdc41/transformer'}) |
| scheduler | FlowMatchEulerDiscreteScheduler | None | None | None | 0 | 0 | FrozenDict({'num_train_timesteps': 1000, 'shift': 3.0, 'use_dynamic_shifting': True, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256, 'max_image_seq_len': 4096, 'invert_sigmas': False, 'shift_terminal': None, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'time_shift_type': 'exponential', 'stochastic_sampling': False, '_use_default_values': ['invert_sigmas', 'use_exponential_sigmas', 'use_karras_sigmas', 'time_shift_type', 'shift_terminal', 'stochastic_sampling', 'use_beta_sigmas'], '_class_name': 'FlowMatchEulerDiscreteScheduler', '_diffusers_version': '0.30.0.dev0'}) |
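
The Params column above can be used to sanity-check the GPU memory figure in the timing lines (GPU 21078 MB). The calculation below sums the three modules that hold weights and multiplies by 2 bytes per parameter, since every module is loaded as torch.bfloat16. Whether SD.Next reports MB or MiB is not stated on this page, so the comparison is only approximate.

```python
# Parameter counts copied from the Model info table
params = {
    "vae": 83_819_683,
    "text_encoder": 1_720_574_976,
    "transformer": 7_370_449_728,
}
BYTES_PER_PARAM = 2  # torch.bfloat16 stores each value in 16 bits

total_params = sum(params.values())
weights_mib = total_params * BYTES_PER_PARAM / 2**20

print(f"{total_params:,} parameters, about {weights_mib:,.0f} MiB of weights")
```

With "diffusers_offload_mode": "none" all modules stay on the device, so about 17,500 MiB of the reported 21078 MB would be weights, and the rest is presumably activations and runtime overhead.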
