
I couldn’t open your private wiki pages directly (they require access), but your previous matrices and general prompt-matrix methodology gave me enough context to design a clean, high-signal 4×5 test grid.

Prompt matrices work by systematically varying elements to expose model strengths and weaknesses, so I built this one to:

  • Cover difficulty tiers (easy → failure cases)

  • Vary structure, verbosity, and syntax

  • Test composition, counting, text rendering, style fidelity, realism, and reasoning

  • Avoid negative prompts (as requested)

  • Keep each prompt visually comparable across models


TEXT2IMAGE TEST MATRIX v3 (4×5 = 20 prompts)

🧩 STRUCTURE

  • Rows = complexity tier

  • Columns = test category

# | CATEGORY | PROMPT

🟢 ROW 1 — SIMPLE BASELINE (sanity + aesthetic bias)

1. Subject clarity (minimal prompt)

a red apple on a wooden table, soft natural lighting

2. Style adherence

a small cabin in the mountains, watercolor painting, pastel tones

3. Photorealism baseline

portrait photo of a 35 year old man, neutral expression, studio lighting, 85mm lens

4. Composition / framing

a cat sitting in the center of a window frame, symmetrical composition, morning light

5. Color control

a street scene at night illuminated only by neon blue and pink lights


🟡 ROW 2 — CONTROLLED VARIATION (multi-attribute prompts)

6. Multi-object + attributes

three glass bottles, one filled with red liquid, one blue, one green, arranged in a row on a reflective surface

7. Material + lighting interaction

a chrome sphere and a matte black cube on a white surface, strong directional sunlight casting sharp shadows

8. Camera + realism

cinematic photo of a woman walking in rain, wet asphalt reflections, shot on 50mm lens, shallow depth of field

9. Style fusion

a futuristic city skyline in the style of cyberpunk and art deco, highly detailed, dramatic lighting

10. Perspective / angle

extreme low angle view of a towering skyscraper disappearing into fog, wide angle lens distortion


🟠 ROW 3 — COMPLEX PROMPTS (relationships + reasoning)

11. Spatial relationships

a wooden chair placed on top of a table inside a small room, viewed from the doorway

12. Action + interaction

a chef flipping a pancake in mid air in a busy kitchen, motion blur, dynamic composition

13. Counting + variation

five birds sitting on a wire, each bird a different color and size

14. Text rendering

a storefront sign that clearly reads "OPEN 24 HOURS", realistic street photography

15. Lighting logic

a candle lighting a dark room, objects gradually fading into shadow, realistic light falloff


🔴 ROW 4 — HARD / FAILURE CASES

16. Multi-subject + attributes

two identical twins, one wearing black suit and one wearing white suit, standing side by side, neutral background

17. Complex scene description

a cluttered desk with a laptop, a coffee mug, scattered papers, a glowing desk lamp, and a small plant near the edge

18. Reflection + physics

a glass of water on a mirror surface reflecting a sunset sky, realistic reflections and refractions

19. Style + realism conflict

a hyper realistic photograph of a dragon sitting in a modern living room, natural lighting

20. Compositional constraint

a perfectly centered circle inside a square frame, minimalistic design, high contrast black and white


🧠 WHY THIS MATRIX WORKS

This isn’t random; it’s intentionally aligned with known evaluation axes in text-to-image research:

  • Models struggle with multi-object attribute binding and counting

  • Prompt structure (subject + style + modifiers) strongly impacts output quality

What each row exposes:

Row | What breaks
1 | bias, default aesthetic, prompt understanding
2 | attribute binding, style blending
3 | reasoning, counting, text
4 | real-world failure cases

⚡ EXTRA (OPTIONAL EXTENSIONS)

If you want this to behave even more like your previous matrices:

Add controlled variables per run

  • Same seed vs random seed

  • CFG scaling sweep (e.g. 3 / 7 / 12)

  • Resolution scaling (512 → 1024)
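
The controlled variables above can be combined into a sweep. The following is a minimal sketch rather than part of the original script: the function name `sweep_runs` and the directory naming are hypothetical, and the values are just the examples from the list. Each printed combination would stand in for the constant SEED, CFG, WIDTH, and HEIGHT settings during one full pass over the 20 prompts.

```bash
#!/usr/bin/env bash
# Sketch only: enumerate every (seed, CFG, resolution) combination.
# Values are the examples from the list above; all names are hypothetical.

FIXED_SEED=20260425   # fixed seed for repeatable runs; -1 requests a random seed

# Print one "seed cfg size" triple per line.
sweep_runs() {
  local seed cfg size
  for seed in "$FIXED_SEED" -1; do
    for cfg in 3 7 12; do
      for size in 512 1024; do
        printf '%s %s %s\n' "$seed" "$cfg" "$size"
      done
    done
  done
}

# Example: derive one output directory name per combination.
sweep_runs | while read -r seed cfg size; do
  echo "outputs_matrix_v3/seed${seed}_cfg${cfg}_${size}px"
done
```

With these example values the sweep yields 12 combinations, which is 240 images for the full matrix, so it may be worth trimming the lists before a long run.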


Script for SD.Next

#!/usr/bin/env bash

API="http://127.0.0.1:7860/sdapi/v1/txt2img"
OUTDIR="outputs_matrix_v3"
mkdir -p "$OUTDIR"

# ---- GLOBAL SETTINGS (KEEP CONSTANT FOR FAIR COMPARISON) ----
STEPS=8
CFG=1
WIDTH=1024
HEIGHT=1024
SAMPLER="Default"
SEED=20260425   # set -1 for random

#MODEL='Diffusers/baidu/ERNIE-Image-Turbo [54f8a75695]'  # optionally pass model name

# ---- PROMPT LIST ----
prompts=(
"a red apple on a wooden table, soft natural lighting"
"a small cabin in the mountains, watercolor painting, pastel tones"
"portrait photo of a 35 year old man, neutral expression, studio lighting, 85mm lens"
"a cat sitting in the center of a window frame, symmetrical composition, morning light"
"a street scene at night illuminated only by neon blue and pink lights"

"three glass bottles, one filled with red liquid, one blue, one green, arranged in a row on a reflective surface"
"a chrome sphere and a matte black cube on a white surface, strong directional sunlight casting sharp shadows"
"cinematic photo of a woman walking in rain, wet asphalt reflections, shot on 50mm lens, shallow depth of field"
"a futuristic city skyline in the style of cyberpunk and art deco, highly detailed, dramatic lighting"
"extreme low angle view of a towering skyscraper disappearing into fog, wide angle lens distortion"

"a wooden chair placed on top of a table inside a small room, viewed from the doorway"
"a chef flipping a pancake in mid air in a busy kitchen, motion blur, dynamic composition"
"five birds sitting on a wire, each bird a different color and size"
"a storefront sign that clearly reads \"OPEN 24 HOURS\", realistic street photography"
"a candle lighting a dark room, objects gradually fading into shadow, realistic light falloff"

"two identical twins, one wearing black suit and one wearing white suit, standing side by side, neutral background"
"a cluttered desk with a laptop, a coffee mug, scattered papers, a glowing desk lamp, and a small plant near the edge"
"a glass of water on a mirror surface reflecting a sunset sky, realistic reflections and refractions"
"a hyper realistic photograph of a dragon sitting in a modern living room, natural lighting"
"a perfectly centered circle inside a square frame, minimalistic design, high contrast black and white"
)

# ---- OPTIONAL: SWITCH MODEL ----
if [ -n "$MODEL" ]; then
  echo "🔄 Switching model to: $MODEL"
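  # Build the JSON with jq so special characters in the model name are escaped
  curl -s -X POST http://127.0.0.1:7860/sdapi/v1/options \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg model "$MODEL" '{sd_model_checkpoint: $model}')" > /dev/null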
  sleep 2
fi

# ---- GENERATION LOOP ----
i=1
total=${#prompts[@]}   # derive the count from the list instead of hard-coding 20
for prompt in "${prompts[@]}"; do
  printf "\n[%02d/%02d] Generating...\n" "$i" "$total"

  json=$(jq -n \
    --arg prompt "$prompt" \
    --arg sampler "$SAMPLER" \
    --argjson steps "$STEPS" \
    --argjson cfg "$CFG" \
    --argjson w "$WIDTH" \
    --argjson h "$HEIGHT" \
    --argjson seed "$SEED" \
    '{
      prompt: $prompt,
      steps: $steps,
      cfg_scale: $cfg,
      width: $w,
      height: $h,
      sampler_name: $sampler,
      seed: $seed,
      batch_size: 1,
      n_iter: 1
    }')

  response=$(curl -s "$API" \
    -H "Content-Type: application/json" \
    -d "$json")

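  # Extract the base64 image and save it; skip the prompt if none was returned
  image=$(echo "$response" | jq -r '.images[0] // empty')
  if [ -n "$image" ]; then
    # date +%F prints YYYY-MM-DD and is portable, unlike GNU-only `date --iso`
    echo "$image" | base64 -d > \
          "$OUTDIR/$(date +%F)_$(printf "%02d" "$i")_seed${SEED}.png"
  else
    echo "⚠️ No image in response for prompt $i, skipping" >&2
  fi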

  ((i++))
done

echo "✅ Done. Images saved to $OUTDIR/"
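
The two-digit index in each saved file name identifies one cell of the matrix. Below is a small sketch of that mapping, assuming the prompt list keeps its row-major order with five categories per tier. The function names are hypothetical and not part of the script above.

```bash
#!/usr/bin/env bash
# Sketch only: map a prompt index (01..20, as used in the saved file names)
# to its position in the matrix. Assumes row-major order, 5 prompts per tier.

COLS=5   # test categories per tier

# Print "tier category" for prompt index $1, e.g. 14 -> "3 4".
cell_for_index() {
  local n
  n=${1#0}   # drop one leading zero so 08 and 09 are not parsed as octal
  echo "$(( (n - 1) / COLS + 1 )) $(( (n - 1) % COLS + 1 ))"
}

# In a transposed table the coordinates swap:
# table row = category, table column = tier.
transposed_cell_for_index() {
  # Intentional word splitting: $1 becomes the tier, $2 the category.
  set -- $(cell_for_index "$1")
  echo "$2 $1"
}

for n in 01 14 20; do
  echo "prompt $n: tier/category $(cell_for_index "$n"), transposed row/col $(transposed_cell_for_index "$n")"
done
```

For example, prompt 14 (text rendering) sits in tier 3, category 4, so a transposed table shows it in row 4, column 3.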


Transposed result matrix

# | SIMPLE BASELINE (sanity + aesthetic bias) | CONTROLLED VARIATION (multi-attribute prompts) | COMPLEX PROMPTS (relationships + reasoning) | HARD / FAILURE CASES
1 | Subject clarity (minimal prompt) | Multi-object + attributes | Spatial relationships | Multi-subject + attributes
2 | Style adherence | Material + lighting interaction | Action + interaction | Complex scene description
3 | Photorealism baseline | Camera + realism | Counting + variation | Reflection + physics
4 | Composition / framing | Style fusion | Text rendering | Style + realism conflict
5 | Color control | Perspective / angle | Lighting logic | Compositional constraint
