Prompting best practices
These prompting principles are shared by both top-level modes of the skill:
- built-in `image_gen` tool (default)
- explicit `scripts/image_gen.py` CLI fallback
This file is about prompt structure, specificity, and iteration. Fallback-only execution controls such as quality, input_fidelity, masks, output format, and output paths live in the fallback docs.
Contents
- Structure
- Specificity policy
- Allowed and disallowed augmentation
- Composition and layout
- Constraints and invariants
- Text in images
- Input images and references
- Iterate deliberately
- Transparent images
- Fallback-only execution controls
- Use-case tips
- Where to find copy/paste recipes
Structure
- Use a consistent order: scene/backdrop -> subject -> key details -> constraints -> output intent.
- Include intended use (ad, UI mock, infographic) to set the level of polish.
- For complex requests, use short labeled lines instead of one long paragraph.
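A minimal sketch of the ordering above, assuming a hypothetical `build_prompt` helper. The field keys and labels are illustrative only and are not part of the skill or of either tool.

```python
# Hypothetical helper: keys and labels are illustrative, not a skill API.
FIELD_ORDER = [
    ("scene", "Scene/backdrop"),
    ("subject", "Subject"),
    ("details", "Key details"),
    ("constraints", "Constraints"),
    ("intent", "Output intent"),
]


def build_prompt(**fields: str) -> str:
    """Return short labeled lines in the recommended order, skipping empty fields."""
    lines = []
    for key, label in FIELD_ORDER:
        value = fields.get(key, "").strip()
        if value:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


# Keyword order at the call site does not matter; FIELD_ORDER decides the output order.
prompt = build_prompt(
    subject="ceramic pour-over coffee set",
    intent="hero image for a product landing page",
    scene="sunlit kitchen counter",
    constraints="no text, no logos",
)
print(prompt)
```

Keeping the order in one table means every request is normalized the same way, however the user phrased it.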
Specificity policy
- If the user prompt is already specific and detailed, normalize it into a clean spec without adding creative requirements.
- If the prompt is generic, you may add tasteful detail when it materially improves the output.
- Treat examples in `sample-prompts.md` as fully-authored recipes, not as the default amount of augmentation to add to every request.
- For photorealism, include `photorealistic` directly when that is the goal, plus concrete real-world texture such as pores, wrinkles, fabric wear, material grain, or imperfect everyday detail.
Allowed and disallowed augmentation
Allowed augmentation for generic prompts:
- composition and framing cues
- intended-use or polish-level hints
- practical layout guidance
- reasonable scene concreteness that supports the request
Do not add:
- extra characters, props, or objects that are not implied
- brand palettes, slogans, or story beats that are not implied
- arbitrary side-specific placement unless the surrounding layout supports it
Composition and layout
- Specify framing and viewpoint (close-up, wide, top-down) and placement only when it materially helps.
- Call out negative space if the asset clearly needs room for UI or copy.
- Avoid making left/right layout decisions unless the user or surrounding layout supports them.
- For people, describe body framing, scale, gaze, and object interactions when they matter (`full body visible`, `looking down at the book`, `hands naturally gripping the handlebars`).
Constraints and invariants
- State what must not change (`keep background unchanged`).
- For edits, say `change only X; keep Y unchanged` and repeat invariants on every iteration to reduce drift.
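One way to keep invariants from being dropped between iterations is to generate each edit instruction from a fixed list. This is a sketch only; `edit_prompt` is a hypothetical helper and the exact wording is an assumption.

```python
def edit_prompt(change: str, invariants: list[str]) -> str:
    """Build an edit instruction that restates every invariant."""
    keep = "; ".join(f"keep {item} unchanged" for item in invariants)
    return f"Change only {change}; {keep}."


invariants = ["background", "subject pose", "label text"]

# Each iteration makes a single change and repeats the full invariant list.
for change in ["the jacket color to navy", "the lighting to warm evening light"]:
    print(edit_prompt(change, invariants))
```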
Text in images
- Put literal text in quotes or ALL CAPS and specify typography (font style, size, color, placement).
- Spell uncommon words letter-by-letter if accuracy matters.
- For in-image copy, require verbatim rendering and no extra characters.
- In CLI fallback mode, use `medium` or `high` quality for small text, dense infographics, data-heavy slides, multi-font layouts, legends, axes, and footnotes.
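The quoting, typography, and letter-by-letter points can be combined into one prompt line. The sketch below assumes hypothetical helpers `spell_out` and `text_line`; the phrasing of the generated line is illustrative, not a required format.

```python
def spell_out(word: str) -> str:
    """Spell a word letter by letter, joined with hyphens."""
    return "-".join(word)


def text_line(copy: str, typography: str, uncommon: tuple[str, ...] = ()) -> str:
    """Quote the literal copy, require verbatim rendering, and add spelling hints."""
    line = f'Text (verbatim, no extra characters): "{copy}". Typography: {typography}.'
    hints = [f"{word} is spelled {spell_out(word)}" for word in uncommon if word in copy]
    if hints:
        line += " Spelling: " + "; ".join(hints) + "."
    return line


print(text_line(
    "Zephyrine Tea",
    "bold sans-serif, white, centered",
    uncommon=("Zephyrine",),
))
```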
Input images and references
- Do not assume that every provided image is an edit target.
- Label each image by index and role (`Image 1: edit target`, `Image 2: style reference`).
- If the user provides images for style, composition, or mood guidance and does not ask to modify them, treat the request as generation with references.
- If the user asks to preserve an existing image while changing specific parts, treat the request as an edit.
- For compositing, describe how the images interact (`place the subject from Image 2 into Image 1`).
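A small sketch of the labeling and edit-versus-reference decision described above. The helper names and the role strings are assumptions for illustration. Real requests need judgment about what the user actually asked for.

```python
def label_images(roles: list[str]) -> str:
    """Return one 'Image N: role' line per input image, numbered from 1."""
    return "\n".join(f"Image {i}: {role}" for i, role in enumerate(roles, start=1))


def request_kind(roles: list[str]) -> str:
    """Treat the request as an edit only when some image is an explicit edit target."""
    return "edit" if "edit target" in roles else "generation with references"


roles = ["edit target", "style reference"]
print(label_images(roles))
print(request_kind(roles))
print(request_kind(["style reference", "mood reference"]))
```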
Iterate deliberately
- Start with a clean base prompt, then make small single-change edits.
- Re-specify critical constraints when you iterate.
- Prefer one targeted follow-up at a time over rewriting the whole prompt.
Transparent images
- Use built-in `image_gen` first for transparent-image requests. If the subject is clearly too complex for chroma-key removal, explain the fallback and ask before switching to CLI.
- Prompt for a perfectly flat solid chroma-key background, usually `#00ff00`; use `#ff00ff` when the subject is green, and avoid key colors that appear in the subject.
- Explicitly prohibit shadows, gradients, floor planes, reflections, texture, and lighting variation in the background.
- Ask for crisp edges, generous padding, and no use of the key color inside the subject.
- After generation, remove the background locally with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" --input <source> --out <final.png> --auto-key border --soft-matte --transparent-threshold 12 --opaque-threshold 220 --despill` and validate the alpha result before shipping it.
- Use soft matte and despill for antialiased edges; hard tolerance-only removal is mainly for flat pixel-art or exact-color fixtures.
- Use CLI `gpt-image-1.5 --background transparent --output-format png` only after the user explicitly confirms the fallback, or when the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Ask first for true/native transparency requests, failed chroma-key validation, or complex transparent subjects such as hair, fur, glass, smoke, liquids, translucent materials, reflective objects, or soft shadows.
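The key-color rule (green by default, magenta for green subjects, never a color that appears in the subject) could be sketched as below. This is not how `remove_chroma_key.py` works internally. `pick_key_color` and the `min_distance` threshold of 150 are hypothetical, chosen only to illustrate the rule.

```python
import math

GREEN = (0x00, 0xFF, 0x00)    # "#00ff00", the usual key color
MAGENTA = (0xFF, 0x00, 0xFF)  # "#ff00ff", for green subjects


def to_hex(rgb: tuple[int, int, int]) -> str:
    """Format an RGB triple as a lowercase hex color."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def pick_key_color(subject_colors, min_distance: float = 150.0):
    """Prefer green, then magenta; return None if both are too close to the subject.

    min_distance is an illustrative RGB-space threshold, not a documented value.
    """
    for key in (GREEN, MAGENTA):
        if all(math.dist(key, color) >= min_distance for color in subject_colors):
            return to_hex(key)
    return None


print(pick_key_color([(200, 30, 30)]))   # red subject
print(pick_key_color([(30, 160, 40)]))   # leafy green subject
```

A `None` result is a hint that the subject conflicts with both key colors, which is one of the cases where asking about the CLI fallback may be appropriate.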
Fallback-only execution controls
- `quality`, `input_fidelity`, explicit masks, output format, and output paths are fallback-only execution controls.
- Do not assume they are built-in `image_gen` tool arguments.
- If the user explicitly chooses CLI fallback, see `references/cli.md` and `references/image-api.md` for those controls.
- In CLI fallback mode, `gpt-image-2` is the default. It supports `quality=low|medium|high|auto`; use `low` for fast drafts and thumbnails, and move to `medium`, `high`, or `auto` for final assets.
- `gpt-image-2` always uses high fidelity for image inputs, so do not set `input_fidelity` with that model.
- If a transparent request needs true CLI transparency, ask before using `gpt-image-1.5` unless the user already explicitly chose it. Explain that built-in chroma-key removal is the default path, but `gpt-image-2` does not support `background=transparent`.
- If the user asks for 4K-style output with `gpt-image-2`, use `3840x2160` for landscape or `2160x3840` for portrait.
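The compatibility rules above can be checked before the CLI is invoked. The sketch below is a hypothetical pre-flight check, not part of `scripts/image_gen.py`; the function and parameter names are assumptions, and the real flags are documented in `references/cli.md`.

```python
QUALITIES = {"low", "medium", "high", "auto"}
SIZES_4K = {"landscape": "3840x2160", "portrait": "2160x3840"}


def check_fallback_options(model, quality="auto", input_fidelity=None, transparent=False):
    """Return a list of problems with the requested combination (empty if none).

    Only the gpt-image-2 rules stated in this document are checked.
    """
    problems = []
    if model == "gpt-image-2":
        if quality not in QUALITIES:
            problems.append(f"unsupported quality: {quality}")
        if input_fidelity is not None:
            problems.append("do not set input_fidelity with gpt-image-2")
        if transparent:
            problems.append("gpt-image-2 does not support background=transparent")
    return problems


print(check_fallback_options("gpt-image-2", quality="low"))
print(check_fallback_options("gpt-image-2", input_fidelity="high", transparent=True))
print(SIZES_4K["landscape"])
```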
Use-case tips
Generate:
- photorealistic-natural: Prompt as if a real photo is captured in the moment; use photography language (lens, lighting, framing); call for real texture; avoid over-stylized polish unless requested.
- product-mockup: Describe the product/packaging and materials; ensure clean silhouette and label clarity; if in-image text is needed, require verbatim rendering and specify typography.
- ui-mockup: Describe the target fidelity first (shippable mockup or low-fi wireframe), then focus on layout, hierarchy, and practical UI elements; avoid concept-art language.
- infographic-diagram: Define the audience and layout flow; label parts explicitly; require verbatim text; prefer higher quality in CLI mode for dense labels.
- logo-brand: Keep it simple and scalable; ask for a strong silhouette and balanced negative space; avoid decorative flourishes unless requested.
- ads-marketing: Write like a creative brief; include brand positioning, audience, desired vibe, scene, and exact tagline if text must appear.
- productivity-visual: Name the exact artifact (slide, chart, workflow diagram), define the canvas and hierarchy, provide real labels/data, and ask for readable typography and polished spacing.
- scientific-educational: Define audience, lesson objective, required labels, scientific constraints, arrows, and scan-friendly whitespace.
- illustration-story: Define panels or scene beats; keep each action concrete.
- stylized-concept: Specify style cues, material finish, and rendering approach (3D, painterly, clay) without inventing new story elements.
- historical-scene: State the location/date and required period accuracy; constrain clothing, props, and environment to match the era.
Edit:
- text-localization: Change only the text; preserve layout, typography, spacing, and hierarchy; no extra words or reflow unless needed.
- identity-preserve: Lock identity (face, body, pose, hair, expression); change only the specified elements; match lighting and shadows.
- precise-object-edit: Specify exactly what to remove/replace; preserve surrounding texture and lighting; keep everything else unchanged.
- lighting-weather: Change only environmental conditions (light, shadows, atmosphere, precipitation); keep geometry, framing, and subject identity.
- background-extraction: For simple opaque subjects, request a clean cutout on a perfectly flat chroma-key background; crisp silhouette; generous padding; no shadows; no halos; preserve label text exactly; no restyling. Ask before using true CLI transparency for complex subjects.
- style-transfer: Specify style cues to preserve (palette, texture, brushwork) and what must change; add `no extra elements` to prevent drift.
- compositing: Reference inputs by index; specify what moves where; match lighting, perspective, and scale; keep the base framing unchanged.
- sketch-to-render: Preserve layout, proportions, and perspective; choose materials and lighting that support the supplied sketch without adding new elements.
Where to find copy/paste recipes
For copy/paste prompt specs (examples only), see references/sample-prompts.md. This file focuses on principles, specificity, and iteration patterns.