Add multiplexer title helper

2026-04-24 17:16:56 -07:00
parent 0b1d058417
commit ddb0a85c68
23 changed files with 1723 additions and 555 deletions

View File

@@ -1,9 +1,8 @@
# Agentic Session Preferences
## Multiplexer session titling
- Treat this chat as the controller for the current tmux or zellij session.
- If the current session is in tmux, run the tmux rename one-liner eagerly without checking whether TMUX is set first.
- If the current session is in zellij, run the zellij rename commands eagerly without checking whether ZELLIJ is set first.
- If the `TMUX` or `ZELLIJ` environment variable is set, treat this chat as the controller for the current tmux or zellij session.
- Use `set_multiplexer_title '<project> - <task>'` to update the title. The command detects tmux vs. zellij internally, prefers tmux when both are present, and no-ops outside a multiplexer. A behavioral sketch appears at the end of this section.
- Maintain a session/window/pane title that updates when the task focus changes substantially.
- Prefer automatic titling: infer a concise <task> from the current user request and context without asking.
- Title format: "<project> - <task>".
@@ -12,13 +11,7 @@
- <task> is a short, user-friendly description of what we are doing.
- Ask for a short descriptive <task> only when the task is ambiguous or you are not confident in an inferred title.
- When the task changes substantially, update the <task> automatically if clear; otherwise ask for an updated <task>.
- When a title is provided or updated, immediately run the matching command for the active multiplexer:
tmux rename-session '<project> - <task>' \; rename-window '<project> - <task>' \; select-pane -T '<project> - <task>'
zellij action rename-session '<project> - <task>' && zellij action rename-tab '<project> - <task>' && zellij action rename-pane '<project> - <task>'
- Assume you are inside the active multiplexer, so do not use tmux `-t` or zellij targeting flags unless the user asks to target a specific session/tab/pane.
- When a title is provided or updated, immediately run `set_multiplexer_title '<project> - <task>'`; do not call raw tmux or zellij rename commands unless debugging the helper itself.
- For Claude Code sessions, a UserPromptSubmit hook will also update titles automatically based on the latest prompt.
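For reference, a minimal sketch of the behavior `set_multiplexer_title` is expected to provide, based on the rules above (the installed helper at `lib/functions/set_multiplexer_title` is authoritative; this is not its verbatim source):
```bash
# Sketch only: mirrors the documented behavior of set_multiplexer_title.
set_multiplexer_title() {
  local title="$1"
  if [[ -n "${TMUX:-}" ]]; then
    # Prefer tmux when both multiplexers are present.
    tmux rename-session "$title" \; rename-window "$title" \; select-pane -T "$title"
  elif [[ -n "${ZELLIJ:-}" ]]; then
    zellij action rename-session "$title"
    zellij action rename-tab "$title"
    zellij action rename-pane "$title"
  fi
  # Outside a multiplexer this is a no-op.
}
```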
## Pane usage

View File

@@ -1,14 +1,6 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${ZELLIJ:-}" ]]; then
multiplexer="zellij"
elif [[ -n "${TMUX:-}" ]]; then
multiplexer="tmux"
else
exit 0
fi
input=$(cat)
mapfile -d '' -t parsed < <(PAYLOAD="$input" python3 - <<'PY'
@@ -61,24 +53,9 @@ fi
title="$project - $task"
state_dir="${HOME}/.agents/state"
state_file="$state_dir/${multiplexer}-title"
mkdir -p "$state_dir"
if [[ -f "$state_file" ]]; then
last_title=$(cat "$state_file" 2>/dev/null || true)
if [[ "$last_title" == "$title" ]]; then
exit 0
fi
fi
printf '%s' "$title" > "$state_file"
# Update session, window/tab, and pane titles.
if [[ "$multiplexer" == "tmux" ]]; then
tmux rename-session "$title" \; rename-window "$title" \; select-pane -T "$title"
if command -v set_multiplexer_title >/dev/null 2>&1; then
set_multiplexer_title "$title"
else
zellij action rename-session "$title"
zellij action rename-tab "$title"
zellij action rename-pane "$title"
hook_dir=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)
"$hook_dir/../../lib/functions/set_multiplexer_title" "$title"
fi

View File

@@ -1 +1 @@
79bd4e36950d6270
22c0ca9bd55ca4ff

View File

@@ -11,18 +11,22 @@ Generates or edits images for the current project (for example website assets, g
This skill has exactly two top-level modes:
- **Default built-in tool mode (preferred):** built-in `image_gen` tool for normal image generation and editing. Does not require `OPENAI_API_KEY`.
- **Fallback CLI mode (explicit-only):** `scripts/image_gen.py` CLI. Use only when the user explicitly asks for the CLI path. Requires `OPENAI_API_KEY`.
- **Default built-in tool mode (preferred):** built-in `image_gen` tool for normal image generation, editing, and simple transparent-image requests. Does not require `OPENAI_API_KEY`.
- **Fallback CLI mode:** `scripts/image_gen.py` CLI. Use when the user explicitly asks for the CLI/API/model path, or after the user explicitly confirms a true model-native transparency fallback with `gpt-image-1.5`. Requires `OPENAI_API_KEY`.
Within the explicit CLI fallback only, the CLI exposes three subcommands:
Within CLI fallback, the CLI exposes three subcommands:
- `generate`
- `edit`
- `generate-batch`
Rules:
- Use the built-in `image_gen` tool by default for all normal image generation and editing requests.
- Never switch to CLI fallback automatically.
- Use the built-in `image_gen` tool by default for normal image generation and editing requests.
- Do not switch to CLI fallback for ordinary quality, size, or file-path control.
- If the user explicitly asks for a transparent image/background, stay on built-in `image_gen` first: prompt for a flat removable chroma-key background, then remove it locally with the installed helper at `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`.
- Never silently switch from built-in `image_gen` or CLI `gpt-image-2` to CLI `gpt-image-1.5`. Treat this as a model/path downgrade and ask the user before doing it, unless the user has already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
- If a transparent request appears too complex for clean chroma-key removal, if the user asks for true/native transparency, or if local removal fails validation, explain that true transparency requires CLI `gpt-image-1.5 --background transparent --output-format png` because `gpt-image-2` does not support `background=transparent`, then ask whether to proceed. Run the CLI fallback only after the user confirms.
- The word `batch` by itself does not mean CLI fallback. If the user asks for many assets or says to batch-generate assets without explicitly asking for CLI/API/model controls, stay on the built-in path and issue one built-in call per requested asset or variant.
- If the built-in tool fails or is unavailable, tell the user the CLI fallback exists and that it requires `OPENAI_API_KEY`. Proceed only if the user explicitly asks for that fallback.
- If the user explicitly asks for CLI mode, use the bundled `scripts/image_gen.py` workflow. Do not create one-off SDK runners.
- Never modify `scripts/image_gen.py`. If something is missing, ask the user before doing anything else.
@@ -46,6 +50,9 @@ Fallback-only docs/resources for CLI mode:
- `references/codex-network.md`
- `scripts/image_gen.py`
Local post-processing helper:
- `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`: removes a flat chroma-key background from a generated image and writes a PNG/WebP with alpha. Prefer auto-key sampling, soft matte, and despill for antialiased edges.
## When to use
- Generate a new image (concept art, product shot, cover, website hero)
- Generate a new image using one or more reference images for style, composition, or mood
@@ -79,12 +86,13 @@ Built-in edit semantics:
Execution strategy:
- In the built-in default path, produce many assets or variants by issuing one `image_gen` call per requested asset or variant.
- In the explicit CLI fallback path, use the CLI `generate-batch` subcommand only when the user explicitly chose CLI mode and needs many prompts/assets.
- In the CLI fallback path, use the CLI `generate-batch` subcommand only when the user explicitly chose CLI mode and needs many prompts/assets.
- For many distinct assets, do not use `n` as a substitute for separate prompts. `n` is for variants of one prompt; distinct assets need distinct built-in calls or distinct CLI `generate-batch` jobs.
Assume the user wants a new image unless they clearly ask to change an existing one.
## Workflow
1. Decide the top-level mode: built-in by default, fallback CLI only if explicitly requested.
1. Decide the top-level mode: built-in by default, including for simple transparent-output requests; fallback CLI only if explicitly requested or after the user explicitly confirms a transparent-output fallback.
2. Decide the intent: `generate` or `edit`.
3. Decide whether the output is preview-only or meant to be consumed by the current project.
4. Decide the execution strategy: single asset vs repeated built-in calls vs CLI `generate-batch`.
@@ -99,13 +107,54 @@ Assume the user wants a new image unless they clearly ask to change an existing
- If the user's prompt is already specific and detailed, normalize it into a clear spec without adding creative requirements.
- If the user's prompt is generic, add tasteful augmentation only when it materially improves output quality.
10. Use the built-in `image_gen` tool by default.
11. If the user explicitly chooses the CLI fallback, then and only then use the fallback-only docs for quality, `input_fidelity`, masks, output format, output paths, and network setup.
11. For transparent-output requests, follow the transparent image guidance below: generate with built-in `image_gen` on a flat chroma-key background, copy the selected output into the workspace or `tmp/imagegen/`, run the installed `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py` helper, and validate the alpha result before using it. If this path looks unsuitable or fails, ask before switching to CLI `gpt-image-1.5`.
12. Inspect outputs and validate: subject, style, composition, text accuracy, and invariants/avoid items.
13. Iterate with a single targeted change, then re-check.
14. For preview-only work, render the image inline; the underlying file may remain at the default `$CODEX_HOME/generated_images/...` path.
15. For project-bound work, move or copy the selected artifact into the workspace and update any consuming code or references. Never leave a project-referenced asset only at the default `$CODEX_HOME/generated_images/...` path.
16. For batches, persist only the selected finals in the workspace unless the user explicitly asked to keep discarded variants.
17. Always report the final saved path for any workspace-bound asset, plus the final prompt and whether the built-in tool or fallback CLI mode was used.
16. For batches or multi-asset requests, persist the final version of every requested deliverable in the workspace unless the user explicitly asked to keep outputs preview-only. Discarded variants do not need to be kept unless requested.
17. If the user explicitly chooses or confirms the CLI fallback, then use the fallback-only docs for model, quality, size, `input_fidelity`, masks, output format, output paths, and network setup.
18. Always report the final saved path(s) for any workspace-bound asset(s), plus the final prompt or prompt set and whether the built-in tool or fallback CLI mode was used.
## Transparent image requests
Transparent-image requests still use built-in `image_gen` first. Because the built-in tool does not expose a true transparent-background control, create a removable chroma-key source image and then convert the key color to alpha locally.
Default sequence:
1. Use built-in `image_gen` to generate the requested subject on a perfectly flat solid chroma-key background.
2. Choose a key color that is unlikely to appear in the subject: default `#00ff00`, use `#ff00ff` for green subjects, and avoid `#0000ff` for blue subjects.
3. After generation, move or copy the selected source image from `$CODEX_HOME/generated_images/...` into the workspace or `tmp/imagegen/`.
4. Run the installed helper path, not a project-relative script path:
```bash
python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" \
--input <source> \
--out <final.png> \
--auto-key border \
--soft-matte \
--transparent-threshold 12 \
--opaque-threshold 220 \
--despill
```
5. Validate that the output has an alpha channel, transparent corners, plausible subject coverage, and no obvious key-color fringe. If a thin fringe remains, retry once with `--edge-contract 1`; use `--edge-feather 0.25` only when the edge is visibly stair-stepped and the subject is not shiny or reflective.
6. Save the final alpha PNG/WebP in the project if the asset is project-bound. Never leave a project-referenced transparent asset only under `$CODEX_HOME/*`.
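A sketch of the step-5 validation, assuming Pillow is installed and a PNG output (the file path and thresholds here are illustrative):
```bash
python - <<'PY'
# Sketch: check alpha channel, transparent corners, and plausible coverage.
from PIL import Image

img = Image.open("final.png").convert("RGBA")  # illustrative path
alpha = img.getchannel("A")
w, h = img.size
corners = [alpha.getpixel(p) for p in ((0, 0), (w - 1, 0), (0, h - 1), (w - 1, h - 1))]
assert all(a < 16 for a in corners), "corners are not transparent"
coverage = sum(1 for a in alpha.getdata() if a > 128) / (w * h)
assert 0.05 < coverage < 0.95, f"implausible subject coverage: {coverage:.2%}"
print(f"ok: alpha present, corners transparent, coverage {coverage:.2%}")
PY
```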
Prompt transparent requests like this:
```text
Create the requested subject on a perfectly flat solid #00ff00 chroma-key background for background removal.
The background must be one uniform color with no shadows, gradients, texture, reflections, floor plane, or lighting variation.
Keep the subject fully separated from the background with crisp edges and generous padding.
Do not use #00ff00 anywhere in the subject.
No cast shadow, no contact shadow, no reflection, no watermark, and no text unless explicitly requested.
```
Do not automatically use CLI `gpt-image-1.5 --background transparent --output-format png` instead of chroma keying. Ask the user first when the user asks for true/native transparency, when local removal fails validation, or when the requested image is complex: hair, fur, feathers, smoke, glass, liquids, translucent materials, reflective objects, soft shadows, realistic product grounding, or subject colors that conflict with all practical key colors.
Use a concise confirmation like:
```text
This likely needs true native transparency. The default built-in path uses a chroma-key background plus local removal, but true transparency requires the CLI fallback with gpt-image-1.5 because gpt-image-2 does not support background=transparent. It also requires OPENAI_API_KEY. Should I proceed with that CLI fallback?
```
## Prompt augmentation
@@ -140,6 +189,9 @@ Generate:
- product-mockup — product/packaging shots, catalog imagery, merch concepts.
- ui-mockup — app/web interface mockups and wireframes; specify the desired fidelity.
- infographic-diagram — diagrams/infographics with structured layout and text.
- scientific-educational — classroom explainers, scientific diagrams, and learning visuals with required labels and accuracy constraints.
- ads-marketing — campaign concepts and ad creatives with audience, brand position, scene, and exact tagline/copy.
- productivity-visual — slide, chart, workflow, and data-heavy business visuals.
- logo-brand — logo/mark exploration, vector-friendly.
- illustration-story — comics, children's book art, narrative scenes.
- stylized-concept — style-driven concept art, 3D/stylized renders.
@@ -150,7 +202,7 @@ Edit:
- identity-preserve — try-on, person-in-scene; lock face/body/pose.
- precise-object-edit — remove/replace a specific element (including interior swaps).
- lighting-weather — time-of-day/season/atmosphere changes only.
- background-extraction — transparent background / clean cutout.
- background-extraction — transparent background / clean cutout. Use built-in `image_gen` with chroma-key removal first for simple opaque subjects; ask before using CLI true transparency for complex subjects.
- style-transfer — apply reference style while changing subject/scene.
- compositing — multi-image insert/merge with matched lighting/perspective.
- sketch-to-render — drawing/line art to photoreal render.
@@ -179,7 +231,7 @@ Avoid: <negative constraints>
Notes:
- `Asset type` and `Input images` are prompt scaffolding, not dedicated CLI flags.
- `Scene/backdrop` refers to the visual setting. It is not the same as the fallback CLI `background` parameter, which controls output transparency behavior.
- Fallback-only execution notes such as `Quality:`, `Input fidelity:`, masks, output format, and output paths belong in the explicit CLI path only. Do not treat them as built-in `image_gen` tool arguments.
- Fallback-only execution notes such as `Quality:`, `Input fidelity:`, masks, output format, and output paths belong in the CLI path only. Do not treat them as built-in `image_gen` tool arguments.
Augmentation rules:
- Keep it short.
@@ -220,7 +272,8 @@ Constraints: change only the background; keep the product and its edges unchange
- Iterate with single-change follow-ups.
- If the prompt is generic, add only the extra detail that will materially help.
- If the prompt is already detailed, normalize it instead of expanding it.
- For explicit CLI fallback only, see `references/cli.md` and `references/image-api.md` for `quality`, `input_fidelity`, masks, output format, and output-path guidance.
- For CLI fallback only, see `references/cli.md` and `references/image-api.md` for model, `quality`, `input_fidelity`, masks, output format, and output-path guidance.
- For transparent images, use the built-in-first chroma-key workflow unless the request is complex enough to need true CLI transparency; ask before switching to CLI `gpt-image-1.5`.
More principles shared by both modes: `references/prompting.md`.
Copy/paste specs shared by both modes: `references/sample-prompts.md`.
@@ -228,10 +281,33 @@ Copy/paste specs shared by both modes: `references/sample-prompts.md`.
## Guidance by asset type
Asset-type templates (website assets, game assets, wireframes, logo) are consolidated in `references/sample-prompts.md`.
## gpt-image-2 guidance for CLI fallback
The fallback CLI defaults to `gpt-image-2`.
- Use `gpt-image-2` for new CLI/API workflows unless the request needs true model-native transparent output.
- If a transparent request may need CLI fallback, ask before using `gpt-image-1.5` unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Explain that the built-in chroma-key path is the default, but true transparency requires `gpt-image-1.5` because `gpt-image-2` does not support `background=transparent`.
- `gpt-image-2` always uses high fidelity for image inputs; do not set `input_fidelity` with this model.
- `gpt-image-2` supports `quality` values `low`, `medium`, `high`, and `auto`.
- Use `quality low` for fast drafts, thumbnails, and quick iterations. Use `medium`, `high`, or `auto` for final assets, dense text, diagrams, identity-sensitive edits, or high-resolution outputs.
- Square images are typically fastest to generate. Use `1024x1024` for fast square drafts.
- If the user asks for 4K-style output, use `3840x2160` for landscape or `2160x3840` for portrait.
- `gpt-image-2` size may be `auto` or `WIDTHxHEIGHT` if all constraints hold: max edge `<= 3840px`, both edges multiples of `16px`, long-to-short ratio `<= 3:1`, total pixels between `655,360` and `8,294,400`.
Popular `gpt-image-2` sizes:
- `1024x1024` square
- `1536x1024` landscape
- `1024x1536` portrait
- `2048x2048` 2K square
- `2048x1152` 2K landscape
- `3840x2160` 4K landscape
- `2160x3840` 4K portrait
- `auto`
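A quick sanity check for a custom `WIDTHxHEIGHT` against these constraints, as a sketch:
```bash
# Sketch: validate a gpt-image-2 size string against the documented constraints.
check_gpt_image_2_size() {
  local w="${1%x*}" h="${1#*x}" long short
  (( long = w > h ? w : h, short = w > h ? h : w ))
  (( long <= 3840 )) || { echo "max edge is 3840px"; return 1; }
  (( w % 16 == 0 && h % 16 == 0 )) || { echo "edges must be multiples of 16px"; return 1; }
  (( long <= 3 * short )) || { echo "long:short ratio must be <= 3:1"; return 1; }
  (( w * h >= 655360 && w * h <= 8294400 )) || { echo "total pixels out of range"; return 1; }
  echo "ok"
}
check_gpt_image_2_size 2048x1152  # ok: 2,359,296 pixels, 16:9, edges are multiples of 16
```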
## Fallback CLI mode only
### Temp and output conventions
These conventions apply only to the explicit CLI fallback. They do not describe built-in `image_gen` output behavior.
These conventions apply only to the CLI fallback. They do not describe built-in `image_gen` output behavior.
- Use `tmp/imagegen/` for intermediate files (for example JSONL batches); delete them when done.
- Write final artifacts under `output/imagegen/`.
- Use `--out` or `--out-dir` to control output paths; keep filenames stable and descriptive.
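For example, assuming the `$IMAGE_GEN` path variable from `references/cli.md` is set:
```bash
mkdir -p tmp/imagegen output/imagegen
python "$IMAGE_GEN" generate \
  --prompt "A flat vector-style icon of a paper plane" \
  --out output/imagegen/paper-plane-icon.png
rm -rf tmp/imagegen  # delete intermediates when done
```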
@@ -244,7 +320,7 @@ Required Python package:
uv pip install openai
```
Optional for downscaling only:
Required for local chroma-key removal and optional downscaling:
```bash
uv pip install pillow
```
@@ -276,4 +352,5 @@ If installation is not possible in this environment, tell the user which depende
- `references/cli.md`: fallback-only CLI usage via `scripts/image_gen.py`.
- `references/image-api.md`: fallback-only API/CLI parameter reference.
- `references/codex-network.md`: fallback-only network/sandbox troubleshooting for CLI mode.
- `scripts/image_gen.py`: fallback-only CLI implementation. Do not load or use it unless the user explicitly chooses CLI mode.
- `scripts/image_gen.py`: fallback-only CLI implementation. Do not load or use it unless the user explicitly chooses CLI mode or explicitly confirms a transparent request's true CLI transparency fallback.
- `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`: local post-processing helper for built-in transparent-image requests.

View File

@@ -3,4 +3,4 @@ interface:
short_description: "Generate or edit images for websites, games, and more"
icon_small: "./assets/imagegen-small.svg"
icon_large: "./assets/imagegen.png"
default_prompt: "Generate or edit the visual assets for this task with the built-in `image_gen` tool by default. First confirm that the task actually calls for a raster image; if the project already has SVG/vector/code-native assets and the user wants to extend or match those, do not use this skill. If the task includes reference images, treat them as references unless the user clearly wants an existing image modified. For multi-asset requests, loop built-in calls rather than treating batch as a separate top-level mode. Only use the fallback CLI if the user explicitly asks for it, and keep CLI-only controls such as `generate-batch`, `quality`, `input_fidelity`, masks, and output paths on that fallback path."
default_prompt: "Use $imagegen to make or edit an image for this project."

View File

@@ -1,13 +1,14 @@
# CLI reference (`scripts/image_gen.py`)
This file is for the fallback CLI mode only. Read it only after the user explicitly asks to use `scripts/image_gen.py` instead of the built-in `image_gen` tool.
This file is for the fallback CLI mode only. Read it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the `gpt-image-1.5` true-transparency fallback path.
`generate-batch` is a CLI subcommand in this fallback path. It is not a top-level mode of the skill.
The word `batch` in a user request is not CLI opt-in by itself.
## What this CLI does
- `generate`: generate a new image from a prompt
- `edit`: edit one or more existing images
- `generate-batch`: run many generation jobs from a JSONL file
- `generate-batch`: run many generation jobs from a JSONL file after the user explicitly chooses CLI/API/model controls
Real API calls require **network access** + `OPENAI_API_KEY`. `--dry-run` does not.
@@ -16,7 +17,7 @@ Set a stable path to the skill CLI (default `CODEX_HOME` is `~/.codex`):
```
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export IMAGE_GEN="$CODEX_HOME/skills/imagegen/scripts/image_gen.py"
export IMAGE_GEN="$CODEX_HOME/skills/.system/imagegen/scripts/image_gen.py"
```
Install dependencies into that environment with its package manager. In uv-managed environments, `uv pip install ...` remains the preferred path.
@@ -58,27 +59,102 @@ python "$IMAGE_GEN" edit \
- Use the bundled CLI directly (`python "$IMAGE_GEN" ...`) after activating the correct environment.
- Do **not** create one-off runners (for example `gen_images.py`) unless the user explicitly asks for a custom wrapper.
- **Never modify** `scripts/image_gen.py`. If something is missing, ask the user before doing anything else.
- Do not silently downgrade from CLI `gpt-image-2` or built-in `image_gen` to CLI `gpt-image-1.5`; ask first unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
## Defaults
- Model: `gpt-image-1.5`
- Model: `gpt-image-2`
- Supported model family for this CLI: GPT Image models (`gpt-image-*`)
- Size: `1024x1024`
- Quality: `auto`
- Size: `auto`
- Quality: `medium`
- Output format: `png`
- Default one-off output path: `output/imagegen/output.png`
- Background: unspecified unless `--background` is set
## gpt-image-2 size and model guidance
`gpt-image-2` is the default model for new CLI fallback work.
- Use `--quality low` for fast drafts, thumbnails, and quick iterations.
- Use `--quality medium`, `--quality high`, or `--quality auto` for final assets, dense text, diagrams, identity-sensitive edits, and high-resolution outputs.
- Square images are typically fastest. Use `--size 1024x1024` for quick square drafts.
- If the user asks for 4K-style output, use `--size 3840x2160` for landscape or `--size 2160x3840` for portrait.
- Do not pass `--input-fidelity` with `gpt-image-2`; this model always uses high fidelity for image inputs.
- Do not use `--background transparent` with `gpt-image-2`; the default transparent-image workflow uses built-in `image_gen` on a flat chroma-key background plus local removal. Use `gpt-image-1.5` only after the user explicitly confirms the true-transparent CLI fallback, unless they already requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
Popular `gpt-image-2` sizes:
- `1024x1024`
- `1536x1024`
- `1024x1536`
- `2048x2048`
- `2048x1152`
- `3840x2160`
- `2160x3840`
- `auto`
`gpt-image-2` size constraints:
- max edge `<= 3840px`
- both edges multiples of `16px`
- long edge to short edge ratio `<= 3:1`
- total pixels between `655,360` and `8,294,400`
- outputs with more total pixels than `2560x1440` are experimental
Fast draft:
```bash
python "$IMAGE_GEN" generate \
--prompt "A product thumbnail of a matte ceramic mug on a stone surface" \
--quality low \
--size 1024x1024 \
--out output/imagegen/mug-draft.png
```
Final 2K landscape:
```bash
python "$IMAGE_GEN" generate \
--prompt "A polished landing-page hero image of a matte ceramic mug on a stone surface" \
--quality high \
--size 2048x1152 \
--out output/imagegen/mug-hero.png
```
4K landscape:
```bash
python "$IMAGE_GEN" generate \
--prompt "A detailed architectural visualization at golden hour" \
--size 3840x2160 \
--quality high \
--out output/imagegen/architecture-4k.png
```
True transparent fallback request:
Ask for confirmation before using this command unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
```bash
python "$IMAGE_GEN" generate \
--model gpt-image-1.5 \
--prompt "A clean product cutout on a transparent background" \
--background transparent \
--output-format png \
--out output/imagegen/product-cutout.png
```
When using this path, explain briefly that built-in `image_gen` plus chroma-key removal is the default transparent-image path, but this request needs true model-native transparency. `gpt-image-2` does not support `background=transparent`, so `gpt-image-1.5` is required for this confirmed fallback.
## Quality, input fidelity, and masks (CLI fallback only)
These are explicit CLI controls. They are not built-in `image_gen` tool arguments.
- `--quality` works for `generate`, `edit`, and `generate-batch`: `low|medium|high|auto`
- `--input-fidelity` is **edit-only** and validated as `low|high`
- `--input-fidelity` is **edit-only** and validated as `low|high`; it is not supported for `gpt-image-2`
- `--mask` is **edit-only**
Example:
```bash
python "$IMAGE_GEN" edit \
--model gpt-image-1.5 \
--image input.png \
--prompt "Change only the background" \
--quality high \
@@ -89,6 +165,10 @@ python "$IMAGE_GEN" edit \
Mask notes:
- For multi-image edits, pass repeated `--image` flags. Their order is meaningful, so describe each image by index and role in the prompt.
- The CLI accepts a single `--mask`.
- Image and mask must be the same size and format and each under 50MB.
- Masks must include an alpha channel.
- If multiple input images are provided, the mask applies to the first image.
- Masking is prompt-guided; do not promise exact pixel-perfect mask boundaries.
- Use a PNG mask when possible; the script treats mask handling as best-effort and does not perform full preflight validation beyond file checks/warnings.
- In the edit prompt, repeat invariants (`change only the background; keep the subject unchanged`) to reduce drift.
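A masked-edit sketch under these notes (file names are illustrative; the mask must be a same-size PNG with an alpha channel):
```bash
python "$IMAGE_GEN" edit \
  --image product.png \
  --mask product-mask.png \
  --prompt "Change only the background to a soft studio gradient; keep the product and its edges unchanged" \
  --out output/imagegen/product-new-bg.png
```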
@@ -147,10 +227,11 @@ Notes:
- Per-job overrides are supported in JSONL (for example `size`, `quality`, `background`, `output_format`, `output_compression`, `moderation`, `n`, `model`, `out`, and prompt-augmentation fields).
- `--n` generates multiple variants for a single prompt; `generate-batch` is for many different prompts.
- In batch mode, per-job `out` is treated as a filename under `--out-dir`.
- For many requested deliverable assets, provide one prompt/job per distinct asset and use semantic filenames when possible.
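A sketch of a small batch file following these notes (prompts and filenames are illustrative; see the `generate-batch` subcommand for the invocation):
```bash
mkdir -p tmp/imagegen
cat > tmp/imagegen/site-assets.jsonl <<'JSONL'
{"prompt": "Website hero: matte ceramic mug on a stone surface", "out": "mug-hero.png", "quality": "high", "size": "2048x1152"}
{"prompt": "Thumbnail: the same mug, centered, simple backdrop", "out": "mug-thumb.png", "quality": "low", "size": "1024x1024"}
JSONL
```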
## CLI notes
- Supported sizes: `1024x1024`, `1536x1024`, `1024x1536`, or `auto`.
- Transparent backgrounds require `output_format` to be `png` or `webp`.
- Supported sizes depend on the model. `gpt-image-2` supports flexible constrained sizes; older GPT Image models support `1024x1024`, `1536x1024`, `1024x1536`, or `auto`.
- True transparent CLI outputs require `output_format` to be `png` or `webp` and are not supported by `gpt-image-2`.
- `--prompt-file`, `--output-compression`, `--moderation`, `--max-attempts`, `--fail-fast`, `--force`, and `--no-augment` are supported.
- This CLI is intended for GPT Image models. Do not assume older non-GPT image-model behavior applies here.
@@ -158,3 +239,4 @@ Notes:
- API parameter quick reference for fallback CLI mode: `references/image-api.md`
- Prompt examples shared across both top-level modes: `references/sample-prompts.md`
- Network/sandbox notes for fallback CLI mode: `references/codex-network.md`
- Built-in-first transparent image workflow: `SKILL.md` and `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`

View File

@@ -1,6 +1,6 @@
# Codex network approvals / sandbox notes
This file is for the fallback CLI mode only. Read it only after the user explicitly asks to use `scripts/image_gen.py`.
This file is for the fallback CLI mode only. Read it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the `gpt-image-1.5` true-transparency fallback path.
This guidance is intentionally isolated from `SKILL.md` because it can vary by environment and may become stale. Prefer the defaults in your environment when in doubt.

View File

@@ -1,13 +1,46 @@
# Image API quick reference
This file is for the fallback CLI mode only. Use it only after the user explicitly asks to use `scripts/image_gen.py` instead of the built-in `image_gen` tool.
This file is for the fallback CLI mode only. Use it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the `gpt-image-1.5` true-transparency fallback path.
These parameters describe the Image API and bundled CLI fallback surface. Do not assume they are normal arguments on the built-in `image_gen` tool.
## Scope
- This fallback CLI is intended for GPT Image models (`gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini`).
- This fallback CLI is intended for GPT Image models (`gpt-image-2`, `gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini`).
- The built-in `image_gen` tool and the fallback CLI do not expose the same controls.
## Model summary
| Model | Quality | Input fidelity | Resolutions | Recommended use |
| --- | --- | --- | --- | --- |
| `gpt-image-2` | `low`, `medium`, `high`, `auto` | Always high fidelity for image inputs; do not set `input_fidelity` | `auto` or flexible sizes that satisfy the constraints below | Default for new CLI/API workflows: high-quality generation and editing, text-heavy images, photorealism, compositing, identity-sensitive edits, and workflows where fewer retries matter |
| `gpt-image-1.5` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | True transparent-background fallback and backward-compatible workflows |
| `gpt-image-1` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Legacy compatibility |
| `gpt-image-1-mini` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Cost-sensitive draft batches and lower-stakes previews |
## gpt-image-2 sizes
`gpt-image-2` accepts `auto` or any `WIDTHxHEIGHT` size that satisfies all constraints:
- Maximum edge length must be less than or equal to `3840px`.
- Both edges must be multiples of `16px`.
- Long edge to short edge ratio must not exceed `3:1`.
- Total pixels must be at least `655,360` and no more than `8,294,400`.
Popular sizes:
| Label | Size | Notes |
| --- | --- | --- |
| Square | `1024x1024` | Typical fast default |
| Landscape | `1536x1024` | Standard landscape |
| Portrait | `1024x1536` | Standard portrait |
| 2K square | `2048x2048` | Larger square output |
| 2K landscape | `2048x1152` | Widescreen output |
| 4K landscape | `3840x2160` | Widescreen 4K output |
| 4K portrait | `2160x3840` | Vertical 4K output |
| Auto | `auto` | Default size |
Square images are typically fastest to generate. For 4K-style output, use `3840x2160` or `2160x3840`.
## Endpoints
- Generate: `POST /v1/images/generations` (`client.images.generate(...)`)
- Edit: `POST /v1/images/edits` (`client.images.edit(...)`)
@@ -16,7 +49,7 @@ These parameters describe the Image API and bundled CLI fallback surface. Do not
- `prompt`: text prompt
- `model`: image model
- `n`: number of images (1-10)
- `size`: `1024x1024`, `1536x1024`, `1024x1536`, or `auto`
- `size`: `auto` by default for `gpt-image-2`; flexible `WIDTHxHEIGHT` sizes are allowed only for `gpt-image-2`; older GPT Image models use `1024x1024`, `1536x1024`, `1024x1536`, or `auto`
- `quality`: `low`, `medium`, `high`, or `auto`
- `background`: output transparency behavior (`transparent`, `opaque`, or `auto`) for generated output; this is not the same thing as the prompt's visual scene/backdrop
- `output_format`: `png` (default), `jpeg`, `webp`
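For orientation, a minimal raw request against the generations endpoint using only the parameters above (the bundled CLI wraps this; prefer `scripts/image_gen.py` in practice):
```bash
curl -s https://api.openai.com/v1/images/generations \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A clean product shot of a matte ceramic mug",
    "size": "1024x1024",
    "quality": "medium",
    "n": 1
  }'
```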
@@ -26,12 +59,19 @@ These parameters describe the Image API and bundled CLI fallback surface. Do not
## Edit-specific parameters
- `image`: one or more input images. For GPT Image models, you can provide up to 16 images.
- `mask`: optional mask image
- `input_fidelity`: `low` (default) or `high`
- `input_fidelity`: `low` or `high` only for models that support it; do not set this for `gpt-image-2`
Model-specific note for `input_fidelity`:
- `gpt-image-2` always uses high fidelity for image inputs and does not support setting `input_fidelity`.
- `gpt-image-1` and `gpt-image-1-mini` preserve all input images, but the first image gets richer textures and finer details.
- `gpt-image-1.5` preserves the first 5 input images with higher fidelity.
## Transparent backgrounds
`gpt-image-2` does not currently support the Image API `background=transparent` parameter. The skill's default transparent-image path is built-in `image_gen` with a flat chroma-key background, followed by local alpha extraction with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py"`.
Use CLI `gpt-image-1.5` with `background=transparent` and a transparent-capable output format such as `png` or `webp` only after the user explicitly confirms that fallback, unless they already requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. If the user asks for true/native transparency, the subject is too complex for clean chroma-key removal, or local background removal fails validation, explain the tradeoff and ask before switching.
## Output
- `data[]` list with `b64_json` per image
- The bundled `scripts/image_gen.py` CLI decodes `b64_json` and writes output files for you.
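If you ever need to decode a raw response yourself (the bundled CLI already does this), a sketch:
```bash
python - <<'PY'
# Sketch: write the first returned image from a saved API response to disk.
import base64, json

resp = json.load(open("response.json"))  # illustrative path
with open("output/imagegen/output.png", "wb") as f:
    f.write(base64.b64decode(resp["data"][0]["b64_json"]))
PY
```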
@@ -41,8 +81,9 @@ Model-specific note for `input_fidelity`:
- Use the edits endpoint when the user requests changes to an existing image.
- Masking is prompt-guided; exact shapes are not guaranteed.
- Large sizes and high quality increase latency and cost.
- High `input_fidelity` can materially increase input token usage.
- If a request fails because a specific option is unsupported by the selected GPT Image model, retry manually without that option.
- Use `quality=low` for fast drafts, thumbnails, and quick iterations. Use `medium` or `high` for final assets, dense text, diagrams, identity-sensitive edits, or high-resolution outputs.
- High `input_fidelity` can materially increase input token usage on models that support it.
- If a request fails because a specific option is unsupported by the selected GPT Image model, retry manually without that option only when the option is not required by the user. If true transparent CLI output is required, ask before switching to `gpt-image-1.5` instead of dropping `background=transparent`, unless the user already explicitly chose that fallback.
## Important boundary
- `quality`, `input_fidelity`, explicit masks, `background`, `output_format`, and related parameters are fallback-only execution controls.

View File

@@ -15,6 +15,7 @@ This file is about prompt structure, specificity, and iteration. Fallback-only e
- [Text in images](#text-in-images)
- [Input images and references](#input-images-and-references)
- [Iterate deliberately](#iterate-deliberately)
- [Transparent images](#transparent-images)
- [Fallback-only execution controls](#fallback-only-execution-controls)
- [Use-case tips](#use-case-tips)
- [Where to find copy/paste recipes](#where-to-find-copypaste-recipes)
@@ -28,6 +29,7 @@ This file is about prompt structure, specificity, and iteration. Fallback-only e
- If the user prompt is already specific and detailed, normalize it into a clean spec without adding creative requirements.
- If the prompt is generic, you may add tasteful detail when it materially improves the output.
- Treat examples in `sample-prompts.md` as fully-authored recipes, not as the default amount of augmentation to add to every request.
- For photorealism, include `photorealistic` directly when that is the goal, plus concrete real-world texture such as pores, wrinkles, fabric wear, material grain, or imperfect everyday detail.
## Allowed and disallowed augmentation
@@ -46,6 +48,7 @@ Do not add:
- Specify framing and viewpoint (close-up, wide, top-down) and placement only when it materially helps.
- Call out negative space if the asset clearly needs room for UI or copy.
- Avoid making left/right layout decisions unless the user or surrounding layout supports them.
- For people, describe body framing, scale, gaze, and object interactions when they matter (`full body visible`, `looking down at the book`, `hands naturally gripping the handlebars`).
## Constraints and invariants
- State what must not change (`keep background unchanged`).
@@ -55,6 +58,7 @@ Do not add:
- Put literal text in quotes or ALL CAPS and specify typography (font style, size, color, placement).
- Spell uncommon words letter-by-letter if accuracy matters.
- For in-image copy, require verbatim rendering and no extra characters.
- In CLI fallback mode, use `medium` or `high` quality for small text, dense infographics, data-heavy slides, multi-font layouts, legends, axes, and footnotes.
## Input images and references
- Do not assume that every provided image is an edit target.
@@ -68,18 +72,34 @@ Do not add:
- Re-specify critical constraints when you iterate.
- Prefer one targeted follow-up at a time over rewriting the whole prompt.
## Transparent images
- Use built-in `image_gen` first for transparent-image requests. If the subject is clearly too complex for chroma-key removal, explain the fallback and ask before switching to CLI.
- Prompt for a perfectly flat solid chroma-key background, usually `#00ff00`; use `#ff00ff` when the subject is green, and avoid key colors that appear in the subject.
- Explicitly prohibit shadows, gradients, floor planes, reflections, texture, and lighting variation in the background.
- Ask for crisp edges, generous padding, and no use of the key color inside the subject.
- After generation, remove the background locally with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" --input <source> --out <final.png> --auto-key border --soft-matte --transparent-threshold 12 --opaque-threshold 220 --despill` and validate the alpha result before shipping it.
- Use soft matte and despill for antialiased edges; hard tolerance-only removal is mainly for flat pixel-art or exact-color fixtures.
- Use CLI `gpt-image-1.5 --background transparent --output-format png` only after the user explicitly confirms the fallback, or when the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Ask first for true/native transparency requests, failed chroma-key validation, or complex transparent subjects such as hair, fur, glass, smoke, liquids, translucent materials, reflective objects, or soft shadows.
## Fallback-only execution controls
- `quality`, `input_fidelity`, explicit masks, output format, and output paths are fallback-only execution controls.
- Do not assume they are built-in `image_gen` tool arguments.
- If the user explicitly chooses CLI fallback, see `references/cli.md` and `references/image-api.md` for those controls.
- In CLI fallback mode, `gpt-image-2` is the default. It supports `quality=low|medium|high|auto`; use `low` for fast drafts and thumbnails, and move to `medium`, `high`, or `auto` for final assets.
- `gpt-image-2` always uses high fidelity for image inputs, so do not set `input_fidelity` with that model.
- If a transparent request needs true CLI transparency, ask before using `gpt-image-1.5` unless the user already explicitly chose it. Explain that built-in chroma-key removal is the default path, but `gpt-image-2` does not support `background=transparent`.
- If the user asks for 4K-style output with `gpt-image-2`, use `3840x2160` for landscape or `2160x3840` for portrait.
## Use-case tips
Generate:
- photorealistic-natural: Prompt as if a real photo is captured in the moment; use photography language (lens, lighting, framing); call for real texture; avoid over-stylized polish unless requested.
- product-mockup: Describe the product/packaging and materials; ensure clean silhouette and label clarity; if in-image text is needed, require verbatim rendering and specify typography.
- ui-mockup: Describe the target fidelity first (shippable mockup or low-fi wireframe), then focus on layout, hierarchy, and practical UI elements; avoid concept-art language.
- infographic-diagram: Define the audience and layout flow; label parts explicitly; require verbatim text.
- infographic-diagram: Define the audience and layout flow; label parts explicitly; require verbatim text; prefer higher quality in CLI mode for dense labels.
- logo-brand: Keep it simple and scalable; ask for a strong silhouette and balanced negative space; avoid decorative flourishes unless requested.
- ads-marketing: Write like a creative brief; include brand positioning, audience, desired vibe, scene, and exact tagline if text must appear.
- productivity-visual: Name the exact artifact (slide, chart, workflow diagram), define the canvas and hierarchy, provide real labels/data, and ask for readable typography and polished spacing.
- scientific-educational: Define audience, lesson objective, required labels, scientific constraints, arrows, and scan-friendly whitespace.
- illustration-story: Define panels or scene beats; keep each action concrete.
- stylized-concept: Specify style cues, material finish, and rendering approach (3D, painterly, clay) without inventing new story elements.
- historical-scene: State the location/date and required period accuracy; constrain clothing, props, and environment to match the era.
@@ -89,7 +109,7 @@ Edit:
- identity-preserve: Lock identity (face, body, pose, hair, expression); change only the specified elements; match lighting and shadows.
- precise-object-edit: Specify exactly what to remove/replace; preserve surrounding texture and lighting; keep everything else unchanged.
- lighting-weather: Change only environmental conditions (light, shadows, atmosphere, precipitation); keep geometry, framing, and subject identity.
- background-extraction: Request a clean cutout; crisp silhouette; no halos; preserve label text exactly; no restyling.
- background-extraction: For simple opaque subjects, request a clean cutout on a perfectly flat chroma-key background; crisp silhouette; generous padding; no shadows; no halos; preserve label text exactly; no restyling. Ask before using true CLI transparency for complex subjects.
- style-transfer: Specify style cues to preserve (palette, texture, brushwork) and what must change; add `no extra elements` to prevent drift.
- compositing: Reference inputs by index; specify what moves where; match lighting, perspective, and scale; keep the base framing unchanged.
- sketch-to-render: Preserve layout, proportions, and perspective; choose materials and lighting that support the supplied sketch without adding new elements.

View File

@@ -2,7 +2,7 @@
These prompt recipes are shared across both top-level modes of the skill:
- built-in `image_gen` tool (default)
- explicit `scripts/image_gen.py` CLI fallback
- `scripts/image_gen.py` CLI fallback for explicit CLI/API/model requests or user-confirmed true-transparent-output fallback requests
Use these as starting points. They are intentionally complete prompt recipes, not the default amount of augmentation to add to every user request.
@@ -13,7 +13,14 @@ When adapting a user's prompt:
The labeled lines are prompt scaffolding, not a closed schema. `Asset type` and `Input images` are prompt-only scaffolding; the CLI does not expose them as dedicated flags.
Execution details such as explicit CLI flags, `quality`, `input_fidelity`, masks, output formats, and local output paths depend on mode. Use the built-in tool by default; only apply CLI-specific controls after the user explicitly opts into fallback mode.
Execution details such as explicit CLI flags, `quality`, `input_fidelity`, masks, output formats, and local output paths depend on mode. Use the built-in tool by default, including for simple transparent-image requests. For transparent images, prompt for a flat chroma-key background and remove it locally with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py"`; apply CLI-specific controls only when the user explicitly opts into fallback mode or explicitly confirms that the transparent request should use true CLI transparency.
CLI model notes:
- `gpt-image-2` is the fallback CLI default for new workflows.
- `gpt-image-2` supports `quality` values `low`, `medium`, `high`, and `auto`.
- For 4K-style `gpt-image-2` output, use `3840x2160` or `2160x3840`.
- If transparent output needs true CLI fallback, ask before using `gpt-image-1.5` unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Explain that built-in chroma-key removal is the default path, but `gpt-image-2` does not support `background=transparent`.
- Do not set `input_fidelity` with `gpt-image-2`; image inputs already use high fidelity.
For prompting principles (structure, specificity, invariants, iteration), see `references/prompting.md`.
@@ -68,6 +75,18 @@ Text (verbatim): "Bean Hopper", "Grinder", "Brew Group", "Boiler", "Water Tank",
Constraints: clear labels, strong contrast, no logos or trademarks, no watermark
```
### scientific-educational
```
Use case: scientific-educational
Primary request: biology diagram titled "Cellular Respiration at a Glance" for high school students
Scene/backdrop: clean white classroom handout background
Subject: glucose turns into energy inside a cell; include glycolysis, Krebs cycle, and electron transport chain
Style/medium: flat scientific diagram with consistent icons, arrows, and readable labels
Composition/framing: landscape slide-style layout with clear hierarchy and generous whitespace
Text (verbatim): "Cellular Respiration at a Glance", "Glucose", "Pyruvate", "ATP", "NADH", "FADH2", "CO2", "O2", "H2O"
Constraints: scientifically plausible; avoid tiny text; no extra decoration; no watermark
```
### logo-brand
```
Use case: logo-brand
@@ -100,6 +119,30 @@ Lighting/mood: volumetric light rays cutting through fog
Constraints: no logos or trademarks; no watermark
```
### ads-marketing
```
Use case: ads-marketing
Primary request: campaign image for a streetwear brand called Thread
Subject: group of friends hanging out together in a stylish urban setting
Style/medium: polished youth streetwear campaign photography
Composition/framing: vertical ad layout with natural poses and integrated headline space
Lighting/mood: contemporary, energetic, tasteful
Text (verbatim): "Yours to Create."
Constraints: render the tagline exactly once; clean legible typography; no extra text; no watermarks; no unrelated logos
```
### productivity-visual
```
Use case: productivity-visual
Primary request: one pitch-deck slide titled "Market Opportunity"
Asset type: fundraising slide image
Style/medium: clean modern deck slide, white background, crisp sans-serif typography
Subject: TAM/SAM/SOM concentric-circle diagram plus a small growth bar chart from 2021 to 2026
Composition/framing: 16:9 landscape slide, clear data hierarchy, polished spacing
Text (verbatim): "Market Opportunity", "TAM: $42B", "SAM: $8.7B", "SOM: $340M", "AGI Research, 2024", "Internal analysis"
Constraints: readable labels, no clip art, no stock photography, no decorative clutter, no watermark
```
### historical-scene
```
Use case: historical-scene
@@ -348,9 +391,12 @@ Constraints: preserve subject identity, geometry, camera angle, and composition;
Use case: background-extraction
Input images: Image 1: product photo
Primary request: isolate the product on a clean transparent background
Constraints: crisp silhouette; no halos or fringing; preserve label text exactly; no restyling
Scene/backdrop: perfectly flat solid #00ff00 chroma-key background for local background removal
Constraints: background must be one uniform color with no shadows, gradients, texture, reflections, floor plane, or lighting variation; crisp silhouette; generous padding; no halos or fringing; preserve label text exactly; no restyling; do not use #00ff00 anywhere in the subject
```
Post-process note: after built-in generation, run `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" --input <source> --out <final.png> --auto-key border --soft-matte --transparent-threshold 12 --opaque-threshold 220 --despill`. Ask before using CLI `gpt-image-1.5 --background transparent --output-format png` for true/native transparency, failed chroma-key validation, or complex subjects such as hair, fur, glass, smoke, liquids, translucent materials, reflections, or soft shadows, unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
### style-transfer
```
Use case: style-transfer
@@ -367,6 +413,17 @@ Primary request: place the subject from Image 2 next to the person in Image 1
Constraints: match lighting, perspective, and scale; keep the base framing unchanged; no extra elements
```
### character consistency workflow
```
Use case: identity-preserve
Input images: Image 1: previous character anchor illustration
Primary request: continue the story with the same character in a new scene and action
Scene/backdrop: snowy forest after a winter storm
Subject: same young forest hero gently helping a frightened squirrel out of a fallen tree
Style/medium: same children's book watercolor illustration style as Image 1
Constraints: do not redesign the character; preserve facial features, proportions, outfit, color palette, and personality; no text; no watermark
```
### sketch-to-render
```
Use case: sketch-to-render

View File

@@ -1,9 +1,10 @@
#!/usr/bin/env python3
"""Fallback CLI for explicit image generation or editing with GPT Image models.
Used only when the user explicitly opts into CLI fallback mode.
Used only when the user explicitly opts into CLI fallback mode, or when explicit
transparent output requires the `gpt-image-1.5` fallback path.
Defaults to gpt-image-1.5 and a structured prompt augmentation workflow.
Defaults to gpt-image-2 and a structured prompt augmentation workflow.
"""
from __future__ import annotations
@@ -21,20 +22,26 @@ from typing import Any, Dict, Iterable, List, Optional, Tuple
from io import BytesIO
DEFAULT_MODEL = "gpt-image-1.5"
DEFAULT_SIZE = "1024x1024"
DEFAULT_QUALITY = "auto"
DEFAULT_MODEL = "gpt-image-2"
DEFAULT_SIZE = "auto"
DEFAULT_QUALITY = "medium"
DEFAULT_OUTPUT_FORMAT = "png"
DEFAULT_CONCURRENCY = 5
DEFAULT_DOWNSCALE_SUFFIX = "-web"
DEFAULT_OUTPUT_PATH = "output/imagegen/output.png"
GPT_IMAGE_MODEL_PREFIX = "gpt-image-"
ALLOWED_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
ALLOWED_LEGACY_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
ALLOWED_QUALITIES = {"low", "medium", "high", "auto"}
ALLOWED_BACKGROUNDS = {"transparent", "opaque", "auto", None}
ALLOWED_INPUT_FIDELITIES = {"low", "high", None}
GPT_IMAGE_2_MODEL = "gpt-image-2"
GPT_IMAGE_2_MIN_PIXELS = 655_360
GPT_IMAGE_2_MAX_PIXELS = 8_294_400
GPT_IMAGE_2_MAX_EDGE = 3840
GPT_IMAGE_2_MAX_RATIO = 3.0
MAX_IMAGE_BYTES = 50 * 1024 * 1024
MAX_BATCH_JOBS = 500
@@ -104,10 +111,46 @@ def _normalize_output_format(fmt: Optional[str]) -> str:
return "jpeg" if fmt == "jpg" else fmt
def _validate_size(size: str) -> None:
if size not in ALLOWED_SIZES:
def _parse_size(size: str) -> Optional[Tuple[int, int]]:
match = re.fullmatch(r"([1-9][0-9]*)x([1-9][0-9]*)", size)
if not match:
return None
return int(match.group(1)), int(match.group(2))
def _validate_gpt_image_2_size(size: str) -> None:
if size == "auto":
return
parsed = _parse_size(size)
if parsed is None:
_die("size must be auto or WIDTHxHEIGHT, for example 1024x1024.")
width, height = parsed
max_edge = max(width, height)
min_edge = min(width, height)
total_pixels = width * height
if max_edge > GPT_IMAGE_2_MAX_EDGE:
_die("gpt-image-2 size maximum edge length must be less than or equal to 3840px.")
if width % 16 != 0 or height % 16 != 0:
_die("gpt-image-2 size width and height must be multiples of 16px.")
if max_edge / min_edge > GPT_IMAGE_2_MAX_RATIO:
_die("gpt-image-2 size long edge to short edge ratio must not exceed 3:1.")
if total_pixels < GPT_IMAGE_2_MIN_PIXELS or total_pixels > GPT_IMAGE_2_MAX_PIXELS:
_die(
"size must be one of 1024x1024, 1536x1024, 1024x1536, or auto for GPT image models."
"gpt-image-2 size total pixels must be at least 655,360 and no more than 8,294,400."
)
def _validate_size(size: str, model: str) -> None:
if model == GPT_IMAGE_2_MODEL:
_validate_gpt_image_2_size(size)
return
if size not in ALLOWED_LEGACY_SIZES:
_die(
"size must be one of 1024x1024, 1536x1024, 1024x1536, or auto for this GPT Image model."
)
@@ -138,17 +181,38 @@ def _validate_transparency(background: Optional[str], output_format: str) -> Non
_die("transparent background requires output-format png or webp.")
def _validate_model_specific_options(
*,
model: str,
background: Optional[str],
input_fidelity: Optional[str] = None,
) -> None:
if model != GPT_IMAGE_2_MODEL:
return
if background == "transparent":
_die(
"transparent backgrounds are not supported in gpt-image-2, the latest model. "
"Use --model gpt-image-1.5 --background transparent --output-format png instead."
)
if input_fidelity is not None:
_die(
"input_fidelity is not supported in gpt-image-2 because image inputs always use high fidelity for this model."
)
def _validate_generate_payload(payload: Dict[str, Any]) -> None:
_validate_model(str(payload.get("model", DEFAULT_MODEL)))
model = str(payload.get("model", DEFAULT_MODEL))
_validate_model(model)
n = int(payload.get("n", 1))
if n < 1 or n > 10:
_die("n must be between 1 and 10")
size = str(payload.get("size", DEFAULT_SIZE))
quality = str(payload.get("quality", DEFAULT_QUALITY))
background = payload.get("background")
_validate_size(size)
_validate_size(size, model)
_validate_quality(quality)
_validate_background(background)
_validate_model_specific_options(model=model, background=background)
oc = payload.get("output_compression")
if oc is not None and not (0 <= int(oc) <= 100):
_die("output_compression must be between 0 and 100")
@@ -912,10 +976,15 @@ def main() -> int:
if getattr(args, "downscale_max_dim", None) is not None and args.downscale_max_dim < 1:
_die("--downscale-max-dim must be >= 1")
_validate_size(args.size)
_validate_model(args.model)
_validate_size(args.size, args.model)
_validate_quality(args.quality)
_validate_background(args.background)
_validate_model(args.model)
_validate_model_specific_options(
model=args.model,
background=args.background,
input_fidelity=getattr(args, "input_fidelity", None),
)
_ensure_api_key(args.dry_run)
args.func(args)

View File

@@ -0,0 +1,440 @@
#!/usr/bin/env python3
"""Remove a solid chroma-key background from an image.
This helper supports the imagegen skill's built-in-first transparent workflow:
generate an image on a flat key color, then convert that key color to alpha.
"""
from __future__ import annotations
import argparse
from io import BytesIO
from pathlib import Path
import re
from statistics import median
import sys
from typing import Tuple
Color = Tuple[int, int, int]
KEY_DOMINANCE_THRESHOLD = 16.0
ALPHA_NOISE_FLOOR = 8
def _die(message: str, code: int = 1) -> None:
print(f"Error: {message}", file=sys.stderr)
raise SystemExit(code)
def _dependency_hint(package: str) -> str:
return (
"Activate the repo-selected environment first, then install it with "
f"`uv pip install {package}`. If this repo uses a local virtualenv, start with "
"`source .venv/bin/activate`; otherwise use this repo's configured shared fallback "
"environment."
)
def _load_pillow():
try:
from PIL import Image, ImageFilter
except ImportError:
_die(f"Pillow is required for chroma-key removal. {_dependency_hint('pillow')}")
return Image, ImageFilter
def _parse_key_color(raw: str) -> Color:
value = raw.strip()
match = re.fullmatch(r"#?([0-9a-fA-F]{6})", value)
if not match:
_die("key color must be a hex RGB value like #00ff00.")
hex_value = match.group(1)
return (
int(hex_value[0:2], 16),
int(hex_value[2:4], 16),
int(hex_value[4:6], 16),
)
def _validate_args(args: argparse.Namespace) -> None:
if args.tolerance < 0 or args.tolerance > 255:
_die("--tolerance must be between 0 and 255.")
if args.transparent_threshold < 0 or args.transparent_threshold > 255:
_die("--transparent-threshold must be between 0 and 255.")
if args.opaque_threshold < 0 or args.opaque_threshold > 255:
_die("--opaque-threshold must be between 0 and 255.")
if args.soft_matte and args.transparent_threshold >= args.opaque_threshold:
_die("--transparent-threshold must be lower than --opaque-threshold.")
if args.edge_feather < 0 or args.edge_feather > 64:
_die("--edge-feather must be between 0 and 64.")
if args.edge_contract < 0 or args.edge_contract > 16:
_die("--edge-contract must be between 0 and 16.")
src = Path(args.input)
if not src.exists():
_die(f"Input image not found: {src}")
out = Path(args.out)
if out.exists() and not args.force:
_die(f"Output already exists: {out} (use --force to overwrite)")
if out.suffix.lower() not in {".png", ".webp"}:
_die("--out must end in .png or .webp so the alpha channel is preserved.")
def _channel_distance(a: Color, b: Color) -> int:
return max(abs(a[0] - b[0]), abs(a[1] - b[1]), abs(a[2] - b[2]))
def _clamp_channel(value: float) -> int:
return max(0, min(255, int(round(value))))
def _smoothstep(value: float) -> float:
value = max(0.0, min(1.0, value))
return value * value * (3.0 - 2.0 * value)
def _soft_alpha(distance: int, transparent_threshold: float, opaque_threshold: float) -> int:
if distance <= transparent_threshold:
return 0
if distance >= opaque_threshold:
return 255
ratio = (float(distance) - transparent_threshold) / (
opaque_threshold - transparent_threshold
)
return _clamp_channel(255.0 * _smoothstep(ratio))
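# Worked examples (sanity checks) for the alpha ramp above, using the CLI
# defaults defined below (--transparent-threshold 12.0, --opaque-threshold
# 96.0): ratio = (54 - 12) / (96 - 12) = 0.5 and smoothstep(0.5) = 0.5,
# so the midpoint distance maps to round(255 * 0.5) = 128.
assert _soft_alpha(54, 12.0, 96.0) == 128
assert _soft_alpha(12, 12.0, 96.0) == 0  # at or below: fully transparent
assert _soft_alpha(96, 12.0, 96.0) == 255  # at or above: fully opaque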
def _dominance_alpha(rgb: Color, key: Color) -> int:
spill_channels = _spill_channels(key)
if not spill_channels:
return 255
channels = [float(value) for value in rgb]
non_spill = [idx for idx in range(3) if idx not in spill_channels]
key_strength = (
min(channels[idx] for idx in spill_channels)
if len(spill_channels) > 1
else channels[spill_channels[0]]
)
non_key_strength = max((channels[idx] for idx in non_spill), default=0.0)
dominance = key_strength - non_key_strength
if dominance <= 0:
return 255
denominator = max(1.0, float(max(key)) - non_key_strength)
alpha = 1.0 - min(1.0, dominance / denominator)
return _clamp_channel(alpha * 255.0)
def _spill_channels(key: Color) -> list[int]:
key_max = max(key)
if key_max < 128:
return []
return [idx for idx, value in enumerate(key) if value >= key_max - 16 and value >= 128]
def _key_channel_dominance(rgb: Color, key: Color) -> float:
spill_channels = _spill_channels(key)
if not spill_channels:
return 0.0
channels = [float(value) for value in rgb]
non_spill = [idx for idx in range(3) if idx not in spill_channels]
key_strength = (
min(channels[idx] for idx in spill_channels)
if len(spill_channels) > 1
else channels[spill_channels[0]]
)
non_key_strength = max((channels[idx] for idx in non_spill), default=0.0)
return key_strength - non_key_strength
def _looks_key_colored(rgb: Color, key: Color, distance: int) -> bool:
if distance <= 32:
return True
spill_channels = _spill_channels(key)
if not spill_channels:
return True
return _key_channel_dominance(rgb, key) >= KEY_DOMINANCE_THRESHOLD
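# Sanity checks for the spill-channel heuristics above. For a pure green
# key (#00ff00), only the green channel counts as spill; a dark key (max
# channel below 128) yields none, so the dominance tests are skipped.
assert _spill_channels((0, 255, 0)) == [1]
assert _spill_channels((40, 60, 30)) == []
# Green-tinted white sits far from the key (distance 200), but green still
# dominates red/blue by 55 >= KEY_DOMINANCE_THRESHOLD, so it is key-like.
assert _looks_key_colored((200, 255, 200), (0, 255, 0), 200)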
def _cleanup_spill(rgb: Color, key: Color, alpha: int = 255) -> Color:
if alpha >= 252:
return rgb
spill_channels = _spill_channels(key)
if not spill_channels:
return rgb
channels = [float(value) for value in rgb]
non_spill = [idx for idx in range(3) if idx not in spill_channels]
if non_spill:
anchor = max(channels[idx] for idx in non_spill)
cap = max(0.0, anchor - 1.0)
for idx in spill_channels:
if channels[idx] > cap:
channels[idx] = cap
return (
_clamp_channel(channels[0]),
_clamp_channel(channels[1]),
_clamp_channel(channels[2]),
)
def _apply_alpha_to_image(
image,
*,
key: Color,
tolerance: int,
spill_cleanup: bool,
soft_matte: bool,
transparent_threshold: float,
opaque_threshold: float,
) -> int:
pixels = image.load()
width, height = image.size
transparent = 0
for y in range(height):
for x in range(width):
red, green, blue, alpha = pixels[x, y]
rgb = (red, green, blue)
distance = _channel_distance(rgb, key)
key_like = _looks_key_colored(rgb, key, distance)
output_alpha = (
min(
_soft_alpha(distance, transparent_threshold, opaque_threshold),
_dominance_alpha(rgb, key),
)
if soft_matte and key_like
else (0 if distance <= tolerance else 255)
)
output_alpha = int(round(output_alpha * (alpha / 255.0)))
if 0 < output_alpha <= ALPHA_NOISE_FLOOR:
output_alpha = 0
if output_alpha == 0:
pixels[x, y] = (0, 0, 0, 0)
transparent += 1
continue
if spill_cleanup and key_like:
red, green, blue = _cleanup_spill(rgb, key, output_alpha)
pixels[x, y] = (red, green, blue, output_alpha)
return transparent
def _contract_alpha(image, pixels: int):
if pixels == 0:
return image
_, ImageFilter = _load_pillow()
alpha = image.getchannel("A")
for _ in range(pixels):
alpha = alpha.filter(ImageFilter.MinFilter(3))
image.putalpha(alpha)
return image
def _apply_edge_feather(image, radius: float):
if radius == 0:
return image
_, ImageFilter = _load_pillow()
alpha = image.getchannel("A")
alpha = alpha.filter(ImageFilter.GaussianBlur(radius=radius))
image.putalpha(alpha)
return image
def _encode_image(image, output_format: str) -> bytes:
out = BytesIO()
image.save(out, format=output_format.upper())
return out.getvalue()
def _alpha_counts(image) -> tuple[int, int, int]:
pixels = image.load()
width, height = image.size
total = 0
transparent = 0
partial = 0
for y in range(height):
for x in range(width):
alpha = pixels[x, y][3]
total += 1
if alpha == 0:
transparent += 1
elif alpha < 255:
partial += 1
return total, transparent, partial
def _sample_border_key(image, mode: str) -> Color:
width, height = image.size
pixels = image.load()
samples: list[Color] = []
if mode == "corners":
patch = max(1, min(width, height, 12))
boxes = [
(0, 0, patch, patch),
(width - patch, 0, width, patch),
(0, height - patch, patch, height),
(width - patch, height - patch, width, height),
]
for left, top, right, bottom in boxes:
for y in range(top, bottom):
for x in range(left, right):
red, green, blue = pixels[x, y][:3]
samples.append((red, green, blue))
else:
band = max(1, min(width, height, 6))
step = max(1, min(width, height) // 256)
for x in range(0, width, step):
for y in range(band):
red, green, blue = pixels[x, y][:3]
samples.append((red, green, blue))
red, green, blue = pixels[x, height - 1 - y][:3]
samples.append((red, green, blue))
for y in range(0, height, step):
for x in range(band):
red, green, blue = pixels[x, y][:3]
samples.append((red, green, blue))
red, green, blue = pixels[width - 1 - x, y][:3]
samples.append((red, green, blue))
if not samples:
_die("Could not sample background key color from image border.")
return (
int(round(median(sample[0] for sample in samples))),
int(round(median(sample[1] for sample in samples))),
int(round(median(sample[2] for sample in samples))),
)
def _remove_chroma_key(args: argparse.Namespace) -> None:
Image, _ = _load_pillow()
src = Path(args.input)
out = Path(args.out)
with Image.open(src) as image:
rgba = image.convert("RGBA")
key = (
_sample_border_key(rgba, args.auto_key)
if args.auto_key != "none"
else _parse_key_color(args.key_color)
)
transparent = _apply_alpha_to_image(
rgba,
key=key,
tolerance=args.tolerance,
spill_cleanup=args.spill_cleanup,
soft_matte=args.soft_matte,
transparent_threshold=args.transparent_threshold,
opaque_threshold=args.opaque_threshold,
)
rgba = _contract_alpha(rgba, args.edge_contract)
rgba = _apply_edge_feather(rgba, args.edge_feather)
total, transparent_after, partial_after = _alpha_counts(rgba)
out.parent.mkdir(parents=True, exist_ok=True)
output_format = "PNG" if out.suffix.lower() == ".png" else "WEBP"
out.write_bytes(_encode_image(rgba, output_format))
print(f"Wrote {out}")
print(f"Key color: #{key[0]:02x}{key[1]:02x}{key[2]:02x}")
print(f"Transparent pixels: {transparent_after}/{total}")
print(f"Partially transparent pixels: {partial_after}/{total}")
if transparent == 0:
print("Warning: no pixels matched the key color before feathering.", file=sys.stderr)
def _build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Remove a solid chroma-key background and write an image with alpha."
)
parser.add_argument("--input", required=True, help="Input image path.")
parser.add_argument("--out", required=True, help="Output .png or .webp path.")
parser.add_argument(
"--key-color",
default="#00ff00",
help="Hex RGB key color to remove, for example #00ff00.",
)
parser.add_argument(
"--tolerance",
type=int,
default=12,
help="Hard-key per-channel tolerance for matching the key color, 0-255.",
)
parser.add_argument(
"--auto-key",
choices=["none", "corners", "border"],
default="none",
help="Sample the key color from image corners or border instead of --key-color.",
)
parser.add_argument(
"--soft-matte",
action="store_true",
help="Use a smooth alpha ramp between transparent and opaque thresholds.",
)
parser.add_argument(
"--transparent-threshold",
type=float,
default=12.0,
help="Soft-matte distance at or below which pixels become fully transparent.",
)
parser.add_argument(
"--opaque-threshold",
type=float,
default=96.0,
help="Soft-matte distance at or above which pixels become fully opaque.",
)
parser.add_argument(
"--edge-feather",
type=float,
default=0.0,
help="Optional alpha blur radius for softened edges, 0-64.",
)
parser.add_argument(
"--edge-contract",
type=int,
default=0,
help="Shrink the visible alpha matte by this many pixels before feathering.",
)
parser.add_argument(
"--spill-cleanup",
dest="spill_cleanup",
action="store_true",
help="Reduce obvious key-color spill on opaque pixels.",
)
parser.add_argument(
"--despill",
dest="spill_cleanup",
action="store_true",
help="Alias for --spill-cleanup; decontaminate key-color edge spill.",
)
parser.add_argument("--force", action="store_true", help="Overwrite an existing output file.")
return parser
def main() -> None:
parser = _build_parser()
args = parser.parse_args()
_validate_args(args)
_remove_chroma_key(args)
if __name__ == "__main__":
main()
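A minimal end-to-end smoke test for the helper; the script filename here is hypothetical, and Pillow must be installed as described above:
```python
# Hypothetical smoke test: key out one green pixel from a tiny image.
# "remove_chroma_key.py" is an assumed filename for the script above.
import subprocess
from PIL import Image

img = Image.new("RGB", (2, 2), (255, 255, 255))
img.putpixel((0, 0), (0, 255, 0))  # the key-colored pixel
img.save("sample.png")

subprocess.run(
    ["python3", "remove_chroma_key.py",
     "--input", "sample.png", "--out", "sample_alpha.png",
     "--key-color", "#00ff00", "--force"],
    check=True,
)

out = Image.open("sample_alpha.png")
assert out.getpixel((0, 0))[3] == 0    # key pixel is now transparent
assert out.getpixel((1, 1))[3] == 255  # white pixel stays opaque
```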

View File

@@ -1,19 +1,22 @@
---
name: "openai-docs"
description: "Use when the user asks how to build with OpenAI products or APIs and needs up-to-date official documentation with citations, help choosing the latest model for a use case, or explicit GPT-5.4 upgrade and prompt-upgrade guidance; prioritize OpenAI docs MCP tools, use bundled references only as helper context, and restrict any fallback browsing to official OpenAI domains."
description: "Use when the user asks how to build with OpenAI products or APIs and needs up-to-date official documentation with citations, help choosing the latest model for a use case, or model upgrade and prompt-upgrade guidance; prioritize OpenAI docs MCP tools, use bundled references only as helper context, and restrict any fallback browsing to official OpenAI domains."
---
# OpenAI Docs
Provide authoritative, current guidance from OpenAI developer docs using the developers.openai.com MCP server. Always prioritize the developer docs MCP tools over web.run for OpenAI-related questions. This skill may also load targeted files from `references/` for model-selection and GPT-5.4-specific requests, but current OpenAI docs remain authoritative. Only if the MCP server is installed and returns no meaningful results should you fall back to web search.
Provide authoritative, current guidance from OpenAI developer docs using the developers.openai.com MCP server. Always prioritize the developer docs MCP tools over web.run for OpenAI-related questions. This skill may also load targeted files from `references/` for model-selection, model-upgrade, and prompt-upgrade requests, but current OpenAI docs remain authoritative. Only if the MCP server is installed and returns no meaningful results should you fall back to web search.
## Quick start
- Use `mcp__openaiDeveloperDocs__search_openai_docs` to find the most relevant doc pages.
- Use `mcp__openaiDeveloperDocs__fetch_openai_doc` to pull exact sections and quote/paraphrase accurately.
- Use `mcp__openaiDeveloperDocs__list_openai_docs` only when you need to browse or discover pages without a clear query.
- Load only the relevant file from `references/` when the question is about model selection or a GPT-5.4 upgrade.
- For model-selection, "latest model", or default-model questions, fetch `https://developers.openai.com/api/docs/guides/latest-model.md` first. If that is unavailable, load `references/latest-model.md`.
- For model upgrades or prompt upgrades, run `node scripts/resolve-latest-model-info.js` from this skill directory when the script is present, then follow `references/upgrade-guide.md` unless the resolver returns newer guidance for a dynamic latest/current/default request.
- Preserve explicit target requests: if the user names a target model like "migrate to GPT-5.4", keep that requested target even if `latest-model.md` names a newer model. Mention newer guidance only as optional.
- If current remote guidance is needed, fetch both the returned migration and prompting guide URLs directly. If direct fetch fails, use MCP/search fallback; if that also fails, use bundled fallback references and disclose the fallback.
## OpenAI product snapshots
@@ -37,29 +40,39 @@ If MCP tools fail or no OpenAI docs resources are available:
## Workflow
1. Clarify the product scope and whether the request is general docs lookup, model selection, a GPT-5.4 upgrade, or a GPT-5.4 prompt upgrade.
2. If it is a model-selection request, load `references/latest-model.md`.
3. If it is an explicit GPT-5.4 upgrade request, load `references/upgrading-to-gpt-5p4.md`.
4. If the upgrade may require prompt changes, or the workflow is research-heavy, tool-heavy, coding-oriented, multi-agent, or long-running, also load `references/gpt-5p4-prompting-guide.md`.
5. Search docs with a precise query.
6. Fetch the best page and the exact section needed (use `anchor` when possible).
7. For GPT-5.4 upgrade reviews, always make the per-usage-site output explicit: target model, starting reasoning recommendation, `phase` assessment when relevant, prompt blocks, and compatibility status.
8. Answer with concise guidance and cite the doc source, using the reference files only as helper context.
1. Clarify whether the request is general docs lookup, model selection, a model-string upgrade, prompt-upgrade guidance, or broader API/provider migration.
2. For model-selection or upgrade requests, prefer current remote docs over bundled references when the user asks for latest/current/default guidance.
- Fetch `https://developers.openai.com/api/docs/guides/latest-model.md`.
- Find the latest model ID and explicit migration or prompt-guidance links.
- Prefer explicit links from the latest-model page over derived URLs.
- For explicit named-model requests, preserve the requested model target and do not silently retarget to the latest model. Mention newer remote guidance only as optional.
- For dynamic latest/current/default upgrades, run `node scripts/resolve-latest-model-info.js`, then fetch both returned guide URLs directly when possible.
- If direct guide fetch fails, use the developer-docs MCP tools or official OpenAI-domain search to find the same guide content.
- If remote docs are unavailable, use bundled fallback references and say that fallback guidance was used.
3. For model upgrades, keep changes narrow: update active OpenAI API model defaults and directly related prompts only when safe.
4. Leave historical docs, examples, eval baselines, fixtures, provider comparisons, provider registries, pricing tables, alias defaults, low-cost fallback paths, and ambiguous older model usage unchanged unless the user explicitly asks to upgrade them.
5. Do not perform SDK, tooling, IDE, plugin, shell, auth, or provider-environment migrations as part of a model-and-prompt upgrade.
6. If an upgrade needs API-surface changes, schema rewiring, tool-handler changes, or implementation work beyond a literal model-string replacement and prompt edits, report it as blocked or confirmation-needed.
7. For general docs lookup, search docs with a precise query, fetch the best page and exact section needed, and answer with concise citations.
## Reference map
Read only what you need:
- `references/latest-model.md` -> model-selection and "best/latest/current model" questions; verify every recommendation against current OpenAI docs before answering.
- `references/upgrading-to-gpt-5p4.md` -> only for explicit GPT-5.4 upgrade and upgrade-planning requests; verify the checklist and compatibility guidance against current OpenAI docs before answering.
- `references/gpt-5p4-prompting-guide.md` -> prompt rewrites and prompt-behavior upgrades for GPT-5.4; verify prompting guidance against current OpenAI docs before answering.
- `https://developers.openai.com/api/docs/guides/latest-model.md` -> current model-selection and "best/latest/current model" questions.
- `references/latest-model.md` -> bundled fallback for model-selection and "best/latest/current model" questions.
- `references/upgrade-guide.md` -> bundled fallback for model upgrade and upgrade-planning requests.
- `references/prompting-guide.md` -> bundled fallback for prompt rewrites and prompt-behavior upgrades.
## Quality rules
- Treat OpenAI docs as the source of truth; avoid speculation.
- Keep migration changes narrow and behavior-preserving.
- Prefer prompt-only upgrades when possible.
- Do not invent pricing, availability, parameters, API changes, or breaking changes.
- Keep quotes short and within policy limits; prefer paraphrase with citations.
- If multiple pages differ, call out the difference and cite both.
- Reference files are convenience guides only; for volatile guidance such as recommended models, upgrade instructions, or prompting advice, current OpenAI docs always win.
- If official docs and repo behavior disagree, state the conflict and stop before making broad edits.
- If docs do not cover the user's need, say so and offer next steps.
## Tooling notes

View File

@@ -1,433 +0,0 @@
# GPT-5.4 prompting upgrade guide
Use this guide when prompts written for older models need to be adapted for GPT-5.4 during an upgrade. Start lean: keep the model-string change narrow, preserve the original task intent, and add only the smallest prompt changes needed to recover behavior.
## Default upgrade posture
- Start with `model string only` whenever the old prompt is already short, explicit, and task-bounded.
- Move to `model string + light prompt rewrite` only when regressions appear in completeness, persistence, citation quality, verification, or verbosity.
- Prefer one or two targeted prompt additions over a broad rewrite.
- Treat reasoning effort as a last-mile knob. Start lower, then increase only after prompt-level fixes and evals.
- Before increasing reasoning effort, first add a completeness contract, a verification loop, and tool persistence rules, depending on the use case.
- If the workflow clearly depends on implementation changes rather than prompt changes, treat it as blocked for prompt-only upgrade guidance.
- Do not classify a case as blocked just because the workflow uses tools; block only if the upgrade requires changing tool definitions, wiring, or other implementation details.
## Behavioral differences to account for
Current GPT-5.4 upgrade guidance suggests these strengths:
- stronger personality and tone adherence, with less drift over long answers
- better long-horizon and agentic workflow stamina
- stronger spreadsheet, finance, and formatting tasks
- more efficient tool selection and fewer unnecessary calls by default
- stronger structured generation and classification reliability
The main places where prompt guidance still helps are:
- retrieval-heavy workflows that need persistent tool use and explicit completeness
- research and citation discipline
- verification before irreversible or high-impact actions
- terminal and tool workflow hygiene
- defaults and implied follow-through
- verbosity control for compact, information-dense answers
Start with the smallest set of instructions that preserves correctness. Add the prompt blocks below only for workflows that actually need them.
## Prompt rewrite patterns
| Older prompt pattern | GPT-5.4 adjustment | Why | Example addition |
| --- | --- | --- | --- |
| Long, repetitive instructions that compensate for weaker instruction following | Remove duplicate scaffolding and keep only the constraints that materially change behavior | GPT-5.4 usually needs less repeated steering | Replace repeated reminders with one concise rule plus a verification block |
| Fast assistant prompt with no verbosity control | Keep the prompt as-is first; add a verbosity clamp only if outputs become too long | Many GPT-4o or GPT-4.1 upgrades work with just a model-string swap | Add `output_verbosity_spec` only after a verbosity regression |
| Tool-heavy agent prompt that assumes the model will keep searching until complete | Add persistence and verification rules | GPT-5.4 may use fewer tool calls by default for efficiency | Add `tool_persistence_rules` and `verification_loop` |
| Tool-heavy workflow where later actions depend on earlier lookup or retrieval | Add prerequisite and missing-context rules before action steps | GPT-5.4 benefits from explicit dependency-aware routing when context is still thin | Add `dependency_checks` and `missing_context_gating` |
| Retrieval workflow with several independent lookups | Add selective parallelism guidance | GPT-5.4 is strong at parallel tool use, but should not parallelize dependent steps | Add `parallel_tool_calling` |
| Batch workflow prompt that often misses items | Add an explicit completeness contract | Item accounting benefits from direct instruction | Add `completeness_contract` |
| Research prompt that needs grounding and citation discipline | Add research, citation, and empty-result recovery blocks | Multi-pass retrieval is stronger when the model is told how to react to weak or empty search results | Add `research_mode`, `citation_rules`, and `empty_result_handling`; add `tool_persistence_rules` when retrieval tools are already in use |
| Coding or terminal prompt with shell misuse or early stop failures | Keep the same tool surface and add terminal hygiene and verification instructions | Tool-using coding workflows are not blocked just because tools exist; they usually need better prompt steering, not host rewiring | Add `terminal_tool_hygiene` and `verification_loop`, optionally `tool_persistence_rules` |
| Multi-agent or support-triage workflow with escalation or completeness requirements | Add one lightweight control block for persistence, completeness, or verification | GPT-5.4 can be more efficient by default, so multi-step support flows benefit from an explicit completion or verification contract | Add at least one of `tool_persistence_rules`, `completeness_contract`, or `verification_loop` |
## Prompt blocks
Use these selectively. Do not add all of them by default.
### `output_verbosity_spec`
Use when:
- the upgraded model gets too wordy
- the host needs compact, information-dense answers
- the workflow benefits from a short overview plus a checklist
```text
<output_verbosity_spec>
- Default: 3-6 sentences or up to 6 bullets.
- If the user asked for a doc or report, use headings with short bullets.
- For multi-step tasks:
- Start with 1 short overview paragraph.
- Then provide a checklist with statuses: [done], [todo], or [blocked].
- Avoid repeating the user's request.
- Prefer compact, information-dense writing.
</output_verbosity_spec>
```
### `default_follow_through_policy`
Use when:
- the host expects the model to proceed on reversible, low-risk steps
- the upgraded model becomes too conservative or asks for confirmation too often
```text
<default_follow_through_policy>
- If the user's intent is clear and the next step is reversible and low-risk, proceed without asking permission.
- Only ask permission if the next step:
(a) is irreversible,
(b) has external side effects, or
(c) requires missing sensitive information or a choice that materially changes outcomes.
- If proceeding, state what you did and what remains optional.
</default_follow_through_policy>
```
### `instruction_priority`
Use when:
- users often change task shape, format, or tone mid-conversation
- the host needs an explicit override policy instead of relying on defaults
```text
<instruction_priority>
- User instructions override default style, tone, formatting, and initiative preferences.
- Safety, honesty, privacy, and permission constraints do not yield.
- If a newer user instruction conflicts with an earlier one, follow the newer instruction.
- Preserve earlier instructions that do not conflict.
</instruction_priority>
```
### `tool_persistence_rules`
Use when:
- the workflow needs multiple retrieval or verification steps
- the model starts stopping too early because it is trying to save tool calls
```text
<tool_persistence_rules>
- Use tools whenever they materially improve correctness, completeness, or grounding.
- Do not stop early just to save tool calls.
- Keep calling tools until:
(1) the task is complete, and
(2) verification passes.
- If a tool returns empty or partial results, retry with a different strategy.
</tool_persistence_rules>
```
### `dig_deeper_nudge`
Use when:
- the model is too literal or stops at the first plausible answer
- the task is safety- or accuracy-sensitive and needs a small initiative nudge before raising reasoning effort
```text
<dig_deeper_nudge>
- Do not stop at the first plausible answer.
- Look for second-order issues, edge cases, and missing constraints.
- If the task is safety- or accuracy-critical, perform at least one verification step.
</dig_deeper_nudge>
```
### `dependency_checks`
Use when:
- later actions depend on prerequisite lookup, memory retrieval, or discovery steps
- the model may be tempted to skip prerequisite work because the intended end state seems obvious
```text
<dependency_checks>
- Before taking an action, check whether prerequisite discovery, lookup, or memory retrieval is required.
- Do not skip prerequisite steps just because the intended final action seems obvious.
- If a later step depends on the output of an earlier one, resolve that dependency first.
</dependency_checks>
```
### `parallel_tool_calling`
Use when:
- the workflow has multiple independent retrieval steps
- wall-clock time matters but some steps still need sequencing
```text
<parallel_tool_calling>
- When multiple retrieval or lookup steps are independent, prefer parallel tool calls to reduce wall-clock time.
- Do not parallelize steps with prerequisite dependencies or where one result determines the next action.
- After parallel retrieval, pause to synthesize before making more calls.
- Prefer selective parallelism: parallelize independent evidence gathering, not speculative or redundant tool use.
</parallel_tool_calling>
```
### `completeness_contract`
Use when:
- the task involves batches, lists, enumerations, or multiple deliverables
- missing items are a common failure mode
```text
<completeness_contract>
- Deliver all requested items.
- Maintain an itemized checklist of deliverables.
- For lists or batches:
- state the expected count,
- enumerate items 1..N,
- confirm that none are missing before finalizing.
- If any item is blocked by missing data, mark it [blocked] and state exactly what is missing.
</completeness_contract>
```
### `empty_result_handling`
Use when:
- the workflow frequently performs search, CRM, logs, or retrieval steps
- no-results failures are often false negatives
```text
<empty_result_handling>
If a lookup returns empty or suspiciously small results:
- Do not conclude that no results exist immediately.
- Try at least 2 fallback strategies, such as a broader query, alternate filters, or another source.
- Only then report that no results were found, along with what you tried.
</empty_result_handling>
```
### `verification_loop`
Use when:
- the workflow has downstream impact
- accuracy, formatting, or completeness regressions matter
```text
<verification_loop>
Before finalizing:
- Check correctness: does the output satisfy every requirement?
- Check grounding: are factual claims backed by retrieved sources or tool output?
- Check formatting: does the output match the requested schema or style?
- Check safety and irreversibility: if the next step has external side effects, ask permission first.
</verification_loop>
```
### `missing_context_gating`
Use when:
- required context is sometimes missing early in the workflow
- the model should prefer retrieval over guessing
```text
<missing_context_gating>
- If required context is missing, do not guess.
- Prefer the appropriate lookup tool when the context is retrievable; ask a minimal clarifying question only when it is not.
- If you must proceed, label assumptions explicitly and choose a reversible action.
</missing_context_gating>
```
### `action_safety`
Use when:
- the agent will actively take actions through tools
- the host benefits from a short pre-flight and post-flight execution frame
```text
<action_safety>
- Pre-flight: summarize the intended action and parameters in 1-2 lines.
- Execute via tool.
- Post-flight: confirm the outcome and any validation that was performed.
</action_safety>
```
### `citation_rules`
Use when:
- the workflow produces cited answers
- fabricated citations or wrong citation formats are costly
```text
<citation_rules>
- Only cite sources that were actually retrieved in this session.
- Never fabricate citations, URLs, IDs, or quote spans.
- If you cannot find a source for a claim, say so and either:
- soften the claim, or
- explain how to verify it with tools.
- Use exactly the citation format required by the host application.
</citation_rules>
```
### `research_mode`
Use when:
- the workflow is research-heavy
- the host uses web search or retrieval tools
```text
<research_mode>
- Do research in 3 passes:
1) Plan: list 3-6 sub-questions to answer.
2) Retrieve: search each sub-question and follow 1-2 second-order leads.
3) Synthesize: resolve contradictions and write the final answer with citations.
- Stop only when more searching is unlikely to change the conclusion.
</research_mode>
```
If your host environment uses a specific research tool or requires a submit step, combine this with the host's finalization contract.
### `structured_output_contract`
Use when:
- the host depends on strict JSON, SQL, or other structured output
```text
<structured_output_contract>
- Output only the requested format.
- Do not add prose or markdown fences unless they were requested.
- Validate that parentheses and brackets are balanced.
- Do not invent tables or fields.
- If required schema information is missing, ask for it or return an explicit error object.
</structured_output_contract>
```
### `bbox_extraction_spec`
Use when:
- the workflow extracts OCR boxes, document regions, or other coordinates
- layout drift or missed dense regions are common failure modes
```text
<bbox_extraction_spec>
- Use the specified coordinate format exactly, such as [x1,y1,x2,y2] normalized to 0..1.
- For each box, include page, label, text snippet, and confidence.
- Add a vertical-drift sanity check so boxes stay aligned with the correct line of text.
- If the layout is dense, process page by page and do a second pass for missed items.
</bbox_extraction_spec>
```
### `terminal_tool_hygiene`
Use when:
- the prompt belongs to a terminal-based or coding-agent workflow
- tool misuse or shell misuse has been observed
```text
<terminal_tool_hygiene>
- Only run shell commands through the terminal tool.
- Never try to "run" tool names as shell commands.
- If a patch or edit tool exists, use it directly instead of emulating it in bash.
- After changes, run a lightweight verification step such as ls, tests, or a build before declaring the task done.
</terminal_tool_hygiene>
```
### `user_updates_spec`
Use when:
- the workflow is long-running and user updates matter
```text
<user_updates_spec>
- Only update the user when starting a new major phase or when the plan changes.
- Each update should contain:
- 1 sentence on what changed,
- 1 sentence on the next step.
- Do not narrate routine tool calls.
- Keep the user-facing update short, even when the actual work is exhaustive.
</user_updates_spec>
```
If you are using [Compaction](https://developers.openai.com/api/docs/guides/compaction) in the Responses API, compact after major milestones, treat compacted items as opaque state, and keep prompts functionally identical after compaction.
## Responses `phase` guidance
For long-running Responses workflows, preambles, or tool-heavy agents that replay assistant items, review whether `phase` is already preserved.
- If the host already round-trips `phase`, keep it intact during the upgrade.
- If the host uses `previous_response_id` and does not manually replay assistant items, note that this may reduce manual `phase` handling needs.
- If reliable GPT-5.4 behavior would require adding or preserving `phase` and that would need code edits, treat the case as blocked for prompt-only or model-string-only migration guidance.
## Example upgrade profiles
### GPT-5.2
- Use `gpt-5.4`
- Match the current reasoning effort first
- Preserve the existing latency and quality profile before tuning prompt blocks
- If the repo does not expose the exact setting, emit `same` as the starting recommendation
### GPT-5.3-Codex
- Use `gpt-5.4`
- Match the current reasoning effort first
- If you need Codex-style speed and efficiency, add verification blocks before increasing reasoning effort
- If the repo does not expose the exact setting, emit `same` as the starting recommendation
### GPT-4o or GPT-4.1 assistant
- Use `gpt-5.4`
- Start with `none` reasoning effort
- Add `output_verbosity_spec` only if output becomes too verbose
### Long-horizon agent
- Use `gpt-5.4`
- Start with `medium` reasoning effort
- Add `tool_persistence_rules`
- Add `completeness_contract`
- Add `verification_loop`
### Research workflow
- Use `gpt-5.4`
- Start with `medium` reasoning effort
- Add `research_mode`
- Add `citation_rules`
- Add `empty_result_handling`
- Add `tool_persistence_rules` when the host already uses web or retrieval tools
- Add `parallel_tool_calling` when the retrieval steps are independent
### Support triage or multi-agent workflow
- Use `gpt-5.4`
- Prefer `model string + light prompt rewrite` over `model string only`
- Add at least one of `tool_persistence_rules`, `completeness_contract`, or `verification_loop`
- Add more only if evals show a real regression
### Coding or terminal workflow
- Use `gpt-5.4`
- Keep the model-string change narrow
- Match the current reasoning effort first if you are upgrading from GPT-5.3-Codex
- Add `terminal_tool_hygiene`
- Add `verification_loop`
- Add `dependency_checks` when actions depend on prerequisite lookup or discovery
- Add `tool_persistence_rules` if the agent stops too early
- Review whether `phase` is already preserved for long-running Responses flows or assistant preambles
- Do not classify this as blocked just because the workflow uses tools; block only if the upgrade requires changing tool definitions or wiring
- If the repo already uses Responses plus tools and no required host-side change is shown, prefer `model_string_plus_light_prompt_rewrite` over `blocked`
## Prompt regression checklist
- Check whether the upgraded prompt still preserves the original task intent.
- Check whether the new prompt is leaner, not just longer.
- Check completeness, citation quality, dependency handling, verification behavior, and verbosity.
- For long-running Responses agents, check whether `phase` handling is already in place or needs implementation work.
- Confirm that each added prompt block addresses an observed regression.
- Remove prompt blocks that are not earning their keep.

View File

@@ -6,15 +6,10 @@ This file is a curated helper. Every recommendation here must be verified agains
| Model ID | Use for |
| --- | --- |
| `gpt-5.4` | Default text plus reasoning for most new apps |
| `gpt-5.4` | Default text plus reasoning for most new apps, including for coding use cases |
| `gpt-5.4-pro` | Only when the user explicitly asks for maximum reasoning or quality; substantially slower and more expensive |
| `gpt-5-mini` | Cheaper and faster reasoning with good quality |
| `gpt-5-nano` | High-throughput simple tasks and classification |
| `gpt-5.4` | Explicit no-reasoning text path via `reasoning.effort: none` |
| `gpt-4.1-mini` | Cheaper no-reasoning text |
| `gpt-4.1-nano` | Fastest and cheapest no-reasoning text |
| `gpt-5.3-codex` | Agentic coding, code editing, and tool-heavy coding workflows |
| `gpt-5.1-codex-mini` | Cheaper coding workflows |
| `gpt-5.4-mini` | Cheaper and faster reasoning with good quality, including for coding use cases |
| `gpt-5.4-nano` | High-throughput simple tasks and classification |
| `gpt-image-1.5` | Best image generation and edit quality |
| `gpt-image-1-mini` | Cost-optimized image generation |
| `gpt-4o-mini-tts` | Text-to-speech |

View File

@@ -0,0 +1,599 @@
# Prompt guidance for GPT-5.4
GPT-5.4, our newest mainline model, is designed to balance long-running task performance, stronger control over style and behavior, and more disciplined execution across complex workflows. Building on advances from GPT-5 through GPT-5.3-Codex, GPT-5.4 improves token efficiency, sustains multi-step workflows more reliably, and performs well on long-horizon tasks.
GPT-5.4 is designed for production-grade assistants and agents that need strong multi-step reasoning, evidence-rich synthesis, and reliable performance over long contexts. It is especially effective when prompts clearly specify the output contract, tool-use expectations, and completion criteria. In practice, the biggest gains come from choosing the right reasoning effort for the task, using explicit grounding and citation rules, and giving the model a precise definition of what "done" looks like. This guide focuses on prompt patterns and migration practices that preserve those efficiency wins. For model capabilities, API parameters, and broader migration guidance, see [our latest model guide](https://developers.openai.com/api/docs/guides/latest-model).
When troubleshooting cases where GPT-5.4 treats an intermediate update as the final answer, verify your integration preserves the assistant message `phase` field correctly. See [Phase parameter](#phase-parameter) for details.
## Understand GPT-5.4 behavior
### Where GPT-5.4 is strongest
GPT-5.4 tends to work especially well in these areas:
- Strong personality and tone adherence, with less drift over long answers
- Agentic workflow robustness, with a stronger tendency to stick with multi-step work, retry, and complete agent loops end to end
- Evidence-rich synthesis, especially in long-context or multi-tool workflows
- Instruction adherence in modular, skill-based, and block-structured prompts when the contract is explicit
- Long-context analysis across large, messy, or multi-document inputs
- Batched or parallel tool calling while maintaining tool-call accuracy
- Spreadsheet, finance, and Excel workflows that need instruction following, formatting fidelity, and stronger self-verification
### Where explicit prompting still helps
Even with those strengths, GPT-5.4 benefits from more explicit guidance in a few recurring patterns:
- Low-context tool routing early in a session, when tool selection can be less reliable
- Dependency-aware workflows that need explicit prerequisite and downstream-step checks
- Reasoning effort selection, where higher effort is not always better and the right choice depends on task shape, not intuition
- Research tasks that require disciplined source collection and consistent citations
- Irreversible or high-impact actions that require verification before execution
- Terminal or coding-agent environments where tool boundaries must stay clear
These patterns are observed defaults, not guarantees. Start with the smallest prompt that passes your evals, and add blocks only when they fix a measured failure mode.
## Use core prompt patterns
### Keep outputs compact and structured
To improve token efficiency with GPT-5.4, constrain verbosity and enforce structured output through clear output contracts. In practice, this acts as an additional control layer alongside the `verbosity` parameter in the Responses API, allowing you to guide both how much the model writes and how it structures the output.
```xml
<output_contract>
- Return exactly the sections requested, in the requested order.
- If the prompt defines a preamble, analysis block, or working section, do not treat it as extra output.
- Apply length limits only to the section they are intended for.
- If a format is required (JSON, Markdown, SQL, XML), output only that format.
</output_contract>
<verbosity_controls>
- Prefer concise, information-dense writing.
- Avoid repeating the user's request.
- Keep progress updates brief.
- Do not shorten the answer so aggressively that required evidence, reasoning, or completion checks are omitted.
</verbosity_controls>
```
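These prompt-level clamps pair with the API-level knob mentioned above. A minimal sketch, assuming the current Responses SDK shape and this guide's model ID; `OUTPUT_CONTRACT` is a hypothetical variable holding the blocks above:
```python
from openai import OpenAI

# The <output_contract> and <verbosity_controls> blocks above, stored as a
# plain string (abbreviated here).
OUTPUT_CONTRACT = "<output_contract>...</output_contract>"

client = OpenAI()
response = client.responses.create(
    model="gpt-5.4",  # model ID taken from this guide
    instructions=OUTPUT_CONTRACT,
    input="Summarize the attached incident report in the requested sections.",
    text={"verbosity": "low"},  # API-level counterpart to the prompt clamp
)
print(response.output_text)
```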
### Set clear defaults for follow-through
Users often change the task, format, or tone mid-conversation. To keep the assistant aligned, define clear rules for when to proceed, when to ask, and how newer instructions override earlier defaults.
Use a default follow-through policy like this:
```xml
<default_follow_through_policy>
- If the user's intent is clear and the next step is reversible and low-risk, proceed without asking.
- Ask permission only if the next step:
(a) is irreversible,
(b) has external side effects (for example sending, purchasing, deleting, or writing to production), or
(c) requires missing sensitive information or a choice that would materially change the outcome.
- If proceeding, briefly state what you did and what remains optional.
</default_follow_through_policy>
```
Make instruction priority explicit:
```xml
<instruction_priority>
- User instructions override default style, tone, formatting, and initiative preferences.
- Safety, honesty, privacy, and permission constraints do not yield.
- If a newer user instruction conflicts with an earlier one, follow the newer instruction.
- Preserve earlier instructions that do not conflict.
</instruction_priority>
```
Higher-priority developer or system instructions remain binding.
**Guidance:** When instructions change mid-conversation, make the update explicit, scoped, and local. State what changed, what still applies, and whether the change affects the next turn or the rest of the conversation.
### Handle mid-conversation instruction updates
For mid-conversation updates, use explicit, scoped steering messages that state:
1. Scope
2. Override
3. Carry forward
```text
<task_update>
For the next response only:
- Do not complete the task.
- Only produce a plan.
- Keep it to 5 bullets.
All earlier instructions still apply unless they conflict with this update.
</task_update>
```
If the task itself changes, say so directly:
```text
<task_update>
The task has changed.
Previous task: complete the workflow.
Current task: review the workflow and identify risks only.
Rules for this turn:
- Do not execute actions.
- Do not call destructive tools.
- Return exactly:
1. Main risks
2. Missing information
3. Recommended next step
</task_update>
```
### Make tool use persistent when correctness depends on it
Use explicit rules to keep tool use thorough, dependency-aware, and appropriately paced, especially in workflows where later actions rely on earlier retrieval or verification. A common failure mode is skipping prerequisites because the right end state seems obvious.
GPT-5.4 can be less reliable at tool routing early in a session, when context is still thin. Prompt for prerequisites, dependency checks, and exact tool intent.
```xml
<tool_persistence_rules>
- Use tools whenever they materially improve correctness, completeness, or grounding.
- Do not stop early when another tool call is likely to materially improve correctness or completeness.
- Keep calling tools until:
(1) the task is complete, and
(2) verification passes (see <verification_loop>).
- If a tool returns empty or partial results, retry with a different strategy.
</tool_persistence_rules>
```
This is especially important for workflows where the final action depends on earlier lookup or retrieval steps. One of the most common failure modes is skipping prerequisites because the intended end state seems obvious.
```xml
<dependency_checks>
- Before taking an action, check whether prerequisite discovery, lookup, or memory retrieval steps are required.
- Do not skip prerequisite steps just because the intended final action seems obvious.
- If the task depends on the output of a prior step, resolve that dependency first.
</dependency_checks>
```
Prompt for parallelism when the work is independent and wall-clock matters. Prompt for sequencing when dependencies, ambiguity, or irreversible actions matter more than speed.
```xml
<parallel_tool_calling>
- When multiple retrieval or lookup steps are independent, prefer parallel tool calls to reduce wall-clock time.
- Do not parallelize steps that have prerequisite dependencies or where one result determines the next action.
- After parallel retrieval, pause to synthesize the results before making more calls.
- Prefer selective parallelism: parallelize independent evidence gathering, not speculative or redundant tool use.
</parallel_tool_calling>
```
### Force completeness on long-horizon tasks
For multi-step workflows, a common failure mode is incomplete execution: the model finishes after partial coverage, misses items in a batch, or treats empty or narrow retrieval as final. GPT-5.4 becomes more reliable when the prompt defines explicit completion rules and recovery behavior.
Coverage can be achieved through sequential or parallel retrieval, but completion rules should remain explicit either way.
```xml
<completeness_contract>
- Treat the task as incomplete until all requested items are covered or explicitly marked [blocked].
- Keep an internal checklist of required deliverables.
- For lists, batches, or paginated results:
- determine expected scope when possible,
- track processed items or pages,
- confirm coverage before finalizing.
- If any item is blocked by missing data, mark it [blocked] and state exactly what is missing.
</completeness_contract>
```
For workflows where empty, partial, or noisy retrieval is common:
```xml
<empty_result_recovery>
If a lookup returns empty, partial, or suspiciously narrow results:
- Do not immediately conclude that no results exist.
- Try at least one or two fallback strategies, such as:
  - alternate query wording,
  - broader filters,
  - a prerequisite lookup,
  - or an alternate source or tool.
- Only then report that no results were found, along with what you tried.
</empty_result_recovery>
```
### Add a verification loop before high-impact actions
Once the workflow appears complete, add a lightweight verification step before returning the answer or taking an irreversible action. This helps catch requirement misses, grounding issues, and format drift before commit.
```xml
<verification_loop>
Before finalizing:
- Check correctness: does the output satisfy every requirement?
- Check grounding: are factual claims backed by the provided context or tool outputs?
- Check formatting: does the output match the requested schema or style?
- Check safety and irreversibility: if the next step has external side effects, ask permission first.
</verification_loop>
```
```xml
<missing_context_gating>
- If required context is missing, do NOT guess.
- Prefer the appropriate lookup tool when the missing context is retrievable; ask a minimal clarifying question only when it is not.
- If you must proceed, label assumptions explicitly and choose a reversible action.
</missing_context_gating>
```
For agents that actively take actions, add a short execution frame:
```xml
<action_safety>
- Pre-flight: summarize the intended action and parameters in 1-2 lines.
- Execute via tool.
- Post-flight: confirm the outcome and any validation that was performed.
</action_safety>
```
## Handle specialized workflows
### Choose image detail explicitly for vision and computer use
If your workflow depends on visual precision, specify the image `detail` level in the prompt or integration instead of relying on `auto`. Use `high` for standard high-fidelity image understanding. Use `original` for large, dense, or spatially sensitive images, especially [computer use, localization, OCR, and click-accuracy tasks](https://developers.openai.com/api/docs/guides/tools-computer-use) on `gpt-5.4` and future models. Use `low` only when speed and cost matter more than fine detail. For more details on image detail levels, see the [Images and Vision guide](https://developers.openai.com/api/docs/guides/images-vision).
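A sketch of pinning `detail` on an image input, following the Responses API image-input shape; the model ID comes from this guide, and the URL and question are placeholders:
```python
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-5.4",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Which row contains the failing test?"},
            {
                "type": "input_image",
                "image_url": "https://example.com/ci-screenshot.png",  # placeholder
                "detail": "high",  # or "original" for dense, spatially sensitive images
            },
        ],
    }],
)
print(response.output_text)
```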
### Lock research and citations to retrieved evidence
When citation quality matters, make both the source boundary and the format requirement explicit. This helps reduce fabricated references, unsupported claims, and citation-format drift.
```xml
<citation_rules>
- Only cite sources retrieved in the current workflow.
- Never fabricate citations, URLs, IDs, or quote spans.
- Use exactly the citation format required by the host application.
- Attach citations to the specific claims they support, not only at the end.
</citation_rules>
```
```xml
<grounding_rules>
- Base claims only on provided context or tool outputs.
- If sources conflict, state the conflict explicitly and attribute each side.
- If the context is insufficient or irrelevant, narrow the answer or say you cannot support the claim.
- If a statement is an inference rather than a directly supported fact, label it as an inference.
</grounding_rules>
```
If your application requires inline citations, require inline citations. If it requires footnotes, require footnotes. The key is to lock the format and prevent the model from improvising unsupported references.
### Research mode
Push GPT-5.4 into a disciplined research mode. Use this pattern for research, review, and synthesis tasks. Do not force it onto short execution tasks or simple deterministic transforms.
```xml
<research_mode>
- Do research in 3 passes:
1) Plan: list 3-6 sub-questions to answer.
2) Retrieve: search each sub-question and follow 1-2 second-order leads.
3) Synthesize: resolve contradictions and write the final answer with citations.
- Stop only when more searching is unlikely to change the conclusion.
</research_mode>
```
If your host environment uses a specific research tool or requires a submit step, combine this with the host's finalization contract.
### Clamp strict output formats
For SQL, JSON, or other parse-sensitive outputs, tell GPT-5.4 to emit only the target format and check it before finishing.
```text
<structured_output_contract>
- Output only the requested format.
- Do not add prose or markdown fences unless they were requested.
- Validate that parentheses and brackets are balanced.
- Do not invent tables or fields.
- If required schema information is missing, ask for it or return an explicit error object.
</structured_output_contract>
```
If you are extracting document regions or OCR boxes, define the coordinate system and add a drift check:
```text
<bbox_extraction_spec>
- Use the specified coordinate format exactly, such as [x1,y1,x2,y2] normalized to 0..1.
- For each box, include page, label, text snippet, and confidence.
- Add a vertical-drift sanity check so boxes stay aligned with the correct line of text.
- If the layout is dense, process page by page and do a second pass for missed items.
</bbox_extraction_spec>
```
### Keep tool boundaries explicit in coding and terminal agents
In coding agents, GPT-5.4 works better when the rules for shell access and file editing are unambiguous. This is especially important when you expose tools like [Shell](https://developers.openai.com/api/docs/guides/tools-shell) or [Apply patch](https://developers.openai.com/api/docs/guides/tools-apply-patch).
### User updates
GPT-5.4 does well with brief, outcome-based updates. Reuse the user-updates pattern from the 5.2 guide, but pair it with explicit completion and verification requirements.
Recommended update spec:
```xml
<user_updates_spec>
- Only update the user when starting a new major phase or when something changes the plan.
- Each update: 1 sentence on outcome + 1 sentence on next step.
- Do not narrate routine tool calls.
- Keep the user-facing status short; keep the work exhaustive.
</user_updates_spec>
```
For coding agents, see the Prompting patterns for coding tasks section below for more specific guidance.
### Prompting patterns for coding tasks
**Autonomy and persistence**
GPT-5.4 is generally more thorough end to end than earlier mainline models on coding and tool-use tasks, so you often need less explicit "verify everything" prompting. Still, for high-stakes changes such as production, migrations, or security work, keep a lightweight verification clause.
```xml
<autonomy_and_persistence>
Persist until the task is fully handled end-to-end within the current turn whenever feasible: do not stop at analysis or partial fixes; carry changes through implementation, verification, and a clear explanation of outcomes unless the user explicitly pauses or redirects you.
Unless the user explicitly asks for a plan, asks a question about the code, is brainstorming potential solutions, or otherwise makes it clear that code should not be written, assume the user wants you to make code changes or run tools to solve the user's problem. In these cases, do not just describe your proposed solution in a message; go ahead and actually implement the change. If you encounter challenges or blockers, attempt to resolve them yourself.
</autonomy_and_persistence>
```
**Intermediary updates**
Keep updates sparse and high-signal. In coding tasks, prefer updates at key points.
```xml
<user_updates_spec>
- Intermediary updates go to the `commentary` channel.
- User updates are short updates while you are working. They are not final answers.
- Use 1-2 sentence updates to communicate progress and new information while you work.
- Do not begin responses with conversational interjections or meta commentary. Avoid openers such as acknowledgements ("Done -", "Got it", or "Great question") or similar framing.
- Before exploring or doing substantial work, send a user update explaining your understanding of the request and your first step. Avoid commenting on the request or starting with phrases such as "Got it" or "Understood."
- Provide updates roughly every 30 seconds while working.
- When exploring, explain what context you are gathering and what you learned. Vary sentence structure so the updates do not become repetitive.
- When working for a while, keep updates informative and varied, but stay concise.
- When work is substantial, provide a longer plan after you have enough context. This is the only update that may be longer than 2 sentences and may contain formatting.
- Before file edits, explain what you are about to change.
- While thinking, keep the user informed of progress without narrating every tool call. Even if you are not taking actions, send frequent progress updates rather than going silent, especially if you are thinking for more than a short stretch.
- Keep the tone of progress updates consistent with the assistant's overall personality.
</user_updates_spec>
```
**Formatting**
GPT-5.4 often defaults to more structured formatting and may overuse bullet lists. If you want a clean final response, explicitly clamp list shape.
```xml
Never use nested bullets. Keep lists flat (single level). If you need hierarchy, split into separate lists or sections, or end the parent line with a colon and put what would have been the nested bullet on the line immediately after it. For numbered lists, only use the `1. 2. 3.` style markers (with a period), never `1)`.
```
**Frontend tasks**
Use this only when additional frontend guidance is useful.
```xml
<frontend_tasks>
When doing frontend design tasks, avoid generic, overbuilt layouts.
Use these hard rules:
- One composition: The first viewport must read as one composition, not a dashboard, unless it is a dashboard.
- Brand first: On branded pages, the brand or product name must be a hero-level signal, not just nav text or an eyebrow. No headline should overpower the brand.
- Brand test: If the first viewport could belong to another brand after removing the nav, the branding is too weak.
- Full-bleed hero only: On landing pages and promotional surfaces, the hero image should usually be a dominant edge-to-edge visual plane or background. Do not default to inset hero images, side-panel hero images, rounded media cards, tiled collages, or floating image blocks unless the existing design system clearly requires them.
- Hero budget: The first viewport should usually contain only the brand, one headline, one short supporting sentence, one CTA group, and one dominant image. Do not place stats, schedules, event listings, address blocks, promos, "this week" callouts, metadata rows, or secondary marketing content there.
- No hero overlays: Do not place detached labels, floating badges, promo stickers, info chips, or callout boxes on top of hero media.
- Cards: Default to no cards. Never use cards in the hero unless they are the container for a user interaction. If removing a border, shadow, background, or radius does not hurt interaction or understanding, it should not be a card.
- One job per section: Each section should have one purpose, one headline, and usually one short supporting sentence.
- Real visual anchor: Imagery should show the product, place, atmosphere, or context.
- Reduce clutter: Avoid pill clusters, stat strips, icon rows, boxed promos, schedule snippets, and competing text blocks.
- Use motion to create presence and hierarchy, not noise. Ship 2-3 intentional motions for visually led work, and prefer Framer Motion when it is available.
Exception: If working within an existing website or design system, preserve the established patterns, structure, and visual language.
</frontend_tasks>
```
**Terminal tool hygiene**
Use this when the agent has a terminal tool and tends to run tool names as shell commands or skip verification.
```xml
<terminal_tool_hygiene>
- Only run shell commands via the terminal tool.
- Never "run" tool names as shell commands.
- If a patch or edit tool exists, use it directly; do not attempt it in bash.
- After changes, run a lightweight verification step such as ls, tests, or a build before declaring the task done.
</terminal_tool_hygiene>
```
### Document localization and OCR boxes
For bbox tasks, be explicit about coordinate conventions and add drift tests.
```xml
<bbox_extraction_spec>
- Use the specified coordinate format exactly (for example [x1,y1,x2,y2] normalized 0..1).
- For each bbox, include: page, label, text snippet, confidence.
- Add a vertical-drift sanity check:
- ensure bboxes align with the line of text (not shifted up or down).
- If dense layout, process page by page and do a second pass for missed items.
</bbox_extraction_spec>
```
### Use runtime and API integration notes
For long-running or tool-heavy agents, the runtime contract matters as much as the prompt contract.
#### Phase parameter
For GPT-5.4, `gpt-5.3-codex`, and later Responses models, the `phase` field can help in the small number of long-running or tool-heavy flows where preambles or other intermediate assistant updates are mistaken for the final answer.
- `phase` is optional at the API level, but it is highly recommended. Best-effort inference may exist server-side, but explicit round-tripping of `phase` is strictly better.
- Use `phase` for long-running or tool-heavy agents that may emit commentary before tool calls or before a final answer.
- Preserve `phase` when replaying prior assistant items so the model can distinguish working commentary from the completed answer. This matters most in multi-step flows with preambles, tool-related updates, or multiple assistant messages in the same turn. A minimal sketch follows this list.
- Do not add `phase` to user messages.
- If you use `previous_response_id`, that is usually the simplest path, since OpenAI can often recover prior state without manually replaying assistant items.
- If you replay assistant history yourself, preserve the original `phase` values.
- Missing or dropped `phase` can cause preambles to be interpreted as final answers and degrade behavior on those multi-step tasks.
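A minimal sketch of a replayed turn with `phase` preserved, using raw `curl` against the Responses API. The item shapes and the specific `phase` values below are illustrative assumptions, not the exact wire format; check the API reference for your model before relying on them.
```bash
# Hypothetical replay: each assistant item keeps the `phase` value it was
# returned with, so working commentary is not mistaken for the final answer.
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": [
      {"role": "user", "content": "Find and fix the failing test."},
      {"role": "assistant", "phase": "commentary",
       "content": "Scanning the suite to locate the failure."},
      {"role": "assistant", "phase": "final",
       "content": "Fixed: the fixture path was stale."},
      {"role": "user", "content": "Now clean up the lint errors."}
    ]
  }'
```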
### Preserve behavior in long sessions
Compaction unlocks significantly longer effective context windows: user conversations can persist for many turns without hitting context limits or long-context performance degradation, and agents can run very long trajectories that exceed a typical context window on long-running, complex tasks.
If you are using [Compaction](https://developers.openai.com/api/docs/guides/compaction) in the Responses API, compact after major milestones, treat compacted items as opaque state, and keep prompts functionally identical after compaction. The endpoint is ZDR compatible and returns an `encrypted_content` item that you can pass into future requests. GPT-5.4 tends to remain more coherent and reliable over longer, multi-turn conversations with fewer breakdowns as sessions grow.
For more guidance, see the [`/responses/compact` API reference](https://developers.openai.com/api/docs/api-reference/responses/compact).
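As a rough bash sketch of that loop, assuming a hand-rolled `curl` client; the request fields for `/responses/compact` and the exact location of the `encrypted_content` item in the response are assumptions here, so verify them against the API reference:
```bash
# Compact after a major milestone (sketch only; field names are assumed).
curl -s https://api.openai.com/v1/responses/compact \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"gpt-5.4\", \"previous_response_id\": \"$PREV_RESPONSE_ID\"}" \
  > compacted.json
# Next request: include the returned encrypted_content item at the front of
# `input` as opaque state, then append new turns. Keep the prompt otherwise
# functionally identical to the pre-compaction prompt.
```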
### Control personality for customer-facing workflows
GPT-5.4 can be steered more effectively when you separate persistent personality from per-response writing controls. This is especially useful for customer-facing workflows such as emails, support replies, announcements, and blog-style content.
- **Personality (persistent):** sets the default tone, verbosity, and decision style across the session.
- **Writing controls (per response):** define the channel, register, formatting, and length for a specific artifact.
- **Reminder:** personality should not override task-specific output requirements. If the user asks for JSON, return JSON.
For natural, high-quality prose, the highest-leverage controls are:
- Give the model a clear persona.
- Specify the channel and emotional register.
- Explicitly ban formatting when you want prose.
- Use hard length limits.
```xml
<personality_and_writing_controls>
- Persona: <one sentence>
- Channel: <Slack | email | memo | PRD | blog>
- Emotional register: <direct/calm/energized/etc.> + "not <overdo this>"
- Formatting: <ban bullets/headers/markdown if you want prose>
- Length: <hard limit, e.g. <=150 words or 3-5 sentences>
- Default follow-through: if the request is clear and low-risk, proceed without asking permission.
</personality_and_writing_controls>
```
For more personality patterns you can lift directly, see the [Prompt Personalities cookbook](https://developers.openai.com/cookbook/examples/gpt-5/prompt_personalities).
**Professional memo mode**
For memos, reviews, and other professional writing tasks, general writing instructions are often not enough. These workflows benefit from explicit guidance on specificity, domain conventions, synthesis, and calibrated certainty.
```xml
<memo_mode>
- Write in a polished, professional memo style.
- Use exact names, dates, entities, and authorities when supported by the record.
- Follow domain-specific structure if one is requested.
- Prefer precise conclusions over generic hedging.
- When uncertainty is real, tie it to the exact missing fact or conflicting source.
- Synthesize across documents rather than summarizing each one independently.
</memo_mode>
```
This mode is especially useful for legal, policy, research, and executive-facing writing, where the goal is not just fluency, but disciplined synthesis and clear conclusions.
## Tune reasoning and migration
### Treat reasoning effort as a last-mile knob
Reasoning effort is not one-size-fits-all. Treat it as a last-mile tuning knob, not the primary way to improve quality. In many cases, stronger prompts, clear output contracts, and lightweight verification loops recover much of the performance teams might otherwise seek through higher reasoning settings.
Recommended defaults:
- `none`: Best for fast, cost-sensitive, latency-sensitive tasks where the model does not need to think.
- `low`: Works well for latency-sensitive tasks where a small amount of thinking can produce a meaningful accuracy gain, especially with complex instructions.
- `medium` or `high`: Reserve for tasks that truly require stronger reasoning and can absorb the latency and cost tradeoff. Choose between them based on how much performance gain your task gets from additional reasoning.
- `xhigh`: Avoid as a default unless your evals show clear benefits. It is best suited for long, agentic, reasoning-heavy tasks where maximum intelligence matters more than speed or cost.
In practice, most teams should default to the `none`, `low`, or `medium` range.
Start with `none` for execution-heavy workloads such as workflow steps, field extraction, support triage, and short structured transforms.
Start with `medium` or higher for research-heavy workloads such as long-context synthesis, multi-document review, conflict resolution, and strategy writing. With `medium` and a well-engineered prompt, you can squeeze out a lot of performance.
For GPT-5.4 workloads, `none` can already perform well on action-selection and tool-discipline tasks. If your workload depends on nuanced interpretation, such as implicit requirements, ambiguity, or cancelled-tool-call recovery, start with `low` or `medium` instead.
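Reasoning effort is pinned per request with the `reasoning.effort` field; a minimal Responses API call looks like this (the task text is illustrative):
```bash
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "reasoning": {"effort": "none"},
    "input": "Extract the invoice number and total from the text that follows. ..."
  }'
```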
Before increasing reasoning effort, first add:
- `<completeness_contract>`
- `<verification_loop>`
- `<tool_persistence_rules>`
If the model still feels too literal or stops at the first plausible answer, add an initiative nudge before raising reasoning effort:
```xml
<dig_deeper_nudge>
- Don't stop at the first plausible answer.
- Look for second-order issues, edge cases, and missing constraints.
- If the task is safety or accuracy critical, perform at least one verification step.
</dig_deeper_nudge>
```
### Migrate prompts to GPT-5.4 one change at a time
Use the same one-change-at-a-time discipline as the 5.2 guide: switch model first, pin `reasoning_effort`, run evals, then iterate.
These starting points work well for many migrations; a request-level sketch of the first switch follows the table:
| Current setup | Suggested GPT-5.4 start | Notes |
| ------------------------- | ---------------------------------- | ------------------------------------------------------------------- |
| `gpt-5.2` | Match the current reasoning effort | Preserve the existing latency and quality profile first, then tune. |
| `gpt-5.3-codex` | Match the current reasoning effort | For coding workflows, keep the reasoning effort the same. |
| `gpt-4.1` or `gpt-4o` | `none` | Keep snappy behavior, and increase only if evals regress. |
| Research-heavy assistants | `medium` or `high` | Use explicit research multi-pass and citation gating. |
| Long-horizon agents | `medium` or `high` | Add tool persistence and completeness accounting. |
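As that sketch, only the model string changes on the first pass; the prompt and the pinned effort stay fixed until evals pass (request shape abbreviated):
```bash
# Before: current model with pinned reasoning effort.
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.2", "reasoning": {"effort": "medium"}, "input": "..."}'

# After: same prompt, same pinned effort, new model string. Run evals before
# changing anything else.
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.4", "reasoning": {"effort": "medium"}, "input": "..."}'
```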
### Small-model guidance for `gpt-5.4-mini` and `gpt-5.4-nano`
`gpt-5.4-mini` and `gpt-5.4-nano` are highly steerable, but they are less likely than larger models to infer missing steps, resolve ambiguity implicitly, or package outputs the way you intended unless you specify that behavior directly. In practice, prompts for smaller models are often a bit longer and more explicit.
**How `gpt-5.4-mini` differs**
- `gpt-5.4-mini` is more literal and makes fewer assumptions.
- It is strong when the task is clearly structured, but weaker on implicit workflows and ambiguity handling.
- By default, it may try to keep the conversation going with a follow-up question unless you suppress that behavior explicitly.
**Prompting `gpt-5.4-mini`**
- Put critical rules first.
- Specify the full execution order when tool use or side effects matter.
- Do not rely on "you MUST" alone. Use structural scaffolding such as numbered steps, decision rules, and explicit action definitions.
- Separate "do the action" from "report the action."
- Show the correct flow, not just the final format.
- Define ambiguity behavior explicitly: when to ask, abstain, or proceed.
- Specify packaging directly: answer length, whether to ask a follow-up question, citation style, and section order.
- Be careful with `output nothing else`. Prefer scoped instructions such as `after the final JSON, output nothing further`.
**Prompting `gpt-5.4-nano`**
- Use `gpt-5.4-nano` only for narrow, well-bounded tasks.
- Prefer closed outputs: labels, enums, short JSON, or fixed templates.
- Avoid multi-step orchestration unless the flow is extremely constrained.
- Route ambiguous or planning-heavy tasks to a stronger model instead of over-prompting `gpt-5.4-nano`.
**Good default pattern** (a worked sketch follows the lists below)
1. Task
2. Critical rule
3. Exact step order
4. Edge cases or clarification behavior
5. Output format
6. One correct example
**Avoid**
- Implied next steps
- Unspecified edge cases
- Schema-only prompts for tool workflows
- Generic instructions without structure
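Putting the default pattern together, a minimal `gpt-5.4-mini` request might look like the sketch below; the task, labels, and output schema are invented for illustration.
```bash
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "model": "gpt-5.4-mini",
  "reasoning": {"effort": "none"},
  "input": "Task: classify the support ticket below.\nCritical rule: output exactly one label from {billing, bug, feature_request, other}.\nSteps: 1. Read the ticket. 2. Pick the single best label. 3. Output {\"label\": \"<label>\"}.\nIf no label fits, use other; do not ask a follow-up question.\nAfter the final JSON, output nothing further.\nExample: \"I was charged twice\" -> {\"label\": \"billing\"}\nTicket: <ticket text>"
}
JSON
```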
### Web search and deep research
If you are migrating a research agent in particular, make these prompt updates before increasing reasoning effort:
- Add `<research_mode>`
- Add `<citation_rules>`
- Add `<empty_result_recovery>`
- Increase `reasoning_effort` one notch only after prompt fixes.
You can start from the 5.2 research block and then layer in citation gating and finalization contracts as needed.
GPT-5.4 performs especially well when the task requires multi-step evidence gathering, long-context synthesis, and explicit prompt contracts. In practice, the highest-leverage prompt changes are choosing reasoning effort by task shape, defining exact output and citation formats, adding dependency-aware tool rules, and making completion criteria explicit. The model is often strong out of the box, but it is most reliable when prompts clearly specify how to search, how to verify, and what counts as done.
## Next steps
- Read [our latest model guide](https://developers.openai.com/api/docs/guides/latest-model) for model capabilities, parameters, and API compatibility details.
- Read [Prompt engineering](https://developers.openai.com/api/docs/guides/prompt-engineering) for broader prompting strategies that apply across model families.
- Read [Compaction](https://developers.openai.com/api/docs/guides/compaction) if you are building long-running GPT-5.4 sessions in the Responses API.

View File

@@ -2,6 +2,14 @@
Use this guide when the user explicitly asks to upgrade an existing integration to GPT-5.4. Pair it with current OpenAI docs lookups. The default target string is `gpt-5.4`.
## Freshness check
Before applying this bundled guide, run `node scripts/resolve-latest-model-info.js` from the OpenAI Docs skill directory; a sample invocation and output shape follow the list below.
- If the command returns `modelSlug: "gpt-5p4"`, continue with this bundled guide and use `references/prompting-guide.md` when prompt updates are needed.
- If the command returns a different `modelSlug`, fetch both the returned `migrationGuideUrl` and `promptingGuideUrl` and use them as the current source of truth instead of the bundled references.
- If the command fails, the metadata is missing, or either remote guide cannot be fetched, continue with the bundled fallback references and say the remote freshness check was unavailable.
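For reference, a successful run prints a small JSON object shaped like this (values illustrative):
```bash
node scripts/resolve-latest-model-info.js
# {
#   "model": "gpt-5.4",
#   "modelSlug": "gpt-5p4",
#   "migrationGuideUrl": "https://developers.openai.com/...",
#   "promptingGuideUrl": "https://developers.openai.com/..."
# }
```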
## Upgrade posture
Upgrade with the narrowest safe change set:
@@ -80,9 +88,9 @@ Default action:
- replace the model string with `gpt-5.4`
- add one or two targeted prompt blocks
- read `references/gpt-5p4-prompting-guide.md` to choose the smallest prompt changes that recover the old behavior
- read `references/prompting-guide.md` to choose the smallest prompt changes that preserve the intended behavior and take advantage of relevant model-specific guidance
- avoid broad prompt cleanup unrelated to the upgrade
- for research workflows, default to `research_mode` + `citation_rules` + `empty_result_handling`; add `tool_persistence_rules` when the host already uses retrieval tools
- for research workflows, default to `research_mode` + `citation_rules` + `empty_result_recovery`; add `tool_persistence_rules` when the host already uses retrieval tools
- for dependency-aware or tool-heavy workflows, default to `tool_persistence_rules` + `dependency_checks` + `verification_loop`; add `parallel_tool_calling` only when retrieval steps are truly independent
- for coding or terminal workflows, default to `terminal_tool_hygiene` + `verification_loop`
- for multi-agent support or triage workflows, default to at least one of `tool_persistence_rules`, `completeness_contract`, or `verification_loop`

View File

@@ -0,0 +1,147 @@
#!/usr/bin/env node
const fs = require("node:fs/promises");
const path = require("node:path");

const DEFAULT_URL =
  "https://developers.openai.com/api/docs/guides/latest-model.md";
const DEFAULT_BASE_URL = "https://developers.openai.com";

// Parse --source/--url and --base-url flags, falling back to env vars.
function parseArgs(argv) {
  const args = {
    source: process.env.LATEST_MODEL_URL || DEFAULT_URL,
    baseUrl: process.env.LATEST_MODEL_BASE_URL || DEFAULT_BASE_URL,
  };
  for (let i = 2; i < argv.length; i += 1) {
    const arg = argv[i];
    if (arg === "--source" || arg === "--url") {
      args.source = argv[i + 1];
      i += 1;
    } else if (arg === "--base-url") {
      args.baseUrl = argv[i + 1];
      i += 1;
    }
  }
  return args;
}

// Read markdown from a file:// URL, a plain filesystem path, or http(s).
async function readSource(source) {
  if (source.startsWith("file://")) {
    return fs.readFile(new URL(source), "utf8");
  }
  if (!/^https?:\/\//.test(source)) {
    return fs.readFile(path.resolve(source), "utf8");
  }
  const response = await fetch(source, {
    headers: { accept: "text/markdown,text/plain,*/*" },
  });
  if (!response.ok) {
    throw new Error(`failed to fetch ${source}: ${response.status}`);
  }
  return response.text();
}

// Collect two-space-indented `key: value` lines after a `latestModelInfo:`
// header, skipping blanks and stopping at the first other non-matching line.
function parseIndentedInfo(lines, startIndex) {
  const info = {};
  for (let i = startIndex + 1; i < lines.length; i += 1) {
    const line = lines[i];
    if (!line.trim()) {
      continue;
    }
    const match = line.match(/^ {2}([A-Za-z][A-Za-z0-9_-]*):\s*(.+?)\s*$/);
    if (!match) {
      break;
    }
    info[match[1]] = match[2].replace(/^["']|["']$/g, "");
  }
  return info;
}

// Collect unindented `key: value` lines from an HTML-comment block,
// stripping surrounding quotes from values.
function parseFlatInfo(block) {
  const info = {};
  for (const line of block.split(/\r?\n/)) {
    const match = line.match(/^([A-Za-z][A-Za-z0-9_-]*):\s*(.+?)\s*$/);
    if (match) {
      info[match[1]] = match[2].replace(/^["']|["']$/g, "");
    }
  }
  return info;
}

// Find latestModelInfo either as a YAML-style block or inside an HTML comment.
function extractLatestModelInfo(markdown) {
  const lines = markdown.split(/\r?\n/);
  const latestModelInfoIndex = lines.findIndex((line) =>
    /^latestModelInfo:\s*$/.test(line)
  );
  if (latestModelInfoIndex >= 0) {
    return parseIndentedInfo(lines, latestModelInfoIndex);
  }
  const commentMatch = markdown.match(
    /<!--\s*latestModelInfo\s*\n([\s\S]*?)\n\s*-->/m
  );
  if (commentMatch) {
    return parseFlatInfo(commentMatch[1]);
  }
  return undefined;
}

// "gpt-5.4" -> "gpt-5p4", matching the skill directory naming convention.
function modelToSkillSlug(model) {
  return model.trim().replace(/\./g, "p");
}

function absoluteUrl(baseUrl, value) {
  return new URL(value, baseUrl).toString();
}

// Validate required fields and resolve guide URLs against the base URL.
function normalizeInfo(info, baseUrl) {
  const model = info?.model?.trim();
  const migrationGuide = info?.migrationGuide?.trim();
  const promptingGuide = info?.promptingGuide?.trim();
  if (!model || !migrationGuide || !promptingGuide) {
    throw new Error(
      "latestModelInfo must include model, migrationGuide, and promptingGuide"
    );
  }
  return {
    model,
    modelSlug: modelToSkillSlug(model),
    migrationGuideUrl: absoluteUrl(baseUrl, migrationGuide),
    promptingGuideUrl: absoluteUrl(baseUrl, promptingGuide),
  };
}

async function main() {
  const { source, baseUrl } = parseArgs(process.argv);
  const markdown = await readSource(source);
  const info = extractLatestModelInfo(markdown);
  if (!info) {
    throw new Error(`latestModelInfo block not found in ${source}`);
  }
  process.stdout.write(
    `${JSON.stringify(normalizeInfo(info, baseUrl), null, 2)}\n`
  );
}

main().catch((error) => {
  console.error(error.message);
  process.exit(1);
});

View File

@@ -137,6 +137,18 @@ If `ncdu` is missing, use:
nix run nixpkgs#ncdu -- -x "$HOME"
```
For reusable, mount-safe snapshots on this machine, prefer the local wrapper:
```bash
safe_ncdu /
sudo -n env HOME=/home/imalison safe_ncdu /
safe_ncdu /nix/store
safe_ncdu top ~/.cache/ncdu/latest-root.json.zst 30 /home/imalison
safe_ncdu open ~/.cache/ncdu/latest-root.json.zst
```
`safe_ncdu` writes compressed ncdu exports under `~/.cache/ncdu`, records the exclude list beside the export, excludes mounted descendants of the scan root, and supports follow-up `top` queries without rescanning.
For quick, non-blocking triage on very large trees, prefer bounded probes:
```bash
@@ -147,9 +159,16 @@ timeout 30s du -xh --max-depth=1 "$HOME/.local/share" 2>/dev/null | sort -h
Machine-specific heavy hitters seen in practice:
- `~/.cache/uv` can exceed 20G and is reclaimable with `uv cache clean`.
- `~/.cache/pypoetry` can exceed 7G across artifacts, repository cache, and virtualenvs; inspect first, then use Poetry cache commands or targeted virtualenv removal.
- `~/.cache/google-chrome` can exceed 8G across multiple Chrome profiles; close Chrome before clearing profile cache directories.
- `~/.cache/spotify` can exceed 10G; treat as optional app-cache cleanup.
- `~/.gradle` can exceed 8G, mostly under `caches/`; prefer Gradle-aware cleanup and expect dependency redownloads.
- `~/.local/share/picom/debug.log` can grow past 15G when verbose picom debugging is enabled or crashes leave a stale log behind; if `picom` is not running, deleting or truncating the log is a high-yield low-risk win.
- `~/.local/share/Trash` can exceed several GB; empty only with user approval.
- `/var/lib/private/gitea-runner` can exceed 50G and is not visible to an unprivileged `ncdu /` scan; use `sudo -n env HOME=/home/imalison safe_ncdu /` when `/var` looks undercounted.
- Validated cleanup pattern: stop `gitea-runner-nix.service`, remove cache/work directories under `/var/lib/private/gitea-runner` (`.cache`, `.gradle`, `action-cache-dir`, `workspace`, stale nested `gitea-runner`, and nested `nix/.cache`/`nix/.local`), recreate `action-cache-dir`, `workspace`, and `.cache` owned by `gitea-runner:gitea-runner`, then restart the service.
- Preserve registration/config-like files such as `/var/lib/private/gitea-runner/nix/.runner`, `/var/lib/private/gitea-runner/nix/.labels`, `/var/lib/private/gitea-runner/.docker/config.json`, and SSH/Kube material.
- `~/Projects/*/target` directories can dominate home usage. Recent example candidates included stale `target/` directories under `scrobble-scrubber`, `http-client-vcr`, `http-client`, `subtr-actor`, `http-types`, `subtr-actor-py`, `sdk`, and `async-h1`.
## Step 5: `/nix/store` Deep Dive
@@ -183,6 +202,25 @@ Common retention pattern on this machine:
- Many `.direnv/flake-profile-*` symlinks under `~/Projects` and worktrees keep `nix-shell-env`/`ghc-shell-*` roots alive.
- Old taffybar constellation repos under `~/Projects` can pin large Haskell closures through `.direnv` and `result` symlinks. Deleting `gtk-sni-tray`, `status-notifier-item`, `dbus-menu`, `dbus-hslogger`, and `gtk-strut` and then rerunning `nix-collect-garbage -d` reclaimed about 11G of store data in one validated run.
- `find_store_path_gc_roots` is especially useful for proving GHC retention: many large `ghc-9.10.3-with-packages` paths are unique per project, while the base `ghc-9.10.3` and docs paths are shared.
- NixOS system generations and a repo-root `nixos/result` symlink can pin multiple Android Studio and Android SDK versions. Check `/nix/var/nix/profiles/system-*-link`, `/run/current-system`, `/run/booted-system`, and `~/dotfiles/nixos/result` before assuming Android paths are pinned by project shells.
- `~/Projects/railbird-mobile/.direnv/flake-profile-*` can pin large Android SDK system images. Removing stale direnv profiles there is a more targeted first step than deleting Android store paths directly.
- For a repeatable `/nix/store` `ncdu` snapshot without driving the TUI, export and inspect it:
```bash
ncdu -0 -x -c -o /tmp/nix-store.ncdu.json.zst /nix/store
zstdcat /tmp/nix-store.ncdu.json.zst | jq 'def sumd: if type=="array" then ((.[0].dsize // 0) + ([.[1:][] | sumd] | add // 0)) elif type=="object" then (.dsize // 0) else 0 end; .[3] | sumd'
```
- `nix-store --gc --print-dead` plus the Nix SQLite database is a fast way to estimate immediate GC wins before deleting anything:
```bash
nix-store --gc --print-dead > /tmp/nix-dead-paths.txt
printf '%s\n' '.mode list' '.separator |' 'create temp table dead(path text);' \
'.import /tmp/nix-dead-paths.txt dead' \
'select count(*), sum(narSize) from ValidPaths join dead using(path);' \
| nix shell nixpkgs#sqlite --command sqlite3 /nix/var/nix/db/db.sqlite
```
- Quantify before acting:
```bash

View File

@@ -33,6 +33,9 @@ digraph unsubscribe_check {
- Do not ask a kickoff question like "should I start now?".
- Default scan window is `newer_than:7d` unless the user already specified a different range.
- Only ask a follow-up question before starting if required information is missing and execution would otherwise be blocked.
- Default user preference: they generally do not want subscription-style email in their inbox.
- For obvious marketing/newsletter/digest mail with a working unsubscribe path, unsubscribe by default without asking for confirmation first.
- Still ask first for borderline cases such as creator subscriptions, professional communities, event platforms, or anything that appears transactional/security-sensitive.
## How to Scan
@@ -42,6 +45,8 @@ digraph unsubscribe_check {
- **Clearly unsubscribeable**: marketing, promos, digests user never engages with
- **Ask user**: newsletters, community content, event platforms (might be wanted)
When the user's standing preference is to keep subscriptions out of the inbox, treat the **Clearly unsubscribeable** bucket as auto-actionable.
## Unsubscribe Execution
For each confirmed sender, do ALL of these:
@@ -95,6 +100,7 @@ gws gmail users messages batchModify \
- Community digests the user doesn't engage with
- Financial marketing (not transactional alerts)
- "Your weekly/daily/monthly" summaries
- Messages with explicit unsubscribe/manage-preferences links whose primary purpose is promotional or newsletter delivery
## Signals to NOT Auto-Unsubscribe (Ask First)

View File

@@ -1,4 +1,4 @@
model = "gpt-5.4"
model = "gpt-5.5"
model_reasoning_effort = "high"
personality = "pragmatic"
[projects."/home/imalison/Projects/nixpkgs"]
@@ -130,3 +130,6 @@ steer = true
[plugins."google-drive@openai-curated"]
enabled = true
[tui.model_availability_nux]
"gpt-5.5" = 4

View File

@@ -6,6 +6,7 @@
## Multiplexer session titling
- If the `TMUX` or `ZELLIJ` environment variable is set, treat this chat as the controller for the current tmux or zellij session.
- Use `set_multiplexer_title '<project> - <task>'` to update the title. The command detects tmux vs. zellij internally, prefers tmux when both are present, and no-ops outside a multiplexer.
- Maintain a session/window/pane title that updates when the task focus changes substantially.
- Prefer automatic titling: infer a concise <task> from the current user request and context without asking.
- Title format: "<project> - <task>".
@@ -14,13 +15,7 @@
- <task> is a short, user-friendly description of what we are doing.
- Ask for a short descriptive <task> only when the task is ambiguous or you are not confident in an inferred title.
- When the task changes substantially, update the <task> automatically if clear; otherwise ask for an updated <task>.
- When a title is provided or updated, immediately run the matching command for the active multiplexer:
tmux rename-session '<project> - <task>' \; rename-window '<project> - <task>' \; select-pane -T '<project> - <task>'
zellij action rename-session '<project> - <task>' && zellij action rename-tab '<project> - <task>' && zellij action rename-pane '<project> - <task>'
- Assume you are inside the active multiplexer, so do not use tmux `-t` or zellij targeting flags unless the user asks to target a specific session/tab/pane.
- When a title is provided or updated, immediately run `set_multiplexer_title '<project> - <task>'`; do not call raw tmux or zellij rename commands unless debugging the helper itself.
## Pane usage
- Do not create extra panes or windows unless the user asks.

View File

@@ -0,0 +1,41 @@
#!/usr/bin/env sh
# Rename the current tmux or zellij session/window/pane in one call.
# No-ops outside a multiplexer; tmux wins when both env vars are set.
set_multiplexer_title() {
  if [ "$#" -lt 1 ]; then
    echo "usage: set_multiplexer_title <title>" >&2
    return 2
  fi
  title="$*"
  if [ -n "${TMUX:-}" ]; then
    multiplexer="tmux"
  elif [ -n "${ZELLIJ:-}" ]; then
    multiplexer="zellij"
  else
    return 0
  fi
  # Skip the rename entirely if the title has not changed since last run.
  state_dir="${HOME}/.agents/state"
  state_file="$state_dir/${multiplexer}-title"
  mkdir -p "$state_dir"
  if [ -f "$state_file" ]; then
    last_title=$(cat "$state_file" 2>/dev/null || true)
    if [ "$last_title" = "$title" ]; then
      return 0
    fi
  fi
  if [ "$multiplexer" = "tmux" ]; then
    tmux rename-session "$title" \; rename-window "$title" \; select-pane -T "$title"
  else
    zellij action rename-session "$title" &&
      zellij action rename-tab "$title" &&
      zellij action rename-pane "$title"
  fi
  printf '%s' "$title" > "$state_file"
}

set_multiplexer_title "$@"