codex: externalize generated system skills

2026-04-25 16:15:08 -07:00
committed by Ivan Anthony Malison
parent 2ac7faf884
commit 1bc595dc13
51 changed files with 72 additions and 6916 deletions

dotfiles/agents/skills/.gitignore vendored Normal file

@@ -0,0 +1,2 @@
.system/
codex-primary-runtime/


@@ -1 +0,0 @@
22c0ca9bd55ca4ff


@@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf of
any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -1,356 +0,0 @@
---
name: "imagegen"
description: "Generate or edit raster images when the task benefits from AI-created bitmap visuals such as photos, illustrations, textures, sprites, mockups, or transparent-background cutouts. Use when Codex should create a brand-new image, transform an existing image, or derive visual variants from references, and the output should be a bitmap asset rather than repo-native code or vector. Do not use when the task is better handled by editing existing SVG/vector/code-native assets, extending an established icon or logo system, or building the visual directly in HTML/CSS/canvas."
---
# Image Generation Skill
Generates or edits images for the current project (for example website assets, game assets, UI mockups, product mockups, wireframes, logo design, photorealistic images, or infographics).
## Top-level modes and rules
This skill has exactly two top-level modes:
- **Default built-in tool mode (preferred):** built-in `image_gen` tool for normal image generation, editing, and simple transparent-image requests. Does not require `OPENAI_API_KEY`.
- **Fallback CLI mode:** `scripts/image_gen.py` CLI. Use when the user explicitly asks for the CLI/API/model path, or after the user explicitly confirms a true model-native transparency fallback with `gpt-image-1.5`. Requires `OPENAI_API_KEY`.
Within CLI fallback, the CLI exposes three subcommands:
- `generate`
- `edit`
- `generate-batch`
Rules:
- Use the built-in `image_gen` tool by default for normal image generation and editing requests.
- Do not switch to CLI fallback for ordinary quality, size, or file-path control.
- If the user explicitly asks for a transparent image/background, stay on built-in `image_gen` first: prompt for a flat removable chroma-key background, then remove it locally with the installed helper at `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`.
- Never silently switch from built-in `image_gen` or CLI `gpt-image-2` to CLI `gpt-image-1.5`. Treat this as a model/path downgrade and ask the user before doing it, unless the user has already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
- If a transparent request appears too complex for clean chroma-key removal, if the user asks for true/native transparency, or if local removal fails validation, explain that true transparency requires CLI `gpt-image-1.5 --background transparent --output-format png` because `gpt-image-2` does not support `background=transparent`, then ask whether to proceed. Run the CLI fallback only after the user confirms.
- The word `batch` by itself does not mean CLI fallback. If the user asks for many assets or says to batch-generate assets without explicitly asking for CLI/API/model controls, stay on the built-in path and issue one built-in call per requested asset or variant.
- If the built-in tool fails or is unavailable, tell the user the CLI fallback exists and that it requires `OPENAI_API_KEY`. Proceed only if the user explicitly asks for that fallback.
- If the user explicitly asks for CLI mode, use the bundled `scripts/image_gen.py` workflow. Do not create one-off SDK runners.
- Never modify `scripts/image_gen.py`. If something is missing, ask the user before doing anything else.
Built-in save-path policy:
- In built-in tool mode, Codex saves generated images under `$CODEX_HOME/*` by default.
- Do not describe or rely on OS temp as the default built-in destination.
- Do not describe or rely on a destination-path argument (if any) on the built-in `image_gen` tool. If a specific location is needed, generate first and then move or copy the selected output from `$CODEX_HOME/generated_images/...`.
- Save-path precedence in built-in mode:
1. If the user names a destination, move or copy the selected output there.
2. If the image is meant for the current project, move or copy the final selected image into the workspace before finishing.
3. If the image is only for preview or brainstorming, render it inline; the underlying file can remain at the default `$CODEX_HOME/*` path.
- Never leave a project-referenced asset only at the default `$CODEX_HOME/*` path.
- Do not overwrite an existing asset unless the user explicitly asked for replacement; otherwise create a sibling versioned filename such as `hero-v2.png` or `item-icon-edited.png`.
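As a minimal sketch of this precedence in practice (both paths below are illustrative, not fixed outputs of the tool):
```bash
# Hedged sketch: move a selected built-in output into the workspace without
# clobbering an existing asset. Filenames and directories are illustrative.
src="${CODEX_HOME:-$HOME/.codex}/generated_images/hero.png"
dest="assets/hero.png"
if [ -e "$dest" ]; then
  dest="assets/hero-v2.png"   # sibling versioned filename instead of overwriting
fi
cp "$src" "$dest"
echo "saved: $dest"
```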
Shared prompt guidance for both modes lives in `references/prompting.md` and `references/sample-prompts.md`.
Fallback-only docs/resources for CLI mode:
- `references/cli.md`
- `references/image-api.md`
- `references/codex-network.md`
- `scripts/image_gen.py`
Local post-processing helper:
- `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`: removes a flat chroma-key background from a generated image and writes a PNG/WebP with alpha. Prefer auto-key sampling, soft matte, and despill for antialiased edges.
## When to use
- Generate a new image (concept art, product shot, cover, website hero)
- Generate a new image using one or more reference images for style, composition, or mood
- Edit an existing image (inpainting, lighting or weather transformations, background replacement, object removal, compositing, transparent background)
- Produce many assets or variants for one task
## When not to use
- Extending or matching an existing SVG/vector icon set, logo system, or illustration library inside the repo
- Creating simple shapes, diagrams, wireframes, or icons that are better produced directly in SVG, HTML/CSS, or canvas
- Making a small project-local asset edit when the source file already exists in an editable native format
- Any task where the user clearly wants deterministic code-native output instead of a generated bitmap
## Decision tree
Think about two separate questions:
1. **Intent:** is this a new image or an edit of an existing image?
2. **Execution strategy:** is this one asset or many assets/variants?
Intent:
- If the user wants to modify an existing image while preserving parts of it, treat the request as **edit**.
- If the user provides images only as references for style, composition, mood, or subject guidance, treat the request as **generate**.
- If the user provides no images, treat the request as **generate**.
Built-in edit semantics:
- Built-in edit mode is for images already visible in the conversation context, such as attached images or images generated earlier in the thread.
- If the user wants to edit a local image file with the built-in tool, first load it with the built-in `view_image` tool so the image is visible in the conversation context, then proceed with the built-in edit flow.
- Do not promise arbitrary filesystem-path editing through the built-in tool.
- If a local file still needs direct file-path control, masks, or other explicit CLI-only parameters, use the explicit CLI fallback only when the user asks for it.
- For edits, preserve invariants aggressively and save non-destructively by default.
Execution strategy:
- In the built-in default path, produce many assets or variants by issuing one `image_gen` call per requested asset or variant.
- In the CLI fallback path, use the CLI `generate-batch` subcommand only when the user explicitly chose CLI mode and needs many prompts/assets.
- For many distinct assets, do not use `n` as a substitute for separate prompts. `n` is for variants of one prompt; distinct assets need distinct built-in calls or distinct CLI `generate-batch` jobs.
Assume the user wants a new image unless they clearly ask to change an existing one.
## Workflow
1. Decide the top-level mode: built-in by default, including simple transparent-output requests; fallback CLI only if explicitly requested or after the user explicitly confirms a transparent-output fallback.
2. Decide the intent: `generate` or `edit`.
3. Decide whether the output is preview-only or meant to be consumed by the current project.
4. Decide the execution strategy: single asset vs repeated built-in calls vs CLI `generate-batch`.
5. Collect inputs up front: prompt(s), exact text (verbatim), constraints/avoid list, and any input images.
6. For every input image, label its role explicitly:
- reference image
- edit target
- supporting insert/style/compositing input
7. If the edit target is only on the local filesystem and you are staying on the built-in path, inspect it with `view_image` first so the image is available in conversation context.
8. If the user asked for a photo, illustration, sprite, product image, banner, or other explicitly raster-style asset, use `image_gen` rather than substituting SVG/HTML/CSS placeholders. If the request is for an icon, logo, or UI graphic that should match existing repo-native SVG/vector/code assets, prefer editing those directly instead.
9. Augment the prompt based on specificity:
- If the user's prompt is already specific and detailed, normalize it into a clear spec without adding creative requirements.
- If the user's prompt is generic, add tasteful augmentation only when it materially improves output quality.
10. Use the built-in `image_gen` tool by default.
11. For transparent-output requests, follow the transparent image guidance below: generate with built-in `image_gen` on a flat chroma-key background, copy the selected output into the workspace or `tmp/imagegen/`, run the installed `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py` helper, and validate the alpha result before using it. If this path looks unsuitable or fails, ask before switching to CLI `gpt-image-1.5`.
12. Inspect outputs and validate: subject, style, composition, text accuracy, and invariants/avoid items.
13. Iterate with a single targeted change, then re-check.
14. For preview-only work, render the image inline; the underlying file may remain at the default `$CODEX_HOME/generated_images/...` path.
15. For project-bound work, move or copy the selected artifact into the workspace and update any consuming code or references. Never leave a project-referenced asset only at the default `$CODEX_HOME/generated_images/...` path.
16. For batches or multi-asset requests, persist the final version of every requested deliverable in the workspace unless the user explicitly asked to keep outputs preview-only. Discarded variants do not need to be kept unless requested.
17. If the user explicitly chooses or confirms the CLI fallback, then use the fallback-only docs for model, quality, size, `input_fidelity`, masks, output format, output paths, and network setup.
18. Always report the final saved path(s) for any workspace-bound asset(s), plus the final prompt or prompt set and whether the built-in tool or fallback CLI mode was used.
## Transparent image requests
Transparent-image requests still use built-in `image_gen` first. Because the built-in tool does not expose a true transparent-background control, create a removable chroma-key source image and then convert the key color to alpha locally.
Default sequence:
1. Use built-in `image_gen` to generate the requested subject on a perfectly flat solid chroma-key background.
2. Choose a key color that is unlikely to appear in the subject: default `#00ff00`, use `#ff00ff` for green subjects, and avoid `#0000ff` for blue subjects.
3. After generation, move or copy the selected source image from `$CODEX_HOME/generated_images/...` into the workspace or `tmp/imagegen/`.
4. Run the installed helper path, not a project-relative script path:
```bash
python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" \
--input <source> \
--out <final.png> \
--auto-key border \
--soft-matte \
--transparent-threshold 12 \
--opaque-threshold 220 \
--despill
```
5. Validate that the output has an alpha channel, transparent corners, plausible subject coverage, and no obvious key-color fringe. If a thin fringe remains, retry once with `--edge-contract 1`; use `--edge-feather 0.25` only when the edge is visibly stair-stepped and the subject is not shiny or reflective.
6. Save the final alpha PNG/WebP in the project if the asset is project-bound. Never leave a project-referenced transparent asset only under `$CODEX_HOME/*`.
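A quick way to run the step-5 validation is a short Pillow check; this is a sketch that assumes `pillow` is installed and that `final.png` is the helper's output:
```bash
# Hedged sketch: verify the removal produced a plausible alpha result.
python - <<'EOF'
from PIL import Image

im = Image.open("final.png").convert("RGBA")
w, h = im.size
alpha = im.getchannel("A")
corners = [alpha.getpixel(p) for p in [(0, 0), (w - 1, 0), (0, h - 1), (w - 1, h - 1)]]
coverage = sum(1 for a in alpha.getdata() if a > 0) / (w * h)
print("corner alphas (expect 0):", corners)
print(f"subject coverage: {coverage:.1%}")  # implausibly low or high values warrant a retry
EOF
```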
Prompt transparent requests like this:
```text
Create the requested subject on a perfectly flat solid #00ff00 chroma-key background for background removal.
The background must be one uniform color with no shadows, gradients, texture, reflections, floor plane, or lighting variation.
Keep the subject fully separated from the background with crisp edges and generous padding.
Do not use #00ff00 anywhere in the subject.
No cast shadow, no contact shadow, no reflection, no watermark, and no text unless explicitly requested.
```
Do not automatically use CLI `gpt-image-1.5 --background transparent --output-format png` instead of chroma keying. Ask the user first when the user asks for true/native transparency, when local removal fails validation, or when the requested image is complex: hair, fur, feathers, smoke, glass, liquids, translucent materials, reflective objects, soft shadows, realistic product grounding, or subject colors that conflict with all practical key colors.
Use a concise confirmation like:
```text
This likely needs true native transparency. The default built-in path uses a chroma-key background plus local removal, but true transparency requires the CLI fallback with gpt-image-1.5 because gpt-image-2 does not support background=transparent. It also requires OPENAI_API_KEY. Should I proceed with that CLI fallback?
```
## Prompt augmentation
Reformat user prompts into a structured, production-oriented spec. Make the user's goal clearer and more actionable, but do not blindly add detail.
Treat this as prompt-shaping guidance, not a closed schema. Use only the lines that help, and add a short extra labeled line when it materially improves clarity.
### Specificity policy
Use the user's prompt specificity to decide how much augmentation is appropriate:
- If the prompt is already specific and detailed, preserve that specificity and only normalize/structure it.
- If the prompt is generic, you may add tasteful augmentation when it will materially improve the result.
Allowed augmentations:
- composition or framing hints
- polish level or intended-use hints
- practical layout guidance
- reasonable scene concreteness that supports the stated request
Not allowed augmentations:
- extra characters or objects that are not implied by the request
- brand names, slogans, palettes, or narrative beats that are not implied
- arbitrary side-specific placement unless the surrounding layout supports it
## Use-case taxonomy (exact slugs)
Classify each request into one of these buckets and keep the slug consistent across prompts and references.
Generate:
- photorealistic-natural — candid/editorial lifestyle scenes with real texture and natural lighting.
- product-mockup — product/packaging shots, catalog imagery, merch concepts.
- ui-mockup — app/web interface mockups and wireframes; specify the desired fidelity.
- infographic-diagram — diagrams/infographics with structured layout and text.
- scientific-educational — classroom explainers, scientific diagrams, and learning visuals with required labels and accuracy constraints.
- ads-marketing — campaign concepts and ad creatives with audience, brand position, scene, and exact tagline/copy.
- productivity-visual — slide, chart, workflow, and data-heavy business visuals.
- logo-brand — logo/mark exploration, vector-friendly.
- illustration-story — comics, children's book art, narrative scenes.
- stylized-concept — style-driven concept art, 3D/stylized renders.
- historical-scene — period-accurate/world-knowledge scenes.
Edit:
- text-localization — translate/replace in-image text, preserve layout.
- identity-preserve — try-on, person-in-scene; lock face/body/pose.
- precise-object-edit — remove/replace a specific element (including interior swaps).
- lighting-weather — time-of-day/season/atmosphere changes only.
- background-extraction — transparent background / clean cutout. Use built-in `image_gen` with chroma-key removal first for simple opaque subjects; ask before using CLI true transparency for complex subjects.
- style-transfer — apply reference style while changing subject/scene.
- compositing — multi-image insert/merge with matched lighting/perspective.
- sketch-to-render — drawing/line art to photoreal render.
## Shared prompt schema
Use the following labeled spec as shared prompt scaffolding for both top-level modes:
```text
Use case: <taxonomy slug>
Asset type: <where the asset will be used>
Primary request: <user's main prompt>
Input images: <Image 1: role; Image 2: role> (optional)
Scene/backdrop: <environment>
Subject: <main subject>
Style/medium: <photo/illustration/3D/etc>
Composition/framing: <wide/close/top-down; placement>
Lighting/mood: <lighting + mood>
Color palette: <palette notes>
Materials/textures: <surface details>
Text (verbatim): "<exact text>"
Constraints: <must keep/must avoid>
Avoid: <negative constraints>
```
Notes:
- `Asset type` and `Input images` are prompt scaffolding, not dedicated CLI flags.
- `Scene/backdrop` refers to the visual setting. It is not the same as the fallback CLI `background` parameter, which controls output transparency behavior.
- Fallback-only execution notes such as `Quality:`, `Input fidelity:`, masks, output format, and output paths belong in the CLI path only. Do not treat them as built-in `image_gen` tool arguments.
Augmentation rules:
- Keep it short.
- Add only the details needed to improve the prompt materially.
- For edits, explicitly list invariants (`change only X; keep Y unchanged`).
- If any critical detail is missing and blocks success, ask a question; otherwise proceed.
## Examples
### Generation example (hero image)
```text
Use case: product-mockup
Asset type: landing page hero
Primary request: a minimal hero image of a ceramic coffee mug
Style/medium: clean product photography
Composition/framing: wide composition with usable negative space for page copy if needed
Lighting/mood: soft studio lighting
Constraints: no logos, no text, no watermark
```
### Edit example (invariants)
```text
Use case: precise-object-edit
Asset type: product photo background replacement
Primary request: replace only the background with a warm sunset gradient
Constraints: change only the background; keep the product and its edges unchanged; no text; no watermark
```
## Prompting best practices
- Structure the prompt as scene/backdrop -> subject -> details -> constraints.
- Include intended use (ad, UI mock, infographic) to set the mode and polish level.
- Use camera/composition language for photorealism.
- Only use SVG/vector stand-ins when the user explicitly asked for vector output or a non-image placeholder.
- Quote exact text and specify typography + placement.
- For tricky words, spell them letter-by-letter and require verbatim rendering (see the example after this list).
- For multi-image inputs, reference images by index and describe how they should be used.
- For edits, repeat invariants every iteration to reduce drift.
- Iterate with single-change follow-ups.
- If the prompt is generic, add only the extra detail that will materially help.
- If the prompt is already detailed, normalize it instead of expanding it.
- For CLI fallback only, see `references/cli.md` and `references/image-api.md` for model, `quality`, `input_fidelity`, masks, output format, and output-path guidance.
- For transparent images, use the built-in-first chroma-key workflow unless the request is complex enough to need true CLI transparency; ask before switching to CLI `gpt-image-1.5`.
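For instance, a tricky-word prompt line might read (illustrative copy):
```text
Render the label text verbatim: "KOMBUCHA" (K, O, M, B, U, C, H, A), exactly eight letters, no substitutions or extra characters.
```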
More principles shared by both modes: `references/prompting.md`.
Copy/paste specs shared by both modes: `references/sample-prompts.md`.
## Guidance by asset type
Asset-type templates (website assets, game assets, wireframes, logo) are consolidated in `references/sample-prompts.md`.
## gpt-image-2 guidance for CLI fallback
The fallback CLI defaults to `gpt-image-2`.
- Use `gpt-image-2` for new CLI/API workflows unless the request needs true model-native transparent output.
- If a transparent request may need CLI fallback, ask before using `gpt-image-1.5` unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Explain that the built-in chroma-key path is the default, but true transparency requires `gpt-image-1.5` because `gpt-image-2` does not support `background=transparent`.
- `gpt-image-2` always uses high fidelity for image inputs; do not set `input_fidelity` with this model.
- `gpt-image-2` supports `quality` values `low`, `medium`, `high`, and `auto`.
- Use `quality low` for fast drafts, thumbnails, and quick iterations. Use `medium`, `high`, or `auto` for final assets, dense text, diagrams, identity-sensitive edits, or high-resolution outputs.
- Square images are typically fastest to generate. Use `1024x1024` for fast square drafts.
- If the user asks for 4K-style output, use `3840x2160` for landscape or `2160x3840` for portrait.
- `gpt-image-2` size may be `auto` or `WIDTHxHEIGHT` if all constraints hold: max edge `<= 3840px`, both edges multiples of `16px`, long-to-short ratio `<= 3:1`, total pixels between `655,360` and `8,294,400`.
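As a quick sanity check of these constraints before requesting a custom size, a shell sketch like this works (the candidate size is illustrative):
```bash
# Hedged sketch: validate a candidate WIDTHxHEIGHT against the constraints above.
w=2048; h=1152
long=$(( w > h ? w : h )); short=$(( w > h ? h : w ))
px=$(( w * h ))
if (( long <= 3840 && w % 16 == 0 && h % 16 == 0 && long <= 3 * short && px >= 655360 && px <= 8294400 )); then
  echo "${w}x${h}: valid for gpt-image-2"
else
  echo "${w}x${h}: invalid for gpt-image-2"
fi
```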
Popular `gpt-image-2` sizes:
- `1024x1024` square
- `1536x1024` landscape
- `1024x1536` portrait
- `2048x2048` 2K square
- `2048x1152` 2K landscape
- `3840x2160` 4K landscape
- `2160x3840` 4K portrait
- `auto`
## Fallback CLI mode only
### Temp and output conventions
These conventions apply only to the CLI fallback. They do not describe built-in `image_gen` output behavior.
- Use `tmp/imagegen/` for intermediate files (for example JSONL batches); delete them when done.
- Write final artifacts under `output/imagegen/`.
- Use `--out` or `--out-dir` to control output paths; keep filenames stable and descriptive.
### Dependencies
Prefer `uv` for dependency management in this repo.
Required Python package:
```bash
uv pip install openai
```
Required for local chroma-key removal and optional downscaling:
```bash
uv pip install pillow
```
Portability note:
- If you are using the installed skill outside this repo, install dependencies into that environment with its package manager.
- In uv-managed environments, `uv pip install ...` remains the preferred path.
### Environment
- `OPENAI_API_KEY` must be set for live API calls.
- Do not ask the user for `OPENAI_API_KEY` when using the built-in `image_gen` tool.
- Never ask the user to paste the full key in chat. Ask them to set it locally and confirm when ready.
If the key is missing, give the user these steps:
1. Create an API key in the OpenAI platform UI: https://platform.openai.com/api-keys
2. Set `OPENAI_API_KEY` as an environment variable in their system.
3. Offer to guide them through setting the environment variable for their OS/shell if needed.
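For example, a session-only setup on bash or zsh looks like this (the value is a placeholder; never paste a real key into chat):
```bash
# Hedged sketch: set the key for the current shell session only.
export OPENAI_API_KEY="sk-..."   # placeholder; persist in the shell profile if desired
```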
If installation is not possible in this environment, tell the user which dependency is missing and how to install it into their active environment.
### Script-mode notes
- CLI commands + examples: `references/cli.md`
- API parameter quick reference: `references/image-api.md`
- Network approvals / sandbox settings for CLI mode: `references/codex-network.md`
## Reference map
- `references/prompting.md`: shared prompting principles for both modes.
- `references/sample-prompts.md`: shared copy/paste prompt recipes for both modes.
- `references/cli.md`: fallback-only CLI usage via `scripts/image_gen.py`.
- `references/image-api.md`: fallback-only API/CLI parameter reference.
- `references/codex-network.md`: fallback-only network/sandbox troubleshooting for CLI mode.
- `scripts/image_gen.py`: fallback-only CLI implementation. Do not load or use it unless the user explicitly chooses CLI mode or explicitly confirms a transparent request's true CLI transparency fallback.
- `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`: local post-processing helper for built-in transparent-image requests.


@@ -1,6 +0,0 @@
interface:
display_name: "Image Gen"
short_description: "Generate or edit images for websites, games, and more"
icon_small: "./assets/imagegen-small.svg"
icon_large: "./assets/imagegen.png"
default_prompt: "Use $imagegen to make or edit an image for this project."


@@ -1,5 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" viewBox="0 0 16 16">
<path fill="currentColor" d="M7.51 6.827a1 1 0 1 1 .278 1.982 1 1 0 0 1-.278-1.982Z"/>
<path fill="currentColor" fill-rule="evenodd" d="M8.31 4.47c.368-.016.699.008 1.016.124l.186.075c.423.194.786.5 1.047.888l.067.107c.148.253.235.533.3.848.073.354.126.797.193 1.343l.277 2.25.088.745c.024.224.041.425.049.605.013.322-.004.615-.085.896l-.04.12a2.53 2.53 0 0 1-.802 1.115l-.16.118c-.281.189-.596.292-.956.366a9.46 9.46 0 0 1-.6.1l-.743.094-2.25.277c-.547.067-.99.121-1.35.136a2.765 2.765 0 0 1-.896-.085l-.12-.039a2.533 2.533 0 0 1-1.115-.802l-.118-.161c-.189-.28-.292-.596-.366-.956a9.42 9.42 0 0 1-.1-.599l-.094-.744-.276-2.25a17.884 17.884 0 0 1-.137-1.35c-.015-.367.009-.698.124-1.015l.076-.185c.193-.423.5-.787.887-1.048l.107-.067c.253-.148.534-.234.849-.3.354-.073.796-.126 1.343-.193l2.25-.277.744-.088c.224-.024.425-.041.606-.049Zm-2.905 5.978a1.47 1.47 0 0 0-.875.074c-.127.052-.267.146-.475.344-.212.204-.462.484-.822.889l-.314.351c.018.115.036.219.055.313.061.295.127.458.206.575l.07.094c.167.211.39.372.645.465l.109.032c.119.027.273.038.499.029.308-.013.7-.06 1.264-.13l2.25-.275.727-.093.198-.03-2.05-1.64a16.848 16.848 0 0 0-.96-.738c-.18-.121-.31-.19-.421-.23l-.106-.03Zm2.95-4.915c-.154.006-.33.021-.536.043l-.729.086-2.25.276c-.564.07-.956.118-1.257.18a1.937 1.937 0 0 0-.478.15l-.097.057a1.47 1.47 0 0 0-.515.608l-.044.107c-.048.133-.073.307-.06.608.012.307.06.7.129 1.264l.22 1.8.178-.197c.145-.159.278-.298.403-.418.255-.243.507-.437.809-.56l.181-.067a2.526 2.526 0 0 1 1.328-.06l.118.029c.27.079.517.215.772.387.287.194.619.46 1.03.789l2.52 2.016c.146-.148.26-.326.332-.524l.031-.109c.027-.119.039-.273.03-.499a8.311 8.311 0 0 0-.044-.536l-.086-.728-.276-2.25c-.07-.564-.118-.956-.18-1.258a1.935 1.935 0 0 0-.15-.477l-.057-.098a1.468 1.468 0 0 0-.608-.515l-.107-.043c-.133-.049-.306-.074-.607-.061Z" clip-rule="evenodd"/>
<path fill="currentColor" d="M7.783 1.272c.36.014.803.07 1.35.136l2.25.277.743.095c.224.03.423.062.6.099.36.074.675.177.955.366l.161.118c.364.29.642.675.802 1.115l.04.12c.081.28.098.574.085.896a9.42 9.42 0 0 1-.05.605l-.087.745-.277 2.25c-.067.547-.12.989-.193 1.343a2.765 2.765 0 0 1-.3.848l-.067.107a2.534 2.534 0 0 1-.415.474l-.086.064a.532.532 0 0 1-.622-.858l.13-.13c.04-.046.077-.094.111-.145l.057-.098c.055-.109.104-.256.15-.477.062-.302.11-.694.18-1.258l.276-2.25.086-.728c.022-.207.037-.382.043-.536.01-.226-.002-.38-.029-.5l-.032-.108a1.469 1.469 0 0 0-.464-.646l-.094-.069c-.118-.08-.28-.145-.575-.206a8.285 8.285 0 0 0-.53-.088l-.728-.092-2.25-.276c-.565-.07-.956-.117-1.264-.13a1.94 1.94 0 0 0-.5.029l-.108.032a1.469 1.469 0 0 0-.647.465l-.068.094c-.054.08-.102.18-.146.33l-.04.1a.533.533 0 0 1-.98-.403l.055-.166c.059-.162.133-.314.23-.457l.117-.16c.29-.365.675-.643 1.115-.803l.12-.04c.28-.08.574-.097.896-.084Z"/>
</svg>

Binary file not shown.


@@ -1,242 +0,0 @@
# CLI reference (`scripts/image_gen.py`)
This file is for the fallback CLI mode only. Read it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the `gpt-image-1.5` true-transparency fallback path.
`generate-batch` is a CLI subcommand in this fallback path. It is not a top-level mode of the skill.
The word `batch` in a user request is not CLI opt-in by itself.
## What this CLI does
- `generate`: generate a new image from a prompt
- `edit`: edit one or more existing images
- `generate-batch`: run many generation jobs from a JSONL file after the user explicitly chooses CLI/API/model controls
Real API calls require **network access** + `OPENAI_API_KEY`. `--dry-run` does not.
## Quick start (works from any repo)
Set a stable path to the skill CLI (default `CODEX_HOME` is `~/.codex`):
```
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export IMAGE_GEN="$CODEX_HOME/skills/.system/imagegen/scripts/image_gen.py"
```
Install dependencies into that environment with its package manager. In uv-managed environments, `uv pip install ...` remains the preferred path.
Dry-run (no API call; no network required; does not require the `openai` package):
```bash
python "$IMAGE_GEN" generate \
--prompt "Test" \
--out output/imagegen/test.png \
--dry-run
```
Notes:
- One-off dry-runs print the API payload and the computed output path(s).
- Repo-local finals should live under `output/imagegen/`.
Generate (requires `OPENAI_API_KEY` + network):
```bash
python "$IMAGE_GEN" generate \
--prompt "A cozy alpine cabin at dawn" \
--size 1024x1024 \
--out output/imagegen/alpine-cabin.png
```
Edit:
```bash
python "$IMAGE_GEN" edit \
--image input.png \
--prompt "Replace only the background with a warm sunset" \
--out output/imagegen/sunset-edit.png
```
## Guardrails
- Use the bundled CLI directly (`python "$IMAGE_GEN" ...`) after activating the correct environment.
- Do **not** create one-off runners (for example `gen_images.py`) unless the user explicitly asks for a custom wrapper.
- **Never modify** `scripts/image_gen.py`. If something is missing, ask the user before doing anything else.
- Do not silently downgrade from CLI `gpt-image-2` or built-in `image_gen` to CLI `gpt-image-1.5`; ask first unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
## Defaults
- Model: `gpt-image-2`
- Supported model family for this CLI: GPT Image models (`gpt-image-*`)
- Size: `auto`
- Quality: `medium`
- Output format: `png`
- Default one-off output path: `output/imagegen/output.png`
- Background: unspecified unless `--background` is set
## gpt-image-2 size and model guidance
`gpt-image-2` is the default model for new CLI fallback work.
- Use `--quality low` for fast drafts, thumbnails, and quick iterations.
- Use `--quality medium`, `--quality high`, or `--quality auto` for final assets, dense text, diagrams, identity-sensitive edits, and high-resolution outputs.
- Square images are typically fastest. Use `--size 1024x1024` for quick square drafts.
- If the user asks for 4K-style output, use `--size 3840x2160` for landscape or `--size 2160x3840` for portrait.
- Do not pass `--input-fidelity` with `gpt-image-2`; this model always uses high fidelity for image inputs.
- Do not use `--background transparent` with `gpt-image-2`; the default transparent-image workflow uses built-in `image_gen` on a flat chroma-key background plus local removal. Use `gpt-image-1.5` only after the user explicitly confirms the true-transparent CLI fallback, unless they already requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
Popular `gpt-image-2` sizes:
- `1024x1024`
- `1536x1024`
- `1024x1536`
- `2048x2048`
- `2048x1152`
- `3840x2160`
- `2160x3840`
- `auto`
`gpt-image-2` size constraints:
- max edge `<= 3840px`
- both edges multiples of `16px`
- long edge to short edge ratio `<= 3:1`
- total pixels between `655,360` and `8,294,400`
- outputs with more total pixels than `2560x1440` (3,686,400) are experimental
Fast draft:
```bash
python "$IMAGE_GEN" generate \
--prompt "A product thumbnail of a matte ceramic mug on a stone surface" \
--quality low \
--size 1024x1024 \
--out output/imagegen/mug-draft.png
```
Final 2K landscape:
```bash
python "$IMAGE_GEN" generate \
--prompt "A polished landing-page hero image of a matte ceramic mug on a stone surface" \
--quality high \
--size 2048x1152 \
--out output/imagegen/mug-hero.png
```
4K landscape:
```bash
python "$IMAGE_GEN" generate \
--prompt "A detailed architectural visualization at golden hour" \
--size 3840x2160 \
--quality high \
--out output/imagegen/architecture-4k.png
```
True transparent fallback request:
Ask for confirmation before using this command unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
```bash
python "$IMAGE_GEN" generate \
--model gpt-image-1.5 \
--prompt "A clean product cutout on a transparent background" \
--background transparent \
--output-format png \
--out output/imagegen/product-cutout.png
```
When using this path, explain briefly that built-in `image_gen` plus chroma-key removal is the default transparent-image path, but this request needs true model-native transparency. `gpt-image-2` does not support `background=transparent`, so `gpt-image-1.5` is required for this confirmed fallback.
## Quality, input fidelity, and masks (CLI fallback only)
These are explicit CLI controls. They are not built-in `image_gen` tool arguments.
- `--quality` works for `generate`, `edit`, and `generate-batch`: `low|medium|high|auto`
- `--input-fidelity` is **edit-only** and validated as `low|high`; it is not supported for `gpt-image-2`
- `--mask` is **edit-only**
Example:
```bash
python "$IMAGE_GEN" edit \
--model gpt-image-1.5 \
--image input.png \
--prompt "Change only the background" \
--quality high \
--input-fidelity high \
--out output/imagegen/background-edit.png
```
Mask notes:
- For multi-image edits, pass repeated `--image` flags. Their order is meaningful, so describe each image by index and role in the prompt.
- The CLI accepts a single `--mask`.
- Image and mask must be the same size and format and each under 50MB.
- Masks must include an alpha channel.
- If multiple input images are provided, the mask applies to the first image.
- Masking is prompt-guided; do not promise exact pixel-perfect mask boundaries.
- Use a PNG mask when possible; the script treats mask handling as best-effort and does not perform full preflight validation beyond file checks/warnings.
- In the edit prompt, repeat invariants (`change only the background; keep the subject unchanged`) to reduce drift.
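A minimal sketch of producing a mask that satisfies these notes, assuming Pillow and an illustrative right-half edit region (transparent pixels mark the area to edit):
```bash
# Hedged sketch: build a same-size RGBA mask; the geometry here is illustrative.
python - <<'EOF'
import os
from PIL import Image

im = Image.open("input.png").convert("RGBA")
mask = Image.new("RGBA", im.size, (0, 0, 0, 255))               # opaque = keep
clear = Image.new("RGBA", (im.width // 2, im.height), (0, 0, 0, 0))
mask.paste(clear, (im.width // 2, 0))                           # right half = editable
os.makedirs("tmp/imagegen", exist_ok=True)
mask.save("tmp/imagegen/mask.png")
EOF
```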
## Output handling
- Use `tmp/imagegen/` for temporary JSONL inputs or scratch files.
- Use `output/imagegen/` for final outputs.
- Reruns fail if a target file already exists unless you pass `--force`.
- `--out-dir` changes one-off naming to `image_1.<ext>`, `image_2.<ext>`, and so on.
- Downscaled copies use the default suffix `-web` unless you override it.
## Common recipes
Generate with augmentation fields:
```bash
python "$IMAGE_GEN" generate \
--prompt "A minimal hero image of a ceramic coffee mug" \
--use-case "product-mockup" \
--style "clean product photography" \
--composition "wide product shot with usable negative space for page copy" \
--constraints "no logos, no text" \
--out output/imagegen/mug-hero.png
```
Generate + also write a downscaled copy for fast web loading:
```bash
python "$IMAGE_GEN" generate \
--prompt "A cozy alpine cabin at dawn" \
--size 1024x1024 \
--downscale-max-dim 1024 \
--out output/imagegen/alpine-cabin.png
```
Generate multiple prompts concurrently (async batch):
```bash
mkdir -p tmp/imagegen output/imagegen/batch
cat > tmp/imagegen/prompts.jsonl << 'EOF'
{"prompt":"Cavernous hangar interior with a compact shuttle parked near the center","use_case":"stylized-concept","composition":"wide-angle, low-angle","lighting":"volumetric light rays through drifting fog","constraints":"no logos or trademarks; no watermark","size":"1536x1024"}
{"prompt":"Gray wolf in profile in a snowy forest","use_case":"photorealistic-natural","composition":"eye-level","constraints":"no logos or trademarks; no watermark","size":"1024x1024"}
EOF
python "$IMAGE_GEN" generate-batch \
--input tmp/imagegen/prompts.jsonl \
--out-dir output/imagegen/batch \
--concurrency 5
rm -f tmp/imagegen/prompts.jsonl
```
Notes:
- `generate-batch` requires `--out-dir`.
- Use `--concurrency` to control parallelism (default `5`).
- Per-job overrides are supported in JSONL (for example `size`, `quality`, `background`, `output_format`, `output_compression`, `moderation`, `n`, `model`, `out`, and prompt-augmentation fields).
- `--n` generates multiple variants for a single prompt; `generate-batch` is for many different prompts.
- In batch mode, per-job `out` is treated as a filename under `--out-dir`.
- For many requested deliverable assets, provide one prompt/job per distinct asset and use semantic filenames when possible.
## CLI notes
- Supported sizes depend on the model. `gpt-image-2` supports flexible constrained sizes; older GPT Image models support `1024x1024`, `1536x1024`, `1024x1536`, or `auto`.
- True transparent CLI outputs require `output_format` to be `png` or `webp` and are not supported by `gpt-image-2`.
- `--prompt-file`, `--output-compression`, `--moderation`, `--max-attempts`, `--fail-fast`, `--force`, and `--no-augment` are supported.
- This CLI is intended for GPT Image models. Do not assume older non-GPT image-model behavior applies here.
## See also
- API parameter quick reference for fallback CLI mode: `references/image-api.md`
- Prompt examples shared across both top-level modes: `references/sample-prompts.md`
- Network/sandbox notes for fallback CLI mode: `references/codex-network.md`
- Built-in-first transparent image workflow: `SKILL.md` and `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`


@@ -1,33 +0,0 @@
# Codex network approvals / sandbox notes
This file is for the fallback CLI mode only. Read it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the `gpt-image-1.5` true-transparency fallback path.
This guidance is intentionally isolated from `SKILL.md` because it can vary by environment and may become stale. Prefer the defaults in your environment when in doubt.
## Why am I asked to approve image generation calls?
The fallback CLI uses the OpenAI Image API, so it needs outbound network access. In many Codex setups, network access is disabled by default and/or the approval policy requires confirmation before networked commands run.
## Important note about approvals vs network
- `--ask-for-approval never` suppresses approval prompts.
- It does **not** by itself enable network access.
- In `workspace-write`, network access still depends on your Codex configuration (for example `[sandbox_workspace_write] network_access = true`).
## How do I reduce repeated approval prompts?
If you trust the repo and want fewer prompts, use a configuration or profile that both:
- enables network for the sandbox mode you plan to use
- sets an approval policy that matches your risk tolerance
Example `~/.codex/config.toml` pattern:
```toml
approval_policy = "on-request"
sandbox_mode = "workspace-write"
[sandbox_workspace_write]
network_access = true
```
If you want quieter automation after network is enabled, you can relax the approval policy so it prompts less often, but do that intentionally and with care.
## Safety note
Enabling network and reducing approvals lowers friction, but increases risk if you run untrusted code or work in an untrusted repository.


@@ -1,90 +0,0 @@
# Image API quick reference
This file is for the fallback CLI mode only. Use it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the `gpt-image-1.5` true-transparency fallback path.
These parameters describe the Image API and bundled CLI fallback surface. Do not assume they are normal arguments on the built-in `image_gen` tool.
## Scope
- This fallback CLI is intended for GPT Image models (`gpt-image-2`, `gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini`).
- The built-in `image_gen` tool and the fallback CLI do not expose the same controls.
## Model summary
| Model | Quality | Input fidelity | Resolutions | Recommended use |
| --- | --- | --- | --- | --- |
| `gpt-image-2` | `low`, `medium`, `high`, `auto` | Always high fidelity for image inputs; do not set `input_fidelity` | `auto` or flexible sizes that satisfy the constraints below | Default for new CLI/API workflows: high-quality generation and editing, text-heavy images, photorealism, compositing, identity-sensitive edits, and workflows where fewer retries matter |
| `gpt-image-1.5` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | True transparent-background fallback and backward-compatible workflows |
| `gpt-image-1` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Legacy compatibility |
| `gpt-image-1-mini` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Cost-sensitive draft batches and lower-stakes previews |
## gpt-image-2 sizes
`gpt-image-2` accepts `auto` or any `WIDTHxHEIGHT` size that satisfies all constraints:
- Maximum edge length must be less than or equal to `3840px`.
- Both edges must be multiples of `16px`.
- Long edge to short edge ratio must not exceed `3:1`.
- Total pixels must be at least `655,360` and no more than `8,294,400`.
Popular sizes:
| Label | Size | Notes |
| --- | --- | --- |
| Square | `1024x1024` | Typical fast default |
| Landscape | `1536x1024` | Standard landscape |
| Portrait | `1024x1536` | Standard portrait |
| 2K square | `2048x2048` | Larger square output |
| 2K landscape | `2048x1152` | Widescreen output |
| 4K landscape | `3840x2160` | Widescreen 4K output |
| 4K portrait | `2160x3840` | Vertical 4K output |
| Auto | `auto` | Default size |
Square images are typically fastest to generate. For 4K-style output, use `3840x2160` or `2160x3840`.
## Endpoints
- Generate: `POST /v1/images/generations` (`client.images.generate(...)`)
- Edit: `POST /v1/images/edits` (`client.images.edit(...)`)
## Core parameters for GPT Image models
- `prompt`: text prompt
- `model`: image model
- `n`: number of images (1-10)
- `size`: `auto` by default for `gpt-image-2`; flexible `WIDTHxHEIGHT` sizes are allowed only for `gpt-image-2`; older GPT Image models use `1024x1024`, `1536x1024`, `1024x1536`, or `auto`
- `quality`: `low`, `medium`, `high`, or `auto`
- `background`: output transparency behavior (`transparent`, `opaque`, or `auto`) for generated output; this is not the same thing as the prompt's visual scene/backdrop
- `output_format`: `png` (default), `jpeg`, `webp`
- `output_compression`: 0-100 (jpeg/webp only)
- `moderation`: `auto` (default) or `low`
## Edit-specific parameters
- `image`: one or more input images. For GPT Image models, you can provide up to 16 images.
- `mask`: optional mask image
- `input_fidelity`: `low` or `high` only for models that support it; do not set this for `gpt-image-2`
Model-specific note for `input_fidelity`:
- `gpt-image-2` always uses high fidelity for image inputs and does not support setting `input_fidelity`.
- `gpt-image-1` and `gpt-image-1-mini` preserve all input images, but the first image gets richer textures and finer details.
- `gpt-image-1.5` preserves the first 5 input images with higher fidelity.
## Transparent backgrounds
`gpt-image-2` does not currently support the Image API `background=transparent` parameter. The skill's default transparent-image path is built-in `image_gen` with a flat chroma-key background, followed by local alpha extraction with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py"`.
Use CLI `gpt-image-1.5` with `background=transparent` and a transparent-capable output format such as `png` or `webp` only after the user explicitly confirms that fallback, unless they already requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. If the user asks for true/native transparency, the subject is too complex for clean chroma-key removal, or local background removal fails validation, explain the tradeoff and ask before switching.
## Output
- `data[]` list with `b64_json` per image
- The bundled `scripts/image_gen.py` CLI decodes `b64_json` and writes output files for you.
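For illustration only (the bundled CLI is the required execution path; do not turn this into a one-off runner), a minimal sketch of the generate-plus-decode round trip with the `openai` Python SDK, assuming `OPENAI_API_KEY` is set:
```bash
# Hedged sketch of the underlying API call and b64_json decode.
python - <<'EOF'
import base64
import os
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
result = client.images.generate(
    model="gpt-image-2",
    prompt="A cozy alpine cabin at dawn",
    size="1024x1024",
    quality="medium",
)
os.makedirs("output/imagegen", exist_ok=True)
with open("output/imagegen/alpine-cabin.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
EOF
```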
## Limits and notes
- Input images and masks must be under 50MB.
- Use the edits endpoint when the user requests changes to an existing image.
- Masking is prompt-guided; exact shapes are not guaranteed.
- Large sizes and high quality increase latency and cost.
- Use `quality=low` for fast drafts, thumbnails, and quick iterations. Use `medium` or `high` for final assets, dense text, diagrams, identity-sensitive edits, or high-resolution outputs.
- High `input_fidelity` can materially increase input token usage on models that support it.
- If a request fails because a specific option is unsupported by the selected GPT Image model, retry manually without that option only when the option is not required by the user. If true transparent CLI output is required, ask before switching to `gpt-image-1.5` instead of dropping `background=transparent`, unless the user already explicitly chose that fallback.
## Important boundary
- `quality`, `input_fidelity`, explicit masks, `background`, `output_format`, and related parameters are fallback-only execution controls.
- Do not assume they are built-in `image_gen` tool arguments.


@@ -1,118 +0,0 @@
# Prompting best practices
These prompting principles are shared by both top-level modes of the skill:
- built-in `image_gen` tool (default)
- explicit `scripts/image_gen.py` CLI fallback
This file is about prompt structure, specificity, and iteration. Fallback-only execution controls such as `quality`, `input_fidelity`, masks, output format, and output paths live in the fallback docs.
## Contents
- [Structure](#structure)
- [Specificity policy](#specificity-policy)
- [Allowed and disallowed augmentation](#allowed-and-disallowed-augmentation)
- [Composition and layout](#composition-and-layout)
- [Constraints and invariants](#constraints-and-invariants)
- [Text in images](#text-in-images)
- [Input images and references](#input-images-and-references)
- [Iterate deliberately](#iterate-deliberately)
- [Transparent images](#transparent-images)
- [Fallback-only execution controls](#fallback-only-execution-controls)
- [Use-case tips](#use-case-tips)
- [Where to find copy/paste recipes](#where-to-find-copypaste-recipes)
## Structure
- Use a consistent order: scene/backdrop -> subject -> key details -> constraints -> output intent.
- Include intended use (ad, UI mock, infographic) to set the level of polish.
- For complex requests, use short labeled lines instead of one long paragraph.
## Specificity policy
- If the user prompt is already specific and detailed, normalize it into a clean spec without adding creative requirements.
- If the prompt is generic, you may add tasteful detail when it materially improves the output.
- Treat examples in `sample-prompts.md` as fully-authored recipes, not as the default amount of augmentation to add to every request.
- For photorealism, include `photorealistic` directly when that is the goal, plus concrete real-world texture such as pores, wrinkles, fabric wear, material grain, or imperfect everyday detail.
## Allowed and disallowed augmentation
Allowed augmentation for generic prompts:
- composition and framing cues
- intended-use or polish-level hints
- practical layout guidance
- reasonable scene concreteness that supports the request
Do not add:
- extra characters, props, or objects that are not implied
- brand palettes, slogans, or story beats that are not implied
- arbitrary side-specific placement unless the surrounding layout supports it
## Composition and layout
- Specify framing and viewpoint (close-up, wide, top-down) and placement only when it materially helps.
- Call out negative space if the asset clearly needs room for UI or copy.
- Avoid making left/right layout decisions unless the user or surrounding layout supports them.
- For people, describe body framing, scale, gaze, and object interactions when they matter (`full body visible`, `looking down at the book`, `hands naturally gripping the handlebars`).
## Constraints and invariants
- State what must not change (`keep background unchanged`).
- For edits, say `change only X; keep Y unchanged` and repeat invariants on every iteration to reduce drift.
## Text in images
- Put literal text in quotes or ALL CAPS and specify typography (font style, size, color, placement).
- Spell uncommon words letter-by-letter if accuracy matters.
- For in-image copy, require verbatim rendering and no extra characters.
- In CLI fallback mode, use `medium` or `high` quality for small text, dense infographics, data-heavy slides, multi-font layouts, legends, axes, and footnotes.
## Input images and references
- Do not assume that every provided image is an edit target.
- Label each image by index and role (`Image 1: edit target`, `Image 2: style reference`).
- If the user provides images for style, composition, or mood guidance and does not ask to modify them, treat the request as generation with references.
- If the user asks to preserve an existing image while changing specific parts, treat the request as an edit.
- For compositing, describe how the images interact (`place the subject from Image 2 into Image 1`).
## Iterate deliberately
- Start with a clean base prompt, then make small single-change edits.
- Re-specify critical constraints when you iterate.
- Prefer one targeted follow-up at a time over rewriting the whole prompt.
## Transparent images
- Use built-in `image_gen` first for transparent-image requests. If the subject is clearly too complex for chroma-key removal, explain the fallback and ask before switching to CLI.
- Prompt for a perfectly flat solid chroma-key background, usually `#00ff00`; use `#ff00ff` when the subject is green, and avoid key colors that appear in the subject.
- Explicitly prohibit shadows, gradients, floor planes, reflections, texture, and lighting variation in the background.
- Ask for crisp edges, generous padding, and no use of the key color inside the subject.
- After generation, remove the background locally with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" --input <source> --out <final.png> --auto-key border --soft-matte --transparent-threshold 12 --opaque-threshold 220 --despill` and validate the alpha result before shipping it.
- Use soft matte and despill for antialiased edges; hard tolerance-only removal is mainly for flat pixel-art or exact-color fixtures.
- Use CLI `gpt-image-1.5 --background transparent --output-format png` only after the user explicitly confirms the fallback, or when the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Ask first for true/native transparency requests, failed chroma-key validation, or complex transparent subjects such as hair, fur, glass, smoke, liquids, translucent materials, reflective objects, or soft shadows.
## Fallback-only execution controls
- `quality`, `input_fidelity`, explicit masks, output format, and output paths are fallback-only execution controls.
- Do not assume they are built-in `image_gen` tool arguments.
- If the user explicitly chooses CLI fallback, see `references/cli.md` and `references/image-api.md` for those controls.
- In CLI fallback mode, `gpt-image-2` is the default. It supports `quality=low|medium|high|auto`; use `low` for fast drafts and thumbnails, and move to `medium`, `high`, or `auto` for final assets.
- `gpt-image-2` always uses high fidelity for image inputs, so do not set `input_fidelity` with that model.
- If a transparent request needs true CLI transparency, ask before using `gpt-image-1.5` unless the user already explicitly chose it. Explain that built-in chroma-key removal is the default path, but `gpt-image-2` does not support `background=transparent`.
- If the user asks for 4K-style output with `gpt-image-2`, use `3840x2160` for landscape or `2160x3840` for portrait (see the sketch after this list).
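A minimal sketch of such a fallback invocation, combining the controls above (the prompt text and `--out` path are placeholder values, and `scripts/image_gen.py` is the skill-relative CLI path assumed throughout these docs):

```python
import subprocess

# Hypothetical 4K-style landscape generation via the fallback CLI.
# The prompt and output path are illustrative, not fixed conventions.
subprocess.run(
    [
        "python", "scripts/image_gen.py", "generate",
        "--model", "gpt-image-2",
        "--prompt", "wide mountain vista at dawn, photorealistic",
        "--size", "3840x2160",
        "--quality", "high",
        "--out", "output/imagegen/vista.png",
    ],
    check=True,
)
```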
## Use-case tips
Generate:
- photorealistic-natural: Prompt as if a real photo is captured in the moment; use photography language (lens, lighting, framing); call for real texture; avoid over-stylized polish unless requested.
- product-mockup: Describe the product/packaging and materials; ensure clean silhouette and label clarity; if in-image text is needed, require verbatim rendering and specify typography.
- ui-mockup: Describe the target fidelity first (shippable mockup or low-fi wireframe), then focus on layout, hierarchy, and practical UI elements; avoid concept-art language.
- infographic-diagram: Define the audience and layout flow; label parts explicitly; require verbatim text; prefer higher quality in CLI mode for dense labels.
- logo-brand: Keep it simple and scalable; ask for a strong silhouette and balanced negative space; avoid decorative flourishes unless requested.
- ads-marketing: Write like a creative brief; include brand positioning, audience, desired vibe, scene, and exact tagline if text must appear.
- productivity-visual: Name the exact artifact (slide, chart, workflow diagram), define the canvas and hierarchy, provide real labels/data, and ask for readable typography and polished spacing.
- scientific-educational: Define audience, lesson objective, required labels, scientific constraints, arrows, and scan-friendly whitespace.
- illustration-story: Define panels or scene beats; keep each action concrete.
- stylized-concept: Specify style cues, material finish, and rendering approach (3D, painterly, clay) without inventing new story elements.
- historical-scene: State the location/date and required period accuracy; constrain clothing, props, and environment to match the era.
Edit:
- text-localization: Change only the text; preserve layout, typography, spacing, and hierarchy; no extra words or reflow unless needed.
- identity-preserve: Lock identity (face, body, pose, hair, expression); change only the specified elements; match lighting and shadows.
- precise-object-edit: Specify exactly what to remove/replace; preserve surrounding texture and lighting; keep everything else unchanged.
- lighting-weather: Change only environmental conditions (light, shadows, atmosphere, precipitation); keep geometry, framing, and subject identity.
- background-extraction: For simple opaque subjects, request a clean cutout on a perfectly flat chroma-key background; crisp silhouette; generous padding; no shadows; no halos; preserve label text exactly; no restyling. Ask before using true CLI transparency for complex subjects.
- style-transfer: Specify style cues to preserve (palette, texture, brushwork) and what must change; add `no extra elements` to prevent drift.
- compositing: Reference inputs by index; specify what moves where; match lighting, perspective, and scale; keep the base framing unchanged.
- sketch-to-render: Preserve layout, proportions, and perspective; choose materials and lighting that support the supplied sketch without adding new elements.
## Where to find copy/paste recipes
For copy/paste prompt specs (examples only), see `references/sample-prompts.md`. This file focuses on principles, specificity, and iteration patterns.

View File

@@ -1,433 +0,0 @@
# Sample prompts (copy/paste)
These prompt recipes are shared across both top-level modes of the skill:
- built-in `image_gen` tool (default)
- `scripts/image_gen.py` CLI fallback for explicit CLI/API/model requests or user-confirmed true-transparent-output fallback requests
Use these as starting points. They are intentionally complete prompt recipes, not the default amount of augmentation to add to every user request.
When adapting a user's prompt:
- keep user-provided requirements
- only add detail according to the specificity policy in `SKILL.md`
- do not treat every example below as permission to invent extra story elements
The labeled lines are prompt scaffolding, not a closed schema. In particular, `Asset type` and `Input images` exist only in the prompt text; the CLI does not expose them as dedicated flags.
Execution details such as explicit CLI flags, `quality`, `input_fidelity`, masks, output formats, and local output paths depend on mode. Use the built-in tool by default, including simple transparent-image requests. For transparent images, prompt for a flat chroma-key background and remove it locally with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py"`; only apply CLI-specific controls when the user explicitly opts into fallback mode or explicitly confirms that the transparent request should use true CLI transparency.
CLI model notes:
- `gpt-image-2` is the fallback CLI default for new workflows.
- `gpt-image-2` supports `quality` values `low`, `medium`, `high`, and `auto`.
- For 4K-style `gpt-image-2` output, use `3840x2160` or `2160x3840`.
- If transparent output needs true CLI fallback, ask before using `gpt-image-1.5` unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Explain that built-in chroma-key removal is the default path, but `gpt-image-2` does not support `background=transparent`.
- Do not set `input_fidelity` with `gpt-image-2`; image inputs already use high fidelity.
For prompting principles (structure, specificity, invariants, iteration), see `references/prompting.md`.
## Generate
### photorealistic-natural
```
Use case: photorealistic-natural
Primary request: candid photo of an elderly sailor on a small fishing boat adjusting a net
Scene/backdrop: coastal water with soft haze
Subject: weathered skin with wrinkles and sun texture
Style/medium: photorealistic candid photo
Composition/framing: medium close-up, eye-level
Lighting/mood: soft coastal daylight, shallow depth of field, subtle film grain
Materials/textures: real skin texture, worn fabric, salt-worn wood
Constraints: natural color balance; no heavy retouching; no glamorization; no watermark
Avoid: studio polish; staged look
```
### product-mockup
```
Use case: product-mockup
Primary request: premium product photo of a matte black shampoo bottle with a minimal label
Scene/backdrop: clean studio gradient from light gray to white
Subject: single bottle centered with subtle reflection
Style/medium: premium product photography
Composition/framing: centered, slight three-quarter angle, generous padding
Lighting/mood: softbox lighting, clean highlights, controlled shadows
Materials/textures: matte plastic, crisp label printing
Constraints: no logos or trademarks; no watermark
```
### ui-mockup
```
Use case: ui-mockup
Primary request: mobile app home screen for a local farmers market with vendors and daily specials
Asset type: mobile app screen
Style/medium: realistic product UI, not concept art
Composition/framing: clean vertical mobile layout with clear hierarchy
Constraints: practical layout, clear typography, no logos or trademarks, no watermark
```
### infographic-diagram
```
Use case: infographic-diagram
Primary request: detailed infographic of an automatic coffee machine flow
Scene/backdrop: clean, light neutral background
Subject: bean hopper -> grinder -> brew group -> boiler -> water tank -> drip tray
Style/medium: clean vector-like infographic with clear callouts and arrows
Composition/framing: vertical poster layout, top-to-bottom flow
Text (verbatim): "Bean Hopper", "Grinder", "Brew Group", "Boiler", "Water Tank", "Drip Tray"
Constraints: clear labels, strong contrast, no logos or trademarks, no watermark
```
### scientific-educational
```
Use case: scientific-educational
Primary request: biology diagram titled "Cellular Respiration at a Glance" for high school students
Scene/backdrop: clean white classroom handout background
Subject: glucose turns into energy inside a cell; include glycolysis, Krebs cycle, and electron transport chain
Style/medium: flat scientific diagram with consistent icons, arrows, and readable labels
Composition/framing: landscape slide-style layout with clear hierarchy and generous whitespace
Text (verbatim): "Cellular Respiration at a Glance", "Glucose", "Pyruvate", "ATP", "NADH", "FADH2", "CO2", "O2", "H2O"
Constraints: scientifically plausible; avoid tiny text; no extra decoration; no watermark
```
### logo-brand
```
Use case: logo-brand
Primary request: original logo for "Field & Flour", a local bakery
Style/medium: vector logo mark; flat colors; minimal
Composition/framing: single centered logo on a plain background with generous padding
Constraints: strong silhouette, balanced negative space; original design only; no gradients unless essential; no trademarks; no watermark
```
### illustration-story
```
Use case: illustration-story
Primary request: 4-panel comic about a pet left alone at home
Scene/backdrop: cozy living room across panels
Subject: pet reacting to the owner leaving, then relaxing, then returning to a composed pose
Style/medium: comic illustration with clear panels
Composition/framing: 4 equal-sized vertical panels, readable actions per panel
Constraints: no text; no logos or trademarks; no watermark
```
### stylized-concept
```
Use case: stylized-concept
Primary request: cavernous hangar interior with tall support beams and drifting fog
Scene/backdrop: industrial hangar interior, deep scale, light haze
Subject: compact shuttle parked near the center
Style/medium: cinematic concept art, industrial realism
Composition/framing: wide-angle, low-angle
Lighting/mood: volumetric light rays cutting through fog
Constraints: no logos or trademarks; no watermark
```
### ads-marketing
```
Use case: ads-marketing
Primary request: campaign image for a streetwear brand called Thread
Subject: group of friends hanging out together in a stylish urban setting
Style/medium: polished youth streetwear campaign photography
Composition/framing: vertical ad layout with natural poses and integrated headline space
Lighting/mood: contemporary, energetic, tasteful
Text (verbatim): "Yours to Create."
Constraints: render the tagline exactly once; clean legible typography; no extra text; no watermarks; no unrelated logos
```
### productivity-visual
```
Use case: productivity-visual
Primary request: one pitch-deck slide titled "Market Opportunity"
Asset type: fundraising slide image
Style/medium: clean modern deck slide, white background, crisp sans-serif typography
Subject: TAM/SAM/SOM concentric-circle diagram plus a small growth bar chart from 2021 to 2026
Composition/framing: 16:9 landscape slide, clear data hierarchy, polished spacing
Text (verbatim): "Market Opportunity", "TAM: $42B", "SAM: $8.7B", "SOM: $340M", "AGI Research, 2024", "Internal analysis"
Constraints: readable labels, no clip art, no stock photography, no decorative clutter, no watermark
```
### historical-scene
```
Use case: historical-scene
Primary request: outdoor crowd scene in Bethel, New York on August 16, 1969
Scene/backdrop: open field with period-appropriate staging
Subject: crowd in period-accurate clothing, authentic environment
Style/medium: photorealistic photo
Composition/framing: wide shot, eye-level
Constraints: period-accurate details; no modern objects; no logos or trademarks; no watermark
```
## Asset type templates (taxonomy-aligned)
### Website assets template
```
Use case: <photorealistic-natural|stylized-concept|product-mockup|infographic-diagram|ui-mockup>
Asset type: <hero image / section illustration / blog header>
Primary request: <short description>
Scene/backdrop: <environment or abstract backdrop>
Subject: <main subject>
Style/medium: <photo/illustration/3D>
Composition/framing: <wide/centered; note usable negative space only if needed>
Lighting/mood: <soft/bright/neutral>
Color palette: <brand colors or neutral>
Constraints: <no text; no logos; no watermark; leave room for UI if needed>
```
### Website assets example: minimal hero background
```
Use case: stylized-concept
Asset type: landing page hero background
Primary request: minimal abstract background with a soft gradient and subtle texture
Style/medium: matte illustration / soft-rendered abstract background
Composition/framing: wide composition with usable negative space for page copy
Lighting/mood: gentle studio glow
Color palette: restrained neutral palette
Constraints: no text; no logos; no watermark
```
### Website assets example: feature section illustration
```
Use case: stylized-concept
Asset type: feature section illustration
Primary request: simple abstract shapes suggesting connection and flow
Scene/backdrop: subtle light-gray backdrop with faint texture
Style/medium: flat illustration; soft shadows; restrained contrast
Composition/framing: centered cluster; open margins for UI
Color palette: muted neutral palette
Constraints: no text; no logos; no watermark
```
### Website assets example: blog header image
```
Use case: photorealistic-natural
Asset type: blog header image
Primary request: overhead desk scene with notebook, pen, and coffee cup
Scene/backdrop: warm wooden tabletop
Style/medium: photorealistic photo
Composition/framing: wide crop with clean room for page copy
Lighting/mood: soft morning light
Constraints: no text; no logos; no watermark
```
### Game assets template
```
Use case: stylized-concept
Asset type: <game environment concept art / game character concept / game UI icon / tileable game texture>
Primary request: <biome/scene/character/icon/material>
Scene/backdrop: <location + set dressing> (if applicable)
Subject: <main focal element(s)>
Style/medium: <realistic/stylized>; <concept art / character render / UI icon / texture>
Composition/framing: <wide/establishing/top-down>; <camera angle>; <focal point placement>
Lighting/mood: <time of day>; <mood>; <volumetric/fog/etc>
Constraints: no logos or trademarks; no watermark
```
### Game assets example: environment concept art
```
Use case: stylized-concept
Asset type: game environment concept art
Primary request: cavernous hangar interior with tall support beams and drifting fog
Scene/backdrop: industrial hangar interior, deep scale, light haze
Subject: compact shuttle parked near the center
Style/medium: cinematic concept art, industrial realism
Composition/framing: wide-angle, low-angle
Lighting/mood: volumetric light rays cutting through fog
Constraints: no logos or trademarks; no watermark
```
### Game assets example: character concept
```
Use case: stylized-concept
Asset type: game character concept
Primary request: desert scout character with layered travel gear
Subject: long coat, satchel, practical travel clothing
Style/medium: character render; stylized realism
Composition/framing: neutral hero pose on a simple backdrop
Constraints: no logos or trademarks; no watermark
```
### Game assets example: UI icon
```
Use case: stylized-concept
Asset type: game UI icon
Primary request: round shield icon with a subtle rune pattern
Style/medium: painted game UI icon
Composition/framing: centered icon; generous padding; clear silhouette
Constraints: no text; no background scene elements; no logos or trademarks; no watermark
```
### Game assets example: tileable texture
```
Use case: stylized-concept
Asset type: tileable game texture
Primary request: worn sandstone blocks
Style/medium: seamless tileable texture; PBR-ish look
Scene/backdrop: neutral lighting reference only
Constraints: seamless edges; no obvious focal elements; no text; no logos or trademarks; no watermark
```
### Wireframe template
```
Use case: ui-mockup
Asset type: website wireframe
Primary request: <page or flow to sketch>
Style/medium: low-fi grayscale wireframe
Composition/framing: <landscape or portrait to match expected device>
Subject: <sections in order; grid/columns; key labels>
Constraints: no color; no logos; no real photos; no watermark
```
### Wireframe example: homepage (desktop)
```
Use case: ui-mockup
Asset type: website wireframe
Primary request: SaaS homepage layout with clear hierarchy
Style/medium: low-fi grayscale wireframe
Subject: top nav; hero with headline and CTA; three feature cards; testimonial strip; pricing preview; footer
Composition/framing: landscape desktop layout
Constraints: label major blocks; no color; no logos; no real photos; no watermark
```
### Wireframe example: pricing page
```
Use case: ui-mockup
Asset type: website wireframe
Primary request: pricing page layout with comparison table
Style/medium: low-fi grayscale wireframe
Subject: header; plan toggle; 3 pricing cards; comparison table; FAQ accordion; footer
Composition/framing: desktop or tablet layout
Constraints: label key areas; no color; no logos; no real photos; no watermark
```
### Wireframe example: mobile onboarding flow
```
Use case: ui-mockup
Asset type: mobile onboarding wireframe
Primary request: three-screen mobile onboarding flow
Style/medium: low-fi grayscale wireframe
Subject: screen 1 headline and CTA; screen 2 feature bullets; screen 3 form fields and CTA
Composition/framing: portrait mobile layout
Constraints: label screens and blocks; no color; no logos; no real photos; no watermark
```
### Logo template
```
Use case: logo-brand
Asset type: logo concept
Primary request: <brand idea or symbol concept>
Style/medium: vector logo mark; flat colors; minimal
Composition/framing: centered mark; clear silhouette; generous margin
Color palette: <1-2 colors; high contrast>
Text (verbatim): "<exact name>" (only if needed)
Constraints: no gradients; no mockups; no 3D; no watermark
```
### Logo example: abstract symbol mark
```
Use case: logo-brand
Asset type: logo concept
Primary request: geometric leaf symbol suggesting sustainability and growth
Style/medium: vector logo mark; flat colors; minimal
Composition/framing: centered mark; clear silhouette
Color palette: deep green and off-white
Constraints: no text unless requested; no gradients; no mockups; no 3D; no watermark
```
### Logo example: monogram mark
```
Use case: logo-brand
Asset type: logo concept
Primary request: interlocking monogram of the letters "AV"
Style/medium: vector logo mark; flat colors; minimal
Composition/framing: centered mark; balanced spacing
Color palette: black on white
Constraints: no gradients; no mockups; no 3D; no watermark
```
### Logo example: wordmark
```
Use case: logo-brand
Asset type: logo concept
Primary request: clean wordmark for a modern studio
Style/medium: vector wordmark; flat colors; minimal
Text (verbatim): "Studio North"
Composition/framing: centered text; even letter spacing
Constraints: no gradients; no mockups; no 3D; no watermark
```
## Edit
### text-localization
```
Use case: text-localization
Input images: Image 1: original infographic
Primary request: replace "Bean Hopper", "Grinder", "Brew Group", "Boiler", "Water Tank", and "Drip Tray" with "Tolva", "Molino", "Grupo de infusión", "Caldera", "Depósito de agua", and "Bandeja de goteo"
Constraints: change only the text; preserve layout, typography, spacing, and hierarchy; no extra words; do not alter logos or imagery
```
### identity-preserve
```
Use case: identity-preserve
Input images: Image 1: person photo; Image 2..N: clothing references
Primary request: replace only the clothing with the provided garments
Constraints: preserve face, body shape, pose, hair, expression, and identity; match lighting and shadows; keep the background unchanged; no accessories or text
```
### precise-object-edit
```
Use case: precise-object-edit
Input images: Image 1: room photo
Primary request: replace only the white chairs with wooden chairs
Constraints: preserve camera angle, room lighting, floor shadows, and surrounding objects; keep all other aspects unchanged
```
### lighting-weather
```
Use case: lighting-weather
Input images: Image 1: original photo
Primary request: make it look like a winter evening with gentle snowfall
Constraints: preserve subject identity, geometry, camera angle, and composition; change only lighting, atmosphere, and weather
```
### background-extraction
```
Use case: background-extraction
Input images: Image 1: product photo
Primary request: isolate the product on a clean transparent background
Scene/backdrop: perfectly flat solid #00ff00 chroma-key background for local background removal
Constraints: background must be one uniform color with no shadows, gradients, texture, reflections, floor plane, or lighting variation; crisp silhouette; generous padding; no halos or fringing; preserve label text exactly; no restyling; do not use #00ff00 anywhere in the subject
```
Post-process note: after built-in generation, run `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" --input <source> --out <final.png> --auto-key border --soft-matte --transparent-threshold 12 --opaque-threshold 220 --despill`. Ask before using CLI `gpt-image-1.5 --background transparent --output-format png` for true/native transparency, failed chroma-key validation, or complex subjects such as hair, fur, glass, smoke, liquids, translucent materials, reflections, or soft shadows, unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
### style-transfer
```
Use case: style-transfer
Input images: Image 1: style reference
Primary request: apply Image 1's visual style to a man riding a motorcycle on a plain white backdrop
Constraints: preserve palette, texture, and brushwork; no extra elements
```
### compositing
```
Use case: compositing
Input images: Image 1: base scene; Image 2: subject to insert
Primary request: place the subject from Image 2 next to the person in Image 1
Constraints: match lighting, perspective, and scale; keep the base framing unchanged; no extra elements
```
### character consistency workflow
```
Use case: identity-preserve
Input images: Image 1: previous character anchor illustration
Primary request: continue the story with the same character in a new scene and action
Scene/backdrop: snowy forest after a winter storm
Subject: same young forest hero gently helping a frightened squirrel out of a fallen tree
Style/medium: same children's book watercolor illustration style as Image 1
Constraints: do not redesign the character; preserve facial features, proportions, outfit, color palette, and personality; no text; no watermark
```
### sketch-to-render
```
Use case: sketch-to-render
Input images: Image 1: drawing
Primary request: turn the drawing into a photorealistic image
Constraints: preserve layout, proportions, and perspective; choose realistic materials and lighting; do not add new elements or text
```

View File

@@ -1,995 +0,0 @@
#!/usr/bin/env python3
"""Fallback CLI for explicit image generation or editing with GPT Image models.
Used only when the user explicitly opts into CLI fallback mode, or when explicit
transparent output requires the `gpt-image-1.5` fallback path.
Defaults to gpt-image-2 and a structured prompt augmentation workflow.
"""
from __future__ import annotations
import argparse
import asyncio
import base64
import json
import os
from pathlib import Path
import re
import sys
import time
from typing import Any, Dict, Iterable, List, Optional, Tuple
from io import BytesIO
DEFAULT_MODEL = "gpt-image-2"
DEFAULT_SIZE = "auto"
DEFAULT_QUALITY = "medium"
DEFAULT_OUTPUT_FORMAT = "png"
DEFAULT_CONCURRENCY = 5
DEFAULT_DOWNSCALE_SUFFIX = "-web"
DEFAULT_OUTPUT_PATH = "output/imagegen/output.png"
GPT_IMAGE_MODEL_PREFIX = "gpt-image-"
ALLOWED_LEGACY_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
ALLOWED_QUALITIES = {"low", "medium", "high", "auto"}
ALLOWED_BACKGROUNDS = {"transparent", "opaque", "auto", None}
ALLOWED_INPUT_FIDELITIES = {"low", "high", None}
GPT_IMAGE_2_MODEL = "gpt-image-2"
GPT_IMAGE_2_MIN_PIXELS = 655_360
GPT_IMAGE_2_MAX_PIXELS = 8_294_400
GPT_IMAGE_2_MAX_EDGE = 3840
GPT_IMAGE_2_MAX_RATIO = 3.0
MAX_IMAGE_BYTES = 50 * 1024 * 1024
MAX_BATCH_JOBS = 500
def _die(message: str, code: int = 1) -> None:
print(f"Error: {message}", file=sys.stderr)
raise SystemExit(code)
def _warn(message: str) -> None:
print(f"Warning: {message}", file=sys.stderr)
def _dependency_hint(package: str, *, upgrade: bool = False) -> str:
command = f"uv pip install {'-U ' if upgrade else ''}{package}"
return (
"Activate the repo-selected environment first, then install it with "
f"`{command}`. If this repo uses a local virtualenv, start with "
"`source .venv/bin/activate`; otherwise use this repo's configured shared fallback "
"environment. If your project declares dependencies, prefer that project's normal "
"`uv sync` flow."
)
def _ensure_api_key(dry_run: bool) -> None:
if os.getenv("OPENAI_API_KEY"):
print("OPENAI_API_KEY is set.", file=sys.stderr)
return
if dry_run:
_warn("OPENAI_API_KEY is not set; dry-run only.")
return
_die("OPENAI_API_KEY is not set. Export it before running.")
def _read_prompt(prompt: Optional[str], prompt_file: Optional[str]) -> str:
if prompt and prompt_file:
_die("Use --prompt or --prompt-file, not both.")
if prompt_file:
path = Path(prompt_file)
if not path.exists():
_die(f"Prompt file not found: {path}")
return path.read_text(encoding="utf-8").strip()
if prompt:
return prompt.strip()
_die("Missing prompt. Use --prompt or --prompt-file.")
return "" # unreachable
def _check_image_paths(paths: Iterable[str]) -> List[Path]:
resolved: List[Path] = []
for raw in paths:
path = Path(raw)
if not path.exists():
_die(f"Image file not found: {path}")
if path.stat().st_size > MAX_IMAGE_BYTES:
_warn(f"Image exceeds 50MB limit: {path}")
resolved.append(path)
return resolved
def _normalize_output_format(fmt: Optional[str]) -> str:
if not fmt:
return DEFAULT_OUTPUT_FORMAT
fmt = fmt.lower()
if fmt not in {"png", "jpeg", "jpg", "webp"}:
_die("output-format must be png, jpeg, jpg, or webp.")
return "jpeg" if fmt == "jpg" else fmt
def _parse_size(size: str) -> Optional[Tuple[int, int]]:
match = re.fullmatch(r"([1-9][0-9]*)x([1-9][0-9]*)", size)
if not match:
return None
return int(match.group(1)), int(match.group(2))
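# gpt-image-2 size rules enforced below: width and height must be multiples of 16,
# long:short ratio <= 3:1, max edge 3840px, and total pixels in
# [655,360 .. 8,294,400]. Example: 3840x2160 passes (multiples of 16,
# ratio ~1.78, 8,294,400 px == the maximum).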
def _validate_gpt_image_2_size(size: str) -> None:
if size == "auto":
return
parsed = _parse_size(size)
if parsed is None:
_die("size must be auto or WIDTHxHEIGHT, for example 1024x1024.")
width, height = parsed
max_edge = max(width, height)
min_edge = min(width, height)
total_pixels = width * height
if max_edge > GPT_IMAGE_2_MAX_EDGE:
_die("gpt-image-2 size maximum edge length must be less than or equal to 3840px.")
if width % 16 != 0 or height % 16 != 0:
_die("gpt-image-2 size width and height must be multiples of 16px.")
if max_edge / min_edge > GPT_IMAGE_2_MAX_RATIO:
_die("gpt-image-2 size long edge to short edge ratio must not exceed 3:1.")
if total_pixels < GPT_IMAGE_2_MIN_PIXELS or total_pixels > GPT_IMAGE_2_MAX_PIXELS:
_die(
"gpt-image-2 size total pixels must be at least 655,360 and no more than 8,294,400."
)
def _validate_size(size: str, model: str) -> None:
if model == GPT_IMAGE_2_MODEL:
_validate_gpt_image_2_size(size)
return
if size not in ALLOWED_LEGACY_SIZES:
_die(
"size must be one of 1024x1024, 1536x1024, 1024x1536, or auto for this GPT Image model."
)
def _validate_quality(quality: str) -> None:
if quality not in ALLOWED_QUALITIES:
_die("quality must be one of low, medium, high, or auto.")
def _validate_background(background: Optional[str]) -> None:
if background not in ALLOWED_BACKGROUNDS:
_die("background must be one of transparent, opaque, or auto.")
def _validate_input_fidelity(input_fidelity: Optional[str]) -> None:
if input_fidelity not in ALLOWED_INPUT_FIDELITIES:
_die("input-fidelity must be one of low or high.")
def _validate_model(model: str) -> None:
if not model.startswith(GPT_IMAGE_MODEL_PREFIX):
_die(
"model must be a GPT Image model (for example gpt-image-1.5, gpt-image-1, or gpt-image-1-mini)."
)
def _validate_transparency(background: Optional[str], output_format: str) -> None:
if background == "transparent" and output_format not in {"png", "webp"}:
_die("transparent background requires output-format png or webp.")
def _validate_model_specific_options(
*,
model: str,
background: Optional[str],
input_fidelity: Optional[str] = None,
) -> None:
if model != GPT_IMAGE_2_MODEL:
return
if background == "transparent":
_die(
"transparent backgrounds are not supported in gpt-image-2, the latest model. "
"Use --model gpt-image-1.5 --background transparent --output-format png instead."
)
if input_fidelity is not None:
_die(
"input_fidelity is not supported in gpt-image-2 because image inputs always use high fidelity for this model."
)
def _validate_generate_payload(payload: Dict[str, Any]) -> None:
model = str(payload.get("model", DEFAULT_MODEL))
_validate_model(model)
n = int(payload.get("n", 1))
if n < 1 or n > 10:
_die("n must be between 1 and 10")
size = str(payload.get("size", DEFAULT_SIZE))
quality = str(payload.get("quality", DEFAULT_QUALITY))
background = payload.get("background")
_validate_size(size, model)
_validate_quality(quality)
_validate_background(background)
_validate_model_specific_options(model=model, background=background)
oc = payload.get("output_compression")
if oc is not None and not (0 <= int(oc) <= 100):
_die("output_compression must be between 0 and 100")
def _build_output_paths(
out: str,
output_format: str,
count: int,
out_dir: Optional[str],
) -> List[Path]:
ext = "." + output_format
if out_dir:
out_base = Path(out_dir)
out_base.mkdir(parents=True, exist_ok=True)
return [out_base / f"image_{i}{ext}" for i in range(1, count + 1)]
out_path = Path(out)
    if out_path.is_dir():
return [out_path / f"image_{i}{ext}" for i in range(1, count + 1)]
if out_path.suffix == "":
out_path = out_path.with_suffix(ext)
elif output_format and out_path.suffix.lstrip(".").lower() != output_format:
_warn(
f"Output extension {out_path.suffix} does not match output-format {output_format}."
)
if count == 1:
return [out_path]
return [
out_path.with_name(f"{out_path.stem}-{i}{out_path.suffix}")
for i in range(1, count + 1)
]
def _augment_prompt(args: argparse.Namespace, prompt: str) -> str:
fields = _fields_from_args(args)
return _augment_prompt_fields(args.augment, prompt, fields)
def _augment_prompt_fields(augment: bool, prompt: str, fields: Dict[str, Optional[str]]) -> str:
if not augment:
return prompt
sections: List[str] = []
if fields.get("use_case"):
sections.append(f"Use case: {fields['use_case']}")
sections.append(f"Primary request: {prompt}")
if fields.get("scene"):
sections.append(f"Scene/background: {fields['scene']}")
if fields.get("subject"):
sections.append(f"Subject: {fields['subject']}")
if fields.get("style"):
sections.append(f"Style/medium: {fields['style']}")
if fields.get("composition"):
sections.append(f"Composition/framing: {fields['composition']}")
if fields.get("lighting"):
sections.append(f"Lighting/mood: {fields['lighting']}")
if fields.get("palette"):
sections.append(f"Color palette: {fields['palette']}")
if fields.get("materials"):
sections.append(f"Materials/textures: {fields['materials']}")
if fields.get("text"):
sections.append(f"Text (verbatim): \"{fields['text']}\"")
if fields.get("constraints"):
sections.append(f"Constraints: {fields['constraints']}")
if fields.get("negative"):
sections.append(f"Avoid: {fields['negative']}")
return "\n".join(sections)
def _fields_from_args(args: argparse.Namespace) -> Dict[str, Optional[str]]:
return {
"use_case": getattr(args, "use_case", None),
"scene": getattr(args, "scene", None),
"subject": getattr(args, "subject", None),
"style": getattr(args, "style", None),
"composition": getattr(args, "composition", None),
"lighting": getattr(args, "lighting", None),
"palette": getattr(args, "palette", None),
"materials": getattr(args, "materials", None),
"text": getattr(args, "text", None),
"constraints": getattr(args, "constraints", None),
"negative": getattr(args, "negative", None),
}
def _print_request(payload: dict) -> None:
print(json.dumps(payload, indent=2, sort_keys=True))
def _decode_and_write(images: List[str], outputs: List[Path], force: bool) -> None:
for idx, image_b64 in enumerate(images):
if idx >= len(outputs):
break
out_path = outputs[idx]
if out_path.exists() and not force:
_die(f"Output already exists: {out_path} (use --force to overwrite)")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_bytes(base64.b64decode(image_b64))
print(f"Wrote {out_path}")
def _derive_downscale_path(path: Path, suffix: str) -> Path:
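    # e.g. ("hero.png", "-web") -> "hero-web.png"; a bare "web" suffix is normalized to "-web".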
if suffix and not suffix.startswith("-") and not suffix.startswith("_"):
suffix = "-" + suffix
return path.with_name(f"{path.stem}{suffix}{path.suffix}")
def _downscale_image_bytes(image_bytes: bytes, *, max_dim: int, output_format: str) -> bytes:
try:
from PIL import Image
except Exception:
_die(f"Downscaling requires Pillow. {_dependency_hint('pillow')}")
if max_dim < 1:
_die("--downscale-max-dim must be >= 1")
with Image.open(BytesIO(image_bytes)) as img:
img.load()
w, h = img.size
scale = min(1.0, float(max_dim) / float(max(w, h)))
target = (max(1, int(round(w * scale))), max(1, int(round(h * scale))))
resized = img if target == (w, h) else img.resize(target, Image.Resampling.LANCZOS)
fmt = output_format.lower()
if fmt == "jpg":
fmt = "jpeg"
if fmt == "jpeg":
if resized.mode in ("RGBA", "LA") or ("transparency" in getattr(resized, "info", {})):
bg = Image.new("RGB", resized.size, (255, 255, 255))
bg.paste(resized.convert("RGBA"), mask=resized.convert("RGBA").split()[-1])
resized = bg
else:
resized = resized.convert("RGB")
out = BytesIO()
resized.save(out, format=fmt.upper())
return out.getvalue()
def _decode_write_and_downscale(
images: List[str],
outputs: List[Path],
*,
force: bool,
downscale_max_dim: Optional[int],
downscale_suffix: str,
output_format: str,
) -> None:
for idx, image_b64 in enumerate(images):
if idx >= len(outputs):
break
out_path = outputs[idx]
if out_path.exists() and not force:
_die(f"Output already exists: {out_path} (use --force to overwrite)")
out_path.parent.mkdir(parents=True, exist_ok=True)
raw = base64.b64decode(image_b64)
out_path.write_bytes(raw)
print(f"Wrote {out_path}")
if downscale_max_dim is None:
continue
derived = _derive_downscale_path(out_path, downscale_suffix)
if derived.exists() and not force:
_die(f"Output already exists: {derived} (use --force to overwrite)")
derived.parent.mkdir(parents=True, exist_ok=True)
resized = _downscale_image_bytes(raw, max_dim=downscale_max_dim, output_format=output_format)
derived.write_bytes(resized)
print(f"Wrote {derived}")
def _create_client():
try:
from openai import OpenAI
except ImportError:
_die(f"openai SDK not installed in the active environment. {_dependency_hint('openai')}")
return OpenAI()
def _create_async_client():
try:
from openai import AsyncOpenAI
except ImportError:
try:
import openai as _openai # noqa: F401
except ImportError:
_die(
f"openai SDK not installed in the active environment. {_dependency_hint('openai')}"
)
_die(
"AsyncOpenAI not available in this openai SDK version. "
f"{_dependency_hint('openai', upgrade=True)}"
)
return AsyncOpenAI()
def _slugify(value: str) -> str:
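    # e.g. "A Cozy Cabin!" -> "a-cozy-cabin"; empty results fall back to "job".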
value = value.strip().lower()
value = re.sub(r"[^a-z0-9]+", "-", value)
value = re.sub(r"-{2,}", "-", value).strip("-")
return value[:60] if value else "job"
def _normalize_job(job: Any, idx: int) -> Dict[str, Any]:
if isinstance(job, str):
prompt = job.strip()
if not prompt:
_die(f"Empty prompt at job {idx}")
return {"prompt": prompt}
if isinstance(job, dict):
if "prompt" not in job or not str(job["prompt"]).strip():
_die(f"Missing prompt for job {idx}")
return job
_die(f"Invalid job at index {idx}: expected string or object.")
return {} # unreachable
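# JSONL jobs: each non-empty, non-comment line is either a bare prompt string or a
# JSON object with "prompt" plus optional per-job overrides (n, size, quality, out,
# fields, ...), e.g. {"prompt": "studio shot of a teapot", "n": 2}.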
def _read_jobs_jsonl(path: str) -> List[Dict[str, Any]]:
p = Path(path)
if not p.exists():
_die(f"Input file not found: {p}")
jobs: List[Dict[str, Any]] = []
for line_no, raw in enumerate(p.read_text(encoding="utf-8").splitlines(), start=1):
line = raw.strip()
if not line or line.startswith("#"):
continue
try:
item: Any
if line.startswith("{"):
item = json.loads(line)
else:
item = line
jobs.append(_normalize_job(item, idx=line_no))
except json.JSONDecodeError as exc:
_die(f"Invalid JSON on line {line_no}: {exc}")
if not jobs:
_die("No jobs found in input file.")
if len(jobs) > MAX_BATCH_JOBS:
_die(f"Too many jobs ({len(jobs)}). Max is {MAX_BATCH_JOBS}.")
return jobs
def _merge_non_null(dst: Dict[str, Any], src: Dict[str, Any]) -> Dict[str, Any]:
merged = dict(dst)
for k, v in src.items():
if v is not None:
merged[k] = v
return merged
def _job_output_paths(
*,
out_dir: Path,
output_format: str,
idx: int,
prompt: str,
n: int,
explicit_out: Optional[str],
) -> List[Path]:
out_dir.mkdir(parents=True, exist_ok=True)
ext = "." + output_format
if explicit_out:
base = Path(explicit_out)
if base.suffix == "":
base = base.with_suffix(ext)
elif base.suffix.lstrip(".").lower() != output_format:
_warn(
f"Job {idx}: output extension {base.suffix} does not match output-format {output_format}."
)
base = out_dir / base.name
else:
slug = _slugify(prompt[:80])
base = out_dir / f"{idx:03d}-{slug}{ext}"
if n == 1:
return [base]
return [
base.with_name(f"{base.stem}-{i}{base.suffix}")
for i in range(1, n + 1)
]
def _extract_retry_after_seconds(exc: Exception) -> Optional[float]:
# Best-effort: openai SDK errors vary by version. Prefer a conservative fallback.
for attr in ("retry_after", "retry_after_seconds"):
val = getattr(exc, attr, None)
if isinstance(val, (int, float)) and val >= 0:
return float(val)
msg = str(exc)
m = re.search(r"retry[- ]after[:= ]+([0-9]+(?:\\.[0-9]+)?)", msg, re.IGNORECASE)
if m:
try:
return float(m.group(1))
except Exception:
return None
return None
def _is_rate_limit_error(exc: Exception) -> bool:
name = exc.__class__.__name__.lower()
if "ratelimit" in name or "rate_limit" in name:
return True
msg = str(exc).lower()
return "429" in msg or "rate limit" in msg or "too many requests" in msg
def _is_transient_error(exc: Exception) -> bool:
if _is_rate_limit_error(exc):
return True
name = exc.__class__.__name__.lower()
if "timeout" in name or "timedout" in name or "tempor" in name:
return True
msg = str(exc).lower()
return "timeout" in msg or "timed out" in msg or "connection reset" in msg
async def _generate_one_with_retries(
client: Any,
payload: Dict[str, Any],
*,
attempts: int,
job_label: str,
) -> Any:
last_exc: Optional[Exception] = None
for attempt in range(1, attempts + 1):
try:
return await client.images.generate(**payload)
except Exception as exc:
last_exc = exc
if not _is_transient_error(exc):
raise
if attempt == attempts:
raise
sleep_s = _extract_retry_after_seconds(exc)
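            # No Retry-After hint on the error: use capped exponential backoff (2s, 4s, 8s, ... up to 60s).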
if sleep_s is None:
sleep_s = min(60.0, 2.0**attempt)
print(
f"{job_label} attempt {attempt}/{attempts} failed ({exc.__class__.__name__}); retrying in {sleep_s:.1f}s",
file=sys.stderr,
)
await asyncio.sleep(sleep_s)
raise last_exc or RuntimeError("unknown error")
async def _run_generate_batch(args: argparse.Namespace) -> int:
jobs = _read_jobs_jsonl(args.input)
out_dir = Path(args.out_dir)
base_fields = _fields_from_args(args)
base_payload = {
"model": args.model,
"n": args.n,
"size": args.size,
"quality": args.quality,
"background": args.background,
"output_format": args.output_format,
"output_compression": args.output_compression,
"moderation": args.moderation,
}
if args.dry_run:
for i, job in enumerate(jobs, start=1):
prompt = str(job["prompt"]).strip()
fields = _merge_non_null(base_fields, job.get("fields", {}))
# Allow flat job keys as well (use_case, scene, etc.)
fields = _merge_non_null(fields, {k: job.get(k) for k in base_fields.keys()})
augmented = _augment_prompt_fields(args.augment, prompt, fields)
job_payload = dict(base_payload)
job_payload["prompt"] = augmented
job_payload = _merge_non_null(job_payload, {k: job.get(k) for k in base_payload.keys()})
job_payload = {k: v for k, v in job_payload.items() if v is not None}
_validate_generate_payload(job_payload)
effective_output_format = _normalize_output_format(job_payload.get("output_format"))
_validate_transparency(job_payload.get("background"), effective_output_format)
job_payload["output_format"] = effective_output_format
n = int(job_payload.get("n", 1))
outputs = _job_output_paths(
out_dir=out_dir,
output_format=effective_output_format,
idx=i,
prompt=prompt,
n=n,
explicit_out=job.get("out"),
)
downscaled = None
if args.downscale_max_dim is not None:
downscaled = [
str(_derive_downscale_path(p, args.downscale_suffix)) for p in outputs
]
_print_request(
{
"endpoint": "/v1/images/generations",
"job": i,
"outputs": [str(p) for p in outputs],
"outputs_downscaled": downscaled,
**job_payload,
}
)
return 0
client = _create_async_client()
sem = asyncio.Semaphore(args.concurrency)
any_failed = False
async def run_job(i: int, job: Dict[str, Any]) -> Tuple[int, Optional[str]]:
nonlocal any_failed
prompt = str(job["prompt"]).strip()
job_label = f"[job {i}/{len(jobs)}]"
fields = _merge_non_null(base_fields, job.get("fields", {}))
fields = _merge_non_null(fields, {k: job.get(k) for k in base_fields.keys()})
augmented = _augment_prompt_fields(args.augment, prompt, fields)
payload = dict(base_payload)
payload["prompt"] = augmented
payload = _merge_non_null(payload, {k: job.get(k) for k in base_payload.keys()})
payload = {k: v for k, v in payload.items() if v is not None}
n = int(payload.get("n", 1))
_validate_generate_payload(payload)
effective_output_format = _normalize_output_format(payload.get("output_format"))
_validate_transparency(payload.get("background"), effective_output_format)
payload["output_format"] = effective_output_format
outputs = _job_output_paths(
out_dir=out_dir,
output_format=effective_output_format,
idx=i,
prompt=prompt,
n=n,
explicit_out=job.get("out"),
)
try:
async with sem:
print(f"{job_label} starting", file=sys.stderr)
started = time.time()
result = await _generate_one_with_retries(
client,
payload,
attempts=args.max_attempts,
job_label=job_label,
)
elapsed = time.time() - started
print(f"{job_label} completed in {elapsed:.1f}s", file=sys.stderr)
images = [item.b64_json for item in result.data]
_decode_write_and_downscale(
images,
outputs,
force=args.force,
downscale_max_dim=args.downscale_max_dim,
downscale_suffix=args.downscale_suffix,
output_format=effective_output_format,
)
return i, None
except Exception as exc:
any_failed = True
print(f"{job_label} failed: {exc}", file=sys.stderr)
if args.fail_fast:
raise
return i, str(exc)
tasks = [asyncio.create_task(run_job(i, job)) for i, job in enumerate(jobs, start=1)]
try:
await asyncio.gather(*tasks)
except Exception:
for t in tasks:
if not t.done():
t.cancel()
raise
return 1 if any_failed else 0
def _generate_batch(args: argparse.Namespace) -> None:
exit_code = asyncio.run(_run_generate_batch(args))
if exit_code:
raise SystemExit(exit_code)
def _generate(args: argparse.Namespace) -> None:
prompt = _read_prompt(args.prompt, args.prompt_file)
prompt = _augment_prompt(args, prompt)
payload = {
"model": args.model,
"prompt": prompt,
"n": args.n,
"size": args.size,
"quality": args.quality,
"background": args.background,
"output_format": args.output_format,
"output_compression": args.output_compression,
"moderation": args.moderation,
}
payload = {k: v for k, v in payload.items() if v is not None}
output_format = _normalize_output_format(args.output_format)
_validate_transparency(args.background, output_format)
payload["output_format"] = output_format
output_paths = _build_output_paths(args.out, output_format, args.n, args.out_dir)
downscaled = None
if args.downscale_max_dim is not None:
downscaled = [str(_derive_downscale_path(p, args.downscale_suffix)) for p in output_paths]
if args.dry_run:
_print_request(
{
"endpoint": "/v1/images/generations",
"outputs": [str(p) for p in output_paths],
"outputs_downscaled": downscaled,
**payload,
}
)
return
print(
"Calling Image API (generation). This can take up to a couple of minutes.",
file=sys.stderr,
)
started = time.time()
client = _create_client()
result = client.images.generate(**payload)
elapsed = time.time() - started
print(f"Generation completed in {elapsed:.1f}s.", file=sys.stderr)
images = [item.b64_json for item in result.data]
_decode_write_and_downscale(
images,
output_paths,
force=args.force,
downscale_max_dim=args.downscale_max_dim,
downscale_suffix=args.downscale_suffix,
output_format=output_format,
)
def _edit(args: argparse.Namespace) -> None:
prompt = _read_prompt(args.prompt, args.prompt_file)
prompt = _augment_prompt(args, prompt)
image_paths = _check_image_paths(args.image)
mask_path = Path(args.mask) if args.mask else None
if mask_path:
if not mask_path.exists():
_die(f"Mask file not found: {mask_path}")
if mask_path.suffix.lower() != ".png":
_warn(f"Mask should be a PNG with an alpha channel: {mask_path}")
if mask_path.stat().st_size > MAX_IMAGE_BYTES:
_warn(f"Mask exceeds 50MB limit: {mask_path}")
payload = {
"model": args.model,
"prompt": prompt,
"n": args.n,
"size": args.size,
"quality": args.quality,
"background": args.background,
"output_format": args.output_format,
"output_compression": args.output_compression,
"input_fidelity": args.input_fidelity,
"moderation": args.moderation,
}
payload = {k: v for k, v in payload.items() if v is not None}
output_format = _normalize_output_format(args.output_format)
_validate_transparency(args.background, output_format)
payload["output_format"] = output_format
_validate_input_fidelity(args.input_fidelity)
output_paths = _build_output_paths(args.out, output_format, args.n, args.out_dir)
downscaled = None
if args.downscale_max_dim is not None:
downscaled = [str(_derive_downscale_path(p, args.downscale_suffix)) for p in output_paths]
if args.dry_run:
payload_preview = dict(payload)
payload_preview["image"] = [str(p) for p in image_paths]
if mask_path:
payload_preview["mask"] = str(mask_path)
_print_request(
{
"endpoint": "/v1/images/edits",
"outputs": [str(p) for p in output_paths],
"outputs_downscaled": downscaled,
**payload_preview,
}
)
return
print(
f"Calling Image API (edit) with {len(image_paths)} image(s).",
file=sys.stderr,
)
started = time.time()
client = _create_client()
with _open_files(image_paths) as image_files, _open_mask(mask_path) as mask_file:
request = dict(payload)
request["image"] = image_files if len(image_files) > 1 else image_files[0]
if mask_file is not None:
request["mask"] = mask_file
result = client.images.edit(**request)
elapsed = time.time() - started
print(f"Edit completed in {elapsed:.1f}s.", file=sys.stderr)
images = [item.b64_json for item in result.data]
_decode_write_and_downscale(
images,
output_paths,
force=args.force,
downscale_max_dim=args.downscale_max_dim,
downscale_suffix=args.downscale_suffix,
output_format=output_format,
)
def _open_files(paths: List[Path]):
return _FileBundle(paths)
def _open_mask(mask_path: Optional[Path]):
if mask_path is None:
return _NullContext()
return _SingleFile(mask_path)
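# Small context managers so the edits request can hold open file handles for the
# SDK call and still close them reliably: no mask, a single mask file, or a list
# of input images.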
class _NullContext:
def __enter__(self):
return None
def __exit__(self, exc_type, exc, tb):
return False
class _SingleFile:
def __init__(self, path: Path):
self._path = path
self._handle = None
def __enter__(self):
self._handle = self._path.open("rb")
return self._handle
def __exit__(self, exc_type, exc, tb):
if self._handle:
try:
self._handle.close()
except Exception:
pass
return False
class _FileBundle:
def __init__(self, paths: List[Path]):
self._paths = paths
self._handles: List[object] = []
def __enter__(self):
self._handles = [p.open("rb") for p in self._paths]
return self._handles
def __exit__(self, exc_type, exc, tb):
for handle in self._handles:
try:
handle.close()
except Exception:
pass
return False
def _add_shared_args(parser: argparse.ArgumentParser) -> None:
parser.add_argument("--model", default=DEFAULT_MODEL)
parser.add_argument("--prompt")
parser.add_argument("--prompt-file")
parser.add_argument("--n", type=int, default=1)
parser.add_argument("--size", default=DEFAULT_SIZE)
parser.add_argument("--quality", default=DEFAULT_QUALITY)
parser.add_argument("--background")
parser.add_argument("--output-format")
parser.add_argument("--output-compression", type=int)
parser.add_argument("--moderation")
parser.add_argument("--out", default=DEFAULT_OUTPUT_PATH)
parser.add_argument("--out-dir")
parser.add_argument("--force", action="store_true")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--augment", dest="augment", action="store_true")
parser.add_argument("--no-augment", dest="augment", action="store_false")
parser.set_defaults(augment=True)
# Prompt augmentation hints
parser.add_argument("--use-case")
parser.add_argument("--scene")
parser.add_argument("--subject")
parser.add_argument("--style")
parser.add_argument("--composition")
parser.add_argument("--lighting")
parser.add_argument("--palette")
parser.add_argument("--materials")
parser.add_argument("--text")
parser.add_argument("--constraints")
parser.add_argument("--negative")
# Post-processing (optional): generate an additional downscaled copy for fast web loading.
parser.add_argument("--downscale-max-dim", type=int)
parser.add_argument("--downscale-suffix", default=DEFAULT_DOWNSCALE_SUFFIX)
def main() -> int:
parser = argparse.ArgumentParser(
description="Fallback CLI for explicit image generation or editing via GPT Image models"
)
subparsers = parser.add_subparsers(dest="command", required=True)
gen_parser = subparsers.add_parser("generate", help="Create a new image")
_add_shared_args(gen_parser)
gen_parser.set_defaults(func=_generate)
batch_parser = subparsers.add_parser(
"generate-batch",
help="Generate multiple prompts concurrently (JSONL input)",
)
_add_shared_args(batch_parser)
batch_parser.add_argument("--input", required=True, help="Path to JSONL file (one job per line)")
batch_parser.add_argument("--concurrency", type=int, default=DEFAULT_CONCURRENCY)
batch_parser.add_argument("--max-attempts", type=int, default=3)
batch_parser.add_argument("--fail-fast", action="store_true")
batch_parser.set_defaults(func=_generate_batch)
edit_parser = subparsers.add_parser("edit", help="Edit an existing image")
_add_shared_args(edit_parser)
edit_parser.add_argument("--image", action="append", required=True)
edit_parser.add_argument("--mask")
edit_parser.add_argument("--input-fidelity")
edit_parser.set_defaults(func=_edit)
args = parser.parse_args()
if args.n < 1 or args.n > 10:
_die("--n must be between 1 and 10")
if getattr(args, "concurrency", 1) < 1 or getattr(args, "concurrency", 1) > 25:
_die("--concurrency must be between 1 and 25")
if getattr(args, "max_attempts", 3) < 1 or getattr(args, "max_attempts", 3) > 10:
_die("--max-attempts must be between 1 and 10")
if args.output_compression is not None and not (0 <= args.output_compression <= 100):
_die("--output-compression must be between 0 and 100")
if args.command == "generate-batch" and not args.out_dir:
_die("generate-batch requires --out-dir")
if getattr(args, "downscale_max_dim", None) is not None and args.downscale_max_dim < 1:
_die("--downscale-max-dim must be >= 1")
_validate_model(args.model)
_validate_size(args.size, args.model)
_validate_quality(args.quality)
_validate_background(args.background)
_validate_model_specific_options(
model=args.model,
background=args.background,
input_fidelity=getattr(args, "input_fidelity", None),
)
_ensure_api_key(args.dry_run)
args.func(args)
return 0
if __name__ == "__main__":
raise SystemExit(main())

View File

@@ -1,440 +0,0 @@
#!/usr/bin/env python3
"""Remove a solid chroma-key background from an image.
This helper supports the imagegen skill's built-in-first transparent workflow:
generate an image on a flat key color, then convert that key color to alpha.
"""
from __future__ import annotations
import argparse
from io import BytesIO
from pathlib import Path
import re
from statistics import median
import sys
from typing import Tuple
Color = Tuple[int, int, int]
KEY_DOMINANCE_THRESHOLD = 16.0
ALPHA_NOISE_FLOOR = 8
def _die(message: str, code: int = 1) -> None:
print(f"Error: {message}", file=sys.stderr)
raise SystemExit(code)
def _dependency_hint(package: str) -> str:
return (
"Activate the repo-selected environment first, then install it with "
f"`uv pip install {package}`. If this repo uses a local virtualenv, start with "
"`source .venv/bin/activate`; otherwise use this repo's configured shared fallback "
"environment."
)
def _load_pillow():
try:
from PIL import Image, ImageFilter
except ImportError:
_die(f"Pillow is required for chroma-key removal. {_dependency_hint('pillow')}")
return Image, ImageFilter
def _parse_key_color(raw: str) -> Color:
value = raw.strip()
match = re.fullmatch(r"#?([0-9a-fA-F]{6})", value)
if not match:
_die("key color must be a hex RGB value like #00ff00.")
hex_value = match.group(1)
return (
int(hex_value[0:2], 16),
int(hex_value[2:4], 16),
int(hex_value[4:6], 16),
)
def _validate_args(args: argparse.Namespace) -> None:
if args.tolerance < 0 or args.tolerance > 255:
_die("--tolerance must be between 0 and 255.")
if args.transparent_threshold < 0 or args.transparent_threshold > 255:
_die("--transparent-threshold must be between 0 and 255.")
if args.opaque_threshold < 0 or args.opaque_threshold > 255:
_die("--opaque-threshold must be between 0 and 255.")
if args.soft_matte and args.transparent_threshold >= args.opaque_threshold:
_die("--transparent-threshold must be lower than --opaque-threshold.")
if args.edge_feather < 0 or args.edge_feather > 64:
_die("--edge-feather must be between 0 and 64.")
if args.edge_contract < 0 or args.edge_contract > 16:
_die("--edge-contract must be between 0 and 16.")
src = Path(args.input)
if not src.exists():
_die(f"Input image not found: {src}")
out = Path(args.out)
if out.exists() and not args.force:
_die(f"Output already exists: {out} (use --force to overwrite)")
if out.suffix.lower() not in {".png", ".webp"}:
_die("--out must end in .png or .webp so the alpha channel is preserved.")
def _channel_distance(a: Color, b: Color) -> int:
return max(abs(a[0] - b[0]), abs(a[1] - b[1]), abs(a[2] - b[2]))
def _clamp_channel(value: float) -> int:
return max(0, min(255, int(round(value))))
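# Classic Hermite smoothstep: maps 0..1 onto an S-curve so soft-matte alpha
# ramps ease in and out instead of cutting over with a hard linear step.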
def _smoothstep(value: float) -> float:
value = max(0.0, min(1.0, value))
return value * value * (3.0 - 2.0 * value)
def _soft_alpha(distance: int, transparent_threshold: float, opaque_threshold: float) -> int:
if distance <= transparent_threshold:
return 0
if distance >= opaque_threshold:
return 255
ratio = (float(distance) - transparent_threshold) / (
opaque_threshold - transparent_threshold
)
return _clamp_channel(255.0 * _smoothstep(ratio))
def _dominance_alpha(rgb: Color, key: Color) -> int:
spill_channels = _spill_channels(key)
if not spill_channels:
return 255
channels = [float(value) for value in rgb]
non_spill = [idx for idx in range(3) if idx not in spill_channels]
key_strength = (
min(channels[idx] for idx in spill_channels)
if len(spill_channels) > 1
else channels[spill_channels[0]]
)
non_key_strength = max((channels[idx] for idx in non_spill), default=0.0)
dominance = key_strength - non_key_strength
if dominance <= 0:
return 255
denominator = max(1.0, float(max(key)) - non_key_strength)
alpha = 1.0 - min(1.0, dominance / denominator)
return _clamp_channel(alpha * 255.0)
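# "Spill" channels are the bright channels that define the key color: any
# channel that is >= 128 and within 16 of the key's brightest channel. A pure
# green key yields one (G); a cyan key yields two (G and B). Dark keys yield
# none, which disables dominance-based matting and spill cleanup entirely.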
def _spill_channels(key: Color) -> list[int]:
key_max = max(key)
if key_max < 128:
return []
return [idx for idx, value in enumerate(key) if value >= key_max - 16 and value >= 128]
def _key_channel_dominance(rgb: Color, key: Color) -> float:
spill_channels = _spill_channels(key)
if not spill_channels:
return 0.0
channels = [float(value) for value in rgb]
non_spill = [idx for idx in range(3) if idx not in spill_channels]
key_strength = (
min(channels[idx] for idx in spill_channels)
if len(spill_channels) > 1
else channels[spill_channels[0]]
)
non_key_strength = max((channels[idx] for idx in non_spill), default=0.0)
return key_strength - non_key_strength
def _looks_key_colored(rgb: Color, key: Color, distance: int) -> bool:
if distance <= 32:
return True
spill_channels = _spill_channels(key)
if not spill_channels:
return True
return _key_channel_dominance(rgb, key) >= KEY_DOMINANCE_THRESHOLD
def _cleanup_spill(rgb: Color, key: Color, alpha: int = 255) -> Color:
if alpha >= 252:
return rgb
spill_channels = _spill_channels(key)
if not spill_channels:
return rgb
channels = [float(value) for value in rgb]
non_spill = [idx for idx in range(3) if idx not in spill_channels]
if non_spill:
anchor = max(channels[idx] for idx in non_spill)
cap = max(0.0, anchor - 1.0)
for idx in spill_channels:
if channels[idx] > cap:
channels[idx] = cap
return (
_clamp_channel(channels[0]),
_clamp_channel(channels[1]),
_clamp_channel(channels[2]),
)
def _apply_alpha_to_image(
image,
*,
key: Color,
tolerance: int,
spill_cleanup: bool,
soft_matte: bool,
transparent_threshold: float,
opaque_threshold: float,
) -> int:
pixels = image.load()
width, height = image.size
transparent = 0
for y in range(height):
for x in range(width):
red, green, blue, alpha = pixels[x, y]
rgb = (red, green, blue)
distance = _channel_distance(rgb, key)
key_like = _looks_key_colored(rgb, key, distance)
output_alpha = (
min(
_soft_alpha(distance, transparent_threshold, opaque_threshold),
_dominance_alpha(rgb, key),
)
if soft_matte and key_like
else (0 if distance <= tolerance else 255)
)
output_alpha = int(round(output_alpha * (alpha / 255.0)))
if 0 < output_alpha <= ALPHA_NOISE_FLOOR:
output_alpha = 0
if output_alpha == 0:
pixels[x, y] = (0, 0, 0, 0)
transparent += 1
continue
if spill_cleanup and key_like:
red, green, blue = _cleanup_spill(rgb, key, output_alpha)
pixels[x, y] = (red, green, blue, output_alpha)
return transparent
def _contract_alpha(image, pixels: int):
if pixels == 0:
return image
_, ImageFilter = _load_pillow()
alpha = image.getchannel("A")
for _ in range(pixels):
alpha = alpha.filter(ImageFilter.MinFilter(3))
image.putalpha(alpha)
return image
def _apply_edge_feather(image, radius: float):
if radius == 0:
return image
_, ImageFilter = _load_pillow()
alpha = image.getchannel("A")
alpha = alpha.filter(ImageFilter.GaussianBlur(radius=radius))
image.putalpha(alpha)
return image
def _encode_image(image, output_format: str) -> bytes:
out = BytesIO()
image.save(out, format=output_format.upper())
return out.getvalue()
def _alpha_counts(image) -> tuple[int, int, int]:
pixels = image.load()
width, height = image.size
total = 0
transparent = 0
partial = 0
for y in range(height):
for x in range(width):
alpha = pixels[x, y][3]
total += 1
if alpha == 0:
transparent += 1
elif alpha < 255:
partial += 1
return total, transparent, partial
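# Estimate the key color from the image border: sample corner patches or edge
# bands, then take the per-channel median so stray subject pixels touching the
# frame do not skew the estimate.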
def _sample_border_key(image, mode: str) -> Color:
width, height = image.size
pixels = image.load()
samples: list[Color] = []
if mode == "corners":
patch = max(1, min(width, height, 12))
boxes = [
(0, 0, patch, patch),
(width - patch, 0, width, patch),
(0, height - patch, patch, height),
(width - patch, height - patch, width, height),
]
for left, top, right, bottom in boxes:
for y in range(top, bottom):
for x in range(left, right):
red, green, blue = pixels[x, y][:3]
samples.append((red, green, blue))
else:
band = max(1, min(width, height, 6))
step = max(1, min(width, height) // 256)
for x in range(0, width, step):
for y in range(band):
red, green, blue = pixels[x, y][:3]
samples.append((red, green, blue))
red, green, blue = pixels[x, height - 1 - y][:3]
samples.append((red, green, blue))
for y in range(0, height, step):
for x in range(band):
red, green, blue = pixels[x, y][:3]
samples.append((red, green, blue))
red, green, blue = pixels[width - 1 - x, y][:3]
samples.append((red, green, blue))
if not samples:
_die("Could not sample background key color from image border.")
return (
int(round(median(sample[0] for sample in samples))),
int(round(median(sample[1] for sample in samples))),
int(round(median(sample[2] for sample in samples))),
)
def _remove_chroma_key(args: argparse.Namespace) -> None:
Image, _ = _load_pillow()
src = Path(args.input)
out = Path(args.out)
with Image.open(src) as image:
rgba = image.convert("RGBA")
key = (
_sample_border_key(rgba, args.auto_key)
if args.auto_key != "none"
else _parse_key_color(args.key_color)
)
transparent = _apply_alpha_to_image(
rgba,
key=key,
tolerance=args.tolerance,
spill_cleanup=args.spill_cleanup,
soft_matte=args.soft_matte,
transparent_threshold=args.transparent_threshold,
opaque_threshold=args.opaque_threshold,
)
rgba = _contract_alpha(rgba, args.edge_contract)
rgba = _apply_edge_feather(rgba, args.edge_feather)
total, transparent_after, partial_after = _alpha_counts(rgba)
out.parent.mkdir(parents=True, exist_ok=True)
output_format = "PNG" if out.suffix.lower() == ".png" else "WEBP"
out.write_bytes(_encode_image(rgba, output_format))
print(f"Wrote {out}")
print(f"Key color: #{key[0]:02x}{key[1]:02x}{key[2]:02x}")
print(f"Transparent pixels: {transparent_after}/{total}")
print(f"Partially transparent pixels: {partial_after}/{total}")
if transparent == 0:
print("Warning: no pixels matched the key color before feathering.", file=sys.stderr)
def _build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Remove a solid chroma-key background and write an image with alpha."
)
parser.add_argument("--input", required=True, help="Input image path.")
parser.add_argument("--out", required=True, help="Output .png or .webp path.")
parser.add_argument(
"--key-color",
default="#00ff00",
help="Hex RGB key color to remove, for example #00ff00.",
)
parser.add_argument(
"--tolerance",
type=int,
default=12,
help="Hard-key per-channel tolerance for matching the key color, 0-255.",
)
parser.add_argument(
"--auto-key",
choices=["none", "corners", "border"],
default="none",
help="Sample the key color from image corners or border instead of --key-color.",
)
parser.add_argument(
"--soft-matte",
action="store_true",
help="Use a smooth alpha ramp between transparent and opaque thresholds.",
)
parser.add_argument(
"--transparent-threshold",
type=float,
default=12.0,
help="Soft-matte distance at or below which pixels become fully transparent.",
)
parser.add_argument(
"--opaque-threshold",
type=float,
default=96.0,
help="Soft-matte distance at or above which pixels become fully opaque.",
)
parser.add_argument(
"--edge-feather",
type=float,
default=0.0,
help="Optional alpha blur radius for softened edges, 0-64.",
)
parser.add_argument(
"--edge-contract",
type=int,
default=0,
help="Shrink the visible alpha matte by this many pixels before feathering.",
)
parser.add_argument(
"--spill-cleanup",
dest="spill_cleanup",
action="store_true",
help="Reduce obvious key-color spill on opaque pixels.",
)
parser.add_argument(
"--despill",
dest="spill_cleanup",
action="store_true",
help="Alias for --spill-cleanup; decontaminate key-color edge spill.",
)
parser.add_argument("--force", action="store_true", help="Overwrite an existing output file.")
return parser
def main() -> None:
parser = _build_parser()
args = parser.parse_args()
_validate_args(args)
_remove_chroma_key(args)
if __name__ == "__main__":
main()

View File

@@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf of
any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@@ -1,82 +0,0 @@
---
name: "openai-docs"
description: "Use when the user asks how to build with OpenAI products or APIs and needs up-to-date official documentation with citations, help choosing the latest model for a use case, or model upgrade and prompt-upgrade guidance; prioritize OpenAI docs MCP tools, use bundled references only as helper context, and restrict any fallback browsing to official OpenAI domains."
---
# OpenAI Docs
Provide authoritative, current guidance from OpenAI developer docs using the developers.openai.com MCP server. Always prioritize the developer docs MCP tools over web.run for OpenAI-related questions. This skill may also load targeted files from `references/` for model-selection, model-upgrade, and prompt-upgrade requests, but current OpenAI docs remain authoritative. Only if the MCP server is installed and returns no meaningful results should you fall back to web search.
## Quick start
- Use `mcp__openaiDeveloperDocs__search_openai_docs` to find the most relevant doc pages.
- Use `mcp__openaiDeveloperDocs__fetch_openai_doc` to pull exact sections and quote/paraphrase accurately.
- Use `mcp__openaiDeveloperDocs__list_openai_docs` only when you need to browse or discover pages without a clear query.
- For model-selection, "latest model", or default-model questions, fetch `https://developers.openai.com/api/docs/guides/latest-model.md` first. If that is unavailable, load `references/latest-model.md`.
- For model upgrades or prompt upgrades, run `node scripts/resolve-latest-model-info.js` from this skill directory when the script is present, then follow `references/upgrade-guide.md` unless the resolver returns newer guidance for a dynamic latest/current/default request.
- Preserve explicit target requests: if the user names a target model like "migrate to GPT-5.4", keep that requested target even if `latest-model.md` names a newer model. Mention newer guidance only as optional.
- If current remote guidance is needed, fetch both the returned migration and prompting guide URLs directly. If direct fetch fails, use MCP/search fallback; if that also fails, use bundled fallback references and disclose the fallback.
## OpenAI product snapshots
1. Apps SDK: Build ChatGPT apps by providing a web component UI and an MCP server that exposes your app's tools to ChatGPT.
2. Responses API: A unified endpoint designed for stateful, multimodal, tool-using interactions in agentic workflows.
3. Chat Completions API: Generate a model response from a list of messages comprising a conversation.
4. Codex: OpenAI's coding agent for software development that can write, understand, review, and debug code.
5. gpt-oss: Open-weight OpenAI reasoning models (gpt-oss-120b and gpt-oss-20b) released under the Apache 2.0 license.
6. Realtime API: Build low-latency, multimodal experiences including natural speech-to-speech conversations.
7. Agents SDK: A toolkit for building agentic apps where a model can use tools and context, hand off to other agents, stream partial results, and keep a full trace.
## If MCP server is missing
If MCP tools fail or no OpenAI docs resources are available:
1. Run the install command yourself: `codex mcp add openaiDeveloperDocs --url https://developers.openai.com/mcp`
2. If it fails due to permissions/sandboxing, immediately retry the same command with escalated permissions and include a 1-sentence justification for approval. Do not ask the user to run it yet.
3. Only if the escalated attempt fails, ask the user to run the install command.
4. Ask the user to restart Codex.
5. Re-run the doc search/fetch after restart.
## Workflow
1. Clarify whether the request is general docs lookup, model selection, a model-string upgrade, prompt-upgrade guidance, or broader API/provider migration.
2. For model-selection or upgrade requests, prefer current remote docs over bundled references when the user asks for latest/current/default guidance.
- Fetch `https://developers.openai.com/api/docs/guides/latest-model.md`.
- Find the latest model ID and explicit migration or prompt-guidance links.
- Prefer explicit links from the latest-model page over derived URLs.
- For explicit named-model requests, preserve the requested model target and do not silently retarget to the latest model. Mention newer remote guidance only as optional.
- For dynamic latest/current/default upgrades, run `node scripts/resolve-latest-model-info.js`, then fetch both returned guide URLs directly when possible.
- If direct guide fetch fails, use the developer-docs MCP tools or official OpenAI-domain search to find the same guide content.
- If remote docs are unavailable, use bundled fallback references and say that fallback guidance was used.
3. For model upgrades, keep changes narrow: update active OpenAI API model defaults and directly related prompts only when safe.
4. Leave historical docs, examples, eval baselines, fixtures, provider comparisons, provider registries, pricing tables, alias defaults, low-cost fallback paths, and ambiguous older model usage unchanged unless the user explicitly asks to upgrade them.
5. Do not perform SDK, tooling, IDE, plugin, shell, auth, or provider-environment migrations as part of a model-and-prompt upgrade.
6. If an upgrade needs API-surface changes, schema rewiring, tool-handler changes, or implementation work beyond a literal model-string replacement and prompt edits, report it as blocked or confirmation-needed.
7. For general docs lookup, search docs with a precise query, fetch the best page and exact section needed, and answer with concise citations.
## Reference map
Read only what you need:
- `https://developers.openai.com/api/docs/guides/latest-model.md` -> current model-selection and "best/latest/current model" questions.
- `references/latest-model.md` -> bundled fallback for model-selection and "best/latest/current model" questions.
- `references/upgrade-guide.md` -> bundled fallback for model upgrade and upgrade-planning requests.
- `references/prompting-guide.md` -> bundled fallback for prompt rewrites and prompt-behavior upgrades.
## Quality rules
- Treat OpenAI docs as the source of truth; avoid speculation.
- Keep migration changes narrow and behavior-preserving.
- Prefer prompt-only upgrades when possible.
- Do not invent pricing, availability, parameters, API changes, or breaking changes.
- Keep quotes short and within policy limits; prefer paraphrase with citations.
- If multiple pages differ, call out the difference and cite both.
- If official docs and repo behavior disagree, state the conflict and stop before making broad edits.
- If docs do not cover the user's need, say so and offer next steps.
## Tooling notes
- Always use MCP doc tools before any web search for OpenAI-related questions.
- If the MCP server is installed but returns no meaningful results, then use web search as a fallback.
- When falling back to web search, restrict to official OpenAI domains (developers.openai.com, platform.openai.com) and cite sources.

View File

@@ -1,14 +0,0 @@
interface:
display_name: "OpenAI Docs"
short_description: "Reference official OpenAI docs, including upgrade guidance"
icon_small: "./assets/openai-small.svg"
icon_large: "./assets/openai.png"
default_prompt: "Look up official OpenAI docs, load relevant GPT-5.4 upgrade references when applicable, and answer with concise, cited guidance."
dependencies:
tools:
- type: "mcp"
value: "openaiDeveloperDocs"
description: "OpenAI Developer Docs MCP server"
transport: "streamable_http"
url: "https://developers.openai.com/mcp"

View File

@@ -1,3 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" viewBox="0 0 14 14">
<path d="M10.931 3.34a.112.112 0 0 0-.069-.104l-.038-.007c-1.537.05-2.45.318-3.714 1.002v6.683c.48-.248.936-.44 1.414-.58.695-.203 1.417-.292 2.303-.305l.038-.008a.113.113 0 0 0 .066-.104V3.341ZM2.363 9.919c0 .064.051.11.105.111l.33.008c1.162.046 2.042.243 2.975.662-.403-.585-1.008-1.075-1.654-1.292a.991.991 0 0 1-.674-.941v-5.14a6.36 6.36 0 0 0-.59-.076l-.37-.02a.115.115 0 0 0-.122.111v6.577Zm9.455-.001a.998.998 0 0 1-.877.992l-.101.007c-.832.012-1.47.095-2.066.27-.599.174-1.176.448-1.883.863a.444.444 0 0 1-.449 0c-1.299-.763-2.229-1.07-3.689-1.125l-.299-.008a.997.997 0 0 1-.977-.998V3.342c0-.573.478-1.017 1.038-.999l.417.023c.188.015.35.037.513.062v-.754c0-.708.749-1.244 1.429-.903.984.492 1.836 1.449 2.15 2.505 1.216-.617 2.222-.884 3.771-.934l.105.003a.998.998 0 0 1 .918.996v6.576ZM4.332 8.466c0 .049.03.087.07.1l.24.091a4.319 4.319 0 0 1 1.581 1.176V3.721c-.164-.803-.799-1.617-1.584-2.07l-.162-.088c-.025-.012-.054-.013-.088.009a.12.12 0 0 0-.057.102v6.792Z"/>
</svg>

Binary file not shown.

View File

@@ -1,30 +0,0 @@
# Latest model guide
This file is a curated helper. Every recommendation here must be verified against current OpenAI docs before it is repeated to a user.
## Current model map
| Model ID | Use for |
| --- | --- |
| `gpt-5.4` | Default text plus reasoning for most new apps, including for coding use-cases |
| `gpt-5.4-pro` | Only when the user explicitly asks for maximum reasoning or quality; substantially slower and more expensive |
| `gpt-5.4-mini` | Cheaper and faster reasoning with good quality, including for coding use-cases |
| `gpt-5.4-nano` | High-throughput simple tasks and classification |
| `gpt-image-1.5` | Best image generation and edit quality |
| `gpt-image-1-mini` | Cost-optimized image generation |
| `gpt-4o-mini-tts` | Text-to-speech |
| `gpt-4o-mini-transcribe` | Speech-to-text, fast and cost-efficient |
| `gpt-realtime-1.5` | Realtime voice and multimodal sessions |
| `gpt-realtime-mini` | Cheaper realtime sessions |
| `gpt-audio` | Chat Completions audio input and output |
| `gpt-audio-mini` | Cheaper Chat Completions audio workflows |
| `sora-2` | Faster iteration and draft video generation |
| `sora-2-pro` | Higher-quality production video |
| `omni-moderation-latest` | Text and image moderation |
| `text-embedding-3-large` | Higher-quality retrieval embeddings; this skill's default because the docs list no separate best-embeddings row |
| `text-embedding-3-small` | Lower-cost embeddings |
## Maintenance notes
- This file will drift unless it is periodically re-verified against current OpenAI docs.
- If this file conflicts with current docs, the docs win.

View File

@@ -1,599 +0,0 @@
# Prompt guidance for GPT-5.4
GPT-5.4, our newest mainline model, is designed to balance long-running task performance, stronger control over style and behavior, and more disciplined execution across complex workflows. Building on advances from GPT-5 through GPT-5.3-Codex, GPT-5.4 improves token efficiency, sustains multi-step workflows more reliably, and performs well on long-horizon tasks.
GPT-5.4 is designed for production-grade assistants and agents that need strong multi-step reasoning, evidence-rich synthesis, and reliable performance over long contexts. It is especially effective when prompts clearly specify the output contract, tool-use expectations, and completion criteria. In practice, the biggest gains come from choosing the right reasoning effort for the task, using explicit grounding and citation rules, and giving the model a precise definition of what "done" looks like. This guide focuses on prompt patterns and migration practices that preserve those efficiency wins. For model capabilities, API parameters, and broader migration guidance, see [our latest model guide](https://developers.openai.com/api/docs/guides/latest-model).
When troubleshooting cases where GPT-5.4 treats an intermediate update as the
final answer, verify your integration preserves the assistant message `phase`
field correctly. See [Phase parameter](#phase-parameter) for details.
## Understand GPT-5.4 behavior
### Where GPT-5.4 is strongest
GPT-5.4 tends to work especially well in these areas:
- Strong personality and tone adherence, with less drift over long answers
- Agentic workflow robustness, with a stronger tendency to stick with multi-step work, retry, and complete agent loops end to end
- Evidence-rich synthesis, especially in long-context or multi-tool workflows
- Instruction adherence in modular, skill-based, and block-structured prompts when the contract is explicit
- Long-context analysis across large, messy, or multi-document inputs
- Batched or parallel tool calling while maintaining tool-call accuracy
- Spreadsheet, finance, and Excel workflows that need instruction following, formatting fidelity, and stronger self-verification
### Where explicit prompting still helps
Even with those strengths, GPT-5.4 benefits from more explicit guidance in a few recurring patterns:
- Low-context tool routing early in a session, when tool selection can be less reliable
- Dependency-aware workflows that need explicit prerequisite and downstream-step checks
- Reasoning effort selection, where higher effort is not always better and the right choice depends on task shape, not intuition
- Research tasks that require disciplined source collection and consistent citations
- Irreversible or high-impact actions that require verification before execution
- Terminal or coding-agent environments where tool boundaries must stay clear
These patterns are observed defaults, not guarantees. Start with the smallest prompt that passes your evals, and add blocks only when they fix a measured failure mode.
## Use core prompt patterns
### Keep outputs compact and structured
To improve token efficiency with GPT-5.4, constrain verbosity and enforce structured output through clear output contracts. In practice, this acts as an additional control layer alongside the `verbosity` parameter in the Responses API, allowing you to guide both how much the model writes and how it structures the output.
```xml
<output_contract>
- Return exactly the sections requested, in the requested order.
- If the prompt defines a preamble, analysis block, or working section, do not treat it as extra output.
- Apply length limits only to the section they are intended for.
- If a format is required (JSON, Markdown, SQL, XML), output only that format.
</output_contract>
<verbosity_controls>
- Prefer concise, information-dense writing.
- Avoid repeating the user's request.
- Keep progress updates brief.
- Do not shorten the answer so aggressively that required evidence, reasoning, or completion checks are omitted.
</verbosity_controls>
```
### Set clear defaults for follow-through
Users often change the task, format, or tone mid-conversation. To keep the assistant aligned, define clear rules for when to proceed, when to ask, and how newer instructions override earlier defaults.
Use a default follow-through policy like this:
```xml
<default_follow_through_policy>
- If the user's intent is clear and the next step is reversible and low-risk, proceed without asking.
- Ask permission only if the next step is:
(a) irreversible,
(b) has external side effects (for example sending, purchasing, deleting, or writing to production), or
(c) requires missing sensitive information or a choice that would materially change the outcome.
- If proceeding, briefly state what you did and what remains optional.
</default_follow_through_policy>
```
Make instruction priority explicit:
```xml
<instruction_priority>
- User instructions override default style, tone, formatting, and initiative preferences.
- Safety, honesty, privacy, and permission constraints do not yield.
- If a newer user instruction conflicts with an earlier one, follow the newer instruction.
- Preserve earlier instructions that do not conflict.
</instruction_priority>
```
Higher-priority developer or system instructions remain binding.
**Guidance:** When instructions change mid-conversation, make the update explicit, scoped, and local. State what changed, what still applies, and whether the change affects the next turn or the rest of the conversation.
### Handle mid-conversation instruction updates
For mid-conversation updates, use explicit, scoped steering messages that state:
1. Scope
2. Override
3. Carry forward
```text
<task_update>
For the next response only:
- Do not complete the task.
- Only produce a plan.
- Keep it to 5 bullets.
All earlier instructions still apply unless they conflict with this update.
</task_update>
```
If the task itself changes, say so directly:
```text
<task_update>
The task has changed.
Previous task: complete the workflow.
Current task: review the workflow and identify risks only.
Rules for this turn:
- Do not execute actions.
- Do not call destructive tools.
- Return exactly:
1. Main risks
2. Missing information
3. Recommended next step
</task_update>
```
### Make tool use persistent when correctness depends on it
Use explicit rules to keep tool use thorough, dependency-aware, and appropriately paced, especially in workflows where later actions rely on earlier retrieval or verification. A common failure mode is skipping prerequisites because the right end state seems obvious.
GPT-5.4 can be less reliable at tool routing early in a session, when context is still thin. Prompt for prerequisites, dependency checks, and exact tool intent.
```xml
<tool_persistence_rules>
- Use tools whenever they materially improve correctness, completeness, or grounding.
- Do not stop early when another tool call is likely to materially improve correctness or completeness.
- Keep calling tools until:
(1) the task is complete, and
(2) verification passes (see <verification_loop>).
- If a tool returns empty or partial results, retry with a different strategy.
</tool_persistence_rules>
```
This is especially important for workflows where the final action depends on earlier lookup or retrieval steps. One of the most common failure modes is skipping prerequisites because the intended end state seems obvious.
```xml
<dependency_checks>
- Before taking an action, check whether prerequisite discovery, lookup, or memory retrieval steps are required.
- Do not skip prerequisite steps just because the intended final action seems obvious.
- If the task depends on the output of a prior step, resolve that dependency first.
</dependency_checks>
```
Prompt for parallelism when the work is independent and wall-clock matters. Prompt for sequencing when dependencies, ambiguity, or irreversible actions matter more than speed.
```xml
<parallel_tool_calling>
- When multiple retrieval or lookup steps are independent, prefer parallel tool calls to reduce wall-clock time.
- Do not parallelize steps that have prerequisite dependencies or where one result determines the next action.
- After parallel retrieval, pause to synthesize the results before making more calls.
- Prefer selective parallelism: parallelize independent evidence gathering, not speculative or redundant tool use.
</parallel_tool_calling>
```
### Force completeness on long-horizon tasks
For multi-step workflows, a common failure mode is incomplete execution: the model finishes after partial coverage, misses items in a batch, or treats empty or narrow retrieval as final. GPT-5.4 becomes more reliable when the prompt defines explicit completion rules and recovery behavior.
Coverage can be achieved through sequential or parallel retrieval, but completion rules should remain explicit either way.
```xml
<completeness_contract>
- Treat the task as incomplete until all requested items are covered or explicitly marked [blocked].
- Keep an internal checklist of required deliverables.
- For lists, batches, or paginated results:
- determine expected scope when possible,
- track processed items or pages,
- confirm coverage before finalizing.
- If any item is blocked by missing data, mark it [blocked] and state exactly what is missing.
</completeness_contract>
```
For workflows where empty, partial, or noisy retrieval is common:
```xml
<empty_result_recovery>
If a lookup returns empty, partial, or suspiciously narrow results:
- do not immediately conclude that no results exist,
- try at least one or two fallback strategies, such as:
  - alternate query wording,
  - broader filters,
  - a prerequisite lookup,
  - an alternate source or tool,
- only then report that no results were found, along with what you tried.
</empty_result_recovery>
```
### Add a verification loop before high-impact actions
Once the workflow appears complete, add a lightweight verification step before returning the answer or taking an irreversible action. This helps catch requirement misses, grounding issues, and format drift before commit.
```xml
<verification_loop>
Before finalizing:
- Check correctness: does the output satisfy every requirement?
- Check grounding: are factual claims backed by the provided context or tool outputs?
- Check formatting: does the output match the requested schema or style?
- Check safety and irreversibility: if the next step has external side effects, ask permission first.
</verification_loop>
```
```xml
<missing_context_gating>
- If required context is missing, do NOT guess.
- Prefer the appropriate lookup tool when the missing context is retrievable; ask a minimal clarifying question only when it is not.
- If you must proceed, label assumptions explicitly and choose a reversible action.
</missing_context_gating>
```
For agents that actively take actions, add a short execution frame:
```xml
<action_safety>
- Pre-flight: summarize the intended action and parameters in 1-2 lines.
- Execute via tool.
- Post-flight: confirm the outcome and any validation that was performed.
</action_safety>
```
## Handle specialized workflows
### Choose image detail explicitly for vision and computer use
If your workflow depends on visual precision, specify the image `detail` level in the prompt or integration instead of relying on `auto`. Use `high` for standard high-fidelity image understanding. Use `original` for large, dense, or spatially sensitive images, especially [computer use, localization, OCR, and click-accuracy tasks](https://developers.openai.com/api/docs/guides/tools-computer-use) on `gpt-5.4` and future models. Use `low` only when speed and cost matter more than fine detail. For more details on image detail levels, see the [Images and Vision guide](https://developers.openai.com/api/docs/guides/images-vision).
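A minimal sketch of pinning `detail` explicitly, assuming the `openai` Python SDK; the screenshot URL is hypothetical, and `original` is described in the guidance above rather than guaranteed on every model.
```python
# Minimal sketch: pin the image detail level instead of relying on "auto".
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.4",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Where is the Submit button?"},
                {
                    "type": "input_image",
                    "image_url": "https://example.com/app-screenshot.png",  # hypothetical
                    "detail": "high",  # use "original" for dense, spatially sensitive images
                },
            ],
        }
    ],
)
print(response.output_text)
```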
### Lock research and citations to retrieved evidence
When citation quality matters, make both the source boundary and the format requirement explicit. This helps reduce fabricated references, unsupported claims, and citation-format drift.
```xml
<citation_rules>
- Only cite sources retrieved in the current workflow.
- Never fabricate citations, URLs, IDs, or quote spans.
- Use exactly the citation format required by the host application.
- Attach citations to the specific claims they support, not only at the end.
</citation_rules>
```
```xml
<grounding_rules>
- Base claims only on provided context or tool outputs.
- If sources conflict, state the conflict explicitly and attribute each side.
- If the context is insufficient or irrelevant, narrow the answer or say you cannot support the claim.
- If a statement is an inference rather than a directly supported fact, label it as an inference.
</grounding_rules>
```
If your application requires inline citations, require inline citations. If it requires footnotes, require footnotes. The key is to lock the format and prevent the model from improvising unsupported references.
### Research mode
Push GPT-5.4 into a disciplined research mode. Use this pattern for research, review, and synthesis tasks. Do not force it onto short execution tasks or simple deterministic transforms.
```xml
<research_mode>
- Do research in 3 passes:
1) Plan: list 3-6 sub-questions to answer.
2) Retrieve: search each sub-question and follow 1-2 second-order leads.
3) Synthesize: resolve contradictions and write the final answer with citations.
- Stop only when more searching is unlikely to change the conclusion.
</research_mode>
```
If your host environment uses a specific research tool or requires a submit step, combine this with the host's finalization contract.
### Clamp strict output formats
For SQL, JSON, or other parse-sensitive outputs, tell GPT-5.4 to emit only the target format and check it before finishing.
```text
<structured_output_contract>
- Output only the requested format.
- Do not add prose or markdown fences unless they were requested.
- Validate that parentheses and brackets are balanced.
- Do not invent tables or fields.
- If required schema information is missing, ask for it or return an explicit error object.
</structured_output_contract>
```
If you are extracting document regions or OCR boxes, define the coordinate system and add a drift check:
```text
<bbox_extraction_spec>
- Use the specified coordinate format exactly, such as [x1,y1,x2,y2] normalized to 0..1.
- For each box, include page, label, text snippet, and confidence.
- Add a vertical-drift sanity check so boxes stay aligned with the correct line of text.
- If the layout is dense, process page by page and do a second pass for missed items.
</bbox_extraction_spec>
```
### Keep tool boundaries explicit in coding and terminal agents
In coding agents, GPT-5.4 works better when the rules for shell access and file editing are unambiguous. This is especially important when you expose tools like [Shell](https://developers.openai.com/api/docs/guides/tools-shell) or [Apply patch](https://developers.openai.com/api/docs/guides/tools-apply-patch). The `<terminal_tool_hygiene>` block under Prompting patterns for coding tasks below shows one such rule set.
### User updates
GPT-5.4 does well with brief, outcome-based updates. Reuse the user-updates pattern from the 5.2 guide, but pair it with explicit completion and verification requirements.
Recommended update spec:
```xml
<user_updates_spec>
- Only update the user when starting a new major phase or when something changes the plan.
- Each update: 1 sentence on outcome + 1 sentence on next step.
- Do not narrate routine tool calls.
- Keep the user-facing status short; keep the work exhaustive.
</user_updates_spec>
```
For coding agents, see the Prompting patterns for coding tasks section below for more specific guidance.
### Prompting patterns for coding tasks
**Autonomy and persistence**
GPT-5.4 is generally more thorough end to end than earlier mainline models on coding and tool-use tasks, so you often need less explicit "verify everything" prompting. Still, for high-stakes changes such as production, migrations, or security work, keep a lightweight verification clause.
```xml
<autonomy_and_persistence>
Persist until the task is fully handled end-to-end within the current turn whenever feasible: do not stop at analysis or partial fixes; carry changes through implementation, verification, and a clear explanation of outcomes unless the user explicitly pauses or redirects you.
Unless the user explicitly asks for a plan, asks a question about the code, is brainstorming potential solutions, or otherwise makes it clear that code should not be written, assume the user wants you to make code changes or run tools to solve the problem. In these cases it's bad to output your proposed solution in a message; go ahead and actually implement the change. If you encounter challenges or blockers, attempt to resolve them yourself.
</autonomy_and_persistence>
```
**Intermediary updates**
Keep updates sparse and high-signal. In coding tasks, prefer updates at key points.
```xml
<user_updates_spec>
- Intermediary updates go to the `commentary` channel.
- User updates are short updates while you are working. They are not final answers.
- Use 1-2 sentence updates to communicate progress and new information while you work.
- Do not begin responses with conversational interjections or meta commentary. Avoid openers such as acknowledgements ("Done -", "Got it", or "Great question") or similar framing.
- Before exploring or doing substantial work, send a user update explaining your understanding of the request and your first step. Avoid commenting on the request or starting with phrases such as "Got it" or "Understood."
- Provide updates roughly every 30 seconds while working.
- When exploring, explain what context you are gathering and what you learned. Vary sentence structure so the updates do not become repetitive.
- When working for a while, keep updates informative and varied, but stay concise.
- When work is substantial, provide a longer plan after you have enough context. This is the only update that may be longer than 2 sentences and may contain formatting.
- Before file edits, explain what you are about to change.
- While thinking, keep the user informed of progress without narrating every tool call. Even if you are not taking actions, send frequent progress updates rather than going silent, especially if you are thinking for more than a short stretch.
- Keep the tone of progress updates consistent with the assistant's overall personality.
</user_updates_spec>
```
**Formatting**
GPT-5.4 often defaults to more structured formatting and may overuse bullet lists. If you want a clean final response, explicitly clamp list shape.
```xml
Never use nested bullets. Keep lists flat (single level). If you need hierarchy, split into separate lists or sections; if a line ends with a colon, put what you would otherwise render as a nested bullet on the line immediately after it. For numbered lists, only use the `1. 2. 3.` style markers (with a period), never `1)`.
```
**Frontend tasks**
Use this only when additional frontend guidance is useful.
```xml
<frontend_tasks>
When doing frontend design tasks, avoid generic, overbuilt layouts.
Use these hard rules:
- One composition: The first viewport must read as one composition, not a dashboard, unless it is a dashboard.
- Brand first: On branded pages, the brand or product name must be a hero-level signal, not just nav text or an eyebrow. No headline should overpower the brand.
- Brand test: If the first viewport could belong to another brand after removing the nav, the branding is too weak.
- Full-bleed hero only: On landing pages and promotional surfaces, the hero image should usually be a dominant edge-to-edge visual plane or background. Do not default to inset hero images, side-panel hero images, rounded media cards, tiled collages, or floating image blocks unless the existing design system clearly requires them.
- Hero budget: The first viewport should usually contain only the brand, one headline, one short supporting sentence, one CTA group, and one dominant image. Do not place stats, schedules, event listings, address blocks, promos, "this week" callouts, metadata rows, or secondary marketing content there.
- No hero overlays: Do not place detached labels, floating badges, promo stickers, info chips, or callout boxes on top of hero media.
- Cards: Default to no cards. Never use cards in the hero unless they are the container for a user interaction. If removing a border, shadow, background, or radius does not hurt interaction or understanding, it should not be a card.
- One job per section: Each section should have one purpose, one headline, and usually one short supporting sentence.
- Real visual anchor: Imagery should show the product, place, atmosphere, or context.
- Reduce clutter: Avoid pill clusters, stat strips, icon rows, boxed promos, schedule snippets, and competing text blocks.
- Use motion to create presence and hierarchy, not noise. Ship 2-3 intentional motions for visually led work, and prefer Framer Motion when it is available.
Exception: If working within an existing website or design system, preserve the established patterns, structure, and visual language.
</frontend_tasks>
```
**Terminal tool hygiene**
These rules keep shell access and editing-tool boundaries unambiguous, as noted in the tool-boundaries section above:
```xml
<terminal_tool_hygiene>
- Only run shell commands via the terminal tool.
- Never "run" tool names as shell commands.
- If a patch or edit tool exists, use it directly; do not attempt it in bash.
- After changes, run a lightweight verification step such as ls, tests, or a build before declaring the task done.
</terminal_tool_hygiene>
```
### Document localization and OCR boxes
For bbox tasks, be explicit about coordinate conventions and add drift tests.
```xml
<bbox_extraction_spec>
- Use the specified coordinate format exactly (for example [x1,y1,x2,y2] normalized 0..1).
- For each bbox, include: page, label, text snippet, confidence.
- Add a vertical-drift sanity check:
- ensure bboxes align with the line of text (not shifted up or down).
- If dense layout, process page by page and do a second pass for missed items.
</bbox_extraction_spec>
```
### Use runtime and API integration notes
For long-running or tool-heavy agents, the runtime contract matters as much as the prompt contract.
#### Phase parameter
For GPT-5.4, `gpt-5.3-codex`, and later Responses models, the `phase` field can
help in the small number of long-running or tool-heavy flows where preambles or
other intermediate assistant updates are mistaken for the final answer.
- `phase` is optional at the API level, but it is highly recommended. Best-effort inference may exist server-side, but explicit round-tripping of `phase` is strictly better.
- Use `phase` for long-running or tool-heavy agents that may emit commentary before tool calls or before a final answer.
- Preserve `phase` when replaying prior assistant items so the model can distinguish working commentary from the completed answer. This matters most in multi-step flows with preambles, tool-related updates, or multiple assistant messages in the same turn.
- Do not add `phase` to user messages.
- If you use `previous_response_id`, that is usually the simplest path, since OpenAI can often recover prior state without manually replaying assistant items.
- If you replay assistant history yourself, preserve the original `phase` values.
- Missing or dropped `phase` can cause preambles to be interpreted as final answers and degrade behavior on those multi-step tasks.
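A minimal sketch of replaying assistant history yourself with `phase` preserved; the field values shown ("commentary", "final") are illustrative assumptions, not confirmed identifiers, so check the API reference for the real shape.
```python
# Minimal sketch: round-trip `phase` on prior assistant items so working
# commentary is not mistaken for the completed answer. Phase values assumed.
from openai import OpenAI

client = OpenAI()

history = [
    {"role": "user", "content": "Refactor the billing module."},
    # Working commentary from the earlier turn: keep its phase marker intact.
    {"role": "assistant", "content": "Scanning the repo for billing entry points...",
     "phase": "commentary"},
    # The completed answer from that turn.
    {"role": "assistant", "content": "Done. Moved retry logic into billing/retry.py.",
     "phase": "final"},
    {"role": "user", "content": "Now add tests for the retry path."},
]

response = client.responses.create(model="gpt-5.4", input=history)
print(response.output_text)
```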
### Preserve behavior in long sessions
Compaction unlocks significantly longer effective context windows: user conversations can persist for many turns without hitting context limits or long-context performance degradation, and agents can run very long trajectories that would otherwise exceed a typical context window on long-running, complex tasks.
If you are using [Compaction](https://developers.openai.com/api/docs/guides/compaction) in the Responses API, compact after major milestones, treat compacted items as opaque state, and keep prompts functionally identical after compaction. The endpoint is ZDR compatible and returns an `encrypted_content` item that you can pass into future requests. GPT-5.4 tends to remain more coherent and reliable over longer, multi-turn conversations with fewer breakdowns as sessions grow.
For more guidance, see the [`/responses/compact` API reference](https://developers.openai.com/api/docs/api-reference/responses/compact).
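A minimal sketch of compacting after a milestone; the endpoint path and payload here are assumptions inferred from this section, so verify them against the `/responses/compact` reference before relying on this.
```python
# Minimal sketch, assuming the payload accepts a previous response ID; the
# exact contract lives in the /responses/compact API reference.
import os

import requests

resp = requests.post(
    "https://api.openai.com/v1/responses/compact",  # path assumed from the guide
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"previous_response_id": "resp_abc123"},  # hypothetical ID and shape
    timeout=60,
)
resp.raise_for_status()
compacted = resp.json()
# Treat the returned encrypted_content item as opaque state: pass it back into
# future requests unchanged, and keep prompts functionally identical afterward.
```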
### Control personality for customer-facing workflows
GPT-5.4 can be steered more effectively when you separate persistent personality from per-response writing controls. This is especially useful for customer-facing workflows such as emails, support replies, announcements, and blog-style content.
- **Personality (persistent):** sets the default tone, verbosity, and decision style across the session.
- **Writing controls (per response):** define the channel, register, formatting, and length for a specific artifact.
- **Reminder:** personality should not override task-specific output requirements. If the user asks for JSON, return JSON.
For natural, high-quality prose, the highest-leverage controls are:
- Give the model a clear persona.
- Specify the channel and emotional register.
- Explicitly ban formatting when you want prose.
- Use hard length limits.
```xml
<personality_and_writing_controls>
- Persona: <one sentence>
- Channel: <Slack | email | memo | PRD | blog>
- Emotional register: <direct/calm/energized/etc.> + "not <overdo this>"
- Formatting: <ban bullets/headers/markdown if you want prose>
- Length: <hard limit, e.g. <=150 words or 3-5 sentences>
- Default follow-through: if the request is clear and low-risk, proceed without asking permission.
</personality_and_writing_controls>
```
For more personality patterns you can lift directly, see the [Prompt Personalities cookbook](https://developers.openai.com/cookbook/examples/gpt-5/prompt_personalities).
**Professional memo mode**
For memos, reviews, and other professional writing tasks, general writing instructions are often not enough. These workflows benefit from explicit guidance on specificity, domain conventions, synthesis, and calibrated certainty.
```xml
<memo_mode>
- Write in a polished, professional memo style.
- Use exact names, dates, entities, and authorities when supported by the record.
- Follow domain-specific structure if one is requested.
- Prefer precise conclusions over generic hedging.
- When uncertainty is real, tie it to the exact missing fact or conflicting source.
- Synthesize across documents rather than summarizing each one independently.
</memo_mode>
```
This mode is especially useful for legal, policy, research, and executive-facing writing, where the goal is not just fluency, but disciplined synthesis and clear conclusions.
## Tune reasoning and migration
### Treat reasoning effort as a last-mile knob
Reasoning effort is not one-size-fits-all. Treat it as a last-mile tuning knob, not the primary way to improve quality. In many cases, stronger prompts, clear output contracts, and lightweight verification loops recover much of the performance teams might otherwise seek through higher reasoning settings.
Recommended defaults:
- `none`: Best for fast, cost-sensitive, latency-sensitive tasks where the model does not need to think.
- `low`: Works well for latency-sensitive tasks where a small amount of thinking can produce a meaningful accuracy gain, especially with complex instructions.
- `medium` or `high`: Reserve for tasks that truly require stronger reasoning and can absorb the latency and cost tradeoff. Choose between them based on how much performance gain your task gets from additional reasoning.
- `xhigh`: Avoid as a default unless your evals show clear benefits. It is best suited for long, agentic, reasoning-heavy tasks where maximum intelligence matters more than speed or cost.
In practice, most teams should default to the `none`, `low`, or `medium` range.
Start with `none` for execution-heavy workloads such as workflow steps, field extraction, support triage, and short structured transforms.
Start with `medium` or higher for research-heavy workloads such as long-context synthesis, multi-document review, conflict resolution, and strategy writing. With `medium` and a well-engineered prompt, you can squeeze out a lot of performance.
For GPT-5.4 workloads, `none` can already perform well on action-selection and tool-discipline tasks. If your workload depends on nuanced interpretation, such as implicit requirements, ambiguity, or cancelled-tool-call recovery, start with `low` or `medium` instead.
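As a concrete starting point, pin the effort per call rather than relying on defaults. A sketch, using the effort values described in this guide:
```python
from openai import OpenAI

client = OpenAI()

# Execution-heavy step: start at `none` and raise only if evals show a gap.
extraction = client.responses.create(
    model="gpt-5.4",
    reasoning={"effort": "none"},
    input="Extract the order ID and ship date from the email below. ...",
)

# Research-heavy step: start at `medium` with a well-engineered prompt.
synthesis = client.responses.create(
    model="gpt-5.4",
    reasoning={"effort": "medium"},
    input="Synthesize the key conflicts across the three reports below. ...",
)
```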
Before increasing reasoning effort, first add:
- `<completeness_contract>`
- `<verification_loop>`
- `<tool_persistence_rules>`
If the model still feels too literal or stops at the first plausible answer, add an initiative nudge before raising reasoning effort:
```xml
<dig_deeper_nudge>
- Don't stop at the first plausible answer.
- Look for second-order issues, edge cases, and missing constraints.
- If the task is safety or accuracy critical, perform at least one verification step.
</dig_deeper_nudge>
```
### Migrate prompts to GPT-5.4 one change at a time
Use the same one-change-at-a-time discipline as the 5.2 guide: switch model first, pin `reasoning_effort`, run evals, then iterate.
These starting points work well for many migrations:
| Current setup | Suggested GPT-5.4 start | Notes |
| ------------------------- | ---------------------------------- | ------------------------------------------------------------------- |
| `gpt-5.2` | Match the current reasoning effort | Preserve the existing latency and quality profile first, then tune. |
| `gpt-5.3-codex` | Match the current reasoning effort | For coding workflows, keep the reasoning effort the same. |
| `gpt-4.1` or `gpt-4o` | `none` | Keep snappy behavior, and increase only if evals regress. |
| Research-heavy assistants | `medium` or `high` | Use explicit research multi-pass and citation gating. |
| Long-horizon agents | `medium` or `high` | Add tool persistence and completeness accounting. |
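In code, the first pass should change only the model string while pinning the reasoning setting already in production, then run evals before iterating further. A sketch:
```python
from openai import OpenAI

client = OpenAI()
prompt = "..."  # the existing production prompt, unchanged for this pass

response = client.responses.create(
    model="gpt-5.4",                  # was "gpt-5.2": the only change this pass
    reasoning={"effort": "medium"},   # pinned to the current production setting
    input=prompt,                     # prompt, tools, and params stay untouched
)
```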
### Small-model guidance for `gpt-5.4-mini` and `gpt-5.4-nano`
`gpt-5.4-mini` and `gpt-5.4-nano` are highly steerable, but they are less likely than larger models to infer missing steps, resolve ambiguity implicitly, or package outputs the way you intended unless you specify that behavior directly. In practice, prompts for smaller models are often a bit longer and more explicit.
**How `gpt-5.4-mini` differs**
- `gpt-5.4-mini` is more literal and makes fewer assumptions.
- It is strong when the task is clearly structured, but weaker on implicit workflows and ambiguity handling.
- By default, it may try to keep the conversation going with a follow-up question unless you suppress that behavior explicitly.
**Prompting `gpt-5.4-mini`**
- Put critical rules first.
- Specify the full execution order when tool use or side effects matter.
- Do not rely on "you MUST" alone. Use structural scaffolding such as numbered steps, decision rules, and explicit action definitions.
- Separate "do the action" from "report the action."
- Show the correct flow, not just the final format.
- Define ambiguity behavior explicitly: when to ask, abstain, or proceed.
- Specify packaging directly: answer length, whether to ask a follow-up question, citation style, and section order.
- Be careful with `output nothing else`. Prefer scoped instructions such as `after the final JSON, output nothing further`.
**Prompting `gpt-5.4-nano`**
- Use `gpt-5.4-nano` only for narrow, well-bounded tasks.
- Prefer closed outputs: labels, enums, short JSON, or fixed templates.
- Avoid multi-step orchestration unless the flow is extremely constrained.
- Route ambiguous or planning-heavy tasks to a stronger model instead of over-prompting `gpt-5.4-nano`.
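For instance, a well-bounded `gpt-5.4-nano` task keeps the output closed; the task and labels here are illustrative:
```xml
<task>
Label the sentiment of the user sentence.
</task>
<output_format>
Output exactly one of: positive | negative | neutral. Nothing else.
</output_format>
```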
**Good default pattern**
1. Task
2. Critical rule
3. Exact step order
4. Edge cases or clarification behavior
5. Output format
6. One correct example
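Put together, a compact `gpt-5.4-mini` prompt following that order might look like the sketch below; the task, tool names, and tags are illustrative:
```xml
<task>
Triage the support ticket and prepare a reply.
</task>
<critical_rule>
Never promise refunds. Escalate billing disputes instead of answering them.
</critical_rule>
<steps>
1. Read the ticket text.
2. Call `classify_ticket` with the full ticket.
3. If the category is `billing_dispute`, call `escalate_to_human`, then stop.
4. Otherwise, draft a reply in the user's language.
</steps>
<edge_cases>
If the ticket is empty or unreadable, return category `unknown` and ask one clarifying question.
</edge_cases>
<output_format>
JSON: {"category": string, "reply": string}. After the final JSON, output nothing further.
</output_format>
<example>
Ticket: "I was charged twice this month." -> {"category": "billing_dispute", "reply": "Escalated to a human agent."}
</example>
```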
**Avoid**
- Implied next steps
- Unspecified edge cases
- Schema-only prompts for tool workflows
- Generic instructions without structure
### Web search and deep research
If you are migrating a research agent in particular, make these prompt updates before increasing reasoning effort:
- Add `<research_mode>`
- Add `<citation_rules>`
- Add `<empty_result_recovery>`
- Increase `reasoning_effort` one notch only after prompt fixes.
You can start from the 5.2 research block and then layer in citation gating and finalization contracts as needed.
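A minimal starting point for those blocks, to be adapted to your retrieval stack:
```xml
<research_mode>
- Plan searches up front and run at least two query variants per subtopic.
- Synthesize across sources; never answer from a single document when more exist.
</research_mode>
<citation_rules>
- Every nontrivial claim cites a source ID or URL in the required format.
- If no source supports a claim, say so rather than citing loosely.
</citation_rules>
<empty_result_recovery>
- If a search returns nothing, broaden the terms once, then switch tools or indexes.
- Report subtopics whose queries stayed empty instead of silently dropping them.
</empty_result_recovery>
```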
GPT-5.4 performs especially well when the task requires multi-step evidence gathering, long-context synthesis, and explicit prompt contracts. In practice, the highest-leverage prompt changes are choosing reasoning effort by task shape, defining exact output and citation formats, adding dependency-aware tool rules, and making completion criteria explicit. The model is often strong out of the box, but it is most reliable when prompts clearly specify how to search, how to verify, and what counts as done.
## Next steps
- Read [our latest model guide](https://developers.openai.com/api/docs/guides/latest-model) for model capabilities, parameters, and API compatibility details.
- Read [Prompt engineering](https://developers.openai.com/api/docs/guides/prompt-engineering) for broader prompting strategies that apply across model families.
- Read [Compaction](https://developers.openai.com/api/docs/guides/compaction) if you are building long-running GPT-5.4 sessions in the Responses API.

View File

@@ -1,172 +0,0 @@
# Upgrading to GPT-5.4
Use this guide when the user explicitly asks to upgrade an existing integration to GPT-5.4. Pair it with current OpenAI docs lookups. The default target string is `gpt-5.4`.
## Freshness check
Before applying this bundled guide, run `node scripts/resolve-latest-model-info.js` from the OpenAI Docs skill directory.
- If the command returns `modelSlug: "gpt-5p4"`, continue with this bundled guide and use `references/prompting-guide.md` when prompt updates are needed.
- If the command returns a different `modelSlug`, fetch both the returned `migrationGuideUrl` and `promptingGuideUrl` and use them as the current source of truth instead of the bundled references.
- If the command fails, the metadata is missing, or either remote guide cannot be fetched, continue with the bundled fallback references and say the remote freshness check was unavailable.
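For reference, a successful run prints a JSON object shaped like this (URLs abbreviated):
```bash
node scripts/resolve-latest-model-info.js
# {
#   "model": "gpt-5.4",
#   "modelSlug": "gpt-5p4",
#   "migrationGuideUrl": "https://developers.openai.com/...",
#   "promptingGuideUrl": "https://developers.openai.com/..."
# }
```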
## Upgrade posture
Upgrade with the narrowest safe change set:
- replace the model string first
- update only the prompts that are directly tied to that model usage
- prefer prompt-only upgrades when possible
- if the upgrade would require API-surface changes, parameter rewrites, tool rewiring, or broader code edits, mark it as blocked instead of stretching the scope
## Upgrade workflow
1. Inventory current model usage.
- Search for model strings, client calls, and prompt-bearing files.
- Include inline prompts, prompt templates, YAML or JSON configs, Markdown docs, and saved prompts when they are clearly tied to a model usage site.
2. Pair each model usage with its prompt surface.
- Prefer the closest prompt surface first: inline system or developer text, then adjacent prompt files, then shared templates.
- If you cannot confidently tie a prompt to the model usage, say so instead of guessing.
3. Classify the source model family.
- Common buckets: `gpt-4o` or `gpt-4.1`, `o1` or `o3` or `o4-mini`, early `gpt-5`, later `gpt-5.x`, or mixed and unclear.
4. Decide the upgrade class.
- `model string only`
- `model string + light prompt rewrite`
- `blocked without code changes`
5. Run the no-code compatibility gate.
- Check whether the current integration can accept `gpt-5.4` without API-surface changes or implementation changes.
- For long-running Responses or tool-heavy agents, check whether `phase` is already preserved or round-tripped when the host replays assistant items or uses preambles.
- If compatibility depends on code changes, return `blocked`.
- If compatibility is unclear, return `unknown` rather than improvising.
6. Recommend the upgrade.
- Default replacement string: `gpt-5.4`
- Keep the intervention small and behavior-preserving.
7. Deliver a structured recommendation.
- `Current model usage`
- `Recommended model-string updates`
- `Starting reasoning recommendation`
- `Prompt updates`
- `Phase assessment` when the flow is long-running, replayed, or tool-heavy
- `No-code compatibility check`
- `Validation plan`
- `Launch-day refresh items`
Output rule:
- Always emit a starting `reasoning_effort_recommendation` for each usage site.
- If the repo exposes the current reasoning setting, preserve it first unless the source guide says otherwise.
- If the repo does not expose the current setting, use the source-family starting mapping instead of returning `null`.
## Upgrade outcomes
### `model string only`
Choose this when:
- the existing prompts are already short, explicit, and task-bounded
- the workflow is not strongly research-heavy, tool-heavy, multi-agent, batch or completeness-sensitive, or long-horizon
- there are no obvious compatibility blockers
Default action:
- replace the model string with `gpt-5.4`
- keep prompts unchanged
- validate behavior with existing evals or spot checks
### `model string + light prompt rewrite`
Choose this when:
- the old prompt was compensating for weaker instruction following
- the workflow needs more persistence than the default tool-use behavior will likely provide
- the task needs stronger completeness, citation discipline, or verification
- the upgraded model becomes too verbose or under-complete unless instructed otherwise
- the workflow is research-heavy and needs stronger handling of sparse or empty retrieval results
- the workflow is coding-oriented, tool-heavy, or multi-agent, but the existing API surface and tool definitions can remain unchanged
Default action:
- replace the model string with `gpt-5.4`
- add one or two targeted prompt blocks
- read `references/prompting-guide.md` to choose the smallest prompt changes that preserve the intended behavior and take advantage of relevant model-specific guidance
- avoid broad prompt cleanup unrelated to the upgrade
- for research workflows, default to `research_mode` + `citation_rules` + `empty_result_recovery`; add `tool_persistence_rules` when the host already uses retrieval tools
- for dependency-aware or tool-heavy workflows, default to `tool_persistence_rules` + `dependency_checks` + `verification_loop`; add `parallel_tool_calling` only when retrieval steps are truly independent
- for coding or terminal workflows, default to `terminal_tool_hygiene` + `verification_loop`
- for multi-agent support or triage workflows, default to at least one of `tool_persistence_rules`, `completeness_contract`, or `verification_loop`
- for long-running Responses agents with preambles or multiple assistant messages, explicitly review whether `phase` is already handled; if adding or preserving `phase` would require code edits, mark the path as `blocked`
- do not classify a coding or tool-using Responses workflow as `blocked` just because the visible snippet is minimal; prefer `model string + light prompt rewrite` unless the repo clearly shows that a safe GPT-5.4 path would require host-side code changes
### `blocked`
Choose this when:
- the upgrade appears to require API-surface changes
- the upgrade appears to require parameter rewrites or reasoning-setting changes that are not exposed outside implementation code
- the upgrade would require changing tool definitions, tool handler wiring, or schema contracts
- you cannot confidently identify the prompt surface tied to the model usage
Default action:
- do not improvise a broader upgrade
- report the blocker and explain that the fix is out of scope for this guide
## No-code compatibility checklist
Before recommending a no-code upgrade, check:
1. Can the current host accept the `gpt-5.4` model string without changing client code or API surface?
2. Are the related prompts identifiable and editable?
3. Does the host depend on behavior that likely needs API-surface changes, parameter rewrites, or tool rewiring?
4. Would the likely fix be prompt-only, or would it need implementation changes?
5. Is the prompt surface close enough to the model usage that you can make a targeted change instead of a broad cleanup?
6. For long-running Responses or tool-heavy agents, is `phase` already preserved if the host relies on preambles, replayed assistant items, or multiple assistant messages?
If item 1 is no, if items 3 and 4 point to implementation work, or if item 6 is no and the fix needs code changes, return `blocked`.
If item 2 is no, return `unknown` unless the user can point to the prompt location.
Important:
- Existing use of tools, agents, or multiple usage sites is not by itself a blocker.
- If the current host can keep the same API surface and the same tool definitions, prefer `model string + light prompt rewrite` over `blocked`.
- Reserve `blocked` for cases that truly require implementation changes, not cases that only need stronger prompt steering.
## Scope boundaries
This guide may:
- update or recommend updated model strings
- update or recommend updated prompts
- inspect code and prompt files to understand where those changes belong
- inspect whether existing Responses flows already preserve `phase`
- flag compatibility blockers
This guide may not:
- move Chat Completions code to Responses
- move Responses code to another API surface
- rewrite parameter shapes
- change tool definitions or tool-call handling
- change structured-output wiring
- add or retrofit `phase` handling in implementation code
- edit business logic, orchestration logic, or SDK usage beyond a literal model-string replacement
If a safe GPT-5.4 upgrade requires any of those changes, mark the path as blocked and out of scope.
## Validation plan
- Validate each upgraded usage site with existing evals or realistic spot checks.
- Check whether the upgraded model still matches expected latency, output shape, and quality.
- If prompt edits were added, confirm each block is doing real work instead of adding noise.
- If the workflow has downstream impact, add a lightweight verification pass before finalization.
## Launch-day refresh items
When final GPT-5.4 guidance changes:
1. Replace release-candidate assumptions with final GPT-5.4 guidance where appropriate.
2. Re-check whether the default target string should stay `gpt-5.4` for all source families.
3. Re-check any prompt-block recommendations whose semantics may have changed.
4. Re-check research, citation, and compatibility guidance against the final model behavior.
5. Re-run the same upgrade scenarios and confirm the blocked-versus-viable boundaries still hold.

View File

@@ -1,147 +0,0 @@
#!/usr/bin/env node
const fs = require("node:fs/promises");
const path = require("node:path");
const DEFAULT_URL =
"https://developers.openai.com/api/docs/guides/latest-model.md";
const DEFAULT_BASE_URL = "https://developers.openai.com";
function parseArgs(argv) {
const args = {
source: process.env.LATEST_MODEL_URL || DEFAULT_URL,
baseUrl: process.env.LATEST_MODEL_BASE_URL || DEFAULT_BASE_URL,
};
for (let i = 2; i < argv.length; i += 1) {
const arg = argv[i];
if (arg === "--source" || arg === "--url") {
args.source = argv[i + 1];
i += 1;
} else if (arg === "--base-url") {
args.baseUrl = argv[i + 1];
i += 1;
}
}
return args;
}
async function readSource(source) {
if (source.startsWith("file://")) {
return fs.readFile(new URL(source), "utf8");
}
if (!/^https?:\/\//.test(source)) {
return fs.readFile(path.resolve(source), "utf8");
}
const response = await fetch(source, {
headers: { accept: "text/markdown,text/plain,*/*" },
});
if (!response.ok) {
throw new Error(`failed to fetch ${source}: ${response.status}`);
}
return response.text();
}
function parseIndentedInfo(lines, startIndex) {
  // Collect two-space-indented `key: value` pairs directly below the header line.
const info = {};
for (let i = startIndex + 1; i < lines.length; i += 1) {
const line = lines[i];
if (!line.trim()) {
continue;
}
const match = line.match(/^ {2}([A-Za-z][A-Za-z0-9_-]*):\s*(.+?)\s*$/);
if (!match) {
break;
}
info[match[1]] = match[2].replace(/^["']|["']$/g, "");
}
return info;
}
function parseFlatInfo(block) {
const info = {};
for (const line of block.split(/\r?\n/)) {
const match = line.match(/^([A-Za-z][A-Za-z0-9_-]*):\s*(.+?)\s*$/);
if (match) {
info[match[1]] = match[2].replace(/^["']|["']$/g, "");
}
}
return info;
}
function extractLatestModelInfo(markdown) {
  // Prefer a YAML-style `latestModelInfo:` block; fall back to an HTML comment form.
const lines = markdown.split(/\r?\n/);
const latestModelInfoIndex = lines.findIndex((line) =>
/^latestModelInfo:\s*$/.test(line)
);
if (latestModelInfoIndex >= 0) {
return parseIndentedInfo(lines, latestModelInfoIndex);
}
const commentMatch = markdown.match(
/<!--\s*latestModelInfo\s*\n([\s\S]*?)\n\s*-->/m
);
if (commentMatch) {
return parseFlatInfo(commentMatch[1]);
}
return undefined;
}
function modelToSkillSlug(model) {
  // "gpt-5.4" -> "gpt-5p4": dots become "p" so the slug is safe in directory names.
  return model.trim().replace(/\./g, "p");
}
function absoluteUrl(baseUrl, value) {
return new URL(value, baseUrl).toString();
}
function normalizeInfo(info, baseUrl) {
const model = info?.model?.trim();
const migrationGuide = info?.migrationGuide?.trim();
const promptingGuide = info?.promptingGuide?.trim();
if (!model || !migrationGuide || !promptingGuide) {
throw new Error(
"latestModelInfo must include model, migrationGuide, and promptingGuide"
);
}
return {
model,
modelSlug: modelToSkillSlug(model),
migrationGuideUrl: absoluteUrl(baseUrl, migrationGuide),
promptingGuideUrl: absoluteUrl(baseUrl, promptingGuide),
};
}
async function main() {
const { source, baseUrl } = parseArgs(process.argv);
const markdown = await readSource(source);
const info = extractLatestModelInfo(markdown);
if (!info) {
throw new Error(`latestModelInfo block not found in ${source}`);
}
process.stdout.write(
`${JSON.stringify(normalizeInfo(info, baseUrl), null, 2)}\n`
);
}
main().catch((error) => {
console.error(error.message);
process.exit(1);
});

View File

@@ -1,160 +0,0 @@
---
name: plugin-creator
description: Create and scaffold plugin directories for Codex with a required `.codex-plugin/plugin.json`, optional plugin folders/files, and baseline placeholders you can edit before publishing or testing. Use when Codex needs to create a new local plugin, add optional plugin structure, or generate or update repo-root `.agents/plugins/marketplace.json` entries for plugin ordering and availability metadata.
---
# Plugin Creator
## Quick Start
1. Run the scaffold script:
```bash
# Plugin names are normalized to lower-case hyphen-case and must be <= 64 chars.
# The generated folder and plugin.json name are always the same.
# Run from repo root (or replace .agents/... with the absolute path to this SKILL).
# By default creates in <repo_root>/plugins/<plugin-name>.
python3 .agents/skills/plugin-creator/scripts/create_basic_plugin.py <plugin-name>
```
2. Open `<plugin-path>/.codex-plugin/plugin.json` and replace `[TODO: ...]` placeholders.
3. Generate or update the repo marketplace entry when the plugin should appear in Codex UI ordering:
```bash
# marketplace.json always lives at <repo-root>/.agents/plugins/marketplace.json
python3 .agents/skills/plugin-creator/scripts/create_basic_plugin.py my-plugin --with-marketplace
```
For a home-local plugin, treat `<home>` as the root and use:
```bash
python3 .agents/skills/plugin-creator/scripts/create_basic_plugin.py my-plugin \
--path ~/plugins \
--marketplace-path ~/.agents/plugins/marketplace.json \
--with-marketplace
```
4. Generate/adjust optional companion folders as needed:
```bash
python3 .agents/skills/plugin-creator/scripts/create_basic_plugin.py my-plugin --path <parent-plugin-directory> \
--with-skills --with-hooks --with-scripts --with-assets --with-mcp --with-apps --with-marketplace
```
`<parent-plugin-directory>` is the directory where the plugin folder `<plugin-name>` will be created (for example `~/code/plugins`).
## What this skill creates
- If the user has not made the plugin location explicit, ask whether they want a repo-local plugin or a home-local plugin before generating marketplace entries.
- Creates the plugin root at `<parent-plugin-directory>/<plugin-name>/`.
- Always creates `<parent-plugin-directory>/<plugin-name>/.codex-plugin/plugin.json`.
- Fills the manifest with the full schema shape, placeholder values, and the complete `interface` section.
- Creates or updates `<repo-root>/.agents/plugins/marketplace.json` when `--with-marketplace` is set.
- If the marketplace file does not exist yet, seed top-level `name` plus `interface.displayName` placeholders before adding the first plugin entry.
- `<plugin-name>` is normalized using skill-creator naming rules:
  - `My Plugin` -> `my-plugin`
  - `My--Plugin` -> `my-plugin`
  - underscores, spaces, and punctuation are converted to `-`
  - the result is lower-case hyphen-delimited with consecutive hyphens collapsed
- Supports optional creation of:
- `skills/`
- `hooks/`
- `scripts/`
- `assets/`
- `.mcp.json`
- `.app.json`
## Marketplace workflow
- `marketplace.json` always lives at `<repo-root>/.agents/plugins/marketplace.json`.
- For a home-local plugin, use the same convention with `<home>` as the root:
`~/.agents/plugins/marketplace.json` plus `./plugins/<plugin-name>`.
- Marketplace root metadata supports top-level `name` plus optional `interface.displayName`.
- Treat plugin order in `plugins[]` as render order in Codex. Append new entries unless a user explicitly asks to reorder the list.
- `displayName` belongs inside the marketplace `interface` object, not individual `plugins[]` entries.
- Each generated marketplace entry must include all of:
- `policy.installation`
- `policy.authentication`
- `category`
- Default new entries to:
- `policy.installation: "AVAILABLE"`
- `policy.authentication: "ON_INSTALL"`
- Override defaults only when the user explicitly specifies another allowed value.
- Allowed `policy.installation` values:
- `NOT_AVAILABLE`
- `AVAILABLE`
- `INSTALLED_BY_DEFAULT`
- Allowed `policy.authentication` values:
- `ON_INSTALL`
- `ON_USE`
- Treat `policy.products` as an override. Omit it unless the user explicitly requests product gating.
- The generated plugin entry shape is:
```json
{
"name": "plugin-name",
"source": {
"source": "local",
"path": "./plugins/plugin-name"
},
"policy": {
"installation": "AVAILABLE",
"authentication": "ON_INSTALL"
},
"category": "Productivity"
}
```
- Use `--force` only when intentionally replacing an existing marketplace entry for the same plugin name.
- If `<repo-root>/.agents/plugins/marketplace.json` does not exist yet, create it with top-level `"name"`, an `"interface"` object containing `"displayName"`, and a `plugins` array, then add the new entry.
- For a brand-new marketplace file, the root object should look like:
```json
{
"name": "[TODO: marketplace-name]",
"interface": {
"displayName": "[TODO: Marketplace Display Name]"
},
"plugins": [
{
"name": "plugin-name",
"source": {
"source": "local",
"path": "./plugins/plugin-name"
},
"policy": {
"installation": "AVAILABLE",
"authentication": "ON_INSTALL"
},
"category": "Productivity"
}
]
}
```
## Required behavior
- Outer folder name and `plugin.json` `"name"` are always the same normalized plugin name.
- Do not remove required structure; keep `.codex-plugin/plugin.json` present.
- Keep manifest values as placeholders until a human or follow-up step explicitly fills them.
- If creating files inside an existing plugin path, use `--force` only when overwrite is intentional.
- Preserve any existing marketplace `interface.displayName`.
- When generating marketplace entries, always write `policy.installation`, `policy.authentication`, and `category` even if their values are defaults.
- Add `policy.products` only when the user explicitly asks for that override.
- Keep marketplace `source.path` relative to repo root as `./plugins/<plugin-name>`.
## Reference to exact spec sample
For the exact canonical sample JSON for both plugin manifests and marketplace entries, use:
- `references/plugin-json-spec.md`
## Validation
After editing `SKILL.md`, run:
```bash
python3 <path-to-skill-creator>/scripts/quick_validate.py .agents/skills/plugin-creator
```

View File

@@ -1,6 +0,0 @@
interface:
display_name: "Plugin Creator"
short_description: "Scaffold plugins and marketplace entries"
default_prompt: "Use $plugin-creator to scaffold a plugin with placeholder plugin.json, optional structure, and a marketplace.json entry."
icon_small: "./assets/plugin-creator-small.svg"
icon_large: "./assets/plugin-creator.png"

View File

@@ -1,3 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" fill="currentColor" viewBox="0 0 20 20">
<path fill="#0D0D0D" d="M12.03 4.113a3.612 3.612 0 0 1 5.108 5.108l-6.292 6.29c-.324.324-.56.561-.791.752l-.235.176c-.205.14-.422.261-.65.36l-.229.093a4.136 4.136 0 0 1-.586.16l-.764.134-2.394.4c-.142.024-.294.05-.423.06-.098.007-.232.01-.378-.026l-.149-.05a1.081 1.081 0 0 1-.521-.474l-.046-.093a1.104 1.104 0 0 1-.075-.527c.01-.129.035-.28.06-.422l.398-2.394c.1-.602.162-.987.295-1.35l.093-.23c.1-.228.22-.445.36-.65l.176-.235c.19-.232.428-.467.751-.79l6.292-6.292Zm-5.35 7.232c-.35.35-.534.535-.66.688l-.11.147a2.67 2.67 0 0 0-.24.433l-.062.154c-.08.22-.124.462-.232 1.112l-.398 2.394-.001.001h.003l2.393-.399.717-.126a2.63 2.63 0 0 0 .394-.105l.154-.063a2.65 2.65 0 0 0 .433-.24l.147-.11c.153-.126.339-.31.688-.66l4.988-4.988-3.227-3.226-4.987 4.988Zm9.517-6.291a2.281 2.281 0 0 0-3.225 0l-.364.362 3.226 3.227.363-.364c.89-.89.89-2.334 0-3.225ZM4.583 1.783a.3.3 0 0 1 .294.241c.117.585.347 1.092.707 1.48.357.385.859.668 1.549.783a.3.3 0 0 1 0 .592c-.69.115-1.192.398-1.549.783-.315.34-.53.77-.657 1.265l-.05.215a.3.3 0 0 1-.588 0c-.117-.585-.347-1.092-.707-1.48-.357-.384-.859-.668-1.549-.783a.3.3 0 0 1 0-.592c.69-.115 1.192-.398 1.549-.783.36-.388.59-.895.707-1.48l.015-.05a.3.3 0 0 1 .279-.19Z"/>
</svg>

Binary file not shown.

View File

@@ -1,170 +0,0 @@
# Plugin JSON sample spec
```json
{
"name": "plugin-name",
"version": "1.2.0",
"description": "Brief plugin description",
"author": {
"name": "Author Name",
"email": "author@example.com",
"url": "https://github.com/author"
},
"homepage": "https://docs.example.com/plugin",
"repository": "https://github.com/author/plugin",
"license": "MIT",
"keywords": ["keyword1", "keyword2"],
"skills": "./skills/",
"hooks": "./hooks.json",
"mcpServers": "./.mcp.json",
"apps": "./.app.json",
"interface": {
"displayName": "Plugin Display Name",
"shortDescription": "Short description for subtitle",
"longDescription": "Long description for details page",
"developerName": "OpenAI",
"category": "Productivity",
"capabilities": ["Interactive", "Write"],
"websiteURL": "https://openai.com/",
"privacyPolicyURL": "https://openai.com/policies/row-privacy-policy/",
"termsOfServiceURL": "https://openai.com/policies/row-terms-of-use/",
"defaultPrompt": [
"Summarize my inbox and draft replies for me.",
"Find open bugs and turn them into Linear tickets.",
"Review today's meetings and flag scheduling gaps."
],
"brandColor": "#3B82F6",
"composerIcon": "./assets/icon.png",
"logo": "./assets/logo.png",
"screenshots": [
"./assets/screenshot1.png",
"./assets/screenshot2.png",
"./assets/screenshot3.png"
]
}
}
```
## Field guide
### Top-level fields
- `name` (`string`): Plugin identifier (kebab-case, no spaces). Required if `plugin.json` is provided and used as manifest name and component namespace.
- `version` (`string`): Plugin semantic version.
- `description` (`string`): Short purpose summary.
- `author` (`object`): Publisher identity.
- `name` (`string`): Author or team name.
- `email` (`string`): Contact email.
- `url` (`string`): Author/team homepage or profile URL.
- `homepage` (`string`): Documentation URL for plugin usage.
- `repository` (`string`): Source code URL.
- `license` (`string`): License identifier (for example `MIT`, `Apache-2.0`).
- `keywords` (`array` of `string`): Search/discovery tags.
- `skills` (`string`): Relative path to skill directories/files.
- `hooks` (`string`): Hook config path.
- `mcpServers` (`string`): MCP config path.
- `apps` (`string`): App manifest path for plugin integrations.
- `interface` (`object`): Interface/UX metadata block for plugin presentation.
### `interface` fields
- `displayName` (`string`): User-facing title shown for the plugin.
- `shortDescription` (`string`): Brief subtitle used in compact views.
- `longDescription` (`string`): Longer description used on details screens.
- `developerName` (`string`): Human-readable publisher name.
- `category` (`string`): Plugin category bucket.
- `capabilities` (`array` of `string`): Capability list from implementation.
- `websiteURL` (`string`): Public website for the plugin.
- `privacyPolicyURL` (`string`): Privacy policy URL.
- `termsOfServiceURL` (`string`): Terms of service URL.
- `defaultPrompt` (`array` of `string`): Starter prompts shown in composer/UX context.
  - Include at most 3 strings; entries after the first 3 are ignored.
- Each string is capped at 128 characters. Longer entries are truncated.
- Prefer short starter prompts around 50 characters so they scan well in the UI.
- `brandColor` (`string`): Theme color for the plugin card.
- `composerIcon` (`string`): Path to icon asset.
- `logo` (`string`): Path to logo asset.
- `screenshots` (`array` of `string`): List of screenshot asset paths.
  - Screenshot entries must be PNG files stored under `./assets/`.
- Keep file paths relative to plugin root.
### Path conventions and defaults
- Path values should be relative and begin with `./`.
- `skills`, `hooks`, and `mcpServers` supplement default component discovery; they do not replace it.
- Custom path values must follow the plugin root convention and naming/namespacing rules.
- This repo's scaffold writes `.codex-plugin/plugin.json`; treat that as the manifest location this skill generates.
# Marketplace JSON sample spec
`marketplace.json` depends on where the plugin should live:
- Repo plugin: `<repo-root>/.agents/plugins/marketplace.json`
- Local plugin: `~/.agents/plugins/marketplace.json`
```json
{
"name": "openai-curated",
"interface": {
"displayName": "ChatGPT Official"
},
"plugins": [
{
"name": "linear",
"source": {
"source": "local",
"path": "./plugins/linear"
},
"policy": {
"installation": "AVAILABLE",
"authentication": "ON_INSTALL"
},
"category": "Productivity"
}
]
}
```
## Marketplace field guide
### Top-level fields
- `name` (`string`): Marketplace identifier or catalog name.
- `interface` (`object`, optional): Marketplace presentation metadata.
- `plugins` (`array`): Ordered plugin entries. This order determines how Codex renders plugins.
### `interface` fields
- `displayName` (`string`, optional): User-facing marketplace title.
### Plugin entry fields
- `name` (`string`): Plugin identifier. Match the plugin folder name and `plugin.json` `name`.
- `source` (`object`): Plugin source descriptor.
- `source` (`string`): Use `local` for this repo workflow.
- `path` (`string`): Relative plugin path based on the marketplace root.
- Repo plugin: `./plugins/<plugin-name>`
- Local plugin in `~/.agents/plugins/marketplace.json`: `./plugins/<plugin-name>`
- The same relative path convention is used for both repo-rooted and home-rooted marketplaces.
- Example: with `~/.agents/plugins/marketplace.json`, `./plugins/<plugin-name>` resolves to `~/plugins/<plugin-name>`.
- `policy` (`object`): Marketplace policy block. Always include it.
- `installation` (`string`): Availability policy.
- Allowed values: `NOT_AVAILABLE`, `AVAILABLE`, `INSTALLED_BY_DEFAULT`
- Default for new entries: `AVAILABLE`
- `authentication` (`string`): Authentication timing policy.
- Allowed values: `ON_INSTALL`, `ON_USE`
- Default for new entries: `ON_INSTALL`
- `products` (`array` of `string`, optional): Product override for this plugin entry. Omit it unless product gating is explicitly requested.
- `category` (`string`): Display category bucket. Always include it.
### Marketplace generation rules
- `displayName` belongs under the top-level `interface` object, not individual plugin entries.
- When creating a new marketplace file from scratch, seed `interface.displayName` alongside top-level `name`.
- Always include `policy.installation`, `policy.authentication`, and `category` on every generated or updated plugin entry.
- Treat `policy.products` as an override and omit it unless explicitly requested.
- Append new entries unless the user explicitly requests reordering.
- Replace an existing entry for the same plugin only when overwrite is intentional.
- Choose marketplace location to match the plugin destination:
- Repo plugin: `<repo-root>/.agents/plugins/marketplace.json`
- Local plugin: `~/.agents/plugins/marketplace.json`

View File

@@ -1,301 +0,0 @@
#!/usr/bin/env python3
"""Scaffold a plugin directory and optionally update marketplace.json."""
from __future__ import annotations
import argparse
import json
import re
from pathlib import Path
from typing import Any
MAX_PLUGIN_NAME_LENGTH = 64
DEFAULT_PLUGIN_PARENT = Path.cwd() / "plugins"
DEFAULT_MARKETPLACE_PATH = Path.cwd() / ".agents" / "plugins" / "marketplace.json"
DEFAULT_INSTALL_POLICY = "AVAILABLE"
DEFAULT_AUTH_POLICY = "ON_INSTALL"
DEFAULT_CATEGORY = "Productivity"
DEFAULT_MARKETPLACE_DISPLAY_NAME = "[TODO: Marketplace Display Name]"
VALID_INSTALL_POLICIES = {"NOT_AVAILABLE", "AVAILABLE", "INSTALLED_BY_DEFAULT"}
VALID_AUTH_POLICIES = {"ON_INSTALL", "ON_USE"}
def normalize_plugin_name(plugin_name: str) -> str:
"""Normalize a plugin name to lowercase hyphen-case."""
normalized = plugin_name.strip().lower()
normalized = re.sub(r"[^a-z0-9]+", "-", normalized)
normalized = normalized.strip("-")
normalized = re.sub(r"-{2,}", "-", normalized)
return normalized
def validate_plugin_name(plugin_name: str) -> None:
if not plugin_name:
raise ValueError("Plugin name must include at least one letter or digit.")
if len(plugin_name) > MAX_PLUGIN_NAME_LENGTH:
raise ValueError(
f"Plugin name '{plugin_name}' is too long ({len(plugin_name)} characters). "
f"Maximum is {MAX_PLUGIN_NAME_LENGTH} characters."
)
def build_plugin_json(plugin_name: str) -> dict:
return {
"name": plugin_name,
"version": "[TODO: 1.2.0]",
"description": "[TODO: Brief plugin description]",
"author": {
"name": "[TODO: Author Name]",
"email": "[TODO: author@example.com]",
"url": "[TODO: https://github.com/author]",
},
"homepage": "[TODO: https://docs.example.com/plugin]",
"repository": "[TODO: https://github.com/author/plugin]",
"license": "[TODO: MIT]",
"keywords": ["[TODO: keyword1]", "[TODO: keyword2]"],
"skills": "[TODO: ./skills/]",
"hooks": "[TODO: ./hooks.json]",
"mcpServers": "[TODO: ./.mcp.json]",
"apps": "[TODO: ./.app.json]",
"interface": {
"displayName": "[TODO: Plugin Display Name]",
"shortDescription": "[TODO: Short description for subtitle]",
"longDescription": "[TODO: Long description for details page]",
"developerName": "[TODO: OpenAI]",
"category": "[TODO: Productivity]",
"capabilities": ["[TODO: Interactive]", "[TODO: Write]"],
"websiteURL": "[TODO: https://openai.com/]",
"privacyPolicyURL": "[TODO: https://openai.com/policies/row-privacy-policy/]",
"termsOfServiceURL": "[TODO: https://openai.com/policies/row-terms-of-use/]",
"defaultPrompt": [
"[TODO: Summarize my inbox and draft replies for me.]",
"[TODO: Find open bugs and turn them into tickets.]",
"[TODO: Review today's meetings and flag gaps.]",
],
"brandColor": "[TODO: #3B82F6]",
"composerIcon": "[TODO: ./assets/icon.png]",
"logo": "[TODO: ./assets/logo.png]",
"screenshots": [
"[TODO: ./assets/screenshot1.png]",
"[TODO: ./assets/screenshot2.png]",
"[TODO: ./assets/screenshot3.png]",
],
},
}
def build_marketplace_entry(
plugin_name: str,
install_policy: str,
auth_policy: str,
category: str,
) -> dict[str, Any]:
return {
"name": plugin_name,
"source": {
"source": "local",
"path": f"./plugins/{plugin_name}",
},
"policy": {
"installation": install_policy,
"authentication": auth_policy,
},
"category": category,
}
def load_json(path: Path) -> dict[str, Any]:
with path.open() as handle:
return json.load(handle)
def build_default_marketplace() -> dict[str, Any]:
return {
"name": "[TODO: marketplace-name]",
"interface": {
"displayName": DEFAULT_MARKETPLACE_DISPLAY_NAME,
},
"plugins": [],
}
def validate_marketplace_interface(payload: dict[str, Any]) -> None:
interface = payload.get("interface")
if interface is not None and not isinstance(interface, dict):
raise ValueError("marketplace.json field 'interface' must be an object.")
def update_marketplace_json(
marketplace_path: Path,
plugin_name: str,
install_policy: str,
auth_policy: str,
category: str,
force: bool,
) -> None:
if marketplace_path.exists():
payload = load_json(marketplace_path)
else:
payload = build_default_marketplace()
if not isinstance(payload, dict):
raise ValueError(f"{marketplace_path} must contain a JSON object.")
validate_marketplace_interface(payload)
plugins = payload.setdefault("plugins", [])
if not isinstance(plugins, list):
raise ValueError(f"{marketplace_path} field 'plugins' must be an array.")
new_entry = build_marketplace_entry(plugin_name, install_policy, auth_policy, category)
for index, entry in enumerate(plugins):
if isinstance(entry, dict) and entry.get("name") == plugin_name:
if not force:
raise FileExistsError(
f"Marketplace entry '{plugin_name}' already exists in {marketplace_path}. "
"Use --force to overwrite that entry."
)
plugins[index] = new_entry
break
else:
plugins.append(new_entry)
write_json(marketplace_path, payload, force=True)
def write_json(path: Path, data: dict, force: bool) -> None:
if path.exists() and not force:
raise FileExistsError(f"{path} already exists. Use --force to overwrite.")
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("w") as handle:
json.dump(data, handle, indent=2)
handle.write("\n")
def create_stub_file(path: Path, payload: dict, force: bool) -> None:
if path.exists() and not force:
return
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("w") as handle:
json.dump(payload, handle, indent=2)
handle.write("\n")
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Create a plugin skeleton with placeholder plugin.json."
)
parser.add_argument("plugin_name")
parser.add_argument(
"--path",
default=str(DEFAULT_PLUGIN_PARENT),
help=(
"Parent directory for plugin creation (defaults to <cwd>/plugins). "
"When using a home-rooted marketplace, use <home>/plugins."
),
)
parser.add_argument("--with-skills", action="store_true", help="Create skills/ directory")
parser.add_argument("--with-hooks", action="store_true", help="Create hooks/ directory")
parser.add_argument("--with-scripts", action="store_true", help="Create scripts/ directory")
parser.add_argument("--with-assets", action="store_true", help="Create assets/ directory")
parser.add_argument("--with-mcp", action="store_true", help="Create .mcp.json placeholder")
parser.add_argument("--with-apps", action="store_true", help="Create .app.json placeholder")
parser.add_argument(
"--with-marketplace",
action="store_true",
help=(
"Create or update <cwd>/.agents/plugins/marketplace.json. "
"Marketplace entries always point to ./plugins/<plugin-name> relative to the "
"marketplace root."
),
)
parser.add_argument(
"--marketplace-path",
default=str(DEFAULT_MARKETPLACE_PATH),
help=(
"Path to marketplace.json (defaults to <cwd>/.agents/plugins/marketplace.json). "
"For a home-rooted marketplace, use <home>/.agents/plugins/marketplace.json."
),
)
parser.add_argument(
"--install-policy",
default=DEFAULT_INSTALL_POLICY,
choices=sorted(VALID_INSTALL_POLICIES),
help="Marketplace policy.installation value",
)
parser.add_argument(
"--auth-policy",
default=DEFAULT_AUTH_POLICY,
choices=sorted(VALID_AUTH_POLICIES),
help="Marketplace policy.authentication value",
)
parser.add_argument(
"--category",
default=DEFAULT_CATEGORY,
help="Marketplace category value",
)
parser.add_argument("--force", action="store_true", help="Overwrite existing files")
return parser.parse_args()
def main() -> None:
args = parse_args()
raw_plugin_name = args.plugin_name
plugin_name = normalize_plugin_name(raw_plugin_name)
if plugin_name != raw_plugin_name:
print(f"Note: Normalized plugin name from '{raw_plugin_name}' to '{plugin_name}'.")
validate_plugin_name(plugin_name)
plugin_root = (Path(args.path).expanduser().resolve() / plugin_name)
plugin_root.mkdir(parents=True, exist_ok=True)
plugin_json_path = plugin_root / ".codex-plugin" / "plugin.json"
write_json(plugin_json_path, build_plugin_json(plugin_name), args.force)
optional_directories = {
"skills": args.with_skills,
"hooks": args.with_hooks,
"scripts": args.with_scripts,
"assets": args.with_assets,
}
for folder, enabled in optional_directories.items():
if enabled:
(plugin_root / folder).mkdir(parents=True, exist_ok=True)
if args.with_mcp:
create_stub_file(
plugin_root / ".mcp.json",
{"mcpServers": {}},
args.force,
)
if args.with_apps:
create_stub_file(
plugin_root / ".app.json",
{
"apps": {},
},
args.force,
)
if args.with_marketplace:
marketplace_path = Path(args.marketplace_path).expanduser().resolve()
update_marketplace_json(
marketplace_path,
plugin_name,
args.install_policy,
args.auth_policy,
args.category,
args.force,
)
print(f"Created plugin scaffold: {plugin_root}")
print(f"plugin manifest: {plugin_json_path}")
if args.with_marketplace:
print(f"marketplace manifest: {marketplace_path}")
if __name__ == "__main__":
main()

View File

@@ -1,416 +0,0 @@
---
name: skill-creator
description: Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Codex's capabilities with specialized knowledge, workflows, or tool integrations.
metadata:
short-description: Create or update a skill
---
# Skill Creator
This skill provides guidance for creating effective skills.
## About Skills
Skills are modular, self-contained folders that extend Codex's capabilities by providing
specialized knowledge, workflows, and tools. Think of them as "onboarding guides" for specific
domains or tasks—they transform Codex from a general-purpose agent into a specialized agent
equipped with procedural knowledge that no model can fully possess.
### What Skills Provide
1. Specialized workflows - Multi-step procedures for specific domains
2. Tool integrations - Instructions for working with specific file formats or APIs
3. Domain expertise - Company-specific knowledge, schemas, business logic
4. Bundled resources - Scripts, references, and assets for complex and repetitive tasks
## Core Principles
### Concise is Key
The context window is a public good. Skills share the context window with everything else Codex needs: system prompt, conversation history, other Skills' metadata, and the actual user request.
**Default assumption: Codex is already very smart.** Only add context Codex doesn't already have. Challenge each piece of information: "Does Codex really need this explanation?" and "Does this paragraph justify its token cost?"
Prefer concise examples over verbose explanations.
### Set Appropriate Degrees of Freedom
Match the level of specificity to the task's fragility and variability:
**High freedom (text-based instructions)**: Use when multiple approaches are valid, decisions depend on context, or heuristics guide the approach.
**Medium freedom (pseudocode or scripts with parameters)**: Use when a preferred pattern exists, some variation is acceptable, or configuration affects behavior.
**Low freedom (specific scripts, few parameters)**: Use when operations are fragile and error-prone, consistency is critical, or a specific sequence must be followed.
Think of Codex as exploring a path: a narrow bridge with cliffs needs specific guardrails (low freedom), while an open field allows many routes (high freedom).
### Protect Validation Integrity
You may use subagents during iteration to validate whether a skill works on realistic tasks or whether a suspected problem is real. This is most useful when you want an independent pass on the skill's behavior, outputs, or failure modes after a revision. Only do this when it is possible to start new subagents.
When using subagents for validation, treat that as an evaluation surface. The goal is to learn whether the skill generalizes, not whether another agent can reconstruct the answer from leaked context.
Prefer raw artifacts such as example prompts, outputs, diffs, logs, or traces. Give the minimum task-local context needed to perform the validation. Avoid passing the intended answer, suspected bug, intended fix, or your prior conclusions unless the validation explicitly requires them.
### Anatomy of a Skill
Every skill consists of a required SKILL.md file and optional bundled resources:
```
skill-name/
├── SKILL.md (required)
│ ├── YAML frontmatter metadata (required)
│ │ ├── name: (required)
│ │ └── description: (required)
│ └── Markdown instructions (required)
├── agents/ (recommended)
│ └── openai.yaml - UI metadata for skill lists and chips
└── Bundled Resources (optional)
├── scripts/ - Executable code (Python/Bash/etc.)
├── references/ - Documentation intended to be loaded into context as needed
└── assets/ - Files used in output (templates, icons, fonts, etc.)
```
#### SKILL.md (required)
Every SKILL.md consists of:
- **Frontmatter** (YAML): Contains `name` and `description` fields. These are the only fields Codex reads to decide when the skill is used, so it is important to describe clearly and comprehensively what the skill does and when it should be used.
- **Body** (Markdown): Instructions and guidance for using the skill. Only loaded AFTER the skill triggers (if at all).
#### Agents metadata (recommended)
- UI-facing metadata for skill lists and chips
- Read references/openai_yaml.md before generating values and follow its descriptions and constraints
- Create: human-facing `display_name`, `short_description`, and `default_prompt` by reading the skill
- Generate deterministically by passing the values as `--interface key=value` to `scripts/generate_openai_yaml.py` or `scripts/init_skill.py`
- On updates: validate `agents/openai.yaml` still matches SKILL.md; regenerate if stale
- Only include other optional interface fields (icons, brand color) if explicitly provided
- See references/openai_yaml.md for field definitions and examples
#### Bundled Resources (optional)
##### Scripts (`scripts/`)
Executable code (Python/Bash/etc.) for tasks that require deterministic reliability or are repeatedly rewritten.
- **When to include**: When the same code is being rewritten repeatedly or deterministic reliability is needed
- **Example**: `scripts/rotate_pdf.py` for PDF rotation tasks
- **Benefits**: Token efficient, deterministic, may be executed without loading into context
- **Note**: Scripts may still need to be read by Codex for patching or environment-specific adjustments
##### References (`references/`)
Documentation and reference material intended to be loaded as needed into context to inform Codex's process and thinking.
- **When to include**: For documentation that Codex should reference while working
- **Examples**: `references/finance.md` for financial schemas, `references/mnda.md` for company NDA template, `references/policies.md` for company policies, `references/api_docs.md` for API specifications
- **Use cases**: Database schemas, API documentation, domain knowledge, company policies, detailed workflow guides
- **Benefits**: Keeps SKILL.md lean, loaded only when Codex determines it's needed
- **Best practice**: If files are large (>10k words), include grep search patterns in SKILL.md
- **Avoid duplication**: Information should live in either SKILL.md or references files, not both. Prefer references files for detailed information unless it's truly core to the skill—this keeps SKILL.md lean while making information discoverable without hogging the context window. Keep only essential procedural instructions and workflow guidance in SKILL.md; move detailed reference material, schemas, and examples to references files.
##### Assets (`assets/`)
Files not intended to be loaded into context, but rather used within the output Codex produces.
- **When to include**: When the skill needs files that will be used in the final output
- **Examples**: `assets/logo.png` for brand assets, `assets/slides.pptx` for PowerPoint templates, `assets/frontend-template/` for HTML/React boilerplate, `assets/font.ttf` for typography
- **Use cases**: Templates, images, icons, boilerplate code, fonts, sample documents that get copied or modified
- **Benefits**: Separates output resources from documentation, enables Codex to use files without loading them into context
#### What to Not Include in a Skill
A skill should only contain essential files that directly support its functionality. Do NOT create extraneous documentation or auxiliary files, including:
- README.md
- INSTALLATION_GUIDE.md
- QUICK_REFERENCE.md
- CHANGELOG.md
- etc.
The skill should only contain the information needed for an AI agent to do the job at hand. It should not contain auxiliary context about the process that went into creating it, setup and testing procedures, user-facing documentation, etc. Creating additional documentation files just adds clutter and confusion.
### Progressive Disclosure Design Principle
Skills use a three-level loading system to manage context efficiently:
1. **Metadata (name + description)** - Always in context (~100 words)
2. **SKILL.md body** - When skill triggers (<5k words)
3. **Bundled resources** - As needed by Codex (unlimited, since scripts can be executed without being read into the context window)
#### Progressive Disclosure Patterns
Keep SKILL.md body to the essentials and under 500 lines to minimize context bloat. Split content into separate files when approaching this limit. When splitting out content into other files, it is very important to reference them from SKILL.md and describe clearly when to read them, to ensure the reader of the skill knows they exist and when to use them.
**Key principle:** When a skill supports multiple variations, frameworks, or options, keep only the core workflow and selection guidance in SKILL.md. Move variant-specific details (patterns, examples, configuration) into separate reference files.
**Pattern 1: High-level guide with references**
```markdown
# PDF Processing
## Quick start
Extract text with pdfplumber:
[code example]
## Advanced features
- **Form filling**: See [FORMS.md](FORMS.md) for complete guide
- **API reference**: See [REFERENCE.md](REFERENCE.md) for all methods
- **Examples**: See [EXAMPLES.md](EXAMPLES.md) for common patterns
```
Codex loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed.
**Pattern 2: Domain-specific organization**
For Skills with multiple domains, organize content by domain to avoid loading irrelevant context:
```
bigquery-skill/
├── SKILL.md (overview and navigation)
└── reference/
├── finance.md (revenue, billing metrics)
├── sales.md (opportunities, pipeline)
├── product.md (API usage, features)
└── marketing.md (campaigns, attribution)
```
When a user asks about sales metrics, Codex only reads sales.md.
Similarly, for skills supporting multiple frameworks or variants, organize by variant:
```
cloud-deploy/
├── SKILL.md (workflow + provider selection)
└── references/
├── aws.md (AWS deployment patterns)
├── gcp.md (GCP deployment patterns)
└── azure.md (Azure deployment patterns)
```
When the user chooses AWS, Codex only reads aws.md.
**Pattern 3: Conditional details**
Show basic content, link to advanced content:
```markdown
# DOCX Processing
## Creating documents
Use docx-js for new documents. See [DOCX-JS.md](DOCX-JS.md).
## Editing documents
For simple edits, modify the XML directly.
**For tracked changes**: See [REDLINING.md](REDLINING.md)
**For OOXML details**: See [OOXML.md](OOXML.md)
```
Codex reads REDLINING.md or OOXML.md only when the user needs those features.
**Important guidelines:**
- **Avoid deeply nested references** - Keep references one level deep from SKILL.md. All reference files should link directly from SKILL.md.
- **Structure longer reference files** - For files longer than 100 lines, include a table of contents at the top so Codex can see the full scope when previewing.
## Skill Creation Process
Skill creation involves these steps:
1. Understand the skill with concrete examples
2. Plan reusable skill contents (scripts, references, assets)
3. Initialize the skill (run init_skill.py)
4. Edit the skill (implement resources and write SKILL.md)
5. Validate the skill (run quick_validate.py)
6. Iterate based on real usage and forward-test complex skills.
Follow these steps in order, skipping only if there is a clear reason why they are not applicable.
### Skill Naming
- Use lowercase letters, digits, and hyphens only; normalize user-provided titles to hyphen-case (e.g., "Plan Mode" -> `plan-mode`).
- When generating names, generate a name under 64 characters (letters, digits, hyphens).
- Prefer short, verb-led phrases that describe the action.
- Namespace by tool when it improves clarity or triggering (e.g., `gh-address-comments`, `linear-address-issue`).
- Name the skill folder exactly after the skill name.
### Step 1: Understanding the Skill with Concrete Examples
Skip this step only when the skill's usage patterns are already clearly understood. It remains valuable even when working with an existing skill.
To create an effective skill, clearly understand concrete examples of how the skill will be used. This understanding can come from either direct user examples or generated examples that are validated with user feedback.
For example, when building an image-editor skill, relevant questions include:
- "What functionality should the image-editor skill support? Editing, rotating, anything else?"
- "Can you give some examples of how this skill would be used?"
- "I can imagine users asking for things like 'Remove the red-eye from this image' or 'Rotate this image'. Are there other ways you imagine this skill being used?"
- "What would a user say that should trigger this skill?"
- "Where should I create this skill? If you do not have a preference, I will place it in `$CODEX_HOME/skills` (or `~/.codex/skills` when `CODEX_HOME` is unset) so Codex can discover it automatically."
To avoid overwhelming users, do not ask too many questions in a single message. Start with the most important ones and follow up as needed.
Conclude this step when there is a clear sense of the functionality the skill should support.
### Step 2: Planning the Reusable Skill Contents
To turn concrete examples into an effective skill, analyze each example by:
1. Considering how to execute on the example from scratch
2. Identifying what scripts, references, and assets would be helpful when executing these workflows repeatedly
Example: When building a `pdf-editor` skill to handle queries like "Help me rotate this PDF," the analysis shows:
1. Rotating a PDF requires re-writing the same code each time
2. A `scripts/rotate_pdf.py` script would be helpful to store in the skill
Example: When designing a `frontend-webapp-builder` skill for queries like "Build me a todo app" or "Build me a dashboard to track my steps," the analysis shows:
1. Writing a frontend webapp requires the same boilerplate HTML/React each time
2. An `assets/hello-world/` template containing the boilerplate HTML/React project files would be helpful to store in the skill
Example: When building a `big-query` skill to handle queries like "How many users have logged in today?" the analysis shows:
1. Querying BigQuery requires re-discovering the table schemas and relationships each time
2. A `references/schema.md` file documenting the table schemas would be helpful to store in the skill
To establish the skill's contents, analyze each concrete example to create a list of the reusable resources to include: scripts, references, and assets.
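To make the first example concrete, here is a minimal sketch of what a bundled `scripts/rotate_pdf.py` might look like (hypothetical; assumes the `pypdf` library is available):
```python
#!/usr/bin/env python3
"""Rotate every page of a PDF clockwise by a multiple of 90 degrees."""
import argparse
from pypdf import PdfReader, PdfWriter

def main():
    parser = argparse.ArgumentParser(description="Rotate all pages of a PDF.")
    parser.add_argument("input_pdf")
    parser.add_argument("output_pdf")
    parser.add_argument("--angle", type=int, default=90, help="Clockwise degrees (multiple of 90)")
    args = parser.parse_args()

    reader = PdfReader(args.input_pdf)
    writer = PdfWriter()
    for page in reader.pages:
        page.rotate(args.angle)  # rotate in place before copying into the writer
        writer.add_page(page)
    with open(args.output_pdf, "wb") as fh:
        writer.write(fh)

if __name__ == "__main__":
    main()
```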
### Step 3: Initializing the Skill
At this point, it is time to actually create the skill.
Skip this step only if the skill being developed already exists. In this case, continue to the next step.
Before running `init_skill.py`, ask where the user wants the skill created. If they do not specify a location, default to `$CODEX_HOME/skills`; when `CODEX_HOME` is unset, fall back to `~/.codex/skills` so the skill is auto-discovered.
When creating a new skill from scratch, always run the `init_skill.py` script. It generates a template skill directory that includes everything a skill requires, making skill creation faster and more reliable.
Usage:
```bash
scripts/init_skill.py <skill-name> --path <output-directory> [--resources scripts,references,assets] [--examples]
```
Examples:
```bash
scripts/init_skill.py my-skill --path "${CODEX_HOME:-$HOME/.codex}/skills"
scripts/init_skill.py my-skill --path "${CODEX_HOME:-$HOME/.codex}/skills" --resources scripts,references
scripts/init_skill.py my-skill --path ~/work/skills --resources scripts --examples
```
The script:
- Creates the skill directory at the specified path
- Generates a SKILL.md template with proper frontmatter and TODO placeholders
- Creates `agents/openai.yaml` using agent-generated `display_name`, `short_description`, and `default_prompt` passed via `--interface key=value`
- Optionally creates resource directories based on `--resources`
- Optionally adds example files when `--examples` is set
After initialization, customize the SKILL.md and add resources as needed. If you used `--examples`, replace or delete placeholder files.
Generate `display_name`, `short_description`, and `default_prompt` by reading the skill, then pass them as `--interface key=value` to `init_skill.py` or regenerate with:
```bash
scripts/generate_openai_yaml.py <path/to/skill-folder> --interface key=value
```
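For instance, with hypothetical values (`--interface` is repeatable):
```bash
scripts/generate_openai_yaml.py ./pdf-editor \
  --interface display_name="PDF Editor" \
  --interface short_description="Rotate, merge, and edit PDF documents"
```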
Only include other optional interface fields when the user explicitly provides them. For full field descriptions and examples, see references/openai_yaml.md.
### Step 4: Edit the Skill
When editing the (newly-generated or existing) skill, remember that the skill is being created for another instance of Codex to use. Include information that would be beneficial and non-obvious to Codex. Consider what procedural knowledge, domain-specific details, or reusable assets would help another Codex instance execute these tasks more effectively.
After substantial revisions, or if the skill is particularly tricky, you should use subagents to forward-test the skill on realistic tasks or artifacts. When doing so, pass the artifact under validation rather than your diagnosis of what is wrong, and keep the prompt generic enough that success depends on transferable reasoning rather than hidden ground truth.
#### Start with Reusable Skill Contents
To begin implementation, start with the reusable resources identified above: `scripts/`, `references/`, and `assets/` files. Note that this step may require user input. For example, when implementing a `brand-guidelines` skill, the user may need to provide brand assets or templates to store in `assets/`, or documentation to store in `references/`.
Added scripts must be tested by actually running them to ensure there are no bugs and that the output matches what is expected. If there are many similar scripts, testing a representative sample is enough to build confidence that they all work while balancing time to completion.
If you used `--examples`, delete any placeholder files that are not needed for the skill. Only create resource directories that are actually required.
#### Update SKILL.md
**Writing Guidelines:** Always use imperative/infinitive form.
##### Frontmatter
Write the YAML frontmatter with `name` and `description`:
- `name`: The skill name
- `description`: This is the primary triggering mechanism for your skill, and helps Codex understand when to use the skill.
- Include both what the Skill does and specific triggers/contexts for when to use it.
- Include all "when to use" information here, not in the body. The body is only loaded after triggering, so "When to Use This Skill" sections in the body are not helpful to Codex.
- Example description for a `docx` skill: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. Use when Codex needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks"
Do not include any other fields in YAML frontmatter.
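For example, the frontmatter for a hypothetical `pdf-editor` skill might read:
```yaml
---
name: pdf-editor
description: Rotate, merge, split, and edit PDF files. Use when the user asks to modify PDF documents, e.g. "rotate this PDF" or "merge these reports".
---
```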
##### Body
Write instructions for using the skill and its bundled resources.
### Step 5: Validate the Skill
Once development of the skill is complete, validate the skill folder to catch basic issues early:
```bash
scripts/quick_validate.py <path/to/skill-folder>
```
The validation script checks YAML frontmatter format, required fields, and naming rules. If validation fails, fix the reported issues and run the command again.
### Step 6: Iterate
After testing the skill, you may find it is complex enough to require forward-testing, or users may request improvements.
This iteration often happens right after using the skill, while the context of how the skill performed is still fresh.
**Forward-testing and iteration workflow:**
1. Use the skill on real tasks
2. Notice struggles or inefficiencies
3. Identify how SKILL.md or bundled resources should be updated
4. Implement changes and test again
5. Forward-test if it is reasonable and appropriate
## Forward-testing
To forward-test, launch subagents as a way to stress test the skill with minimal context.
Subagents should *not* know that they are being asked to test the skill. They should be treated as
an agent asked to perform a task by the user. Prompts to subagents should look like:
`Use $skill-x at /path/to/skill-x to solve problem y`
Not:
`Review the skill at /path/to/skill-x; pretend a user asks you to...`
Decision rule for forward-testing:
- Err on the side of forward-testing
- Ask for approval if you think there's a risk that forward-testing would:
* take a long time,
* require additional approvals from the user, or
* modify live production systems
In these cases, show the user your proposed prompt and request (1) a yes/no decision, and
(2) any suggested modifications.
Considerations when forward-testing:
- Use fresh threads for independent passes.
- Pass the skill and a request phrased the way a user would phrase it.
- Pass raw artifacts, not your conclusions.
- Avoid showing expected answers or intended fixes.
- Rebuild context from source artifacts after each iteration.
- Review the subagent's output, reasoning, and emitted artifacts.
- Avoid leaving artifacts the agent can find on disk between iterations; clean up subagents' artifacts to avoid additional contamination.
If forward-testing only succeeds when subagents see leaked context, tighten the skill or the
forward-testing setup before trusting the result.

@@ -1,5 +0,0 @@
interface:
display_name: "Skill Creator"
short_description: "Create or update a skill"
icon_small: "./assets/skill-creator-small.svg"
icon_large: "./assets/skill-creator.png"

@@ -1,3 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" fill="currentColor" viewBox="0 0 20 20">
<path fill="#0D0D0D" d="M12.03 4.113a3.612 3.612 0 0 1 5.108 5.108l-6.292 6.29c-.324.324-.56.561-.791.752l-.235.176c-.205.14-.422.261-.65.36l-.229.093a4.136 4.136 0 0 1-.586.16l-.764.134-2.394.4c-.142.024-.294.05-.423.06-.098.007-.232.01-.378-.026l-.149-.05a1.081 1.081 0 0 1-.521-.474l-.046-.093a1.104 1.104 0 0 1-.075-.527c.01-.129.035-.28.06-.422l.398-2.394c.1-.602.162-.987.295-1.35l.093-.23c.1-.228.22-.445.36-.65l.176-.235c.19-.232.428-.467.751-.79l6.292-6.292Zm-5.35 7.232c-.35.35-.534.535-.66.688l-.11.147a2.67 2.67 0 0 0-.24.433l-.062.154c-.08.22-.124.462-.232 1.112l-.398 2.394-.001.001h.003l2.393-.399.717-.126a2.63 2.63 0 0 0 .394-.105l.154-.063a2.65 2.65 0 0 0 .433-.24l.147-.11c.153-.126.339-.31.688-.66l4.988-4.988-3.227-3.226-4.987 4.988Zm9.517-6.291a2.281 2.281 0 0 0-3.225 0l-.364.362 3.226 3.227.363-.364c.89-.89.89-2.334 0-3.225ZM4.583 1.783a.3.3 0 0 1 .294.241c.117.585.347 1.092.707 1.48.357.385.859.668 1.549.783a.3.3 0 0 1 0 .592c-.69.115-1.192.398-1.549.783-.315.34-.53.77-.657 1.265l-.05.215a.3.3 0 0 1-.588 0c-.117-.585-.347-1.092-.707-1.48-.357-.384-.859-.668-1.549-.783a.3.3 0 0 1 0-.592c.69-.115 1.192-.398 1.549-.783.36-.388.59-.895.707-1.48l.015-.05a.3.3 0 0 1 .279-.19Z"/>
</svg>

@@ -1,202 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@@ -1,49 +0,0 @@
# openai.yaml fields (full example + descriptions)
`agents/openai.yaml` is an extended, product-specific config intended for the machine/harness to read, not the agent. Other product-specific config can also live in the `agents/` folder.
## Full example
```yaml
interface:
display_name: "Optional user-facing name"
short_description: "Optional user-facing description"
icon_small: "./assets/small-400px.png"
icon_large: "./assets/large-logo.svg"
brand_color: "#3B82F6"
default_prompt: "Optional surrounding prompt to use the skill with"
dependencies:
tools:
- type: "mcp"
value: "github"
description: "GitHub MCP server"
transport: "streamable_http"
url: "https://api.githubcopilot.com/mcp/"
policy:
allow_implicit_invocation: true
```
## Field descriptions and constraints
Top-level constraints:
- Quote all string values.
- Keep keys unquoted.
- For `interface.default_prompt`: generate a helpful, short (typically 1 sentence) example starting prompt based on the skill. It must explicitly mention the skill as `$skill-name` (e.g., "Use $skill-name-here to draft a concise weekly status update.").
- `interface.display_name`: Human-facing title shown in UI skill lists and chips.
- `interface.short_description`: Human-facing short UI blurb (25-64 chars) for quick scanning.
- `interface.icon_small`: Path to a small icon asset (relative to skill dir). Default to `./assets/` and place icons in the skill's `assets/` folder.
- `interface.icon_large`: Path to a larger logo asset (relative to skill dir). Default to `./assets/` and place icons in the skill's `assets/` folder.
- `interface.brand_color`: Hex color used for UI accents (e.g., badges).
- `interface.default_prompt`: Default prompt snippet inserted when invoking the skill.
- `dependencies.tools[].type`: Dependency category. Only `mcp` is supported for now.
- `dependencies.tools[].value`: Identifier of the tool or dependency.
- `dependencies.tools[].description`: Human-readable explanation of the dependency.
- `dependencies.tools[].transport`: Connection type when `type` is `mcp`.
- `dependencies.tools[].url`: MCP server URL when `type` is `mcp`.
- `policy.allow_implicit_invocation`: When false, the skill is not injected into
the model context by default, but can still be invoked explicitly via `$skill`.
Defaults to true.

@@ -1,226 +0,0 @@
#!/usr/bin/env python3
"""
OpenAI YAML Generator - Creates agents/openai.yaml for a skill folder.
Usage:
generate_openai_yaml.py <skill_dir> [--name <skill_name>] [--interface key=value]
"""
import argparse
import re
import sys
from pathlib import Path
ACRONYMS = {
"GH",
"MCP",
"API",
"CI",
"CLI",
"LLM",
"PDF",
"PR",
"UI",
"URL",
"SQL",
}
BRANDS = {
"openai": "OpenAI",
"openapi": "OpenAPI",
"github": "GitHub",
"pagerduty": "PagerDuty",
"datadog": "DataDog",
"sqlite": "SQLite",
"fastapi": "FastAPI",
}
SMALL_WORDS = {"and", "or", "to", "up", "with"}
ALLOWED_INTERFACE_KEYS = {
"display_name",
"short_description",
"icon_small",
"icon_large",
"brand_color",
"default_prompt",
}
def yaml_quote(value):
escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
return f'"{escaped}"'
def format_display_name(skill_name):
words = [word for word in skill_name.split("-") if word]
formatted = []
for index, word in enumerate(words):
lower = word.lower()
upper = word.upper()
if upper in ACRONYMS:
formatted.append(upper)
continue
if lower in BRANDS:
formatted.append(BRANDS[lower])
continue
if index > 0 and lower in SMALL_WORDS:
formatted.append(lower)
continue
formatted.append(word.capitalize())
return " ".join(formatted)
def generate_short_description(display_name):
description = f"Help with {display_name} tasks"
if len(description) < 25:
description = f"Help with {display_name} tasks and workflows"
if len(description) < 25:
description = f"Help with {display_name} tasks with guidance"
if len(description) > 64:
description = f"Help with {display_name}"
if len(description) > 64:
description = f"{display_name} helper"
if len(description) > 64:
description = f"{display_name} tools"
if len(description) > 64:
suffix = " helper"
max_name_length = 64 - len(suffix)
trimmed = display_name[:max_name_length].rstrip()
description = f"{trimmed}{suffix}"
if len(description) > 64:
description = description[:64].rstrip()
if len(description) < 25:
description = f"{description} workflows"
if len(description) > 64:
description = description[:64].rstrip()
return description
def read_frontmatter_name(skill_dir):
skill_md = Path(skill_dir) / "SKILL.md"
if not skill_md.exists():
print(f"[ERROR] SKILL.md not found in {skill_dir}")
return None
content = skill_md.read_text()
match = re.match(r"^---\n(.*?)\n---", content, re.DOTALL)
if not match:
print("[ERROR] Invalid SKILL.md frontmatter format.")
return None
frontmatter_text = match.group(1)
import yaml
try:
frontmatter = yaml.safe_load(frontmatter_text)
except yaml.YAMLError as exc:
print(f"[ERROR] Invalid YAML frontmatter: {exc}")
return None
if not isinstance(frontmatter, dict):
print("[ERROR] Frontmatter must be a YAML dictionary.")
return None
name = frontmatter.get("name", "")
if not isinstance(name, str) or not name.strip():
print("[ERROR] Frontmatter 'name' is missing or invalid.")
return None
return name.strip()
def parse_interface_overrides(raw_overrides):
overrides = {}
optional_order = []
for item in raw_overrides:
if "=" not in item:
print(f"[ERROR] Invalid interface override '{item}'. Use key=value.")
return None, None
key, value = item.split("=", 1)
key = key.strip()
value = value.strip()
if not key:
print(f"[ERROR] Invalid interface override '{item}'. Key is empty.")
return None, None
if key not in ALLOWED_INTERFACE_KEYS:
allowed = ", ".join(sorted(ALLOWED_INTERFACE_KEYS))
print(f"[ERROR] Unknown interface field '{key}'. Allowed: {allowed}")
return None, None
overrides[key] = value
if key not in ("display_name", "short_description") and key not in optional_order:
optional_order.append(key)
return overrides, optional_order
def write_openai_yaml(skill_dir, skill_name, raw_overrides):
overrides, optional_order = parse_interface_overrides(raw_overrides)
if overrides is None:
return None
display_name = overrides.get("display_name") or format_display_name(skill_name)
short_description = overrides.get("short_description") or generate_short_description(display_name)
if not (25 <= len(short_description) <= 64):
print(
"[ERROR] short_description must be 25-64 characters "
f"(got {len(short_description)})."
)
return None
interface_lines = [
"interface:",
f" display_name: {yaml_quote(display_name)}",
f" short_description: {yaml_quote(short_description)}",
]
for key in optional_order:
value = overrides.get(key)
if value is not None:
interface_lines.append(f" {key}: {yaml_quote(value)}")
agents_dir = Path(skill_dir) / "agents"
agents_dir.mkdir(parents=True, exist_ok=True)
output_path = agents_dir / "openai.yaml"
output_path.write_text("\n".join(interface_lines) + "\n")
print("[OK] Created agents/openai.yaml")
return output_path
def main():
parser = argparse.ArgumentParser(
description="Create agents/openai.yaml for a skill directory.",
)
parser.add_argument("skill_dir", help="Path to the skill directory")
parser.add_argument(
"--name",
help="Skill name override (defaults to SKILL.md frontmatter)",
)
parser.add_argument(
"--interface",
action="append",
default=[],
help="Interface override in key=value format (repeatable)",
)
args = parser.parse_args()
skill_dir = Path(args.skill_dir).resolve()
if not skill_dir.exists():
print(f"[ERROR] Skill directory not found: {skill_dir}")
sys.exit(1)
if not skill_dir.is_dir():
print(f"[ERROR] Path is not a directory: {skill_dir}")
sys.exit(1)
skill_name = args.name or read_frontmatter_name(skill_dir)
if not skill_name:
sys.exit(1)
result = write_openai_yaml(skill_dir, skill_name, args.interface)
if result:
sys.exit(0)
sys.exit(1)
if __name__ == "__main__":
main()

@@ -1,400 +0,0 @@
#!/usr/bin/env python3
"""
Skill Initializer - Creates a new skill from template
Usage:
init_skill.py <skill-name> --path <path> [--resources scripts,references,assets] [--examples] [--interface key=value]
Examples:
init_skill.py my-new-skill --path skills/public
init_skill.py my-new-skill --path skills/public --resources scripts,references
init_skill.py my-api-helper --path skills/private --resources scripts --examples
init_skill.py custom-skill --path /custom/location
init_skill.py my-skill --path skills/public --interface short_description="Short UI label"
"""
import argparse
import re
import sys
from pathlib import Path
from generate_openai_yaml import write_openai_yaml
MAX_SKILL_NAME_LENGTH = 64
ALLOWED_RESOURCES = {"scripts", "references", "assets"}
SKILL_TEMPLATE = """---
name: {skill_name}
description: [TODO: Complete and informative explanation of what the skill does and when to use it. Include WHEN to use this skill - specific scenarios, file types, or tasks that trigger it.]
---
# {skill_title}
## Overview
[TODO: 1-2 sentences explaining what this skill enables]
## Structuring This Skill
[TODO: Choose the structure that best fits this skill's purpose. Common patterns:
**1. Workflow-Based** (best for sequential processes)
- Works well when there are clear step-by-step procedures
- Example: DOCX skill with "Workflow Decision Tree" -> "Reading" -> "Creating" -> "Editing"
- Structure: ## Overview -> ## Workflow Decision Tree -> ## Step 1 -> ## Step 2...
**2. Task-Based** (best for tool collections)
- Works well when the skill offers different operations/capabilities
- Example: PDF skill with "Quick Start" -> "Merge PDFs" -> "Split PDFs" -> "Extract Text"
- Structure: ## Overview -> ## Quick Start -> ## Task Category 1 -> ## Task Category 2...
**3. Reference/Guidelines** (best for standards or specifications)
- Works well for brand guidelines, coding standards, or requirements
- Example: Brand styling with "Brand Guidelines" -> "Colors" -> "Typography" -> "Features"
- Structure: ## Overview -> ## Guidelines -> ## Specifications -> ## Usage...
**4. Capabilities-Based** (best for integrated systems)
- Works well when the skill provides multiple interrelated features
- Example: Product Management with "Core Capabilities" -> numbered capability list
- Structure: ## Overview -> ## Core Capabilities -> ### 1. Feature -> ### 2. Feature...
Patterns can be mixed and matched as needed. Most skills combine patterns (e.g., start with task-based, add workflow for complex operations).
Delete this entire "Structuring This Skill" section when done - it's just guidance.]
## [TODO: Replace with the first main section based on chosen structure]
[TODO: Add content here. See examples in existing skills:
- Code samples for technical skills
- Decision trees for complex workflows
- Concrete examples with realistic user requests
- References to scripts/templates/references as needed]
## Resources (optional)
Create only the resource directories this skill actually needs. Delete this section if no resources are required.
### scripts/
Executable code (Python/Bash/etc.) that can be run directly to perform specific operations.
**Examples from other skills:**
- PDF skill: `fill_fillable_fields.py`, `extract_form_field_info.py` - utilities for PDF manipulation
- DOCX skill: `document.py`, `utilities.py` - Python modules for document processing
**Appropriate for:** Python scripts, shell scripts, or any executable code that performs automation, data processing, or specific operations.
**Note:** Scripts may be executed without loading into context, but can still be read by Codex for patching or environment adjustments.
### references/
Documentation and reference material intended to be loaded into context to inform Codex's process and thinking.
**Examples from other skills:**
- Product management: `communication.md`, `context_building.md` - detailed workflow guides
- BigQuery: API reference documentation and query examples
- Finance: Schema documentation, company policies
**Appropriate for:** In-depth documentation, API references, database schemas, comprehensive guides, or any detailed information that Codex should reference while working.
### assets/
Files not intended to be loaded into context, but rather used within the output Codex produces.
**Examples from other skills:**
- Brand styling: PowerPoint template files (.pptx), logo files
- Frontend builder: HTML/React boilerplate project directories
- Typography: Font files (.ttf, .woff2)
**Appropriate for:** Templates, boilerplate code, document templates, images, icons, fonts, or any files meant to be copied or used in the final output.
---
**Not every skill requires all three types of resources.**
"""
EXAMPLE_SCRIPT = '''#!/usr/bin/env python3
"""
Example helper script for {skill_name}
This is a placeholder script that can be executed directly.
Replace with actual implementation or delete if not needed.
Example real scripts from other skills:
- pdf/scripts/fill_fillable_fields.py - Fills PDF form fields
- pdf/scripts/convert_pdf_to_images.py - Converts PDF pages to images
"""
def main():
print("This is an example script for {skill_name}")
# TODO: Add actual script logic here
# This could be data processing, file conversion, API calls, etc.
if __name__ == "__main__":
main()
'''
EXAMPLE_REFERENCE = """# Reference Documentation for {skill_title}
This is a placeholder for detailed reference documentation.
Replace with actual reference content or delete if not needed.
Example real reference docs from other skills:
- product-management/references/communication.md - Comprehensive guide for status updates
- product-management/references/context_building.md - Deep-dive on gathering context
- bigquery/references/ - API references and query examples
## When Reference Docs Are Useful
Reference docs are ideal for:
- Comprehensive API documentation
- Detailed workflow guides
- Complex multi-step processes
- Information too lengthy for main SKILL.md
- Content that's only needed for specific use cases
## Structure Suggestions
### API Reference Example
- Overview
- Authentication
- Endpoints with examples
- Error codes
- Rate limits
### Workflow Guide Example
- Prerequisites
- Step-by-step instructions
- Common patterns
- Troubleshooting
- Best practices
"""
EXAMPLE_ASSET = """# Example Asset File
This placeholder represents where asset files would be stored.
Replace with actual asset files (templates, images, fonts, etc.) or delete if not needed.
Asset files are NOT intended to be loaded into context, but rather used within
the output Codex produces.
Example asset files from other skills:
- Brand guidelines: logo.png, slides_template.pptx
- Frontend builder: hello-world/ directory with HTML/React boilerplate
- Typography: custom-font.ttf, font-family.woff2
- Data: sample_data.csv, test_dataset.json
## Common Asset Types
- Templates: .pptx, .docx, boilerplate directories
- Images: .png, .jpg, .svg, .gif
- Fonts: .ttf, .otf, .woff, .woff2
- Boilerplate code: Project directories, starter files
- Icons: .ico, .svg
- Data files: .csv, .json, .xml, .yaml
Note: This is a text placeholder. Actual assets can be any file type.
"""
def normalize_skill_name(skill_name):
"""Normalize a skill name to lowercase hyphen-case."""
normalized = skill_name.strip().lower()
normalized = re.sub(r"[^a-z0-9]+", "-", normalized)
normalized = normalized.strip("-")
normalized = re.sub(r"-{2,}", "-", normalized)
return normalized
def title_case_skill_name(skill_name):
"""Convert hyphenated skill name to Title Case for display."""
return " ".join(word.capitalize() for word in skill_name.split("-"))
def parse_resources(raw_resources):
if not raw_resources:
return []
resources = [item.strip() for item in raw_resources.split(",") if item.strip()]
invalid = sorted({item for item in resources if item not in ALLOWED_RESOURCES})
if invalid:
allowed = ", ".join(sorted(ALLOWED_RESOURCES))
print(f"[ERROR] Unknown resource type(s): {', '.join(invalid)}")
print(f" Allowed: {allowed}")
sys.exit(1)
deduped = []
seen = set()
for resource in resources:
if resource not in seen:
deduped.append(resource)
seen.add(resource)
return deduped
def create_resource_dirs(skill_dir, skill_name, skill_title, resources, include_examples):
for resource in resources:
resource_dir = skill_dir / resource
resource_dir.mkdir(exist_ok=True)
if resource == "scripts":
if include_examples:
example_script = resource_dir / "example.py"
example_script.write_text(EXAMPLE_SCRIPT.format(skill_name=skill_name))
example_script.chmod(0o755)
print("[OK] Created scripts/example.py")
else:
print("[OK] Created scripts/")
elif resource == "references":
if include_examples:
example_reference = resource_dir / "api_reference.md"
example_reference.write_text(EXAMPLE_REFERENCE.format(skill_title=skill_title))
print("[OK] Created references/api_reference.md")
else:
print("[OK] Created references/")
elif resource == "assets":
if include_examples:
example_asset = resource_dir / "example_asset.txt"
example_asset.write_text(EXAMPLE_ASSET)
print("[OK] Created assets/example_asset.txt")
else:
print("[OK] Created assets/")
def init_skill(skill_name, path, resources, include_examples, interface_overrides):
"""
Initialize a new skill directory with template SKILL.md.
Args:
skill_name: Name of the skill
path: Path where the skill directory should be created
resources: Resource directories to create
include_examples: Whether to create example files in resource directories
interface_overrides: Interface key=value overrides written to agents/openai.yaml
Returns:
Path to created skill directory, or None if error
"""
# Determine skill directory path
skill_dir = Path(path).resolve() / skill_name
# Check if directory already exists
if skill_dir.exists():
print(f"[ERROR] Skill directory already exists: {skill_dir}")
return None
# Create skill directory
try:
skill_dir.mkdir(parents=True, exist_ok=False)
print(f"[OK] Created skill directory: {skill_dir}")
except Exception as e:
print(f"[ERROR] Error creating directory: {e}")
return None
# Create SKILL.md from template
skill_title = title_case_skill_name(skill_name)
skill_content = SKILL_TEMPLATE.format(skill_name=skill_name, skill_title=skill_title)
skill_md_path = skill_dir / "SKILL.md"
try:
skill_md_path.write_text(skill_content)
print("[OK] Created SKILL.md")
except Exception as e:
print(f"[ERROR] Error creating SKILL.md: {e}")
return None
# Create agents/openai.yaml
try:
result = write_openai_yaml(skill_dir, skill_name, interface_overrides)
if not result:
return None
except Exception as e:
print(f"[ERROR] Error creating agents/openai.yaml: {e}")
return None
# Create resource directories if requested
if resources:
try:
create_resource_dirs(skill_dir, skill_name, skill_title, resources, include_examples)
except Exception as e:
print(f"[ERROR] Error creating resource directories: {e}")
return None
# Print next steps
print(f"\n[OK] Skill '{skill_name}' initialized successfully at {skill_dir}")
print("\nNext steps:")
print("1. Edit SKILL.md to complete the TODO items and update the description")
if resources:
if include_examples:
print("2. Customize or delete the example files in scripts/, references/, and assets/")
else:
print("2. Add resources to scripts/, references/, and assets/ as needed")
else:
print("2. Create resource directories only if needed (scripts/, references/, assets/)")
print("3. Update agents/openai.yaml if the UI metadata should differ")
print("4. Run the validator when ready to check the skill structure")
print(
"5. Forward-test complex skills with realistic user requests to ensure they work as intended"
)
return skill_dir
def main():
parser = argparse.ArgumentParser(
description="Create a new skill directory with a SKILL.md template.",
)
parser.add_argument("skill_name", help="Skill name (normalized to hyphen-case)")
parser.add_argument("--path", required=True, help="Output directory for the skill")
parser.add_argument(
"--resources",
default="",
help="Comma-separated list: scripts,references,assets",
)
parser.add_argument(
"--examples",
action="store_true",
help="Create example files inside the selected resource directories",
)
parser.add_argument(
"--interface",
action="append",
default=[],
help="Interface override in key=value format (repeatable)",
)
args = parser.parse_args()
raw_skill_name = args.skill_name
skill_name = normalize_skill_name(raw_skill_name)
if not skill_name:
print("[ERROR] Skill name must include at least one letter or digit.")
sys.exit(1)
if len(skill_name) > MAX_SKILL_NAME_LENGTH:
print(
f"[ERROR] Skill name '{skill_name}' is too long ({len(skill_name)} characters). "
f"Maximum is {MAX_SKILL_NAME_LENGTH} characters."
)
sys.exit(1)
if skill_name != raw_skill_name:
print(f"Note: Normalized skill name from '{raw_skill_name}' to '{skill_name}'.")
resources = parse_resources(args.resources)
if args.examples and not resources:
print("[ERROR] --examples requires --resources to be set.")
sys.exit(1)
path = args.path
print(f"Initializing skill: {skill_name}")
print(f" Location: {path}")
if resources:
print(f" Resources: {', '.join(resources)}")
if args.examples:
print(" Examples: enabled")
else:
print(" Resources: none (create as needed)")
print()
result = init_skill(skill_name, path, resources, args.examples, args.interface)
if result:
sys.exit(0)
else:
sys.exit(1)
if __name__ == "__main__":
main()

@@ -1,101 +0,0 @@
#!/usr/bin/env python3
"""
Quick validation script for skills - minimal version
"""
import re
import sys
from pathlib import Path
import yaml
MAX_SKILL_NAME_LENGTH = 64
def validate_skill(skill_path):
"""Basic validation of a skill"""
skill_path = Path(skill_path)
skill_md = skill_path / "SKILL.md"
if not skill_md.exists():
return False, "SKILL.md not found"
content = skill_md.read_text()
if not content.startswith("---"):
return False, "No YAML frontmatter found"
match = re.match(r"^---\n(.*?)\n---", content, re.DOTALL)
if not match:
return False, "Invalid frontmatter format"
frontmatter_text = match.group(1)
try:
frontmatter = yaml.safe_load(frontmatter_text)
if not isinstance(frontmatter, dict):
return False, "Frontmatter must be a YAML dictionary"
except yaml.YAMLError as e:
return False, f"Invalid YAML in frontmatter: {e}"
allowed_properties = {"name", "description", "license", "allowed-tools", "metadata"}
unexpected_keys = set(frontmatter.keys()) - allowed_properties
if unexpected_keys:
allowed = ", ".join(sorted(allowed_properties))
unexpected = ", ".join(sorted(unexpected_keys))
return (
False,
f"Unexpected key(s) in SKILL.md frontmatter: {unexpected}. Allowed properties are: {allowed}",
)
if "name" not in frontmatter:
return False, "Missing 'name' in frontmatter"
if "description" not in frontmatter:
return False, "Missing 'description' in frontmatter"
name = frontmatter.get("name", "")
if not isinstance(name, str):
return False, f"Name must be a string, got {type(name).__name__}"
name = name.strip()
if name:
if not re.match(r"^[a-z0-9-]+$", name):
return (
False,
f"Name '{name}' should be hyphen-case (lowercase letters, digits, and hyphens only)",
)
if name.startswith("-") or name.endswith("-") or "--" in name:
return (
False,
f"Name '{name}' cannot start/end with hyphen or contain consecutive hyphens",
)
if len(name) > MAX_SKILL_NAME_LENGTH:
return (
False,
f"Name is too long ({len(name)} characters). "
f"Maximum is {MAX_SKILL_NAME_LENGTH} characters.",
)
description = frontmatter.get("description", "")
if not isinstance(description, str):
return False, f"Description must be a string, got {type(description).__name__}"
description = description.strip()
if description:
if "<" in description or ">" in description:
return False, "Description cannot contain angle brackets (< or >)"
if len(description) > 1024:
return (
False,
f"Description is too long ({len(description)} characters). Maximum is 1024 characters.",
)
return True, "Skill is valid!"
if __name__ == "__main__":
if len(sys.argv) != 2:
print("Usage: python quick_validate.py <skill_directory>")
sys.exit(1)
valid, message = validate_skill(sys.argv[1])
print(message)
sys.exit(0 if valid else 1)

@@ -1,58 +0,0 @@
---
name: skill-installer
description: Install Codex skills into $CODEX_HOME/skills from a curated list or a GitHub repo path. Use when a user asks to list installable skills, install a curated skill, or install a skill from another repo (including private repos).
metadata:
short-description: Install curated skills from openai/skills or other repos
---
# Skill Installer
Helps install skills. By default these are from https://github.com/openai/skills/tree/main/skills/.curated, but users can also provide other locations. Experimental skills live in https://github.com/openai/skills/tree/main/skills/.experimental and can be installed the same way.
Use the helper scripts based on the task:
- List skills when the user asks what is available, or if the user uses this skill without specifying what to do. Default listing is `.curated`, but you can pass `--path skills/.experimental` when they ask about experimental skills.
- Install from the curated list when the user provides a skill name.
- Install from another repo when the user provides a GitHub repo/path (including private repos).
## Communication
When listing skills, output approximately as follows, depending on the context of the user's request. If they ask about experimental skills, list from `.experimental` instead of `.curated` and label the source accordingly:
"""
Skills from {repo}:
1. skill-1
2. skill-2 (already installed)
3. ...
Which ones would you like installed?
"""
After installing a skill, tell the user: "Restart Codex to pick up new skills."
## Scripts
All of these scripts use the network, so request escalation when running them in the sandbox.
- `scripts/list-skills.py` (prints skills list with installed annotations)
- `scripts/list-skills.py --format json`
- Example (experimental list): `scripts/list-skills.py --path skills/.experimental`
- `scripts/install-skill-from-github.py --repo <owner>/<repo> --path <path/to/skill> [<path/to/skill> ...]`
- `scripts/install-skill-from-github.py --url https://github.com/<owner>/<repo>/tree/<ref>/<path>`
- Example (experimental skill): `scripts/install-skill-from-github.py --repo openai/skills --path skills/.experimental/<skill-name>`
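A minimal end-to-end flow with the defaults; `example-skill` is a placeholder, not a real curated entry:

```bash
# List curated skills, annotated with what is already installed.
scripts/list-skills.py

# Install one of them; "example-skill" is illustrative only.
scripts/install-skill-from-github.py --repo openai/skills \
  --path skills/.curated/example-skill
```

After this, follow up with "Restart Codex to pick up new skills."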
## Behavior and Options
- Defaults to direct download for public GitHub repos.
- If download fails with auth/permission errors, falls back to git sparse checkout.
- Aborts if the destination skill directory already exists.
- Installs into `$CODEX_HOME/skills/<skill-name>` (defaults to `~/.codex/skills`).
- Multiple `--path` values install multiple skills in one run, each named from the path basename (`--name` applies only when a single path is given).
- Options: `--ref <ref>` (default `main`), `--dest <path>`, `--method auto|download|git`; see the combined sketch below.
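A sketch of how these options combine; the two skill paths are placeholders:

```bash
# Install two skills in one run from a pinned ref into an explicit
# destination, forcing git sparse checkout instead of the default download.
scripts/install-skill-from-github.py \
  --repo openai/skills \
  --ref main \
  --dest "$HOME/.codex/skills" \
  --method git \
  --path skills/.curated/skill-one skills/.curated/skill-two
```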
## Notes
- Curated listing is fetched from `https://github.com/openai/skills/tree/main/skills/.curated` via the GitHub API. If it is unavailable, explain the error and exit.
- Private GitHub repos can be accessed via existing git credentials, or via an optional `GITHUB_TOKEN`/`GH_TOKEN` for the download method (see the private-repo sketch after this list).
- Git fallback tries HTTPS first, then SSH.
- The skills at https://github.com/openai/skills/tree/main/skills/.system are preinstalled, so there is no need to install them. If users ask, explain this; if they insist, you can download and overwrite.
- Installed annotations come from `$CODEX_HOME/skills`.
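A hedged sketch of a private-repo install; `acme/internal-skills` and the skill path are hypothetical:

```bash
# GITHUB_TOKEN (or GH_TOKEN) is picked up automatically for the download
# method; on auth errors the script falls back to git over HTTPS, then SSH.
# `gh auth token` is just one way to obtain a token; any PAT works.
GITHUB_TOKEN="$(gh auth token)" \
  scripts/install-skill-from-github.py \
  --repo acme/internal-skills --path skills/deploy-helper
```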


@@ -1,5 +0,0 @@
interface:
display_name: "Skill Installer"
short_description: "Install curated skills from openai/skills or other repos"
icon_small: "./assets/skill-installer-small.svg"
icon_large: "./assets/skill-installer.png"


@@ -1,3 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" viewBox="0 0 16 16">
<path fill="#0D0D0D" d="M2.145 3.959a2.033 2.033 0 0 1 2.022-1.824h5.966c.551 0 .997 0 1.357.029.367.03.692.093.993.246l.174.098c.397.243.72.593.932 1.01l.053.114c.116.269.168.557.194.878.03.36.03.805.03 1.357v4.3a2.365 2.365 0 0 1-2.366 2.365h-1.312a2.198 2.198 0 0 1-4.377 0H4.167A2.032 2.032 0 0 1 2.135 10.5V9.333l.004-.088A.865.865 0 0 1 3 8.468l.116-.006A1.135 1.135 0 0 0 3 6.199a.865.865 0 0 1-.865-.864V4.167l.01-.208Zm1.054 1.186a2.198 2.198 0 0 1 0 4.376v.98c0 .534.433.967.968.967H6l.089.004a.866.866 0 0 1 .776.861 1.135 1.135 0 0 0 2.27 0c0-.478.387-.865.865-.865h1.5c.719 0 1.301-.583 1.301-1.301v-4.3c0-.57 0-.964-.025-1.27a1.933 1.933 0 0 0-.09-.493L12.642 4a1.47 1.47 0 0 0-.541-.585l-.102-.056c-.126-.065-.295-.11-.596-.135a17.31 17.31 0 0 0-1.27-.025H4.167a.968.968 0 0 0-.968.968v.978Z"/>
</svg>

[Deleted assets: skill-installer-small.svg (923 B, shown above) and skill-installer.png (binary, 1.1 KiB, not shown).]


@@ -1,21 +0,0 @@
#!/usr/bin/env python3
"""Shared GitHub helpers for skill install scripts."""
from __future__ import annotations
import os
import urllib.request
def github_request(url: str, user_agent: str) -> bytes:
headers = {"User-Agent": user_agent}
token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
if token:
headers["Authorization"] = f"token {token}"
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req) as resp:
return resp.read()
def github_api_contents_url(repo: str, path: str, ref: str) -> str:
return f"https://api.github.com/repos/{repo}/contents/{path}?ref={ref}"


@@ -1,308 +0,0 @@
#!/usr/bin/env python3
"""Install a skill from a GitHub repo path into $CODEX_HOME/skills."""
from __future__ import annotations
import argparse
from dataclasses import dataclass
import os
import shutil
import subprocess
import sys
import tempfile
import urllib.error
import urllib.parse
import zipfile
from github_utils import github_request
DEFAULT_REF = "main"
@dataclass
class Args:
url: str | None = None
repo: str | None = None
path: list[str] | None = None
ref: str = DEFAULT_REF
dest: str | None = None
name: str | None = None
method: str = "auto"
@dataclass
class Source:
owner: str
repo: str
ref: str
paths: list[str]
repo_url: str | None = None
class InstallError(Exception):
pass
def _codex_home() -> str:
return os.environ.get("CODEX_HOME", os.path.expanduser("~/.codex"))
def _tmp_root() -> str:
base = os.path.join(tempfile.gettempdir(), "codex")
os.makedirs(base, exist_ok=True)
return base
def _request(url: str) -> bytes:
return github_request(url, "codex-skill-install")
def _parse_github_url(url: str, default_ref: str) -> tuple[str, str, str, str | None]:
parsed = urllib.parse.urlparse(url)
if parsed.netloc != "github.com":
raise InstallError("Only GitHub URLs are supported for download mode.")
parts = [p for p in parsed.path.split("/") if p]
if len(parts) < 2:
raise InstallError("Invalid GitHub URL.")
owner, repo = parts[0], parts[1]
ref = default_ref
subpath = ""
if len(parts) > 2:
if parts[2] in ("tree", "blob"):
if len(parts) < 4:
raise InstallError("GitHub URL missing ref or path.")
ref = parts[3]
subpath = "/".join(parts[4:])
else:
subpath = "/".join(parts[2:])
return owner, repo, ref, subpath or None
def _download_repo_zip(owner: str, repo: str, ref: str, dest_dir: str) -> str:
zip_url = f"https://codeload.github.com/{owner}/{repo}/zip/{ref}"
zip_path = os.path.join(dest_dir, "repo.zip")
try:
payload = _request(zip_url)
except urllib.error.HTTPError as exc:
raise InstallError(f"Download failed: HTTP {exc.code}") from exc
with open(zip_path, "wb") as file_handle:
file_handle.write(payload)
with zipfile.ZipFile(zip_path, "r") as zip_file:
_safe_extract_zip(zip_file, dest_dir)
top_levels = {name.split("/")[0] for name in zip_file.namelist() if name}
if not top_levels:
raise InstallError("Downloaded archive was empty.")
if len(top_levels) != 1:
raise InstallError("Unexpected archive layout.")
return os.path.join(dest_dir, next(iter(top_levels)))
def _run_git(args: list[str]) -> None:
result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
if result.returncode != 0:
raise InstallError(result.stderr.strip() or "Git command failed.")
def _safe_extract_zip(zip_file: zipfile.ZipFile, dest_dir: str) -> None:
dest_root = os.path.realpath(dest_dir)
for info in zip_file.infolist():
extracted_path = os.path.realpath(os.path.join(dest_dir, info.filename))
if extracted_path == dest_root or extracted_path.startswith(dest_root + os.sep):
continue
raise InstallError("Archive contains files outside the destination.")
zip_file.extractall(dest_dir)
def _validate_relative_path(path: str) -> None:
if os.path.isabs(path) or os.path.normpath(path).startswith(".."):
raise InstallError("Skill path must be a relative path inside the repo.")
def _validate_skill_name(name: str) -> None:
altsep = os.path.altsep
if not name or os.path.sep in name or (altsep and altsep in name):
raise InstallError("Skill name must be a single path segment.")
if name in (".", ".."):
raise InstallError("Invalid skill name.")
def _git_sparse_checkout(repo_url: str, ref: str, paths: list[str], dest_dir: str) -> str:
repo_dir = os.path.join(dest_dir, "repo")
clone_cmd = [
"git",
"clone",
"--filter=blob:none",
"--depth",
"1",
"--sparse",
"--single-branch",
"--branch",
ref,
repo_url,
repo_dir,
]
try:
_run_git(clone_cmd)
except InstallError:
_run_git(
[
"git",
"clone",
"--filter=blob:none",
"--depth",
"1",
"--sparse",
"--single-branch",
repo_url,
repo_dir,
]
)
_run_git(["git", "-C", repo_dir, "sparse-checkout", "set", *paths])
_run_git(["git", "-C", repo_dir, "checkout", ref])
return repo_dir
def _validate_skill(path: str) -> None:
if not os.path.isdir(path):
raise InstallError(f"Skill path not found: {path}")
skill_md = os.path.join(path, "SKILL.md")
if not os.path.isfile(skill_md):
raise InstallError("SKILL.md not found in selected skill directory.")
def _copy_skill(src: str, dest_dir: str) -> None:
os.makedirs(os.path.dirname(dest_dir), exist_ok=True)
if os.path.exists(dest_dir):
raise InstallError(f"Destination already exists: {dest_dir}")
shutil.copytree(src, dest_dir)
def _build_repo_url(owner: str, repo: str) -> str:
return f"https://github.com/{owner}/{repo}.git"
def _build_repo_ssh(owner: str, repo: str) -> str:
return f"git@github.com:{owner}/{repo}.git"
def _prepare_repo(source: Source, method: str, tmp_dir: str) -> str:
if method in ("download", "auto"):
try:
return _download_repo_zip(source.owner, source.repo, source.ref, tmp_dir)
except InstallError as exc:
if method == "download":
raise
err_msg = str(exc)
if "HTTP 401" in err_msg or "HTTP 403" in err_msg or "HTTP 404" in err_msg:
pass
else:
raise
if method in ("git", "auto"):
repo_url = source.repo_url or _build_repo_url(source.owner, source.repo)
try:
return _git_sparse_checkout(repo_url, source.ref, source.paths, tmp_dir)
except InstallError:
repo_url = _build_repo_ssh(source.owner, source.repo)
return _git_sparse_checkout(repo_url, source.ref, source.paths, tmp_dir)
raise InstallError("Unsupported method.")
def _resolve_source(args: Args) -> Source:
if args.url:
owner, repo, ref, url_path = _parse_github_url(args.url, args.ref)
if args.path is not None:
paths = list(args.path)
elif url_path:
paths = [url_path]
else:
paths = []
if not paths:
raise InstallError("Missing --path for GitHub URL.")
return Source(owner=owner, repo=repo, ref=ref, paths=paths)
if not args.repo:
raise InstallError("Provide --repo or --url.")
if "://" in args.repo:
return _resolve_source(
Args(url=args.repo, repo=None, path=args.path, ref=args.ref)
)
repo_parts = [p for p in args.repo.split("/") if p]
if len(repo_parts) != 2:
raise InstallError("--repo must be in owner/repo format.")
if not args.path:
raise InstallError("Missing --path for --repo.")
paths = list(args.path)
return Source(
owner=repo_parts[0],
repo=repo_parts[1],
ref=args.ref,
paths=paths,
)
def _default_dest() -> str:
return os.path.join(_codex_home(), "skills")
def _parse_args(argv: list[str]) -> Args:
parser = argparse.ArgumentParser(description="Install a skill from GitHub.")
parser.add_argument("--repo", help="owner/repo")
parser.add_argument("--url", help="https://github.com/owner/repo[/tree/ref/path]")
parser.add_argument(
"--path",
nargs="+",
help="Path(s) to skill(s) inside repo",
)
parser.add_argument("--ref", default=DEFAULT_REF)
parser.add_argument("--dest", help="Destination skills directory")
parser.add_argument(
"--name", help="Destination skill name (defaults to basename of path)"
)
parser.add_argument(
"--method",
choices=["auto", "download", "git"],
default="auto",
)
return parser.parse_args(argv, namespace=Args())
def main(argv: list[str]) -> int:
args = _parse_args(argv)
try:
source = _resolve_source(args)
source.ref = source.ref or args.ref
if not source.paths:
raise InstallError("No skill paths provided.")
for path in source.paths:
_validate_relative_path(path)
dest_root = args.dest or _default_dest()
tmp_dir = tempfile.mkdtemp(prefix="skill-install-", dir=_tmp_root())
try:
repo_root = _prepare_repo(source, args.method, tmp_dir)
installed = []
for path in source.paths:
skill_name = args.name if len(source.paths) == 1 else None
skill_name = skill_name or os.path.basename(path.rstrip("/"))
_validate_skill_name(skill_name)
if not skill_name:
raise InstallError("Unable to derive skill name.")
dest_dir = os.path.join(dest_root, skill_name)
if os.path.exists(dest_dir):
raise InstallError(f"Destination already exists: {dest_dir}")
skill_src = os.path.join(repo_root, path)
_validate_skill(skill_src)
_copy_skill(skill_src, dest_dir)
installed.append((skill_name, dest_dir))
finally:
if os.path.isdir(tmp_dir):
shutil.rmtree(tmp_dir, ignore_errors=True)
for skill_name, dest_dir in installed:
print(f"Installed {skill_name} to {dest_dir}")
return 0
except InstallError as exc:
print(f"Error: {exc}", file=sys.stderr)
return 1
if __name__ == "__main__":
raise SystemExit(main(sys.argv[1:]))
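When the zip download is rejected (for example, HTTP 401/403/404 on a private repo), the git fallback runs roughly this sequence; the clone directory and skill path are placeholders:

```bash
# Shallow, blobless, sparse clone of a single branch (the script retries
# without --branch if this first form fails, then checks out the ref).
git clone --filter=blob:none --depth 1 --sparse --single-branch \
  --branch main https://github.com/openai/skills.git /tmp/skill-repo
# Narrow the worktree to just the requested skill path(s), then pin the ref.
git -C /tmp/skill-repo sparse-checkout set skills/.curated/example-skill
git -C /tmp/skill-repo checkout main
```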


@@ -1,107 +0,0 @@
#!/usr/bin/env python3
"""List skills from a GitHub repo path."""
from __future__ import annotations
import argparse
import json
import os
import sys
import urllib.error
from github_utils import github_api_contents_url, github_request
DEFAULT_REPO = "openai/skills"
DEFAULT_PATH = "skills/.curated"
DEFAULT_REF = "main"
class ListError(Exception):
pass
class Args(argparse.Namespace):
repo: str
path: str
ref: str
format: str
def _request(url: str) -> bytes:
return github_request(url, "codex-skill-list")
def _codex_home() -> str:
return os.environ.get("CODEX_HOME", os.path.expanduser("~/.codex"))
def _installed_skills() -> set[str]:
root = os.path.join(_codex_home(), "skills")
if not os.path.isdir(root):
return set()
entries = set()
for name in os.listdir(root):
path = os.path.join(root, name)
if os.path.isdir(path):
entries.add(name)
return entries
def _list_skills(repo: str, path: str, ref: str) -> list[str]:
api_url = github_api_contents_url(repo, path, ref)
try:
payload = _request(api_url)
except urllib.error.HTTPError as exc:
if exc.code == 404:
raise ListError(
"Skills path not found: "
f"https://github.com/{repo}/tree/{ref}/{path}"
) from exc
raise ListError(f"Failed to fetch skills: HTTP {exc.code}") from exc
data = json.loads(payload.decode("utf-8"))
if not isinstance(data, list):
raise ListError("Unexpected skills listing response.")
skills = [item["name"] for item in data if item.get("type") == "dir"]
return sorted(skills)
def _parse_args(argv: list[str]) -> Args:
parser = argparse.ArgumentParser(description="List skills.")
parser.add_argument("--repo", default=DEFAULT_REPO)
parser.add_argument(
"--path",
default=DEFAULT_PATH,
help="Repo path to list (default: skills/.curated)",
)
parser.add_argument("--ref", default=DEFAULT_REF)
parser.add_argument(
"--format",
choices=["text", "json"],
default="text",
help="Output format",
)
return parser.parse_args(argv, namespace=Args())
def main(argv: list[str]) -> int:
args = _parse_args(argv)
try:
skills = _list_skills(args.repo, args.path, args.ref)
installed = _installed_skills()
if args.format == "json":
payload = [
{"name": name, "installed": name in installed} for name in skills
]
print(json.dumps(payload))
else:
for idx, name in enumerate(skills, start=1):
suffix = " (already installed)" if name in installed else ""
print(f"{idx}. {name}{suffix}")
return 0
except ListError as exc:
print(f"Error: {exc}", file=sys.stderr)
return 1
if __name__ == "__main__":
raise SystemExit(main(sys.argv[1:]))
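The JSON format is meant for scripting; a small sketch, assuming `jq` is available:

```bash
# Print only the skills that are not yet installed.
scripts/list-skills.py --format json \
  | jq -r '.[] | select(.installed | not) | .name'
```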

dotfiles/codex/.gitignore vendored Normal file

@@ -0,0 +1,4 @@
*
!.gitignore
!AGENTS.md
!config.toml


@@ -1 +0,0 @@
../agents/skills


@@ -0,0 +1,57 @@
#!/usr/bin/env bash
set -euo pipefail
tracked_skills_dir="${1:?usage: setup_codex_skills TRACKED_SKILLS_DIR [RUNTIME_SKILLS_DIR] [CODEX_SKILLS_LINK]}"
runtime_skills_dir="${2:-${XDG_DATA_HOME:-$HOME/.local/share}/codex/skills}"
codex_skills_link="${3:-$HOME/.codex/skills}"
mkdir -p "$runtime_skills_dir"
mkdir -p "$(dirname "$codex_skills_link")"
# Preserve generated Codex skill bundles while moving them out of the git worktree.
for generated_name in .system codex-primary-runtime; do
generated_src="$tracked_skills_dir/$generated_name"
generated_dest="$runtime_skills_dir/$generated_name"
if [[ -e "$generated_src" && ! -e "$generated_dest" ]]; then
mv "$generated_src" "$generated_dest"
fi
done
if [[ -L "$codex_skills_link" ]]; then
current_target="$(readlink "$codex_skills_link")"
if [[ "$current_target" != "$runtime_skills_dir" ]]; then
rm -f "$codex_skills_link"
fi
elif [[ -e "$codex_skills_link" ]]; then
echo "Skipping $codex_skills_link because it exists and is not a symlink" >&2
codex_skills_link=""
fi
if [[ -n "$codex_skills_link" && ! -e "$codex_skills_link" ]]; then
ln -s "$runtime_skills_dir" "$codex_skills_link"
fi
for skill_src in "$tracked_skills_dir"/*; do
[[ -e "$skill_src" || -L "$skill_src" ]] || continue
skill_name="$(basename "$skill_src")"
case "$skill_name" in
.system|codex-primary-runtime)
continue
;;
esac
skill_dest="$runtime_skills_dir/$skill_name"
if [[ -L "$skill_dest" ]]; then
current_target="$(readlink "$skill_dest")"
if [[ "$current_target" == "$skill_src" ]]; then
continue
fi
rm -f "$skill_dest"
elif [[ -e "$skill_dest" ]]; then
echo "Skipping skill $skill_name because $skill_dest already exists and is not a symlink" >&2
continue
fi
ln -s "$skill_src" "$skill_dest"
done
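Outside of the home-manager hook, the script can be run by hand. A sketch spelling out the defaults; the tracked-skills path follows the `~/dotfiles/dotfiles` layout described below and is otherwise a placeholder:

```bash
# Mirror what the activation hook does: tracked skills from the dotfiles
# worktree, runtime dir under XDG data, symlink at ~/.codex/skills.
setup_codex_skills "$HOME/dotfiles/dotfiles/agents/skills" \
  "${XDG_DATA_HOME:-$HOME/.local/share}/codex/skills" \
  "$HOME/.codex/skills"
```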


@@ -109,6 +109,10 @@ in {
   fi
 '';
+home.activation.linkCodexSkills = lib.hm.dag.entryAfter ["writeBoundary"] ''
+  ${pkgs.bash}/bin/bash "${libDir}/bin/setup_codex_skills" "${dotfilesDir}/agents/skills"
+'';
 home.sessionPath = [
   "$HOME/.cargo/bin"
   "${libDir}/bin"


@@ -1,4 +1,4 @@
-{ config, lib, ... }:
+{ config, lib, pkgs, ... }:
 let
   # Replicate the useful part of rcm/rcup:
   # - dotfiles live in ~/dotfiles/dotfiles (no leading dots in the repo)
@@ -70,4 +70,8 @@ in
     echo "Skipping ~/.emacs.d relink because it is not a symlink" >&2
   fi
 '';
+home.activation.linkCodexSkills = lib.hm.dag.entryAfter ["writeBoundary"] ''
+  ${pkgs.bash}/bin/bash "${worktreeDotfiles}/lib/bin/setup_codex_skills" "${worktreeDotfiles}/agents/skills"
+'';
 }