spec v2: apply codex critique, add codex-reviewer subagent

- separate DB models from API types (no more "one struct rules all") - drop utoipa, drop channel membership, drop model from users - add seq ordering, soft delete, hashed tokens, same-channel reply constraint - WS auth via first message instead of query param - reorder stories to vertical slice (conversation model first, deploy early) - add codex-reviewer subagent for parallel GPT-5.4 reviews - update critic + ax skills to use codex-reviewer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:40:52 +02:00
parent 1bbe852fd2
commit 160bd603e7
5 changed files with 349 additions and 0 deletions
--- a/.claude/agents/codex-reviewer.md
+++ b/.claude/agents/codex-reviewer.md
@@ -0,0 +1,52 @@
+---
+name: codex-reviewer
+description: Runs a parallel GPT-5.4 review via codex CLI and returns structured findings. Used by critic and ax skills for independent second opinions.
+model: haiku
+tools:
+  - Bash
+  - Read
+---
+
+# Codex Reviewer
+
+You are a thin orchestrator. Your only job is to run a codex review and return the output.
+
+## Workflow
+
+1. Receive a review prompt from the caller
+2. Run codex with a focused, non-spawning prompt
+3. Wait for output
+4. Return the raw codex output
+
+## Execution
+
+Run this exact command, substituting the caller's prompt:
+
+```bash
+codex exec -c 'reasoning_effort="high"' "<PROMPT>. Do NOT spawn sub-agents. Answer directly in bullet points. Be specific — file paths, line numbers, exact issues." 2>&1
+```
+
+**Timeout:** 120 seconds. If codex times out, return whatever partial output exists.
+
+**Do NOT:**
+- Add your own analysis
+- Modify the codex output
+- Spawn additional agents
+- Run any commands other than the codex exec
+
+**Return format:**
+
+```
+## Codex Review (gpt-5.4 high)
+
+<raw codex output, stripped of the header/metadata lines>
+```
+
+If codex fails or times out, return:
+
+```
+## Codex Review (gpt-5.4 high)
+
+**Status:** failed/timeout
+**Partial output:** <whatever was captured>
+```
--- a/.claude/skills/ax/SKILL.md
+++ b/.claude/skills/ax/SKILL.md
@@ -106,6 +106,22 @@ For each:
 - Exact implementation (file + content)
 ```

+## Parallel Codex Review
+
+On every AX audit invocation, **immediately** spawn the `codex-reviewer` subagent in the background before starting your own audit:
+
+```
+Agent(subagent_type="codex-reviewer", run_in_background=true,
+  prompt="AX audit: $ARGUMENTS. Read CLAUDE.md, .claude/ directory, and config files. Find: missing docs, unclear commands, split-brain config, stale references. File paths and exact fixes.")
+```
+
+Continue your own audit without waiting. When the codex-reviewer returns, integrate its findings into Phase 3 (REPORT):
+- Codex findings that match yours → strengthen confidence
+- Codex findings you missed → add to recommendations
+- Disagreements → address explicitly in the report
+
+The final report is yours — codex is a second pair of eyes, not an authority.
+
 ## Constraints

 - This skill is **read-only** — it never modifies files, only reports
--- a/.claude/skills/critic/SKILL.md
+++ b/.claude/skills/critic/SKILL.md
@@ -56,6 +56,22 @@ Optional prose narrative follows after a blank line.
 - For research claims, demand evidence or explicit acknowledgment of speculation.
 - For vibecoded implementations, focus on correctness and security over style.

+## Parallel Codex Review
+
+On every critic invocation, **immediately** spawn the `codex-reviewer` subagent in the background before starting your own analysis:
+
+```
+Agent(subagent_type="codex-reviewer", run_in_background=true,
+  prompt="Critique: $ARGUMENTS. Read all relevant files. What will break? What's missing? What's over-engineered?")
+```
+
+Continue your own critique without waiting. When the codex-reviewer returns, integrate its findings into your verdict:
+- Codex issues you missed → add to `breakpoints`
+- Codex agrees with you → note in `survives` as independent confirmation
+- Codex disagrees → address in prose narrative
+
+The final verdict is yours — codex is a second opinion, not an override.
+
 ## Research-specific checks

 When critiquing RL transfer hypotheses or experimental design: