Files
apes/.claude/skills/critic/SKILL.md
limiteinductive 983221df33 add frontend-design rule, fix codex-reviewer, route table update
- add .claude/rules/frontend.md: auto-loads /frontend-design for ui/** files
- add frontend-design to CLAUDE.md route table
- fix codex reviewer: use direct Bash instead of subagent (permissions issue)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:45:13 +02:00

3.4 KiB

name, description
name description
critic Stress-test research hypotheses, architecture decisions, and vibecoded implementations with adversarial-but-fair critique. Returns structured JSON verdicts. Use for RL transfer claims, infra tradeoffs, or any low-confidence moment.

Critic

Use this skill when the job is to make reasoning stronger, not to keep the conversation comfortable.

Good fits

  • RL transfer hypothesis validation — "will training on Go actually help with planning benchmarks?"
  • architecture tradeoffs — self-hosted vs managed, monolith vs services
  • vibecoded implementation review — "this works but was generated fast, is it sound?"
  • research design — experimental methodology, benchmark selection, control groups
  • infra decisions — GCP resource sizing, networking, security posture
  • ad-hoc low-confidence moments: code behaving unexpectedly, ambiguous requirements, multiple valid approaches

Do not use for

  • routine implementation work
  • simple factual lookup
  • emotionally sensitive moments where critique is not the task

Output contract

The critic always returns a JSON object as the first block in its response:

{
  "verdict": "proceed | hold | flag | reopen",
  "confidence": 0.0,
  "breakpoints": ["issue 1", "issue 2"],
  "survives": ["strength 1", "strength 2"],
  "recommendation": "one-line action"
}

Verdicts:

  • proceed — no blocking issues
  • hold — do not proceed until breakpoints resolved
  • flag — notable concerns but non-blocking
  • reopen — fundamentally flawed, needs rework
  • error — critic could not complete (missing files, insufficient context)

Optional prose narrative follows after a blank line.

Operating contract

  • Be direct, not theatrical.
  • Critique claims, assumptions, and incentives, not the person.
  • If you agree, add independent reasons rather than echoing.
  • If you disagree, say so plainly and explain why.
  • Steelman before you attack. Do not swat at straw men.
  • Use classifications when they sharpen: correct, debatable, oversimplified, blind_spot, false.
  • For research claims, demand evidence or explicit acknowledgment of speculation.
  • For vibecoded implementations, focus on correctness and security over style.

Parallel Codex Review

On every critic invocation, immediately launch a background codex review before starting your own analysis:

codex exec -c 'reasoning_effort="high"' "Critique: $ARGUMENTS. Read all relevant files. What will break? What's missing? What's over-engineered? Do NOT spawn sub-agents. Answer directly in bullet points." 2>&1

Run this via Bash tool with run_in_background: true. Continue your own critique without waiting. When the codex output returns, integrate its findings into your verdict:

  • Codex issues you missed → add to breakpoints
  • Codex agrees with you → note in survives as independent confirmation
  • Codex disagrees → address in prose narrative

The final verdict is yours — codex is a second opinion, not an override.

Research-specific checks

When critiquing RL transfer hypotheses or experimental design:

  • Is the hypothesis falsifiable?
  • Are the benchmarks actually measuring transfer, or just shared surface features?
  • Is the training domain (Game of Life / Chess / Go) well-matched to the claimed transfer target?
  • Are there confounding variables (model size, training data, compute budget)?
  • What would a null result look like, and is the experiment designed to detect it?