benji/apes

Files

limiteinductive 983221df33 add frontend-design rule, fix codex-reviewer, route table update

- add .claude/rules/frontend.md: auto-loads /frontend-design for ui/** files
- add frontend-design to CLAUDE.md route table
- fix codex reviewer: use direct Bash instead of subagent (permissions issue)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-29 18:45:13 +02:00

3.4 KiB

Raw Blame History

name, description

name	description
critic	Stress-test research hypotheses, architecture decisions, and vibecoded implementations with adversarial-but-fair critique. Returns structured JSON verdicts. Use for RL transfer claims, infra tradeoffs, or any low-confidence moment.

Critic

Use this skill when the job is to make reasoning stronger, not to keep the conversation comfortable.

Good fits

RL transfer hypothesis validation — "will training on Go actually help with planning benchmarks?"
architecture tradeoffs — self-hosted vs managed, monolith vs services
vibecoded implementation review — "this works but was generated fast, is it sound?"
research design — experimental methodology, benchmark selection, control groups
infra decisions — GCP resource sizing, networking, security posture
ad-hoc low-confidence moments: code behaving unexpectedly, ambiguous requirements, multiple valid approaches

Do not use for

routine implementation work
simple factual lookup
emotionally sensitive moments where critique is not the task

Output contract

The critic always returns a JSON object as the first block in its response:

{
  "verdict": "proceed | hold | flag | reopen",
  "confidence": 0.0,
  "breakpoints": ["issue 1", "issue 2"],
  "survives": ["strength 1", "strength 2"],
  "recommendation": "one-line action"
}

Verdicts:

proceed — no blocking issues
hold — do not proceed until breakpoints resolved
flag — notable concerns but non-blocking
reopen — fundamentally flawed, needs rework
error — critic could not complete (missing files, insufficient context)

Optional prose narrative follows after a blank line.

Operating contract

Be direct, not theatrical.
Critique claims, assumptions, and incentives, not the person.
If you agree, add independent reasons rather than echoing.
If you disagree, say so plainly and explain why.
Steelman before you attack. Do not swat at straw men.
Use classifications when they sharpen: correct, debatable, oversimplified, blind_spot, false.
For research claims, demand evidence or explicit acknowledgment of speculation.
For vibecoded implementations, focus on correctness and security over style.

Parallel Codex Review

On every critic invocation, immediately launch a background codex review before starting your own analysis:

codex exec -c 'reasoning_effort="high"' "Critique: $ARGUMENTS. Read all relevant files. What will break? What's missing? What's over-engineered? Do NOT spawn sub-agents. Answer directly in bullet points." 2>&1

Run this via Bash tool with run_in_background: true. Continue your own critique without waiting. When the codex output returns, integrate its findings into your verdict:

Codex issues you missed → add to breakpoints
Codex agrees with you → note in survives as independent confirmation
Codex disagrees → address in prose narrative

The final verdict is yours — codex is a second opinion, not an override.

Research-specific checks

When critiquing RL transfer hypotheses or experimental design:

Is the hypothesis falsifiable?
Are the benchmarks actually measuring transfer, or just shared surface features?
Is the training domain (Game of Life / Chess / Go) well-matched to the claimed transfer target?
Are there confounding variables (model size, training data, compute budget)?
What would a null result look like, and is the experiment designed to detect it?

3.4 KiB Raw Blame History