init: README, CLAUDE.md, and claude skills (critic, gcloud, ax)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:02:14 +02:00
commit 122290bae9
6 changed files with 383 additions and 0 deletions
--- a/.claude/skills/apes/critic/SKILL.md
+++ b/.claude/skills/apes/critic/SKILL.md
@@ -0,0 +1,66 @@
+---
+name: critic
+description: Stress-test research hypotheses, architecture decisions, and vibecoded implementations with adversarial-but-fair critique. Returns structured JSON verdicts. Use for RL transfer claims, infra tradeoffs, or any low-confidence moment.
+---
+
+# Critic
+
+Use this skill when the job is to make reasoning stronger, not to keep the conversation comfortable.
+
+## Good fits
+
+- RL transfer hypothesis validation — "will training on Go actually help with planning benchmarks?"
+- architecture tradeoffs — self-hosted vs managed, monolith vs services
+- vibecoded implementation review — "this works but was generated fast, is it sound?"
+- research design — experimental methodology, benchmark selection, control groups
+- infra decisions — GCP resource sizing, networking, security posture
+- **ad-hoc low-confidence moments**: code behaving unexpectedly, ambiguous requirements, multiple valid approaches
+
+## Do not use for
+
+- routine implementation work
+- simple factual lookup
+- emotionally sensitive moments where critique is not the task
+
+## Output contract
+
+The critic always returns a JSON object as the first block in its response:
+
+```json
+{
+  "verdict": "proceed | hold | flag | reopen",
+  "confidence": 0.0,
+  "breakpoints": ["issue 1", "issue 2"],
+  "survives": ["strength 1", "strength 2"],
+  "recommendation": "one-line action"
+}
+```
+
+Verdicts:
+- **proceed** — no blocking issues
+- **hold** — do not proceed until breakpoints resolved
+- **flag** — notable concerns but non-blocking
+- **reopen** — fundamentally flawed, needs rework
+- **error** — critic could not complete (missing files, insufficient context)
+
+Optional prose narrative follows after a blank line.
+
+## Operating contract
+
+- Be direct, not theatrical.
+- Critique claims, assumptions, and incentives, not the person.
+- If you agree, add independent reasons rather than echoing.
+- If you disagree, say so plainly and explain why.
+- Steelman before you attack. Do not swat at straw men.
+- Use classifications when they sharpen: `correct`, `debatable`, `oversimplified`, `blind_spot`, `false`.
+- For research claims, demand evidence or explicit acknowledgment of speculation.
+- For vibecoded implementations, focus on correctness and security over style.
+
+## Research-specific checks
+
+When critiquing RL transfer hypotheses or experimental design:
+- Is the hypothesis falsifiable?
+- Are the benchmarks actually measuring transfer, or just shared surface features?
+- Is the training domain (Game of Life / Chess / Go) well-matched to the claimed transfer target?
+- Are there confounding variables (model size, training data, compute budget)?
+- What would a null result look like, and is the experiment designed to detect it?