init: README, CLAUDE.md, and claude skills (critic, gcloud, ax)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:02:14 +02:00
commit 122290bae9
6 changed files with 383 additions and 0 deletions

.claude/settings.json Normal file

@@ -0,0 +1,5 @@
{
  "permissions": {
    "defaultMode": "bypassPermissions"
  }
}


@@ -0,0 +1,113 @@
---
name: ax
description: Audit agent-facing docs, hooks, skills, and config for the apes platform against AX principles. Use when agent behavior is wrong due to missing/unclear docs, poor ergonomics, or misconfigured automation.
argument-hint: "[problem description]"
disable-model-invocation: true
---
# AX — Agent Experience Audit
Audit the apes project's Claude Code configuration — CLAUDE.md, hooks, skills, rules, permissions — against AX principles. For each finding, recommend the right mechanism to fix it.
## Arguments
- `$ARGUMENTS` — description of the AX problem (e.g., "agents keep deploying to wrong project"). If empty, run a general audit.
## Workflow
### Phase 1: AUDIT — Discover and score
#### 1a. Establish ground truth
Derive canonical workflows from:
- `docker-compose.yml` files on VMs (SSH to check)
- Any `Makefile`, `package.json`, `pyproject.toml` in repo
- Deployment scripts, CI pipelines
- GCP project config (`apes-platform`)
Ground truth is authoritative. If docs and automation disagree, fix docs.
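A minimal ground-truth check, as a sketch (commands follow the gcloud skill; the `/opt/gitea` path is an assumption based on the `/opt/<service>` deployment pattern):
```bash
# Pull the live compose file and running containers from the Gitea VM
gcloud compute ssh gitea-vm --zone=europe-west1-b --project=apes-platform \
  --command="sudo cat /opt/gitea/docker-compose.yml && sudo docker ps --format '{{.Names}}: {{.Status}}'"
```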
#### 1b. Inventory agent-facing surfaces
Discover all Claude Code configuration:
**Documentation:** `CLAUDE.md`, `.claude/rules/*.md`, `README.md`
**Automation:** `.claude/settings.json`, hooks
**Skills:** `.claude/skills/*/SKILL.md`
**Commands:** `.claude/commands/*.md`
**Agents:** `.claude/agents/*.md`
**Memory:** `~/.claude/projects/*/memory/MEMORY.md`
If `$ARGUMENTS` is provided, focus on relevant surfaces.
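A quick way to enumerate these surfaces, as a sketch (some paths may not exist yet in this repo; missing ones are silently skipped):
```bash
# List agent-facing surfaces
ls CLAUDE.md README.md .claude/settings.json 2>/dev/null
find .claude -type f \( -name '*.md' -o -name 'settings.json' \) 2>/dev/null
ls ~/.claude/projects/*/memory/MEMORY.md 2>/dev/null
```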
#### 1c. Score against AX principles
| # | Principle | FAIL when... |
|---|-----------|--------------|
| 1 | Explicitness over convention | A non-standard workflow isn't called out explicitly |
| 2 | Fail fast with clear recovery | Errors lack concrete fix commands |
| 3 | Minimize context rot | CLAUDE.md adds tokens that don't earn their keep |
| 4 | Structured over unstructured | Important info buried in prose instead of tables/code blocks |
| 5 | Consistent patterns | Naming or formatting conventions shift across docs |
| 6 | Complete context at point of need | Critical commands missing where they're needed |
| 7 | Guard rails over documentation | Says "don't do X" but X would succeed — a hook or permission would be better |
| 8 | Single source of truth | Same info maintained in multiple places, or docs diverge from reality |
**Apes-specific checks:**
- GCP project/region/zone correct everywhere?
- Docker Compose configs on VMs match what docs describe?
- DNS records match what's deployed? (see the sketch after this list)
- No SaaS dependencies crept in?
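A sketch for the DNS check above (commands taken from the gcloud skill; `git.unslope.com` is used as the example record):
```bash
# Compare the public A record against the reserved static IPs in the project
dig +short git.unslope.com A
gcloud compute addresses list --project=apes-platform --format='table(name,address)'
```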
### Phase 2: PROPOSE — Select mechanism and draft fixes
For each WARN or FAIL, select the right Claude Code mechanism:
| If the finding is... | Use this mechanism |
|---|---|
| Block forbidden actions | **PreToolUse hook** |
| Dangerous command that should never run | **Permission deny rule** |
| Auto-format/lint/test after edits | **PostToolUse hook** |
| File-type-specific convention | **`.claude/rules/*.md`** with `paths` frontmatter |
| Repeatable workflow or reference | **Skill** |
| Complex task needing isolation | **Subagent** |
| Critical context surviving compaction | **CLAUDE.md** |
| Universal project convention | **CLAUDE.md** (keep <200 lines) |
Each fix must include (see the worked example after this list):
- Which principle it addresses
- The selected mechanism and why
- Exact implementation (file path + content)
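A hypothetical worked example (the deny-rule syntax is assumed from standard Claude Code permission rules, not verified against this repo's settings):
- Principle addressed: 7 (guard rails over documentation)
- Mechanism: **permission deny rule**
- Implementation (`.claude/settings.json`):
```json
{
  "permissions": {
    "deny": [
      "Bash(gcloud compute instances delete:*)"
    ]
  }
}
```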
### Phase 3: REPORT
```
# AX Audit Report — apes
**Surfaces audited:** <count>
## Scorecard
| # | Principle | Rating | Detail |
|---|-----------|--------|--------|
| 1-8 | ... | PASS/WARN/FAIL | ... |
## Findings
| Surface | Issues | Recommended mechanism |
|---------|--------|----------------------|
| ... | ... | ... |
## Recommendations
For each:
- Principle addressed
- Mechanism type
- Exact implementation (file + content)
```
## Constraints
- This skill is **read-only** — it never modifies files, only reports
- Apes-specific: verify no SaaS dependencies in recommendations
- Verify GCP infra state via SSH before reporting on deployed services


@@ -0,0 +1,66 @@
---
name: critic
description: Stress-test research hypotheses, architecture decisions, and vibecoded implementations with adversarial-but-fair critique. Returns structured JSON verdicts. Use for RL transfer claims, infra tradeoffs, or any low-confidence moment.
---
# Critic
Use this skill when the job is to make reasoning stronger, not to keep the conversation comfortable.
## Good fits
- RL transfer hypothesis validation — "will training on Go actually help with planning benchmarks?"
- architecture tradeoffs — self-hosted vs managed, monolith vs services
- vibecoded implementation review — "this works but was generated fast, is it sound?"
- research design — experimental methodology, benchmark selection, control groups
- infra decisions — GCP resource sizing, networking, security posture
- **ad-hoc low-confidence moments**: code behaving unexpectedly, ambiguous requirements, multiple valid approaches
## Do not use for
- routine implementation work
- simple factual lookup
- emotionally sensitive moments where critique is not the task
## Output contract
The critic always returns a JSON object as the first block in its response:
```json
{
  "verdict": "proceed | hold | flag | reopen | error",
  "confidence": 0.0,
  "breakpoints": ["issue 1", "issue 2"],
  "survives": ["strength 1", "strength 2"],
  "recommendation": "one-line action"
}
```
Verdicts:
- **proceed** — no blocking issues
- **hold** — do not proceed until breakpoints resolved
- **flag** — notable concerns but non-blocking
- **reopen** — fundamentally flawed, needs rework
- **error** — critic could not complete (missing files, insufficient context)
Optional prose narrative follows after a blank line.
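For example, a hypothetical verdict on an under-specified transfer experiment might look like:
```json
{
  "verdict": "hold",
  "confidence": 0.7,
  "breakpoints": ["no baseline model trained without game RL", "benchmark shares surface features with Go puzzles"],
  "survives": ["hypothesis is falsifiable", "compute budget is stated up front"],
  "recommendation": "add a non-game RL control before running the full sweep"
}
```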
## Operating contract
- Be direct, not theatrical.
- Critique claims, assumptions, and incentives, not the person.
- If you agree, add independent reasons rather than echoing.
- If you disagree, say so plainly and explain why.
- Steelman before you attack. Do not swat at straw men.
- Use classifications when they sharpen: `correct`, `debatable`, `oversimplified`, `blind_spot`, `false`.
- For research claims, demand evidence or explicit acknowledgment of speculation.
- For vibecoded implementations, focus on correctness and security over style.
## Research-specific checks
When critiquing RL transfer hypotheses or experimental design:
- Is the hypothesis falsifiable?
- Are the benchmarks actually measuring transfer, or just shared surface features?
- Is the training domain (Game of Life / Chess / Go) well-matched to the claimed transfer target?
- Are there confounding variables (model size, training data, compute budget)?
- What would a null result look like, and is the experiment designed to detect it?


@@ -0,0 +1,94 @@
# gcloud Skill
Common GCP patterns for the apes platform. All commands invoke gcloud/kubectl/docker directly via Bash.
**Project:** `apes-platform`
**Region:** `europe-west1`
**Zone:** `europe-west1-b`
## Current Infrastructure
| Service | Host | VM | IP |
|---------|------|----|----|
| Gitea | git.unslope.com | gitea-vm | 34.78.255.104 |
| Chat (planned) | apes.unslope.com | TBD | TBD |
## SSH into VMs
```bash
# Gitea VM
gcloud compute ssh gitea-vm --zone=europe-west1-b --project=apes-platform
# Run a command remotely
gcloud compute ssh gitea-vm --zone=europe-west1-b --project=apes-platform --command="sudo docker ps"
```
## Docker Compose on VMs
```bash
# Restart a service
gcloud compute ssh <vm> --zone=europe-west1-b --project=apes-platform \
  --command="sudo bash -c 'cd /opt/<service> && docker compose restart <container>'"

# View logs
gcloud compute ssh <vm> --zone=europe-west1-b --project=apes-platform \
  --command="sudo docker logs <container> --tail 50"

# Full redeploy
gcloud compute ssh <vm> --zone=europe-west1-b --project=apes-platform \
  --command="sudo bash -c 'cd /opt/<service> && docker compose pull && docker compose up -d'"
```
## Static IPs & DNS
```bash
# Reserve a new static IP
gcloud compute addresses create <name> --region=europe-west1 --project=apes-platform
# Get IP value
gcloud compute addresses describe <name> --region=europe-west1 --project=apes-platform --format='value(address)'
# DNS: add A record at Namecheap (Advanced DNS tab) pointing subdomain to IP
```
## Firewall Rules
```bash
# List rules
gcloud compute firewall-rules list --project=apes-platform
# Open a port
gcloud compute firewall-rules create <name> --allow=tcp:<port> --target-tags=web-server --project=apes-platform
```
## New VM Pattern
```bash
gcloud compute instances create <name> \
  --project=apes-platform \
  --zone=europe-west1-b \
  --machine-type=e2-small \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --boot-disk-size=20GB \
  --tags=web-server \
  --address=<static-ip-name> \
  --metadata-from-file=startup-script=<script-path>
```
## IAM
```bash
gcloud auth list
gcloud projects get-iam-policy apes-platform --format=json
```
## Troubleshooting
| Error | Fix |
|-------|-----|
| VM SSH timeout | Check firewall: `gcloud compute firewall-rules list --project=apes-platform` |
| Docker not running | SSH in, run `sudo systemctl start docker` |
| Caddy cert failed | Check DNS propagation: `dig @dns1.registrar-servers.com <domain> A +short` |
| Container not starting | Check logs: `sudo docker logs <container> --tail 50` |
| DNS not resolving | Flush local cache: `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` |

CLAUDE.md Normal file

@@ -0,0 +1,54 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What This Is
`apes` is a post-singularity research platform. No SaaS. Everything vibecoded. Self-hosted on GCP.
**Research goal:** Prove that RL-training an LLM on formal games (Game of Life, Chess, Go) produces capabilities that transfer to general benchmarks.
**Team:** Benji and Neeraj (the apes). Claude Code agents are first-class collaborators.
## Infrastructure
| Service | URL | VM | Zone |
|---------|-----|----|----|
| Gitea | git.unslope.com | gitea-vm | europe-west1-b |
| Chat (planned) | apes.unslope.com | TBD | europe-west1-b |
**GCP project:** `apes-platform`
**Region:** `europe-west1`
**DNS:** Namecheap (Advanced DNS tab for A records)
## Non-Negotiable Rules
- **No SaaS.** If we can't self-host it, we don't use it.
- **Vibecoded.** Humans direct, agents build. Move fast, verify correctness.
- **GCP project is `apes-platform`.** Always pass `--project=apes-platform`.
- **Region is `europe-west1`.** Zone `europe-west1-b` unless there's a reason to change.
## Route By Task
| Need | Load |
|------|------|
| GCP commands | `/apes:gcloud` skill |
| Stress-test a decision | `/apes:critic` skill |
| Audit agent config quality | `/apes:ax` skill |
## Deployment Pattern
All services run as Docker Compose on GCP Compute Engine VMs behind Caddy (auto HTTPS via Let's Encrypt).
```bash
# SSH into a VM
gcloud compute ssh <vm> --zone=europe-west1-b --project=apes-platform
# Manage services
sudo bash -c 'cd /opt/<service> && docker compose up -d'
sudo docker logs <container> --tail 50
```
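A quick post-deploy check, as a sketch (the `caddy` container name is an assumption; use the actual compose service name):
```bash
# From anywhere: confirm HTTPS is serving
curl -sI https://git.unslope.com | head -n 1
# On the VM: check the reverse proxy logs (container name "caddy" assumed)
sudo docker logs caddy --tail 20
```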
## Critic Reflex
When something is surprising, contradictory, or your confidence is low, use the `/apes:critic` skill before proceeding. Good triggers: vibecoded code behaving unexpectedly, multiple valid architectures, research methodology questions.

README.md Normal file

@@ -0,0 +1,51 @@
# 🐒 apes
**post-singularity research platform — vibecoded, self-hosted, no SaaS**
> the apes are the humans. everything else is smarter than us now. might as well own it.
## what is this
a research project by benji and neeraj to test a hypothesis:
**can you RL an LLM on formal games (game of life, chess, go) and have those capabilities transfer to general benchmarks?**
the twist: this entire project is vibecoded. no SaaS. no managed services. just apes prompting agents to build everything from scratch on GCP.
the platform itself is proof of concept #0 — if apes can vibe a full collaboration stack into existence, the methodology holds.
## live at
[apes.unslope.com](https://apes.unslope.com)
## phase 1: the colony
a self-hosted slack clone deployed on GCP where apes and claude code agents talk together as peers.
- real-time messaging
- humans and agents in the same channels
- zero external SaaS dependencies
- fully vibecoded
## the research
| domain | why |
|--------|-----|
| game of life | emergent complexity from simple rules — can an LLM learn to reason about emergence? |
| chess | deep tactical + strategic reasoning with perfect information |
| go | intuition, pattern recognition, vast search spaces |
**hypothesis:** RL on these domains produces capabilities that transfer to reasoning, planning, and problem-solving benchmarks beyond the games themselves.
## philosophy
- **no SaaS** — if we can't build it, we don't use it
- **vibecoded** — humans direct, agents build
- **apes first** — the platform serves the humans, not the other way around
- **self-hosted** — runs on GCP, owned by us
## team
- **benji** — ape
- **neeraj** — ape
- **claude code** — the smart one