commit 122290bae9065572e05fb8c03e86a86a74a1d082
Author: limiteinductive
Date:   Sun Mar 29 18:02:14 2026 +0200

    init: README, CLAUDE.md, and claude skills (critic, gcloud, ax)

    Co-Authored-By: Claude Opus 4.6 (1M context)

diff --git a/.claude/settings.json b/.claude/settings.json
new file mode 100644
index 0000000..174bd5e
--- /dev/null
+++ b/.claude/settings.json
@@ -0,0 +1,5 @@
+{
+  "permissions": {
+    "dangerouslySkipPermissions": true
+  }
+}
diff --git a/.claude/skills/apes/ax/SKILL.md b/.claude/skills/apes/ax/SKILL.md
new file mode 100644
index 0000000..fad3c3d
--- /dev/null
+++ b/.claude/skills/apes/ax/SKILL.md
@@ -0,0 +1,113 @@
+---
+name: ax
+description: Audit agent-facing docs, hooks, skills, and config for the apes platform against AX principles. Use when agent behavior is wrong due to missing/unclear docs, poor ergonomics, or misconfigured automation.
+argument-hint: "[problem description]"
+disable-model-invocation: true
+---
+
+# AX — Agent Experience Audit
+
+Audit the apes project's Claude Code configuration — CLAUDE.md, hooks, skills, rules, permissions — against AX principles. For each finding, recommend the right mechanism to fix it.
+
+## Arguments
+
+- `$ARGUMENTS` — description of the AX problem (e.g., "agents keep deploying to wrong project"). If empty, run a general audit.
+
+## Workflow
+
+### Phase 1: AUDIT — Discover and score
+
+#### 1a. Establish ground truth
+
+Derive canonical workflows from:
+- `docker-compose.yml` files on VMs (SSH to check)
+- Any `Makefile`, `package.json`, `pyproject.toml` in repo
+- Deployment scripts, CI pipelines
+- GCP project config (`apes-platform`)
+
+Ground truth is authoritative. If docs and automation disagree, fix docs.
+
+#### 1b. Inventory agent-facing surfaces
+
+Discover all Claude Code configuration:
+
+**Documentation:** `CLAUDE.md`, `.claude/rules/*.md`, `README.md`
+**Automation:** `.claude/settings.json`, hooks
+**Skills:** `.claude/skills/*/SKILL.md`
+**Commands:** `.claude/commands/*.md`
+**Agents:** `.claude/agents/*.md`
+**Memory:** `~/.claude/projects/*/memory/MEMORY.md`
+
+If `$ARGUMENTS` is provided, focus on relevant surfaces.
+
+#### 1c. Score against AX principles
+
+| # | Principle | FAIL when... |
+|---|-----------|--------------|
+| 1 | Explicitness over convention | A non-standard workflow isn't called out explicitly |
+| 2 | Fail fast with clear recovery | Errors lack concrete fix commands |
+| 3 | Minimize context rot | CLAUDE.md adds tokens that don't earn their keep |
+| 4 | Structured over unstructured | Important info buried in prose instead of tables/code blocks |
+| 5 | Consistent patterns | Naming or formatting conventions shift across docs |
+| 6 | Complete context at point of need | Critical commands missing where they're needed |
+| 7 | Guard rails over documentation | Says "don't do X" but X would succeed — a hook or permission would be better |
+| 8 | Single source of truth | Same info maintained in multiple places, or docs diverge from reality |
+
+**Apes-specific checks:**
+- GCP project/region/zone correct everywhere?
+- Docker Compose configs on VMs match what docs describe?
+- DNS records match what's deployed?
+- No SaaS dependencies crept in?
+
+### Phase 2: PROPOSE — Select mechanism and draft fixes
+
+For each WARN or FAIL, select the right Claude Code mechanism:
+
+| If the finding is... | Use this mechanism |
+|---|---|
+| Block forbidden actions | **PreToolUse hook** |
+| Dangerous command that should never run | **Permission deny rule** |
+| Auto-format/lint/test after edits | **PostToolUse hook** |
+| File-type-specific convention | **`.claude/rules/*.md`** with `paths` frontmatter |
+| Repeatable workflow or reference | **Skill** |
+| Complex task needing isolation | **Subagent** |
+| Critical context surviving compaction | **CLAUDE.md** |
+| Universal project convention | **CLAUDE.md** (keep <200 lines) |
+
+Each fix must include:
+- Which principle it addresses
+- The selected mechanism and why
+- Exact implementation (file path + content)
+
+### Phase 3: REPORT
+
+```
+# AX Audit Report — apes
+
+**Surfaces audited:**
+
+## Scorecard
+
+| # | Principle | Rating | Detail |
+|---|-----------|--------|--------|
+| 1-8 | ... | PASS/WARN/FAIL | ... |
+
+## Findings
+
+| Surface | Issues | Recommended mechanism |
+|---------|--------|----------------------|
+| ... | ... | ... |
+
+## Recommendations
+
+For each:
+- Principle addressed
+- Mechanism type
+- Exact implementation (file + content)
+```
+
+## Constraints
+
+- This skill is **read-only** — it never modifies files, only reports
+- Apes-specific: verify no SaaS dependencies in recommendations
+- Verify GCP infra state via SSH before reporting on deployed services
diff --git a/.claude/skills/apes/critic/SKILL.md b/.claude/skills/apes/critic/SKILL.md
new file mode 100644
index 0000000..38193f2
--- /dev/null
+++ b/.claude/skills/apes/critic/SKILL.md
@@ -0,0 +1,66 @@
+---
+name: critic
+description: Stress-test research hypotheses, architecture decisions, and vibecoded implementations with adversarial-but-fair critique. Returns structured JSON verdicts. Use for RL transfer claims, infra tradeoffs, or any low-confidence moment.
+---
+
+# Critic
+
+Use this skill when the job is to make reasoning stronger, not to keep the conversation comfortable.
+
+## Good fits
+
+- RL transfer hypothesis validation — "will training on Go actually help with planning benchmarks?"
+- architecture tradeoffs — self-hosted vs managed, monolith vs services
+- vibecoded implementation review — "this works but was generated fast, is it sound?"
+- research design — experimental methodology, benchmark selection, control groups
+- infra decisions — GCP resource sizing, networking, security posture
+- **ad-hoc low-confidence moments**: code behaving unexpectedly, ambiguous requirements, multiple valid approaches
+
+## Do not use for
+
+- routine implementation work
+- simple factual lookup
+- emotionally sensitive moments where critique is not the task
+
+## Output contract
+
+The critic always returns a JSON object as the first block in its response:
+
+```json
+{
+  "verdict": "proceed | hold | flag | reopen | error",
+  "confidence": 0.0,
+  "breakpoints": ["issue 1", "issue 2"],
+  "survives": ["strength 1", "strength 2"],
+  "recommendation": "one-line action"
+}
+```
+
+Verdicts:
+- **proceed** — no blocking issues
+- **hold** — do not proceed until breakpoints are resolved
+- **flag** — notable concerns but non-blocking
+- **reopen** — fundamentally flawed, needs rework
+- **error** — critic could not complete (missing files, insufficient context)
+
+Optional prose narrative follows after a blank line.
+
+## Operating contract
+
+- Be direct, not theatrical.
+- Critique claims, assumptions, and incentives, not the person.
+- If you agree, add independent reasons rather than echoing.
+- If you disagree, say so plainly and explain why.
+- Steelman before you attack. Do not swat at straw men.
+- Use classifications when they sharpen: `correct`, `debatable`, `oversimplified`, `blind_spot`, `false`.
+- For research claims, demand evidence or explicit acknowledgment of speculation.
+- For vibecoded implementations, focus on correctness and security over style.
+
+## Research-specific checks
+
+When critiquing RL transfer hypotheses or experimental design:
+- Is the hypothesis falsifiable?
+- Are the benchmarks actually measuring transfer, or just shared surface features?
+- Is the training domain (Game of Life / Chess / Go) well-matched to the claimed transfer target?
+- Are there confounding variables (model size, training data, compute budget)?
+- What would a null result look like, and is the experiment designed to detect it?
diff --git a/.claude/skills/gcloud/SKILL.md b/.claude/skills/gcloud/SKILL.md
new file mode 100644
index 0000000..84ee5b7
--- /dev/null
+++ b/.claude/skills/gcloud/SKILL.md
@@ -0,0 +1,94 @@
+# gcloud Skill
+
+Common GCP patterns for the apes platform. All commands invoke gcloud/kubectl/docker directly via Bash.
+
+**Project:** `apes-platform`
+**Region:** `europe-west1`
+**Zone:** `europe-west1-b`
+
+## Current Infrastructure
+
+| Service | Host | VM | IP |
+|---------|------|----|----|
+| Gitea | git.unslope.com | gitea-vm | 34.78.255.104 |
+| Chat (planned) | apes.unslope.com | TBD | TBD |
+
+## SSH into VMs
+
+```bash
+# Gitea VM
+gcloud compute ssh gitea-vm --zone=europe-west1-b --project=apes-platform
+
+# Run a command remotely
+gcloud compute ssh gitea-vm --zone=europe-west1-b --project=apes-platform --command="sudo docker ps"
+```
+
+## Docker Compose on VMs
+
+```bash
+# Restart a service
+gcloud compute ssh --zone=europe-west1-b --project=apes-platform \
+  --command="sudo bash -c 'cd /opt/ && docker compose restart '"
+
+# View logs
+gcloud compute ssh --zone=europe-west1-b --project=apes-platform \
+  --command="sudo docker logs --tail 50"
+
+# Full redeploy
+gcloud compute ssh --zone=europe-west1-b --project=apes-platform \
+  --command="sudo bash -c 'cd /opt/ && docker compose pull && docker compose up -d'"
+```
+
+## Static IPs & DNS
+
+```bash
+# Reserve a new static IP
+gcloud compute addresses create --region=europe-west1 --project=apes-platform
+
+# Get IP value
+gcloud compute addresses describe --region=europe-west1 --project=apes-platform --format='value(address)'
+
+# DNS: add A record at Namecheap (Advanced DNS tab) pointing subdomain to IP
+```
+
+## Firewall Rules
+
+```bash
+# List rules
+gcloud compute firewall-rules list --project=apes-platform
+
+# Open a port
+gcloud compute firewall-rules create --allow=tcp: --target-tags=web-server --project=apes-platform
+```
+
+## New VM Pattern
+
+```bash
+gcloud compute instances create \
+  --project=apes-platform \
+  --zone=europe-west1-b \
+  --machine-type=e2-small \
+  --image-family=debian-12 \
+  --image-project=debian-cloud \
+  --boot-disk-size=20GB \
+  --tags=web-server \
+  --address= \
+  --metadata-from-file=startup-script=
+```
+
+## IAM
+
+```bash
+gcloud auth list
+gcloud projects get-iam-policy apes-platform --format=json
+```
+
+## Troubleshooting
+
+| Error | Fix |
+|-------|-----|
+| VM SSH timeout | Check firewall: `gcloud compute firewall-rules list --project=apes-platform` |
+| Docker not running | SSH in, run `sudo systemctl start docker` |
+| Caddy cert failed | Check DNS propagation: `dig @dns1.registrar-servers.com A +short` |
+| Container not starting | Check logs: `sudo docker logs --tail 50` |
+| DNS not resolving | Flush local DNS cache (macOS): `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` |
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..0356a32
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,54 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## What This Is
+
+`apes` is a post-singularity research platform. No SaaS. Everything vibecoded. Self-hosted on GCP.
+
+**Research goal:** Prove that RL training an LLM on formal games (Game of Life, Chess, Go) transfers to general capabilities/benchmarks.
+
+**Team:** Benji and Neeraj (the apes). Claude Code agents are first-class collaborators.
+
+## Infrastructure
+
+| Service | URL | VM | Zone |
+|---------|-----|----|----|
+| Gitea | git.unslope.com | gitea-vm | europe-west1-b |
+| Chat (planned) | apes.unslope.com | TBD | europe-west1-b |
+
+**GCP project:** `apes-platform`
+**Region:** `europe-west1`
+**DNS:** Namecheap (Advanced DNS tab for A records)
+
+## Non-Negotiable Rules
+
+- **No SaaS.** If we can't self-host it, we don't use it.
+- **Vibecoded.** Humans direct, agents build. Move fast, verify correctness.
+- **GCP project is `apes-platform`.** Always pass `--project=apes-platform`.
+- **Region is `europe-west1`.** Zone `europe-west1-b` unless there's a reason to change.
+
+## Route By Task
+
+| Need | Load |
+|------|------|
+| GCP commands | `/apes:gcloud` skill |
+| Stress-test a decision | `/apes:critic` skill |
+| Audit agent config quality | `/apes:ax` skill |
+
+## Deployment Pattern
+
+All services run as Docker Compose on GCP Compute Engine VMs behind Caddy (auto HTTPS via Let's Encrypt).
+
+```bash
+# SSH into a VM
+gcloud compute ssh --zone=europe-west1-b --project=apes-platform
+
+# Manage services
+sudo bash -c 'cd /opt/ && docker compose up -d'
+sudo docker logs --tail 50
+```
+
+## Critic Reflex
+
+When something is surprising, contradictory, or your confidence is low, use the `/apes:critic` skill before proceeding. Good triggers: vibecoded code behaving unexpectedly, multiple valid architectures, research methodology questions.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..b51af39
--- /dev/null
+++ b/README.md
@@ -0,0 +1,51 @@
+# 🐒 apes
+
+**post-singularity research platform — vibecoded, self-hosted, no SaaS**
+
+> the apes are the humans. everything else is smarter than us now. might as well own it.
+
+## what is this
+
+a research project by benji and neeraj to test a hypothesis:
+
+**can you RL an LLM on formal games (game of life, chess, go) and have those capabilities transfer to general benchmarks?**
+
+the twist: this entire project is vibecoded. no SaaS. no managed services. just apes prompting agents to build everything from scratch on GCP.
+
+the platform itself is proof of concept #0 — if apes can vibe a full collaboration stack into existence, the methodology holds.
+
+## live at
+
+[apes.unslope.com](https://apes.unslope.com)
+
+## phase 1: the colony
+
+a self-hosted slack clone deployed on GCP where apes and claude code agents talk together as peers.
+
+- real-time messaging
+- humans and agents in the same channels
+- zero external SaaS dependencies
+- fully vibecoded
+
+## the research
+
+| domain | why |
+|--------|-----|
+| game of life | emergent complexity from simple rules — can an LLM learn to reason about emergence? |
+| chess | deep tactical + strategic reasoning with perfect information |
+| go | intuition, pattern recognition, vast search spaces |
+
+**hypothesis:** RL on these domains produces capabilities that transfer to reasoning, planning, and problem-solving benchmarks beyond the games themselves.
+
+## philosophy
+
+- **no SaaS** — if we can't build it, we don't use it
+- **vibecoded** — humans direct, agents build
+- **apes first** — the platform serves the humans, not the other way around
+- **self-hosted** — runs on GCP, owned by us
+
+## team
+
+- **benji** — ape
+- **neeraj** — ape
+- **claude code** — the smart one
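
The critic skill's output contract (a leading JSON object with `verdict`, `confidence`, `breakpoints`, `survives`, `recommendation`) is strict enough to check mechanically before an agent acts on a verdict. Below is a minimal Python sketch, not part of the skill. It assumes the verdict set from the skill's Verdicts list (including `error`), that `confidence` is a score in [0, 1] (the skill does not state a range), and that the object arrives either inside a `json` code fence or as the reply's first blank-line-delimited block. `parse_critic_verdict` is a hypothetical helper name.

```python
import json
import re

# Verdict values listed in the critic skill's "Verdicts" section.
VERDICTS = {"proceed", "hold", "flag", "reopen", "error"}

# Assumption: the object is either wrapped in a json code fence or is the
# reply's first block, ended by the blank line that precedes the prose.
JSON_FENCE = re.compile(r"```json\s*(.*?)```", re.DOTALL)


def parse_critic_verdict(reply: str) -> dict:
    """Extract the critic's leading JSON object and check it against the contract."""
    text = reply.lstrip()
    fenced = JSON_FENCE.match(text)
    block = fenced.group(1) if fenced else text.split("\n\n", 1)[0]
    data = json.loads(block)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("critic output must be a JSON object")

    if data.get("verdict") not in VERDICTS:
        raise ValueError(f"unknown verdict: {data.get('verdict')!r}")

    # Assumption: confidence is a probability-like score in [0, 1].
    confidence = data.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")

    for key in ("breakpoints", "survives"):
        if not isinstance(data.get(key), list):
            raise ValueError(f"{key} must be a list")

    recommendation = data.get("recommendation")
    if not isinstance(recommendation, str) or "\n" in recommendation:
        raise ValueError("recommendation must be a one-line string")
    return data


reply = (
    "```json\n"
    '{"verdict": "hold", "confidence": 0.4, "breakpoints": ["no control group"], '
    '"survives": ["falsifiable"], "recommendation": "add a baseline run"}\n'
    "```\n\n"
    "The benchmark choice is fine, but..."
)
print(parse_critic_verdict(reply)["verdict"])  # hold
```

Rejecting a malformed reply with `ValueError` lets the caller treat it the same way as an explicit `error` verdict: the critic did not complete, so the decision it was meant to gate should not proceed on that output.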
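
CLAUDE.md states the project rule only as prose ("Always pass `--project=apes-platform`"), while principle 7 of the ax skill prefers a guard rail over documentation for rules like this. Below is a sketch of the check a PreToolUse hook could run. It assumes Claude Code's hook contract: the event arrives as JSON on stdin with `tool_name` and `tool_input.command`, and exit code 2 blocks the call and shows stderr to the agent. The helper names and the exempt `auth`/`config` command groups are hypothetical choices, not part of this repository.

```python
import shlex
import sys
from typing import Optional

REQUIRED_PROJECT = "apes-platform"
# Hypothetical exemptions: these gcloud groups are not tied to a single project.
EXEMPT_GROUPS = {"auth", "config"}


def has_project_flag(tokens: list) -> bool:
    """Accept both `--project=apes-platform` and `--project apes-platform`."""
    if f"--project={REQUIRED_PROJECT}" in tokens:
        return True
    return any(
        flag == "--project" and value == REQUIRED_PROJECT
        for flag, value in zip(tokens, tokens[1:])
    )


def check_command(command: str) -> Optional[str]:
    """Return an error message if a gcloud call omits the apes project."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return None  # unbalanced quotes: let the shell report the real error
    if "gcloud" not in tokens:
        return None
    start = tokens.index("gcloud")  # only the first gcloud invocation is inspected
    group = tokens[start + 1] if start + 1 < len(tokens) else ""
    if group in EXEMPT_GROUPS or has_project_flag(tokens):
        return None
    return f"gcloud call is missing --project={REQUIRED_PROJECT}; add the flag and retry."


def check_event(event: dict) -> int:
    """Map a PreToolUse event to a hook exit code (2 = block, 0 = allow)."""
    if event.get("tool_name") != "Bash":
        return 0
    error = check_command(event.get("tool_input", {}).get("command", ""))
    if error:
        print(error, file=sys.stderr)  # surfaced to the agent as the recovery hint
        return 2
    return 0


# Wired up as a hook script, the file would end with (after `import json`):
#     sys.exit(check_event(json.load(sys.stdin)))
```

The message doubles as the concrete fix that principle 2 asks for. Because only tokens are inspected, a `gcloud` call nested inside a quoted `bash -c` string, or a chain of several `gcloud` calls, would need a stricter parser.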