init: README, CLAUDE.md, and claude skills (critic, gcloud, ax)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5
.claude/settings.json
Normal file
@@ -0,0 +1,5 @@
{
  "permissions": {
    "dangerouslySkipPermissions": true
  }
}
113
.claude/skills/apes/ax/SKILL.md
Normal file
@@ -0,0 +1,113 @@
---
name: ax
description: Audit agent-facing docs, hooks, skills, and config for the apes platform against AX principles. Use when agent behavior is wrong due to missing/unclear docs, poor ergonomics, or misconfigured automation.
argument-hint: "[problem description]"
disable-model-invocation: true
---

# AX - Agent Experience Audit

Audit the apes project's Claude Code configuration (CLAUDE.md, hooks, skills, rules, permissions) against AX principles. For each finding, recommend the right mechanism to fix it.

## Arguments

- `$ARGUMENTS`: description of the AX problem (e.g., "agents keep deploying to wrong project"). If empty, run a general audit.

## Workflow

### Phase 1: AUDIT - Discover and score

#### 1a. Establish ground truth

Derive canonical workflows from:
- `docker-compose.yml` files on VMs (SSH to check)
- Any `Makefile`, `package.json`, `pyproject.toml` in repo
- Deployment scripts, CI pipelines
- GCP project config (`apes-platform`)

Ground truth is authoritative. If docs and automation disagree, treat the docs as wrong and recommend fixing them.

#### 1b. Inventory agent-facing surfaces

Discover all Claude Code configuration:

**Documentation:** `CLAUDE.md`, `.claude/rules/*.md`, `README.md`
**Automation:** `.claude/settings.json`, hooks
**Skills:** `.claude/skills/**/SKILL.md` (skills may be nested, e.g. `.claude/skills/apes/ax/SKILL.md`)
**Commands:** `.claude/commands/*.md`
**Agents:** `.claude/agents/*.md`
**Memory:** `~/.claude/projects/*/memory/MEMORY.md`

If `$ARGUMENTS` is provided, focus on relevant surfaces.

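The inventory step can be sketched as a small script. This is a minimal illustration, not part of the skill: the function name `inventory_surfaces` and the `SURFACE_PATTERNS` table are hypothetical, and the skills pattern uses `**` on the assumption that skill folders may be nested (as `.claude/skills/apes/ax/` is). The Memory surface lives outside the repository and is left out here.

```python
from pathlib import Path

# Glob patterns per surface, taken from the inventory list.
# The "**" in the skills pattern is an assumption so that nested
# skill folders such as .claude/skills/apes/ax/ are matched too.
SURFACE_PATTERNS = {
    "Documentation": ["CLAUDE.md", ".claude/rules/*.md", "README.md"],
    "Automation": [".claude/settings.json"],
    "Skills": [".claude/skills/**/SKILL.md"],
    "Commands": [".claude/commands/*.md"],
    "Agents": [".claude/agents/*.md"],
}


def inventory_surfaces(repo_root):
    """Return {surface: sorted repo-relative paths of files that exist}."""
    root = Path(repo_root)
    found = {}
    for surface, patterns in SURFACE_PATTERNS.items():
        paths = set()
        for pattern in patterns:
            for path in root.glob(pattern):
                if path.is_file():
                    paths.add(path.relative_to(root).as_posix())
        found[surface] = sorted(paths)
    return found
```

Calling `inventory_surfaces(".")` from the repository root gives one list of files per surface; an empty list is itself a finding worth noting in the report.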
#### 1c. Score against AX principles

| # | Principle | FAIL when... |
|---|-----------|--------------|
| 1 | Explicitness over convention | A non-standard workflow isn't called out explicitly |
| 2 | Fail fast with clear recovery | Errors lack concrete fix commands |
| 3 | Minimize context rot | CLAUDE.md adds tokens that don't earn their keep |
| 4 | Structured over unstructured | Important info buried in prose instead of tables/code blocks |
| 5 | Consistent patterns | Naming or formatting conventions shift across docs |
| 6 | Complete context at point of need | Critical commands missing where they're needed |
| 7 | Guard rails over documentation | Docs say "don't do X" but X would still succeed; a hook or permission rule would be better |
| 8 | Single source of truth | Same info maintained in multiple places, or docs diverge from reality |

**Apes-specific checks:**
- GCP project/region/zone correct everywhere?
- Docker Compose configs on VMs match what docs describe?
- DNS records match what's deployed?
- No SaaS dependencies crept in?

### Phase 2: PROPOSE - Select mechanism and draft fixes

For each WARN or FAIL, select the right Claude Code mechanism:

| If the finding is... | Use this mechanism |
|---|---|
| Block forbidden actions | **PreToolUse hook** |
| Dangerous command that should never run | **Permission deny rule** |
| Auto-format/lint/test after edits | **PostToolUse hook** |
| File-type-specific convention | **`.claude/rules/*.md`** with `paths` frontmatter |
| Repeatable workflow or reference | **Skill** |
| Complex task needing isolation | **Subagent** |
| Critical context surviving compaction | **CLAUDE.md** |
| Universal project convention | **CLAUDE.md** (keep <200 lines) |

Each fix must include:
- Which principle it addresses
- The selected mechanism and why
- Exact implementation (file path + content)

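As an illustration of the "PreToolUse hook" row, here is a hedged sketch of a check that rejects `gcloud compute` commands missing `--project=apes-platform`. It assumes the hook receives a JSON payload with `tool_name` and `tool_input.command`, and that exit code 2 blocks the tool call; confirm both against the Claude Code hooks reference before relying on them. The names `violation` and `run_hook` are hypothetical.

```python
import json
import re

REQUIRED_FLAG = "--project=apes-platform"


def violation(command):
    """Return an error message if a `gcloud compute` command omits the project flag."""
    # Only gcloud compute commands are checked; other gcloud commands
    # (for example `gcloud auth list`) do not take the flag the same way.
    if not re.search(r"\bgcloud\s+compute\b", command):
        return None
    if REQUIRED_FLAG in command:
        return None
    return f"Blocked: gcloud compute command is missing {REQUIRED_FLAG}. Re-run with the flag added."


def run_hook(raw_payload):
    """Return (exit_code, stderr_message) for one hook invocation."""
    # Assumed payload shape: {"tool_name": "Bash", "tool_input": {"command": "..."}}
    payload = json.loads(raw_payload)
    if payload.get("tool_name") != "Bash":
        return 0, ""
    message = violation(payload.get("tool_input", {}).get("command", ""))
    if message:
        return 2, message  # 2 is assumed to be the blocking exit code
    return 0, ""
```

A hook script built on this would read stdin, call `run_hook`, print the message to stderr, and exit with the returned code. The error text names the exact fix, which is what principle 2 asks for.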
### Phase 3: REPORT

```
# AX Audit Report - apes

**Surfaces audited:** <count>

## Scorecard

| # | Principle | Rating | Detail |
|---|-----------|--------|--------|
| 1-8 | ... | PASS/WARN/FAIL | ... |

## Findings

| Surface | Issues | Recommended mechanism |
|---------|--------|----------------------|
| ... | ... | ... |

## Recommendations

For each:
- Principle addressed
- Mechanism type
- Exact implementation (file + content)
```

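The Scorecard section of the report template can be generated from structured scores. The sketch below is one possible shape, assuming a hypothetical `Score` record and `render_scorecard` helper; neither is defined by the skill.

```python
from dataclasses import dataclass

RATINGS = ("PASS", "WARN", "FAIL")


@dataclass
class Score:
    number: int     # principle number, 1-8
    principle: str  # principle name from the scoring table
    rating: str     # one of RATINGS
    detail: str     # short justification


def render_scorecard(scores):
    """Render the Scorecard table of the report as Markdown rows."""
    lines = [
        "| # | Principle | Rating | Detail |",
        "|---|-----------|--------|--------|",
    ]
    for score in sorted(scores, key=lambda s: s.number):
        if score.rating not in RATINGS:
            raise ValueError(f"unknown rating: {score.rating!r}")
        lines.append(
            f"| {score.number} | {score.principle} | {score.rating} | {score.detail} |"
        )
    return "\n".join(lines)
```

Rejecting unknown ratings keeps the report limited to the three values the template allows.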
## Constraints

- This skill is **read-only**: it never modifies files, only reports
- Apes-specific: verify no SaaS dependencies in recommendations
- Verify GCP infra state via SSH before reporting on deployed services
66
.claude/skills/apes/critic/SKILL.md
Normal file
@@ -0,0 +1,66 @@
---
name: critic
description: Stress-test research hypotheses, architecture decisions, and vibecoded implementations with adversarial-but-fair critique. Returns structured JSON verdicts. Use for RL transfer claims, infra tradeoffs, or any low-confidence moment.
---

# Critic

Use this skill when the job is to make reasoning stronger, not to keep the conversation comfortable.

## Good fits

- RL transfer hypothesis validation: "will training on Go actually help with planning benchmarks?"
- architecture tradeoffs: self-hosted vs managed, monolith vs services
- vibecoded implementation review: "this works but was generated fast, is it sound?"
- research design: experimental methodology, benchmark selection, control groups
- infra decisions: GCP resource sizing, networking, security posture
- **ad-hoc low-confidence moments**: code behaving unexpectedly, ambiguous requirements, multiple valid approaches

## Do not use for

- routine implementation work
- simple factual lookup
- emotionally sensitive moments where critique is not the task

## Output contract

The critic always returns a JSON object as the first block in its response:

```json
{
  "verdict": "proceed | hold | flag | reopen | error",
  "confidence": 0.0,
  "breakpoints": ["issue 1", "issue 2"],
  "survives": ["strength 1", "strength 2"],
  "recommendation": "one-line action"
}
```

Verdicts:
- **proceed**: no blocking issues
- **hold**: do not proceed until breakpoints resolved
- **flag**: notable concerns but non-blocking
- **reopen**: fundamentally flawed, needs rework
- **error**: critic could not complete (missing files, insufficient context)

Optional prose narrative follows after a blank line.

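A caller that consumes critic output could check the contract mechanically. The following is a minimal sketch, assuming the JSON arrives in a fenced `json` block at the start of the response; `parse_verdict` is a hypothetical helper, not something the skill provides.

```python
import json
import re

VERDICTS = {"proceed", "hold", "flag", "reopen", "error"}
REQUIRED_KEYS = {"verdict", "confidence", "breakpoints", "survives", "recommendation"}

# Build the fence marker programmatically so this snippet can itself be fenced.
FENCE = "`" * 3
JSON_BLOCK = re.compile(
    re.escape(FENCE) + r"json\s*\n(.*?)\n" + re.escape(FENCE), re.DOTALL
)


def parse_verdict(response):
    """Extract and validate the first fenced JSON verdict in a critic response."""
    match = JSON_BLOCK.search(response)
    if match is None:
        raise ValueError("no fenced json block found")
    data = json.loads(match.group(1))
    if not isinstance(data, dict):
        raise ValueError("verdict must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["verdict"] not in VERDICTS:
        raise ValueError(f"unknown verdict: {data['verdict']!r}")
    return data
```

Any prose narrative after the block is ignored by this parser, so the optional narrative does not affect validation.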
## Operating contract

- Be direct, not theatrical.
- Critique claims, assumptions, and incentives, not the person.
- If you agree, add independent reasons rather than echoing.
- If you disagree, say so plainly and explain why.
- Steelman before you attack. Do not swat at straw men.
- Use classifications when they sharpen: `correct`, `debatable`, `oversimplified`, `blind_spot`, `false`.
- For research claims, demand evidence or explicit acknowledgment of speculation.
- For vibecoded implementations, focus on correctness and security over style.

## Research-specific checks

When critiquing RL transfer hypotheses or experimental design:
- Is the hypothesis falsifiable?
- Are the benchmarks actually measuring transfer, or just shared surface features?
- Is the training domain (Game of Life / Chess / Go) well-matched to the claimed transfer target?
- Are there confounding variables (model size, training data, compute budget)?
- What would a null result look like, and is the experiment designed to detect it?
94
.claude/skills/gcloud/SKILL.md
Normal file
@@ -0,0 +1,94 @@
# gcloud Skill

Common GCP patterns for the apes platform. All commands invoke gcloud/kubectl/docker directly via Bash.

**Project:** `apes-platform`
**Region:** `europe-west1`
**Zone:** `europe-west1-b`

## Current Infrastructure

| Service | Host | VM | IP |
|---------|------|----|----|
| Gitea | git.unslope.com | gitea-vm | 34.78.255.104 |
| Chat (planned) | apes.unslope.com | TBD | TBD |

## SSH into VMs

```bash
# Gitea VM
gcloud compute ssh gitea-vm --zone=europe-west1-b --project=apes-platform

# Run a command remotely
gcloud compute ssh gitea-vm --zone=europe-west1-b --project=apes-platform --command="sudo docker ps"
```

## Docker Compose on VMs

```bash
# Restart a service
gcloud compute ssh <vm> --zone=europe-west1-b --project=apes-platform \
  --command="sudo bash -c 'cd /opt/<service> && docker compose restart <container>'"

# View logs
gcloud compute ssh <vm> --zone=europe-west1-b --project=apes-platform \
  --command="sudo docker logs <container> --tail 50"

# Full redeploy
gcloud compute ssh <vm> --zone=europe-west1-b --project=apes-platform \
  --command="sudo bash -c 'cd /opt/<service> && docker compose pull && docker compose up -d'"
```

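When these SSH and Docker Compose commands are driven from a script rather than typed by hand, the zone and project flags can be filled in by a helper so they are never forgotten. The sketch below only builds argument lists and strings; `ssh_argv` and `compose_redeploy` are hypothetical names, and `/opt/example` is a placeholder directory.

```python
import shlex

PROJECT = "apes-platform"
ZONE = "europe-west1-b"


def ssh_argv(vm, remote_command=None):
    """Build the argv for `gcloud compute ssh`, optionally with a remote command."""
    argv = ["gcloud", "compute", "ssh", vm, f"--zone={ZONE}", f"--project={PROJECT}"]
    if remote_command is not None:
        argv.append(f"--command={remote_command}")
    return argv


def compose_redeploy(service_dir):
    """Build the remote shell line for the full redeploy pattern."""
    inner = (
        f"cd {shlex.quote(service_dir)} "
        "&& docker compose pull && docker compose up -d"
    )
    # Quote the inner script so `sudo bash -c` receives it as one argument.
    return f"sudo bash -c {shlex.quote(inner)}"
```

For example, `ssh_argv("gitea-vm", compose_redeploy("/opt/example"))` yields a list that can be passed to `subprocess.run` without going through a local shell, which avoids one layer of quoting.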
## Static IPs & DNS

```bash
# Reserve a new static IP
gcloud compute addresses create <name> --region=europe-west1 --project=apes-platform

# Get IP value
gcloud compute addresses describe <name> --region=europe-west1 --project=apes-platform --format='value(address)'

# DNS: add A record at Namecheap (Advanced DNS tab) pointing subdomain to IP
```

## Firewall Rules

```bash
# List rules
gcloud compute firewall-rules list --project=apes-platform

# Open a port
gcloud compute firewall-rules create <name> --allow=tcp:<port> --target-tags=web-server --project=apes-platform
```

## New VM Pattern

```bash
gcloud compute instances create <name> \
  --project=apes-platform \
  --zone=europe-west1-b \
  --machine-type=e2-small \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --boot-disk-size=20GB \
  --tags=web-server \
  --address=<static-ip-name> \
  --metadata-from-file=startup-script=<script-path>
```

## IAM

```bash
gcloud auth list
gcloud projects get-iam-policy apes-platform --format=json
```

## Troubleshooting

| Error | Fix |
|-------|-----|
| VM SSH timeout | Check firewall: `gcloud compute firewall-rules list --project=apes-platform` |
| Docker not running | SSH in, run `sudo systemctl start docker` |
| Caddy cert failed | Check DNS propagation: `dig @dns1.registrar-servers.com <domain> A +short` |
| Container not starting | Check logs: `sudo docker logs <container> --tail 50` |
| DNS not resolving | Flush local cache (macOS): `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` |
54
CLAUDE.md
Normal file
@@ -0,0 +1,54 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What This Is

`apes` is a post-singularity research platform. No SaaS. Everything vibecoded. Self-hosted on GCP.

**Research goal:** Prove that RL training an LLM on formal games (Game of Life, Chess, Go) transfers to general capabilities/benchmarks.

**Team:** Benji and Neeraj (the apes). Claude Code agents are first-class collaborators.

## Infrastructure

| Service | URL | VM | Zone |
|---------|-----|----|------|
| Gitea | git.unslope.com | gitea-vm | europe-west1-b |
| Chat (planned) | apes.unslope.com | TBD | europe-west1-b |

**GCP project:** `apes-platform`
**Region:** `europe-west1`
**DNS:** Namecheap (Advanced DNS tab for A records)

## Non-Negotiable Rules

- **No SaaS.** If we can't self-host it, we don't use it.
- **Vibecoded.** Humans direct, agents build. Move fast, verify correctness.
- **GCP project is `apes-platform`.** Always pass `--project=apes-platform`.
- **Region is `europe-west1`.** Zone `europe-west1-b` unless there's a reason to change.

## Route By Task

| Need | Load |
|------|------|
| GCP commands | `/apes:gcloud` skill |
| Stress-test a decision | `/apes:critic` skill |
| Audit agent config quality | `/apes:ax` skill |

## Deployment Pattern

All services run as Docker Compose on GCP Compute Engine VMs behind Caddy (auto HTTPS via Let's Encrypt).

```bash
# SSH into a VM
gcloud compute ssh <vm> --zone=europe-west1-b --project=apes-platform

# Manage services (run these on the VM)
sudo bash -c 'cd /opt/<service> && docker compose up -d'
sudo docker logs <container> --tail 50
```

## Critic Reflex

When something is surprising, contradictory, or your confidence is low, use the `/apes:critic` skill before proceeding. Good triggers: vibecoded code behaving unexpectedly, multiple valid architectures, research methodology questions.
51
README.md
Normal file
@@ -0,0 +1,51 @@
# 🐒 apes

**post-singularity research platform: vibecoded, self-hosted, no SaaS**

> the apes are the humans. everything else is smarter than us now. might as well own it.

## what is this

a research project by benji and neeraj to test a hypothesis:

**can you RL an LLM on formal games (game of life, chess, go) and have those capabilities transfer to general benchmarks?**

the twist: this entire project is vibecoded. no SaaS. no managed services. just apes prompting agents to build everything from scratch on GCP.

the platform itself is proof of concept #0: if apes can vibe a full collaboration stack into existence, the methodology holds.

## live at

[apes.unslope.com](https://apes.unslope.com)

## phase 1: the colony

a self-hosted slack clone deployed on GCP where apes and claude code agents talk together as peers.

- real-time messaging
- humans and agents in the same channels
- zero external SaaS dependencies
- fully vibecoded

## the research

| domain | why |
|--------|-----|
| game of life | emergent complexity from simple rules: can an LLM learn to reason about emergence? |
| chess | deep tactical + strategic reasoning with perfect information |
| go | intuition, pattern recognition, vast search spaces |

**hypothesis:** RL on these domains produces capabilities that transfer to reasoning, planning, and problem-solving benchmarks beyond the games themselves.

## philosophy

- **no SaaS**: if we can't build it, we don't use it
- **vibecoded**: humans direct, agents build
- **apes first**: the platform serves the humans, not the other way around
- **self-hosted**: runs on GCP, owned by us

## team

- **benji**: ape
- **neeraj**: ape
- **claude code**: the smart one