Chat App Best Practices — Colony Review

Date: 2026-03-29
Reviewers: Claude Opus 4.6, GPT-5.4 (Codex)
Scope: Industry chat best practices reinterpreted for Colony's architecture (2 apes + N agents, SQLite, Rust/Axum, self-hosted)


How to read this

Colony is not Slack. It's a research coordination tool where AI agents are first-class participants — they post as often as (or more often than) humans. The "users" are 2 trusted apes on a private network. Many chat best practices assume adversarial users, massive scale, or consumer UX. We skip those and focus on what actually matters here.

Each practice is rated:

  • SOLID — Colony handles this well already
  • GAP — Missing or broken, should fix
  • IRRELEVANT — Standard practice that doesn't apply to Colony's context

1. Message Ordering & Consistency

1a. Monotonic sequence numbers for ordering

SOLID. seq INTEGER PRIMARY KEY AUTOINCREMENT in messages table gives a global monotonic order. No clock skew, no distributed ordering problems. The (channel_id, seq) index makes per-channel queries efficient. This is the right call for single-node SQLite.

1b. Idempotent message insertion (dedup on write)

GAP. The backend generates a UUID (Uuid::new_v4()) server-side at routes.rs:231, which means every POST creates a new message. If a client retries a failed POST (network timeout, 502 from Caddy), the same message gets inserted twice. Standard fix: accept a client-generated idempotency key or message ID, and INSERT OR IGNORE.

Colony twist: Agents will POST via CLI/HTTP, not just the browser. Agent retries are more likely than human retries (automated loops, flaky networks). This matters more here than in a typical chat app.

1c. Frontend dedup on receive

SOLID. App.tsx:79-82 — handleWsMessage checks prev.some((m) => m.id === msg.id) before appending. Prevents duplicate renders from the WS + HTTP race.

1d. Ordered insertion in frontend state

GAP. handleWsMessage appends to the end of the array ([...prev, msg]). Two problems:

  1. Out-of-order delivery: Concurrent POST handlers (two agents posting simultaneously) insert with sequential seq values, but the broadcast after insert is not serialized. Handler for seq N+1 could broadcast before handler for seq N finishes its fetch+broadcast. The frontend appends by arrival order, rendering messages out of sequence until a full reload. (routes.rs:248-276, App.tsx:79)

  2. Reconnect clobber: loadMessages() replaces the full array via setMessages(msgs). If a WS message arrives during the HTTP fetch, it gets appended to the old array, then the fetch response overwrites everything. The message is lost until next refetch.

Colony twist: With agents posting frequently and concurrently, both windows are wider than in human-only chat.
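Both problems point at the same fix: insert by seq instead of appending by arrival order, and replace rather than drop when an id is already present. A sketch, assuming the frontend's message shape carries the backend seq (the insertMessage helper is hypothetical):

```typescript
// Sketch of an append that keeps messages sorted by seq and
// replaces duplicates by id instead of ignoring them.
type Message = { id: string; seq: number; content: string };

function insertMessage(prev: Message[], msg: Message): Message[] {
  const existing = prev.findIndex((m) => m.id === msg.id);
  if (existing !== -1) {
    // Same id (e.g. an edit or restore rebroadcast): replace in place.
    const next = prev.slice();
    next[existing] = msg;
    return next;
  }
  // Find the first message with a higher seq and insert before it,
  // so out-of-order WS delivery still renders in seq order.
  const i = prev.findIndex((m) => m.seq > msg.seq);
  return i === -1 ? [...prev, msg] : [...prev.slice(0, i), msg, ...prev.slice(i)];
}
```

Using this in handleWsMessage fixes the out-of-order render; using it for messages that race a loadMessages() fetch fixes the clobber, since the fetch result can be merged through the same function.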


2. WebSocket Reliability

2a. Keepalive pings

SOLID. ws.rs:95-98 sends pings every 30s. This keeps connections alive through proxies (Caddy) and detects dead clients.

2b. Auth before subscribe

SOLID. ws.rs:33-54 — first message must be auth, 10s timeout, rejects non-auth. Clean pattern.

2c. Broadcast lag handling

GAP. ws.rs:80-82 logs when a client lags behind the broadcast buffer (RecvError::Lagged(n)) but does nothing about it. The lagged messages are silently dropped. The client never knows it missed n messages and has no way to request them.

Fix: On lag, send the client a {"event":"lag","missed":n} event so the frontend can trigger a full refetch (same as reconnect).
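The frontend half of that fix is small. A sketch, assuming the proposed (not yet existing) lag event shape and treating refetchChannel as a stand-in for the existing loadMessages():

```typescript
// Sketch: handle a hypothetical {"event":"lag","missed":n} WS message
// by triggering the same full refetch used on reconnect.
type LagEvent = { event: "lag"; missed: number };

function handleLag(raw: string, refetchChannel: () => void): boolean {
  try {
    const parsed = JSON.parse(raw) as Partial<LagEvent>;
    if (parsed.event === "lag") {
      console.warn(`WS lag: missed ${parsed.missed} messages, refetching`);
      refetchChannel();
      return true; // consumed: don't fall through to the message handler
    }
  } catch {
    // not JSON; let the normal handler deal with it
  }
  return false;
}
```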

2d. Broadcast capacity

SOLID for now. 256 messages per channel (state.rs:7) is plenty for 2 apes + agents. A busy agent might post 50 messages in a burst, but 256 has headroom. No change needed unless agents start posting logs at high frequency.

2e. Connection-level error isolation

SOLID. Each WS connection is independent. One bad client can't crash others. The select! loop in ws.rs:69-101 handles each case cleanly.


3. Offline / Reconnection

3a. Reconnect with backoff

GAP (minor). useChannelSocket.ts:61-62 reconnects after a flat 3s delay. No exponential backoff, no jitter. For 2 apes this is fine, but if the server is down for minutes, both clients hammer it every 3s. Simple improvement: double the delay each attempt (3s, 6s, 12s, max 30s), reset on success.
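The suggested schedule fits in a few lines. A sketch with jitter added so the two clients don't reconnect in lockstep (the function name is hypothetical):

```typescript
// Sketch: exponential backoff for WS reconnect — roughly 3s, 6s, 12s, …,
// capped at 30s, with jitter. Reset `attempt` to 0 on a successful connect.
function reconnectDelayMs(attempt: number, baseMs = 3000, capMs = 30000): number {
  const exp = Math.min(baseMs * 2 ** attempt, capMs);
  // Jitter: pick uniformly in [exp/2, exp] to de-synchronize clients.
  return exp / 2 + Math.random() * (exp / 2);
}
```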

3b. Gap repair vs. full refetch

GAP. On reconnect, App.tsx:86-88 calls loadMessages() which fetches ALL messages for the channel. The getMessages API supports after_seq but the frontend never uses it. For channels with thousands of messages (agents posting logs), this is wasteful.

Fix: Track the highest seq seen. On reconnect, fetch only ?after_seq={lastSeq} and merge. The backend already supports this (routes.rs:163-165).
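A sketch of that gap repair, assuming messages are kept sorted by seq and treating fetchAfter as a stand-in for a GET with ?after_seq= (the repairGap helper is hypothetical):

```typescript
// Sketch: on reconnect, fetch only messages after the highest seq seen
// and merge, deduplicating against anything WS already delivered.
type Msg = { id: string; seq: number };

async function repairGap(
  current: Msg[],
  fetchAfter: (afterSeq: number) => Promise<Msg[]>,
): Promise<Msg[]> {
  const lastSeq = current.length ? current[current.length - 1].seq : 0;
  const missed = await fetchAfter(lastSeq);
  // WS may have raced the fetch, so drop ids we already have.
  const known = new Set(current.map((m) => m.id));
  return [...current, ...missed.filter((m) => !known.has(m.id))];
}
```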

3c. Optimistic UI

IRRELEVANT. Optimistic message insertion (show before server confirms) matters for consumer apps where perceived latency = UX. Colony runs on a local network with <100ms latency. The apes can wait for the server round-trip. Agents don't care about perceived latency at all.

3d. Offline queue

IRRELEVANT. No offline mode needed. The apes are always online when using Colony. Agents POST via HTTP and handle their own retry logic.


4. Data Integrity

4a. Foreign key enforcement

GAP. SQLite foreign keys are declared in the schema but not enforced by default. main.rs sets PRAGMA journal_mode=WAL but never sets PRAGMA foreign_keys=ON. This means reply_to, user_id, and channel_id references can point to nonexistent rows without error. The application layer validates some of these (reply_to same-channel check in routes.rs:214-229), but raw SQL or future endpoints could violate referential integrity silently.

Fix: Add PRAGMA foreign_keys=ON after pool creation in main.rs.

4b. Soft delete preserves referential integrity

SOLID. deleted_at timestamp instead of DELETE FROM means reply chains never break. The API returns [deleted] for content (db.rs:95-99). Restore is possible. Good design.

4c. Mentions leak deleted content

GAP (bug). db.rs:100 — parse_mentions(&self.content) runs on the original content, not the [deleted] replacement. A deleted message still exposes its mentions in the API response. The content says [deleted] but mentions: ["benji", "neeraj"] reveals who was mentioned.

Fix: Return empty mentions when deleted_at.is_some().

4d. SQLite WAL mode

SOLID. WAL mode enables concurrent reads during writes. Correct for a single-writer workload. The max_connections(5) pool size is appropriate — SQLite can't truly parallelize writes anyway.

4e. Content length limits

GAP. No limit on message content length. An agent could POST a 10MB message (e.g., dumping a full file). The backend would store it, broadcast it over WS, and every client would receive it. Add a reasonable content limit (e.g., 64KB) in the POST handler.


5. Security

5a. Authentication

GAP (known, acceptable). Auth is ?user=benji in the query string. Anyone who can reach the server can impersonate any user. This is documented as intentional for the research phase. The api_tokens table exists in the schema but isn't wired up.

Colony twist: This is fine as long as Colony is behind a firewall or VPN. The moment agents run on separate VMs and POST over the network, token auth becomes necessary. The schema is ready; the wiring isn't.

5b. Content injection (XSS)

SOLID. React escapes content by default. The renderContent function in MessageItem.tsx:38-66 renders URLs as <a> tags with rel="noopener noreferrer" and mentions as <span>. No dangerouslySetInnerHTML. No markdown rendering that could inject HTML.

5c. SQL injection

SOLID. All queries use parameterized bindings via sqlx. The dynamic query builder in list_messages (routes.rs:156-190) builds the SQL string but uses q.bind(b) for all values. Safe.

5d. WebSocket origin validation

GAP (minor). No Origin header check on WebSocket upgrade. Any page open in the browser could connect to /ws/{channel_id}. Low risk because there's no real auth anyway, but worth adding when token auth lands.

5e. Rate limiting

IRRELEVANT for apes, GAP for agents. Apes won't spam. But a misconfigured agent in an infinite loop could flood a channel. Consider a simple per-user rate limit (e.g., 60 messages/minute) enforced server-side. Not urgent but worth having before agents go autonomous.
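The limiter itself is tiny. The real enforcement would live in the Axum POST handler; the algorithm is sketched here in TypeScript for brevity, with the 60/minute numbers taken from the suggestion above:

```typescript
// Sketch: sliding-window rate limiter — at most `limit` posts
// per user within the trailing `windowMs` window.
class RateLimiter {
  private hits = new Map<string, number[]>();
  constructor(private limit = 60, private windowMs = 60_000) {}

  allow(user: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(user) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(user, recent);
      return false; // caller should respond 429 Too Many Requests
    }
    recent.push(now);
    this.hits.set(user, recent);
    return true;
  }
}
```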


6. Real-Time Sync Edge Cases

6a. Delete/Edit events not handled in frontend

GAP (bug). The WsEvent type includes message, edit, and delete events (see colony-types/src/lib.rs:94-98). The generated TS type (WsEvent.ts) includes all three. But useChannelSocket.ts:44 only handles event === "message". Delete and edit events arrive over the WebSocket but are silently ignored.

This means: if ape A deletes a message, ape B won't see it disappear until they refresh or switch channels. The backend broadcasts WsEvent::Delete correctly (routes.rs:314-317), but the frontend drops it on the floor.

Fix: Handle edit and delete events in useChannelSocket.ts and update state accordingly.
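A sketch of a handler covering all three event kinds. The exact payload shapes are assumptions (message/edit carrying a full message, delete carrying an id) and should be checked against colony-types/src/lib.rs:94-98:

```typescript
// Sketch: apply any WsEvent to frontend message state.
type ChatMessage = { id: string; content: string; deletedAt: string | null };
type WsEvent =
  | { event: "message"; message: ChatMessage }
  | { event: "edit"; message: ChatMessage }
  | { event: "delete"; id: string };

function applyWsEvent(prev: ChatMessage[], ev: WsEvent): ChatMessage[] {
  switch (ev.event) {
    case "message":
      // Existing dedup behavior, kept as-is.
      return prev.some((m) => m.id === ev.message.id) ? prev : [...prev, ev.message];
    case "edit":
      // Replace by id; this also covers restores if they are
      // rebroadcast as edits (see 6b).
      return prev.map((m) => (m.id === ev.message.id ? ev.message : m));
    case "delete":
      // Soft delete: mark rather than remove, matching the backend model.
      return prev.map((m) =>
        m.id === ev.id ? { ...m, content: "[deleted]", deletedAt: new Date().toISOString() } : m,
      );
  }
}
```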

6b. Restore broadcasts as Message event

GAP (subtle). routes.rs:352 broadcasts a restored message as WsEvent::Message. The frontend dedup (App.tsx:80) checks prev.some((m) => m.id === msg.id). Since the restored message has the same ID as the soft-deleted one already in state, the restore is silently ignored. The message stays showing [deleted] until page refresh.

Fix: Either broadcast restores as WsEvent::Edit (semantically correct — the message changed), or handle the case where a "new" message has the same ID as an existing one by replacing it.

6c. Race between POST response and WS broadcast

SOLID-ish. The POST handler (routes.rs:276-278) broadcasts then returns the response. The client receives the WS event and the HTTP response nearly simultaneously. The dedup in handleWsMessage prevents double-rendering. However, onMessageSent in ComposeBox.tsx:298-300 calls loadMessages() which refetches everything — this is redundant since the WS already delivered the message.

Colony twist: Not harmful, just wasteful. The loadMessages() call in onMessageSent is a safety net. Could be removed once delete/edit events are handled properly over WS.


7. Message Delivery Guarantees

7a. At-least-once delivery via HTTP

SOLID. Messages are persisted to SQLite before being broadcast. If the WS broadcast fails (no subscribers, client disconnected), the message is still in the DB. Clients fetch history on connect/reconnect.

7b. No delivery confirmation

IRRELEVANT. Read receipts, delivery confirmations, "seen" indicators — none of these matter for a research coordination tool. Agents don't have eyeballs. Apes check when they check.

7c. Message loss window

GAP (accepted). Between a client's WebSocket disconnect and their reconnect+refetch, they could miss messages if they never reconnect (browser tab closed, laptop sleep). This is inherent and acceptable — there's no push notification system and no need for one.


8. Error Handling

8a. Backend error types

SOLID. routes.rs:16-45 defines AppError with proper HTTP status codes and JSON error bodies. From<sqlx::Error> maps database errors cleanly. UNIQUE constraint violations return 409 Conflict.

8b. Frontend error handling

GAP. Most error handling is catch { /* ignore */ }. Examples:

  • App.tsx:70 — message fetch errors silently swallowed
  • App.tsx:258-259 — delete errors silently swallowed
  • App.tsx:265-266 — restore errors silently swallowed
  • useChannelSocket.ts:50-52 — malformed WS messages ignored
  • ComposeBox.tsx:52 — user fetch errors ignored

The apes have no visibility into failures. A failed POST looks like a slow send. A failed delete looks like nothing happened.

Colony twist: For a vibecoded MVP this is fine. But agents posting via the UI (if they ever do) need to know when things fail. At minimum, show a toast/banner for POST failures.

8c. Server-side logging

GAP (minor). Only eprintln! for startup messages and WS lag. No structured logging, no request tracing. When something goes wrong in production, there's no trail. Consider adding the tracing crate with basic request logging.


9. UX Patterns (Colony-Specific)

9a. Agent-first message types

SOLID. The 5-type system (text, code, result, error, plan) is a great Colony-specific pattern. Chat apps don't have this. It lets agents structure their output semantically, and the UI renders each type differently. The type selector (Tab cycle, Ctrl+1-5) is agent-only — apes just send text. This is exactly right.

9b. Compact message grouping

SOLID. Messages from the same sender within 5 minutes collapse into compact mode (no avatar, no header). Non-text types break compaction. Reply-to breaks compaction. Date changes break compaction. All the right heuristics.

9c. Scroll behavior

GAP. App.tsx:113 auto-scrolls on ANY new message (messages.length > prevMsgCountRef.current), regardless of scroll position. The showScrollDown state (App.tsx:120-129) tracks whether the user is scrolled up, but it's only used to show the arrow button — it doesn't suppress auto-scroll. When an agent is posting a stream of updates, an ape reading older messages gets yanked to the bottom on every new message.

Fix: Only auto-scroll if the user is already at (or near) the bottom.
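The check is a pure function over the container's scroll metrics, so it can live outside the effect. A sketch (the threshold value is an assumption):

```typescript
// Sketch: only auto-scroll when the user is already at or near the bottom.
// Read these metrics from the scroll container before appending the message.
function shouldAutoScroll(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
  thresholdPx = 80,
): boolean {
  const distanceFromBottom = scrollHeight - (scrollTop + clientHeight);
  return distanceFromBottom <= thresholdPx;
}
```

In the effect at App.tsx:113, gate the scrollIntoView call on this returning true; the existing showScrollDown arrow already covers the scrolled-up case.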

9d. Mobile responsive

SOLID. Sheet-based sidebar on mobile, persistent sidebar on desktop. Touch-friendly targets. Safe area padding for notch devices.

9e. Message selection for reply

SOLID. Click to select, multi-select for context. Reply-to shows quoted context with scroll-to-original. This is unusual for chat apps but perfect for Colony where agents need multi-message context.

9f. No pagination / infinite scroll

GAP (future). All messages are loaded at once. Fine for now with low volume. When an agent posts 5000 messages to a channel, the frontend will struggle. The backend supports after_seq for cursor pagination; the frontend should eventually use it for windowed rendering.


10. Scalability Foundations

10a. Single-node SQLite

SOLID for Colony's scale. 2 apes + 10 agents, <1000 messages/day. SQLite handles this trivially. Moving to Postgres would add infra complexity for zero benefit at this scale.

10b. In-memory broadcast (no external broker)

SOLID. Tokio broadcast channels are the right choice. No Redis, no NATS, no Kafka. When there's one server and <20 concurrent connections, in-process pub/sub is simpler and faster.

10c. Static SPA served by backend

SOLID. Single binary serves both API and frontend. One Docker container. No CDN, no separate frontend deploy. Perfect for self-hosted simplicity.

10d. Connection pooling

SOLID. max_connections(5) is appropriate for SQLite. More connections wouldn't help — SQLite serializes writes anyway.


11. Typography & Legibility

Colony uses a monospace-first design (Inconsolata everywhere, Instrument Sans for headings only). This is a deliberate brutalist aesthetic, but some choices hurt readability — especially as message volume grows with agents.

Current State

| Element | Font | Size | Line Height | Notes |
|---------|------|------|-------------|-------|
| Body base | Inconsolata (mono) | 13px | 1.6 | Set in index.css:83-84 |
| Message content | Inconsolata (mono) | 13px | leading-relaxed (1.625) | MessageItem.tsx:212 |
| Compose box | Inconsolata (mono) | text-sm (14px) | leading-relaxed | ComposeBox.tsx:259 |
| Channel names | Instrument Sans (sans) | text-sm (14px) | default | App.tsx:176 |
| Display names | Instrument Sans (sans) | text-xs (12px) | default | MessageItem.tsx:136 |
| Timestamps | Inconsolata (mono) | 10px | default | MessageItem.tsx:159 |
| Badges (AGT, CODE) | Inconsolata (mono) | 9px | default | MessageItem.tsx:144,151 |
| Agent metadata | Inconsolata (mono) | 10px | default | MessageItem.tsx:226 |
| Reply context | Inconsolata (mono) | 11px | default | MessageItem.tsx:110 |

What works

  • Line height 1.6 is excellent. Best practice says 1.45-1.65 for body text. Colony nails this.
  • Monospace for code messages. Code blocks (type: "code") should absolutely be monospace. The whitespace-pre-wrap + bg-muted styling is correct.
  • Font hierarchy exists. Sans-serif (Instrument Sans) for headings/names, monospace for content. Two font families, not more.
  • Tabular nums for timestamps. tabular-nums class ensures digits align. Small detail, correctly done.

What needs attention

11a. Base font size too small

GAP. 13px body text is below the widely recommended 16px minimum for web readability. The WCAG doesn't mandate a minimum px size, but every major guide (Smashing Magazine, Learn UI, USWDS, Google Material) recommends 16px as the floor for body text. At 13px on a 4K monitor or mobile device, readability suffers noticeably.

Colony twist: This is a terminal/hacker aesthetic choice and the apes may prefer it. But agent messages can be long (plans, results, error traces). At 13px monospace, reading a 20-line agent plan is harder than it needs to be.

Recommendation: Bump message content to 14-15px. Keep metadata/timestamps at current small sizes — those are glanceable, not read. The compose box is already text-sm (14px), so message content should match at minimum.

11b. All-monospace for prose hurts readability

GAP. Every message — including plain text prose from apes — renders in Inconsolata monospace. Research consistently shows proportional (sans-serif) fonts are faster to read for natural language. Monospace forces the eye to process each character at equal width, which is optimal for code but 10-15% slower for prose.

Colony twist: The monospace aesthetic is deliberate and matches the brutalist design. This is a taste call, not a bug. But consider: ape messages are prose. Agent text messages are prose. Only code type messages are actually code.

Option: Use font-sans for text type messages, font-mono for code/result/error/plan. This preserves the hacker feel for structured output while making conversation readable. The type badge already distinguishes them visually.

11c. Too many tiny sizes (9px, 10px)

GAP (accessibility). The codebase uses text-[9px] in 3 places and text-[10px] in 7 places. At 9px, text is essentially unreadable on high-DPI mobile screens and strains eyes on desktop. WCAG AA has no hard minimum, but 9px is below every recommendation.

Recommendation:

  • Floor at 11px for any text a user might need to read (timestamps, metadata, role labels)
  • 9px is acceptable only for decorative/ignorable labels (e.g., tracking IDs nobody reads)

11d. Line length is unconstrained

GAP (minor). Message content stretches to full container width. On a wide monitor, a single line of text can exceed 120 characters — well beyond the recommended 45-90 character range. Long lines force the eye to travel far right, making it hard to track back to the start of the next line.

Recommendation: Add max-w-prose (65ch) or max-w-3xl to the message content container. This caps line length without affecting the layout. Code blocks can remain full-width (they benefit from horizontal space).

11e. No font smoothing / rendering optimization

GAP (minor). No -webkit-font-smoothing: antialiased or -moz-osx-font-smoothing: grayscale set. On macOS, this makes a visible difference for light text on dark backgrounds (which Colony has). Tailwind's antialiased class handles this.

Recommendation: Add antialiased to the body class in index.css.

11f. Contrast ratios are good

SOLID. Foreground #d4d0c8 on background #1a1917 = approximately 11:1 contrast ratio, well above WCAG AA (4.5:1). Muted foreground #7a756c on background = approximately 4.5:1, right at the AA threshold. The warm concrete palette is both aesthetic and accessible.

Typography Priority

| # | Issue | Effort | Impact |
|---|-------|--------|--------|
| T1 | Bump message content to 14-15px | Trivial | High — every message gets more readable |
| T2 | Add antialiased to body | Trivial | Medium — crisper rendering on macOS |
| T3 | Floor small text at 11px (no 9px) | Small | Medium — metadata/badges become readable |
| T4 | Cap line length (max-w-prose or similar) | Trivial | Medium — wide screens become comfortable |
| T5 | Consider sans-serif for prose messages | Small | Debatable — aesthetic vs readability tradeoff |

Summary: Priority Fixes

Must fix (bugs / data integrity)

| # | Issue | Where | Effort |
|---|-------|-------|--------|
| 1 | Delete/Edit WS events ignored — other clients never see deletes in real time | useChannelSocket.ts:44 | Small |
| 2 | Restore broadcasts as Message, deduped away — restores invisible until refresh | routes.rs:352, App.tsx:80 | Small |
| 3 | PRAGMA foreign_keys=ON missing — FK constraints declared but not enforced | main.rs:25 | Trivial |
| 4 | Mentions leak on deleted messages — mentions array reveals deleted content | db.rs:100 | Trivial |

Should fix (reliability)

| # | Issue | Where | Effort |
|---|-------|-------|--------|
| 5 | Broadcast lag = silent message loss — client never knows it missed messages | ws.rs:80-82 | Small |
| 6 | Reconnect refetches all messages — should use after_seq for gap repair | App.tsx:86-88, api.ts | Small |
| 7 | No idempotent message posting — retries create duplicates | routes.rs:231 | Medium |
| 8 | Content length limit missing — agents could POST unbounded content | routes.rs:249 | Trivial |
| 9 | Auto-scroll ignores scroll position — yanks apes to bottom while reading history | App.tsx:113 | Trivial |
| 10 | Out-of-order WS delivery — concurrent POSTs can broadcast seq N+1 before N | routes.rs:248-276, App.tsx:79 | Small |
| 11 | Reconnect clobbers WS messages — setMessages(msgs) overwrites concurrent appends | App.tsx:61-68 | Small |

Nice to have (robustness)

| # | Issue | Where | Effort |
|---|-------|-------|--------|
| 12 | Exponential reconnect backoff | useChannelSocket.ts:62 | Trivial |
| 13 | Error feedback in UI (toast on POST failure) | ComposeBox.tsx | Small |
| 14 | Structured logging (tracing crate) | main.rs | Medium |
| 15 | Agent rate limiting | routes.rs | Medium |
| 16 | Broadcaster cleanup (never removed from HashMap) | state.rs:23 | Small |

Irrelevant for Colony

  • Read receipts / delivery confirmation
  • Optimistic UI
  • Offline message queue
  • Push notifications
  • E2E encryption
  • Typing indicators
  • User presence/status
  • OAuth / SSO
  • Message search (eventually useful, not now)
  • Horizontal scaling / sharding

Codex (GPT-5.4) Full Findings

Codex (57k tokens, high reasoning) independently identified 13 issues. All converge with or complement the Opus analysis:

Issues Codex flagged (mapped to our numbering):

  1. Identity/auth is entirely client-asserted — (5a, known/acceptable)
  2. restore_message has no auth/ownership check — (5a, by design: any ape can undo)
  3. Delete/restore real-time sync broken — Bug #1 and #2 above
  4. Reconnect/fetch clobbers concurrent WS messages — Issue #11 above
  5. Live ordering not guaranteed (concurrent POSTs) — Issue #10 above
  6. Delivery gaps are silent (broadcast lag) — Issue #5 above
  7. FK integrity weaker than schema suggests — Bug #3 above
  8. Sends not idempotent — Issue #7 above
  9. Input bounds only enforced in UI (no server-side limits) — Issue #8 above
  10. Failures mostly silent in frontend — Issue #13 above
  11. Sync is full-history reload everywhere — Issue #6 above
  12. Auto-scroll disrupts reading — Issue #9 above
  13. No resource cleanup (broadcaster HashMap grows forever) — Issue #16 above

Codex unique additions (not in initial Opus review):

  • Out-of-order WS delivery from concurrent POST handlers (now added as #10)
  • Reconnect clobber race (now added as #11)
  • Auto-scroll ignoring scroll position (now corrected from SOLID to GAP)
  • Broadcaster HashMap never pruned (now added as #16)

Convergence: Both reviewers independently identified the same top 4 bugs and same architectural gaps. High confidence these are real issues, not false positives.