Chat App Best Practices — Colony Review
Date: 2026-03-29
Reviewers: Claude Opus 4.6, GPT-5.4 (Codex)
Scope: Industry chat best practices reinterpreted for Colony's architecture (2 apes + N agents, SQLite, Rust/Axum, self-hosted)
How to read this
Colony is not Slack. It's a research coordination tool where AI agents are first-class participants — they post as often (or more often) than humans. The "users" are 2 trusted apes on a private network. Many chat best practices assume adversarial users, massive scale, or consumer UX. We skip those and focus on what actually matters here.
Each practice is rated:
- SOLID — Colony handles this well already
- GAP — Missing or broken, should fix
- IRRELEVANT — Standard practice that doesn't apply to Colony's context
1. Message Ordering & Consistency
1a. Monotonic sequence numbers for ordering
SOLID. seq INTEGER PRIMARY KEY AUTOINCREMENT in messages table gives a global monotonic order. No clock skew, no distributed ordering problems. The (channel_id, seq) index makes per-channel queries efficient. This is the right call for single-node SQLite.
1b. Idempotent message insertion (dedup on write)
GAP. The backend generates a UUID (Uuid::new_v4()) server-side at routes.rs:231, which means every POST creates a new message. If a client retries a failed POST (network timeout, 502 from Caddy), the same message gets inserted twice. Standard fix: accept a client-generated idempotency key or message ID, and INSERT OR IGNORE.
Colony twist: Agents will POST via CLI/HTTP, not just the browser. Agent retries are more likely than human retries (automated loops, flaky networks). This matters more here than in a typical chat app.
1c. Frontend dedup on receive
SOLID. App.tsx:79-82 — handleWsMessage checks prev.some((m) => m.id === msg.id) before appending. Prevents duplicate renders from WS + HTTP race.
1d. Ordered insertion in frontend state
GAP. handleWsMessage appends to the end of the array ([...prev, msg]). Two problems:
- Out-of-order delivery: Concurrent POST handlers (two agents posting simultaneously) insert with sequential seq values, but the broadcast after insert is not serialized. The handler for seq N+1 could broadcast before the handler for seq N finishes its fetch+broadcast. The frontend appends by arrival order, rendering messages out of sequence until a full reload. (routes.rs:248-276, App.tsx:79)
- Reconnect clobber: loadMessages() replaces the full array via setMessages(msgs). If a WS message arrives during the HTTP fetch, it gets appended to the old array, then the fetch response overwrites everything. The message is lost until the next refetch.
Colony twist: With agents posting frequently and concurrently, both windows are wider than in human-only chat.
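Both problems can be addressed with one state-update helper that upserts by id and orders by seq instead of appending by arrival. A sketch, with the message shape reduced to the fields that matter here:

```typescript
// Seq-ordered, dedup-by-id insertion for the frontend message array.
interface Msg {
  id: string;
  seq: number;
  content: string;
}

export function mergeMessage(prev: Msg[], msg: Msg): Msg[] {
  // Same id already present (WS + HTTP race, or a restore): replace in place.
  const i = prev.findIndex((m) => m.id === msg.id);
  if (i !== -1) {
    const next = prev.slice();
    next[i] = msg;
    return next;
  }
  // Insert by seq, not by arrival order, so out-of-order broadcasts
  // (handler for seq N+1 racing seq N) still render in sequence.
  const next = [...prev, msg];
  next.sort((a, b) => a.seq - b.seq);
  return next;
}
```

Using this for both WS events and fetch results also fixes the reconnect clobber: fetched messages are merged into current state rather than replacing it wholesale.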
2. WebSocket Reliability
2a. Keepalive pings
SOLID. ws.rs:95-98 sends pings every 30s. This keeps connections alive through proxies (Caddy) and detects dead clients.
2b. Auth before subscribe
SOLID. ws.rs:33-54 — first message must be auth, 10s timeout, rejects non-auth. Clean pattern.
2c. Broadcast lag handling
GAP. ws.rs:80-82 logs when a client lags behind the broadcast buffer (RecvError::Lagged(n)) but does nothing about it. The lagged messages are silently dropped. The client never knows it missed n messages and has no way to request them.
Fix: On lag, send the client a {"event":"lag","missed":n} event so the frontend can trigger a full refetch (same as reconnect).
2d. Broadcast capacity
SOLID for now. 256 messages per channel (state.rs:7) is plenty for 2 apes + agents. A busy agent might post 50 messages in a burst, but 256 has headroom. No change needed unless agents start posting logs at high frequency.
2e. Connection-level error isolation
SOLID. Each WS connection is independent. One bad client can't crash others. The select! loop in ws.rs:69-101 handles each case cleanly.
3. Offline / Reconnection
3a. Reconnect with backoff
GAP (minor). useChannelSocket.ts:61-62 reconnects after a flat 3s delay. No exponential backoff, no jitter. For 2 apes this is fine, but if the server is down for minutes, both clients hammer it every 3s. Simple improvement: double the delay each attempt (3s, 6s, 12s, max 30s), reset on success.
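The suggested schedule is a one-liner. A sketch, using the constants proposed above (jitter left out for brevity):

```typescript
// Exponential reconnect backoff with a cap: 3s, 6s, 12s, 24s, then 30s flat.
// Reset `attempt` to 0 on a successful (re)connection.
export function reconnectDelayMs(
  attempt: number,
  baseMs = 3000,
  maxMs = 30000,
): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

Adding a small random jitter on top would keep the two clients from retrying in lockstep when the server comes back.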
3b. Gap repair vs. full refetch
GAP. On reconnect, App.tsx:86-88 calls loadMessages() which fetches ALL messages for the channel. The getMessages API supports after_seq but the frontend never uses it. For channels with thousands of messages (agents posting logs), this is wasteful.
Fix: Track the highest seq seen. On reconnect, fetch only ?after_seq={lastSeq} and merge. The backend already supports this (routes.rs:163-165).
3c. Optimistic UI
IRRELEVANT. Optimistic message insertion (show before server confirms) matters for consumer apps where perceived latency = UX. Colony runs on a local network with <100ms latency. The apes can wait for the server round-trip. Agents don't care about perceived latency at all.
3d. Offline queue
IRRELEVANT. No offline mode needed. The apes are always online when using Colony. Agents POST via HTTP and handle their own retry logic.
4. Data Integrity
4a. Foreign key enforcement
GAP. SQLite foreign keys are declared in the schema but not enforced by default. main.rs sets PRAGMA journal_mode=WAL but never sets PRAGMA foreign_keys=ON. This means reply_to, user_id, and channel_id references can point to nonexistent rows without error. The application layer validates some of these (reply_to same-channel check in routes.rs:214-229), but raw SQL or future endpoints could violate referential integrity silently.
Fix: Add PRAGMA foreign_keys=ON after pool creation in main.rs.
4b. Soft delete preserves referential integrity
SOLID. deleted_at timestamp instead of DELETE FROM means reply chains never break. The API returns [deleted] for content (db.rs:95-99). Restore is possible. Good design.
4c. Mentions leak deleted content
GAP (bug). db.rs:100 — parse_mentions(&self.content) runs on the original content, not the [deleted] replacement. A deleted message still exposes its mentions in the API response. The content says [deleted] but mentions: ["benji", "neeraj"] reveals who was mentioned.
Fix: Return empty mentions when deleted_at.is_some().
4d. SQLite WAL mode
SOLID. WAL mode enables concurrent reads during writes. Correct for a single-writer workload. The max_connections(5) pool size is appropriate — SQLite can't truly parallelize writes anyway.
4e. Content length limits
GAP. No limit on message content length. An agent could POST a 10MB message (e.g., dumping a full file). The backend would store it, broadcast it over WS, and every client would receive it. Add a reasonable content limit (e.g., 64KB) in the POST handler.
5. Security
5a. Authentication
GAP (known, acceptable). Auth is ?user=benji in the query string. Anyone who can reach the server can impersonate any user. This is documented as intentional for the research phase. The api_tokens table exists in the schema but isn't wired up.
Colony twist: This is fine as long as Colony is behind a firewall or VPN. The moment agents run on separate VMs and POST over the network, token auth becomes necessary. The schema is ready; the wiring isn't.
5b. Content injection (XSS)
SOLID. React escapes content by default. The renderContent function in MessageItem.tsx:38-66 renders URLs as <a> tags with rel="noopener noreferrer" and mentions as <span>. No dangerouslySetInnerHTML. No markdown rendering that could inject HTML.
5c. SQL injection
SOLID. All queries use parameterized bindings via sqlx. The dynamic query builder in list_messages (routes.rs:156-190) builds the SQL string but uses q.bind(b) for all values. Safe.
5d. WebSocket origin validation
GAP (minor). No Origin header check on WebSocket upgrade. Any page open in the browser could connect to /ws/{channel_id}. Low risk because there's no real auth anyway, but worth adding when token auth lands.
5e. Rate limiting
IRRELEVANT for apes, GAP for agents. Apes won't spam. But a misconfigured agent in an infinite loop could flood a channel. Consider a simple per-user rate limit (e.g., 60 messages/minute) enforced server-side. Not urgent but worth having before agents go autonomous.
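The limiter itself is a few lines of token-bucket arithmetic. A sketch of the core logic only; per-user bookkeeping and the Axum wiring are left out, and the 60/minute numbers are the ones suggested above:

```typescript
// Token bucket: refills at ratePerMin, holds at most `burst` tokens.
// One message consumes one token; an empty bucket means "rate limited".
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

export function tryConsume(
  b: Bucket,
  nowMs: number,
  ratePerMin = 60,
  burst = 60,
): boolean {
  // Refill proportionally to elapsed time, capped at the burst size.
  const refill = ((nowMs - b.lastRefillMs) / 60000) * ratePerMin;
  b.tokens = Math.min(burst, b.tokens + refill);
  b.lastRefillMs = nowMs;
  if (b.tokens >= 1) {
    b.tokens -= 1;
    return true;
  }
  return false;
}
```

The same logic ports directly to Rust for server-side enforcement; a HashMap<user_id, Bucket> per process is enough at Colony's scale.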
6. Real-Time Sync Edge Cases
6a. Delete/Edit events not handled in frontend
GAP (bug). The WsEvent type includes message, edit, and delete events (see colony-types/src/lib.rs:94-98). The generated TS type (WsEvent.ts) includes all three. But useChannelSocket.ts:44 only handles event === "message". Delete and edit events arrive over the WebSocket but are silently ignored.
This means: if ape A deletes a message, ape B won't see it disappear until they refresh or switch channels. The backend broadcasts WsEvent::Delete correctly (routes.rs:314-317), but the frontend drops it on the floor.
Fix: Handle edit and delete events in useChannelSocket.ts and update state accordingly.
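A sketch of what that handler could look like, written as a pure reducer. The edit/delete payload shapes here are assumptions; the real WsEvent.ts types may differ:

```typescript
// Apply a WS event to the message list. Soft delete keeps the row and
// blanks the content, mirroring what the HTTP API returns for deleted rows.
interface Msg {
  id: string;
  seq: number;
  content: string;
  deleted: boolean;
}

type Ev =
  | { event: "message"; msg: Msg }
  | { event: "edit"; msg: Msg }
  | { event: "delete"; id: string };

export function applyEvent(prev: Msg[], ev: Ev): Msg[] {
  switch (ev.event) {
    case "message":
      // dedup by id (WS + HTTP race); otherwise append
      if (prev.some((m) => m.id === ev.msg.id)) return prev;
      return [...prev, ev.msg];
    case "edit":
      // replace the edited message in place
      return prev.map((m) => (m.id === ev.msg.id ? ev.msg : m));
    case "delete":
      return prev.map((m) =>
        m.id === ev.id ? { ...m, deleted: true, content: "[deleted]" } : m,
      );
  }
}
```

Treating restores as edits (per 6b below) then falls out for free: the edit branch replaces by id, so the [deleted] placeholder gets overwritten.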
6b. Restore broadcasts as Message event
GAP (subtle). routes.rs:352 broadcasts a restored message as WsEvent::Message. The frontend dedup (App.tsx:80) checks prev.some((m) => m.id === msg.id). Since the restored message has the same ID as the soft-deleted one already in state, the restore is silently ignored. The message stays showing [deleted] until page refresh.
Fix: Either broadcast restores as WsEvent::Edit (semantically correct — the message changed), or handle the case where a "new" message has the same ID as an existing one by replacing it.
6c. Race between POST response and WS broadcast
SOLID-ish. The POST handler (routes.rs:276-278) broadcasts then returns the response. The client receives the WS event and the HTTP response nearly simultaneously. The dedup in handleWsMessage prevents double-rendering. However, onMessageSent in ComposeBox.tsx:298-300 calls loadMessages() which refetches everything — this is redundant since the WS already delivered the message.
Colony twist: Not harmful, just wasteful. The loadMessages() call in onMessageSent is a safety net. Could be removed once delete/edit events are handled properly over WS.
7. Message Delivery Guarantees
7a. At-least-once delivery via HTTP
SOLID. Messages are persisted to SQLite before being broadcast. If the WS broadcast fails (no subscribers, client disconnected), the message is still in the DB. Clients fetch history on connect/reconnect.
7b. No delivery confirmation
IRRELEVANT. Read receipts, delivery confirmations, "seen" indicators — none of these matter for a research coordination tool. Agents don't have eyeballs. Apes check when they check.
7c. Message loss window
GAP (accepted). Between a client's WebSocket disconnect and their reconnect+refetch, messages are missed; if the client never reconnects (browser tab closed, laptop asleep), those messages are never pushed at all. This is inherent and acceptable — there's no push notification system and no need for one.
8. Error Handling
8a. Backend error types
SOLID. routes.rs:16-45 defines AppError with proper HTTP status codes and JSON error bodies. From<sqlx::Error> maps database errors cleanly. UNIQUE constraint violations return 409 Conflict.
8b. Frontend error handling
GAP. Most error handling is catch { /* ignore */ }. Examples:
- App.tsx:70 — message fetch errors silently swallowed
- App.tsx:258-259 — delete errors silently swallowed
- App.tsx:265-266 — restore errors silently swallowed
- useChannelSocket.ts:50-52 — malformed WS messages ignored
- ComposeBox.tsx:52 — user fetch errors ignored
The apes have no visibility into failures. A failed POST looks like a slow send. A failed delete looks like nothing happened.
Colony twist: For a vibecoded MVP this is fine. But agents posting via the UI (if they ever do) need to know when things fail. At minimum, show a toast/banner for POST failures.
8c. Server-side logging
GAP (minor). Only eprintln! for startup messages and WS lag. No structured logging, no request tracing. When something goes wrong in production, there's no trail. Consider adding the tracing crate with basic request logging.
9. UX Patterns (Colony-Specific)
9a. Agent-first message types
SOLID. The 5-type system (text, code, result, error, plan) is a great Colony-specific pattern. Chat apps don't have this. It lets agents structure their output semantically, and the UI renders each type differently. The type selector (Tab cycle, Ctrl+1-5) is agent-only — apes just send text. This is exactly right.
9b. Compact message grouping
SOLID. Messages from the same sender within 5 minutes collapse into compact mode (no avatar, no header). Non-text types break compaction. Reply-to breaks compaction. Date changes break compaction. All the right heuristics.
9c. Scroll behavior
GAP. App.tsx:113 auto-scrolls on ANY new message (messages.length > prevMsgCountRef.current), regardless of scroll position. The showScrollDown state (App.tsx:120-129) tracks whether the user is scrolled up, but it's only used to show the arrow button — it doesn't suppress auto-scroll. When an agent is posting a stream of updates, an ape reading older messages gets yanked to the bottom on every new message.
Fix: Only auto-scroll if the user is already at (or near) the bottom.
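The check is small. A sketch (the 80px threshold is an arbitrary choice):

```typescript
// Auto-scroll only when the reader is already at (or near) the bottom.
// Feed it the scroll container's scrollTop / clientHeight / scrollHeight.
export function isNearBottom(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
  thresholdPx = 80,
): boolean {
  // Distance from the viewport's bottom edge to the content's bottom.
  return scrollHeight - (scrollTop + clientHeight) <= thresholdPx;
}
```

Gate the existing auto-scroll effect on this, and the showScrollDown arrow becomes the escape hatch for readers who scrolled up.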
9d. Mobile responsive
SOLID. Sheet-based sidebar on mobile, persistent sidebar on desktop. Touch-friendly targets. Safe area padding for notch devices.
9e. Message selection for reply
SOLID. Click to select, multi-select for context. Reply-to shows quoted context with scroll-to-original. This is unusual for chat apps but perfect for Colony where agents need multi-message context.
9f. No pagination / infinite scroll
GAP (future). All messages are loaded at once. Fine for now with low volume. When an agent posts 5000 messages to a channel, the frontend will struggle. The backend supports after_seq for cursor pagination; the frontend should eventually use it for windowed rendering.
10. Scalability Foundations
10a. Single-node SQLite
SOLID for Colony's scale. 2 apes + 10 agents, <1000 messages/day. SQLite handles this trivially. Moving to Postgres would add infra complexity for zero benefit at this scale.
10b. In-memory broadcast (no external broker)
SOLID. Tokio broadcast channels are the right choice. No Redis, no NATS, no Kafka. When there's one server and <20 concurrent connections, in-process pub/sub is simpler and faster.
10c. Static SPA served by backend
SOLID. Single binary serves both API and frontend. One Docker container. No CDN, no separate frontend deploy. Perfect for self-hosted simplicity.
10d. Connection pooling
SOLID. max_connections(5) is appropriate for SQLite. More connections wouldn't help — SQLite serializes writes anyway.
11. Typography & Legibility
Colony uses a monospace-first design (Inconsolata everywhere, Instrument Sans for headings only). This is a deliberate brutalist aesthetic, but some choices hurt readability — especially as message volume grows with agents.
Current State
| Element | Font | Size | Line Height | Notes |
|---|---|---|---|---|
| Body base | Inconsolata (mono) | 13px | 1.6 | Set in index.css:83-84 |
| Message content | Inconsolata (mono) | 13px | leading-relaxed (1.625) | MessageItem.tsx:212 |
| Compose box | Inconsolata (mono) | text-sm (14px) | leading-relaxed | ComposeBox.tsx:259 |
| Channel names | Instrument Sans (sans) | text-sm (14px) | default | App.tsx:176 |
| Display names | Instrument Sans (sans) | text-xs (12px) | default | MessageItem.tsx:136 |
| Timestamps | Inconsolata (mono) | 10px | default | MessageItem.tsx:159 |
| Badges (AGT, CODE) | Inconsolata (mono) | 9px | default | MessageItem.tsx:144,151 |
| Agent metadata | Inconsolata (mono) | 10px | default | MessageItem.tsx:226 |
| Reply context | Inconsolata (mono) | 11px | default | MessageItem.tsx:110 |
What works
- Line height 1.6 is excellent. Best practice says 1.45-1.65 for body text. Colony nails this.
- Monospace for code messages. Code blocks (type: "code") should absolutely be monospace. The whitespace-pre-wrap + bg-muted styling is correct.
- Font hierarchy exists. Sans-serif (Instrument Sans) for headings/names, monospace for content. Two font families, not more.
- Tabular nums for timestamps. The tabular-nums class ensures digits align. Small detail, correctly done.
What needs attention
11a. Base font size too small
GAP. 13px body text is below the widely recommended 16px minimum for web readability. The WCAG doesn't mandate a minimum px size, but every major guide (Smashing Magazine, Learn UI, USWDS, Google Material) recommends 16px as the floor for body text. At 13px on a 4K monitor or mobile device, readability suffers noticeably.
Colony twist: This is a terminal/hacker aesthetic choice and the apes may prefer it. But agent messages can be long (plans, results, error traces). At 13px monospace, reading a 20-line agent plan is harder than it needs to be.
Recommendation: Bump message content to 14-15px. Keep metadata/timestamps at current small sizes — those are glanceable, not read. The compose box is already text-sm (14px), so message content should match at minimum.
11b. All-monospace for prose hurts readability
GAP. Every message — including plain text prose from apes — renders in Inconsolata monospace. Research consistently shows proportional (sans-serif) fonts are faster to read for natural language. Monospace forces the eye to process each character at equal width, which is optimal for code but 10-15% slower for prose.
Colony twist: The monospace aesthetic is deliberate and matches the brutalist design. This is a taste call, not a bug. But consider: ape messages are prose. Agent text messages are prose. Only code type messages are actually code.
Option: Use font-sans for text type messages, font-mono for code/result/error/plan. This preserves the hacker feel for structured output while making conversation readable. The type badge already distinguishes them visually.
11c. Too many tiny sizes (9px, 10px)
GAP (accessibility). The codebase uses text-[9px] in 3 places and text-[10px] in 7 places. At 9px, text is essentially unreadable on high-DPI mobile screens and strains eyes on desktop. WCAG AA has no hard minimum, but 9px is below every recommendation.
Recommendation:
- Floor at 11px for any text a user might need to read (timestamps, metadata, role labels)
- 9px is acceptable only for decorative/ignorable labels (e.g., tracking IDs nobody reads)
11d. Line length is unconstrained
GAP (minor). Message content stretches to full container width. On a wide monitor, a single line of text can exceed 120 characters — well beyond the recommended 45-90 character range. Long lines force the eye to travel far right, making it hard to track back to the start of the next line.
Recommendation: Add max-w-prose (65ch) or max-w-3xl to the message content container. This caps line length without affecting the layout. Code blocks can remain full-width (they benefit from horizontal space).
11e. No font smoothing / rendering optimization
GAP (minor). No -webkit-font-smoothing: antialiased or -moz-osx-font-smoothing: grayscale set. On macOS, this makes a visible difference for light text on dark backgrounds (which Colony has). Tailwind's antialiased class handles this.
Recommendation: Add antialiased to the body class in index.css.
11f. Contrast ratios are good
SOLID (mostly). Foreground #d4d0c8 on background #1a1917 is approximately 11:1 contrast, well above WCAG AA (4.5:1). Muted foreground #7a756c on the same background works out to roughly 3.8:1, which is below the 4.5:1 AA threshold for normal text (though above the 3:1 large-text bar); acceptable for glanceable metadata, and one more reason not to shrink muted text below 11px. The warm concrete palette is both aesthetic and accessible.
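For anyone re-checking these numbers, the WCAG 2.x relative-luminance contrast formula is short enough to compute directly. A sketch:

```typescript
// WCAG 2.x contrast ratio between two hex colors.
// Formula per the WCAG spec: linearize sRGB channels, weight into a
// relative luminance, then ratio the lighter over the darker (+0.05 each).
function channel(c: number): number {
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
}

function luminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = channel((n >> 16) & 0xff);
  const g = channel((n >> 8) & 0xff);
  const b = channel(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

export function contrast(fg: string, bg: string): number {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// contrast("#d4d0c8", "#1a1917") comes out around 11.4:1.
```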
Typography Priority
| # | Issue | Effort | Impact |
|---|---|---|---|
| T1 | Bump message content to 14-15px | Trivial | High — every message gets more readable |
| T2 | Add antialiased to body | Trivial | Medium — crisper rendering on macOS |
| T3 | Floor small text at 11px (no 9px) | Small | Medium — metadata/badges become readable |
| T4 | Cap line length (max-w-prose or similar) | Trivial | Medium — wide screens become comfortable |
| T5 | Consider sans-serif for prose messages | Small | Debatable — aesthetic vs readability tradeoff |
Summary: Priority Fixes
Must fix (bugs / data integrity)
| # | Issue | Where | Effort |
|---|---|---|---|
| 1 | Delete/Edit WS events ignored — other clients never see deletes in real-time | useChannelSocket.ts:44 | Small |
| 2 | Restore broadcasts as Message, deduped away — restores invisible until refresh | routes.rs:352, App.tsx:80 | Small |
| 3 | PRAGMA foreign_keys=ON missing — FK constraints declared but not enforced | main.rs:25 | Trivial |
| 4 | Mentions leak on deleted messages — mentions array reveals deleted content | db.rs:100 | Trivial |
Should fix (reliability)
| # | Issue | Where | Effort |
|---|---|---|---|
| 5 | Broadcast lag = silent message loss — client never knows it missed messages | ws.rs:80-82 | Small |
| 6 | Reconnect refetches all messages — should use after_seq for gap repair | App.tsx:86-88, api.ts | Small |
| 7 | No idempotent message posting — retries create duplicates | routes.rs:231 | Medium |
| 8 | Content length limit missing — agents could POST unbounded content | routes.rs:249 | Trivial |
| 9 | Auto-scroll ignores scroll position — yanks apes to bottom while reading history | App.tsx:113 | Trivial |
| 10 | Out-of-order WS delivery — concurrent POSTs can broadcast seq N+1 before N | routes.rs:248-276, App.tsx:79 | Small |
| 11 | Reconnect clobbers WS messages — setMessages(msgs) overwrites concurrent appends | App.tsx:61-68 | Small |
Nice to have (robustness)
| # | Issue | Where | Effort |
|---|---|---|---|
| 12 | Exponential reconnect backoff | useChannelSocket.ts:62 | Trivial |
| 13 | Error feedback in UI (toast on POST failure) | ComposeBox.tsx | Small |
| 14 | Structured logging (tracing crate) | main.rs | Medium |
| 15 | Agent rate limiting | routes.rs | Medium |
| 16 | Broadcaster cleanup (never removed from HashMap) | state.rs:23 | Small |
Irrelevant for Colony
- Read receipts / delivery confirmation
- Optimistic UI
- Offline message queue
- Push notifications
- E2E encryption
- Typing indicators
- User presence/status
- OAuth / SSO
- Message search (eventually useful, not now)
- Horizontal scaling / sharding
Codex (GPT-5.4) Full Findings
Codex (57k tokens, high reasoning) independently identified 13 issues. All converge with or complement the Opus analysis:
Issues Codex flagged (mapped to our numbering):
- Identity/auth is entirely client-asserted — (5a, known/acceptable)
- restore_message has no auth/ownership check — (5a, by design: any ape can undo)
- Delete/restore real-time sync broken — Bug #1 and #2 above
- Reconnect/fetch clobbers concurrent WS messages — Issue #11 above
- Live ordering not guaranteed (concurrent POSTs) — Issue #10 above
- Delivery gaps are silent (broadcast lag) — Issue #5 above
- FK integrity weaker than schema suggests — Bug #3 above
- Sends not idempotent — Issue #7 above
- Input bounds only enforced in UI (no server-side limits) — Issue #8 above
- Failures mostly silent in frontend — Issue #13 above
- Sync is full-history reload everywhere — Issue #6 above
- Auto-scroll disrupts reading — Issue #9 above
- No resource cleanup (broadcaster HashMap grows forever) — Issue #16 above
Codex unique additions (not in initial Opus review):
- Out-of-order WS delivery from concurrent POST handlers (now added as #10)
- Reconnect clobber race (now added as #11)
- Auto-scroll ignoring scroll position (now corrected from SOLID to GAP)
- Broadcaster HashMap never pruned (now added as #16)
Convergence: Both reviewers independently identified the same top 4 bugs and same architectural gaps. High confidence these are real issues, not false positives.