Winston · OpenClaw Performance Audit

Codebase health, token efficiency & action status · Mayer Systems Studio

Audit Date: 2026-06-01
5 of 20 Implemented
Cleanup Shipped 2026-06-01
Critical Issues: 4 · Immediate action required
High Priority: 8 · Next sprint targets
Medium Priority: 7 · Planned improvement
Written: 5/20 · Fix docs authored
Implemented: 5/20 · Changes deployed

Audit Items

13 items
# · Problem · Description · Proposed Fix · Written · Implemented
01 · Critical · server.py monolith
Description: A single file contains all logic: routing, tools, API calls, and helpers. Impossible to maintain or test in isolation. Every edit touches the whole surface area.
Proposed Fix: Decompose into focused modules: router.py, tools/, api/, helpers/. One responsibility per module. Shared init handles wiring.
Written: ✗ Not yet · Implemented: ✗ No
02 · Critical · Duplicate DaVinci scripts
Description: Near-identical DaVinci Resolve scripts scattered across the workspace. Bug fixes must be applied in N places, or the copies diverge silently over time.
Proposed Fix: Consolidate into a canonical library under scripts/davinci/. Shared utilities in lib/. Audit, verify, then delete all duplicates.
Written: ✗ Not yet · Implemented: ✗ No
03 · Critical · Hardcoded paths
Description: Absolute paths like C:\Users\mrpoo\Desktop\... baked into scripts. Breaks on any machine change, path rename, or new environment. Silent failures when paths don't exist.
Proposed Fix: Move all paths to .env + central config.py. Use Path(os.environ.get(...)) everywhere. Validate paths at startup with clear errors.
Written: ✗ Not yet · Implemented: ✗ No
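For item 03, a minimal sketch of what a central config.py helper might look like. The function name `load_path`, the `ConfigError` class, and the variable name `WORKSPACE_DIR` are hypothetical; the sketch also assumes the values from .env are already present in the process environment (for example, loaded by a tool such as python-dotenv).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


class ConfigError(RuntimeError):
    """Raised at startup when a required path is unset or missing."""


def load_path(var_name: str,
              env: Optional[Mapping[str, str]] = None,
              must_exist: bool = True) -> Path:
    """Read a path from the environment and validate it before use."""
    env = os.environ if env is None else env
    raw = env.get(var_name)
    if not raw:
        # Fail loudly instead of passing None into Path(), which raises TypeError
        raise ConfigError(f"{var_name} is not set; add it to .env")
    path = Path(raw).expanduser()
    if must_exist and not path.exists():
        raise ConfigError(f"{var_name} points to a missing path: {path}")
    return path
```

Calling something like `load_path("WORKSPACE_DIR")` once at startup would surface a bad path immediately with a clear message, rather than as a silent failure deep inside a script.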
04 · Critical · Silent pipeline failures
Description: Render jobs, API calls, and file operations fail without raising errors or logging. Jobs appear to "complete" while producing nothing. Success reported without evidence.
Proposed Fix: Enforce output validation after every stage. All exceptions must bubble up. Structured logging with stage checkpoints. Never report success without tool-verified output.
Written: ✗ Not yet · Implemented: ✗ No
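For item 04, a sketch of a per-stage output check. The names `verify_output` and `StageError`, the `pipeline` logger name, and the `min_bytes` threshold are assumptions for illustration, not part of the existing codebase.

```python
import logging
from pathlib import Path

logger = logging.getLogger("pipeline")


class StageError(RuntimeError):
    """Raised when a stage finishes without producing usable output."""


def verify_output(stage: str, output: Path, min_bytes: int = 1) -> Path:
    """Confirm a stage's output file exists and is non-trivial in size."""
    if not output.is_file():
        raise StageError(f"{stage}: expected output missing: {output}")
    size = output.stat().st_size
    if size < min_bytes:
        raise StageError(f"{stage}: output too small ({size} bytes): {output}")
    # Checkpoint log line: written only after the file has been verified
    logger.info("stage=%s status=verified output=%s bytes=%d", stage, output, size)
    return output
```

Because the check raises instead of returning a flag, a caller cannot report success for a stage whose output was never written.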
05 · High · Stale JSON files
Description: Cached JSON state files (project manifests, clip reports, beat maps) accumulate and are never pruned. Old data feeds new decisions. Cache invalidation is manual and inconsistently applied.
Proposed Fix: Implement TTL-based invalidation on all JSON caches. Add a cache-bust flag to pipeline init. Prune stale files on session start. Document cache lifespan per file type.
Written: ✗ Not yet · Implemented: ✗ No
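For item 05, a sketch of TTL-based pruning driven by file modification time. The function name `prune_stale`, the glob patterns, and the TTL values are hypothetical; real lifespans would come from the per-file-type documentation the fix calls for.

```python
import os
import time
from pathlib import Path
from typing import Dict, List, Optional


def prune_stale(cache_dir: Path,
                ttl_by_pattern: Dict[str, float],
                now: Optional[float] = None,
                bust: bool = False) -> List[Path]:
    """Delete cache files older than their TTL; bust=True deletes all matches."""
    now = time.time() if now is None else now
    removed: List[Path] = []
    for pattern, ttl_seconds in ttl_by_pattern.items():
        for path in sorted(cache_dir.glob(pattern)):
            age = now - path.stat().st_mtime
            if bust or age > ttl_seconds:
                path.unlink()
                removed.append(path)
    return removed
```

Run at session start, a call such as `prune_stale(cache_dir, {"*beat_map*.json": 3600})` would remove matching files older than an hour; the `bust` flag corresponds to the cache-bust option on pipeline init.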
06 · High · Encoding bugs
Description: UTF-8/CP1252 conflicts when reading/writing files with non-ASCII characters. PowerShell heredocs corrupt JS single quotes. Intermittent failures that are hard to reproduce reliably.
Proposed Fix: Enforce encoding="utf-8" on all file I/O. Write complex JS/HTML via a Node.js file writer, never a PowerShell heredoc. Add encoding lint to pre-commit hooks.
Written: ✗ Not yet · Implemented: ✗ No
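For item 06, one possible shape for the encoding lint: a small check built on Python's `ast` module that flags built-in `open(...)` calls with no `encoding=` keyword. The function name is hypothetical, and the sketch covers only direct calls to `open`, not `Path.open` or `Path.read_text`.

```python
import ast
from typing import List


def find_open_without_encoding(source: str) -> List[int]:
    """Return line numbers of text-mode open() calls lacking encoding=."""
    hits: List[int] = []
    for node in ast.walk(ast.parse(source)):
        is_open_call = (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Name)
                        and node.func.id == "open")
        if not is_open_call:
            continue
        # Work out the mode from the second positional arg or mode= keyword
        mode = None
        if len(node.args) >= 2 and isinstance(node.args[1], ast.Constant):
            mode = node.args[1].value
        for kw in node.keywords:
            if kw.arg == "mode" and isinstance(kw.value, ast.Constant):
                mode = kw.value.value
        if isinstance(mode, str) and "b" in mode:
            continue  # binary mode takes no encoding
        if not any(kw.arg == "encoding" for kw in node.keywords):
            hits.append(node.lineno)
    return sorted(hits)
```

A pre-commit hook could run this over each staged .py file and fail the commit when the returned list is non-empty.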
07 · High · Excessive context files
Description: Too many context files loaded at session start. Deprecated stubs (IDENTITY.md, USER.md, HEARTBEAT.md, WHO.md) injecting dead weight. Stray temp files + audit artifacts in workspace root adding tokens every run.
Proposed Fix: ✅ Shipped 2026-06-01: Rebuilt IDENTITY.md + USER.md with real content (OpenClaw standard names). HEARTBEAT.md restored as live checklist. OPERATING-SYSTEM.md archived to memory/. WHO.md deleted. 23 stray media/temp files purged. 3 stale analysis docs archived. ~15KB removed from root injection surface.
Written: ✓ Done · Implemented: ✓ Yes
08 · Medium · Token efficiency: verbose prompts
Description: System prompts and workspace files contain redundant, repetitive instructions. Same rules stated in LAWS.md, MEMORY.md, SOUL.md, and OPERATING-SYSTEM-V2.md. Every duplicate costs tokens on every call.
Proposed Fix: ✅ Partially shipped 2026-06-01: OPERATING-SYSTEM.md (post-mortem narrative) archived; V2 is now the sole source of truth (SOT). Deprecated pointer stubs replaced with canonical content. Next: deduplicate rules across MEMORY.md / SOUL.md / LAWS.md. Target: 40% context reduction.
Written: ✓ Done · Implemented: ✓ Partial
09 · High · MEMORY.md bloat
Description: Active projects, SOP details, skill tables, and lessons learned all in one file, injected every turn. Most is reference material that should never be in hot context. Was ~12KB loaded on every single call.
Proposed Fix: ✅ Shipped 2026-06-01: Split into MEMORY.md (evergreen rules, 3.8KB) + PROJECTS.md + SKILLS-REGISTRY.md. MEMORY.md reduced from 11.6KB → 3.8KB (67% reduction). Reference files load on demand only.
Written: ✓ Done · Implemented: ✓ Yes
10 · High · Duplicate rules across files
Description: Stop protocol appears in SOUL.md, OPERATING-SYSTEM-V2.md, AGENTS.md, and LAWS.md. Gate rules appear in both SOUL.md and OS-V2. Same content injected 3–4× every session, burning tokens on redundancy.
Proposed Fix: One SOT per rule. OPERATING-SYSTEM-V2.md owns all operational rules. Other files get a one-line pointer. Eliminates ~30% of duplicated instruction content across bootstrap files.
Written: ✗ Not yet · Implemented: ✗ No
11 · High · VIDEO-PRODUCTION-SOP.md in root
Description: 17.8KB SOP injected every session regardless of whether video work is happening. Irrelevant on coding, design, and admin sessions: pure token waste on those runs.
Proposed Fix: ✅ Shipped 2026-06-01: Moved to memory/VIDEO-PRODUCTION-SOP.md. Pointer added to MEMORY.md. No longer injected at startup. Saves ~17KB on every non-video session.
Written: ✓ Done · Implemented: ✓ Yes
12 · Medium · AGENTS.md size & duplication
Description: AGENTS.md contains canonical protocols, file structure tables, group chat rules, and execution logging format, some of which duplicate OS-V2 and LAWS.md. Injected every session as a bootstrap file.
Proposed Fix: Trim to navigation-only (file map + startup order). Move rules to their SOT files. Target: reduce from ~3KB to <1KB. Rules live once; AGENTS.md just points to them.
Written: ✗ Not yet · Implemented: ✗ No
13 · Medium · No bootstrap file audit cadence
Description: No recurring process to catch new file bloat before it accumulates again. Root workspace files grow silently between sessions. Today's cleanup will drift without enforcement.
Proposed Fix: Monthly cron job or /audit command: list workspace root files + sizes, flag any file >5KB or any new file not in approved bootstrap list. Auto-report to Telegram.
Written: ✗ Not yet · Implemented: ✗ No
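For item 13, a sketch of the core check such an audit job might run. The function name `audit_root`, the approved-list contents, and the reading of 5KB as 5 × 1024 bytes are assumptions; the Telegram reporting step is left out.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

SIZE_LIMIT_BYTES = 5 * 1024  # assumes "5KB" means 5 * 1024 bytes


def audit_root(root: Path,
               approved: Iterable[str],
               limit: int = SIZE_LIMIT_BYTES) -> List[Tuple[str, int, str]]:
    """Flag root-level files that are oversized or not on the approved list."""
    approved_names = set(approved)
    findings: List[Tuple[str, int, str]] = []
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue  # only top-level files are checked
        size = path.stat().st_size
        reasons = []
        if size > limit:
            reasons.append(f"over {limit} bytes")
        if path.name not in approved_names:
            reasons.append("not in approved list")
        if reasons:
            findings.append((path.name, size, ", ".join(reasons)))
    return findings
```

An empty result means the workspace root is within limits; any tuples returned are the lines a monthly report would carry.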

Operational Patterns & Behavioral Standards

7 items
# · Pattern · Current State · Target Behavior · Written · Active
P1 · High · Model selection discipline
Current State: Default model used for all tasks regardless of complexity. Heavy models (Opus) on simple tasks = unnecessary cost. Light tasks don't need full reasoning power.
Target Behavior: ✅ Implemented 2026-06-01: Defaulted to cortex_proxy/claude-sonnet-4-6 for all sessions and sub-agents. Opus reserved for complex reasoning/architecture only. Config confirmed in openclaw.json agents.defaults.
Written: ✓ Done · Active: ✓ Yes
P2 · High · Session hygiene: /new cadence
Current State: Sessions run long, context accumulates, cache hit rate drops, new token cost per turn rises. No clear trigger for when to start a fresh session.
Target Behavior: Start /new when: (1) task type changes (e.g. production → housekeeping), (2) context hits 20%+ of window, (3) after any major deliverable ships. Check /status at session start to confirm cache baseline.
Written: ✗ Not yet · Active: ✗ No
P3 · High · Token spend visibility
Current State: No systematic check on token burn per session or per task type. Costs are invisible until they accumulate. No baseline to compare against after optimizations.
Target Behavior: Run /status at session start and end. Log cache hit %, context size, and new tokens per session in execution-log.md. Establish baseline after each optimization sprint so improvements are measurable.
Written: ✗ Not yet · Active: ✗ No
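For pattern P3, a sketch of a one-line-per-session log entry for execution-log.md. The field layout and function names are hypothetical, and the numbers are assumed to be copied by hand from /status, since its output format is not described here.

```python
from pathlib import Path


def format_entry(task_type: str,
                 cache_hit_pct: float,
                 context_tokens: int,
                 new_tokens: int,
                 when: str) -> str:
    """Build one pipe-delimited log line for a session snapshot."""
    return (f"{when} | {task_type} | cache_hit={cache_hit_pct:.1f}% | "
            f"context={context_tokens} | new_tokens={new_tokens}")


def append_entry(log_path: Path, entry: str) -> None:
    """Append the entry to the log, always as UTF-8."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
```

Keeping every entry in the same shape makes it straightforward to compare sessions before and after an optimization sprint.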
P4 · Medium · Gateway config visibility
Current State: Compaction floor, session targets, context injection rules all set but never reviewed. Config drift happens silently. No single place to see "what is OpenClaw actually doing right now."
Target Behavior: Quarterly gateway config review. Document key settings in ECOSYSTEM.md: compaction floor, bootstrap files list, channel configs, model defaults. Flag any setting that deviates from intent.
Written: ✗ Not yet · Active: ✗ No
P5 · Medium · Production vs. housekeeping session separation
Current State: Mixing production work (video, three-pagers) with housekeeping (file cleanup, audit) in the same session inflates context and muddies cache. Each task type loads different files.
Target Behavior: Dedicated session types: production sessions start clean with project files loaded. Housekeeping sessions are separate. Never pivot mid-session from deep production to admin without /new.
Written: ✗ Not yet · Active: ✗ No
P6 · Medium · Continuation prompt standardization
Current State: No standard format for continuation prompts between sessions. Context that should carry forward relies on memory search rather than a structured handoff. Increases ramp-up cost each session.
Target Behavior: Standard continuation prompt template: project + last state + next action + any blockers. Written to SESSION_HANDOFF.md at session close. Winston reads it first thing next session before any tool call.
Written: ✗ Not yet · Active: ✗ No
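For pattern P6, a sketch of the four-field handoff written at session close. The `Handoff` class, its field labels, and the plain-text layout are assumptions; only the four fields and the SESSION_HANDOFF.md filename come from the pattern itself.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class Handoff:
    project: str
    last_state: str
    next_action: str
    blockers: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the handoff as short labeled lines."""
        blockers = "\n".join(f"- {b}" for b in self.blockers) or "- None"
        return (f"Project: {self.project}\n"
                f"Last state: {self.last_state}\n"
                f"Next action: {self.next_action}\n"
                f"Blockers:\n{blockers}\n")


def write_handoff(handoff: Handoff, workspace: Path) -> Path:
    """Overwrite SESSION_HANDOFF.md in the workspace with the latest state."""
    target = workspace / "SESSION_HANDOFF.md"
    target.write_text(handoff.render(), encoding="utf-8")
    return target
```

Overwriting rather than appending keeps the file small, so reading it first thing next session adds little to context.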
P7 · High · Gate enforcement: TSSC before every tool call
Current State: Gate (Tool, Skill, Scope, Cost) exists in SOUL.md and OS-V2 but enforcement is inconsistent. Skipped under time pressure or momentum. Violations lead to scope creep, wrong tools, and wasted tokens.
Target Behavior: Gate is non-negotiable before every first tool call. Format: TOOL [name — why] · SKILL [path or “None”] · SCOPE DO/DON’T · COST ~Xk tokens. Citing a skill without a visible read call = violation. Chris can challenge any turn that skips it.
Written: ✓ Written · Active: ✓ Active

Implementation Progress

server.py decomposition: 0%
DaVinci script consolidation: 0%
Hardcoded path removal: 0%
Silent failure logging: 0%
JSON cache invalidation: 0%
Encoding standardization: 0%
Context file pruning: 80%
Token efficiency / prompt trim: 35%
MEMORY.md split: 100%
Rule deduplication: 0%
SOP file relocation: 100%
AGENTS.md trim: 0%
Bootstrap audit cadence: 0%