Most teams would call this insane.
Every 30 minutes, an autonomous loop inside Stratam looks at the system it's running on, decides one specific upgrade to make, writes the code, snapshots the existing version, parse-checks the new one, swaps it in, restarts the container, and verifies health. If anything goes wrong, the watchdog auto-rolls back within 60 seconds. The cycle then repeats — forever, while production traffic continues.
It's not the "AI writes some code and a human reviews it" model. There's no human in the inner loop. The system modifies its own source, deploys, and runs. Within an hour the version of Stratam on your screen is, structurally, not the version that was there an hour ago.
This post is about (1) why we built that, (2) the safety architecture that makes it survivable, and (3) the failure modes we hit before getting it right.
The case for autonomous improvement
Stratam is a 1-founder product, moving at the pace a 1-founder product can. Competing AI systems (ChatGPT, Claude, Copilot) change at roughly a "ship every Tuesday" cadence. For a small team, that pace is hard to match through human-only commits.
But there's a strange thing about LLM-driven agents: the code that runs them is also the thing they can most easily modify. You wouldn't ask an LLM to refactor your database schema. You probably would ask it to "find a way to catch this fabrication pattern" — and watch it ship the regex.
So we made the system propose-and-ship its own upgrades. The tradeoff:
- Win: 1-6 substantive changes per day, autonomously. The system you start with is not the system you have in 90 days.
- Cost: without airtight safety rails, the system breaks itself, restarts into a broken state, breaks itself again, restart-loops, and dies. Recovery requires a human.
The cost is acceptable only if the rails are unconditional. So that's where the engineering went.
Rail 1: parse-check before write
Every self-modify proposal runs through Python's ast.parse before touching disk. A change that introduces a syntax error is rejected with a clear error message, and the version on disk is not overwritten until the parse passes. This catches the most catastrophic failure mode, a typo that crashes the import, before the code is deployed rather than at runtime.
import ast

def self_modify_code(old_source, proposal):
    new_source = apply_proposed_change(old_source, proposal)
    try:
        ast.parse(new_source)  # rail 1: refuse anything that does not parse
    except SyntaxError as e:
        return f"refused: parse error {e}"
    snapshot(old_source)  # rail 2: keep the previous version
    write(new_source)
    return atomic_swap()  # rail 3
Rail 2: snapshot before write
Before a change is written, the previous version of the file is copied to /opt/jarvis/.jarvis_self_history/ with a timestamp. The last 50 snapshots are retained. If something goes wrong, recovery is cp <snapshot> /app/jarvis.py && docker compose restart.
The snapshots are immutable and live in a separate volume. Self-modify cannot delete its own history.
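A minimal sketch of the snapshot step, assuming timestamped copies in a flat directory. The helper name snapshot_file and the naming scheme are hypothetical; only the retention count of 50 comes from the description above. Pruning runs in-process here for brevity, whereas with an immutable history volume it would have to happen outside the self-modify path.

```python
import shutil
import time
from pathlib import Path

MAX_SNAPSHOTS = 50  # retention count described in the post


def snapshot_file(source: Path, history_dir: Path, keep: int = MAX_SNAPSHOTS) -> Path:
    """Copy `source` into `history_dir` under a timestamped name, then prune old copies."""
    history_dir.mkdir(parents=True, exist_ok=True)

    # Zero-padded nanosecond timestamps sort lexicographically in time order.
    stamp = time.time_ns()
    target = history_dir / f"{source.name}.{stamp:020d}"
    while target.exists():  # never clobber a copy taken in the same instant
        stamp += 1
        target = history_dir / f"{source.name}.{stamp:020d}"
    shutil.copy2(source, target)

    # Keep only the newest `keep` snapshots (assumes keep >= 1).
    snapshots = sorted(history_dir.glob(f"{source.name}.*"))
    for stale in snapshots[:-keep]:
        stale.unlink()
    return target
```

Calling snapshot_file(Path("/app/jarvis.py"), Path("/opt/jarvis/.jarvis_self_history")) before each write would leave at most 50 copies behind.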
Rail 3: atomic swap
The file swap is atomic: the new source is renamed over the old file, and Python's os.rename on POSIX is atomic at the filesystem level as long as both paths are on the same filesystem. The new code either fully replaces the old or doesn't. There's no "half-written" window where readers see corrupted source.
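One common way to get this behaviour is to write to a temporary file in the same directory and rename it into place. The sketch below assumes that pattern. The helper name atomic_write is hypothetical, and it uses os.replace rather than os.rename because os.replace also overwrites an existing target on Windows; on POSIX the two behave the same for this case.

```python
import os
import tempfile


def atomic_write(path: str, new_source: str) -> None:
    """Write `new_source` to a temp file beside `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory means same filesystem, which the atomic rename depends on.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_source)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # readers see the old file or the new one, never a mix
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```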
Rail 4: container restart with health check
After a successful self-modify, the system restarts itself
via self_restart. The container's healthcheck
runs every 30 seconds; if it doesn't return healthy within
90 seconds of restart, the watchdog flags the boot as failed
and rolls back.
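The timing in this rail can be sketched as a polling loop. This is an illustration under assumptions, not the actual watchdog: wait_until_healthy and its parameters are hypothetical names, and the probe is whatever check the container healthcheck performs. The clock and sleep functions are injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 90.0,   # boot is considered failed after this long
    interval_s: float = 30.0,   # healthcheck interval from the post
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or `deadline_s` has elapsed."""
    start = clock()
    while True:
        if probe():
            return True
        if clock() - start >= deadline_s:
            return False  # caller should flag the boot as failed and roll back
        sleep(interval_s)
```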
Rail 5: watchdog auto-rollback
A separate watchdog container monitors jarvis-cloud.
If health stays red for more than 90 seconds, the watchdog:
- Reads the most-recent snapshot from .jarvis_self_history/
- Copies it back over /app/jarvis.py
- Restarts the container
- Pings Discord with the rollback event + the failed proposal title
The user wakes up to "rollback at 03:14: bad self-modify on the cost ledger, reverted." Not "your service has been down for 6 hours."
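The rollback steps could look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, snapshot file names are assumed to carry a sortable timestamp so the newest one sorts last, and the restart and Discord notification are passed in as plain callables rather than real integrations.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def latest_snapshot(history_dir: Path) -> Optional[Path]:
    """Return the snapshot whose name sorts last, or None if the directory is empty."""
    snapshots = [p for p in history_dir.iterdir() if p.is_file()]
    return max(snapshots, key=lambda p: p.name, default=None)


def rollback(
    history_dir: Path,
    live_file: Path,
    restart: Callable[[], None],
    notify: Callable[[str], None],
) -> bool:
    """Restore the newest snapshot over `live_file`, restart, and report the event."""
    snap = latest_snapshot(history_dir)
    if snap is None:
        notify("rollback impossible: no snapshot found")
        return False
    shutil.copyfile(snap, live_file)  # copy the known-good source back
    restart()
    notify(f"rollback: restored {snap.name}")
    return True
```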
Rail 6: restart-cascade breaker
The worst case rails 1-5 can't catch: a self-modify that passes parse-check, deploys cleanly, looks healthy for the first 90 seconds, then crashes 5 minutes in. Watchdog rolls back. Loop fires again 25 minutes later, ships the same flawed change. Crashes again. Forever.
The breaker watches container restart frequency. If
jarvis-cloud restarts 4+ times in 60 minutes, the
eternal loop is auto-paused. It requires manual
re-arming via POST /api/eternal/enable after a
human investigates.
def _record_boot_and_check_breaker():
    if len(coalesced_boots_in_last_hour) >= 4:
        return True  # breaker tripped
    return False

# In eternal_state.py:
_ETERNAL_STATE = {
    "enabled": (
        os.environ.get("JARVIS_DISABLE_ETERNAL") != "1"
        and not _RESTART_CASCADE_DETECTED
    ),
}
The breaker also coalesces boots within 30 seconds of each other — Python subprocess imports during diagnostic scripts look like restarts but aren't, so we don't false-trigger.
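To make the coalescing rule concrete, here is a small sketch that works on a list of boot timestamps in seconds. The constants mirror the numbers in the text (60-minute window, 30-second gap, trip at 4). The function names are hypothetical, and treating a boot as part of the previous one whenever it follows within 30 seconds is one plausible reading of "within 30 seconds of each other", not necessarily the exact rule in production.

```python
from typing import Iterable, List

WINDOW_S = 60 * 60   # look-back window: 60 minutes
COALESCE_S = 30      # boots this close to the previous one are merged
TRIP_THRESHOLD = 4   # 4+ coalesced boots in the window trips the breaker


def coalesce_boots(boot_times: Iterable[float], gap_s: float = COALESCE_S) -> List[float]:
    """Keep only boots that start more than `gap_s` after the previous raw boot."""
    coalesced: List[float] = []
    last = None
    for t in sorted(boot_times):
        if last is None or t - last > gap_s:
            coalesced.append(t)
        last = t  # compare against the previous boot, so bursts chain together
    return coalesced


def breaker_tripped(boot_times: Iterable[float], now: float) -> bool:
    """True if the coalesced boot count inside the window reaches the threshold."""
    recent = [t for t in boot_times if now - t <= WINDOW_S]
    return len(coalesce_boots(recent)) >= TRIP_THRESHOLD
```

Under this rule a burst of ten imports inside ten seconds counts as a single boot, while four restarts spread across the hour still trip the breaker.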
What the loop actually does each cycle
Concretely, here's what fires every 30 minutes:
- Look at recent error journal + Discord traffic + any active project from a previous cycle
- Decide: continue the active multi-cycle project, or propose a new one
- If new: call Claude (via Pro Max for zero marginal cost) to propose ONE specific improvement with a title, file, and patch
- Run the patch through parse-check + snapshot + atomic write
- Self-restart, verify health
- If health OK: log to BUILD_NOTES, ping #alerts, sleep 30 min
- If health bad: watchdog rolls back, eternal loop notes the failure, sleeps
It runs 1-6 improvements per day under normal conditions (the daily cap is 6 to prevent runaway). Most are small — better error handling, additional intent matchers, refactored handlers. A few times a week it ships something substantial.
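The control flow of one cycle can be sketched as a single function with every side effect passed in. All names here (Proposal, run_cycle, the callables) are hypothetical stand-ins for the real components, and the sketch assumes the daily cap counts shipped changes rather than attempts.

```python
from dataclasses import dataclass
from typing import Callable, Optional

DAILY_CAP = 6  # daily cap from the post


@dataclass
class Proposal:
    title: str
    file: str
    patch: str


def run_cycle(
    active: Optional[Proposal],
    propose: Callable[[], Proposal],
    apply_patch: Callable[[Proposal], bool],
    healthy_after_restart: Callable[[], bool],
    shipped_today: int,
) -> str:
    """Run one cycle and return a short status string for the build log."""
    if shipped_today >= DAILY_CAP:
        return "skipped: daily cap reached"
    # Continue an active multi-cycle project if there is one, else ask for a new proposal.
    proposal = active if active is not None else propose()
    # apply_patch stands in for parse-check + snapshot + atomic write.
    if not apply_patch(proposal):
        return f"refused: {proposal.title}"
    if healthy_after_restart():
        return f"shipped: {proposal.title}"
    # On an unhealthy boot the watchdog performs the rollback; the loop only records it.
    return f"rolled back: {proposal.title}"
```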
Failure modes we hit before getting it right
The mounting bug
For two weeks, self_modify_code returned success strings but the file never actually changed. Reason: the Docker Compose file mounted /app/jarvis.py:ro, a read-only mount. Every write failed with errno 30, and the failure stayed silent because the tool wrapper reported "applied": its success path didn't check the return code carefully.
Fix: removed the :ro suffix from the mount in docker-compose, AND added an explicit write-then-verify-checksum step. After the change, the anti-fab counter also catches this case through a new marker pattern ("[Errno 30] read-only file system" → status=error).
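A write-then-verify step might look like the sketch below. It is an assumption about the shape of the fix, not the actual code: write_and_verify is a hypothetical name, SHA-256 is an arbitrary choice of checksum, and the returned strings are illustrative.

```python
import hashlib
from pathlib import Path


def write_and_verify(path: Path, new_source: str) -> str:
    """Write `new_source`, then re-read the file and compare checksums before reporting success."""
    data = new_source.encode("utf-8")
    expected = hashlib.sha256(data).hexdigest()
    try:
        path.write_bytes(data)
    except OSError as e:
        # A read-only mount surfaces here as errno 30 instead of being swallowed.
        return f"error: {e}"
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    if actual != expected:
        return "error: on-disk checksum does not match proposed source"
    return "applied"
```

The point of the re-read is that "applied" is only reported after the bytes on disk have been observed, so a write that never landed cannot produce a success string.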
The OAuth-token blind spot
For another two weeks, the Pro Max subscription routing
"wasn't working." Every smart query was paying OpenRouter per
token. The bridge to the user's laptop was offline (correctly
— there was no laptop), and the cloud-local OAuth path existed
but the pre-flight auth check only scanned files in
/root/.claude, never the CLAUDE_CODE_OAUTH_TOKEN
env var. The token was right there. The check just didn't look.
One-line fix. Lost weeks. Caught when we added the audit script that probes every site claim against live state.
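The shape of the corrected pre-flight check is roughly "look at the environment variable as well as the files". The sketch below is illustrative only: find_oauth_source is a hypothetical name, and treating any file in the config directory as a credential is a simplification, since the post does not say which files the real check inspects.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_oauth_source(
    env: Mapping[str, str] = os.environ,
    config_dir: Path = Path("/root/.claude"),
) -> Optional[str]:
    """Return where credentials were found, or None if neither source has any."""
    # The env var is checked first; this is the lookup the original check skipped.
    if env.get("CLAUDE_CODE_OAUTH_TOKEN"):
        return "env:CLAUDE_CODE_OAUTH_TOKEN"
    # Simplification: any file in the config directory counts as a credential.
    if config_dir.is_dir() and any(p.is_file() for p in config_dir.iterdir()):
        return f"file:{config_dir}"
    return None
```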
The breaker false-trip
The restart-cascade breaker was too eager. We were running diagnostic Python scripts that imported the eternal_state module, and because a boot is recorded per import, each import counted as a restart. After a busy debugging session, the breaker tripped and paused the eternal loop with the misleading "4 real restarts in 60 minutes" message, even though no actual restarts had occurred.
Fix: coalesce boot records within 30 seconds. Subprocess imports are instant; real container restarts take 20+ seconds. We only count restarts that pass the time gate.
Why this matters for the product
If we get the rails right, the eternal loop adds roughly 30-50% to the velocity of a 1-person team. It's the difference between shipping like a small startup and shipping like a medium one.
If we get the rails wrong, the loop breaks the product worse than a human ever would, and faster.
The rails matter more than the loop. The loop is the bet; the rails are the contract that makes the bet survivable.