Why Stratam ships to itself every 30 minutes

Most teams would call this insane.

Every 30 minutes, an autonomous loop inside Stratam looks at the system it's running on, decides one specific upgrade to make, writes the code, parse-checks the new version, snapshots the existing one, swaps the new one in, restarts the container, and verifies health. If the new version fails its health check, the watchdog rolls back automatically once health has been red for more than 90 seconds. The cycle then repeats, forever, while production traffic continues.

It's not the "AI writes some code and a human reviews it" model. There's no human in the inner loop. The system modifies its own source, deploys, and runs. With up to six changes shipping per day, the version of Stratam on your screen is, structurally, not the version that was there yesterday.

This post is about (1) why we built that, (2) the safety architecture that makes it survivable, and (3) the failure modes we hit before getting it right.

The case for autonomous improvement

Stratam is a 1-founder product, moving at the pace a 1-founder product can. Competing AI systems (ChatGPT, Claude, Copilot) change at roughly a "ship every Tuesday" cadence. For a small team, that pace is hard to match through human-only commits.

But there's a strange thing about LLM-driven agents: the code that runs them is also the thing they can most easily modify. You wouldn't ask an LLM to refactor your database schema. You probably would ask it to "find a way to catch this fabrication pattern" — and watch it ship the regex.

So we made the system propose-and-ship its own upgrades. The tradeoff:

Win: 1-6 substantive changes per day, autonomously. The system you start with is not the system you have in 90 days.
Cost: if you don't have airtight safety rails, the system breaks itself, restarts into a broken state, breaks itself again, restart-loops, and dies. Recovery requires a human.

The cost is acceptable only if the rails are unconditional. So that's where the engineering went.

Rail 1: parse-check before write

Every self-modify proposal runs through Python's ast.parse before touching disk. A change that introduces a syntax error gets rejected with a clear error message; the version on disk is never overwritten until parse passes. This catches the most catastrophic failure mode, a typo that crashes the import, before the file is written rather than when the container next starts.

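import ast

# apply_proposed_change, snapshot, write and atomic_swap are helpers defined elsewhere.
def self_modify_code(old_source, proposal):
    new_source = apply_proposed_change(old_source, proposal)
    try:
        ast.parse(new_source)  # rail 1: reject anything that does not parse
    except SyntaxError as e:
        return f"refused: parse error {e}"
    snapshot(old_source)  # rail 2: keep the previous version
    write(new_source)  # written to a temporary path (assumed), not over the live file
    return atomic_swap()  # rail 3: rename the new file over the old one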
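
One limit of this rail is worth spelling out: ast.parse only rejects source that is not valid Python syntax. A minimal sketch, using a hypothetical parse_check helper that is not part of the Stratam codebase, shows a change that parses cleanly but would still fail when called, which is why the later rails exist.

```python
import ast
from typing import Tuple


def parse_check(source: str) -> Tuple[bool, str]:
    """Return (ok, message); ok is False when the source does not parse."""
    try:
        ast.parse(source)
    except SyntaxError as e:
        return False, f"refused: parse error {e}"
    return True, "ok"


# A typo in the signature: rejected by the parse check.
broken = "def handler(:\n    pass\n"

# Valid syntax, but the name is undefined: passes the parse check
# and would only raise NameError when handler() is called.
risky = "def handler():\n    return undefined_name\n"

print(parse_check(broken)[0])  # False
print(parse_check(risky)[0])   # True
```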

Rail 2: snapshot before write

Before any change is written, the previous version of the file is copied to /opt/jarvis/.jarvis_self_history/ with a timestamp. The last 50 snapshots are retained. If something goes wrong, the recovery is cp <snapshot> /app/jarvis.py && docker compose restart.

The snapshots are immutable and live in a separate volume. Self-modify cannot delete its own history.
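
A minimal sketch of the snapshot-and-retain idea, under stated assumptions: the function names, the file naming scheme (a nanosecond timestamp in the name), and the split into two functions are all hypothetical. Pruning is kept separate on purpose, on the assumption that it runs somewhere other than the self-modify path, since self-modify is not supposed to delete history.

```python
import shutil
import time
from pathlib import Path


def snapshot(source_path: Path, history_dir: Path) -> Path:
    """Copy source_path into history_dir under a timestamped, sortable name."""
    history_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: <stem>.<nanosecond timestamp><suffix>
    dest = history_dir / f"{source_path.stem}.{time.time_ns()}{source_path.suffix}"
    shutil.copy2(source_path, dest)
    return dest


def prune(history_dir: Path, keep: int = 50) -> None:
    """Delete all but the newest `keep` snapshots (names sort chronologically)."""
    snapshots = sorted(p for p in history_dir.iterdir() if p.is_file())
    excess = len(snapshots) - keep
    for old in snapshots[:max(excess, 0)]:
        old.unlink()
```

Usage would be a snapshot(Path("/app/jarvis.py"), history_dir) call just before the write, with prune(history_dir) run by whichever process owns the history volume.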

Rail 3: atomic swap

The swap is atomic: Python's os.rename on POSIX is atomic at the filesystem level, provided the source and destination are on the same filesystem. The new code either fully replaces the old or doesn't. There's no "half-written" window where readers see corrupted source.
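
The usual way to get this guarantee is to write the new content to a temporary file in the same directory and then rename it over the target. The sketch below is an illustration, not Stratam's actual helper: atomic_write is a hypothetical name, and it uses os.replace, which behaves like os.rename on POSIX and also overwrites an existing target on Windows.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Write data to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".py")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp, path)     # readers see the old file or the new one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Keeping the temporary file beside the target matters: a rename across filesystems is not atomic, and os.replace raises an error in that case rather than silently copying.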

Rail 4: container restart with health check

After a successful self-modify, the system restarts itself via self_restart. The container's healthcheck runs every 30 seconds; if it doesn't return healthy within 90 seconds of restart, the watchdog flags the boot as failed and rolls back.
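
The timing logic can be sketched as a polling loop with a deadline. This is an assumption-laden illustration: wait_for_healthy is a hypothetical name, and the health probe itself is passed in as a callable because the post does not say how the container reports health. The clock and sleep parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_healthy(
    check: Callable[[], bool],
    timeout_s: float = 90.0,   # boot is considered failed after this long
    interval_s: float = 30.0,  # healthcheck period from the post
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() every interval_s; return False if the deadline passes."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if check():
            return True
        sleep(interval_s)
    return False
```

With the defaults, the probe runs at 0, 30, and 60 seconds after restart; a False return is the signal that would hand control to the rollback path.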

Rail 5: watchdog auto-rollback

A separate watchdog container monitors jarvis-cloud. If health stays red for more than 90 seconds, the watchdog:

1. Reads the most-recent snapshot from .jarvis_self_history/
2. Copies it back over /app/jarvis.py
3. Restarts the container
4. Pings Discord with the rollback event + the failed proposal title

The user wakes up to "rollback at 03:14: bad self-modify on the cost ledger, reverted." Not "your service has been down for 6 hours."
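
The rollback steps can be sketched as follows. Everything named here is hypothetical: the function names, the assumption that snapshot filenames sort chronologically, and the restart and notify callables, which stand in for the container restart and the Discord ping without claiming how either is actually wired.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def latest_snapshot(history_dir: Path) -> Optional[Path]:
    """Return the newest snapshot, assuming names sort chronologically."""
    snaps = sorted(p for p in history_dir.iterdir() if p.is_file())
    return snaps[-1] if snaps else None


def rollback(
    history_dir: Path,
    target: Path,
    restart: Callable[[], None],
    notify: Callable[[str], None],
    failed_title: str,
) -> bool:
    """Restore the newest snapshot over target, restart, and report."""
    snap = latest_snapshot(history_dir)
    if snap is None:
        notify(f"rollback impossible: no snapshots (failed: {failed_title})")
        return False
    shutil.copy2(snap, target)  # copy the last known-good source back
    restart()
    notify(f"rollback: reverted to {snap.name} (failed: {failed_title})")
    return True
```

Because the snapshot is taken before each write, the newest snapshot is the version that was running before the failed change.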

Rail 6: restart-cascade breaker

The worst case rails 1-5 can't catch: a self-modify that passes parse-check, deploys cleanly, looks healthy for the first 90 seconds, then crashes 5 minutes in. Watchdog rolls back. Loop fires again 25 minutes later, ships the same flawed change. Crashes again. Forever.

The breaker watches container restart frequency. If jarvis-cloud restarts 4+ times in 60 minutes, the eternal loop is auto-paused. It requires manual re-arming via POST /api/eternal/enable after a human investigates.

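def _record_boot_and_check_breaker():
    # coalesced_boots_in_last_hour: boot timestamps from the past 60 minutes,
    # with boots less than 30 seconds apart merged into one (built elsewhere).
    return len(coalesced_boots_in_last_hour) >= 4  # True means the breaker tripped

# In eternal_state.py:
import os

# _RESTART_CASCADE_DETECTED is assumed to hold the breaker result at import time.
_ETERNAL_STATE = {
    "enabled": (
        os.environ.get("JARVIS_DISABLE_ETERNAL") != "1"
        and not _RESTART_CASCADE_DETECTED
    ),
}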

The breaker also coalesces boots within 30 seconds of each other — Python subprocess imports during diagnostic scripts look like restarts but aren't, so we don't false-trigger.
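
A self-contained sketch of the coalescing and the threshold check, with hypothetical names throughout. One detail the post leaves open is what "within 30 seconds of each other" is measured against; this version compares each boot to the previous raw timestamp, so an unbroken burst of imports collapses into a single boot. That choice is an assumption.

```python
from typing import List, Optional


def coalesce(boots: List[float], window_s: float = 30.0) -> List[float]:
    """Keep a boot only if it is more than window_s after the previous one."""
    kept: List[float] = []
    last: Optional[float] = None
    for t in sorted(boots):
        if last is None or t - last > window_s:
            kept.append(t)
        last = t  # compare against the previous raw boot, kept or not
    return kept


def breaker_tripped(
    boots: List[float],
    now: float,
    window_s: float = 30.0,
    horizon_s: float = 3600.0,  # 60 minutes
    limit: int = 4,             # 4+ restarts trips the breaker
) -> bool:
    """True when `limit` or more coalesced boots fall inside the horizon."""
    recent = [t for t in coalesce(boots, window_s) if now - t <= horizon_s]
    return len(recent) >= limit
```

Five imports within five seconds count as one boot under this scheme, while four restarts spaced ten minutes apart trip the breaker.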

What the loop actually does each cycle

Concretely, here's what fires every 30 minutes:

1. Look at recent error journal + Discord traffic + any active project from a previous cycle
2. Decide: continue the active multi-cycle project, or propose a new one
3. If new: call Claude (via Pro Max for zero marginal cost) to propose ONE specific improvement with a title, file, and patch
4. Run the patch through parse-check + snapshot + atomic write
5. Self-restart, verify health
6. If health OK: log to BUILD_NOTES, ping #alerts, sleep 30 min
7. If health bad: watchdog rolls back, eternal loop notes the failure, sleeps

It ships 1-6 improvements per day under normal conditions (the daily cap is 6, to prevent runaway changes). Most are small: better error handling, additional intent matchers, refactored handlers. A few times a week it ships something substantial.
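
The control flow of one cycle, including the daily cap, can be sketched like this. It is a simplified illustration: LoopState, run_cycle, and the three injected callables are hypothetical names, the proposal is modeled as a plain dict with the title, file, and patch fields mentioned above, and the logging and alerting steps are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

DAILY_CAP = 6  # maximum shipped changes per day, from the post


@dataclass
class LoopState:
    shipped_today: int = 0
    failures: List[str] = field(default_factory=list)


def run_cycle(
    state: LoopState,
    propose: Callable[[], Optional[Dict[str, str]]],
    apply_patch: Callable[[Dict[str, str]], str],
    restart_and_check: Callable[[], bool],
) -> str:
    """Run one cycle and return a short status string."""
    if state.shipped_today >= DAILY_CAP:
        return "skipped: daily cap reached"
    proposal = propose()
    if proposal is None:
        return "skipped: nothing to propose"
    result = apply_patch(proposal)  # parse-check + snapshot + atomic write
    if result.startswith("refused"):
        state.failures.append(proposal["title"])
        return result
    if restart_and_check():
        state.shipped_today += 1
        return f"shipped: {proposal['title']}"
    # Health stayed bad: the watchdog rolls back, the loop only records it.
    state.failures.append(proposal["title"])
    return f"rolled back: {proposal['title']}"
```

Checking the cap before calling propose() means a capped day costs no model calls at all.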

Failure modes we hit before getting it right

The mounting bug

For two weeks, self_modify_code returned success strings but the file never actually changed. Reason: the Docker compose file mounted /app/jarvis.py with :ro, a read-only mount. Every write failed silently with errno 30, but the tool wrapper reported "applied" because the success path didn't check the result of the write.

Fix: changed :ro → no suffix in docker-compose, AND added an explicit write-then-verify-checksum step. The anti-fab counter now catches this failure through a new marker pattern ("[Errno 30] read-only file system" → status=error).
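
A write-then-verify-checksum step might look like the sketch below. The function name and the returned status strings are hypothetical; the point is that "applied" is only returned after the bytes on disk have been read back and compared against what was meant to be written.

```python
import hashlib
from pathlib import Path


def write_and_verify(path: Path, new_source: str) -> str:
    """Write new_source to path and confirm the bytes on disk match."""
    data = new_source.encode("utf-8")
    expected = hashlib.sha256(data).hexdigest()
    try:
        path.write_bytes(data)
    except OSError as e:
        # A read-only mount surfaces here as [Errno 30] on Linux.
        return f"error: write failed: {e}"
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    if actual != expected:
        return "error: checksum mismatch after write"
    return "applied"
```

The read-back also covers cases where no exception is raised but the file still does not hold the new content.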

The OAuth-token blind spot

For another two weeks, the Pro Max subscription routing "wasn't working." Every smart query was paying OpenRouter per token. The bridge to the user's laptop was offline (correctly — there was no laptop), and the cloud-local OAuth path existed but the pre-flight auth check only scanned files in /root/.claude, never the CLAUDE_CODE_OAUTH_TOKEN env var. The token was right there. The check just didn't look.

One-line fix. Lost weeks. Caught when we added the audit script that probes every site claim against live state.
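
The shape of the fix can be sketched as a pre-flight check that looks at the environment before the filesystem. This is a guess at the structure, not the actual patch: find_auth_source is a hypothetical name, and treating any file in the config directory as a credential is a placeholder assumption, since the post does not say which files the original check scanned for.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_auth_source(
    env: Mapping[str, str] = os.environ,
    config_dir: Path = Path("/root/.claude"),
) -> Optional[str]:
    """Return a label for the first credential source found, else None."""
    # The source the original check never looked at.
    if env.get("CLAUDE_CODE_OAUTH_TOKEN"):
        return "env:CLAUDE_CODE_OAUTH_TOKEN"
    # Placeholder rule (assumption): any file in the directory counts.
    if config_dir.is_dir() and any(p.is_file() for p in config_dir.iterdir()):
        return f"files:{config_dir}"
    return None
```

Returning a label rather than a boolean makes it easy for an audit script to report which path was actually used.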

The breaker false-trip

Restart-cascade breaker was too eager. We were running diagnostic Python scripts that imported the eternal_state module, which (because we record a boot per import) counted each as a restart. After a busy debugging session, the breaker tripped — eternal loop paused with the misleading "4 real restarts in 60 minutes" message, no actual restarts having occurred.

Fix: coalesce boot records within 30 seconds. Subprocess imports are instant; real container restarts take 20+ seconds. We only count restarts that pass the time gate.

Why this matters for the product

If we get the rails right, the eternal loop adds roughly 30-50% to the shipping velocity of a 1-person team. It's the difference between shipping like a small startup and shipping like a medium one.

If we get the rails wrong, the loop breaks the product worse than a human ever would, and faster.

The rails matter more than the loop. The loop is the bet; the rails are the contract that makes the bet survivable.