How I run dev work with Claude, on autopilot and at the keyboard

A few of you asked how my setup works, so here it is end to end. It runs in two modes that share one quality bar. On autopilot, a cron scheduler wakes a Claude Code session every hour to pull a GitHub issue and work it in an isolated git worktree. Before any pull request reaches me, it's been read by a second model and a separate coach agent whose only job is to ask whether the work is good. Hands-on, I'm at the keyboard and we run heavier multi-agent reviews together on bigger changes. Either way, the Codex cloud connector reviews the PR, I have the final say, and I get a Telegram message when something's done.

Quality comes from stacking independent checks. No single model has to be brilliant; the checks just have to be independent enough to disagree. Stack a few and a single miss gets caught in review, before it can become a bad merge.

  • 2 modes: autopilot + hands-on
  • 4 models, 0 direct API calls
  • hourly scheduled wakes, 7 a.m. to 9 p.m.
  • 1 human merge gate (me)

Working notes · Center for Cooperative Media · updated May 2026

Not a developer? Start with the plain-language key below — it defines every term the page leans on. None of this is as complicated as the words make it sound.

Start here — the words I use

This is a developer setup, but the ideas behind it are simple. If a term below is unfamiliar, here's the plain version. The rest of the page builds on these.

issue
A to-do item logged on GitHub, the website where the code lives. The list of open issues is the work queue.
pull request
A finished change, bundled up and held for review. Nothing in it goes live until it's approved. Usually shortened to PR.
worktree
A separate, private copy of the project's files. Each session gets its own, so two can run at the same time without overwriting each other.
session
One run of Claude Code working on a task, from start to handoff.
model
An AI system. This setup uses four: Claude writes the code, Codex reviews it, Copilot breaks ties, and Gemini sorts email.
commit / merge
To commit is to save a set of changes as one step. To merge is to accept those changes into the live version of the code.
cron
A built-in timer that runs a job automatically at set times, with no one watching. The thing that wakes a session every hour.
CLI
A command-line tool: you run it by typing a command, instead of clicking through an app or website.
context window
How much text a model can hold in mind at once. It's limited, and the work gets worse as it fills up.
compaction
Summarizing a long session down to its essentials, to free up room in the context window.
reasoning effort
A setting for how hard a model thinks before it answers. More effort costs more and runs slower, and isn't always better.
01

The big loop

autopilot

The autonomous mode runs as one cycle. Open issues are the queue. A scheduled session pulls one, does the work in a worktree, runs it through the pre-PR review chain, opens a pull request, and pings me. I review on my phone and either approve the merge or send it back. Then the loop closes, and the next wake starts it again.

Fig. 1 — the autonomous loop

flowchart TD
    issues["Open GitHub issues<br/>the shared to-do list"]
    pick["Scheduled wake<br/>picks one item"]
    tree["Worktree session<br/>does the work"]
    gates["Pre-PR review chain<br/>codex then coach"]
    pr["Open pull request"]
    connector["Codex connector reviews<br/>on GitHub"]
    ping["Telegram:<br/>done / PR opened"]
    joe{"I review<br/>on my phone"}
    merge["Merge"]

    issues --> pick --> tree --> gates --> pr --> connector --> ping --> joe
    joe -->|approve| merge --> issues
    joe -->|needs changes| tree

    class joe human
    class merge done
    classDef human fill:#e4002b14,stroke:#e4002b,stroke-width:1.5px
    classDef coach fill:#e4002b26,stroke:#e4002b,stroke-width:2px
    classDef done fill:#e4002b,stroke:#e4002b,color:#ffffff
          
Legend: unshaded = a step; shaded = my decision, the coach, or the merge

02

When sessions wake up

autopilot

Two Raspberry Pis (small, inexpensive, low-power computers) do the work: a small services Pi runs the scheduler, the Telegram bot, and the notifications poller; a second Pi handles heavier compute. Cron on the services Pi drives everything.

  • Scheduled check-ins — a wake fires every hour from 7 a.m. to 9 p.m. This is the one that pulls an issue and progresses it.
  • Event wakes — every 15 minutes a lighter pass drains a queue of things that just happened: a new meeting transcript, an email from me, a forwarded message, a Slack mention.
  • The notifications poller — also every 15 minutes, it checks email, shared Drive files, and 38 Slack channels, and writes anything interesting into that queue.
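
The three timers above come down to one question: given the current time, which jobs fire? Here is a minimal Python sketch of that decision. It assumes the 9 p.m. wake is inclusive and that both 15-minute jobs fire on the quarter hour around the clock; `jobs_due` and the job names are hypothetical, and the setup described here uses cron entries rather than code like this.

```python
from datetime import datetime


def jobs_due(now: datetime) -> list[str]:
    """Return the (hypothetical) job names that would fire at `now`."""
    due = []
    # Hourly check-in: top of the hour, 7 a.m. through 9 p.m.
    # Treating 9 p.m. as inclusive is an assumption.
    if now.minute == 0 and 7 <= now.hour <= 21:
        due.append("checkin-wake")
    # Poller and event wake: every 15 minutes (all-day coverage is assumed).
    # The poller is listed first because it feeds the queue the event wake drains.
    if now.minute % 15 == 0:
        due.extend(["notifications-poller", "event-wake"])
    return due
```

Listing the poller ahead of the event wake mirrors the flow in the text, where the poller writes into the queue and the event wake drains it.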

A wake is a Claude Code session in a tmux pane (a terminal that keeps running on the Pi after I disconnect), at high reasoning effort with a hard timeout (a time limit that ends a session if it runs too long). The session picks its work, does 15 to 30 minutes of it, then hands off: a Telegram summary, a work-log doc, and any board updates.

Fig. 2 — wake and event lifecycle

flowchart TD
    cron["cron on the services Pi"]
    checkin["check-in wake<br/>hourly, 7 a.m. to 9 p.m."]
    events["event wake<br/>every 15 min"]
    notif["notifications poller<br/>every 15 min<br/>email, Drive, 38 Slack channels"]
    queue["pending-events queue"]
    spawn["tmux Claude session<br/>Opus, high effort,<br/>hard timeout"]
    prog["progression picker<br/>shortlists 3 candidates"]
    work["15 to 30 min of real work"]
    handoff["handoff:<br/>Telegram, work-log doc,<br/>board update"]

    cron --> checkin
    cron --> events
    cron --> notif
    notif --> queue --> events
    checkin --> spawn
    events --> spawn
    spawn --> prog --> work --> handoff

    class handoff done
    classDef done fill:#e4002b,stroke:#e4002b,color:#ffffff
          

Fig. 2b — what a finished wake sends me, names and projects redacted

Telegram wake handoff

Claude check-in wake complete  (18 min)

Midday wake (Fri May 29) — I'm out.

Progressed issue #18 in a repo (tech-debt): a freshness check was flagging years buried in HTML attributes (upload paths, CSS classes, script bodies) as false-positive findings. Fixed it with a parser that reads reader-facing text only (element text + alt / title / aria-label). Live re-audit on a section: 6 false positives down to 2 true ones. PR #61, 17 new tests, 131 green, lint clean. Awaiting the cloud review + my merge.

New since last check-in: Google Docs comments on a draft doc from two colleagues — FYI. No new emails.

codex: 1 finding / 1 fixed / 2 clean passes  ·  coach: applied

Always wrap the session in timeout --foreground

Plain timeout puts the command in its own background process group, and in my setup that killed the Node process tree before it produced any output: zero-byte result files and silent multi-day outages that look like nothing ran. The --foreground flag keeps the session attached to the terminal instead. Alongside the hard timeout, an idle-killer watches CPU time in the process subtree, so long, quiet tool phases (a slow API call, a browser run) don't trip a false "this session is stuck."
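
A minimal sketch of that wrapper in Python, for anyone launching sessions from a script. Only `timeout --foreground` comes from the setup described above; the 45-minute limit and the names `build_wake_command` and `run_wake` are assumptions for illustration.

```python
import subprocess


def build_wake_command(session_cmd: list[str], limit: str = "45m") -> list[str]:
    """Prefix a session command with `timeout --foreground <limit>`."""
    if not session_cmd:
        raise ValueError("session_cmd must not be empty")
    # --foreground is the flag the text calls for; the limit is an assumed value.
    return ["timeout", "--foreground", limit, *session_cmd]


def run_wake(session_cmd: list[str], limit: str = "45m") -> int:
    """Run the wrapped session and return its exit status.

    timeout exits with status 124 when the limit is hit, which makes a
    timed-out wake easy to tell apart from one that failed on its own.
    """
    return subprocess.run(build_wake_command(session_cmd, limit)).returncode
```

Keeping the command builder separate from the runner means the wrapper can be checked without launching a session.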

03

What each session is told

autopilot

The prompt is assembled fresh each wake. A picker shortlists three candidates — an open GitHub issue, a backlog item, or a wildcard — and the session commits to one. Then comes a stack of shared instruction blocks: how to write a work-log doc, how to update the board and CRM safely (my task tracker and contacts list), an SMTP-only email rule (send through a standard mail server, never a model's own email feature), and how to leave a Telegram summary.
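
The assembly step can be sketched in a few lines of Python. This is an illustration under assumptions, not the picker itself: it draws one candidate from each pool, and `shortlist`, `assemble_prompt`, and the wording of the header are all hypothetical.

```python
import random


def shortlist(issues, backlog, wildcards, rng=random):
    """Pick one candidate per pool (an assumption); empty pools are skipped."""
    pools = {"issue": issues, "backlog": backlog, "wildcard": wildcards}
    return [(kind, rng.choice(items)) for kind, items in pools.items() if items]


def assemble_prompt(candidates, blocks):
    """Build the wake prompt: the shortlist first, then the shared blocks."""
    lines = ["Pick ONE of these candidates and commit to it:"]
    for n, (kind, title) in enumerate(candidates, start=1):
        lines.append(f"{n}. [{kind}] {title}")
    # Shared instruction blocks (work-log, board/CRM, email, Telegram) follow as-is.
    return "\n\n".join(["\n".join(lines), *blocks])
```

Because the blocks are plain strings appended in order, changing one rule means editing one block, not every prompt.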

A quality-bar check brackets the work. Before it starts, the session has to name the single best move that will make the output better than a rote pass — a fact to verify instead of assert, an earlier work-log to build on instead of repeating it, or a claim to test empirically. Before it wraps up, it re-reads what it produced and confirms it made that move.

How an autonomous session proves it did the work

The sessions commit under my git identity, so I needed a way to tell "a wake did this" from "I did this." Each wake carries a receipt token like wake-20260529T1405-a1b2c3. A verifier later confirms the session progressed its item by finding that token in a commit, a PR body, or an issue comment — and records a verdict (shipped an artifact, changed state, blocked on me, and so on). No token, no credit, so the record of what each wake moved stays accurate.
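
A minimal sketch of the receipt idea in Python. The token shape is inferred from the single example above (timestamp plus six hex characters); `new_receipt`, `verify`, and the artifact names are hypothetical, and the real verifier's verdict logic is not shown.

```python
import re
import secrets
from datetime import datetime

# Shape inferred from the example token wake-20260529T1405-a1b2c3.
TOKEN_RE = re.compile(r"wake-\d{8}T\d{4}-[0-9a-f]{6}")


def new_receipt(now: datetime) -> str:
    """Mint a receipt token: wake-<YYYYMMDD>T<HHMM>-<6 random hex chars>."""
    return f"wake-{now:%Y%m%dT%H%M}-{secrets.token_hex(3)}"


def verify(token: str, artifacts: dict[str, str]) -> list[str]:
    """Return the names of artifacts (commit, PR body, comment) that carry the token."""
    if not TOKEN_RE.fullmatch(token):
        raise ValueError(f"not a receipt token: {token!r}")
    return [name for name, text in artifacts.items() if token in text]
```

An empty list from `verify` is the "no token, no credit" case: the wake gets no credit for the item.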

04

Hands-on sessions

at the keyboard

Not everything runs on a timer. When I'm at the keyboard the same machinery is there, but I'm in the loop in real time — so we can take on bigger, riskier changes than an unattended wake should, and I can steer mid-flight.

  • superjawn for anything non-trivial. Bigger changes go through superjawn, my fork of superpowers (a workflow add-on for Claude Code) that forces a research phase before any brainstorming or planning, so the work starts from evidence. It writes the spec before touching code and I approve it; ambiguous specs produce ambiguous code, so the spec is where the time goes.
  • Fan out. Instead of one session grinding through a problem, I have it spawn several subagents in parallel (extra AI helpers the main session launches, each on one piece of the problem) — one exploring the codebase, one running a codex review, one acting as the coach, one driving the Copilot CLI — then synthesize what they bring back. More compute on the same problem, and independent passes that catch each other's misses.
  • Reviews on demand. The codex and coach passes from the autonomous flow are available here too, and I can fire a Copilot CLI review on a single fix without spending one of the once-per-PR cloud reviews.
  • Verify before "done." Nothing gets called finished on a claim. The session has to prove it — run the test and show the output, diff the behavior against main, demonstrate the fix fires. "It should work now" is not done.
  • I'm the gate at every step — the plan, the fan-out, the diff, the merge — which is what makes the bigger changes safe to attempt.
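
The fan-out habit in the list above has a simple shape: run several helpers on the same task in parallel, then merge what they return. A minimal Python sketch, with plain callables standing in for subagents; `fan_out`, `synthesize`, and the role names are hypothetical, and real subagents are launched by the main session, not by a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(task: str, subagents: dict) -> dict:
    """Run every subagent callable on the same task in parallel; collect results by role."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(subagents))) as pool:
        futures = {role: pool.submit(fn, task) for role, fn in subagents.items()}
        for role, future in futures.items():
            try:
                results[role] = future.result()
            except Exception as exc:
                # One failed helper should not sink the other passes.
                results[role] = f"FAILED: {exc}"
    return results


def synthesize(results: dict) -> str:
    """Merge the independent findings into one report for the human to read."""
    return "\n".join(f"[{role}] {finding}" for role, finding in results.items())
```

Recording a failure as a finding keeps the report honest about which pass did not run.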

Fig. 3 — hands-on multi-agent fan-out

flowchart TD
    me["Me, at the keyboard"]
    plan["Main session<br/>superjawn: research first"]
    sub1["subagent:<br/>explore / research"]
    sub2["subagent:<br/>codex review"]
    sub3["subagent:<br/>coach (quality)"]
    sub4["subagent:<br/>Copilot CLI"]
    synth["Main session<br/>synthesizes findings"]
    decide{"I decide<br/>in real time"}

    me --> plan
    plan -->|spawn in parallel| sub1
    plan -->|spawn in parallel| sub2
    plan -->|spawn in parallel| sub3
    plan -->|spawn in parallel| sub4
    sub1 --> synth
    sub2 --> synth
    sub3 --> synth
    sub4 --> synth
    synth --> decide
    decide -->|looks good| plan
    decide -->|push back| plan

    class me,decide human
    class sub3 coach
    classDef human fill:#e4002b14,stroke:#e4002b,stroke-width:1.5px
    classDef coach fill:#e4002b26,stroke:#e4002b,stroke-width:2px
          
Legend: unshaded = a step or subagent; shaded = me, my decision, or the coach

Same checks, different control loop

Autopilot and hands-on share the same reviewers and the same merge discipline. What changes is who closes the loop: when I'm away, a Telegram approve/deny on a finished PR; at the keyboard, me approving the plan, the fan-out, and the diff as we go. Both modes run through the next section.

When the work is a UI, design gets the same treatment

Interfaces have their own AI-slop failure mode, so design gets the same treatment as code. When a session builds a page, it works against an explicit design rulebook: a distinctive typeface over the default sans, one committed aesthetic over a timid mix, and a list of template tells to avoid — the accent stripe down the edge of a card, the kicker label floating above a heading, the lone gradient on plain text. This page went through that pass. Same idea as the code reviews: name the generic defaults up front so the work clears a higher bar than "it renders."

05

Three reviewers check one change

both modes

The review stack is the core of the setup, and it runs in both modes. Work gets done with Claude Opus at high reasoning. Before anything is committed, a different model reads the change.

First, Codex reviews the uncommitted diff — the exact lines the change adds or removes, before they're saved — for bugs, security, swallowed errors, and style, in at most two passes: one to surface findings, at most one more to confirm the fixes. The effort is matched to the risk — low for a trivial, well-tested diff; high for a non-trivial or security-touching one. High-severity findings get fixed in place; anything that would balloon the scope becomes its own issue instead.

Then an independent coach: a separate Claude subagent at high reasoning that judges quality. Codex already covered correctness, so the coach reads the diff and answers two questions — would this impress a sharp reviewer or land as competent but forgettable, and what single change would raise it most? It runs even when Codex is down, and skips itself when there's no code to review.

Only then does the change get committed and the PR opened, where the Codex cloud connector reviews it again, independently, on GitHub — reasoning about the live repo, not just the diff. The last gate is me.
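
The pre-PR half of that chain can be sketched as control flow. This is a minimal Python illustration under assumptions: the reviewers are passed in as plain callables, a Codex outage surfaces as `RuntimeError`, and `pre_pr_chain` and its summary keys are hypothetical names.

```python
def pre_pr_chain(diff, codex_review, coach_review, apply_fixes, max_passes=2):
    """Run the Codex passes (capped), then the coach; return a short summary."""
    summary = {"codex_passes": 0, "codex": "skipped", "coach": "skipped"}
    if not diff.strip():
        return summary  # no code to review, so both reviewers skip themselves
    try:
        for _ in range(max_passes):  # capped, not run to convergence
            findings = codex_review(diff)
            summary["codex_passes"] += 1
            if not findings:
                summary["codex"] = "clean"
                break
            diff = apply_fixes(diff, findings)  # fix in place, then re-check once
            summary["codex"] = "fixed"
    except RuntimeError:
        summary["codex"] = "unavailable"
    # The coach runs even when Codex is down.
    summary["coach"] = coach_review(diff)
    return summary
```

The coach call sits outside the `try` on purpose, so a Codex outage never cancels the quality pass.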

A second model reviews the high-reasoning model's work before I open the PR — then a coach asks the question neither bug-checker does: is this good?

Match the reviewer's effort to the risk

Codex reviews at an effort set by the change, not a fixed default. A trivial, well-tested diff gets a low-reasoning pass: it's far lighter on tokens, and it keeps the reviewer from overthinking — turn a code reviewer up too high on a simple change and it starts inventing problems, rewriting things that were already fine, and burying the two findings that matter under a pile of speculation. A non-trivial or security-touching change gets a high-reasoning pass instead, where the stronger judgment is worth the cost and is usually right in one shot. Either way it's capped at two passes, not run to convergence. The model writing the code and the coach judging it always run at high reasoning.
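
As a rough illustration, the effort rule fits in one small function. The 40-line threshold and the name `review_effort` are assumptions; the text above gives only the two levels and the factors that pick between them.

```python
def review_effort(lines_changed: int, has_tests: bool, touches_security: bool,
                  trivial_max_lines: int = 40) -> str:
    """Pick the reviewer's reasoning effort from the risk of the change."""
    if touches_security:
        return "high"  # security-touching changes always get the stronger pass
    if lines_changed <= trivial_max_lines and has_tests:
        return "low"   # trivial and well tested: a fast, literal read
    return "high"      # anything else counts as non-trivial
```

Defaulting to "high" when in doubt matches the rule's intent: only a diff that is both small and tested earns the light pass.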

Fig. 4 — the pre-PR review chain

flowchart TD
    work["Claude Opus, high effort<br/>writes the change"]
    codex["Codex review<br/>via CLI, effort by risk<br/>bugs, security, style"]
    decide{"high or medium<br/>findings?"}
    coach["Coach subagent<br/>Claude, high reasoning<br/>is this good?"]
    commit["commit + push"]
    pr["open pull request"]
    connector["Codex cloud connector<br/>reasons about live repo"]
    gate{"my merge gate"}
    merge["merge"]

    work --> codex --> decide
    decide -->|yes, fix in place| work
    decide -->|clean or 2 passes| coach --> commit --> pr --> connector --> gate
    gate -->|approve| merge
    gate -->|needs changes| work

    class decide,gate human
    class coach coach
    class merge done
    classDef human fill:#e4002b14,stroke:#e4002b,stroke-width:1.5px
    classDef coach fill:#e4002b26,stroke:#e4002b,stroke-width:2px
    classDef done fill:#e4002b,stroke:#e4002b,color:#ffffff
          
Legend: unshaded = a step; shaded = my decision, the coach, or the merge

Stacked end to end, that's five independent checks before anything lands — each one looking for something the others don't:

  • self · Quality-bar pass: The session names and makes one high-impact improvement.
  • Codex · Correctness: Bugs, security, swallowed errors, style.
  • Coach · Quality: Is this good, or just done?
  • Codex connector · Independent: Fresh eyes on the live repo, on the PR.
  • me · Merge gate: Nothing ships without my yes.

Why two reviewers instead of one stronger one
  • Different blind spots. The model that wrote the code is the worst judge of it. A different model, and especially a different vendor, fails differently — so it catches things the author rationalized.
  • Different jobs. A correctness pass and a high-reasoning "is this good" pass are not the same review. Asking one model to do both at once gets you a worse version of each.
  • It's nearly free. Codex runs on my ChatGPT subscription and the coach runs on my Claude subscription — both through CLIs, no metered API calls. Adding a reviewer costs latency, not dollars.
06

Issues are the to-do list

both modes

I abuse GitHub issues on every repo. They're the queue the scheduler pulls from, and they're how out-of-scope work gets captured without derailing the change in front of me. When a session notices a problem outside its current task — a bug it tripped over, a follow-up a reviewer flagged as scope creep — it doesn't widen the diff. It files an issue right there, with a finding, the impact, and a suggested fix, and moves on. That issue becomes a future session's work.

It's the most useful habit in the whole setup. Small PRs stay small and reviewable, nothing gets lost, and the backlog builds itself from real work.

Fig. 5 — issues as the to-do list

flowchart TD
    me["I file issues<br/>by hand"]
    backlog["Open issues"]
    session["Session works one item"]
    oos{"Spot work outside<br/>this change?"}
    file["gh issue create<br/>finding / impact / fix"]
    pr["PR closes the issue"]

    me --> backlog
    backlog --> session --> oos
    oos -->|no| pr
    oos -->|yes| file --> backlog
    pr --> backlog

    class oos human
    class pr done
    classDef human fill:#e4002b14,stroke:#e4002b,stroke-width:1.5px
    classDef done fill:#e4002b,stroke:#e4002b,color:#ffffff
          
One failure mode the review stack caught

A couple of times, a session picked up an issue I'd already half-finished and never closed, and redid work that was mostly done. Each time, the independent review on the PR caught the duplication before I merged. That's why the reviewers are stacked and independent: when something slips past one layer, the next one catches it as a comment on a PR, before it reaches main.

07

Worktrees keep sessions out of each other's way

both modes

All work happens in git worktrees. Each session gets its own checkout of the repo on its own branch, so two sessions can touch the same repo at the same time without stepping on each other, and the main checkout always stays clean. When a session is done, its branch becomes a PR and the worktree goes away.
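
A minimal Python sketch of the worktree lifecycle as two git commands. The branch naming follows the `fix/issue-142` pattern in Fig. 6; the sibling-directory layout and the name `worktree_commands` are assumptions.

```python
from pathlib import PurePosixPath


def worktree_commands(repo_root: str, issue: int, kind: str = "fix", base: str = "main") -> dict:
    """Return the git commands that create and later remove one session's worktree."""
    root = PurePosixPath(repo_root)
    branch = f"{kind}/issue-{issue}"
    # Assumed layout: the worktree sits next to the main checkout.
    path = str(root.parent / f"{root.name}-wt-{issue}")
    return {
        # New branch off `base`, checked out in its own directory.
        "create": ["git", "-C", str(root), "worktree", "add", "-b", branch, path, base],
        # Run once the branch has become a PR.
        "remove": ["git", "-C", str(root), "worktree", "remove", path],
    }
```

Each session gets a distinct path and branch, so two sessions never share a working directory.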

Fig. 6 — worktree isolation

worktree A · session 1
branch: fix/issue-142
edit, review, commit
PR #142
worktree B · session 2
branch: feat/issue-150
edit, review, commit
PR #150
both branch from the same repo · main checkout stays clean, no collisions
One sharp edge with worktrees and automated review

If a worktree's branch is behind main (missing changes that have since gone live), an automated reviewer reading "the diff" can see the PR's real changes plus the reverse of every sibling change that already merged into main — and flag those reversals as new bugs. The fix is to rebase the worktree onto current main (replay its changes on top of the latest code) before running the review, so the reviewer only sees what the PR changes.
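
That check can run as a guard before every local review. A minimal Python sketch, assuming the remote is named `origin` and the live branch is `main`; `rebase_if_behind` and the injectable `run` parameter are hypothetical, and conflict handling is left out.

```python
import subprocess


def git(args: list[str], cwd: str) -> str:
    """Run a git command in `cwd` and return its trimmed stdout."""
    result = subprocess.run(["git", *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def rebase_if_behind(worktree: str, base: str = "origin/main", run=git) -> int:
    """Rebase the worktree onto `base` when it is missing commits; return how many."""
    run(["fetch", "origin"], worktree)
    # Commits reachable from base but not from HEAD, i.e. how far behind we are.
    behind = int(run(["rev-list", "--count", f"HEAD..{base}"], worktree))
    if behind:
        run(["rebase", base], worktree)
    return behind
```

A return value of zero means the reviewer's diff already shows only what the PR changes.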

08

Who does what

both modes

Four models, each on a job that fits it, and a hard rule underneath: no direct API calls — the pay-as-you-go way of calling a model, billed by the word. Everything runs through a CLI on a subscription I already pay for, so adding a model to the pipeline doesn't add a metered bill.

Fig. 7 — who does what

writer

Claude Opus runs the sessions and the coach subagent — the model doing the work, and a separate instance judging it.

correctness

Codex reviews the uncommitted diff before commit, at an effort set by the change — low for a simple diff, high for a risky one. Runs on my ChatGPT subscription.

independent

The Codex cloud connector reviews the PR on GitHub once it opens, reasoning about the live repo state, not just the diff — a second look from a different surface than the pre-PR pass.

support

Gemini handles the cheap, high-volume jobs: triaging email and returning structured output (answers in a fixed, machine-readable format) from the CLI.

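Model                   | Role                            | Invoked via                 | Marginal cost
------------------------|---------------------------------|-----------------------------|--------------------------------
Claude Opus             | Sessions + coach review         | claude CLI                  | subscription
Codex                   | Pre-PR review, effort by risk   | codex exec CLI              | none (OAuth)
Codex (cloud connector) | Independent PR review           | chatgpt-codex-connector bot | subscription (weekly allowance)
Gemini                  | Email triage, structured output | gemini CLI                  | subscription
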
Each repo teaches its own reviewers

The reviews are grounded in each repo. Every repo carries a per-repo CLAUDE.md that spells out the project's architecture, the patterns to enforce, and the specific mistakes to watch for — concrete, repo-specific bugs like a missing query limit, a NaN write, a wrong dependency arrow, or a factual error in the docs. The Claude sessions read it on every wake, and the cloud connector reasons about the live repo on GitHub rather than the diff alone — so the review comes back shaped by that repo's context. The project carries its own standards, and every model that works in it inherits them.

The cloud connector fires once when the PR opens and reasons about live repo state. It's subscription-billed — no metered minutes — but it still spends the ChatGPT plan's finite weekly and 5-hour allowance, so I treat it as a once-per-PR resource: I address what it raises in one fix round and don't re-trigger it. The local passes — the Codex CLI review and the coach — do the iterating before the PR ever opens.

09

Context and pre-compaction prep

both modes

Long sessions degrade as the context window fills — the model gets measurably worse well before it runs out of room. I call the last stretch the dumb zone and try to never work in it. Three habits keep sessions sharp.

First, I compact early and on purpose. Autocompact is set to trigger around 300K tokens (the units models count text in, each one roughly three-quarters of a word), and when I hit a natural stopping point I run a manual compact with a note about what we're doing next — so the summary that survives is built around the next task, not whatever happened to be on screen.

Second, before a planned compaction I have the session write a durable handoff file to disk — the task, the decisions made so far, the exact file paths, what's done and what's next. Compaction is lossy: the auto-summary keeps what the model guesses matters, and you can't predict what it drops. A file is byte-exact, and the first step after compacting is to re-read it, which restores full fidelity. This page was built across a compaction exactly that way.
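
A minimal Python sketch of such a handoff file. The section headings and the names `write_handoff` and `read_handoff` are assumptions; the fields are the ones listed above (task, decisions, file paths, done, next).

```python
from pathlib import Path


def _section(title: str, items: list[str]) -> str:
    """Render one bulleted section; an empty list is marked so the gap is visible."""
    body = "\n".join(f"- {item}" for item in items) or "- (none)"
    return f"## {title}\n{body}\n"


def write_handoff(path, task, decisions, files, done, next_steps) -> str:
    """Write the handoff to disk before compacting and return the text written."""
    text = "\n".join([
        "# Handoff",
        "",
        f"## Task\n{task}\n",
        _section("Decisions so far", decisions),
        _section("Exact file paths", files),
        _section("Done", done),
        _section("Next", next_steps),
    ])
    Path(path).write_text(text, encoding="utf-8")
    return text


def read_handoff(path) -> str:
    """First step after compaction: re-read the file exactly as written."""
    return Path(path).read_text(encoding="utf-8")
```

Reading the file back returns exactly what was written, which an auto-summary cannot promise.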

Third, I keep the always-loaded project instructions lean. The CLAUDE.md in each repo gets condensed to pointers — the anti-slop writing rules and the core engineering principles stay in full because they apply every session, but changelogs and old handoffs move to linked files. Every token of standing instructions is a token spent on every session, so the file earns its length or it gets trimmed.

[Context gauge: fresh context → working well → degrading → full. Zones: sharp, slipping, the dumb zone. Marker: compact here, before the dumb zone.]

Fig. 8 — the pre-compaction handoff

  • Step 1 · Reach a stop: Finish a coherent chunk, don't compact mid-thought.
  • Step 2 · Write the handoff: Task, decisions, file paths, next steps, to disk.
  • Step 3 · Compact with a note: Tell it what we're doing next.
  • Step 4 · Resume from the file: Re-read the handoff first; fidelity restored.

10

Steal this

If you want to build something like it, the parts that matter are simpler than the wiring suggests — and they hold whether you're running on a timer or sitting at the keyboard:

  • Make a queue you trust. Issues work because they're already where the work lives. The scheduler just pulls from them one at a time.
  • Pull one item per session. Small, bounded work produces small, reviewable PRs.
  • Isolate every session. Worktrees mean concurrency never corrupts your main checkout.
  • Put at least one independent reviewer between the writer and main. Use a different model than the one that wrote the code; a different vendor is better still. Stack a couple and the cost of a miss drops to a comment.
  • Match the reasoning level to the task. High for writing and judgment; low for reviewing a simple, well-tested diff, so the review stays fast and literal; high again when the diff is risky or touches security.
  • Prove "done," don't claim it. A passing test, pasted output, or a before/after diff beats "this should work now." Make verification a step the session has to run.
  • Respect the context window. Compact before the dumb zone, and write a handoff to disk before you do.
  • Keep one human gate. Everything is built to make that gate a quick yes — from my phone when I'm away, or step by step when I'm there.
Next thing I want to add

An idea from Lars worth stealing: pipe the week's trending GitHub repos into the session brief, filtered for relevance, so a session reaches for an existing tool instead of reinventing one. Not wired up yet — but it's next on the list.

11

Prompts you can borrow

None of this depends on my exact stack. The principles above are instructions you give an agent, so here they are as text — paste them into whatever you use, Claude or Codex or something else, and adapt. They're tool-agnostic and loose on purpose; the shape carries the value. Take what fits, change the rest.

01  the quality bar
Before you start, tell me the single most useful thing you could do to make this better than a rote pass — a fact to verify instead of assert, an earlier file or note to build on, an approach worth trying. Do that thing. Before you finish, re-read what you produced and confirm you did it. If the task is trivial, say so and skip this rather than manufacturing busywork.
02  an independent review before commit
When you finish a change, don't commit yet. First have a different model read the diff for bugs, security holes, and swallowed errors. Match its effort to the risk — low reasoning for a simple, well-tested diff so it stays a fast, literal read instead of a rewrite that invents problems; higher for a complex or security-touching one. Fix anything serious in place; turn anything that would balloon the scope into a separate ticket. Then commit.
03  a coach pass for quality, not bugs
After the correctness review, do one more pass with fresh eyes whose only question is quality, not bugs: would this impress a sharp reviewer, or land as competent but forgettable — and what single change would raise it most? Make that change if it's cheap. Skip this entirely if there's nothing of substance to review.
04  out-of-scope work becomes a ticket
If you notice a problem outside the task in front of you, do not widen the change to fix it. Write it down as a tracked issue — what you found, why it matters, and a suggested fix — and keep the current change small and focused. That note becomes future work, not scope creep in this one.
05  a handoff before you lose context
We're going to run low on context soon. Before that happens, write a handoff file to disk: the task, the decisions made so far, the exact file paths, what's done, and what's left. Be specific enough that a fresh session could resume from the file alone. When we pick this back up, read it first before doing anything else.
06  a standards file the whole repo inherits
Read this repo and draft a short standards file for it — a CLAUDE.md, or a reviewer-instructions file. Two lines on the architecture, the patterns to follow here, and the three mistakes most likely to happen in this codebase. Keep it tight: every line gets read on every task, so each one has to earn its place.
12

Get the kit

Most people I show this to didn't know it was a thing you could do — point an agent at your own GitHub issues and let it work them on a schedule, with the review and approval steps built in. So I packaged the autopilot half as a drop-in kit. You don't have to write the code or know Python: you hand two files to your own agent, answer its questions, and it builds the loop for your machine, whether that's a Mac, a Windows box, or Linux.

It's the careful version on purpose. What keeps autonomous work from turning into unreviewed output that quietly breaks something is bounded scope, a second model on every diff, and a person at the merge. The kit is built around those three, and it's the part most people ask about first.

You aim it as tightly as you want, then widen one ring at a time:

  • Which repos — and which one is the focus right now.
  • Which issues — a label like agent-ready you apply by hand is the tightest gate.
  • Which files — allow docs/** only, deny .github/workflows/, or both.
  • How big — cap the files and lines a single change can touch.
  • Who checks and who approves — a second model reviews every diff; you merge. It never merges its own work.
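
The rings above can be read as a gate that either passes a proposed change or lists why it is out of bounds. A minimal Python sketch, not the kit's actual config schema: `in_scope`, its parameter names, and the caps of 10 files and 300 lines are hypothetical; the `agent-ready` label, the `docs/**` allow pattern, and the `.github/workflows/` deny prefix come from the list.

```python
from fnmatch import fnmatch


def in_scope(labels, files, lines_changed, *, required_label="agent-ready",
             allow=("docs/**",), deny=(".github/workflows/",),
             max_files=10, max_lines=300) -> list[str]:
    """Return the reasons a change is out of scope; an empty list means it passes."""
    reasons = []
    if required_label not in labels:
        reasons.append(f"missing label: {required_label}")
    for path in files:
        if any(path.startswith(prefix) for prefix in deny):
            reasons.append(f"denied path: {path}")  # deny wins over allow
        elif not any(fnmatch(path, pattern) for pattern in allow):
            reasons.append(f"outside allow list: {path}")
    if len(files) > max_files:
        reasons.append(f"too many files: {len(files)} > {max_files}")
    if lines_changed > max_lines:
        reasons.append(f"too many lines: {lines_changed} > {max_lines}")
    return reasons
```

Returning reasons instead of a bare yes or no gives the session something concrete to put in its handoff when it declines an issue.
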
The kit

github.com/jamditis/claude-skills-journalism → docs/autonomy/kit — a spec your agent follows, the config you fill in, the prompt blocks above in full, reference code, per-OS scheduler templates, and a costs breakdown. MIT-licensed. Open a session with your agent, give it BUILD-WITH-YOUR-AGENT.md and config.example.yaml, and say “set this up for me.”

Tested on Linux only, so far

I've run this end to end on one setup: a Raspberry Pi 5 (8GB) on Ubuntu 25.10, ARM64, with Python 3.13, cron, tmux, and timeout --foreground from uutils coreutils. The macOS and Windows recipes follow the same design but haven't been run end to end as of this version — testing them is planned. And “Linux” has roughly 58 million permutations once you count distros, kernels, and coreutils flavors, so even a different Linux box can differ in the details. Unless you're on that exact setup, have your agent treat the matching template with some skepticism: verify each piece on a throwaway issue before you trust an unattended schedule.

That's the whole thing. Built on a couple of Raspberry Pis, a cron file, and the rule that no model gets to be the only one who checked its own work. Questions or "I built one too" notes welcome.