Why XCSteward exists

Coding agents made an old iOS testing problem impossible to ignore: local simulator test execution depends on fragile shared state.

Where this came from

I started XCSteward while dogfooding coding agents on my own iOS projects. The agents were genuinely useful — but to do their job they kept invoking xcodebuild, simctl, and XCTest, over and over, against real simulators on my Mac.

On a lot of stacks, running tests locally is comparatively boring. On macOS with iOS simulators it is not. Test execution leans on a pile of fragile shared state: CoreSimulator, booted devices, simctl, xcodebuild, DerivedData, .xcresult bundles, logs, timeouts, and cleanup, plus whatever state survives failed or interrupted runs. Once an agent was driving that machinery in a loop, the environment did not merely become “hard to reason about.” It broke.

The concrete failures looked like this:

None of this is exotic. Anyone who has run iOS tests on a Mac for long enough has hit some of it. The agents did not invent these failures — they just hit them far more often than I ever did by hand, and left the machine in a worse state each time.

This is not generic test flakiness

It is worth being precise, because “flaky tests” is a broad complaint and XCSteward only addresses a narrow slice of it. It targets operational fragility in test execution — the simulator and the local Mac environment around your tests — not the tests themselves. If your assertion is wrong, your UI selector is racy, your backend is down, or your snapshot baseline is stale, XCSteward has nothing useful to say.

Out of scope — these are test or app problems

  • Broken tests that fail on their own logic.
  • App-level UI test flakiness and timing bugs in your test code.
  • Code signing and provisioning problems.
  • Missing or not-yet-downloaded simulator runtimes.
  • Bugs inside a specific simulator runtime / vendor image.
  • Network, backend, or mock-server instability.

In scope — execution-environment fragility

  • Simulator runs that hang before tests start.
  • xcodebuild stuck resolving or booting destinations.
  • simctl / CoreSimulator wedged or deadlocked.
  • Artifacts and logs lost when a run is interrupted.
  • Stale device or DerivedData state leaking between runs.
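
As one concrete way to spot stale device state between runs, the sketch below lists simulators that are still booted. It is only an illustration, not XCSteward's implementation. It assumes the JSON shape that `xcrun simctl list devices --json` prints on recent Xcode versions (a top-level "devices" object mapping runtime identifiers to device records with "name", "udid", and "state"); the function names are hypothetical.

```python
import json
import subprocess


def booted_devices(simctl_json):
    """Return (name, udid) pairs for every simulator reported as Booted."""
    data = json.loads(simctl_json)
    found = []
    # "devices" maps a runtime identifier to a list of device records.
    for runtime_devices in data.get("devices", {}).values():
        for device in runtime_devices:
            if device.get("state") == "Booted":
                found.append((device["name"], device["udid"]))
    return found


def list_booted():
    """Query simctl directly. Only works on a Mac with Xcode installed."""
    completed = subprocess.run(
        ["xcrun", "simctl", "list", "devices", "--json"],
        capture_output=True,
        text=True,
        check=True,
        timeout=60,  # simctl itself can hang, so never wait forever
    )
    return booted_devices(completed.stdout)
```

A device that is still booted after the run that booted it has ended is a reasonable first suspect when the next run behaves oddly.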

Why coding agents made it urgent

A human usually runs one test command, waits, looks at the result, and runs the next one. Agents, scripts, and hooks do not work that way. They can fire more commands, more often, and sometimes concurrently — a build here, a re-run there, a cleanup script in between. Put two or three projects on the same Mac, each with its own agent, and the pressure on CoreSimulator and the Simulator subsystem multiplies.

The important framing: the problem is not “AI wrote bad code.” It is that AI increased the rate and concurrency of access to an execution subsystem that was already fragile under a single careful user. The invocation path — agent, human, shell script, hook, MCP, or a local CI-like job — is not the core issue. The issue is that all of them touch the same fragile shared state, and nothing coordinates that access.
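
To make "nothing coordinates that access" concrete, here is a deliberately crude sketch of the simplest possible coordination: an advisory file lock that lets only one process at a time run a simulator command, with a hard timeout on the command itself. This is not how XCSteward works internally; the lock path and function names are assumptions made for the example.

```python
import fcntl
import os
import subprocess
import tempfile
from contextlib import contextmanager

# Hypothetical lock location shared by every cooperating process.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "simulator-access.lock")


@contextmanager
def simulator_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock for the duration of the with-block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def run_serialized(argv, timeout_s, lock_path=LOCK_PATH):
    """Run one command while holding the lock; kill it if it exceeds timeout_s."""
    with simulator_lock(lock_path):
        # subprocess.run kills the direct child on timeout and raises
        # subprocess.TimeoutExpired; grandchildren may survive.
        return subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
```

A lock like this only helps callers that opt in, and it serializes everything, which is part of why ad hoc coordination tends to fall short once several agents and scripts share one Mac.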

What XCSteward does

Concretely, it is a local stewardship layer that runs close to your tools and governs how runs execute:

It also produces structured output (JSON and durable artifacts) so both a human and an agent can tell what happened, instead of scraping a scrollback buffer that may already be gone.

The current CLI keeps those audiences separate. Humans can use submit --wait, status --watch, and logs --follow for compact job context and live observation. Agents and automation should keep using --json, add --progress for long waits when useful, and branch on the JSON contract rather than human text.
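
A rough sketch of what "branch on the JSON contract" can look like on the agent side. The real XCSteward JSON schema is not described here, so the "status" field and its values below are hypothetical placeholders, as are the returned action names; substitute whatever the actual contract defines.

```python
import json


def next_action(json_text):
    """Map a machine-readable result document to the caller's next step."""
    try:
        result = json.loads(json_text)
    except json.JSONDecodeError:
        return "report-invalid-output"
    if not isinstance(result, dict):
        return "report-invalid-output"

    status = result.get("status")  # hypothetical field name
    if status == "passed":  # hypothetical value
        return "continue"
    if status == "failed":  # hypothetical value: the tests ran and failed
        return "inspect-test-failures"
    if status == "infrastructure_error":  # hypothetical value: the run never got a fair chance
        return "retry-later"
    # Unknown values are surfaced rather than guessed at.
    return "report-unknown-status"
```

The point is that an agent decides from explicit fields and treats anything unrecognized as its own case, rather than pattern-matching on text meant for people.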

What XCSteward does not claim

Narrow tools earn trust by being honest about their edges. So, plainly:

Why local-first

The failure happens on the Mac that owns the simulator state. That is where xcodebuild, simctl, CoreSimulator, DerivedData, logs, and the devices themselves live. A steward that is going to lease devices, enforce readiness, isolate artifacts, and clean up wedged state has to run there too — right next to the thing it is protecting.
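
Artifact isolation, for instance, can be approximated by giving every run its own directory. The sketch below builds an xcodebuild invocation that uses the standard -derivedDataPath and -resultBundlePath options; the directory layout and the function name are assumptions for illustration, not XCSteward's actual behavior.

```python
import uuid
from pathlib import Path


def isolated_test_command(scheme, destination, root):
    """Build an xcodebuild argv whose outputs land in a fresh per-run directory."""
    run_dir = Path(root) / f"run-{uuid.uuid4().hex}"
    derived_data = run_dir / "DerivedData"
    result_bundle = run_dir / "Result.xcresult"
    # Create DerivedData up front. The result bundle path is left absent
    # because xcodebuild refuses to write to a bundle path that already exists.
    derived_data.mkdir(parents=True)
    argv = [
        "xcodebuild", "test",
        "-scheme", scheme,
        "-destination", destination,
        "-derivedDataPath", str(derived_data),
        "-resultBundlePath", str(result_bundle),
    ]
    return argv, run_dir
```

For example, isolated_test_command("MyApp", "platform=iOS Simulator,id=<UDID>", "/tmp/runs") returns the argument list plus the directory to archive or delete afterwards. A fresh DerivedData per run gives up the build cache in exchange for isolation, which may or may not be the right trade for a given project.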

This is not an argument against hosted CI. Hosted CI is still useful, and XCSteward is not trying to replace it. It targets a different layer: the local, shared-Mac execution environment where agents, scripts, hooks, and a human are all reaching for the same simulators at once.

If this sounds familiar

XCSteward is most useful to people who have actually felt this: iOS developers using coding agents, mobile platform engineers, iOS CI and test-infra owners, and anyone who has watched a simulator wedge, a boot fail, an xcodebuild hang, or an .xcresult bundle vanish. If that is you, try the alpha or share a failure mode — the most valuable feedback is a real, reproducible one.

  • Try the alpha
  • Browse the failure-mode library

XCSteward is dogfooded on my own real iOS projects — each run is tracked in the public dogfood ledger.