Why XCSteward exists

Coding agents made an old iOS testing problem impossible to ignore: local simulator test execution depends on fragile shared state.

Where this came from

I started XCSteward while dogfooding coding agents on my own iOS projects. The agents were genuinely useful — but to do their job they kept invoking xcodebuild, simctl, and XCTest, over and over, against real simulators on my Mac.

On a lot of stacks, running tests locally is comparatively boring. On macOS with iOS simulators it is not. Test execution leans on a pile of fragile shared state: CoreSimulator, booted devices, simctl, xcodebuild, DerivedData, .xcresult bundles, logs, timeouts, cleanup, and whatever state survives failed or interrupted runs. Once an agent was driving that machinery in a loop, the environment did not merely become “hard to reason about.” It broke.

The concrete failures looked like this:

Wedged or unresponsive simulators.
Boots, shutdowns, or erases that failed or hung.
xcodebuild timing out with no clean exit.
Destination resolution hanging before tests even started.
CoreSimulator getting into a state nothing could talk to.
Corrupt or missing build artifacts.
Lost logs and missing .xcresult bundles after a crash.
No obvious recovery steps short of killing processes and rebooting.

None of this is exotic. Anyone who has run iOS tests on a Mac for long enough has hit some of it. The agents did not invent these failures — they just hit them far more often than I ever did by hand, and left the machine in a worse state each time.

This is not generic test flakiness

It is worth being precise, because “flaky tests” is a broad complaint and XCSteward only addresses a narrow slice of it. It targets operational fragility in test execution — the simulator and the local Mac environment around your tests — not the tests themselves. If your assertion is wrong, your UI selector is racy, your backend is down, or your snapshot baseline is stale, XCSteward has nothing useful to say.

Out of scope — these are test or app problems

Broken tests that fail on their own logic.
App-level UI test flakiness and timing bugs in your test code.
Code signing and provisioning problems.
Missing or un-downloaded Xcode runtimes.
Bugs inside a specific simulator runtime / vendor image.
Network, backend, or mock-server instability.

In scope — execution-environment fragility

Simulator runs that hang before tests start.
xcodebuild stuck resolving or booting destinations.
simctl / CoreSimulator wedged or deadlocked.
Artifacts and logs lost when a run is interrupted.
Stale device or DerivedData state leaking between runs.

Why coding agents made it urgent

A human usually runs one test command, waits, looks at the result, and runs the next one. Agents, scripts, and hooks do not work that way. They can fire more commands, more often, and sometimes concurrently — a build here, a re-run there, a cleanup script in between. Put two or three projects on the same Mac, each with its own agent, and the pressure on CoreSimulator and the Simulator subsystem multiplies.

The important framing: the problem is not “AI wrote bad code.” It is that AI increased the rate and concurrency of access to an execution subsystem that was already fragile under a single careful user. The invocation path — agent, human, shell script, hook, MCP, or a local CI-like job — is not the core issue. The issue is that all of them touch the same fragile shared state, and nothing coordinates that access.
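To make "nothing coordinates that access" concrete, here is a minimal sketch of the simplest possible coordination: an advisory file lock that every caller takes before touching the simulator. This is not XCSteward's implementation. The function name simulator_lane and the lock path are hypothetical, and the lock only works if every agent, script, and hook agrees to use it.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def simulator_lane(lock_path, blocking=True):
    """Hold an exclusive advisory lock for the duration of one run.

    lock_path is a hypothetical, agreed-upon file; callers that skip
    the lock are not stopped by it.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # In non-blocking mode this raises BlockingIOError if the lane is busy.
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A caller would wrap its xcodebuild or simctl invocation in "with simulator_lane(path):". A lock like this serializes runs but does nothing about readiness, artifacts, or cleanup, which is why coordination needs more than a mutex.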

Complementary to MCPs, not a replacement

MCPs help an agent ask for a test run. XCSteward makes that test run happen through a controlled local lane.

Existing Xcode / XCTest MCPs are useful bridges — they let an agent call Xcode, Simulator, or XCTest tools. Shell commands, hooks, and scripts do the same job from a different angle. They are all invocation paths: ways to request that a run happen.

XCSteward coordinates what happens after a run is requested. The invocation path does not by itself lease devices, isolate DerivedData and artifacts, preserve logs, serialize competing runs, or clean up broken state. That is the gap XCSteward fills, sitting underneath your MCP, shell, and hooks rather than competing with them.
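Two of those responsibilities, isolating artifacts and bounding a run, can be sketched in a few lines. This is an illustration under stated assumptions, not how XCSteward does it: plan_run and run_bounded are hypothetical helpers, and the directory layout is invented. The xcodebuild flags used (-scheme, -destination, -derivedDataPath, -resultBundlePath) are standard ones.

```python
import subprocess
import uuid
from pathlib import Path


def plan_run(root, scheme, device_udid):
    """Create a private directory for one run and build an xcodebuild argv for it."""
    run_dir = Path(root) / uuid.uuid4().hex  # unique per run, never reused
    run_dir.mkdir(parents=True)
    argv = [
        "xcodebuild", "test",
        "-scheme", scheme,
        "-destination", f"platform=iOS Simulator,id={device_udid}",
        "-derivedDataPath", str(run_dir / "DerivedData"),
        "-resultBundlePath", str(run_dir / "result.xcresult"),
    ]
    return run_dir, argv


def run_bounded(argv, log_path, timeout_s):
    """Run argv with a hard time limit, writing all output to log_path."""
    with open(log_path, "wb") as log:
        try:
            proc = subprocess.run(
                argv, stdout=log, stderr=subprocess.STDOUT, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the direct child here;
            # any grandchildren it spawned are not covered by this sketch.
            return {"outcome": "timeout", "exit_code": None}
    return {"outcome": "finished", "exit_code": proc.returncode}
```

Because the log goes to a file inside the run directory rather than a pipe, whatever was written before a timeout survives the kill. A real steward would also need to clean up child processes and the device itself.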

What XCSteward does

Concretely, it is a local stewardship layer that runs close to your tools and governs how runs execute:

A controlled local execution lane. Run tests through one predictable path instead of ad-hoc xcodebuild invocations scattered across scripts and agents.
A scheduler / queue. Submit runs from multiple agents and scripts; they execute in a coordinated order rather than colliding on shared state.
Readiness checks. Confirm the simulator subsystem and device are actually ready — not just "Booted" — before handing off to xcodebuild.
Isolated artifacts. Per-run DerivedData, result bundles, logs, and structured summaries, so runs do not overwrite each other.
Timeouts and cleanup. Bound each phase so a wedge becomes a fast, legible failure, and tear down devices and processes deterministically afterward.
Observable CLI runs. Plain waits print the job id, job directory, watch/follow commands, and compact progress instead of disappearing into a silent command.
Explicit bootstrap diagnosis. Pre-XCTest runner or environment setup failures are classified separately from real test failures, with the simulator detail and artifacts preserved.
A JSON contract for automation. Agents and scripts can use JSON summaries, phase-aware progress events, profile discovery, metadata, per-run env injection, and bounded explanations instead of scraping human text.

On top of that, every run produces structured output (JSON and durable artifacts), so both a human and an agent can tell what happened instead of scraping a scrollback buffer that may already be gone.
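The readiness check in the list above can be approximated with stock simctl commands. The sketch below is an assumption about what "ready, not just Booted" might involve, not XCSteward's actual check: it reads the device state from "xcrun simctl list devices --json", then waits on "xcrun simctl bootstatus" under a time limit. The helper names device_state and wait_until_ready are hypothetical.

```python
import json
import subprocess


def device_state(simctl_json, udid):
    """Return the state reported for udid in `simctl list devices --json` output, or None."""
    listing = json.loads(simctl_json)
    for devices in listing.get("devices", {}).values():
        for device in devices:
            if device.get("udid") == udid:
                return device.get("state")
    return None


def wait_until_ready(udid, timeout_s=120):
    """Report whether the device is booted and finished booting in time (macOS only)."""
    try:
        listing = subprocess.run(
            ["xcrun", "simctl", "list", "devices", "--json"],
            capture_output=True, text=True, timeout=timeout_s, check=True,
        ).stdout
        if device_state(listing, udid) != "Booted":
            return False
        # bootstatus blocks until the device reports that boot has completed.
        subprocess.run(
            ["xcrun", "simctl", "bootstatus", udid],
            capture_output=True, timeout=timeout_s, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return False  # hung, failed, or simctl is not available
    return True
```

The point of the timeout on both calls is the one made earlier: when CoreSimulator is wedged, even a status query can hang, so the check itself has to be bounded.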

The current CLI keeps those audiences separate. Humans can use submit --wait, status --watch, and logs --follow for compact job context and live observation. Agents and automation should keep using --json, add --progress for long waits when useful, and branch on the JSON contract rather than human text.
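As an illustration of branching on JSON rather than human text, here is a sketch of how an agent-side script might decide what to do after a run. Every field name and value in it (status, failure_kind, "bootstrap") is a placeholder invented for this example; it is not XCSteward's actual schema, which should be taken from the project's own contract documentation.

```python
import json


def next_action(summary_json):
    """Map a run summary to what an agent should do next.

    Every key and value read here is a hypothetical placeholder,
    not XCSteward's real JSON contract.
    """
    summary = json.loads(summary_json)
    status = summary.get("status")
    if status == "passed":
        return "continue"
    if status == "failed" and summary.get("failure_kind") == "bootstrap":
        # The run never reached XCTest: an environment problem, not a code problem.
        return "retry_or_report_environment"
    if status == "failed":
        return "inspect_test_failures"
    return "stop_and_ask"  # unrecognized shape: do not guess
```

The design choice worth noting is the last branch: an agent that cannot classify the result should stop rather than treat an environment failure as a reason to rewrite code.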

What XCSteward does not claim

Narrow tools earn trust by being honest about their edges. So, plainly:

It does not fix all XCTest flakiness.
It does not fix bad tests or wrong assertions.
It does not replace your CI.
It does not replace XcodeBuildMCP or any other MCP.
It does not make the iOS Simulator perfectly reliable.
It does not patch Apple / Xcode / CoreSimulator bugs.
It is alpha software, and is honest about that.

Why local-first

The failure happens on the Mac that owns the simulator state. That is where xcodebuild, simctl, CoreSimulator, DerivedData, logs, and the devices themselves live. A steward that is going to lease devices, enforce readiness, isolate artifacts, and clean up wedged state has to run there too — right next to the thing it is protecting.

This is not an argument against hosted CI. Hosted CI is still useful, and XCSteward is not trying to replace it. It targets a different layer: the local, shared-Mac execution environment where agents, scripts, hooks, and a human are all reaching for the same simulators at once.

If this sounds familiar

XCSteward is most useful to people who have actually felt this: iOS developers using coding agents, mobile platform engineers, iOS CI and test-infra owners, and anyone who has watched a simulator wedge, a boot fail, an xcodebuild hang, or an .xcresult bundle vanish. If that is you, try the alpha or share a failure mode. The most valuable feedback is a real, reproducible failure.

XCSteward is dogfooded on my own real iOS projects — each run is tracked in the public dogfood ledger.