A coding agent that runs iOS tests as part of its loop keeps hitting
xcodebuild timeouts or hangs. It retries, which can make things worse, and the
failures are hard to reproduce by hand.
What it usually looks like
The agent invokes xcodebuild, the call stalls at boot/resolution, and the
agent’s own timeout fires.
The agent retries automatically, launching another xcodebuild while the
first is still wedged — compounding the contention.
Multiple agents (or an agent plus your manual run) touch simulators at once.
The agent scrapes long, ambiguous output and cannot tell “still booting” from
“stuck”.
If the runner fails before XCTest attaches, the agent treats it like a generic
failed test because raw xcodebuild output does not give it a stable
classification.
Failures spike when the Mac is busy and vanish when it is idle.
Why it happens / likely failure classes
Agents make the underlying local simulator fragility frequent and hard to reason
about:
Single-run fragility too. Even one agent on an idle Mac hits boot,
resolution, and readiness stalls — concurrency just amplifies it.
Output scraping instead of a structured result makes timeouts ambiguous.
Missing job ownership context makes it harder to tell which agent, task,
or retry produced a given run and artifact set.
Quick checks
# Is the agent's xcodebuild colliding with other simulator activity?pgrep -lf 'xcodebuild|simctl|Simulator'# Are stale runs from earlier agent attempts still alive?pgrep -lf 'xcodebuild|XCTest|testmanagerd'# Does the same command run cleanly by hand on an idle machine?time xcodebuild test -scheme YourScheme \ -destination "platform=iOS Simulator,id=<UDID>"
If the command is reliable by hand but fails under the agent’s loop, the problem
is coordination and retries, not your tests.
Manual mitigations
Serialize the agent’s simulator runs behind a lock so retries queue
instead of stacking:
Pin a concrete device by UDID and isolate artifacts per attempt.
Bound each run with a timeout and recover (shutdown/erase) before retrying,
rather than launching another run on a wedged subsystem.
Give the agent a structured pass/fail signal instead of having it scrape
console output.
Attach ownership hints when your runner supports it, so retries and
artifacts can be tied back to a task.
When XCSteward may help
This is one of the situations XCSteward is most directly designed for:
A queue / single execution lane so an agent’s runs (and retries) are
serialized instead of colliding with each other or with you.
A stable CLI contract with structured results, so an agent gets a clear
pass/fail/timeout instead of scraping walls of text.
Agent-friendly discovery and setup through projects --json,
profile show <name> --json, and profile init --detect --json.
Streaming machine progress with --progress on long JSON waits, and
status <job-id> --watch --json when the agent wants newline-delimited full
JobSummary updates. When command events are available, progress events add
phase and phase_elapsed_seconds.
Bounded diagnosis through explain <job-id> --json, plus metadata and
labels from submit --metadata key=value and --label. Use repeatable
submit --env KEY=VALUE when the agent needs per-run xcodebuild
environment injection; XCSteward records override keys, not sensitive values.
Pre-XCTest bootstrap classification with runner_bootstrap_failure, which
means runner or environment setup failed before XCTest attached. Agents can
inspect explain <job-id> --json and retry environment failures carefully
instead of blind-retrying real test failures.
Timeout-before-attach detail with diagnostic_excerpt.subtype = pre_xctest_timeout, meaning the test command hit its timeout before
XCSteward observed XCTest attach/test execution evidence.
Readiness checks, bounded timeouts, and deterministic recovery so a wedged
run fails fast and the next attempt starts clean.
Isolated artifacts per run so concurrent attempts do not corrupt each
other.
A strong candidate to test against this class of failure if agents drive your
iOS test runs.
When XCSteward probably will not help
It does not make a flaky or genuinely broken test pass — it makes
execution more predictable, not the test logic correct.
It is not an agent framework and does not change how your agent decides to
retry; the reusable generic skill under
Examples/agents/skills/xcsteward/ is guidance for using the CLI contract,
not a new protocol layer.
It does not address code signing, missing runtimes, or vendor image bugs.
Common questions
What does "AI coding agent hits xcodebuild timeouts" usually mean?
It usually points to agent-driven contention / unbounded runs. A coding agent running iOS tests keeps hitting xcodebuild timeouts or hangs — frequently because its runs collide with other simulator activity. Start by checking simulator readiness, destination selection, CoreSimulator/simctl responsiveness, and whether another xcodebuild, simctl, or Simulator process is already active before treating it as a test-code failure.
Can XCSteward help with "AI coding agent hits xcodebuild timeouts"?
This is a strong fit when the failure is operational: simulator readiness, destination resolution, CoreSimulator responsiveness, cleanup, timeouts, or local concurrency. XCSteward may help by making those phases bounded, serialized, and easier to inspect. It will not fix broken tests, code signing, missing runtimes, or vendor image bugs.
What should I check first?
Check whether xcrun simctl commands return promptly, whether xcodebuild can resolve a concrete simulator destination, whether the device is truly ready rather than merely Booted, and whether concurrent agents, scripts, or manual runs are touching the same simulator subsystem.