Updated
Updated · O'Reilly Media · May 15
Viv Trivedy Defines AI Harness Engineering, Citing Top 30-to-Top 5 Benchmark Jump

1 article · Updated · O'Reilly Media · May 15
  • Viv Trivedy frames “harness engineering” as the discipline of improving AI agents by fixing each recurring failure in prompts, tools, hooks, memory and execution logic rather than waiting for a better model.
  • A key proof point is Terminal Bench 2.0, where the same Claude Opus 4.6 model scored far better in a custom harness; Trivedy says one coding agent rose from Top 30 to Top 5 by changing only the harness.
  • The approach treats an agent as “model plus harness,” with the harness covering system prompts, AGENTS.md files, tool routing, sandboxes, observability, context compaction, verification loops and subagents.
  • That shifts engineering toward ratcheting rules from real mistakes: blocking destructive commands, injecting test failures back into the loop, and keeping repo instructions short enough to stay effective.
  • The broader implication is a move from raw LLM APIs toward “harness as a service,” as SDKs from major vendors package loops, tools and context management while teams tune domain-specific behavior.
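To make the "model plus harness" idea concrete, here is a minimal Python sketch of two of the rules mentioned above: blocking destructive commands and injecting test failures back into the loop. It is an illustration under stated assumptions, not Trivedy's implementation or any vendor SDK. The names `BLOCKED_PATTERNS`, `check_command`, and `Harness` are hypothetical, and the `model` and `run_tests` callables stand in for a real LLM call and a real test runner.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical rule list. In the "ratchet" approach, each pattern would be
# added after an agent actually made that mistake.
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\s+/",            # recursive delete starting at an absolute path
    r"\bgit\s+push\s+.*--force",  # force-push over shared history
    r"\bdrop\s+table\b",          # destructive SQL
]


def check_command(command: str) -> Optional[str]:
    """Return a reason string if the command is blocked, otherwise None."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return f"blocked by rule {pattern!r}"
    return None


@dataclass
class Harness:
    # Assumed stand-in for an LLM call: takes the context, returns a command.
    model: Callable[[List[str]], str]
    # Assumed stand-in for a test runner: failure text, or None when passing.
    run_tests: Callable[[str], Optional[str]]
    max_steps: int = 5
    context: List[str] = field(default_factory=list)

    def run(self, task: str) -> Optional[str]:
        """Loop until a proposed command passes the guard and the tests."""
        self.context.append(f"TASK: {task}")
        for _ in range(self.max_steps):
            command = self.model(self.context)

            # Guard hook: reject destructive commands before they execute,
            # and tell the model why so it can propose something else.
            reason = check_command(command)
            if reason:
                self.context.append(f"REJECTED: {command} ({reason})")
                continue

            # Verification loop: feed any test failure back into the context.
            failure = self.run_tests(command)
            if failure is None:
                return command
            self.context.append(f"TEST FAILURE: {failure}")
        return None  # step budget exhausted without a passing command
```

The model itself never changes in this sketch. Only the rules and the feedback wired around it do, which is the kind of change the summary credits for the benchmark gain.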
Could harness engineering become the true competitive edge in AI, overshadowing the importance of the underlying language models themselves?
How might approaches from other engineering fields inspire better safety and governance for AI agents as harness complexity grows?