UST Urges 2.5-Second, 3-Cent Contracts for Enterprise GenAI Services

5 articles · Updated · InfoWorld · Jun 1

UST says enterprise GenAI should be run like any other production service, with explicit contracts covering p95 latency, availability, error budgets, behavior under load and per-request cost.
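
As a rough illustration, such a contract could be written down as data and checked against a window of request measurements. This is a minimal sketch, not UST's method: the 2.5-second and 3-cent figures come from the summary, while `ServiceContract`, `check_contract` and the availability target are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceContract:
    p95_latency_s: float = 2.5          # latency target cited in the summary
    max_cost_usd: float = 0.03          # 3-cent per-answer budget cited in the summary
    availability_target: float = 0.995  # hypothetical value, not from the article


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def check_contract(contract, latencies_s, costs_usd, failures):
    """Compare one window of per-request measurements with the contract."""
    total = len(latencies_s)
    # Error budget: how many failed requests the availability target tolerates.
    allowed_failures = (1 - contract.availability_target) * total
    return {
        "p95_latency_ok": p95(latencies_s) <= contract.p95_latency_s,
        "cost_ok": sum(costs_usd) / total <= contract.max_cost_usd,
        "error_budget_remaining": allowed_failures - failures,
    }
```

A negative `error_budget_remaining` would be one possible signal to pause rollouts until reliability recovers.
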
A 2.5-second latency target or a 3-cent-per-answer budget changes core design choices, pushing teams to tune retrieval, routing and model selection rather than rely on prompt tweaks.
UST frames retrieval as the main system in most enterprise assistants, calling for permission-aware access, refresh and rollback paths, quality metrics, and context formatted for citations and tracing.
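
One way to picture permission-aware retrieval with citation-ready context is to filter documents by the caller's groups before any ranking happens, then tag each passage with its ID. The sketch below assumes a toy keyword-overlap ranker; `Doc`, `retrieve` and `format_context` are hypothetical names, and a real system would likely use a search index or vector store instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset


def retrieve(query, docs, user_groups, k=3):
    """Return up to k documents the user may see, ranked by term overlap."""
    terms = set(query.lower().split())
    # Apply permissions first so restricted text never reaches the ranker.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def format_context(docs):
    """Prefix each passage with its ID so answers can cite and trace sources."""
    return "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
```

Filtering before scoring is a deliberate choice here: content the user cannot read never enters the prompt, so it cannot leak through the generated answer.
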
Continuous evaluation and end-to-end tracing should start early, with real-query test sets, separate retrieval and generation metrics, request IDs, and logs for re-ranking, routing, tool calls and policy decisions.
At scale, UST says dependable GenAI also needs cache-first routing, graceful degradation modes such as sources-only answers or human handoff, plus runbooks and rollback plans before broad rollout.
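
The cache-first and graceful-degradation ideas can be sketched as a single routing function. This is an illustrative outline under stated assumptions, not an implementation described by UST: `answer`, the `mode` labels, and the injected `retrieve` and `generate` callables are hypothetical, and the cache is a plain dictionary.

```python
def answer(query, cache, retrieve, generate, budget_s=2.5):
    """Route a query: cache first, then generation, degrading on failure."""
    key = query.strip().lower()
    if key in cache:
        # Cheapest path: reuse a previously generated answer.
        return {"mode": "cache", "answer": cache[key], "sources": []}

    sources = retrieve(query)
    if not sources:
        # Nothing to ground an answer on: hand the request to a person.
        return {"mode": "human_handoff", "answer": None, "sources": []}

    try:
        # generate is assumed to raise TimeoutError when it exceeds the budget.
        text = generate(query, sources, timeout_s=budget_s)
    except TimeoutError:
        # Degrade to a sources-only reply instead of failing the request.
        return {"mode": "sources_only", "answer": None, "sources": sources}

    cache[key] = text
    return {"mode": "generated", "answer": text, "sources": sources}
```

Each returned `mode` gives runbooks and dashboards a concrete label to count, which makes degradation visible rather than silent.
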
