Updated · InfoWorld · Jul 2
AWS Raises Bedrock AgentCore Quotas 5x to 5,000 Sessions for Enterprise AI

Summary

  • AWS raised Amazon Bedrock AgentCore default runtime limits, including a fivefold increase in concurrent sessions to 5,000 in its two main US regions and 2,500 elsewhere, with the changes applying automatically to enterprise accounts.
  • Per-agent throughput rises to 200 tokens per second (up from 25) and new container sessions to 400 per minute (up from 100), increases meant to let enterprises handle more simultaneous requests and peak-demand spikes without filing quota-increase requests.
  • Forrester, Gartner and Kanerika analysts said the move reflects enterprises shifting AI agents from pilots to production, where higher concurrency, longer-running sessions and multi-agent orchestration can quickly hit default ceilings and delay deployments for days or weeks.
  • Customer service, DevOps, IT operations, finance, healthcare, supply chain and security teams running high-concurrency workloads stand to benefit most, as throttled stateful sessions can break workflows and lose intermediate context.
  • AWS's approach contrasts with Microsoft Azure Foundry Agent Service, where many runtime limits are fixed and scaling flexibility sits more at the model deployment layer than the agent runtime layer.

Insights

Beyond easier scaling, what are the hidden operational costs of deploying complex multi-agent systems on AWS?
How does AWS's agent scaling strategy truly stack up against rivals' long-term enterprise value?
With the EU AI Act deadline imminent, can new tools truly govern AI agents now scaling faster than ever?

Amazon Bedrock AgentCore Runtime Raises Default Quotas 5x–8x: Transforming Enterprise AI Scalability in 2026

Overview

In June 2026, Amazon updated Bedrock AgentCore Runtime by automatically raising default service quotas for all accounts. The change is meant to support higher-scale workloads with far more concurrent operations. The active-session limit rose fivefold, to 5,000 concurrent sessions in the key US regions and 2,500 in all others, so businesses can run more demanding applications without requesting quota increases manually. The expanded limits apply immediately, which simplifies deploying and managing complex AI agent workloads.
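
To put the reported figures in perspective, the short sketch below works through the arithmetic. It uses only numbers stated in the summary (25 to 200 tokens per second per agent, 100 to 400 new container sessions per minute, and concurrent-session limits of 5,000 and 2,500). The function names are illustrative, not AWS identifiers, and the ramp-up estimate assumes sessions start at the maximum allowed rate and none end in the meantime, so it is a lower bound rather than a prediction.

```python
def fold_increase(old: float, new: float) -> float:
    """Return how many times larger the new quota is than the old one."""
    if old <= 0:
        raise ValueError("old quota must be positive")
    return new / old


def minutes_to_fill(concurrent_limit: int, new_sessions_per_minute: int) -> float:
    """Lower bound on the minutes needed to reach the concurrent-session limit.

    Assumes sessions are created at the maximum allowed rate and that no
    session ends during the ramp-up (a simplification for illustration).
    """
    if new_sessions_per_minute <= 0:
        raise ValueError("session creation rate must be positive")
    return concurrent_limit / new_sessions_per_minute


if __name__ == "__main__":
    # (old, new) pairs as reported in the summary.
    print("tokens/sec per agent:", fold_increase(25, 200), "x")
    print("new sessions/min:", fold_increase(100, 400), "x")

    # Ramp-up time at the new session-creation rate of 400 per minute.
    print("main US regions:", minutes_to_fill(5000, 400), "minutes")
    print("other regions:", minutes_to_fill(2500, 400), "minutes")
```

Under these assumptions, an account starting sessions at the full 400 per minute would need at least 12.5 minutes to reach the 5,000-session ceiling in the main US regions, and 6.25 minutes to reach 2,500 elsewhere.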

...