Updated · NVIDIA Blog · May 13
Hermes Agent, Qwen 3.6 Bring 35B Local AI to NVIDIA RTX and 128GB DGX Spark

3 articles · Updated · NVIDIA Blog · May 13
  • Hermes Agent now runs as an always-on local AI framework on NVIDIA RTX PCs, RTX PRO workstations and DGX Spark, pairing with Alibaba’s new Qwen 3.6 models for faster on-device agentic workloads.
  • Qwen 3.6 is the performance driver: the 35B model uses about 20GB of memory while beating prior 120B models, and the 27B model matches Qwen 3.5 397B accuracy at roughly one-sixteenth the size.
  • Hermes is pitched as more reliable than typical agent frameworks because it writes and refines its own skills, uses isolated sub-agents for subtasks, and is stress-tested to work with roughly 30B-class local models.
  • NVIDIA says its Tensor Cores cut inference latency so multistep tasks and skill refinement finish in seconds, while DGX Spark’s 128GB unified memory and 1 petaflop of AI performance can sustain all-day agent workflows.
  • The push reflects growing demand for local, provider-agnostic agents: Hermes topped 140,000 GitHub stars in under three months and was the most-used agent on OpenRouter as of last week.
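The memory figures in the summary above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only. It assumes weight memory is roughly parameter count times bits per weight, and the 4-bit setting is an assumption, since the source does not say which quantization produces the roughly 20GB figure. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, so real usage
    will be somewhat higher than the value returned here.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Assumption: 4-bit quantized weights (not stated in the source).
qwen_35b_4bit = weight_memory_gb(35, 4)

# For comparison: the same model with 16-bit weights.
qwen_35b_16bit = weight_memory_gb(35, 16)

# DGX Spark unified memory, as quoted in the summary.
DGX_SPARK_MEMORY_GB = 128
fits_on_spark = qwen_35b_16bit < DGX_SPARK_MEMORY_GB

print(f"35B @ 4-bit:  {qwen_35b_4bit:.1f} GB")
print(f"35B @ 16-bit: {qwen_35b_16bit:.1f} GB")
print(f"16-bit weights fit in 128GB: {fits_on_spark}")
```

Under these assumptions, 4-bit weights come to about 17.5GB, which would be consistent with the roughly 20GB reported once cache and runtime overhead are included. Even 16-bit weights, at about 70GB, would fit within the 128GB of unified memory quoted for DGX Spark.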

Local AI Agents in 2026: Hermes Agent and Qwen 3.6 Performance, Cost, and Security on NVIDIA GPUs

Overview

In spring 2026, local AI took a notable step forward with the rise of Hermes Agent, an agent framework designed to run directly on user hardware alongside locally hosted models such as Qwen 3.6. Running locally is a key architectural choice: because inference happens on the user’s own machine, the experience with Hermes depends on that system’s GPU and available memory. Specialized hardware, especially NVIDIA RTX GPUs, handles the heavy computation these agents require. By focusing on local execution, Hermes aims to bring capable, responsive agents to on-device hardware.

...