Tutorial Shows 35B Local Coding Agents Reaching 20-40 Tok/Sec With Open-Source Harnesses
Updated · magazine.sebastianraschka.com · Jun 27
Summary
A new tutorial walks readers through building a fully local coding agent stack using Ollama plus coding harnesses such as the open-source Qwen-Code and Codex, along with Claude Code.
Qwen3.6 35B-A3B is the main model tested: it needs about 30-40 GB of RAM, is a roughly 22 GB download, and delivered about 40 tok/sec on a recent Mac Mini and about 30 tok/sec on a DGX at 50k context.
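Readers who want to check throughput figures like these on their own hardware could use a minimal sketch along the following lines. It assumes Ollama is serving on its default local port and relies on the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's `/api/generate` endpoint returns. The model tag is a hypothetical placeholder, and the helper names are illustrative rather than taken from the tutorial.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3.6:35b-a3b"  # hypothetical tag; use the name shown by `ollama list`


def tokens_per_second(response: dict) -> float:
    """Generation throughput from Ollama's eval_count and eval_duration (ns)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def seconds_to_generate(num_tokens: int, tok_per_sec: float) -> float:
    """Wall-clock time to emit num_tokens at a steady generation rate."""
    return num_tokens / tok_per_sec


def measure(prompt: str, model: str = MODEL_TAG) -> float:
    """Send one non-streaming request and return the measured tok/sec."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

Calling `measure("Write a binary search in Python.")` with the server running returns the observed rate. As a rough guide to what the reported numbers mean in practice, `seconds_to_generate(1200, 40)` gives 30 seconds for a 1,200-token reply, versus 40 seconds at 30 tok/sec.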
Small capability checks found Qwen3.6 and Cohere's North Mini Code solved 4 of 5 benchmark tasks in Qwen-Code, while Codex sometimes outperformed Qwen's native harness and Claude Code used far more tokens.
The guide also urges a security audit before use, warning that coding agents can run shell commands and that Qwen-Code may send telemetry unless settings disable usage statistics, prompt logging, auto-updates and hooks.
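One way to make such an audit repeatable is a small script that compares a harness settings file against the values you expect. The sketch below is illustrative only: the key names in `EXPECTED` are hypothetical placeholders, not Qwen-Code's actual setting names, and should be replaced with the ones listed in the harness documentation.

```python
import json
from pathlib import Path

# Hypothetical key names for illustration only; real harnesses use their own
# setting names, which should be taken from their documentation.
EXPECTED = {
    "usage_statistics": False,
    "prompt_logging": False,
    "auto_update": False,
    "hooks": False,
}


def audit_settings(settings: dict, expected: dict = EXPECTED) -> list[str]:
    """Return one finding per setting that is missing or differs from expected."""
    findings = []
    for key, wanted in expected.items():
        if key not in settings:
            findings.append(f"{key}: not set, so the harness default applies")
        elif settings[key] != wanted:
            findings.append(f"{key}: is {settings[key]!r}, expected {wanted!r}")
    return findings


def audit_file(path: Path) -> list[str]:
    """Load a JSON settings file and audit its top-level keys."""
    return audit_settings(json.loads(path.read_text()))
```

Treating a missing key as a finding is a deliberate choice here: an unset option falls back to the harness default, which may be the permissive one.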
The broader takeaway is that 30-35B open-weight models are now viable for many coding workflows locally, offering lower privacy risk and fixed costs if users accept setup complexity and hardware demands.