GIGAZINE（ギガジン） · Jun 19

Puget Benchmarks Intel Arc Pro B70 at $949, Showing 32GB VRAM Cuts Local AI Token Costs

Puget Systems found that Intel's $949 Arc Pro B70 can run 8B-class LLMs on a single 32GB card and 27B-35B FP16 models across four cards, and framed it as a practical local-AI option despite slower per-GPU performance than Nvidia's RTX 5090.
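
The card counts above follow from simple weight-size arithmetic. The sketch below is a rough estimate, not Puget's method: it assumes 2 bytes per parameter for FP16 and ignores the KV cache, activations, and runtime overhead, all of which add to the real footprint. The helper names are hypothetical.

```python
import math

# Bytes per parameter for common weight formats (FP16/BF16 = 16-bit floats).
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}
VRAM_PER_CARD_GB = 32.0  # Arc Pro B70 memory per card


def weight_gb(params_billion: float, dtype: str = "fp16") -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def min_cards_for_weights(params_billion: float, dtype: str = "fp16",
                          vram_gb: float = VRAM_PER_CARD_GB) -> int:
    """Fewest cards whose combined VRAM can hold the weights.

    Assumption: ignores KV cache and overhead, so real deployments need more.
    """
    return math.ceil(weight_gb(params_billion, dtype) / vram_gb)


if __name__ == "__main__":
    for size in (8, 27, 35):
        print(f"{size}B fp16: ~{weight_gb(size):.0f} GB of weights, "
              f"at least {min_cards_for_weights(size)} card(s)")
```

By weights alone, a 35B FP16 model needs about 70GB, more than two 32GB cards but well within the 128GB of four. The summary does not say why Puget chose four cards; KV-cache headroom and tensor-parallel splits, which must divide a model's attention heads evenly, are plausible reasons.
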
Single-card results reached 72.9 tokens per second on Qwen2.5 3B and 66.9 on DeepSeek R1 8B, while four-card throughput scaled to 905 tokens per second for DeepSeek R1 8B with eight concurrent users.
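
The two DeepSeek R1 8B figures measure different things: 66.9 tokens per second is one card's rate, while 905 tokens per second is the total across four cards serving eight users. The arithmetic below separates them; it assumes the single-card figure came from a single request stream, which the summary does not state, and the function name is hypothetical.

```python
# Reported DeepSeek R1 8B figures from Puget's tests.
SINGLE_CARD_TPS = 66.9       # one B70 (assumption: a single request stream)
FOUR_CARD_TOTAL_TPS = 905.0  # four B70s, eight concurrent users, aggregate
CARDS = 4
CONCURRENT_USERS = 8


def throughput_breakdown(total_tps: float, cards: int, users: int,
                         baseline_tps: float) -> dict:
    """Split an aggregate tokens/s figure into per-user and per-card rates."""
    per_card = total_tps / cards
    return {
        "per_user": total_tps / users,               # what each user sees
        "per_card": per_card,                        # work done by each GPU
        "total_speedup": total_tps / baseline_tps,   # vs. the single-card run
        "per_card_speedup": per_card / baseline_tps,
    }


stats = throughput_breakdown(FOUR_CARD_TOTAL_TPS, CARDS, CONCURRENT_USERS,
                             SINGLE_CARD_TPS)

if __name__ == "__main__":
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

On that assumption, aggregate throughput rises about 13.5 times on four times the hardware, with each card producing roughly 3.4 times its single-stream rate while each user still sees about 113 tokens per second. That pattern is consistent with batching concurrent requests keeping the GPUs busier.
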
Electricity-only cost came to $0.43 per million output tokens for Qwen2.5 3B and $3.18 for Qwen3.6-27B, a figure Puget said still undercut Gemini 3.1 Pro by a factor of 3.8; multi-user loads pushed some costs down to about $0.06 per million tokens.
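
Electricity-only cost per token comes down to power draw multiplied by the time needed to generate the tokens. The sketch below shows that relationship; the 400 W system draw and $0.15/kWh rate are hypothetical placeholders, since the summary gives neither of Puget's inputs, so its outputs are not meant to reproduce the $0.43 or $3.18 figures.

```python
SECONDS_PER_HOUR = 3600
TOKENS_PER_MILLION = 1_000_000


def electricity_cost_per_million_tokens(tokens_per_sec: float,
                                        system_watts: float,
                                        usd_per_kwh: float) -> float:
    """Electricity-only USD cost to generate one million output tokens.

    energy (kWh) = power (kW) x time (h), where time = 1e6 tokens / throughput.
    """
    hours = TOKENS_PER_MILLION / tokens_per_sec / SECONDS_PER_HOUR
    kwh = (system_watts / 1000) * hours
    return kwh * usd_per_kwh


# Hypothetical placeholders: not Puget's measured wattage or electricity rate.
HYPOTHETICAL_WATTS = 400.0
HYPOTHETICAL_USD_PER_KWH = 0.15

# 72.9 tok/s is the reported single-card Qwen2.5 3B rate.
single_stream = electricity_cost_per_million_tokens(
    72.9, HYPOTHETICAL_WATTS, HYPOTHETICAL_USD_PER_KWH)

# Ten times the throughput at the same assumed power draw gives one tenth the
# cost per token (real batched runs usually draw somewhat more power).
batched = electricity_cost_per_million_tokens(
    729.0, HYPOTHETICAL_WATTS, HYPOTHETICAL_USD_PER_KWH)

# Comparison price implied by Puget's "3.8 times cheaper" claim for Qwen3.6-27B.
implied_comparison_price = 3.18 * 3.8

if __name__ == "__main__":
    print(f"single stream: ${single_stream:.3f} per 1M tokens")
    print(f"10x throughput: ${batched:.3f} per 1M tokens")
    print(f"implied comparison price: ${implied_comparison_price:.2f} per 1M tokens")
```

Because cost falls in proportion to throughput at a roughly fixed power draw, serving many users on the same hardware is a likely reason multi-user runs reached about $0.06. The last value backs out the comparison price implied by Puget's 3.8x ratio, about $12 per million output tokens; it is derived from the summary's numbers, not a quoted Gemini price.
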
Image-generation tests also fit comfortably in memory: a ComfyUI pipeline running Z-Image Turbo, with a 15.6GB footprint, generated a 1024×1024 image in 3.9 seconds after warm-up and completed 10 consecutive runs without failure.
The trade-offs are software friction and upfront cost: bfloat16 models such as Gemma could not run on the current vLLM XPU backend, multi-GPU setups required container tweaks, and Puget's four-card B70 workstation was estimated at about $18,000.