Updated · GIGAZINE(ギガジン) · Jun 19
Puget Benchmarks Intel Arc Pro B70 at $949, Showing 32GB VRAM Cuts Local AI Token Costs

Summary

  • Puget Systems found that Intel's $949 Arc Pro B70 can run 8B-class LLMs on a single 32GB card and 27B-35B FP16 models across four cards, and framed it as a practical local-AI option even though each GPU is slower than NVIDIA's RTX 5090.
  • Single-card results reached 72.9 tokens per second on Qwen2.5 3B and 66.9 on DeepSeek R1 8B, while four-card throughput scaled to 905 tokens per second for DeepSeek R1 8B with eight concurrent users.
  • Electricity-only cost came to $0.43 per million output tokens for Qwen2.5 3B and $3.18 for Qwen3.6-27B; Puget said the latter was still 3.8 times cheaper than Gemini 3.1 Pro. Under multi-user loads, some costs fell to about $0.06 per million output tokens.
  • Image-generation tests also fit comfortably in memory: a 15.6GB ComfyUI plus Z-Image Turbo pipeline produced 1024x1024 images in 3.9 seconds after warm-up, with 10 straight successful runs.
  • The trade-off is software friction and upfront cost: bfloat16 models such as Gemma could not run on the current vLLM XPU backend, multi-GPU setup needed container tweaks, and Puget's four-B70 workstation was estimated at about $18,000.
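
The electricity-only figures above depend on three inputs: how much power the system draws, what a kilowatt-hour costs, and how many tokens per second it generates. The summary reports only the throughput and the resulting cost, so the sketch below is an illustration of the arithmetic, not Puget's actual method. The function names are hypothetical, and the wattage and electricity price in the usage lines are placeholder assumptions, not values from the benchmark. The second function works backward from two reported numbers, and it assumes the $0.43 cost and the 72.9 tokens-per-second result describe the same Qwen2.5 3B run.

```python
def electricity_cost_per_million_tokens(power_watts, price_per_kwh, tokens_per_second):
    """Electricity cost (in the currency of price_per_kwh) to generate 1M tokens.

    Hypothetical helper: all three inputs must be supplied by the reader.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    hours = (1_000_000 / tokens_per_second) / 3600  # wall-clock time for 1M tokens
    kwh = (power_watts / 1000) * hours              # energy drawn over that time
    return kwh * price_per_kwh


def implied_hourly_electricity_cost(cost_per_million_tokens, tokens_per_second):
    """Hourly electricity spend implied by a per-million-token cost and a throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_million_tokens * tokens_per_hour / 1_000_000


if __name__ == "__main__":
    # ASSUMED inputs for illustration only: 300 W draw, $0.15 per kWh, 100 tokens/s.
    example = electricity_cost_per_million_tokens(300, 0.15, 100)
    print(f"Example cost per million tokens: ${example:.3f}")

    # Reported figures from the summary: $0.43 per million tokens, 72.9 tokens/s.
    hourly = implied_hourly_electricity_cost(0.43, 72.9)
    print(f"Implied electricity spend: ${hourly:.3f} per hour")
```

Because cost scales inversely with throughput at a fixed power draw, serving several users at once (which raises aggregate tokens per second) lowers the per-token cost. That is consistent with the drop to about $0.06 reported for multi-user loads.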

Insights

Is Intel's high-VRAM, low-cost GPU strategy enough to challenge NVIDIA's dominance in the AI hardware market?
With soaring cloud and memory costs, will local AI systems like this become the new standard for businesses?