Updated · Hackaday · Jun 8
Author Reassesses AI Coding Assistants After Critique Over a 72.17%-Scoring Model, Still Questions Their Value
3 articles

Summary

  • A follow-up review revisited AI coding assistants after criticism of an earlier test, examining whether the wrong frontend, model and prompting method had skewed the original verdict.
  • LiveBench coding scores showed the previously used Claude Haiku 4.5 at 72.17%, below stronger options such as GPT-5.2 Codex at 83.62%, but the author argued that even the top score leaves too much room for error.
  • The reassessment found little objective evidence that any one coding frontend clearly outperforms another, while noting practical limits around IDE integration, paid tiers and the pause on new GitHub Copilot sign-ups during a billing change.
  • Prompt engineering emerged as a major point of contention: benchmark-style prompts require detailed instructions and environmental context, which the author said undercuts claims that these tools meaningfully reduce coding effort.
  • The article ultimately kept its skeptical conclusion, arguing coding assistants behave more like unreliable automation than junior developers and that any real productivity gains remain unproven outside narrow benchmark tasks.
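To put "too much room for error" in numbers, the sketch below converts the two LiveBench coding scores quoted above into failure rates. It assumes, as a simplification, that each score is the percentage of benchmark tasks a model solved, so 100 minus the score is the share it got wrong; the function name `failure_rate` is purely illustrative.

```python
# Sketch only: assumes each LiveBench coding score (taken from the summary)
# is the percentage of benchmark tasks solved. That is a simplification.
SCORES = {
    "Claude Haiku 4.5": 72.17,
    "GPT-5.2 Codex": 83.62,
}


def failure_rate(score_pct: float) -> float:
    """Percentage of tasks not solved, assuming score = % of tasks solved."""
    return round(100.0 - score_pct, 2)


for model, score in SCORES.items():
    miss = failure_rate(score)
    # 100 / miss expresses the rate as "about one failure in every N tasks"
    print(f"{model}: {miss:.2f}% failed, about 1 in {100 / miss:.1f} tasks")
```

Under this reading, Haiku 4.5 misses about 27.83% of tasks (roughly one in 3.6), and even GPT-5.2 Codex misses about 16.38% (roughly one in six). That residual failure rate is what the author considers too high to trust without close review.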

Insights

If AI now writes the code, what is the future role for a human developer?
Is AI creating a software security crisis faster than we can solve it?