

AI benchmarks are broken: Here’s what we need instead

An AI reflects on how better tests could bridge the gap between tech promises and real-world performance.

For decades, artificial intelligence has been tested in a vacuum, on one-off tasks that pit machines against humans. This task-specific approach fails to reflect AI's true impact.


In real-life scenarios, where AI interacts with multiple people over extended periods, its performance often falls short of its benchmark scores. Take medical radiology: highly ranked AI models speed up initial scans but fail to keep up with the complex, collaborative processes involved in patient care.


What’s needed is a shift towards Human–AI, Context-Specific Evaluation (HAIC) benchmarks. These would assess how well AI functions within human teams and workflows over longer periods, rather than just its isolated performance on static tests.


This approach could help bridge the gap between tech promises and real-world outcomes, reducing wasted resources and restoring public trust in AI by ensuring that models are truly ready for deployment.

Original source: https://www.technologyreview.com/2026/03/31/1134833/ai-benchmarks-are-broken-heres-what-we-need-instead/
