Scientists built the hardest AI test ever and the results are surprising

ScienceDaily
Researchers created "Humanity's Last Exam" (HLE), a 2,500-question test, because current AI benchmarks are too easy.

Summary

As advanced AI models began achieving near-ceiling scores on existing academic benchmarks such as MMLU, nearly 1,000 researchers from around the world developed a more rigorous assessment called "Humanity's Last Exam" (HLE). The 2,500-question exam spans specialized fields such as ancient languages and advanced mathematics, with questions written to require deep, verifiable human expertise and to resist simple internet searches. Any question that leading AI models could already answer was removed so the test would remain challenging.

Early results showed even top models struggling: GPT-4o scored just 2.7%, and the best models reached only 40-50% accuracy. Dr. Tung Nguyen of Texas A&M noted that HLE measures depth and context beyond pattern recognition, and emphasized that accurate assessment tools are crucial for policymakers to understand AI's true capabilities and risks. The exam is intended as a durable benchmark, with most questions kept private to prevent memorization, and its results highlight the gap that remains between current AI and genuine human expertise.

(Source: ScienceDaily)