OpenAI Launches 750-Task LifeSciBench as GPT-Rosalind Lifts Pass Rate to 36.1%
Updated · OpenAI · Jun 17
Summary
LifeSciBench packages 750 expert-authored tasks across seven workflows and seven biological domains to test whether AI can handle realistic life-science research work rather than narrow biology Q&A.
OpenAI built the benchmark with 173 Ph.D.-level scientists; it includes 1,062 attached artifacts and 19,020 rubric criteria. Of the tasks, 79% require multiple reasoning steps and 53% require interpreting at least one artifact.
Independent validation from 453 outside reviewers found more than 96% agreement that the tasks reflect real research, with 97% of reviewers holding a Ph.D. or equivalent doctorate.
GPT-Rosalind raised the overall exact pass rate to 36.1% from 25.7% for GPT-5.5, with notable gains in scientific communication and translation, but artifact-heavy and design-heavy work remained difficult.
OpenAI said strong benchmark scores should be read as evidence of task-level capability, not proof of research impact, and that the next step is testing models in live drug-discovery workflows.
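The summary reports results as pass rates over a fixed set of 750 rubric-graded tasks. The minimal Python sketch below shows how such a metric could be computed and what the reported percentages imply in task counts. It rests on one loudly flagged assumption: a task counts as an "exact pass" only when every one of its rubric criteria is met. The summary does not define the term. `TaskResult`, `is_exact_pass`, `exact_pass_rate`, and `implied_tasks_passed` are hypothetical names for illustration, not part of any LifeSciBench tooling.

```python
from dataclasses import dataclass

# Figures reported in the summary.
TOTAL_TASKS = 750
TOTAL_CRITERIA = 19_020
GPT_ROSALIND_RATE = 0.361
GPT_55_RATE = 0.257


@dataclass
class TaskResult:
    # Hypothetical per-task record: one boolean per rubric criterion.
    criteria_met: list[bool]


def is_exact_pass(result: TaskResult) -> bool:
    # ASSUMPTION: "exact pass" means every rubric criterion is satisfied.
    # A task with no graded criteria is treated as not passing.
    return len(result.criteria_met) > 0 and all(result.criteria_met)


def exact_pass_rate(results: list[TaskResult]) -> float:
    # Fraction of tasks that are exact passes (0.0 for an empty list).
    if not results:
        return 0.0
    return sum(is_exact_pass(r) for r in results) / len(results)


def implied_tasks_passed(rate: float, total: int = TOTAL_TASKS) -> int:
    # Reported rates are rounded to 0.1%, so the task count is approximate.
    return round(rate * total)


if __name__ == "__main__":
    # Toy data: two of three tasks meet all their criteria.
    toy = [TaskResult([True, True]), TaskResult([True, False]), TaskResult([True])]
    print(f"toy exact pass rate: {exact_pass_rate(toy):.3f}")

    # What the reported rates imply for the 750-task benchmark.
    print("GPT-Rosalind tasks passed (approx.):", implied_tasks_passed(GPT_ROSALIND_RATE))
    print("GPT-5.5 tasks passed (approx.):", implied_tasks_passed(GPT_55_RATE))
    gain_points = (GPT_ROSALIND_RATE - GPT_55_RATE) * 100
    relative_gain = GPT_ROSALIND_RATE / GPT_55_RATE - 1
    print(f"gain: {gain_points:.1f} points ({relative_gain:.1%} relative)")
    print(f"average rubric criteria per task: {TOTAL_CRITERIA / TOTAL_TASKS:.1f}")
```

Under these assumptions, 36.1% and 25.7% of 750 tasks correspond to roughly 271 and 193 tasks. That is a gain of about 10.4 percentage points, or about 40% in relative terms, with an average of roughly 25 rubric criteria per task. Because a single unmet criterion fails the whole task under this all-or-nothing reading, exact pass rates would tend to sit well below average per-criterion scores.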
GPT-Rosalind Achieves 36.1% on LifeSciBench: OpenAI’s Specialized AI Sets New Standard for Life Sciences Research
Overview
OpenAI has introduced a major update to GPT-Rosalind, a specialized AI model for life-sciences research. The new version achieved a 36.1% overall exact pass rate on the LifeSciBench benchmark, up from 25.7% for GPT-5.5, underscoring the case for purpose-built AI models and rigorous evaluation standards in the field. Drug discovery, which often takes 10 to 15 years, is slowed by complex and fragmented research workflows. GPT-Rosalind aims to address this by helping scientists navigate large volumes of literature, databases, and experimental data, making it easier to generate and test new ideas and, in turn, to speed up scientific discovery.