Claude models outperform experts on bioinformatics challenges with BioMysteryBench

4 articles · Updated · Anthropic · Apr 27
  • Claude Mythos Preview solved 30% of 23 bioinformatics tasks that human experts found difficult, while Opus 4.6 achieved 81% overall accuracy on CompBioBench, in several cases surpassing panels of five domain experts.
  • BioMysteryBench, featuring 99 real-world bioinformatics questions, demonstrated that recent Claude models reliably solve most human-solvable problems and uniquely tackle tasks previously unsolved by humans, using diverse and sometimes novel strategies.
  • These results highlight rapid advances in AI scientific capability, with Claude models now serving as valuable collaborators in bioinformatics research and prompting new benchmarks to measure AI's evolving research skills.
  • What makes the new Claude Mythos model excel at science yet remain inconsistent on the hardest problems?
  • If AI is outperforming scientists, why does it still fail specialized exams designed by human experts?
  • How can scientists trust an AI's 'brittle' solutions to problems that humans cannot solve themselves?
  • Is AI developing true scientific understanding, or just becoming a highly sophisticated research tool?
  • How should university education change now that AI is a core collaborator in scientific research?