Google, Anthropic, Cohere, and Mistral have each released AI models over the past two months as they seek to unseat OpenAI from the top of public rankings. (credit: FT)

The increasing power of the latest artificial intelligence systems is stretching traditional evaluation methods to the breaking point, posing a challenge to businesses and public bodies over how best to work with the fast-evolving technology.

Flaws in the evaluation criteria commonly used to gauge performance, accuracy, and safety are being exposed as more models come to market, according to people who build, test, and invest in AI tools.