Understanding the Most Viral Chart in Artificial Intelligence
Another chart that goes up and to the right.
A worker uses ChatGPT artificial intelligence software
Photographer: Nicolas Maeterlinck/AFP/Getty ImagesListen to Odd Lots on Apple Podcasts
Listen to Odd Lots on Spotify
Watch Odd Lots on YouTube
Subscribe to the newsletter
We live in an era of charts that are going up and to the right. This image obviously describes the stock market, particularly any company whose business is adjacent to artificial intelligence. But beyond stocks, another sort of chart we keep seeing is of AI capabilities also going up and to the right. The most famous and viral of these comes from an organization called METR, which stands for Model Evaluation and Threat Research. The organization is focused on understanding the degree to which AI models can engage in autonomous, complex tasks. METR see this is as a particularly important benchmark, given the risk that AI could one day be engaged in recursive self improvement, taking humans out of the loop. But how do you really gauge a model's ability to do complex problems. And what is being measured for exactly? On this episode, we speak with METR's President Chris Painter as well as Joel Becker, a member of the technical staff who works on evaluation methods for the organization. We discuss both the mechanics and the philosophy of METR's work, and what it means when we see a a chart showing that Clause Opus 4.6 can do a task that would take a human nearly 12 hours.