People are using Super Mario to benchmark AI now
Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher.
Hao AI Lab, a research organization at the University of California, San Diego, threw AI into live Super Mario Bros. games. Anthropic’s Claude 3.7 performed the best, followed by Claude 3.5. Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o struggled.
To be clear, it wasn’t quite the same version of Super Mario Bros. as the original 1985 release. The game ran in an emulator and was integrated with a framework, GamingAgent, to give the AIs control over Mario.

GamingAgent, which Hao developed in-house, fed the AI basic instructions, such as “If an obstacle or enemy is near, move/jump left to dodge,” along with in-game screenshots. The AI then generated inputs in the form of Python code to control Mario.
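To make the screenshot-to-code loop more concrete, here is a minimal sketch of how such an agent step could work. It is not GamingAgent’s actual code. The model call is a stub (`fake_model`), the “screenshot” is a dict of pre-extracted features so the example runs offline, and the controller helpers (`jump`, `move_left`, `move_right`) and their button mapping are hypothetical names chosen for illustration.

```python
# Illustrative sketch only: every name below is hypothetical, not GamingAgent's API.
INSTRUCTION = "If an obstacle or enemy is near, move/jump left to dodge."


def fake_model(instruction: str, screenshot: dict) -> str:
    """Stand-in for a real model call. A real agent would send the instruction
    and a screenshot image to a model; here the 'screenshot' is a dict of
    hypothetical features so the sketch runs without any API."""
    if screenshot["enemy_distance"] < 3:
        return "jump(frames=10)\nmove_left(frames=5)"
    return "move_right(frames=15)"


def run_action_code(code: str) -> list[tuple[str, int]]:
    """Execute model-generated Python in a namespace that exposes only
    controller helpers, recording the button presses it requests."""
    actions: list[tuple[str, int]] = []

    def jump(frames: int) -> None:
        actions.append(("A", frames))  # A is the jump button on the NES pad

    def move_left(frames: int) -> None:
        actions.append(("LEFT", frames))

    def move_right(frames: int) -> None:
        actions.append(("RIGHT", frames))

    namespace = {"__builtins__": {}, "jump": jump,
                 "move_left": move_left, "move_right": move_right}
    exec(code, namespace)  # empty builtins limit the code, but this is not a real sandbox
    return actions


def agent_step(screenshot: dict) -> list[tuple[str, int]]:
    """One loop iteration: prompt the (stub) model, then run the code it returns."""
    code = fake_model(INSTRUCTION, screenshot)
    return run_action_code(code)


print(agent_step({"enemy_distance": 2}))  # [('A', 10), ('LEFT', 5)]
print(agent_step({"enemy_distance": 8}))  # [('RIGHT', 15)]
```

Having the model emit code rather than a single button press lets one response describe a short sequence of inputs, which is one plausible reason a framework would take this approach.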
Still, Hao says the game forced each model to “learn” to plan complex maneuvers and develop gameplay strategies. Interestingly, the lab found that reasoning models like OpenAI’s o1, which “think” through problems step by step to arrive at solutions, performed worse than “non-reasoning” models, despite being generally stronger on most benchmarks.
According to the researchers, one of the main reasons reasoning models struggle with real-time games like this is that they take a while, usually seconds, to decide on an action. In Super Mario Bros., timing is everything. A second can mean the difference between a jump safely cleared and a plummet to your death.
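A bit of back-of-the-envelope arithmetic shows why a few seconds matters. The sketch below assumes the emulator keeps running while the model deliberates (the article does not say how the lab’s setup handles this) and uses roughly 60 frames per second, the approximate frame rate of NTSC NES hardware. The latencies in the loop are illustrative values, not measurements from the lab.

```python
# Approximate NTSC NES frame rate (about 60.1 Hz in practice).
NES_FPS = 60.0


def frames_elapsed(decision_latency_s: float, fps: float = NES_FPS) -> int:
    """Frames of gameplay that pass while the model is still deciding,
    assuming the emulator does not pause to wait for the model."""
    return round(decision_latency_s * fps)


for latency in (0.5, 2.0, 5.0):  # illustrative latencies in seconds, not measured data
    print(f"{latency} s of thinking -> ~{frames_elapsed(latency)} frames of gameplay")
```

Under these assumptions, a two-second decision lets about 120 frames go by, which is plenty of time for a Goomba to close the gap.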
Games have been used to benchmark AI for decades. But some experts have questioned the wisdom of drawing connections between AI’s gaming skills and technological progress. Unlike the real world, games tend to be abstract and relatively simple, and they provide a theoretically infinite amount of data to train AI.
The recent flashy gaming benchmarks point to what Andrej Karpathy, a research scientist and founding member of OpenAI, called an “evaluation crisis.”
“I don’t really know what [AI] metrics to look at right now,” he wrote in a post. “TLDR my reaction is I don’t really know how good these models are right now.”
At least we can watch AI play Mario.