Why Your Childhood Pokémon Games Are the Newest Test for Advanced AI

Classic Pokémon games from the 1990s are turning into an unexpected testing ground for modern artificial intelligence, according to a new report from The Wall Street Journal.

The Journal reports that major AI labs are using the original Pokémon games, including Pokémon Blue and Red, to measure how well new AI models can reason, plan, and work toward long-term goals. Unlike short question-and-answer tests, Pokémon forces an AI to navigate mazes, make strategic decisions, remember past actions, and adapt over time.

“It provides us with like this great way to just see how a model is doing and to evaluate it in a quantitative way,” said David Hershey, applied AI lead at Anthropic.

Hershey helped create the “Claude Plays Pokémon” livestream on Twitch, where Anthropic’s AI model plays the game in real time. The stream launched last February and helped spark similar experiments like “GPT Plays Pokémon” from OpenAI and “Gemini Plays Pokémon” from Google.

Using games to test AI is not new. In the past, researchers relied on chess, poker, Go, and even Minecraft. But Pokémon has struck a chord because it is less predictable and more open-ended. Players must decide when to train characters, when to catch new ones, and how to solve puzzles that are not always straightforward.

“The thing that has made Pokémon fun and that has captured the [machine learning] community’s interest is that it’s a lot less constrained than Pong or some of the other games that people have historically done this on,” Hershey said. “It’s a pretty hard problem for a computer program to be able to do.”

The Pokémon AI streams have drawn heavy attention online, with hundreds of thousands of comments as viewers watch models slowly learn the game. At one point, OpenAI reportedly had a live Pokémon stream playing on a TV inside its office. Google CEO Sundar Pichai has also publicly praised Gemini’s progress in the game.

Anthropic has leaned into the trend, bringing “Claude Plays Pokémon” booths to industry conferences and even hosting an internal Slack channel where employees track Claude’s progress. “We’re all a whole bunch of nerds,” Hershey said.

Newer AI models are improving, but none has fully beaten the game without help. Hershey said much of the progress also comes from better supporting software, known as a harness, such as memory systems that help a model retain important details learned earlier in the game.

Both GPT and Gemini have completed the original Pokémon game, albeit with different setups, and developers say those models are now being tested on Pokémon sequels.

“This is a perfect game for AI right now,” said Jonathan Verron, one of the developers behind the Pokémon AI streams. “I’ve tried to think about other games, but I haven’t found as good an example as Pokémon.”

In the future, you can thank Pokémon when an army of T-1000s takes over the world.

Want to see more of our stories on Google?

Add iPhone in Canada as a Preferred Source on Google

P.S. Want to keep this site truly independent? Support us by buying us a beer, treating us to a coffee, or shopping through Amazon here. Links in this post are affiliate links, so we earn a tiny commission at no charge to you. Thanks for supporting independent Canadian media!
