Apple Says Today’s Smartest AIs Still Can’t Solve Simple Puzzles

Apple last week published a research paper claiming that today’s top-of-the-line “thinking” AI models aren’t actually as smart as they let on, and that their ability to “think” collapses completely when faced with more complex puzzles.
Titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” the study evaluates how well current frontier Large Reasoning Models (LRMs) can “think” by presenting them with puzzles of increasing complexity.
Whereas Large Language Models (LLMs) understand and generate responses in human language, LRMs are AI models that generate a detailed thinking process before providing an answer. According to Apple’s research, these models may score higher on reasoning benchmarks, but the quality of their thinking processes and reasoning traces leaves much to be desired.
“Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities,” the study reads. Per the research, the reasoning effort of today’s LRMs increases with problem complexity — but only to a certain point, beyond which it sharply declines even when the model has an adequate token budget.
Apple’s study suggests that current LRMs are good at memorizing answers but are far from actually being able to “think” their way through problems beyond a certain degree of complexity. The study also found that standard LLMs outperform their LRM counterparts on low-complexity puzzles.
The study was conducted by Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar. It name-drops OpenAI’s o1/o3, DeepSeek R1, Claude 3.7 Sonnet Thinking, and Apple rival Google’s Gemini Thinking as some of the models plagued by these apparent shortcomings.
The timing of this research’s release is curious, given that Apple is set to kick off its annual software event, WWDC, later today. Leaks suggest the event will, disappointingly, be light on AI innovations from the tech giant. Instead, it will largely focus on unveiling Apple’s “Solarium” interface redesign for its platforms.
Stay tuned for our coverage of WWDC 2025, with the keynote scheduled to start streaming today at 10 am PT/1 pm ET.