AI scholars win Turing Award for technique that made possible AlphaZero's chess and Go triumphs

Some of the flashiest achievements in artificial intelligence over the past decade have come from a technique in which a computer tries actions more or less at random from a set of choices and is rewarded or penalized depending on whether each move brings it closer to its goal.
It’s the technique most famously employed in AlphaZero, the Google DeepMind program that achieved mastery of chess, shogi, and Go in 2018, following its predecessor AlphaGo’s 2016 victory over a world champion Go player. The same approach helped DeepMind’s AlphaStar program achieve “grandmaster” play in the video game StarCraft II.
On Wednesday, two AI scholars were honored for advancing so-called reinforcement learning, a broad approach to how a computer learns to act in an unknown environment.
Andrew G. Barto, professor emeritus in the Department of Information and Computer Sciences at the University of Massachusetts, Amherst, and Richard S. Sutton, professor of computer science at the University of Alberta, Canada, were jointly awarded the 2024 Turing Award by the Association for Computing Machinery.
The ACM’s award citation states that “Barto and Sutton introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning — one of the most important approaches for creating intelligent systems.”
The ACM honor comes with a $1 million prize and is widely viewed as the computer industry’s equivalent of a Nobel Prize.
Reinforcement learning can be thought of by analogy with a mouse in a maze: the mouse must find its way through an unknown environment to an ultimate reward, the cheese. To do so, the mouse must learn which moves seem to lead to progress and which lead to dead ends.
Neuroscientists and others have hypothesized that intelligent creatures such as mice have an “internal model of the world,” which lets them retain lessons from exploring mazes and other challenges and formulate plans.
Sutton and Barto hypothesized that a computer could be similarly made to formulate an internal model of the state of its world.
Reinforcement learning programs take in information about the environment, be it a maze or a chessboard, as their input. The program acts somewhat randomly at first, trying out different moves in that environment. Each move either earns a reward or it does not.
From that feedback, positive and negative, the program builds an estimate of the rewards that different moves can be expected to yield. Based on that estimate, the program formulates a “policy” to guide future actions toward success.
At a high level, such programs must balance exploring new actions against exploiting choices already known to be good, for neither tactic alone leads to success.
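To make that concrete, here is a minimal sketch of the idea in Python, using tabular Q-learning with an epsilon-greedy rule. The 4x4 maze, the reward values, and the learning settings are illustrative assumptions for this sketch, not anything drawn from Barto and Sutton's own work.

import random

SIZE = 4                                        # 4x4 grid; the "cheese" sits at the bottom-right corner
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2           # learning rate, discount, exploration rate

# Q-table: the program's estimate of future reward for each action in each cell
Q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}

def step(state, action_idx):
    """Apply a move, clipping at the walls, and return (next_state, reward)."""
    dr, dc = ACTIONS[action_idx]
    nr = min(max(state[0] + dr, 0), SIZE - 1)
    nc = min(max(state[1] + dc, 0), SIZE - 1)
    next_state = (nr, nc)
    reward = 1.0 if next_state == GOAL else -0.01   # small penalty for every extra step
    return next_state, reward

for episode in range(2000):
    state = (0, 0)
    while state != GOAL:
        # Exploration vs. exploitation: sometimes act randomly, otherwise take the best-known action
        if random.random() < EPSILON:
            action = random.randrange(len(ACTIONS))
        else:
            action = max(range(len(ACTIONS)), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Nudge the estimate toward the reward received plus the discounted value of where we landed
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][action])
        state = next_state

# The learned "policy": the highest-valued action in each cell points toward the goal
policy = {s: max(range(len(ACTIONS)), key=lambda a: Q[s][a]) for s in Q}
print(policy[(0, 0)])   # typically 1 (down) or 3 (right) after training

The single update line is where the feedback loop lives: each move shifts the program's estimate of that move's value toward the reward it actually received plus the estimated value of wherever it ended up.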
Those wanting to dig deeper can pick up the textbook Sutton and Barto wrote on the topic, Reinforcement Learning: An Introduction, whose second edition was published in 2018.
Reinforcement learning in the sense that Sutton and Barto use it is not the same as the reinforcement learning referenced by OpenAI and other purveyors of large language models. OpenAI and others use “reinforcement learning from human feedback,” or RLHF, to shape the output of GPT and other large language models to be inoffensive and helpful. But that is a different AI technique; only the name has been borrowed.
Sutton, who was also a Distinguished Research Scientist at DeepMind from 2017 to 2023, has emphasized in recent years that reinforcement learning is a theory of thought.
During a 2020 symposium on AI, Sutton bemoaned that “there is very little computational theory” in AI today.
“Reinforcement learning is the first computational theory of intelligence,” declared Sutton. “AI needs an agreed-upon computational theory of intelligence,” he added, and “RL is the stand-out candidate for that.”
Reinforcement learning may also have implications for how creativity and free play arise as expressions of intelligence, including in artificial intelligence.
Barto and Sutton have emphasized the importance of play in learning. During the 2020 symposium, Sutton remarked that in reinforcement learning, curiosity has a “low-level role,” to drive exploration.
“In recent years, people have begun to look at a larger role for what we are referring to, which I like to refer to as ‘play’,” said Sutton. “We set goals that are not necessarily useful, but may be useful later. I set a task and say, Hey, what am I able to do. What affordances.”
Sutton said play might be among the “big things” people do. “Play is a big thing,” he said.