Self-learning vs teaching: What a great revolution in chess can teach us about education


On 5 December 2017, even as two militants, divisional commander of LeT Furkan and his associate, Abu Mavia, were planning to lay an ambush for Indian army vehicles near Qaezgund, another revolution was quietly underway thousands of kilometres away. DeepMind Technologies, the British artificial intelligence (AI) company (currently owned by Alphabet Inc., Google’s parent company), unleashed AlphaZero on the world of chess. Perhaps by the time the civilians trying to help the militants escape were shot at and injured by the soldiers—Suhail Khan of Nus, Badargund had a bullet injury in his lumbar region and Obaid’s eyes had been hit by pellets—AlphaZero had completed and won the revolution. We experience time differently.

Zugzwang of AlphaZero against Stockfish

AlphaZero is a computer programme developed to master the games of chess, shogi and go. It makes use of Google’s custom tensor processing units (TPUs). A TPU is an AI accelerator application-specific integrated circuit (ASIC) developed specifically for neural network machine learning. TPUs differ from central processing units (CPUs) and graphics processing units (GPUs) in important ways. While CPUs function as the brains of computers, performing basic arithmetic, logical and control operations, GPUs (graphics cards) are specialized electronic circuits designed to render 2D and 3D graphics in tandem with CPUs; increasingly, GPUs are also used in advanced financial modelling and cutting-edge scientific research. TPUs, on the other hand, are Google’s machine learning circuits tailored for TensorFlow, Google’s open-source machine learning library. Unlike CPUs and GPUs, TPUs are not designed for general-purpose tasks. But this is no handicap, as TPUs are extremely efficient at what they are meant to do: training and running neural networks.

Artificial neural networks are not usually programmed with explicit rules to perform specific tasks. Instead, they learn from examples. For instance, if we want to design English-language editing software, we can feed examples of well-edited pieces as well as examples of badly edited pieces into an artificial neural network and allow it to learn by comparing the two sets. This method differs from the one employed by traditional programmes, which are given hand-written rules and algorithms and work strictly according to them.
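To make the idea of learning from examples concrete, here is a minimal sketch in Python. It is an illustration only, not anything taken from AlphaZero: a single artificial neuron (a perceptron) is shown four labelled examples of the logical AND function and has to work out the rule for itself. The names EXAMPLES, predict and train are hypothetical, chosen for this sketch.

```python
# Labelled examples of the logical AND function: ((input_1, input_2), label).
# The rule "both inputs must be 1" is never written into the programme.
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(examples, epochs=20):
    """Perceptron learning rule with integer updates (step size 1)."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, label in examples:
            # error is -1, 0 or +1; the weights change only after a mistake.
            error = label - predict(weights, bias, inputs)
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias


weights, bias = train(EXAMPLES)
for inputs, label in EXAMPLES:
    print(inputs, "expected", label, "predicted", predict(weights, bias, inputs))
```

A traditional programme would simply contain the line "return 1 if both inputs are 1". Here the same behaviour emerges from repeated correction of mistakes, which is the principle that much larger neural networks scale up.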

Trained using 5,000 first-generation TPUs running in parallel to generate the self-play games and 64 second-generation TPUs to train the neural networks, AlphaZero made astonishingly rapid progress. Within four hours of training, it had a higher Elo rating than Stockfish 8 (the world’s leading chess engine up to that point) as per DeepMind’s own estimates. After nine hours of training, playing on a single machine with only four TPUs, it decisively defeated Stockfish 8 in a time-controlled 100-game match with a result of 28 wins (25 as white and three as black), 0 losses and 72 draws.
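For readers who want to translate that match result into rating terms, the sketch below applies the standard logistic Elo formula to the 28 wins, 72 draws and 0 losses reported above. This is back-of-the-envelope arithmetic, not DeepMind’s own rating method, and the function names match_score and elo_gap are hypothetical.

```python
import math


def match_score(wins, draws, losses):
    """Fraction of the available points scored (win = 1, draw = 0.5)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_gap(score):
    """Rating difference implied by a score fraction under the Elo model."""
    return 400 * math.log10(score / (1 - score))


score = match_score(wins=28, draws=72, losses=0)
gap = elo_gap(score)
print(f"score fraction: {score:.2f}")
print(f"implied Elo gap: {gap:.0f}")
```

AlphaZero scored 64 of the 100 available points, which under this model corresponds to a gap of roughly 100 Elo points over Stockfish 8. A single 100-game match is a small sample, so the figure is only indicative.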

To appreciate the earth-shattering significance of this event, we must go back a couple of decades to 1997 when Deep Blue, IBM’s supercomputer, defeated the reigning world champion and one of the greatest chess players of all time, Garry Kasparov, 3½–2½ in a six-game rematch (Kasparov had defeated Deep Blue a year earlier). The event became a synecdoche for the ascendancy of machine over man. From that point onwards, computers would continue to evolve, obtaining greater and greater processing power and speed, while humans could not keep up. In 2008, a team of developers led by Marco Costalba, Joona Kiiski, Gary Linscott and Tord Romstad released Stockfish, a free and open-source chess engine which soon came to rule the world of chess. By 2012, Stockfish had become the unofficial world computer chess champion and has since then only occasionally been challenged by other engines, notable among them being Komodo (developed by Don Dailey, Larry Kaufman and Mark Lefler and released in 2010) and Houdini (developed by Robert Houdart and also released in 2010). So dominant has Stockfish (along with the other engines) become that the game play and moves of the topmost human grandmasters are fed into it and its verdict on their quality is accepted as final and indisputable. In effect, a computer engine adjudges the skills of even the best human players!

Chess, which could well have been invented in Kashmir for historical and emotional reasons, is a game of such beauty because it demands a perfect balance between calculation and intuition. Pieces are important, but so are positional advantages. Over the centuries, great chess players have made inspired (and inspirational) moves in which they have sacrificed even their most valuable assets (like queens and rooks) to gain unassailable positional strength. Alternatively, they have pressed home the advantage of a single extra pawn to create winning positions. It is difficult to describe beauty in the context of chess (or anything else, for that matter), but when one sees it, one knows it. While it is true that most of the history of chess (its journey and development, its great players and matches, in Central Asia, Kashmir, Persia, South Asia, etc.) is shrouded in mystery, it is evident even in the minuscule portion available to us that the greatest of the great chess players, Tal, Kasparov, Morphy, Capablanca, Fischer, Botvinnik, Karpov, Lasker and so on, did not just win games; they created artistic and aesthetic masterpieces in the process. But it is also possible to win in chess by playing in a brutal and ruthless manner.

Tal famously remarked: My name is Mikhail Tal so I must sacrifice

The rise of computer engines in the field of chess has had a momentous impact in this respect. Human players have begun to mimic the playing style of engines.

Stockfish relies on its great search depth. Komodo prefers a more positional style of play. Houdini is aggressive and sacrificial, “in the style of the Romantic Era of chess”. Leela Chess Zero (developed by Gian-Carlo Pascutto and Gary Linscott), in many ways the open-source descendant of AlphaZero, relies on reinforcement learning through self-play. But, ultimately, all of these engines (with the exception of Leela) are programmes bound by their boundary conditions. Their algorithms, and the games and positions they are fed, goad these programmes towards a ruthless and reductionist approach to playing chess. Their moves are effective but graceless, opaque and uninspired. So for humans to follow in their footsteps has been a massive step down. Magnus Carlsen, an undoubted all-time great, has a style which resembles Stockfish more than it resembles, say, his former coach Kasparov, or even another all-time great like Tal. He is efficient like a machine, but not always as elegant as someone like Fischer.

By 2017, these terms had solidified. Machines, remaining steadfast to their codes, never deviating, not budging an inch, were super-efficient and ruthless. Machines were different from human beings. Machines were better than human beings. Human beings ought to follow the path of machines if they wanted to improve themselves.

AlphaZero dropped into this ocean of cement, and broke the waves.

AlphaZero was not nourished on complex algorithms and millions of historical chess games. Instead, it was just fed the basic rules of chess (without any access to opening books or endgame tablebases) and allowed to train by playing against itself and then analyzing those games, learning from its mistakes and remembering its good moves. Yet it managed to beat the premier chess engine after a matter of hours of training. As the dust settled over the battle between the two titans, people began to notice an even sweeter irony. AlphaZero, the new alpha of chess AI, eschewed the barbarity of machine-like game play, favouring a style of positional chess which a Tal or a Kasparov or a Morphy would be proud of. For example, AlphaZero’s strategy to push the rook’s pawn all the way to h6 against the black king on g8 was a novel idea, but one that had something very human about it.
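The principle of learning purely from self-play can be sketched on a far simpler game than chess. The Python example below is a toy under stated assumptions, not AlphaZero’s method (which combines a deep neural network with Monte Carlo tree search): two copies of the same learner play a stone-taking game in which each player removes one, two or three stones and whoever takes the last stone wins. The programme is told only the rules, and it keeps a table of how well each move has turned out on average. All names and settings (self_play, best_move, 20,000 games, an exploration rate of 0.2) are hypothetical choices for this illustration.

```python
import random


def legal_moves(pile):
    """A player may take 1, 2 or 3 stones, but no more than remain."""
    return [take for take in (1, 2, 3) if take <= pile]


def best_move(values, pile):
    """Move with the highest average result so far (unseen moves count as 0)."""
    return max(legal_moves(pile), key=lambda take: values.get((pile, take), 0.0))


def self_play(games=20000, max_pile=10, epsilon=0.2, seed=0):
    """Learn move values only from games the programme plays against itself."""
    rng = random.Random(seed)
    values, counts = {}, {}  # keyed by (pile, take)
    for _ in range(games):
        pile = rng.randint(1, max_pile)  # random starting pile
        history = []
        while pile > 0:
            if rng.random() < epsilon:  # occasionally try something new
                take = rng.choice(legal_moves(pile))
            else:
                take = best_move(values, pile)
            history.append((pile, take))
            pile -= take
        # The player who took the last stone wins (+1); walking backwards
        # through the game, the result alternates between the two sides.
        result = 1.0
        for key in reversed(history):
            counts[key] = counts.get(key, 0) + 1
            old = values.get(key, 0.0)
            values[key] = old + (result - old) / counts[key]  # running average
            result = -result
    return values


values = self_play()
for pile in (5, 6, 7):
    print("pile", pile, "-> take", best_move(values, pile))
```

The known winning strategy in this game is to leave the opponent a multiple of four stones. Nobody writes that rule into the programme; with these settings the table should come to prefer exactly those moves for small piles, simply because they turned out well in its own games.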

h6 move

To address the criticism by Stockfish’s Tord Romstad that the engine’s settings in the match were suboptimal (Stockfish had been allotted 64 threads and a hash size of 1 GB), there was a rematch. In the original match, both engines had played at a fixed time control of one minute per move. In the rematch, each engine was given three hours for the game plus 15 seconds per move. AlphaZero won again, with 155 wins and 6 losses in 1,000 games, the rest being drawn.

AlphaZero’s victory over Stockfish reaffirms the value of autodidacticism. Consider Stockfish a typical student taught in an educational institute. It has teachers (programmers) who create the basic parameters for the knowledge it obtains (through heuristic algorithms which are refined over time) and then feed it the knowledge they deem useful and necessary for it. AlphaZero, on the other hand, is an intelligent and self-learning student whose method consists of a critical examination of what it has learned, to become better. It only seeks help from an outside mentor when it feels the need to do so. Stockfish searches about 70 million positions per second; AlphaZero searches only about 80,000. Having learned from its own mistakes which lines are worth pursuing, it does not need to reconsider the rest.

Carlsen: I do not believe in fortresses

However, both AlphaZero and Stockfish perform at an optimum level when they have the service of multiple processing units as well as a vast digital memory at their disposal. As the integration of AI and the human brain becomes a reality, we need to consider the ramifications of this fact. Our future generations need to have access to more and more computing power and digital memory, but we must not limit their imagination by narrow rules of the mind.

It is not by accident that AlphaZero’s playing style resembles the greatest human masters of the game.

The freedom of self-learning makes even machines more human.
