A paper published in Nature this week documents a major advance in deep learning. DeepMind, the Alphabet/Google deep-learning group, reports that it has built a new version of its Go-playing AI program that represents a major improvement over the previous state of the art.
The earlier version surprised the AI community and the Go-playing world last year by demonstrating that a computer could beat the best human Go players in the world – a feat many thought the world would not see for another decade or more. Lee Sedol, the human grandmaster who was defeated 4 games to 1 in the 2016 match, was struck by the beauty and depth of mastery displayed by the original AlphaGo program. That initial version was trained by feeding the neural network millions of positions from some 160,000 games played by humans and having it extract from these examples the basic features that led to winning moves.
The new breakthrough, though, is even more impressive. Rather than use examples from human play as the initial knowledge base, the DeepMind team started from scratch: the only knowledge given to the computer was the basic rules of the game. From those rules alone, the deep-learning network was left to play games against itself and learn on its own what worked and what didn’t. After the first three hours of training, the program was playing like a typical beginner, greedily capturing stones at every opportunity with no sense of long-term strategy.
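AlphaGo Zero's actual method combines a deep neural network with Monte Carlo tree search, which is far beyond a blog snippet. But the core idea – an agent given only the rules, improving purely by playing against itself – can be illustrated on a toy scale. The sketch below (my own illustration, not DeepMind's algorithm) trains a tabular value function for tic-tac-toe by self-play: the program knows only the rules, plays both sides, and nudges each visited position's value toward the game's outcome.

```python
import random

# All eight winning lines on a 3x3 board, indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    """Indices of empty squares – the only 'rules' knowledge we encode."""
    return [i for i, s in enumerate(board) if s == ' ']

V = {}  # learned value of each position, from X's perspective

def value(board):
    return V.get(board, 0.5)  # unseen positions start as a coin flip

def choose(board, player, epsilon):
    """Epsilon-greedy move: X maximizes the value, O minimizes it."""
    legal = moves(board)
    if random.random() < epsilon:
        return random.choice(legal)
    best = max if player == 'X' else min
    return best(legal, key=lambda m: value(board[:m] + player + board[m + 1:]))

def self_play(games=20000, alpha=0.2, epsilon=0.1):
    """Learn from scratch: play both sides, back outcomes into V."""
    for _ in range(games):
        board, player, history = ' ' * 9, 'X', [' ' * 9]
        while True:
            m = choose(board, player, epsilon)
            board = board[:m] + player + board[m + 1:]
            history.append(board)
            w = winner(board)
            if w or not moves(board):
                target = 1.0 if w == 'X' else (0.0 if w == 'O' else 0.5)
                break
            player = 'O' if player == 'X' else 'X'
        # Back up the final result through every position in the game.
        for s in reversed(history):
            V[s] = value(s) + alpha * (target - value(s))
            target = V[s]

def play_vs_random(n=200):
    """Evaluate: trained greedy X against a random-moving O."""
    wins = losses = 0
    for _ in range(n):
        board, player = ' ' * 9, 'X'
        while True:
            m = choose(board, 'X', 0.0) if player == 'X' \
                else random.choice(moves(board))
            board = board[:m] + player + board[m + 1:]
            w = winner(board)
            if w == 'X':
                wins += 1; break
            if w == 'O':
                losses += 1; break
            if not moves(board):
                break
            player = 'O' if player == 'X' else 'X'
    return wins, losses
```

After `self_play()`, the greedy policy comfortably outplays a random opponent, despite never having seen a human game. The same loop structure – self-play generating its own training data – is what AlphaGo Zero runs at vastly greater scale, with the lookup table replaced by a deep network and move selection guided by tree search.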
After only 19 hours of training, the program had advanced well beyond typical beginner skill and showed signs of mastering several strategies familiar to experienced human Go players. The real surprise came after 70 hours of training, though, when the program started displaying superhuman performance. In fact, after only three days of training, the program was already exceeding the abilities of the original AlphaGo that beat Lee Sedol. Three weeks later, it had learned enough to be the best Go player in the world, human or computer: in a match against the original AlphaGo, AlphaGo Zero beat the original 100 games to 0. A stunning result.
The most amazing part of all this, IMO, is that the new AlphaGo Zero gained this mastery in a very short time from just the basic rules of the game. The other interesting fact is that the original program ran on a network of 48 Google Tensor Processing Units (TPUs), while the new AlphaGo Zero learned to play at a superhuman level on a much smaller 4-TPU system. The techniques used to achieve these results have immediate application in other domains, such as protein folding for drug discovery, medical diagnostics, and investment advising. The rapid advancement displayed by AlphaGo Zero is in line with the exponential march to the Singularity, when computers will out-match humans in every domain.