Stratego is a popular board game that involves strategy and deception. Players take turns moving their pieces, with the goal of capturing their opponent’s flag or trapping their opponent’s pieces. It is a challenging game that requires players to think carefully about their moves and outmaneuver their opponent.
DeepMind, a company owned by Alphabet, has developed an AI that can play the board game Stratego. Stratego is a game of imperfect information, meaning that players do not have full knowledge of the game state at any given time. This makes it a difficult game for AI to play, as it must make decisions without complete information.
The DeepMind AI uses a combination of neural networks and Monte Carlo Tree Search (MCTS) algorithms to play the game. The neural networks are trained using supervised learning, in which the AI is fed a large number of example game states and their corresponding optimal moves. The MCTS algorithms then use this knowledge to make decisions during gameplay.
The AI was able to defeat a strong human player in a series of games, showing that it is capable of making strategic decisions and adapting to its opponent’s moves. The researchers believe that the AI’s ability to handle imperfect information could have applications in other domains, such as cybersecurity and finance.
The development of this AI is an important step in the field of game AI, as it shows that AI can be successful in games with incomplete information. This has implications for other areas of AI research, as many real-world problems also involve incomplete information.
A team from Google Brain recently published a paper (on arXiv) describing the use of a Deep Reinforcement Learning algorithm to design chips customized for AI applications. In other words, they used an AI to build AI chips. The problem they faced was “placement of TensorFlow graphs onto hardware devices to minimize training or inference time, or placement of an ASIC or FPGA netlist onto a grid to optimize for power, performance, and area”. They note in the paper that Deep RL is well-suited to problems like this, “where exhaustive or heuristic-based methods cannot scale”. The specific family of Deep RL algorithms they used was policy-gradient methods, such as REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor Critic (SAC). The authors state their belief that “it is AI itself that will provide the means to shorten the chip design cycle, creating a symbiotic relationship between hardware and AI, with each fueling advances in the other”.
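To give a feel for what a policy-gradient method does, here is a minimal sketch of REINFORCE (the simplest of the algorithms named above) on a toy problem. This is not the paper's method: the four candidate "placements" and their costs are made up for illustration, the policy is a plain softmax over four numbers rather than a neural network, and the function name `reinforce_placement` is my own.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_placement(costs, steps=2000, lr=0.1, seed=0):
    """REINFORCE over a softmax policy that picks one of len(costs) placements.

    costs[i] is a hypothetical, made-up cost (e.g. runtime) of placement i.
    The reward is its negative, so cheaper placements are reinforced.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(costs)
    baseline = 0.0  # running average reward, used to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(costs)), weights=probs)[0]
        reward = -costs[action]
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # d/d logit_k of log pi(action) is (1 if k == action else 0) - probs[k]
        for k in range(len(logits)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log
    return softmax(logits)

toy_costs = [3.0, 1.0, 2.5, 4.0]  # assumed costs; index 1 is the cheapest
final_probs = reinforce_placement(toy_costs)
best = max(range(len(final_probs)), key=final_probs.__getitem__)
print(best, [round(p, 3) for p in final_probs])
```

The policy never sees the cost table directly. It only samples a placement, observes the reward, and nudges its probabilities, which is why this family of methods can be applied when the search space is too large to enumerate.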
Image Source: Google, Inc.
I was recently asked by Udacity to be a beta tester (and, subsequently, a mentor and project reviewer) for one of their newest course offerings: the Deep Reinforcement Learning NanoDegree (DRLND) program. This is a very interesting course with some fascinating projects. Students get to work with the same types of Reinforcement Learning algorithms that have recently made headlines: for example, the AlphaGo project that beat the world’s best professional Go players while demonstrating very deep and strategic concepts, and the Dota 2 RL agent that recently showed professional-level skills in an extremely complicated multi-player video game. Example projects in the DRLND program include developing an RL agent that plays a video game in which it collects yellow bananas while navigating a 2D field of play, and finding an actor/critic model that can control 20 independent 2-axis robot arms, each positioning its end-effector within a target sphere that moves continuously in 3D space around the robot.
This newest Udacity course requires all projects to be written in Python using the PyTorch framework. This was my first opportunity to work with PyTorch, so I thought I would relay my experience and compare the advantages and disadvantages of PyTorch and TensorFlow as I see them.
- Sponsoring Companies: These two frameworks were developed by, and continue to be supported by, two of the biggest names in the AI industry today. TensorFlow was originally developed by Google Brain (the AI company within the Google/Alphabet corporate structure). Google not only continues to support this framework but has even started to develop and field custom hardware specifically designed to execute TensorFlow graphs at high speed (competitive with the fastest Nvidia GPUs, which also have libraries customized to execute TensorFlow computation graphs). PyTorch was developed by Facebook, which has used it for many of the projects that help to improve the user experience on its platform, including the Facebook face-recognition app. Both companies have released their respective frameworks as open source projects in order to achieve the widest distribution and acceptance of their products. In summary, I would rate this category a tie: both frameworks are supported by industry titans, each committed to the success and longevity of its framework.
- User Communities: TensorFlow wins in this category, hands down. As one of the original deep learning frameworks, TensorFlow has a very broad and experienced user community. The documentation is excellent and help sites such as StackOverflow are available to answer just about any question a user might have. PyTorch, however, might be thought of as the new kid on the block. The PyTorch user base is not as deep, the documentation is not as comprehensive, and getting answers to questions is not as easy as is the case with TensorFlow. Score one for TensorFlow.
- Static vs. Dynamic Computation Graphs: This category is by far the one that most differentiates the approach taken by the two frameworks. The TensorFlow framework is based on the concept of a static computation graph. The API is designed to enable the programmer to define a computation graph that, once it is complete, can be passed to a GPU (or a CPU library) for execution. Just about every call made to the API is designed to describe the nodes of this graph, how they are connected, how and in what format the data will be input to the graph, how intermediate data will be updated during execution, and how the outputs will be delivered back to the function that triggered the graph’s execution. The problem with this approach, however, is that the entire architecture of your network must basically be defined and described using the TensorFlow API before you will see any results. In addition, there is a steep learning curve with this framework, with many new and potentially confusing concepts that must be mastered to achieve success (such as placeholders, variables, and TensorFlow sessions). In contrast, PyTorch is based on a dynamic execution graph. The graph is constructed and can be executed statement by statement. This is a much more intuitive approach and has a very natural feel for Python programmers who are used to this interactive flow. The learning curve for PyTorch is not nearly as steep as TensorFlow’s, and even though the documentation is not as polished, most users will find PyTorch’s dynamic computation graph easier to use and master. Score one for PyTorch!
- Ease of Development and Debugging: Besides the natural flow that PyTorch’s dynamic computation graph provides, there is another big advantage to this approach: ease of debugging. At any point in developing the computation graph, the programmer is free to insert print statements between nodes or at any point, really, in the graph to see what’s going on. This is a great benefit to debugging one’s code. Things are not nearly as simple with TensorFlow, and I can tell you from experience that when there is a problem with your TensorFlow Graph, it can be very difficult to find out the root cause of the problem. This is another point that goes to PyTorch.
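The difference between the two graph styles described in the last two bullets can be sketched in plain Python. To be clear, this is a toy illustration only: the `Node`, `placeholder`, and `run` names are hypothetical and are not the TensorFlow or PyTorch APIs. The first half defines a whole computation before running it, and the second half runs each statement immediately, which is what makes a simple print statement useful for debugging.

```python
# Toy illustration only: these classes and functions are hypothetical and are
# NOT the TensorFlow or PyTorch APIs.

class Node:
    """A deferred operation in a define-then-run ("static") graph."""
    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

def placeholder(name):
    return Node("placeholder", name=name)

def run(node, feed):
    """Evaluate the graph only now, once every input value is supplied."""
    if node.op == "placeholder":
        return feed[node.name]
    left, right = (run(n, feed) for n in node.inputs)
    return left + right if node.op == "add" else left * right

# Static style: describe the whole computation first, execute it later.
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
y_graph = x * w + b  # nothing is computed here; y_graph is just a Node
static_result = run(y_graph, {"x": 3.0, "w": 2.0, "b": 1.0})

# Dynamic style: every statement runs immediately, so intermediate values can
# be printed or inspected in a debugger.
def dynamic_forward(x, w, b):
    h = x * w
    print("intermediate h =", h)  # an ordinary print works mid-computation
    return h + b

dynamic_result = dynamic_forward(3.0, 2.0, 1.0)
print(static_result, dynamic_result)
```

Both halves compute the same value. The practical difference is when you can look inside: in the static half, `y_graph` holds no numbers until `run` is called, so there is nothing to print while the graph is being built.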
Bottom Line: There are good reasons to use each of these deep learning frameworks. TensorFlow most certainly represents basic knowledge that every deep learning practitioner is expected to have. However, the more I use PyTorch, the more I like it, and I think it may be the future. My recommendation is to be familiar with both frameworks: know TensorFlow for basic DL knowledge, and learn PyTorch for ease of use and less troublesome debugging.
Nvidia recently announced APEX, a new, open-source PyTorch extension that helps users improve the performance of deep learning training on Nvidia’s Volta GPUs. The key improvement that APEX brings to deep learning is that it enables engineers to use mixed precision arithmetic to improve training speed while still maintaining the accuracy and stability of training algorithms. The extension requires PyTorch 0.4, Python 3, and Nvidia’s CUDA 9 library. Additional information is available on the Nvidia site.
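To see why mixed precision needs some care to stay accurate, here is a small sketch that uses only Python's standard library, not APEX itself. It shows loss scaling, one common ingredient of mixed-precision training: a very small gradient rounds to zero in 16-bit floating point, but survives if it is multiplied by a scale factor first and divided back out in full precision. The gradient value and scale factor are assumptions picked to make the effect visible.

```python
import struct

def to_fp16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

tiny_grad = 1e-8     # hypothetical gradient value, chosen so that it underflows
loss_scale = 1024.0  # hypothetical scale factor (a power of two)

naive = to_fp16(tiny_grad)                # stored directly in half precision
scaled = to_fp16(tiny_grad * loss_scale)  # scaled up before the fp16 round trip
recovered = scaled / loss_scale           # unscaled again in full precision

print(naive, recovered)
```

Stored directly, the gradient is lost entirely. With scaling, it comes back to within a fraction of a percent of its original value.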
Noa Ovadia Interacting with IBM Debater
IBM just demonstrated a deep-learning system that is a follow-on of sorts to the Watson Jeopardy demonstration from several years ago. For this IBM Debater project, Watson was trained to intelligently debate on approximately 100 different topics. In this particular demonstration, the IBM system was challenged by Noa Ovadia, a college senior who was the Israeli debate champion of 2016. The two held a traditional debate on the topic of Subsidized Space Exploration. Each side made an opening statement followed by a rebuttal of the opponent’s position, and then a closing statement. Although this demonstration was not a traditional Turing test, it shows that progress is being made in the quest for machines that can intelligently interact with humans in conversational speech.
In fact, this is the second demonstration of recent progress toward this goal. A few weeks ago, Google unveiled the Duplex system, which demonstrated very human-like speech and interaction in phone calls to schedule restaurant reservations and hair salon appointments. It appeared in those demos that the real humans on the other end of the phone call did not actually realize they were talking to a computer and held a very natural conversation. Although the Google Duplex conversations took place in a very constrained topic area (as did the IBM Debater demonstration), these advancements show that rapid progress is being made to extend the limited conversation ability of systems like Amazon Echo, Apple Siri, and Google Assistant.
Experience shows that once the basic infrastructure has been implemented and demonstrated for one domain, these types of systems rapidly expand to support many more domains of knowledge and interaction. I can foresee similar systems supporting conversations one might have with a doctor, for example, or a financial planner, or in fact, any relatively-constrained domain of knowledge. I expect it won’t be long before many of the jobs currently held by humans, to provide advice to other humans on various topics, will transition to systems like those now being demonstrated by IBM and Google.
This rapid progress is both fascinating and worrisome. Past labor transitions where technology has eliminated certain jobs have always resulted in new jobs being created that never existed before. Is this time different, or are there whole new classes of jobs on the horizon that none of us currently envision? Only time will tell.
The Google Scholar resource ranks the top journals and conferences using a fully automated h-index score. The h-index is named after Jorge Hirsch, a physicist at the University of California, San Diego (UCSD), who proposed the index as a way to gauge the relative quality of theoretical physicists. It is sometimes called the Hirsch index. According to Wikipedia, the h-index measures “both the productivity and citation impact of the publications of a scientist or scholar. The index is based on the set of the scientist’s most cited papers and the number of citations that they have received in other publications. The index can also be applied to the productivity and impact of a scholarly journal” (as is the case here).
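For readers who have not met the h-index before, the usual definition is the largest number h such that h of the publications have at least h citations each. A minimal sketch of that calculation follows. The function name and the citation counts are made up for illustration, and this is not a description of how Google Scholar computes its scores internally.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

example_citations = [10, 8, 5, 4, 3]  # made-up citation counts
print(h_index(example_citations))     # prints 4
```

In the example, four papers have at least 4 citations each, but there are not five papers with at least 5, so the h-index is 4.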
Searching the Google Scholar site for Deep Learning resources returned the following list of the top 15 journals and conferences (the number to the right of each entry is the resource’s h5-index). For comparison, Nature, the top-rated journal in the sciences, has an h5-index rating of 366.
The #1 Deep Learning resource in this list is the International Conference on Machine Learning (ICML), which takes place next month (Jul 10-15, 2018) in Stockholm, Sweden.
The #2 resource is the arXiv Machine Learning (stat.ML) archive of pre-press journal papers, hosted by the Cornell University Library. This is an excellent collection of scholarly papers on topics related to machine learning.
The full list of Google Scholar’s top-15 resources follows:
1. International Conference on Machine Learning (ICML) – 91
2. arXiv Machine Learning (stat.ML) – 76
3. The Journal of Machine Learning Research – 73
4. Machine Learning – 37
5. European Conference on Machine Learning and Knowledge Discovery in Databases – 31
6. International Journal of Machine Learning and Cybernetics – 23
7. IEEE International Workshop on Machine Learning for Signal Processing – 19
8. International Conference on Machine Learning and Applications – 18
9. International Journal of Machine Learning and Computing – 16
10. International Workshop on Machine Learning in Medical Imaging – 12
11. Machine Learning and Data Mining in Pattern Recognition (MLDM) – 11
12. International Conference on Machine Learning and Cybernetics – 10
13. Asian Conference on Machine Learning – 10
14. Artificial Intelligent Systems and Machine Learning – 5
15. Transactions on Machine Learning and Artificial Intelligence – 5
The CSAIL group at the Massachusetts Institute of Technology (MIT) has improved the state of the art in inferring road networks from satellite imagery. This is a time-consuming, tedious, and error-prone process that has traditionally relied on human inputs. Open Street Map (OSM) is the gold standard for cataloging road networks throughout the world, but it relies almost exclusively on human input. As a result, many areas have yet to be mapped, and since the data is provided by volunteers with a mixed bag of skill and attention to detail, the data is not 100% accurate. For example, the city of Toronto produces a gold standard road map, and recent studies indicate this map differs from the OSM version with an error rate of approximately 14%.
Previous attempts at using deep learning to infer road networks from satellite imagery have relied on a traditional Convolutional Neural Network (CNN) trained on a large number of labeled images to produce pixel-by-pixel classification of road (vs. non-road) pixels in an image. This technique has achieved limited success with real-world imagery due primarily to varying lighting conditions and the many occlusions caused by trees, buildings, and shadows in satellite imagery that greatly complicate this process (even for human analysts).
The advancement made by the MIT engineers was to change from making pixel-by-pixel classifications to a new technique where the CNN’s goal is re-oriented to iteratively construct a graph of the road network directly from the imagery. As described in the paper “RoadTracer: Automatic Extraction of Road Networks from Aerial Images”, the MIT process “consists of a search algorithm, guided by a decision function implemented via a CNN, to compute the graph iteratively. The search walks along roads starting from a single location known to be on the road network. Vertices and edges are added in the path that the search follows. The decision function is invoked at each step to determine the best action to take: either add an edge to the road network, or step back to the previous vertex in the search tree.”
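The search loop quoted above can be sketched in a few lines of Python. This is only my reading of that description, not the authors' code. In particular, the `decide` function below is a hypothetical stand-in for the CNN: instead of looking at imagery, it peeks at a tiny hand-made grid in which 1 marks a road cell.

```python
def decide(grid, vertex, visited):
    """Stand-in for the CNN decision function (hypothetical): return the next
    road cell to walk to, or None to step back to the previous vertex."""
    r, c = vertex
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        nr, nc = r + dr, c + dc
        if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                and grid[nr][nc] == 1 and (nr, nc) not in visited):
            return (nr, nc)
    return None

def trace_roads(grid, start):
    """Iteratively build a road graph by walking outward from `start`."""
    vertices, edges = {start}, []
    stack = [start]  # the current path in the search tree
    while stack:
        current = stack[-1]
        nxt = decide(grid, current, vertices)
        if nxt is None:
            stack.pop()  # step back to the previous vertex
        else:
            vertices.add(nxt)  # add a vertex and an edge, then walk on
            edges.append((current, nxt))
            stack.append(nxt)
    return vertices, edges

# Made-up 3x4 "image": 1 = road, 0 = not road.
toy_grid = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
road_vertices, road_edges = trace_roads(toy_grid, (0, 0))
print(len(road_vertices), len(road_edges))
```

The point of the structure is that the output is a graph from the very first step, so there is no separate stage that has to turn a noisy pixel mask into connected roads.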
RoadTracer identifies 45% more road segments than the authors’ previous segmentation approach (see figure, above) and outperforms the previous state-of-the-art system by a wide margin. It would be interesting to see if the search algorithm could be improved by a Reinforcement Learning network, another technique that is gaining widespread prominence in the deep learning community.
A recent Nature article (https://go.nature.com/2GdzegP) and accompanying blog post by the paper’s authors (https://deepmind.com/blog/grid-cells/) describes how the
Image Source: Wikipedia Commons, Creative Commons License
The DeepMind team then went a step further by creating a deep reinforcement learning agent to investigate whether or not the resulting artificial neural network was indeed capable of supporting navigation. As the DeepMind team explained, “This agent performed at a super-human level…and exhibited the type of flexible navigation normally associated with animals, taking novel routes and shortcuts when they became available”. An example of these abilities is illustrated below. In this example, the agent was trained in a maze with 5 doors when all but door #5 were closed (a). During testing, all doors were opened (b) and the agent successfully found shortcuts to the desired destination.
The Deep Reinforcement agent trained in a maze with all but one door closed (left) found shortcuts during testing (right) when all doors were open.
In my opinion, the most interesting feature of this work was the use of two different neural network architectures in a single study:
- An RNN to develop a model of a portion of a mammalian brain that spontaneously mimicked the grid-cell structure of actual mammalian brains, and
- A deep reinforcement learning, agent-based system to explore how the resulting network enables velocity-vector-based navigation.
One of the things that we often see in Sci-Fi movies, but rarely experience in real life, is the ability to have a natural conversation with a computer. Rapid advances in AI and deep learning in recent years have brought us Amazon Echo and Google Assistant, but these devices have mostly single-phrase request processing, where the computer takes a single request and responds with the most likely response. This is not what I would call an interactive conversation.
Earlier this week, Google’s AI team demonstrated what I consider to be a major advance toward the goal of natural human-computer interaction, and I am frankly pretty amazed at the result. Google’s new Duplex product is able to make telephone calls and interact with the other party in ways that are remarkably similar to how a human would interact. Duplex is able to both sound very human (with natural changes in inflection and interspersed “umm”s and “uh”s) and respond with human-like interactions to the natural flow of a conversation.
Google uses a Recurrent Neural Network (RNN) as the basis for understanding the current context of a conversation and generating the sequence of words to say next. The network is trained to perform specific tasks (such as booking an appointment or making a reservation) using traditional deep learning techniques. While each task is trained separately, the entire collection of recorded conversations for all tasks was used as the corpus for training all the various task-specific networks. Once the RNN has generated a sequence of words to say next in the conversation, Google’s standard Text-to-Speech (TTS) system is used to generate the sounds for the phrase to be spoken.
Latency is an important aspect in natural conversations. Humans don’t generally expect long delays between phrases of a conversation, and Duplex attempts to keep the latency low (less than 100 ms, typically) using several different techniques, including relying on low-confidence models when that is determined to be necessary to meet the latency demands. When a complex phrase is being responded to, the system is actually smart enough to add more latency than required to match the approximate time humans might take to respond to a complex utterance.
You can read more about Google Duplex, including recorded samples of interactive speech on Google’s AI blog: https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
At this year’s GPU Technology Conference (Mar 28-30, 2018), Nvidia announced a new GPU specifically designed for deep learning. The Quadro GV100, based on Nvidia’s latest Volta architecture, sports 5120 CUDA cores, 640 tensor cores, and 32GB of VRAM, producing 14.8 TFLOPS of single-precision floating point performance, 7.4 TFLOPS of double-precision performance, and an incredible 118.5 TFLOPS of tensor performance to speed up deep learning inference and training. The GV100 is designed to interface via PCIe, consumes only 250W of power, and two cards can be linked together to provide 64GB of shared memory. The $9,000 price point for a single card makes this something that hobbyists will likely do without, but serious deep learning researchers may still find it attractive. Reportedly, Nvidia spent $3 billion developing the GV100, which puts the $9,000 per card price point in perspective.
The 640 tensor cores Nvidia incorporated into the GV100 architecture are a new type of processor specifically designed to perform 4×4 matrix multiplications in support of deep learning operations. Each tensor core multiplies two 4×4 matrices and adds the result to a third 4×4 matrix, which is exactly the type of operation that consumes the vast majority of processing in deep learning training or inference. The result is the equivalent of a roughly 120 TFLOPS supercomputer on a card. When looked at from that point of view, the $9K price of the card actually looks like a bargain. It will be interesting to see where these cards get used and what new breakthroughs in AI and deep learning we might see in the future as researchers apply this technology to this fast-moving domain.
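Here is a short Python sketch of the operation a single tensor core performs, D = A × B + C on 4×4 matrices, followed by some back-of-the-envelope arithmetic on the numbers quoted above. The function name is my own, and the last calculation rests on an assumption that each tensor core completes one such operation per clock cycle. The clock rate it produces is therefore only what the quoted figures imply under that assumption, not a published specification.

```python
def tensor_core_mma(a, b, c):
    """Return D = A x B + C for 4x4 matrices given as nested lists."""
    n = 4
    return [[sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j]
             for j in range(n)] for i in range(n)]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a = [[float(i + j) for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
d = tensor_core_mma(a, identity, ones)  # A x I + 1, i.e. A plus one everywhere

# One 4x4 by 4x4 multiply-accumulate is 4*4*4 = 64 multiply-adds,
# which counts as 128 floating-point operations.
flops_per_op = 2 * 4 * 4 * 4
tensor_cores = 640
quoted_tflops = 118.5
# Clock rate implied by the quoted figures, ASSUMING one such operation per
# tensor core per clock cycle.
implied_clock_ghz = quoted_tflops * 1e12 / (tensor_cores * flops_per_op) / 1e9
print(d[0], round(implied_clock_ghz, 3))
```

Under that assumption the quoted 118.5 TFLOPS works out to a clock of a little under 1.5 GHz, which shows the headline number comes from doing a great many small matrix operations in parallel rather than from an unusually fast clock.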