At this year’s GPU Technology Conference (Mar 28-30, 2018), Nvidia announced a new GPU specifically designed for deep learning. The Quadro GV100, based on Nvidia’s latest Volta architecture, sports 5,120 CUDA cores, 640 tensor cores, and 32GB of VRAM, delivering 14.8 TFLOPS of single-precision floating-point performance, 7.4 TFLOPS of double-precision performance, and an incredible 118.5 TFLOPS of tensor performance to speed up deep learning training and inference. The GV100 interfaces via PCIe, consumes only 250W of power, and two cards can be linked over NVLink to provide 64GB of shared memory. At $9,000 for a single card, however, this is something hobbyists will likely do without, though serious deep learning researchers may still find it attractive. Nvidia reportedly spent $3 billion developing the GV100, which puts the $9,000-per-card price in perspective.
The 640 tensor cores Nvidia incorporated into the GV100 are a new type of processing unit designed specifically to perform 4×4 matrix operations in support of deep learning workloads. Each tensor core multiplies two 4×4 matrices and adds the result to a third 4×4 matrix, exactly the kind of operation that consumes the vast majority of the computation in deep learning training and inference. The net effect is the equivalent of a roughly 120-TFLOP supercomputer on a single card, and viewed that way, the $9K price actually looks like a bargain. It will be interesting to see where these cards end up and what new breakthroughs in AI and deep learning emerge as researchers apply this hardware to such a fast-moving field.
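To make that fused multiply-add concrete, here is a minimal sketch of how the tensor cores are exposed to programmers through CUDA’s warp-level WMMA API (available since CUDA 9 on Volta-class hardware). The API operates on 16×16 tiles per warp, which the hardware internally breaks down into the 4×4 tensor-core operations described above; the kernel name and pointers below are illustrative, not Nvidia sample code.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes C = A * B + C for a single 16x16 tile on the tensor cores.
// Inputs are half precision (FP16); accumulation is done in FP32, matching the
// mixed-precision mode the tensor cores support.
__global__ void wmma_gemm_tile(const half *a, const half *b, float *c) {
    // Per-warp fragments for the A, B, and accumulator tiles.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    // Start the accumulator at zero.
    wmma::fill_fragment(acc_frag, 0.0f);

    // Load the 16x16 input tiles from global memory (leading dimension 16).
    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);

    // The fused multiply-add: acc = a * b + acc, executed on the tensor cores.
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);

    // Write the FP32 result back to global memory.
    wmma::store_matrix_sync(c, acc_frag, 16, wmma::mem_row_major);
}
```

A kernel like this would be launched with a single warp, e.g. `wmma_gemm_tile<<<1, 32>>>(d_a, d_b, d_c);`. A full GEMM simply tiles the input matrices and loops this operation over many 16×16 blocks, which is exactly the pattern deep learning frameworks rely on under the hood.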