The Impact of the NVIDIA Acquisition of ARM on Graph Technologies
In this article, we will look at NVIDIA's recent $40 billion acquisition of Arm Holdings and consider how it might impact the emerging market for new computer hardware designed to traverse trillion-vertex enterprise knowledge graphs.
For the past two years, I have been writing about how the current generation of CISC hardware is not optimized for graph databases and fast graph algorithm execution. I have hypothesized that some companies will soon figure out that by creating custom hardware optimized for pointer chasing, they can trigger a 1,000x speedup in enterprise graph database performance. Supporting graph databases will allow these new hardware manufacturers to be more profitable while at the same time fueling the growth of the graph database industry.
What we have been looking for are firms that understand that speeding up graph algorithms requires an efficient way to bind each vertex to a processing thread. The more threads you have, the faster these algorithms run. We need many small, efficient, low-power cores standing by to start at any vertex in our knowledge graph and begin hopping through pointers. We need lots of custom RISC cores and fast memory subsystems to keep those cores busy.
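Here is a minimal sketch of that access pattern, not a real implementation. The tiny graph and the k_hop_neighbors() helper are hypothetical, and in Python a thread pool only illustrates the "one worker per traversal" idea; it does not deliver the hardware-level parallelism this article is arguing for.

```python
# Sketch of the pointer-chasing workload: each worker starts at a vertex and
# hops neighbor-to-neighbor, where every read depends on the previous one.
from concurrent.futures import ThreadPoolExecutor

graph = {                       # adjacency list: vertex -> list of neighbors
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}

def k_hop_neighbors(start, k):
    """Collect every vertex reachable from `start` within k hops."""
    frontier, seen = {start}, {start}
    for _ in range(k):
        # Each hop is a batch of dependent memory lookups ("pointer chasing"):
        # the next address is only known after the previous read completes.
        frontier = {n for v in frontier for n in graph[v] if n not in seen}
        seen |= frontier
    return seen

# One lightweight worker per start vertex -- the role the article assigns
# to many small RISC cores standing by on the graph.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(zip(graph, pool.map(lambda v: k_hop_neighbors(v, 2), graph)))
print(results)
```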
For the last several years, the leader in designing custom RISC cores has been ARM Holdings. For those of you who have not heard of ARM, it is a British chip design company that was purchased by the Japanese conglomerate SoftBank Group back in 2016. That acquisition was always a bit of a mystery to me. It didn't seem to fit with the SoftBank portfolio of companies, and SoftBank didn't help ARM grow. What I didn't expect was that NVIDIA would purchase ARM. On reflection, this makes sense. Let's take a look at why.
NVIDIA got their start building video game cards. These cards are designed to generate large 2-D images as fast as possible. Image generation tasks are best executed on parallel processing hardware: you divide an image into lots of small chunks and have different cores process each chunk in parallel. It turns out the best way to do this is to convert the image data into a matrix of numbers and build chips that are optimized for parallel matrix algebra. The rendering work is effectively a series of transforms applied to image data stored in a two-dimensional matrix, and what is critical to remember is that these transforms can be executed in parallel. Creating hardware and software that can quickly divide compute tasks into parallel tasks is hard, and NVIDIA conquered this challenge and became an industry leader.
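A toy illustration of that divide-and-conquer idea, not NVIDIA's actual pipeline: treat a frame as a matrix, cut it into tiles, and apply the same transform to every tile independently, so each tile could in principle be handed to its own core. The tile size and the transform here are arbitrary choices for the sketch.

```python
# Split an "image" matrix into independent tiles and transform them in parallel.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

image = np.random.rand(512, 512)            # a frame stored as a 2-D matrix
tiles = [image[r:r + 128, c:c + 128]        # 16 independent 128x128 tiles
         for r in range(0, 512, 128)
         for c in range(0, 512, 128)]

def transform(tile):
    # Any per-tile matrix operation works; here, a simple contrast-style
    # linear map. On a GPU, each tile gets its own block of cores.
    return np.clip(1.5 * tile - 0.25, 0.0, 1.0)

with ThreadPoolExecutor() as pool:          # stand-in for thousands of GPU cores
    processed = list(pool.map(transform, tiles))
```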
One other important thing that NVIDIA did well: they generalized the process of doing matrix math beyond just building video game cards. They came up with a C-level library (CUDA) so that anyone who wanted to do fast matrix algebra could use their hardware. One of the people who used these libraries was Alex Krizhevsky. Alex was working under the father of Deep Learning, Geoffrey Hinton. Alex figured out that by using the NVIDIA CUDA libraries, he could dramatically speed up the incredibly complex task of training deep neural networks. On September 30, 2012, just about eight years ago, AlexNet won a prestigious image recognition contest, posting an error rate more than 10.8 percentage points lower than the runner-up's. Since then, these Graphics Processing Units (GPUs) have moved to the center of AI development hardware.
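CUDA itself is a C-level API, but the "hand your matrix algebra to the GPU" idea it enabled is easy to sketch from Python using CuPy, a third-party wrapper over CUDA. CuPy and an available NVIDIA GPU are assumptions of this sketch, not something the article specifies.

```python
# Offload a generic matrix multiply to the GPU via CUDA (through CuPy).
import numpy as np
import cupy as cp

a = np.random.rand(2048, 2048).astype(np.float32)
b = np.random.rand(2048, 2048).astype(np.float32)

a_gpu, b_gpu = cp.asarray(a), cp.asarray(b)   # copy the matrices to GPU memory
c_gpu = a_gpu @ b_gpu                         # the multiply runs as parallel CUDA kernels
c = cp.asnumpy(c_gpu)                         # copy the result back to the host
```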
But the world is starting to change. Although GPUs are great for doing matrix math, many other AI algorithms don't fit nicely into a matrix. Many natural language processing tasks use what are called "one-hot encodings," where each word is converted to a single bit in a very large vector, say 40,000 elements long. 39,999 of the values are zero; only one value is turned on for that word. These vectors are called "sparse" because they contain very few non-zero values. That means if you are using one-hot encodings, you are moving data that is 99.99% zeros into and out of your GPUs. Not very efficient.
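A quick sketch of that waste. The 40,000-word vocabulary comes from the paragraph above; the particular word position and the use of SciPy's sparse format are illustrative assumptions.

```python
# One-hot encoding in miniature: one meaningful value, 39,999 zeros.
import numpy as np
from scipy.sparse import csr_matrix

vocab_size = 40_000
word_index = 1_234                     # hypothetical position of one word in the vocabulary

dense = np.zeros(vocab_size, dtype=np.float32)
dense[word_index] = 1.0                # a single 1 among 39,999 zeros

sparse = csr_matrix(dense)             # store only the non-zero entry
print(dense.nbytes)                    # 160,000 bytes shipped to the GPU
print(sparse.data.nbytes)              # 4 bytes of actual information
```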
The solution is to use other, non-matrix representations of this type of data. Graphs are one way to do this, and hardware from companies like DataVortex and Graphcore is currently leading the pack. Graphcore has NLP training benchmarks on sparse Transformer deep learning models like BERT that show large improvements over traditional GPUs.
So this brings us back to ARM. NVIDIA knows that in order to dominate the AI hardware industry, they need more than just GPUs. They need to do more than just manipulate matrix data. They need hardware optimized for other diverse knowledge representations like graph databases. That is why it makes total sense for NVIDIA to purchase ARM. They can expand their portfolio of hardware that targets custom highly-parallel processing problems and continue to be at the center of the AI hardware industry. It might take them a few years to execute on this strategy, but they are assembling the right foundational components.
But NVIDIA is not alone in this game. Look for Graphcore, Intel, Google, Xilinx, and a host of small startups using low-cost FPGA hardware to be players in the tera-vertex graph industry.