Graph Technologies are Helping Sunset CISC
Modern native graph databases use simple pointer hopping to quickly traverse large graphs. But these operations are not efficient on Complex Instruction Set Computer (CISC) architectures. In this article I show how the graph industry is helping innovative organizations design a new generation of hardware optimized for massively parallel graph analytics.
Last month, Apple announced it is dropping Intel x86 CPUs and moving to its own custom silicon built around ARM processors. This is one of the most significant changes in the computer industry in the last 20 years. And although it is an expensive undertaking for Apple, in the long run it will also benefit many consumers who never purchase an Apple product.
If you are not familiar with the Intel x86 architecture, it is a classic example of a CISC. The current Intel x86 chips evolved from early designs by adding a few new instructions each generation. This was done by well-intentioned engineers who wanted to speed up an existing processor by adding a highly specialized instruction that might be useful for a single application, all while maintaining compatibility with the current installed base.
Over the years the Intel x86 instruction set grew to a bloated 1,505 instructions, and every additional instruction required additional silicon real estate. Like many good ideas, this one is being toppled by the complexity of the design and the challenge of testing its functionality. It is unreasonable to ask Intel's chip designers to continually maintain compatibility with the past while remaining the lowest-cost and highest-performance solution for new workloads. Intel's recent security vulnerabilities show this is an unavoidable outcome of that complexity.
In place of Intel x86 CPUs we are seeing an entirely new generation of CPUs built around highly simplified Reduced Instruction Set Computer (RISC) designs. The champion of this new generation is the ARM architecture. ARM is technically a set of CPU intellectual property that is licensed to companies building their own chips. ARM chips initially became popular in mobile phones because of their low power consumption. In recent years, however, ARM cores have been used in a wide variety of devices, including the Raspberry Pi, the Nvidia Jetson Nano and now high-core-count server CPUs. In June an ARM-powered supercomputer became the world's fastest computer, showing that ARM is having a huge impact on everything from mobile phones to supercomputers.
What is critical about the ARM/RISC philosophy of CPU design is that system designers need only include the instructions directly related to their application. They are no longer forced to include unused silicon in every product. When you need a device that performs specific functions, you can place just the ARM cores you need in a region of your System on a Chip (SoC) architecture.
A New Generation of Graph-Optimized Hardware
So how are graph databases going to impact the chip industry? For the last two years I have been carefully following trends in the database industry and the rise of graph databases to replace the older relational model. The relational model suffers from its legacy of representing relationships with slow, non-scalable JOIN operations between tables. Graph databases don't need JOINs: relationships are stored as in-memory pointers. As long as you have enough RAM in your database cluster, the relationship information stays in memory and relationship traversals can be five orders of magnitude faster than JOINs on billion-row tables.
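To make the contrast concrete, here is a minimal sketch (in Python, with made-up customers and products) of how a native graph store answers a relationship question by hopping pointers instead of joining tables. This is an illustration of the idea, not any particular database's implementation.

```python
# Minimal in-memory graph: each vertex keeps direct references (pointers)
# to its neighbors, so a traversal is pointer hopping rather than a table JOIN.

class Vertex:
    def __init__(self, name):
        self.name = name
        self.neighbors = []          # direct pointers to adjacent vertices

def connect(a, b):
    # Undirected edge: store a pointer in each direction.
    a.neighbors.append(b)
    b.neighbors.append(a)

# Hypothetical customers and products
alice, bob = Vertex("Alice"), Vertex("Bob")
laptop, phone = Vertex("Laptop"), Vertex("Phone")
connect(alice, laptop)
connect(bob, laptop)
connect(bob, phone)

# Two-hop traversal: customers who bought the same products as Alice.
def similar_customers(customer):
    found = set()
    for product in customer.neighbors:      # hop 1: customer -> product
        for buyer in product.neighbors:     # hop 2: product -> other customers
            if buyer is not customer:
                found.add(buyer.name)
    return found

print(similar_customers(alice))             # {'Bob'}
```

In a relational schema the same question would require joining customer, order and product tables; here each hop is a constant-time pointer dereference, which is exactly the kind of memory access pattern this article is about.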
The relational era is now ending and the enterprise knowledge graph industry is being born. The ISO DM32 committees are working on a universal query language called GQL that will support these new systems. However, there are still some graph queries that are slower than we would like. Queries that must traverse the entire graph to find similar items, or to cluster similar items into groups, are examples of these time-intensive queries. So my question has been: how can we speed up these queries?
What I learned from talking with Dr. Alin Deutsch of UC San Diego is that we need an easy way to distribute a query from a single incoming query node to a large cluster of nodes. For example, to find the best product recommendation for a customer we need to look at all the customers in our system, find the most similar customers, and see what products they preferred. We also need a solid understanding of how products are related to each other in a knowledge graph. Getting the best similar products onto the recommendation page takes a lot of computation over many vertices in our graph.
The central process is to bind each thread in each server of our graph cluster to a group of vertices that need to be considered. To do this in real time we need lots of cores, and that is where ARM and RISC technologies come into play.
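Here is a rough Python sketch of that binding idea, with hypothetical customer data and a simple set-overlap score standing in for a real similarity measure. Each worker thread is bound to its own slice of the vertices, scores only that slice, and only the small partial results are merged at the end. This is an illustration of the pattern, not TigerGraph's actual engine.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical purchase sets keyed by customer id.
purchases = {
    "alice": {"laptop", "phone"},
    "bob":   {"laptop"},
    "carol": {"phone", "tablet"},
    "dave":  {"tablet"},
}

def jaccard(a, b):
    """Simple set-overlap score used here as a stand-in similarity measure."""
    return len(a & b) / len(a | b) if a | b else 0.0

def score_partition(target, partition):
    """Runs on one worker thread: score only the vertices bound to it."""
    return [(other, jaccard(purchases[target], purchases[other]))
            for other in partition]

def most_similar(target, num_workers=2):
    customers = [c for c in purchases if c != target]
    # Bind an interleaved slice of vertices to each worker thread.
    partitions = [customers[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda p: score_partition(target, p), partitions))
    merged = [pair for part in partials for pair in part]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

print(most_similar("alice"))   # bob first, then carol, then dave
```

In a real graph cluster the partitions live on different servers and the threads are pinned to physical cores, but the shape is the same: each core works on its own slice of the graph and only the partial scores travel back to the query node.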
Dr. Deutsch made fast binding of vertices to threads much easier in the GSQL graph query language. His work with TigerGraph created an innovative concept called Accumulators, where the partial results of these queries can be stored on each node of the cluster and only the aggregated results are returned to the query origin server. This means that many of the patterns we learned writing Map-Reduce queries are now 100x easier to create and maintain in GSQL.
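GSQL's accumulator syntax is beyond the scope of this article, but the underlying pattern is easy to sketch in plain Python: each cluster node folds its local vertices into a small partial result, and only those aggregates travel back to the query origin server. The server names and values below are invented for illustration.

```python
# Accumulator-style aggregation sketch: each cluster node reduces its local
# vertices to a tiny partial result; only the partials are shipped back.

local_vertex_values = {
    "server_1": [3, 5, 2],     # hypothetical per-vertex values on each server
    "server_2": [7, 1],
    "server_3": [4, 4, 6],
}

def local_accumulate(values):
    """Runs on each server: fold its own vertices into (sum, count)."""
    return sum(values), len(values)

# Only the small (sum, count) pairs cross the network, not the raw vertex data.
partials = [local_accumulate(v) for v in local_vertex_values.values()]

total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
print("global average:", total / count)    # 4.0
```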
I want to emphasize that there are many other approaches being tested. We have been working with Xilinx, testing their FPGAs for parallel cosine similarity on cost-effective hardware. Their paper last year on using an FPGA to calculate cosine similarity showed that FPGAs are ideal for doing these calculations in parallel. Their results showed that you can find the 100 most similar customers from a population of 10 million in around 29 msec.
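The FPGA implementation details are in the Xilinx paper, but the computation itself is easy to show. Here is a plain NumPy sketch that scores one customer vector against a population and keeps the 100 most similar; the population size and feature count below are placeholders, not the numbers from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder data: 100,000 customers with 64 features each.
population = rng.random((100_000, 64), dtype=np.float32)
query = rng.random(64, dtype=np.float32)

def top_k_cosine(query, population, k=100):
    # Cosine similarity is the dot product of L2-normalized vectors.
    pop_norm = population / np.linalg.norm(population, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    scores = pop_norm @ q_norm                    # one score per customer
    top = np.argpartition(scores, -k)[-k:]        # unordered top-k indices
    return top[np.argsort(scores[top])[::-1]]     # sorted best-first

best_customers = top_k_cosine(query, population)
print(best_customers[:5])
```

Every row's dot product is independent of every other row's, which is exactly the kind of work that FPGAs, GPUs and high-core-count RISC chips can spread across thousands of parallel units.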
We have also been looking at the innovative Graphcore IPU system. This is an impressive full-custom silicon architecture that allows over 115K threads to concurrently analyze graph data. Graphcore also took the RISC approach and designed its own cores to rapidly traverse large graphs in parallel.
Underlying all of these developments is the need to migrate from single-process, single-thread computation to distributed parallel computation. We have seen the shift from serial to parallel computation benefit the gaming industry with Graphics Processing Units (GPUs) and the machine learning industry with GPUs, Tensor Processing Units (TPUs) and now the graph-inspired Intelligence Processing Units (IPUs) manufactured by Graphcore.
I also want to mention that just because Intel has a large share of the CISC processor market does not mean it will be excluded from the future of high-end graph and AI training and reasoning systems. Intel has a large R&D budget, and my hope is that it will see the need for high-core-count parallel processing using RISC processors. This will take a coherent vision on the part of Intel management, and it will take significant investment over a long period of time. If you are a graph fan, I hope you reach out to Intel management and encourage them to build graph-optimized silicon.
The shifts from CISC to RISC, from x86 to ARM, and from discrete CPUs to Systems on a Chip all follow the same trend: specialized processors that can handle parallel workloads. But the industry can't take advantage of these changes overnight. We will also need to adapt our software to these new architectures, and graph databases will need to be modified to take advantage of this new generation of hardware. Which brings me to my last point.
Let's say you are a new engineer trying to build Skynet-level cognition into your chatbots. You know that machine learning and graph knowledge representation are critical to your success. You might be wondering: what skills should I be learning? Here is my answer:
The hot innovation zone in AI is not going to be in a single field. It is going to be where our AI developers have a deep appreciation for the combination of three technologies. I believe that only when we master all three will employees, departments and larger organizations be able to create competitive products in the highly competitive AI landscape.