The last few weeks have been busy in the Enterprise Knowledge Graph (EKG) space. This blog will review the key events and put them in context for a new generation of Graph Systems Thinkers trying to understand the big-picture trends that will dominate the computing industry for the next few years.
The first event was the announcement that TigerGraph got an additional $105M in venture capital in its Series C fundraising round. This should be no surprise since TigerGraph already has a wide lead in the EKG market because of three facts:
- It was designed from the ground up to be a scale-out graph that leverages the popular native labeled property graph (LPG) data model.
- It is highly optimized for performance because it is written in efficient C/C++. They have eliminated the slow legacy Java code running on VMs. Anyone who benchmarks TigerGraph against other graphs can easily see why they have taken the lead in the EKG space.
- It offers a true innovation in making it easy to write distributed MapReduce-style queries with its advanced GSQL language and Accumulators. Developers love these features.
TigerGraph has led the way by showing that distributed native labeled property graph architecture will dominate the future's EKG landscape.
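To make the accumulator idea concrete, here is a minimal sketch in plain Python. It is not GSQL and does not use any TigerGraph API. The toy graph and the function name are hypothetical, chosen only to show the pattern: every edge contributes a value to a per-vertex accumulator and to a global accumulator, much like the map and reduce steps of a MapReduce job.

```python
from collections import defaultdict

# Hypothetical toy graph: directed "follows" edges between people.
EDGES = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
]


def count_in_edges(edges):
    """Count incoming edges per vertex using accumulator-style updates."""
    # One sum accumulator per vertex, starting at 0.
    per_vertex = defaultdict(int)
    # One global sum accumulator shared by the whole query.
    total = 0
    for _source, target in edges:
        # "Map" step: each edge emits +1 toward its target vertex.
        per_vertex[target] += 1
        # "Reduce" step: the same contribution is folded into the global sum.
        total += 1
    return dict(per_vertex), total


in_degree, edge_count = count_in_edges(EDGES)
```

In a distributed graph database the edge loop would run in parallel across many servers, and the system would be responsible for merging the accumulator updates. This single-process sketch only illustrates the programming model, not the distribution.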
The next big announcement was that Katana Graph completed its Series A funding, raising an incredible $28.5 million. Note that Series A rounds are usually a lot smaller than $28.5 million. What is interesting about the announcement is that Intel Capital led the round with support from Dell Technologies Capital. Regular readers of my blog will recall that I wrote about Intel’s PIUMA efforts in a prior post. I was happy to see many of the articles link to my blog post on the Intel PIUMA architecture!
Although these investments are large, they pale next to the $222M in Series E Funding that went into Graphcore in January. Most people don’t think of Graphcore as an EKG company since their focus has been on accelerating machine learning workloads and lowering specific graph algorithms' costs via their C-level Poplar API. They don’t yet have a graph database with ACID transaction guarantees that EKGs need. However, Graphcore’s focus on high-core counts, innovative memory access, and sparse matrix compute is directly in line with the goals of the “embeddings everywhere” revolution that EKG/ML architects appreciate.
Graphcore has also worked with Cirrascale to provide a new cloud-based service called GraphCloud that will make it easier to benchmark these new hardware architectures. Cirrascale is making it easier to access custom graph hardware directly from your Jupyter Notebooks running on virtual machines anywhere on-premises or in the cloud.
The big picture here is that executives at both Intel and Dell now have a much deeper appreciation for the industry-wide disruptive impact that EKGs will have on enterprise computing. They also understand that hardware alone will not be enough. Providing a low-level C API is not enough for large organizations, even if you can deliver 1,000x performance speedups to graph queries. Data scientists will need to be able to execute fast graph queries directly on high-quality connected data in standardized GQL from their Jupyter Notebooks.
To be truly effective, Intel and Dell leadership know they need to provide their customers with a complete solution that will allow those customers to decommission their expensive legacy COBOL and relational database systems. Intel and Dell leaders appreciate that scale-out pointer-hopping systems will consign billion-row table JOINs to the annals of computer science history. JOINs will have no place in a modern AI/ML-driven tech stack, where every vertex has a built-in embedding. To be competitive in the future, companies need more productive, knowledgeable engineers and fewer data janitors.
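The contrast between JOIN-style lookups and pointer hopping can be sketched in a few lines of Python. This is a deliberately simplified illustration: the data and function names are hypothetical, real relational databases use indexes rather than full scans, and a Python dictionary lookup only stands in for the direct pointer dereference a native graph engine would perform.

```python
# Hypothetical toy data: (employee, manager) rows with no cycles.
REPORTS_TO_ROWS = [
    ("ann", "bea"),
    ("bea", "cal"),
    ("cal", "dee"),
]


def chain_by_scanning(rows, start):
    """Relational-style: each hop rescans the whole table for a matching row."""
    chain, current = [start], start
    while True:
        match = next((mgr for emp, mgr in rows if emp == current), None)
        if match is None:
            return chain
        chain.append(match)
        current = match


def chain_by_hopping(adjacency, start):
    """Graph-style: each hop is a single direct lookup from the current vertex."""
    chain, current = [start], start
    while current in adjacency:
        current = adjacency[current]
        chain.append(current)
    return chain


# Build the vertex-to-neighbor mapping once, then traverse it repeatedly.
ADJACENCY = dict(REPORTS_TO_ROWS)
```

Both functions return the same management chain. The difference is the cost per hop: the scanning version does work proportional to the table size at every step, while the hopping version does a constant amount of work regardless of how large the graph grows.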
We also continue to see great progress in accepting that the LPG community needs common query standards for keeping their queries portable across EKG systems. If you have not had a chance to watch the GQL video interview with Alastair Green, I would recommend it. Alastair brings high credibility to the noble (but challenging) efforts to express our complex graph algorithms in concise, expressive, and portable languages.
We should also note that Cloud Service Providers (CSPs) like Amazon AWS, Microsoft Azure, and Google still remain mostly on the sidelines in the EKG space. Although my contacts within the CSPs express a passing interest in GQL, their business models depend on deploying scale-out component-based computer services running on commodity hardware in their data centers. And they are not opposed to locking companies into their proprietary graph query languages. CSPs will not lead us to develop the next generation of portable graph query languages.
Hardware vendors are aware that Amazon AWS is getting into the custom silicon market with its Graviton chips. The CSP business model is still focused on providing small, low-cost silos of computing power that can scale up when needed. CSPs don’t focus on low-latency service agreements between their servers. EKG solutions need nanosecond-scale service levels between all the nodes in their cluster. CSPs’ cost models excel at low-cost bulk storage and on-demand compute.
EKGs are defined by running real-time queries over large amounts of data that span many servers. The problem with CSPs deploying EKGs comes down to the limit imposed by the speed of light. Light travels about 1 foot per nanosecond. Intel PIUMA and Graphcore systems assume that your entire graph is contained in one or more racks connected by high-speed, low-latency networks. This is how trillion-vertex graph traversals work on clusters. CSPs can’t guarantee that their virtual machines will be in the same rack or even the same building. That means that for CSPs to enter the EKG market, they will need to rethink their infrastructure.
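The 1-foot-per-nanosecond rule of thumb makes the arithmetic easy to check. The short Python sketch below computes a lower bound on round-trip time for a few distances. The distances are hypothetical values picked for illustration, and the bound ignores switches, software overhead, and indirect cable routes.

```python
# Rule of thumb from the text: light travels about 1 foot per nanosecond.
FEET_PER_NANOSECOND = 1.0


def min_round_trip_ns(distance_feet):
    """Lower bound on round-trip time in nanoseconds for a given one-way distance."""
    return 2 * distance_feet / FEET_PER_NANOSECOND


# Hypothetical distances chosen only for illustration.
SCENARIOS = {
    "same rack (6 ft)": 6,
    "across a data hall (300 ft)": 300,
    "another building one mile away (5,280 ft)": 5280,
}

for label, feet in SCENARIOS.items():
    print(f"{label}: at least {min_round_trip_ns(feet):,.0f} ns round trip")
```

Under these assumptions, two servers a mile apart cannot complete a round trip in less than about 10.6 microseconds, while two servers in the same rack have a floor of about 12 nanoseconds. Signals in real copper and fiber travel slower than light in a vacuum, so actual latencies are higher than these bounds. A traversal that makes many sequential hops between servers pays this cost on every hop.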
I believe that the technical leadership at hardware companies like Intel and Dell now understand that they are competing against CSPs who are building their own hardware. Intel and Dell know that to be players in future data centers, they need to lead the development of both on-prem and cloud-based graph solutions. Next-generation EKG solutions need distinct price-performance advantages that CSPs can’t offer today. If Intel and Dell continue to fund innovative EKG startups like Katana Graph, they will be providing complete turn-key solutions for their enterprise customers building next-generation EKGs.
I want to take a minute to express my hope that more executives from tech companies will appreciate what the technical leaders at Intel and Dell are doing. They are not just focused on the speeds-and-feeds debate that fills so many of our hardware discussions. These discussions are quickly forgotten.
By funding companies like Katana Graph, tech leaders are taking a long-term Systems Thinking approach to building solutions for their customers. They are seeing the benefits of taking a more holistic view of the technology landscape. They are also taking a more empathetic view of their customers' challenges when creating a truly integrated Central Nervous System for their organizations. Only after we can execute cost-effective real-time queries over trillion-vertex graphs will we be able to offer these innovative services to our customers.
On a final note, my enthusiasm for these investments is my own, and my opinions should not be interpreted as an endorsement by my employer.