My friend Arun Batchu and I have had several spirited discussions about a recent article on network neuroscience in the July 2019 issue of Scientific American. The article, “How Matter Becomes Mind,” is a well-written summary of the latest research in brain science and describes the high-level flow of information between 300 different regions of the brain. Several topics in this article should interest people building and promoting knowledge graphs. The use of graph theory and the extensive use of the “symphony” metaphor resonate with me.
Most of the readers of these blogs are familiar with the concepts of enterprise knowledge graphs. I suspect that few are familiar with the concept of emergence in complex systems and computer science. Emergence occurs when an entity is observed to have properties its parts do not have on their own. These properties or behaviors emerge only when the parts interact in a wider whole. Although many teams building enterprise knowledge graphs don’t explicitly state that emergence is one of their goals, it is something we should keep in mind when explaining why we are building enterprise knowledge graphs.
Arun and I each derived different insights from the article based on our prior experiences. It was only after we started talking together that a new set of insights emerged. And that, in summary, is emergence. Emergence is the pattern that I believe will be important as we build 100-billion-vertex knowledge graphs: new insights will emerge as we integrate and query disparate data sources. These insights are difficult or impossible to reach without an integrated view of the world.
Arun has one of the most diverse sets of interests of anyone I have ever met. He and I first started working together in 2006 and published research papers on AI and graphs back in 2008. Later in his career Arun led the search teams at BestBuy.com with Jay Myers. Together Jay and Arun pioneered the use of advanced technologies like RDF, SPARQL, and NetKernel at BestBuy before graphs became popular. Arun and I are now working together at the Advanced Technology Collaborative within Optum and continue to map emerging technologies to wicked business problems.
When Arun read the article, he found many topics drew from his rich background in network theory, complex systems, and emergence. I, on the other hand, pulled from other areas that reflected my background in NoSQL, graph databases and distributed computing at scale. It was only after discussing these topics together that we formed a coherent theory about how to apply the patterns in the Scientific American article to predicting the strategic relevance of massive knowledge graphs to the healthcare industry.
One of the reasons that Arun and I have productive discussions is that although we come from different backgrounds, we have a rich common vocabulary. We are both huge fans of pattern languages. When I say “knowledge vs data,” Arun and I both have a common visualization of the DIKW Pyramid. When I say “OBDA pattern,” we both have a shared visualization of using ontologies and rules to harmonize data. In summary, we share a common pattern language and we use these pattern labels to have higher-bandwidth discussions.
This is not to imply our knowledge graph team will agree on all our viewpoints. Emergence is about finding common ground and building on it so that information can flow between subgraphs. It is also about both standardizing the knowledge representation of abstract concepts and building queries that can be shared between teams. Generally, reasoning happens within a narrow subgraph. For example, geospatial reasoning might only touch address, city, state and zip code vertices. However, reasoning over multiple subgraphs can be much more challenging and may require orchestration between different subject matter experts.
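To make the difference between reasoning inside one subgraph and reasoning across subgraphs concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and chosen only for illustration: the vertex names, the `geo`, `claims` and `links` subgraphs, and the `merge` and `path_exists` helpers are my assumptions, not part of any real graph product or of our production model. It shows a question that the geospatial subgraph can answer alone, and a second question that has no answer until the subgraphs are integrated.

```python
from collections import deque


def merge(*graphs):
    """Union several adjacency maps into one integrated graph."""
    merged = {}
    for graph in graphs:
        for vertex, neighbors in graph.items():
            merged.setdefault(vertex, set()).update(neighbors)
    return merged


def path_exists(graph, start, goal):
    """Breadth-first search: True if goal is reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            return True
        for neighbor in graph.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical geospatial subgraph: address -> city -> state.
geo = {
    "address:12 Elm St": {"city:Minneapolis"},
    "city:Minneapolis": {"state:MN"},
}
# Hypothetical claims subgraph owned by a different team.
claims = {"claim:C1": {"member:M1"}}
# Hypothetical linking edges that only exist once the data is integrated.
links = {"member:M1": {"address:12 Elm St"}}

# Narrow reasoning stays inside one subgraph.
print(path_exists(geo, "address:12 Elm St", "state:MN"))
# The siloed claims data cannot connect a claim to a state.
print(path_exists(claims, "claim:C1", "state:MN"))
# The integrated graph can.
print(path_exists(merge(geo, claims, links), "claim:C1", "state:MN"))
```

The linking edges are the part that usually requires subject matter experts from both sides to agree on what a member and an address mean, which is where the orchestration effort tends to go.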
When we build large enterprise-wide knowledge graphs, we need to deeply appreciate both diversity and harmonization. As Arun says, “We need to always be able to hold two oppositional thoughts in our minds at the same time”. Arun will casually suggest two opposing viewpoints to me and ask me how we might resolve them. This higher-order Socratic method is the hallmark of a great engineer and I am grateful for his questioning.
To build large-scale enterprise knowledge graphs we must strive to get the diverse perspectives of different business units and stakeholders. We need to blend these requirements together to come up with coherent planning strategies. We need to combine the needs of real-time, sub-10-millisecond rules and analytics with the need for deep analytical reports that may take minutes to run in our knowledge graph. We need to balance the advanced needs of experienced graph developers with the basic training needs of teams that are new to graph databases. We need to harmonize the long-term vision of building Skynet-level self-aware capability models with the short-term concerns of decommissioning our legacy RDBMS and Hadoop systems.
Although brain-inspired strategies are common in the graph community, I want to urge some restraint in using brain metaphors. Our brains are also graphs with implicit parallel processing, but they use a radically different graph architecture than the popular labeled-property graphs. Like many corporate knowledge graphs from Google, Facebook, LinkedIn, Amazon and Pinterest, our brains have on the order of 100 billion vertices. The degree of our brain (the number of edges per vertex) is about 10,000. Most enterprise knowledge graphs have an average degree of 10 or less. In our brain, each edge (built of axons and synapses) also encodes complex non-linear behavior that uses timed pulses to aggregate signals. And to be quite honest, we don’t really understand the details of how knowledge is stored or how learning takes place in the brain.
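A quick back-of-the-envelope calculation shows how far apart these two architectures are. The sketch below uses only the round numbers quoted above and assumes, for simplicity, that degree means outgoing edges per vertex so that edges are roughly vertices times degree. The `edge_count` helper is a hypothetical name used for illustration.

```python
def edge_count(vertices: int, avg_degree: int) -> int:
    """Rough edge count, treating degree as outgoing edges per vertex."""
    return vertices * avg_degree


VERTICES = 100_000_000_000  # on the order of 100 billion vertices

brain_edges = edge_count(VERTICES, 10_000)   # degree of about 10,000
enterprise_edges = edge_count(VERTICES, 10)  # average degree of 10

print(f"brain edges:      {brain_edges:.0e}")
print(f"enterprise edges: {enterprise_edges:.0e}")
print(f"ratio:            {brain_edges // enterprise_edges}x")
```

Under these assumptions the brain carries roughly a thousand times more edges than an enterprise graph with the same number of vertices, and that is before accounting for the non-linear behavior encoded on each edge.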
Emergence is not something you can easily plan for when you propose building an enterprise knowledge graph. You can’t say to an accounting team that is trying to calculate the cost/benefit ratio of the project, “We will have three new insights per month, each saving us $100K/year, if we build a knowledge graph.” But you can use the brain and symphony metaphors to appeal to their sense of reason. Your accounting and funding teams will also need to embrace oppositional thinking. They need to know both that knowledge graphs are logically the right direction for many organizations that are trying to lower the cost of insights, and that we can’t accurately predict the cost savings of these future insights. Emergence evolves continually as we integrate more data into our enterprise. But the bottom line is you should not expect emergence on siloed data.