MarkLogic Acquires Smartlogic
Why enterprise ontology management matters to the EKG industry
Yesterday we saw an announcement that NoSQL database and search vendor MarkLogic will acquire the industry-leading ontology and metadata management company Smartlogic for an undisclosed amount. This announcement could significantly impact organizations building enterprise knowledge graphs (EKGs) because EKGs depend on the management and reuse of ontologies. Smartlogic has been a leader in providing ontology management tools that scale well to meet the needs of an enterprise and are tightly integrated with NLP tools such as document classification.
In this blog post, I will show why ontology management is critical for finding and enriching documents in your EKG and how this acquisition could change the direction of the EKG industry.
This announcement comes as no big surprise. MarkLogic and Smartlogic have worked together on search and metadata projects for many years. Having a great search engine is essential, but adding ontology-powered semantic search features makes MarkLogic a search engine that will be difficult to match.
Many of my readers come from the newer LPG (labeled property graph) database world and may not be familiar with MarkLogic. Although MarkLogic positions itself as a multi-model database (document, graph, search, and distributed key-value store), they are not known for a library of data-science-driven graph machine learning algorithms. You will not see cosine similarity or isomorphic queries in their list of key features and benefits. However, MarkLogic has been around for a long time, and they have many mature database features, such as multi-version concurrency control (MVCC), built into their infrastructure. They also support fine-grained role-based access control over all the documents they manage. And here is the key: their search results can be dramatically improved with a good ontology. This is the area of semantic search.
How Semantic Search Works
If you use a keyword-only search engine, your search results depend on search keywords exactly matching the words in a document. So, for example, if you search your benefit plans for the term "pregnancy," but the documents use the phrase "maternity care," you will get zero hits. Your users will hate their search experience.
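To make this failure concrete, here is a minimal Python sketch of keyword-only matching. The sample documents and the `keyword_search` function are hypothetical, and real search engines add stemming and ranking, but the exact-match behavior is the same: "pregnancy" finds nothing because no document contains that word.

```python
import re

# Hypothetical sample documents; IDs and text are illustrative only.
DOCS = {
    "benefits-2023": "Our plan covers maternity care, including prenatal visits.",
    "dental-plan": "Dental cleanings are covered twice a year.",
}

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_search(query: str, docs: dict[str, str]) -> list[str]:
    """Return IDs of documents that contain every query word exactly."""
    terms = tokenize(query)
    return [doc_id for doc_id, text in docs.items() if terms <= tokenize(text)]

print(keyword_search("pregnancy", DOCS))       # []
print(keyword_search("maternity care", DOCS))  # ['benefits-2023']
```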
Semantic search fixes this problem by creating a catalog of all the common words and phrases your users search for in a graph data structure called an ontology. You can think of this as a network where similar words are stored as vertices connected by the edges of a graph.
The ontology stores a graph of concept vertices, where each concept has one and only one preferred label per language, such as English or Spanish. But the ontology graph also stores other terms, such as "alternate labels" (synonyms), and links to related concepts. These include:
- Broader terms
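Here is a minimal Python sketch of how a concept vertex could be modeled, assuming a simplified SKOS-style structure. The `Concept` class, its field names, and the concept IDs are hypothetical illustrations, not Smartlogic's data model. Keying preferred labels by language tag enforces the one-preferred-label-per-language rule.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A SKOS-style concept (hypothetical model): one preferred label per
    language, any number of alternate labels, and links to broader concepts."""
    concept_id: str
    pref_labels: dict[str, str]  # language tag -> the single preferred label
    alt_labels: dict[str, list[str]] = field(default_factory=dict)  # language tag -> alternates
    broader: list[str] = field(default_factory=list)  # IDs of broader concepts

    def all_labels(self, lang: str) -> list[str]:
        """Preferred label first, then alternate labels, for one language."""
        labels = [self.pref_labels[lang]] if lang in self.pref_labels else []
        return labels + self.alt_labels.get(lang, [])

# Hypothetical concept for illustration.
maternity = Concept(
    concept_id="c:maternity",
    pref_labels={"en": "maternity care", "es": "atención de maternidad"},
    alt_labels={"en": ["pregnancy", "prenatal care"]},
    broader=["c:health-benefits"],
)

print(maternity.all_labels("en"))  # ['maternity care', 'pregnancy', 'prenatal care']
```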
When documents are added to your search system, they are passed through a semantic enrichment pipeline. The pipeline uses NLP classification algorithms (think BERT and other large language models) to add a few dozen metadata tags to the end of each document, and the search engine indexes these tags too. For example, a paragraph that discusses pregnancy might cause the metadata tag "maternity" to be added to the document as an invisible keyword.
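The enrichment step can be sketched in Python as follows. This is a hypothetical stand-in: instead of a BERT-style classifier it uses simple whole-word matching against ontology labels, and the `ONTOLOGY`, `classify`, and `enrich` names are illustrative only. Here the tags go into a separate field rather than being appended to the body; either way, the point is that the search engine indexes them.

```python
import re

# Hypothetical mini-ontology: preferred label -> alternate labels.
ONTOLOGY = {
    "maternity": ["pregnancy", "pregnant", "prenatal", "maternity care"],
    "dental": ["teeth", "dentist", "orthodontics"],
}

def classify(text: str) -> list[str]:
    """Return the preferred labels of concepts whose labels appear as whole words.
    A simple dictionary-based stand-in for an ML classifier such as BERT."""
    lowered = text.lower()
    tags = []
    for pref, alts in ONTOLOGY.items():
        for label in [pref, *alts]:
            if re.search(rf"\b{re.escape(label)}\b", lowered):
                tags.append(pref)
                break  # one match is enough to tag this concept
    return tags

def enrich(doc: dict) -> dict:
    """Return a copy of the document with a 'tags' metadata field for indexing."""
    return {**doc, "tags": classify(doc["body"])}

doc = {"id": "faq-12", "body": "If you become pregnant, notify HR within 30 days."}
print(enrich(doc)["tags"])  # ['maternity']
```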
As a user types keywords into the search interface, the keywords and phrases go through an "ontological expansion" process that adds related words to the search query. The word "maternity" is added to the word "pregnancy." As a result, documents that discuss related concepts are also matched.
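A minimal sketch of ontological expansion, assuming each concept is represented as a set of its labels. The `CONCEPT_LABELS` data, `expand_query`, and the generic `OR` syntax produced by `to_or_query` are hypothetical; a production engine such as MarkLogic has its own query APIs.

```python
# Hypothetical ontology fragment: each set holds every label of one concept.
CONCEPT_LABELS = [
    {"maternity", "pregnancy", "prenatal care"},
    {"dental", "orthodontics"},
]

def expand_query(query: str) -> list[str]:
    """Return the query term followed by the other labels of any concept it matches."""
    term = query.strip().lower()
    expanded = [term]
    for labels in CONCEPT_LABELS:
        if term in labels:
            # Sort for a deterministic order, since sets are unordered.
            expanded.extend(sorted(labels - {term}))
    return expanded

def to_or_query(terms: list[str]) -> str:
    """Join terms into a generic boolean OR query string (hypothetical syntax)."""
    return " OR ".join(f'"{t}"' for t in terms)

print(to_or_query(expand_query("pregnancy")))
# "pregnancy" OR "maternity" OR "prenatal care"
```

Terms with no matching concept pass through unchanged, so expansion never makes a query return fewer results than the original keyword.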
The net result? Instead of a 40% probability that you get the correct document at the top of your search results, search quality (often measured with an F-score, the harmonic mean of precision and recall) can jump to around 90% for most use cases! Your users will love this experience, and they will send you e-mails about how your fantastic search has made their lives better.
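The F-score combines precision (the share of returned documents that are relevant) and recall (the share of relevant documents that are returned). In general F-beta = (1 + beta^2) * P * R / (beta^2 * P + R), and beta = 1 gives the familiar F1 = 2PR / (P + R). The Python sketch below computes it; the document IDs and result sets are made up for illustration and are not measurements from a real project.

```python
def f_score(relevant: set[str], retrieved: set[str], beta: float = 1.0) -> float:
    """F-beta score: the weighted harmonic mean of precision and recall."""
    true_positives = len(relevant & retrieved)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)  # share of results that are relevant
    recall = true_positives / len(relevant)      # share of relevant docs that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Made-up example: four relevant documents for the query "pregnancy".
relevant = {"d1", "d2", "d3", "d4"}
keyword_results = {"d1", "d9"}               # exact-match search misses most of them
semantic_results = {"d1", "d2", "d3", "d9"}  # expansion finds more of them

print(round(f_score(relevant, keyword_results), 2))   # 0.33
print(round(f_score(relevant, semantic_results), 2))  # 0.75
```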
When I worked for MarkLogic and integrated Smartlogic, it usually took me about two weeks to build a pilot project that got better scores than our users had ever seen. It is mostly a matter of putting the right components together and having a deep appreciation for the skills required to build a robust ontology. This is where a person with a master's degree in library science can make or break a search project.
So how are these ontologies stored? Let's look at the standards for storing these structures. Many people are familiar with the OWL standards used in tools like Stanford's Protégé. However, a different W3C standard puts a strong focus on human language and less focus on expressing computer-readable rules. This is SKOS (Simple Knowledge Organization System). SKOS rocks when you are building taxonomies and ontologies to enrich the search experience.
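To show what SKOS looks like on disk, here is a small Python function that writes one concept in the Turtle syntax. The `skos:` namespace and the `skos:prefLabel`, `skos:altLabel`, and `skos:broader` properties come from the W3C SKOS standard; the example URIs and the `concept_to_turtle` and `literal` helpers are hypothetical. In practice you would likely use an RDF library rather than building strings by hand.

```python
import json

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."

def literal(text: str, lang: str) -> str:
    """A language-tagged Turtle literal; JSON escaping handles quotes and backslashes."""
    return f"{json.dumps(text, ensure_ascii=False)}@{lang}"

def concept_to_turtle(uri, pref, alts=(), broader=None, lang="en"):
    """Serialize one SKOS concept as a Turtle snippet (hypothetical helper)."""
    statements = ["a skos:Concept", f"skos:prefLabel {literal(pref, lang)}"]
    statements += [f"skos:altLabel {literal(alt, lang)}" for alt in alts]
    if broader:
        statements.append(f"skos:broader <{broader}>")
    # Turtle separates predicates of the same subject with ';' and ends with '.'
    body = " ;\n    ".join(statements)
    return f"{SKOS_PREFIX}\n\n<{uri}> {body} ."

ttl = concept_to_turtle(
    "http://example.org/benefits/maternity",  # hypothetical URI
    "maternity care",
    alts=["pregnancy"],
    broader="http://example.org/benefits/health",
)
print(ttl)
```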
Smartlogic Champions SKOS-XL Standards
Moving your ontologies in and out of Smartlogic is easy because it is built around SKOS. One of our customers' most significant concerns was whether their ontologies would be "trapped" within one ontology management system. The good news is that Smartlogic built its entire ecosystem of ontology editing tools around an open standard called SKOS-XL. SKOS-XL is the core SKOS standard with a small extension that allows labels to be resources with their own properties.
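The difference between core SKOS and SKOS-XL is easiest to see as triples. In core SKOS a label is a plain literal; in SKOS-XL each label is a resource of type `skosxl:Label` whose text is held in `skosxl:literalForm`, so the label itself can carry properties. This Python sketch uses prefixed names as plain strings; the `ex:` identifiers, the helper functions, and the `dcterms:created` value are hypothetical.

```python
def skos_core_label(concept, text, lang="en"):
    """Core SKOS: the label is a plain literal and cannot carry its own properties."""
    return [(concept, "skos:prefLabel", f'"{text}"@{lang}')]

def skos_xl_label(concept, label_id, text, lang="en", label_props=None):
    """SKOS-XL: the label is a skosxl:Label resource, so it can have properties."""
    triples = [
        (concept, "skosxl:prefLabel", label_id),
        (label_id, "rdf:type", "skosxl:Label"),
        (label_id, "skosxl:literalForm", f'"{text}"@{lang}'),
    ]
    # Extra metadata attaches to the label node, not to the concept.
    for predicate, value in (label_props or {}).items():
        triples.append((label_id, predicate, value))
    return triples

core = skos_core_label("ex:maternity", "maternity care")
xl = skos_xl_label(
    "ex:maternity", "ex:maternity-label-en", "maternity care",
    label_props={"dcterms:created": '"2021-01-15"^^xsd:date'},  # hypothetical value
)
for triple in xl:
    print(*triple)
```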
Smartlogic competes on high-end features that make it easy for non-technical staff to create, link and manage ontologies. Ontology linking is a crucial feature for reuse. Smartlogic ontology publishing pipelines also make it easy to have downstream search engines pull in the main ontologies and changes in the linked ontologies.
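Ontology linking can be illustrated with `skos:exactMatch`, the SKOS property for declaring that concepts in two different schemes mean the same thing. The sketch below uses hypothetical HR and medical ontologies and follows those links (treating them as symmetric and transitive, as the SKOS reference defines `skos:exactMatch`) to gather every label a downstream search engine could use. Because the labels are looked up at merge time, a change in the linked ontology is picked up the next time the labels are merged. This is not Smartlogic's publishing pipeline, just the underlying idea.

```python
# Two hypothetical ontologies flattened into one map: concept ID -> labels.
ONTOLOGIES = {
    "hr:maternity": {"maternity", "maternity benefits"},
    "med:maternity-care": {"obstetric care", "prenatal care"},
    "hr:dental": {"dental", "dental plan"},
}

# Cross-ontology links as (subject, predicate, object) triples.
LINKS = [("hr:maternity", "skos:exactMatch", "med:maternity-care")]

def linked_concepts(concept_id):
    """All concepts reachable via skos:exactMatch, treated as symmetric and transitive."""
    seen, frontier = {concept_id}, [concept_id]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in LINKS:
            if pred != "skos:exactMatch":
                continue
            for a, b in ((subj, obj), (obj, subj)):  # follow the link both ways
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen

def merged_labels(concept_id):
    """Union of labels from the concept and every concept linked to it."""
    labels = set()
    for cid in linked_concepts(concept_id):
        labels |= ONTOLOGIES.get(cid, set())
    return labels

print(sorted(merged_labels("hr:maternity")))
# ['maternity', 'maternity benefits', 'obstetric care', 'prenatal care']
```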
If you are working with a vendor whose data management tools lock customers into closed standards, take note. Smartlogic's no-lock-in strategy gives its customers lower risk and an easy way out if they find a better solution.
Faceted Search: How Ontologies Drive The Search Experience
If you go to your favorite online shopping site, you will frequently see how the search systems quickly allow you to narrow your search results based on the context of the thing you are searching for. Below are some screen images of my recent search for robot screws on eBay:
Note that these facets only appear when the keyword “screws” is present in the search bar. Every object in your ontology might be classified in several different ways. What I like about the eBay facets is that the categories also show the count of items in each category. So there are over 27 thousand Button Head screws and over 62 thousand Flat screws. Note that the preferred labels sometimes include the word “Head” and sometimes leave it off, as in Flat and Round. There is still some cleanup work to do here!
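Facet counts like these can be sketched in a few lines of Python. The product data, the `PREF_LABEL` mapping, and `facet_counts` are hypothetical. The mapping shows how an ontology's preferred labels could clean up inconsistent values such as "Flat" versus "Button Head", and a keyword with no matching items produces no facets at all.

```python
from collections import Counter

# Hypothetical ontology cleanup: map raw facet values to preferred labels.
PREF_LABEL = {"Flat": "Flat Head", "Round": "Round Head"}

# Hypothetical product catalog with a "head_type" facet.
PRODUCTS = [
    {"title": "M3 button head screws", "head_type": "Button Head"},
    {"title": "M3 flat screws", "head_type": "Flat"},
    {"title": "M4 flat screws", "head_type": "Flat"},
    {"title": "M2 round screws", "head_type": "Round"},
    {"title": "M3 hex nuts", "head_type": None},
]

def facet_counts(query, items, facet):
    """Count preferred-label facet values among items whose title contains the keyword."""
    hits = [item for item in items if query.lower() in item["title"].lower()]
    return Counter(
        PREF_LABEL.get(item[facet], item[facet])  # normalize to the preferred label
        for item in hits
        if item[facet] is not None                # items without this facet are skipped
    )

print(facet_counts("screws", PRODUCTS, "head_type").most_common())
# [('Flat Head', 2), ('Button Head', 1), ('Round Head', 1)]
print(facet_counts("washers", PRODUCTS, "head_type"))
# Counter()
```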
What is interesting is that many organizations are not aware of how their users classify their documents. Ontology management helps search specialists analyze search logs and improve document classification. This is not a one-time event but an ongoing area of focus for helping users get the correct data they need quickly.
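One simple form of search-log analysis is to find frequent queries that match no label in the ontology; those are candidates for new alternate labels or concepts. The log entries, the `min_count` threshold, and the `unmatched_queries` helper below are hypothetical.

```python
from collections import Counter

# Hypothetical search log and the labels already present in the ontology.
SEARCH_LOG = ["pregnancy", "baby bonus", "Baby Bonus ", "dental", "baby bonus", "eye exam"]
ONTOLOGY_LABELS = {"pregnancy", "maternity", "dental"}

def unmatched_queries(log, labels, min_count=2):
    """Frequent normalized queries that match no ontology label, most common first."""
    counts = Counter(q.strip().lower() for q in log)
    return [(q, n) for q, n in counts.most_common() if q not in labels and n >= min_count]

print(unmatched_queries(SEARCH_LOG, ONTOLOGY_LABELS))
# [('baby bonus', 3)]
```

A search specialist would then decide whether "baby bonus" belongs as an alternate label on an existing concept or deserves a concept of its own.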
Impact on the EKG Industry
Smartlogic already has high-quality connectors for MarkLogic, but there was always the hassle of negotiating with two vendors to create a solution. MarkLogic will now have a world-class ontology management tool that could be bundled directly with their product. That could change the rules of the game, forcing other EKG vendors to bundle in similar ontology management tools.
This announcement also impacts other ontology management companies. Although I suspect that Smartlogic will continue to have tight coupling with MarkLogic, MarkLogic could also sell Smartlogic as an ontology editor at a lower price to get their foot in the door in other projects. And if you are selling ontology management into a company that is already using MarkLogic, you now have a new competitor to deal with.
Conclusion: Applying Graph Systems Thinking to Ontologies
EKG vendors need to approach large enterprise projects with a holistic Graph Systems Thinking viewpoint. In most companies today, 90% of the information about a company is locked inside of documents. You need great NLP tools to get the data out of these documents to continually enrich the ontologies that flow into your EKG. Now go find a good ontologist and add them to your team!