Auto-Generating EKG Documents

A sample workflow for autogenerating documentation for large-scale enterprise knowledge graphs. Image by the author.

As Enterprise Knowledge Graphs (EKGs) grow, support teams see increasing demand for documentation about the EKG and how to access it. This means you will spend more time writing and maintaining documentation.

This post reviews some best practices for generating high-quality, up-to-date documentation directly from your EKG’s metadata.

In the past, high-quality documentation was created by separate teams of technical writers who interviewed IT staff and then produced content in Microsoft Word, Adobe Illustrator, Adobe Photoshop, and other expensive proprietary tools. That content was then fed through elaborate, centralized content management systems (CMS) that transformed the documents into searchable and bookmarkable HTML and PDF formats.

This began to change when a new generation of documentation tools grew out of the Model-Driven Architecture movement. The general idea is that your data model is the source of truth about what is in your database. By transforming this model metadata into HTML, you can make sure that your documentation stays current. On many projects, no additional technical writers are needed to generate the initial EKG documentation, although having a proofreader on the team is still a good idea! We are moving from centralized content management to decentralized content creation. We can still use centralized search if our intranet search tools are notified when we check new content into our software repositories.
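To make the model-driven idea concrete, here is a minimal Python sketch that turns model metadata into one Markdown page per node label. The `MODEL` dictionary and the `node_page` and `model_to_pages` helpers are hypothetical; a real EKG would supply this metadata through its own schema or catalog API, and a documentation build tool would then convert the pages to HTML.

```python
# Hypothetical EKG model metadata: node labels with descriptions and typed
# properties, plus (source, relationship, target) triples. In practice this
# would be pulled from the graph database's schema, not hard-coded.
MODEL = {
    "nodes": {
        "Physician": {
            "description": "A licensed medical doctor.",
            "properties": {"npi": "string", "name": "string"},
        },
        "Patient": {
            "description": "A person receiving care.",
            "properties": {"mrn": "string", "birth_date": "date"},
        },
        "Condition": {
            "description": "A diagnosed medical condition.",
            "properties": {"code": "string"},
        },
    },
    "relationships": [
        ("Physician", "HAS_PATIENT", "Patient"),
        ("Patient", "HAS_CONDITION", "Condition"),
    ],
}


def node_page(label: str, spec: dict, relationships: list) -> str:
    """Render one node label as a Markdown page with a property table."""
    lines = [f"# {label}", "", spec["description"], "",
             "## Properties", "", "| Name | Type |", "|------|------|"]
    for name, type_name in spec["properties"].items():
        lines.append(f"| {name} | {type_name} |")
    lines += ["", "## Relationships", ""]
    # List every relationship in which this label is the source or the target.
    for src, rel, dst in relationships:
        if label in (src, dst):
            lines.append(f"- ({src})-[:{rel}]->({dst})")
    return "\n".join(lines) + "\n"


def model_to_pages(model: dict) -> dict:
    """Map a Markdown file name (e.g. patient.md) to its generated content."""
    return {
        f"{label.lower()}.md": node_page(label, spec, model["relationships"])
        for label, spec in model["nodes"].items()
    }


if __name__ == "__main__":
    for filename, content in model_to_pages(MODEL).items():
        print(f"--- {filename} ---")
        print(content)
```

Writing each value to `docs/<filename>` (the default mkdocs source folder) and running `mkdocs build` would turn these pages into HTML, so the documentation is regenerated whenever the model changes.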

Although some people call these new tools “static site generators”, I think the word “static” does not do the architecture justice. I like the term microsite because it shows how quickly a small but useful website with many dynamic features can be generated. One example of a microsite is the Python training curriculum we are developing for our local CoderDojo chapter.

Although the distributed content management revolution started with specialized data transformation tools like XSLT and XQuery, it has since evolved into more accessible tools that are integrated into the developer’s world. We moved from storing metadata in XML to more user-friendly formats like Markdown, which can be parsed and converted into HTML. Documentation build tools like mkdocs made this conversion part of the deployment process, and generating high-quality documentation became part of the definition of done for many agile teams.
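The parse-and-convert step can be illustrated with a deliberately tiny, standard-library-only converter. It handles only ATX headings, `-` bullet lists, and plain paragraphs; mkdocs itself relies on the full Python-Markdown library and its extensions, so treat the hypothetical `markdown_to_html` function as a teaching sketch, not a replacement.

```python
import html
import re


def markdown_to_html(text: str) -> str:
    """Convert a tiny Markdown subset (headings, '-' bullets, paragraphs) to HTML."""
    out: list = []
    paragraph: list = []
    in_list = False

    def flush_paragraph() -> None:
        # Emit any buffered paragraph lines as a single <p> element.
        if paragraph:
            out.append("<p>" + " ".join(paragraph) + "</p>")
            paragraph.clear()

    def close_list() -> None:
        nonlocal in_list
        if in_list:
            out.append("</ul>")
            in_list = False

    for raw in text.splitlines():
        line = raw.strip()
        heading = re.match(r"(#{1,6})\s+(.*)", line)
        if not line:
            # A blank line ends the current paragraph or list.
            flush_paragraph()
            close_list()
        elif heading:
            flush_paragraph()
            close_list()
            level = len(heading.group(1))
            out.append(f"<h{level}>{html.escape(heading.group(2))}</h{level}>")
        elif line.startswith("- "):
            flush_paragraph()
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{html.escape(line[2:])}</li>")
        else:
            close_list()
            paragraph.append(html.escape(line))

    flush_paragraph()
    close_list()
    return "\n".join(out)


if __name__ == "__main__":
    print(markdown_to_html("# Patient\n\nA person receiving care.\n\n- has conditions"))
```

Escaping every piece of text with `html.escape` is the one design choice worth copying: generated documentation often contains `<`, `>` and `&` from schema definitions, and they must not be interpreted as markup.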

The need for high-quality Markdown-to-HTML transformations gave rise to a series of command-line tools that could be integrated directly into developers’ build processes. One of the most popular of these tools is the open-source mkdocs library. Mkdocs is incredibly easy to use. You run a single command at the command line ($ pip3 install mkdocs), and you then have four simple commands ready to use:

  1. Initialize your files (mkdocs new)
  2. Build your HTML web site (mkdocs build)
  3. View it locally in a server (mkdocs serve)
  4. Deploy it to the GitHub Pages branch of your repository (mkdocs gh-deploy); the built site folder can also be copied to any web server

You can control the exact look-and-feel, navigation, and search by editing a single mkdocs.yml file. Simple is good!
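Because the whole site is driven by that one file, it can be generated too. The sketch below assumes the page titles and file names come from the EKG model; the `render_mkdocs_yml` helper is hypothetical. The keys it emits (`site_name`, `theme`, `plugins`, `nav`) are standard mkdocs configuration keys, and `theme: name: material` assumes the mkdocs-material package is installed. Values are quoted with `json.dumps`, since a JSON string is also a valid double-quoted YAML scalar; this keeps titles that contain colons from breaking the file.

```python
import json


def render_mkdocs_yml(site_name: str, nav: dict) -> str:
    """Render a minimal mkdocs.yml; nav maps page titles to files under docs/."""
    lines = [
        f"site_name: {json.dumps(site_name)}",
        "theme:",
        "  name: material",  # assumes: pip install mkdocs-material
        "plugins:",
        "  - search",  # mkdocs' built-in search plugin
        "nav:",
    ]
    # Preserve the caller's ordering so the navigation matches the model.
    lines += [f"  - {json.dumps(title)}: {path}" for title, path in nav.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_mkdocs_yml("Clinical EKG", {
        "Home": "index.md",
        "Physician": "physician.md",
        "Patient": "patient.md",
    }))
```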

Note that mkdocs is written in Python. Although it is not the only tool that converts Markdown into HTML, the large and quickly growing community of Python developers around mkdocs has created strong demand for new themes (presentation styling), features, extensions, and plugins. Mkdocs quickly became the first choice for many teams because of the mature open-source community behind it.

Mkdocs rapidly differentiated itself from other documentation tools by offering the features developers wanted: rich navigation, code highlighters for almost any language, and powerful microsite search. The search feature alone was worth throwing out our old content management systems. Microsite search takes every word in your website and builds a small, compact inverted (reverse) index in a JSON file. When users type in the search bar, an in-browser auto-complete script matches each keystroke against the index and shows a drop-down list of every document that contains the matching word. If users can’t find what they are looking for in the navigation system, they can quickly find it using the microsite search. This is a new level of self-service that keeps developer teams focused on writing code, not answering support questions.
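The inverted-index idea can be sketched in a few lines of Python. This is a simplified model of the concept, not mkdocs’ actual implementation: the built-in search plugin writes a `search_index.json` file that an in-browser library (lunr.js) consumes, and its format differs. Here the hypothetical `build_index` maps each word to the pages containing it, and the hypothetical `search` mimics auto-complete by matching a typed prefix.

```python
import json
import re
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each lowercased word to the sorted list of pages that contain it."""
    index = defaultdict(set)
    for path, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    # Sets are not JSON-serializable, so convert them to sorted lists.
    return {word: sorted(paths) for word, paths in sorted(index.items())}


def search(index: dict, prefix: str) -> list:
    """Return every page containing a word that starts with the typed prefix."""
    prefix = prefix.lower()
    hits = set()
    for word, paths in index.items():
        if word.startswith(prefix):
            hits.update(paths)
    return sorted(hits)


if __name__ == "__main__":
    pages = {
        "patient.md": "A Patient has one or more conditions.",
        "physician.md": "A Physician has many patients.",
    }
    index = build_index(pages)
    # This JSON string is what would be shipped to the browser.
    print(json.dumps(index))
    for typed in ("p", "pat", "cond"):
        print(typed, "->", search(index, typed))
```

A linear scan over every word is fine for small sites, and it hints at the limitation described next: the whole index must be downloaded before the first keystroke can be matched.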

There are limitations, however. If you have hundreds of thousands of documents, the time needed to load the index will slow your search response. But for our projects with fewer than 10,000 documents, we have not seen performance problems.

Mkdocs also enabled the creation of themes that are themselves extensible. One of the highest-quality mkdocs themes is built around the user interface components of Google’s Material Design. Technically, Material is more than a user interface library; it is a full design language for expressing user interface concepts. It includes standard components that are intuitive and, because of their popularity, familiar to users.

The mkdocs-material theme itself has become a mini-platform of new features. One of these is a plugin based on the Mermaid JavaScript library that converts a written description of a drawing directly into the drawing. These tools are part of a long lineage of auto-layout drawing tools that goes back to Bell Labs in the 1980s. When I was writing papers for the Bell Labs Technical Journal, I created my drawings in a language called “dot”, and they were then rendered as images in our papers. The syntax is a bit different, but the concept is the same: you don’t need complex drawing tools to create simple technical diagrams.

You can try these tools out with the Mermaid Live Editor.

Now let’s look at how a model renders. Here is the input for a simple clinical model:

```mermaid
graph LR
d1((Physician)) -- HAS_PATIENT --> p1((Patient))
p1((Patient)) -- HAS_CONDITION --> c1((Condition))
```

This tells Mermaid to draw an LR (left-to-right) graph layout that shows the relationships between a Physician, a Patient, and a Condition. Here is the generated layout:

Simple LPG graph layout created by the mermaid automatic layout tools. Image by the author.

The double parentheses tell the layout engine to draw each node as a circle. What is critical is that, as an author, you don’t have to fret about the layout and placement of the items in the diagram. This is done for you. All you need to do is extract the metadata from your EKG to generate the diagram.
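As a minimal sketch of that extraction step, assume the EKG’s schema has already been exported as (source label, relationship type, target label) triples; how you obtain them depends on your database. The hypothetical `to_mermaid` function emits the same kind of Mermaid flowchart as the clinical example, deriving node ids from labels. It assumes labels are simple words; labels containing parentheses or other Mermaid syntax characters would need quoting.

```python
import re


def node_id(label: str) -> str:
    """Derive a Mermaid-safe node id from a label (a hypothetical naming scheme)."""
    return re.sub(r"\W+", "_", label).lower()


def to_mermaid(triples: list, direction: str = "LR") -> str:
    """Render (source, relationship, target) triples as a Mermaid flowchart."""
    lines = [f"graph {direction}"]
    for src, rel, dst in triples:
        # ((...)) draws a circle; "-- TEXT -->" draws a labeled arrow.
        lines.append(
            f"{node_id(src)}(({src})) -- {rel} --> {node_id(dst)}(({dst}))"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    schema = [
        ("Physician", "HAS_PATIENT", "Patient"),
        ("Patient", "HAS_CONDITION", "Condition"),
    ]
    print(to_mermaid(schema))
```

Repeating a node such as `patient((Patient))` on several lines is harmless; Mermaid merges nodes that share an id. One caveat: Mermaid reserves a few words, such as lowercase `end`, that cannot be used as node ids, so a production naming scheme would need to avoid them.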

Mermaid also lets you change colors, fonts, and other elements through its own theme directives.
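For example, a generator could prepend Mermaid’s `%%{init: ...}%%` directive to select a built-in theme (such as `default`, `forest`, `dark`, or `neutral`) and append `classDef` and `class` statements to color groups of nodes. The `style_mermaid` helper and its parameters are hypothetical; only the directive and statement syntax are Mermaid’s.

```python
import json
from typing import Dict, List, Optional


def style_mermaid(
    diagram: str,
    theme: str = "forest",
    class_defs: Optional[Dict[str, str]] = None,
    assignments: Optional[Dict[str, List[str]]] = None,
) -> str:
    """Add a theme directive and node classes to a Mermaid flowchart."""
    # The init directive goes at the top of the diagram; its body is JSON.
    lines = ["%%{init: " + json.dumps({"theme": theme}) + "}%%", diagram]
    # "classDef NAME STYLE" defines a reusable style...
    for name, style in (class_defs or {}).items():
        lines.append(f"classDef {name} {style}")
    # ...and "class ID1,ID2 NAME" applies it to specific node ids.
    for name, node_ids in (assignments or {}).items():
        lines.append(f"class {','.join(node_ids)} {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    base = "graph LR\nphysician((Physician)) -- HAS_PATIENT --> patient((Patient))"
    print(style_mermaid(
        base,
        theme="neutral",
        class_defs={"person": "fill:#e3f2fd,stroke:#1565c0"},
        assignments={"person": ["physician", "patient"]},
    ))
```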

Automatic layout works well for small diagrams. Once a diagram grows to about a dozen objects, you begin to lose control over how it is drawn. Tools like dot and Graphviz let you specify constraints on how the layout is done. Although the Mermaid plugin is excellent, it will take some time before all of dot’s advanced layout controls are available in Mermaid.
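To illustrate the kind of constraint dot supports, the hypothetical `to_dot` function below renders schema triples in Graphviz’s DOT language and groups chosen nodes with `rank=same`, which forces them onto the same rank (the same column in a left-to-right layout). The function name and the extra `Nurse` node are illustrative assumptions; `rankdir`, `rank=same`, `shape`, and edge `label` are standard DOT attributes.

```python
from typing import List, Optional


def to_dot(
    triples: list,
    same_rank: Optional[List[List[str]]] = None,
    direction: str = "LR",
) -> str:
    """Render (source, relationship, target) triples as a Graphviz DOT digraph."""
    lines = ["digraph EKG {", f"  rankdir={direction};", "  node [shape=circle];"]
    for src, rel, dst in triples:
        lines.append(f'  "{src}" -> "{dst}" [label="{rel}"];')
    # Each group of nodes is pinned to the same rank by a rank=same subgraph.
    for group in same_rank or []:
        members = " ".join(f'"{name}";' for name in group)
        lines.append(f"  {{ rank=same; {members} }}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    schema = [
        ("Physician", "HAS_PATIENT", "Patient"),
        ("Nurse", "HAS_PATIENT", "Patient"),
        ("Patient", "HAS_CONDITION", "Condition"),
    ]
    print(to_dot(schema, same_rank=[["Physician", "Nurse"]]))
```

Saving the output as `ekg.dot` and running `dot -Tsvg ekg.dot -o ekg.svg` renders it with Graphviz. The sketch does not escape double quotes inside labels, which a real generator would need to handle.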

The hard work of the ISO/IEC Joint Technical Committee on the GQL language shows that, in the near future, we will have a standardized way to represent our labeled property graph data models and a standardized language for our graph queries. This means we will be able to describe our business models and business logic in portable ways that do not lock us into a single database vendor or cloud service provider. Once the GQL Working Draft is published, we expect to see a flood of new tools that make it easy to automatically generate documentation and AI-based micro-reasoner training curricula directly from our graph metadata.

I also want to connect this autogeneration of documentation with work in the natural language processing (NLP) community. My work using GPT-3 to generate STEM content has opened my eyes to an entirely new era of automatic content generation, specifically the work on converting a high-level English description of a scene or drawing into a detailed rendering. For example, OpenAI’s DALL-E project lets you write an English narrative of a scene and then generates a high-quality image of that scene. These new tools will clearly disrupt the stock image industry. They will also enable your IDE to convert a short description of a diagram into a high-quality, searchable image for you! They may even let you modify the diagram by editing its Markdown-like source.

None of these tools and frameworks would be possible without the generous contributions of the open-source community: the developers who created Markdown, mkdocs, Material, and Mermaid. For this, we should all be grateful, and I hope we all support these developers by sponsoring their work.

Distinguished Engineer with an interest in knowledge graphs, AI and complex systems. Big fan of STEM, Arduino, robotics, DonkeyCars and the AI Racing League.