Here is a good example of how to quickly create an outline for any topic in Wikipedia using simple graph analytics:
I like the way the author (Caleb Jones from Disney) broke the process down into discrete steps:
- Point your HTTP client (mini web crawler) to any Wikipedia page.
- Gather three levels of links (using a stop list to skip over reference pages). Note that if your HTTP client supports XPath, the query is just $page//a[not(@href=$stoplist)]. Simple is good!
- Put the links into a graph and filter out the links (pages) that have a lower PageRank (a score driven mainly by inbound links, not just a raw inbound link count). PageRank might even be an out-of-the-box function in your graph library.
- Use a graph community detection algorithm or tool like Gephi to find “communities” of concepts.
- Tweak the clustering algorithms to get a reasonable number of communities (5–10 subtopics).
- Color-code the communities and add labels to each community (something that is still a mostly manual process today).
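To make the graph steps above concrete, here is a minimal sketch in plain Python. It is not the author's code: the function names (`pagerank`, `keep_top_pages`, `communities`), the top-N cutoff and the toy edge list are my own assumptions, and the community step uses a naive greedy modularity merge as a stand-in for whatever algorithm Gephi or your graph library provides. In practice a library such as NetworkX already ships PageRank and community detection, so you would probably not write these by hand.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def keep_top_pages(edges, keep):
    """Drop links that touch any page outside the `keep` highest-ranked pages."""
    rank = pagerank(edges)
    top = set(sorted(rank, key=rank.get, reverse=True)[:keep])
    return [(s, d) for s, d in edges if s in top and d in top]


def communities(edges):
    """Naive greedy modularity merging on the undirected version of the graph."""
    links = {frozenset(e) for e in edges if e[0] != e[1]}
    m = len(links)
    degree = defaultdict(int)
    for link in links:
        for node in link:
            degree[node] += 1
    groups = {node: {node} for node in degree}  # every page starts alone

    def gain(a, b):
        # Modularity change from merging groups a and b, scaled by 2*m*m
        # so that the comparison uses exact integers.
        between = sum(1 for link in links if link & groups[a] and link & groups[b])
        deg_a = sum(degree[n] for n in groups[a])
        deg_b = sum(degree[n] for n in groups[b])
        return 2 * m * between - deg_a * deg_b

    while len(groups) > 1:
        best, a, b = max((gain(a, b), a, b) for a in groups for b in groups if a < b)
        if best <= 0:  # no merge improves modularity any further
            break
        groups[a] |= groups.pop(b)
    return sorted(sorted(g) for g in groups.values())


# Toy data (made up): two tightly linked clusters, one bridge link,
# and two pages ("x", "y") that nothing links to.
toy_edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("d", "e"), ("e", "f"), ("f", "d"),
    ("c", "d"),
    ("x", "a"), ("y", "e"),
]
core = keep_top_pages(toy_edges, keep=6)
print(communities(core))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The two pages with no inbound links get the lowest rank and are filtered out, and the remaining pages split into two communities. Labeling each community is still left to a human, as noted above.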
The list of “labeled graph communities” provides a first-level outline for your topic. You can repeat the steps above for each community to get a second-level outline. Note that not all of these steps are fully automated yet. However, by breaking these steps down into a series of REST services I think they could be streamlined. This is an excellent example of how to quickly build concept maps that give you a broad overview of a new topic and show how it relates to other concepts. This can be done today on your own laptop/desktop without the need for a team of AI/Deep Learning/NLP researchers and a rack of GPUs. Let’s not make this more difficult than it is.
My hope (prediction?) is that in a few years every database, search engine, word processor, smart speaker and ontology editor will have a “plug-in” that allows us to quickly suggest related concepts from these concept graphs. This should just be another variation of the MarkLogic suggest function.
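As a rough illustration of what such a suggestion plug-in might do under the hood, here is a small Python sketch. It is purely hypothetical: `build_neighbors` and `suggest_related` are names I made up, the scoring rule (shared neighbors plus a bonus for a direct link) is just one simple choice, and the toy links are invented. It does not reflect how MarkLogic or any other product implements its suggest feature.

```python
from collections import defaultdict


def build_neighbors(edges):
    """Undirected adjacency sets from (source, target) link pairs."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    return neighbors


def suggest_related(neighbors, concept, limit=5):
    """Rank other concepts by shared neighbors, plus one point for a direct link."""
    mine = neighbors.get(concept, set())
    scores = {}
    for other, theirs in neighbors.items():
        if other == concept:
            continue
        score = len(mine & theirs) + (1 if other in mine else 0)
        if score:
            scores[other] = score
    # Highest score first; alphabetical order breaks ties.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:limit]


# Toy concept graph (made-up links, for illustration only).
toy_links = [
    ("Graph database", "Neo4j"),
    ("Graph database", "Cypher"),
    ("Neo4j", "Cypher"),
    ("Graph database", "RDF"),
    ("RDF", "SPARQL"),
    ("SQL", "Relational database"),
]
graph = build_neighbors(toy_links)
print(suggest_related(graph, "Graph database", limit=3))
# ['Cypher', 'Neo4j', 'RDF']
```

A real plug-in would presumably sit on top of the filtered, community-labeled graph rather than a hand-typed list, but the lookup itself can stay this simple.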
One product I am using (Smartlogic’s Semaphore Ontology Editor) already has an API for a side panel widget for adding these “suggestions” [disclaimer: my wife works there]. These real-time suggestions can have a positive impact on the productivity of anyone building taxonomies and ontologies.
So if I am writing a whitepaper about graph databases (a real use case), my word processor (or my presentation tool) should be able to suggest an initial outline for me. This should be as easy as saying: “Alexa, write me a whitepaper outline on graph databases”.