Lost in [Knowledge] Space

A Knowledge Space is your status in a graph of learning concepts

9 min read · Oct 12, 2019

**Lost in Space** was a favorite TV show of mine when I was a kid. Most of the robots were protective and looked out for their charges. The robot is a good metaphor for AI-powered learning systems. The rendering style was done by Deep-Style.io.

Welcome to your personal Teaching Assistant Chatbot! I am here to help you learn whatever concepts you want to master. Let me show you around. In this part of our enterprise learning management graph, we have a network of learning concepts. Each concept is linked to related concepts in our system’s knowledge graph. Concepts are also linked to one or more pieces of Learning Content. To get started, just tell me your learning goals. I will generate an adaptive quiz to see what concepts you already know, and we will mark which Concepts you have mastered. I will then recommend new Content to you based on how others have recommended it. If you have any questions, your AI instructor will be available 24 hours a day. Just click the teacher chat icon in the lower right corner. Have fun!

The fictional dialog above may be in our near-term future. I can’t tell you how soon it will be before every school has an AI-powered learning chatbot. But it is coming. The content recommendation features of these systems are based on a well-known Knowledge Space management architecture being used today by hundreds of millions of students. These systems are integrating AI and recommendation engines directly into their learning management systems (LMS). Many of the students are in China, where there is a strong focus on using AI in education.

In this blog, I want to introduce you to the concept of Knowledge Spaces and how they use graph databases to build recommendation engines for learning. In the spirit of the “Data Structures + Algorithms = Programs” approach, we will first focus on the data structures and then discuss the recommendation algorithms for suggesting appropriate content for each student.

Knowledge Spaces for Education

First, let’s review our definition of “Knowledge Spaces.” The concept of a Knowledge Space originated in 1985 through the work of mathematical psychologists Jean-Paul Doignon and Jean-Claude Falmagne, who wrote the landmark paper “Spaces for the Assessment of Knowledge.” They came up with the idea of breaking large and complex topics into fine-grained concepts and then connecting them in a dependency graph, which they called a Knowledge Space. Many of us now call this simply a Concept Graph. In their original terminology, each student lived in their own Knowledge Space that stored where they were in the mastery of various concepts. The job of the learning management system is to suggest appropriate learning content for the student and to monitor which types of content work and which types don’t work for each learner.

The example Knowledge Spaces I will show in this blog will be tiny. In practice, each course might be broken up into tens of thousands of concepts. It takes considerable effort to build concept graphs. However, once we create a concept graph for a course, it could be reused in many educational settings. The formats for storing and connecting these concept graphs are another interesting topic we might discuss in future blogs.

Concept Dependency Graphs

The critical fact about concepts in any LMS is that they are almost always dependent on other concepts. You can only say you have mastered the concept of arithmetic after you have learned about addition, subtraction, multiplication, and division. So to learn most concepts, you must first have an understanding of the concepts they depend on. A graph that records these relationships is called a dependency graph.

A concept dependency graph. To say that we understand arithmetic, we first have to learn its dependent concepts of addition, subtraction, multiplication, and division.

The connections in the diagram above are not directed. However, to be precise, we might create two directed links, one in each direction, for each pair of connected concepts. One might be called depends-on, and the other might be called enables. We could even associate a weight with these connections to indicate how vital mastery of one concept is to another concept.
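As a rough sketch of this data structure, a concept graph could be held in memory as weighted, directed depends-on links, with the enables links stored as their mirror image. The class name, method names, and weight values below are assumptions made for illustration; a production system would keep this in a graph database rather than in Python dictionaries.

```python
from collections import defaultdict


class ConceptGraph:
    """Toy concept dependency graph with weighted, directed links."""

    def __init__(self):
        # depends_on[a][b] = weight: concept a depends on concept b
        self.depends_on = defaultdict(dict)
        # enables[b][a] = weight: mastering concept b helps enable concept a
        self.enables = defaultdict(dict)

    def add_dependency(self, concept, prerequisite, weight=1.0):
        """Store both directions of one dependency in a single call."""
        self.depends_on[concept][prerequisite] = weight
        self.enables[prerequisite][concept] = weight

    def prerequisites(self, concept):
        """Concepts that `concept` depends on, in alphabetical order."""
        return sorted(self.depends_on.get(concept, {}))

    def enabled_by(self, concept):
        """Concepts that `concept` helps enable, in alphabetical order."""
        return sorted(self.enables.get(concept, {}))


graph = ConceptGraph()
for basic in ["addition", "subtraction", "multiplication", "division"]:
    # A weight of 1.0 is a made-up value meaning "fully required".
    graph.add_dependency("arithmetic", basic, weight=1.0)

print(graph.prerequisites("arithmetic"))
print(graph.enabled_by("addition"))
```

Storing both directions costs a little extra memory, but it lets us ask "what do I need first?" and "what does this unlock?" without scanning the whole graph.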

Personalized Knowledge Spaces

Now that we know that each course can be represented as a directed graph of concepts, we want to indicate where each student is in their learning progress. To analyze a student’s knowledge, the LMS can allow a student to show their familiarity with any concept. Just asking students if they understand a concept and allowing them to answer yes or no is called a self-assessment. Alternatively, the LMS can store a list of questions that the student should answer correctly. Correct answers will indicate mastery of a concept. We can then create a personal knowledge space for each student. The concepts the student knows can be marked with a number to indicate their mastery level (0 to 1.0). To keep this simple, we will color the mastered concepts green. Concepts that are enabled by the green concepts but are not yet mastered we can color yellow. Concepts that depend on concepts the student has not yet mastered can be marked red.

We can then build a three-color dependency graph like the one below.

A personalized knowledge space for an individual student. The concepts that are already mastered are colored green. The concepts that the student should focus on are labeled in yellow. The concepts that depend on any concept that has not been mastered are indicated in red.
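The three-color scheme can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: the function name is invented, a concept counts as mastered once its score reaches a threshold of 0.8, and a concept with no prerequisites that is not yet mastered is treated as yellow.

```python
def color_concepts(depends_on, mastery, threshold=0.8):
    """Return {concept: "green" | "yellow" | "red"} for one student.

    depends_on: {concept: [prerequisite, ...]}
    mastery:    {concept: score between 0.0 and 1.0}
    threshold:  assumed cutoff for calling a concept mastered
    """
    # Collect every concept that appears anywhere in the graph.
    concepts = set(depends_on)
    for prereqs in depends_on.values():
        concepts.update(prereqs)

    def mastered(concept):
        return mastery.get(concept, 0.0) >= threshold

    colors = {}
    for concept in concepts:
        if mastered(concept):
            colors[concept] = "green"
        elif all(mastered(p) for p in depends_on.get(concept, [])):
            colors[concept] = "yellow"  # every prerequisite is mastered
        else:
            colors[concept] = "red"     # at least one prerequisite is missing
    return colors


depends_on = {
    "multiplication": ["addition"],
    "division": ["multiplication", "subtraction"],
    "arithmetic": ["addition", "subtraction", "multiplication", "division"],
}
mastery = {"addition": 1.0, "subtraction": 0.9, "multiplication": 0.4}

colors = color_concepts(depends_on, mastery)
ready_to_learn = sorted(c for c, color in colors.items() if color == "yellow")
print(ready_to_learn)
```

With these made-up scores, multiplication is the only yellow concept, so it is the one the student should work on next.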

We can now see that the job of suggesting what to learn is much easier. We generate a report of the concepts in yellow from the graph. But there is one problem. We can’t only recommend a set of nodes in a graph. We need to recommend learning content related to these concepts.

The middle zone, labeled the Zone of Learning, is a well-studied area in education. It is technically called the Zone of Proximal Development (ZPD). In the past, it has been difficult to accurately represent this area in computer systems for millions of students. Graph databases that scale across multiple computers have made this problem tractable. AI systems can now leverage these databases.

Adding Content to a Concept Graph

Although the words “concept” and “content” are similar, they are entirely different things. Concepts are just simple data structures in a graph. Content is an entity that has been created by one or more people and is used to teach concepts. Content can be things like web pages, Wikipedia articles, blog posts, books, chapters, a section of a chapter, sample code, GitHub repositories, videos, or even interactive animations. Content can also be items created by fellow students in your class. Each concept node in our graph can link to many different types of content. Your learning style may be very different from that of other students, so you may prefer videos to textbooks. The LMS will need to monitor your progress with different types of content and change its recommendations based on your prior success.

We can now take the next step by adding a second “layer” of information to our graph. These are nodes that hold pointers to the various content associated with each concept. An example of linking concepts to content is shown below:

Each concept may link to many types of content. The content may have many different forms. It may be books, Wikipedia pages, videos, labs, mentors, or even other students. Because the content nodes are of a different type, we draw them as blue squares. Concepts remain green circles.

There is one other type of content that might be helpful. That is content that describes other content. You can think of this as meta-content. For example, a book might have a star rating system (1 to 5 stars) created by readers or a list of recommendations by other people. Sometimes meta-content is conditional. For example, it might suggest that you need to have a specific math background to be able to understand an article or book. You can visualize these recommendations as additional properties or additional links between content. The number of “claps” for this blog is an excellent example of meta-content. Meta-content is vital for us to recommend the right material for the right student.
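To make the content layer and its meta-content concrete, here is a small sketch. The field names, the sample titles, and the ratings are all invented for illustration. Meta-content is modeled as two simple properties on each content node: a star rating and a list of concepts the reader is assumed to need first.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    title: str
    kind: str        # e.g. "video", "book", "lab"
    teaches: list    # names of the concepts this content is linked to
    stars: float = 0.0                            # meta-content: average rating, 1 to 5
    requires: list = field(default_factory=list)  # meta-content: assumed background


def content_for(concept, catalog, mastered):
    """Content linked to `concept` whose required background is mastered, best rated first."""
    usable = [
        item for item in catalog
        if concept in item.teaches and set(item.requires) <= set(mastered)
    ]
    return sorted(usable, key=lambda item: item.stars, reverse=True)


# A made-up catalog with three pieces of content for one concept.
catalog = [
    Content("Times tables video", "video", ["multiplication"], stars=4.5),
    Content("Multiplication chapter", "book", ["multiplication"], stars=3.8),
    Content("Multiplying fractions lab", "lab", ["multiplication"], stars=4.9,
            requires=["fractions"]),
]

picks = content_for("multiplication", catalog, mastered={"addition"})
print([item.title for item in picks])
```

The lab has the best rating, but it is filtered out because this student has not mastered fractions yet. That is the conditional meta-content at work.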

We now have all the data structures in place to start to recommend the appropriate content for each student based on their current knowledge. Now let’s leave the knowledge representation world and dive into the world of recommendation algorithms.

Recommendation Algorithms for Learning Content

At its core, our first-generation AI teaching assistant is a simple recommendation engine. It suggests the right content to help you reach your learning objectives. After the graph of concepts, content, meta-content, and all their relationships is loaded into the learning graph, we can use a standard recommendation engine that works with graph data.

The type of recommendation algorithms we might use comes from a class of algorithms called Collaborative Filtering algorithms. Collaborative filtering is a way of predicting what learning content is appropriate for you based on what others like you have preferred. Collaborative filtering works best when we have a lot of data about what prior students liked and didn’t like.

Here is a very high-level outline of the recommendation algorithm:

For a given student’s learning objective, create a list of all students with a similar background and similar learning objectives.
From this collection of students, find out what content they liked most.
Return the content that has the highest ratings. We sort the items by rating score, and the user can page through the results, much like search engine results.
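The three steps above can be sketched as a naive in-memory version. The student names, titles, and ratings are made up, and "similar" is simplified to "shares at least one learning objective". A real engine would also compare backgrounds and would run against the graph database.

```python
from collections import defaultdict


def recommend(target, goals, ratings, top_n=3):
    """Naive collaborative filtering over plain dictionaries.

    goals:   {student: set of learning objectives}
    ratings: {student: {content title: rating from 1 to 5}}
    """
    # Step 1: students who share at least one learning objective with the target.
    peers = [s for s in goals if s != target and goals[s] & goals[target]]

    # Step 2: gather the peers' ratings for content the target has not rated yet.
    seen = set(ratings.get(target, {}))
    scores = defaultdict(list)
    for peer in peers:
        for title, stars in ratings.get(peer, {}).items():
            if title not in seen:
                scores[title].append(stars)

    # Step 3: rank by average rating, highest first (ties broken by title).
    averages = {title: sum(v) / len(v) for title, v in scores.items()}
    ranked = sorted(averages.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


goals = {
    "ana": {"fractions"},
    "ben": {"fractions", "decimals"},
    "cy": {"fractions"},
    "dee": {"geometry"},
}
ratings = {
    "ana": {"Fractions intro": 4},
    "ben": {"Fractions intro": 5, "Pizza fractions video": 5, "Fraction drills": 3},
    "cy": {"Pizza fractions video": 4, "Fraction drills": 2},
    "dee": {"Triangles": 5},
}

print(recommend("ana", goals, ratings))
```

Here dee is ignored because she shares no objective with ana, and the content ana has already rated is left out of the results.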

There are a few things to note. The more data we have, the better the prediction. Schools that have lots of historical recommendations will have much better results than a school that is just starting up.

Finding similar students can also be a compute-intensive problem. When I first started building recommendation engines in the 1990s, finding similar customers in a product recommendation system could only be calculated overnight using relational database SQL queries and batch processing systems. The recommendations would then be stored in lists that would be displayed the next day for users to view.

Today the process of finding similar items from extensive collections is usually done using a graph database and graph traversal algorithms. If you had a million students, you could use an algorithm called Cosine Similarity to find the most similar students in a few seconds, provided you have enough processor cores in your system.
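For readers who have not seen it, cosine similarity compares two vectors by the angle between them: the dot product divided by the product of the two vector lengths. In this sketch I assume each student is summarized as a list of mastery scores, one per concept, which is my own simplification.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a student with no scores is similar to nobody
    return dot / (norm_a * norm_b)


# Mastery scores for (addition, subtraction, multiplication); values are made up.
ana = [1.0, 0.9, 0.4]
ben = [0.9, 1.0, 0.5]
cy = [0.0, 0.1, 1.0]

print(cosine_similarity(ana, ben))
print(cosine_similarity(ana, cy))
```

Scores close to 1.0 mean two students have a similar mastery profile, so ana should come out much closer to ben than to cy.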

Cosine similarity algorithms are in a class of algorithms called Embarrassingly Parallel algorithms. Once you have a summary of the critical elements of the student you want to match, called the target entity, you hand this summary off to a cluster of thousands of independent servers. Each server will compare your target with the students stored on that server. Each server will then return its top 100 or so student match scores, and the results can be merged, ranked, and returned to the user.
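The scatter and merge pattern can be imitated on one machine. In this sketch a thread pool stands in for the cluster and each dictionary stands in for the students held on one server. The function names and the sample vectors are assumptions for illustration only.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_matches_on_shard(target, shard, k):
    """What one server would do: score its own students and keep its k best."""
    scored = ((cosine(target, vector), name) for name, vector in shard.items())
    return heapq.nlargest(k, scored)


def top_matches(target, shards, k=100):
    """Send the target to every shard, then merge and rank the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda shard: top_matches_on_shard(target, shard, k), shards)
        merged = heapq.nlargest(k, (pair for part in partials for pair in part))
    return [(name, score) for score, name in merged]


# Two made-up "servers", each holding the mastery vectors of two students.
shards = [
    {"ben": [1.0, 0.0, 1.0], "cy": [0.0, 1.0, 0.0]},
    {"dee": [1.0, 1.0, 1.0], "eli": [1.0, 0.0, 0.9]},
]
target = [1.0, 0.0, 1.0]

best = top_matches(target, shards, k=2)
print([name for name, score in best])
```

Because each shard only needs the target vector and its own students, no shard has to talk to any other shard. That independence is what makes the problem embarrassingly parallel.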

The good news is that similarity algorithms are well understood. In the future, we will have custom hardware built on FPGAs or custom silicon to make these calculations fast and affordable — even for high school IT department budgets! This is the type of research going on at labs around the world today.

Not Just What to Learn But How to Learn

An excellent AI-driven instructor will not just know what content is relevant; it will also advise you on how you should learn. For example, studies show that using a mentor to help you with your educational goals can dramatically increase your chances of achieving them. Having supportive peers who are working on similar problems has also been statistically shown to improve your odds of success. A good AI-powered LMS should also take into account whether you learn better from text, images, or video presentations of the material.

Unanswered Questions for AI-Driven LMS

So hopefully, you now have a clear image of how to build an AI-driven chatbot that will use your personal Knowledge Space to recommend the appropriate content. Here are a few questions that come to mind when we think about how schools will build these systems:

What learning recommendations can be public? What should be private?
Should the identity of a student who wrote a negative recommendation be kept anonymous?
Should average test scores of specific courses be public?
What is the role of AI in the automated assessment process? Can assessments be shared between schools? Can they be anonymized?
Will the recommendations be biased? If a student has similar demographics to an underrepresented group, will your recommendations be less ambitious?

In summary, I think that most LMS products today are only beginning to incorporate predictive AI features into their software. Many LMS vendors are being held back by their dependency on relational databases. And if you are a vendor trying to educate your customers on your products, you may need to think about leveraging modern graph databases in your development tools to stay competitive.