Predictive Graph Models for the Success of Coding Clubs

Aug 23, 2020

A graph model of a coding club that can be used to discuss strategic priorities for club sustainability.

All models are wrong. Some models are useful.
— George Box
The key to artificial intelligence has always been the representation.
— Jeff Hawkins

I have been working with various coding clubs such as CoderDojos for almost eight years. Some of these clubs have been successful and have hosted weekly meetings for many years. But many come and go within a few years, months, or even weeks. They are often dependent on a single parent’s leadership. When that parent’s children graduate from the school, the club is left without an organizer and disappears.

The COVID pandemic has put additional strain on clubs that lack the leadership to transition to an online format. When I am asked to help online clubs come up with a strategy, I have a pretty good mental model of what they need to do to survive. This post will help the reader understand these models. I hope they can be useful for creating sustainable coding clubs, both online and in person.

In my day job, I work with many smart data scientists who spend a lot of time thinking about how to build predictive models. Predictive models usually return a probability score, such as a number between 0.0 and 1.0, that reflects the likelihood that some event will occur. We can use the same techniques to build a model that predicts whether a coding club will be sustainable. We can also build models to predict whether individual students will reach their programming goals and whether coding club service providers will develop services that meet the needs of their clubs. A recent trend is to go beyond capturing the features of a single entity, such as a person or a club, and to build a graph of the interactions between the critical entities in an organization.

Although these models raise many complex systems-thinking challenges, I think we can start with simple models and grow them in complexity as a team becomes better able to understand them. These models are also a great way to structure discussions about a club’s priorities for strategic planning. They are also an excellent topic for people interested in complex adaptive systems and the future of online coding clubs.

Model 1: A Simple Predictive Model

Let’s start with the simplest model: learning without a mentor. This model consists only of students and learning content.

For our modeling purposes, content is anything students use to learn: books, web pages, Wikipedia articles, videos, tutorials, sample programs, and so on. It is anything we can put in front of a student to help them learn.

In this model, there are no facilitators: no mentors, no teachers, no intelligent agents recommending content, no peers, and no coding clubs. It is just a student and content. What percentage of students can learn to code without these facilitators? My guess is under 10%. Every once in a while, I run into people who taught themselves to code by using a search engine to find an online tutorial and took off from there without any assistance. It does happen, but it is rare. My guess is that these students were heavily influenced by friends and family who showed them that coding is fun and gives them great power to control the world around them.

Most kids can’t do this on their own because learning to code can be frustrating. If a student runs into a simple bug and doesn’t have the background to find and fix it, they become frustrated. Frustrated students move on to subjects that offer more short-term rewards.

There is also an underlying assumption that the student has a specific learning goal and that the content is about reaching that goal. We will omit that relationship for now to keep our model simple.

Mathematically, the probability of success function would be something like this:

One of the simplest formulas for calculating the probability that a student would learn coding just with access to content. The probability of success only uses features of motivation and access to high-quality content.
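As a hedged sketch (the symbols and the logistic form are assumptions, not the author’s original formula), the probability could be written as a weighted sum of the two features passed through a logistic function, which keeps the result between 0.0 and 1.0:

```latex
P(\text{success}) = \sigma\left(w_0 + w_m\, m + w_c\, c\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

Here m is the student’s motivation and c is the quality of, and access to, the learning content, both scored from 0.0 to 1.0; w_m and w_c are their weights and w_0 is a baseline (bias) term. A strongly negative w_0 is one way to encode the expectation that few students succeed with content alone.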

In reality, there are other factors, such as whether the student has a working computer, a quiet place to learn, and supportive family and friends. There are also dozens of attributes of learning content that we could model, such as its quality, the adaptability of online learning materials, interactivity, and fit with the student’s learning goals. But for now, let’s keep things simple and add another dimension to our model: the mentor.

Model 2: Adding the Mentor

Next, let’s add one level of complexity: a mentor.

Adding a mentor to the model. We can also model the properties of the relationships between the mentor and the student and the mentor’s knowledge of the learning content.

Studies have shown that adding about one hour of mentoring per week has a significant positive impact on the probability of a student reaching their coding goals. Mentors help students fix bugs, learn debugging techniques, and stay on track. They can also tell stories about their own journey from novice programmer through apprentice to full-time software developer. These stories help students form a clear mental model of a potential career path.

The factors that determine the effectiveness of a mentor can be broken into three parts:

The relationship between the mentor and the student. This relationship depends on the mentor’s listening skills, how well they communicate, and whether they have a good mental model of the concepts the student knows, the concepts they don’t know, and the concepts they are ready to learn. See my article on Knowledge Spaces for more on this topic.
The relationship between the mentor and the content. Does the mentor know the training content? Do they know the computer language being taught? Are they familiar with the typical challenges students have when learning this content?
The inherent motivation and organizational skills of the mentor. Do they enjoy mentoring? Are they organized enough to prep for the mentoring session? Do they know how to manage Zoom for virtual mentoring sessions?
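To make the three parts concrete, here is a minimal Python sketch, assuming each part is rated through a few sub-questions like the ones above on a 0.0 to 1.0 scale, and that the three parts are combined with fixed weights. The function names and weight values are hypothetical; in a real model the weights would be learned from data.

```python
from statistics import mean


def part_score(ratings: list[float]) -> float:
    """Average the sub-question ratings for one part, each on a 0.0-1.0 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 0.0 <= r <= 1.0 for r in ratings):
        raise ValueError("ratings must be between 0.0 and 1.0")
    return mean(ratings)


# Hypothetical weights for the three parts; they sum to 1 so the result stays in [0, 1].
PART_WEIGHTS = {"student": 0.4, "content": 0.35, "motivation": 0.25}


def mentor_effectiveness(ratings: dict[str, list[float]]) -> float:
    """Weighted average of the three part scores."""
    return sum(w * part_score(ratings[part]) for part, w in PART_WEIGHTS.items())


example = {
    "student": [0.9, 0.8, 0.7],     # listening, communication, model of the student's knowledge
    "content": [1.0, 0.6, 0.5],     # knows the content, the language, typical student challenges
    "motivation": [0.9, 0.7, 0.4],  # enjoys mentoring, preparation, managing Zoom
}
print(round(mentor_effectiveness(example), 3))  # 0.732
```

A weighted average is the simplest combination that keeps the score on the same 0.0 to 1.0 scale as its inputs, which makes it easy to feed into a larger model later.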

The predictive equation now gets a little more complicated.

Probability of student success model when a mentor factor is added. The weight is determined by how much the students need a mentor to stay on track.
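Under the same kind of assumptions (a logistic form and illustrative symbols, not the author’s original equation), adding a mentor amounts to adding one more weighted term:

```latex
P(\text{success}) = \sigma\left(w_0 + w_m\, m + w_c\, c + w_t\, t\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

Here m is the student’s motivation, c is content quality and access, w_0 is a baseline term, and t is a mentor-effectiveness score in [0, 1] built from the three parts listed earlier: the mentor-student relationship, the mentor-content relationship, and the mentor’s own motivation and organization. The weight w_t plays the role described in the caption: the more students need a mentor to stay on track, the larger w_t.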

In practice, each relationship score is built from many small features that can be measured with assessments, and each feature is multiplied by a weight that can be learned with machine learning. The process of calculating these weights is beyond the scope of this article. For now, it is sufficient to know that these factors can be measured and that a model of student success can be derived from them. The accuracy of the predictions will depend on how accurate the assessments are and how much training data we have to build the model. To be useful, typical models will need at least tens of thousands of assessments that tie inputs to measured outcomes of student success.
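The details of weight learning are out of scope here, but a minimal sketch shows what learning weights from assessments can look like. The example fits a logistic regression with plain batch gradient descent; the choice of logistic regression, the three features (motivation, content, mentor), and the six training rows are all hypothetical, and far too small for a real model.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(bias: float, weights: list[float], x) -> float:
    """Probability of success for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit a bias and one weight per feature with batch gradient descent on log loss."""
    n = len(rows)
    bias, weights = 0.0, [0.0] * len(rows[0])
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict(bias, weights, x) - y  # gradient of log loss w.r.t. the logit
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


# Hypothetical assessments: (motivation, content, mentor) -> reached goal (1) or not (0).
rows = [
    (0.9, 0.8, 0.9), (0.9, 0.8, 0.1),
    (0.4, 0.7, 0.8), (0.4, 0.7, 0.2),
    (0.6, 0.3, 0.7), (0.6, 0.3, 0.3),
]
labels = [1, 0, 1, 0, 1, 0]

bias, weights = train_logistic(rows, labels)
print("learned weights:", [round(w, 2) for w in weights])
print("with mentor:   ", round(predict(bias, weights, (0.9, 0.8, 0.9)), 2))
print("without mentor:", round(predict(bias, weights, (0.9, 0.8, 0.1)), 2))
```

In this toy data the mentor score is the only feature that distinguishes successful from unsuccessful students, so its learned weight comes out positive; the motivation and content weights carry little meaning here. The two predictions differ only in the mentor feature.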

Model 3: Adding a Coding Club

Mentors are not the only facilitators. Others include surrounding students with peers who are trying to reach similar goals and using techniques like project-based learning and social constructivism to help students learn skills like working in groups, team problem solving, and shared task management. Many of these skills can only be learned by working in groups. So now, let’s introduce the coding club to our model.

Adding a coding club to our predictive model: how social interactions facilitate learning.

On the left of the data model, we have added a new element: the Coding Club. Coding clubs provide a social context for students who want to learn how to code. In a physical classroom, the connections between a student and their classmates are clear and visible. Students can see the other students working and staying focused on their learning tasks. In project-based learning, students work in teams and learn how others write and debug code. Students are usually careful observers of others’ behavior, and they learn quickly by observing the more senior students on a project.

A word of caution: when we move to virtual classrooms, we often lose this classroom dynamic. Student-to-student observation is difficult to reproduce on video calls, so we need new methods to replace these interactions and facilitate learning.

Note that the Coding Club also influences the mentor. Coding clubs give mentors opportunities to watch one another. Mentors learn from other mentors what skills they need, which content is useful, and which techniques work for explaining difficult concepts. They learn to use small stories and metaphors to teach specific concepts.

At this point, it might not be useful to show the exact equation for predicting a student’s success in terms of the individual features of the model. It is sufficient to say that we will take the features of each student, together with those of their coding club, mentor, and content, and put them in a graph data structure. We will then train a model to calculate the relative weight of each feature and combine the weighted features into a final probability score.

We are moving to a more abstract version of our model that predicts whether a coding student will reach their educational objectives in a mentor-assisted coding club.
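The graph-plus-weights pipeline described earlier in this section can be sketched in a few lines of Python. This is a minimal sketch, assuming a tiny hand-built graph with one student, one mentor, one content item, and one club, where every node and edge carries features scored from 0.0 to 1.0. All node names, feature names, weights, and the bias are hypothetical placeholders for values a trained model would supply, and the logistic function is one common way, not the only way, to turn the weighted sum into a probability.

```python
import math

# Hypothetical graph: node id -> (node type, features scored 0.0-1.0).
NODES = {
    "student:ana": ("student", {"motivation": 0.8}),
    "mentor:raj": ("mentor", {"organization": 0.7}),
    "content:python-101": ("content", {"quality": 0.9}),
    "club:library": ("club", {"leadership": 0.6}),
}

# Undirected edges carrying relationship features.
EDGES = [
    ("student:ana", "mentor:raj", {"rapport": 0.9}),
    ("student:ana", "content:python-101", {"fit": 0.7}),
    ("mentor:raj", "content:python-101", {"familiarity": 0.8}),
    ("student:ana", "club:library", {"peer_support": 0.5}),
    ("mentor:raj", "club:library", {"peer_mentoring": 0.6}),
]

# Placeholder weights; a trained model would learn these. Unlisted features get weight 0.
WEIGHTS = {
    "student.motivation": 1.0, "mentor.organization": 0.5, "content.quality": 0.5,
    "club.leadership": 1.0, "edge.rapport": 1.0, "edge.fit": 0.5,
    "edge.familiarity": 0.5, "edge.peer_support": 0.5, "edge.peer_mentoring": 0.5,
}
BIAS = -3.0


def student_features(student_id: str) -> dict[str, float]:
    """Flatten the features of a student's one-hop neighborhood into one dict.

    Assumes at most one node of each type in the neighborhood, so keys do not collide.
    """
    neighborhood = {student_id}
    for a, b, _ in EDGES:
        if student_id in (a, b):
            neighborhood.update((a, b))
    feats = {}
    for node_id in neighborhood:
        kind, node_feats = NODES[node_id]
        feats.update({f"{kind}.{name}": value for name, value in node_feats.items()})
    # Keep every edge inside the neighborhood, including mentor-content and mentor-club links.
    for a, b, edge_feats in EDGES:
        if a in neighborhood and b in neighborhood:
            feats.update({f"edge.{name}": value for name, value in edge_feats.items()})
    return feats


def success_probability(student_id: str) -> float:
    """Weighted sum of neighborhood features, squashed into (0, 1) by a logistic function."""
    feats = student_features(student_id)
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


print(round(success_probability("student:ana"), 3))  # 0.802
```

Including the mentor-content and mentor-club edges, not just the student’s own edges, is what lets the club’s influence on the mentor reach the student’s score.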

Model 4: Building a Detailed Model of a Coding Club

After working with coding clubs for many years, I carry in my head a long list of attributes that we could gather for each club. Some of them are shown in the next model:

A Coding Club Detailed Model: attributes of the coding club include its leadership, its facilities such as rooms and buildings, its hardware such as Arduino kits and robots, its way of matching students to coding groups and mentors, and its ability to recruit and retain mentors.

From the figure caption, you can see descriptions of each of the blue connectors to the coding club. In an ideal world, the success of a coding club would be highly correlated with the success of its students. Students with high achievements would refer others to the club. The same should hold for mentors: mentors who have a great experience would tell others about it. Both groups would be very likely to recommend the club, and the club would then get a high Net Promoter Score. The Net Promoter Score is the most critical measure of how word of mouth spreads when a coding club does things right.
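The Net Promoter Score itself is simple arithmetic. In its standard form, respondents rate how likely they are to recommend something on a 0 to 10 scale; 9s and 10s count as promoters, 0 through 6 as detractors, and the score is the percentage of promoters minus the percentage of detractors, so it ranges from -100 to 100. A minimal sketch, with hypothetical mentor survey answers:

```python
def net_promoter_score(ratings: list[int]) -> float:
    """Percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be on the 0-10 scale")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical answers from eight mentors asked how likely they are to recommend the club.
mentor_ratings = [10, 9, 9, 8, 7, 6, 10, 3]
print(net_promoter_score(mentor_ratings))  # 25.0
```

Passives (7s and 8s) count toward the total but toward neither group, which is why a club full of merely satisfied mentors still scores near zero.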

There are also many sub-features of leadership:

Are there active marketing programs to recruit new mentors?
Is there staff in place to train mentors as they are getting started?
Are there programs in place to contact mentors and encourage them to return?
Is there funding to purchase computers and robotics equipment?
Do leaders assign management tasks to staff with appropriate skills?
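One simple way to turn these questions into a number is a yes/no checklist. The sketch below assumes every question counts equally, which is an illustrative simplification rather than a measured weighting; it returns both the fraction answered yes and the gaps a club might prioritize.

```python
LEADERSHIP_QUESTIONS = (
    "Active marketing programs to recruit new mentors",
    "Staff in place to train new mentors",
    "Programs to contact mentors and encourage them to return",
    "Funding to purchase computers and robotics equipment",
    "Management tasks assigned to staff with appropriate skills",
)


def leadership_score(answers: dict[str, bool]) -> tuple[float, list[str]]:
    """Return the fraction of checklist items answered yes, plus the items answered no."""
    unanswered = [q for q in LEADERSHIP_QUESTIONS if q not in answers]
    if unanswered:
        raise ValueError(f"unanswered questions: {unanswered}")
    gaps = [q for q in LEADERSHIP_QUESTIONS if not answers[q]]
    met = len(LEADERSHIP_QUESTIONS) - len(gaps)
    return met / len(LEADERSHIP_QUESTIONS), gaps


# Hypothetical club: recruiting, funding, and delegation in place; mentor training and follow-up missing.
answers = dict.fromkeys(LEADERSHIP_QUESTIONS, True)
answers["Staff in place to train new mentors"] = False
answers["Programs to contact mentors and encourage them to return"] = False
score, gaps = leadership_score(answers)
print(score)  # 0.6
print(gaps)
```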

Model 5: The Coding Club Service Provider

Many coding clubs are part of larger organizations. Some clubs adopt the guidelines of an international organization like the CoderDojo Foundation so they can benefit from its recognized brand. They are, in effect, franchisees of a larger nonprofit organization. These clubs can call themselves local chapters of an international organization if they commit to a consistent level of quality, such as:

required background checks for mentors
no fees charged to students
and a maximum student-to-mentor ratio of 3 to 1
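Requirements like these are concrete enough to check automatically. The sketch below is hypothetical: the ClubSnapshot fields are assumptions, and it encodes only the three requirements listed here, not CoderDojo’s full guidelines.

```python
from dataclasses import dataclass

MAX_STUDENTS_PER_MENTOR = 3  # the 3-to-1 student-to-mentor ratio


@dataclass
class ClubSnapshot:
    """Hypothetical summary of one club session."""
    students: int
    mentors: int
    mentors_background_checked: int
    fee_per_student: float


def chapter_issues(club: ClubSnapshot) -> list[str]:
    """List which of the three chapter requirements the club currently misses."""
    issues = []
    if club.mentors_background_checked < club.mentors:
        issues.append("some mentors lack background checks")
    if club.fee_per_student > 0:
        issues.append("students are charged a fee")
    # Integer comparison avoids dividing by zero when a session has no mentors.
    if club.students > MAX_STUDENTS_PER_MENTOR * club.mentors:
        issues.append("student-to-mentor ratio exceeds 3 to 1")
    return issues


print(chapter_issues(ClubSnapshot(students=14, mentors=4, mentors_background_checked=4, fee_per_student=0.0)))
```

Here 14 students with 4 mentors exceeds the 12 students that ratio allows, so only the ratio issue is reported.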

CoderDojo also provides training and best-practice guidelines for its clubs.

Coding clubs may also be part of local organizations such as a company, a college, or a school. These clubs use the organization’s facilities at little or no cost because the club’s mission is aligned with the organization’s mission.

Coding clubs may also be supported by local organizations whose mission is to support coding clubs in a community. These organizations are often funded by groups that want to support STEM education in their region or city. We call these organizations Coding Club Sponsors or Coding Club Service Providers.

Example of a coding club service provider. These groups sponsor coding clubs for different organizations with different missions and different areas of focus. They provide best practices, accounting services, content development, and startup services for various coding clubs.

In this model, we can see that Coding Club Service Providers help the coding club with services that are common to many clubs. They may also provide best practices, financial services, mentor background check services, access to mentors, laptop stickers, badges, and t-shirts, as well as startup services for new clubs. Because these service providers work with many coding clubs, they see recurring patterns when clubs run into problems, and they look for long-term, sustainable solutions for their clubs.

Now you might ask: can we really build these predictive models? The answer comes down to getting accurate training data. Right now, no organization is tracking the success and sustainability of coding clubs, so we have no way of knowing the best weights for these predictive models. What I can say is that, in my anecdotal experience, local club leadership is the most important factor in these models. It would carry a high weight when predicting the sustainability of most coding clubs. As we gather more data, we will have better ways of estimating the relative weights in the probability equations.