Understanding Graph Embeddings

Figure: a sample of customer data in a knowledge graph and the embedding vector attached to it.

Mowgli’s Walk

Figure: the context for our story about Mowgli’s walk.
Mowgli sees a tiger on the path. What should he do? Run back to the village or proceed down the path?

What Are Graph Embeddings?

  1. Graph embeddings are compact data structures that allow fast comparison of similar items (see the sketch after this list). Embeddings that are too large take more RAM and more time to compare, so here smaller is often better.
  2. Graph embeddings compress the many complex features and structures of the data around a vertex in our graph, including the attributes of the vertex itself and the attributes of the edges and vertices around it. The data around a vertex is called the “context window,” which we will discuss later.
  3. Graph embeddings are calculated using machine learning algorithms. As with other machine learning systems, the more training data we have, the better our embeddings will capture the uniqueness of each item.
  4. The process of creating a new embedding vector is called “encoding” or “encoding a vertex.” The process of regenerating a vertex from an embedding is called “decoding” or “generating a vertex.” The function that measures how well an embedding performs at tasks such as finding similar items is called the “loss function.”
  5. There may not be “semantics” or meaning associated with each number in an embedding. Embeddings can be thought of as a low-dimensional representation of an item in a vector space. Items that are near each other in this embedding space are considered similar to each other in the real world. Embeddings focus on performance, not explainability.
  6. Embeddings are ideal for “fuzzy” match problems. If you have hundreds or thousands of lines of complex if-then statements to build cohorts, graph embeddings provide a way to make this code much smaller and easier to maintain.
  7. Graph embeddings work with other graph algorithms. If you are doing clustering or classification, graph embeddings can be used as an additional tool to increase the performance and quality of these other algorithms.
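To make the comparison idea from item 1 concrete, here is a minimal sketch that treats two vertex embeddings as NumPy arrays and compares them with cosine similarity. The customer names and vector values are invented for illustration:

    import numpy as np

    # Two hypothetical 8-dimensional vertex embeddings (made-up values).
    alice = np.array([0.12, -0.45, 0.91, 0.03, -0.27, 0.66, -0.08, 0.30])
    bob = np.array([0.10, -0.40, 0.88, 0.01, -0.30, 0.70, -0.05, 0.28])

    def cosine_similarity(a, b):
        """Cosine similarity: near 1.0 means the vectors point the same way."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine_similarity(alice, bob))  # close to 1.0 -> similar customers

Because the comparison costs only a few multiplications and additions per dimension, shorter vectors compare faster, which is why smaller embeddings are often better.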

Nearness In Embedding Space

Given any two points on a map, we can write a formula for calculating the distance between them. The same formula generalizes directly to points in a higher-dimensional embedding space.
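As a rough sketch, the familiar two-dimensional distance formula works unchanged for points with any number of dimensions, including embedding vectors; the points below are made-up values:

    import math

    def euclidean_distance(p, q):
        """Straight-line distance between two points of any dimension."""
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

    # Works for 2-D map points...
    print(euclidean_distance((0, 0), (3, 4)))  # 5.0
    # ...and equally well for higher-dimensional embedding vectors.
    print(euclidean_distance((0.1, 0.9, 0.3), (0.2, 0.8, 0.4)))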

Word Embedding Analogies

Figure: examples of word embeddings for the concepts of royalty and gender.
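The classic example is vector arithmetic such as king - man + woman ≈ queen. Here is a toy sketch with invented two-dimensional vectors, one dimension standing in for royalty and one for gender; real word embeddings have hundreds of dimensions with learned, uninterpretable values:

    import numpy as np

    # Invented 2-D vectors: dimension 0 ~ "royalty", dimension 1 ~ "gender".
    king = np.array([0.95, 0.90])
    man = np.array([0.05, 0.90])
    woman = np.array([0.05, 0.10])
    queen = np.array([0.95, 0.10])

    result = king - man + woman
    print(result, np.allclose(result, queen))  # [0.95 0.1] True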

How Are Graph Embeddings Stored?

Figure: an illustration of a vertex embedding for a subgraph of a graph.
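Storage details vary by graph database, but a minimal sketch is to keep each embedding as an array-valued property on its vertex. Here the graph is modeled as a plain Python dictionary, and all ids, names, and values are invented:

    import numpy as np

    graph = {
        "customer:1001": {
            "name": "Alice",
            "embedding": np.array([0.12, -0.45, 0.91, 0.03], dtype=np.float32),
        },
        "customer:1002": {
            "name": "Bob",
            "embedding": np.array([0.10, -0.40, 0.88, 0.01], dtype=np.float32),
        },
    }

    print(graph["customer:1001"]["embedding"])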

Size of Embeddings

No Semantics with Each Value

Any Vertex Can Have an Embedding

Calculating the Context Window of an Embedding

Embeddings vs. Hand-coded Feature Engineering

Tradeoffs of Creating Embeddings

Homogeneous vs. Heterogeneous Graphs

How Enterprise Knowledge Graph Embeddings Are Calculated

  1. Graph convolutional neural networks (GCN)
  2. Random walks (see the sketch after this list)
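A random walk is the easier of the two to sketch. Below is a minimal, illustrative implementation of uniform random walks, the building block of algorithms such as DeepWalk and node2vec; the toy graph is invented:

    import random

    def random_walk(adjacency, start, length, seed=None):
        """Return one uniform random walk of up to `length` steps from `start`.

        adjacency: dict mapping each vertex to a list of its neighbors.
        """
        rng = random.Random(seed)
        walk = [start]
        for _ in range(length):
            neighbors = adjacency[walk[-1]]
            if not neighbors:  # dead end: stop the walk early
                break
            walk.append(rng.choice(neighbors))
        return walk

    # A tiny toy graph.
    adjacency = {
        "A": ["B", "C"],
        "B": ["A", "C"],
        "C": ["A", "B", "D"],
        "D": ["C"],
    }
    print(random_walk(adjacency, "A", 5, seed=42))

In DeepWalk-style pipelines, many such walks are collected per vertex and fed to a word2vec-style model, which treats each walk like a sentence of vertex “words” and learns an embedding for every vertex.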

Graph Convolutional Neural Networks (GCN)

Random Walk Algorithms

Conclusion
