My First Knowledge Graph

This tutorial is designed to give users their first introduction to the concept of a knowledge graph.

Goal

The goal of this tutorial is to introduce the concept of a knowledge graph and practice some simple Rel that will allow you to create your first knowledge graph.

What is a Knowledge Graph?

Knowledge graphs allow you to organize information and define relationships between pieces of data. Informational items can connect to other items if there is a relationship between them. Where no connection exists between two items, it means that we are not aware of a relationship between the items. A picture makes it easier to understand and visualize.

[Figure: simple graph diagram]

Imagine each numbered circle represents an item of information. This picture represents a special kind of graph, where each circle represents a “node” (what we’ve been referring to as an item of information) and each arrow represents an “edge” (the connection between two items of information). We’ll learn more about nodes and edges, but you can keep this diagram in mind as a mental model.

Rel

Rel is RAI’s declarative modeling language. It allows us to describe graphs like this in a straightforward way. For example, in the figure, the 1 node has an edge pointing to the 2 node and the 4 node, and the 3 node has an edge pointing to the 4 node. Here’s what these relationships can look like in Rel:

install
def node = {1; 2; 3; 4}

def edge = (1, 2)
def edge = (1, 4)
def edge = (3, 4)

Let’s take a moment to look at the syntax of the Rel we wrote. We started by defining the nodes we have in the graph in the relation node, represented by a list of values. Next, we’ll think about our edge expressions.

There is an edge between nodes 1 and 2, and between nodes 1 and 4. There is also an edge between nodes 3 and 4. Notice there is no expression stating def edge = (2, 3). The absence of that expression corresponds to the fact that there is no edge pointing from 2 to 3. When modeling the diagram with Rel, we express only the relationships that we see.

An interesting aspect of the code above is that we defined edge three times. In Rel, this means that the relation edge is the union of all its separate definitions: all three contribute tuples to the same relation. This stands in contrast to imperative languages such as Python or C, where a later assignment to the same name overwrites the earlier value rather than adding to it.
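For readers coming from an imperative language, the difference can be sketched in Python. This is only an analogy, assuming we model a Rel relation as a Python set of tuples; the variable name overwritten is hypothetical and is not part of the tutorial's model.

```python
# Analogy only: model the Rel relation `edge` as a Python set of tuples.
# Each Rel definition adds tuples to the same relation (set union).
edge = set()
edge |= {(1, 2)}   # def edge = (1, 2)
edge |= {(1, 4)}   # def edge = (1, 4)
edge |= {(3, 4)}   # def edge = (3, 4)

# Imperative reassignment, by contrast, keeps only the last value.
overwritten = {(1, 2)}
overwritten = {(1, 4)}
overwritten = {(3, 4)}

print(sorted(edge))         # [(1, 2), (1, 4), (3, 4)]
print(sorted(overwritten))  # [(3, 4)]
```

The union behaviour is what lets us spread the description of one relation across several statements.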

What we have done here is construct a very simple knowledge graph. We can now start asking questions of our data, which we refer to as queries. Let’s ask, for instance, about all edges (x, y) originating from node 1 (x=1). To do so, we define a relation called output that will show the results of our query:

query
def output(x, y) = edge(x, y) and x = 1

Relation: output

1  2
1  4

Our query returns two results: (1, 2) and (1, 4). In other words, our query tells us that node 1 connects to node 2, and node 1 connects to node 4. Is this correct? Look at the diagram. You can see it matches up with what we expect!

We can modify the query to test for other connections. For example, we get nothing back when we change x to 2, because 2 does not point to another node.

query
def output(x, y) = edge(x, y) and x = 2

Relation: output

Our first knowledge graph, then, demonstrates how we describe nodes and the relationships between them as edges. This gets more interesting when we start to associate nodes and edges with real-world data.
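The two queries above can be mimicked in Python with a set comprehension. This is a rough sketch, again assuming a relation is modeled as a set of tuples; the helper name edges_from is hypothetical.

```python
# Analogy only: the `edge` relation from the tutorial as a set of tuples.
edge = {(1, 2), (1, 4), (3, 4)}

def edges_from(source):
    """Return every (x, y) in edge whose first element equals source."""
    return {(x, y) for (x, y) in edge if x == source}

print(sorted(edges_from(1)))  # [(1, 2), (1, 4)]
print(sorted(edges_from(2)))  # []
```

As in Rel, a query with no matching tuples is not an error: it yields an empty result.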

Scenario

Let’s imagine that an airline wants to run an advertisement campaign showing that it is the best airline to choose for both winter and summer travel. Since the Olympics are well known for having winter and summer events, the airline decides to use Olympic athletes in its campaign. To begin, the airline wants to identify the best-known Olympic athletes: those who have participated in the greatest number of Olympic Games overall. Next, following the theme of its campaign, the airline wants to identify those athletes who have participated in both Winter and Summer Games.

The airline wants to script a humorous competition between a well-known Olympic athlete who has competed in either Summer or Winter Games and another athlete who has participated in both Winter and Summer Games. The campaign script shows one athlete labeling the other as “lazy” during one of the seasons, while they give it their all no matter the time of year. The pitch is that no matter the season, everyone knows the airline of choice.

Let’s explore how our airline can model this scenario with a knowledge graph, starting by defining nodes as entities and the relationships between the entities as edges.

Entities

In our simple diagram above, each bubble represented an entity. Obviously, an entity representing just a number isn’t very interesting. Instead, we want to associate real-world data with an entity. If we think about our advertising scenario, we can intuitively identify two types of entities: Olympic Games and athletes. As well as having a type, entities can also have properties like name, age, and other information associated with them.

Let’s summarize everything we just learned about entities:

Term | Description
Entities | Represented by nodes in a knowledge graph.
Entity Types | Define the kind of entity we’re dealing with. With our Olympics example, these are athletes and Olympic Games.
Entity Properties | Information associated with an entity. In the case of Olympic Games, this information would include the name of the game, the year of the event, the host city, and whether the game is summer or winter.

We should note here that we are using a simplified construction of entities in this tutorial. To learn more about how entities can be constructed in Rel, see our Entities Concept Guide.

Modeling Our Knowledge Graph in Rel

When we describe entities and the relationships between them, we are building a model of our data. As we do so, we’re describing a knowledge graph. The easiest way to start is by writing some things that we know.

Olympic Games

Let’s begin with entities. First, we’ll define an entity that describes Olympic Games. We’ll use the Rel module syntax to help us organize the facts we know about recent Summer and Winter Olympic Games. A module lets us specify an entity and its attributes within an identifiable block, such as game["Tokyo 2020"] which represents the Tokyo 2020 Olympic Games and contains all our information about it. We’ll learn more about modules in later tutorials, but for now let’s get used to organizing our code with them.

We begin by listing some recent Summer Games, including definitions of city, type (which, in this example, is always “Summer”), and year. Note that the Tokyo 2020 Games were held in 2021, so their year is 2021:

install
module game["Tokyo 2020"]
    def city = "Tokyo"
    def type = "Summer"
    def year = 2021
end

module game["Rio 2016"]
    def city = "Rio de Janeiro"
    def type = "Summer"
    def year = 2016
end

module game["London 2012"]
    def city = "London"
    def type = "Summer"
    def year = 2012
end

module game["Beijing 2008"]
    def city = "Beijing"
    def type = "Summer"
    def year = 2008
end

module game["Athens 2004"]
    def city = "Athens"
    def type = "Summer"
    def year = 2004
end

As you can see, a module is started with the keyword module and closed with the keyword end.

In our diagram, we can see these are the entities we have created to start building our knowledge graph.

[Figure: Olympics Knowledge Graph]

We can also visualize the entity properties we have defined in a similar way, where we treat the entity property as a special type of edge between the entity and its value.

[Figure: Olympics Knowledge Graph]

We will continue adding data as we go along. Next, we use the same syntax to model the two most recent Winter Games:

install
module game["PyeongChang 2018"]
    def city = "PyeongChang"
    def type = "Winter"
    def year = 2018
end

module game["Sochi 2014"]
    def city = "Sochi"
    def type = "Winter"
    def year = 2014
end

[Figure: Olympics Knowledge Graph]

So far, we have specified facts about recent Olympic Games, organized these as modules, and defined the game entity. Notice once again how we’ve reused game, just as we defined edge multiple times in our simple bubble example. That is, all of the definitions of game are unioned. We’ve specified a collection of game entities, identified by a name such as “Tokyo 2020” or “Athens 2004”.
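One way to picture what the modules contribute is as (entity, attribute, value) triples that are unioned into a single game relation, which matches the way the tutorial later queries game(g1, :type, "Winter"). The Python below is only a sketch of that mental model, not a description of how Rel stores data; the variable names are hypothetical and only three of the Games are shown.

```python
# Analogy only: each module contributes (entity, attribute, value) triples.
tokyo = {
    ("Tokyo 2020", "city", "Tokyo"),
    ("Tokyo 2020", "type", "Summer"),
    ("Tokyo 2020", "year", 2021),
}
pyeongchang = {
    ("PyeongChang 2018", "city", "PyeongChang"),
    ("PyeongChang 2018", "type", "Winter"),
    ("PyeongChang 2018", "year", 2018),
}
sochi = {
    ("Sochi 2014", "city", "Sochi"),
    ("Sochi 2014", "type", "Winter"),
    ("Sochi 2014", "year", 2014),
}

# All definitions of `game` are unioned into one relation.
game = tokyo | pyeongchang | sochi

# Look up the Games whose type is "Winter".
winter_games = sorted(
    name for (name, attr, value) in game
    if attr == "type" and value == "Winter"
)
print(winter_games)  # ['PyeongChang 2018', 'Sochi 2014']
```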

Athletes

Next, we create a module for athletes. This module includes definitions of names and sports for each athlete. Although in the real world, there are thousands of Olympic athletes, for illustration purposes, we’re going to use a short list. Again, the Rel module syntax helps us organize our thinking around modeling with entities expressed in easy-to-recognize blocks.

install
module athlete["Allyson Felix"]
    def name = "Allyson Michelle Felix"
    def sport = "Athletics"
end

module athlete["Eddy Alvarez"]
    def name = "Eduardo Cortes Alvarez"
    def sport = {"Baseball"; "Short track speed skating"}
end

module athlete["Tom Daley"]
    def name = "Thomas Robert Daley"
    def sport = "Diving"
end

module athlete["Simone Biles"]
    def name = "Simone Arianne Biles"
    def sport = "Gymnastics"
end

We can see that our knowledge graph is growing as we define more and more entities. We’ve specified facts for two entity types: game (purple nodes) and athlete (green nodes).

[Figure: Olympics Knowledge Graph]

Some athletes may play in multiple sports. Eddy Alvarez, for example, competed in both baseball and short track speed skating. But this fact doesn’t change how we specify the attribute sport. In the case of athletes who have competed in multiple sports, we can use the operator ; to enter multiple values and the {} to group these values in a collection.
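In the same triple-based mental model, a multi-valued attribute is simply more than one tuple for the same entity and attribute. A small Python sketch, using a subset of the athletes above; the helper name sports_of is hypothetical.

```python
# Analogy only: a multi-valued attribute is several triples that share
# the same entity and attribute name.
athlete = {
    ("Eddy Alvarez", "sport", "Baseball"),
    ("Eddy Alvarez", "sport", "Short track speed skating"),
    ("Tom Daley", "sport", "Diving"),
}

def sports_of(person):
    """Return the sorted list of sports recorded for one athlete."""
    return sorted(
        value for (name, attr, value) in athlete
        if name == person and attr == "sport"
    )

print(sports_of("Eddy Alvarez"))  # ['Baseball', 'Short track speed skating']
print(sports_of("Tom Daley"))     # ['Diving']
```

Single-valued and multi-valued attributes are queried in exactly the same way, which is why the definition of sport does not need a different shape for Eddy Alvarez.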

Participation

Now that we have defined all our entities, we want to model athlete participation in Olympic Games, as this is the relationship we are interested in. To do so, we write a relation called participated_in that captures athlete participation. This relation creates the edges in our knowledge graph that connect athlete nodes with the relevant Olympic Game nodes. Once again, we use the operator ; to enter multiple values and the {} to group these values in a collection.

install
def participated_in = {
    ("Eddy Alvarez", "Tokyo 2020");
    ("Eddy Alvarez", "Sochi 2014");

    ("Allyson Felix", "Tokyo 2020");
    ("Allyson Felix", "Rio 2016");
    ("Allyson Felix", "London 2012");
    ("Allyson Felix", "Beijing 2008");
    ("Allyson Felix", "Athens 2004");

    ("Tom Daley", "Tokyo 2020");
    ("Tom Daley", "Rio 2016");
    ("Tom Daley", "London 2012");
    ("Tom Daley", "Beijing 2008");

    ("Simone Biles", "Tokyo 2020");
    ("Simone Biles", "Rio 2016")
}

[Figure: Olympics Knowledge Graph]

When the two entities are related, we can find additional insight by asking questions – queries – about specific entity properties as well as connections between entities. It’s easy to find a collection of athletes who have participated in the “Tokyo 2020” Games. But with the sport attribute conveniently defined in the athlete entity, we can refine our query to find only athletes performing gymnastics in the “Tokyo 2020” Games.

query
def output(person) =
    athlete(person, :sport, "Gymnastics")
    and participated_in(person, "Tokyo 2020")

Relation: output

"Simone Biles"

Here, we use the variable person as a placeholder for an athlete who is a gymnast: athlete(person, :sport, "Gymnastics") and participated in the “Tokyo 2020” Olympics: participated_in(person, "Tokyo 2020").

This example represents one “hop” on a knowledge graph, where we start at the game node and jump via the edge participated_in to all neighboring nodes of type athlete that fulfill our specific condition. This type of query is characteristic of working with knowledge graphs.
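The same one-hop query can be sketched in Python as a filter that combines a property test with an edge test. This is an analogy under the set-of-tuples assumption used earlier; athlete_sport is a hypothetical flattening of the sport attribute, and only a subset of the participated_in tuples is included.

```python
# Analogy only: (athlete, sport) pairs taken from the athlete modules.
athlete_sport = {
    ("Allyson Felix", "Athletics"),
    ("Eddy Alvarez", "Baseball"),
    ("Eddy Alvarez", "Short track speed skating"),
    ("Tom Daley", "Diving"),
    ("Simone Biles", "Gymnastics"),
}

# A subset of the tutorial's participated_in edges.
participated_in = {
    ("Eddy Alvarez", "Tokyo 2020"),
    ("Allyson Felix", "Tokyo 2020"),
    ("Tom Daley", "Tokyo 2020"),
    ("Simone Biles", "Tokyo 2020"),
    ("Simone Biles", "Rio 2016"),
}

# Keep athletes who are gymnasts AND have an edge to "Tokyo 2020".
output = sorted(
    person for (person, sport) in athlete_sport
    if sport == "Gymnastics" and (person, "Tokyo 2020") in participated_in
)
print(output)  # ['Simone Biles']
```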

Querying Our Knowledge Graph in Rel

Now, let’s return to our advertising scenario and apply our knowledge of queries. We want to find:

  1. Athletes in our list who have participated in the greatest number of Olympic Games.
  2. Athletes who have participated in both Summer and Winter Games.

We can use the information in the knowledge graph to find the most appropriate athletes for the campaign from our data.

Most Experienced Athlete

To answer the first question, we first define a relation that pairs each athlete with the number of Olympic Games they’ve participated in. We’ll call this relation experience.

install
def experience[person] = count[game : participated_in(person, game)]

Here, we use a built-in function called count to aggregate over game, counting the number of Games per athlete. Now, let’s ask for the results:

query
def output = experience

Relation: output

"Allyson Felix"  5
"Eddy Alvarez"  2
"Simone Biles"  2
"Tom Daley"  4

While we can see by inspection who has participated in the most Games, we can imagine getting a long list of data if our data set were bigger. So let’s write the Rel to find this answer for us, building on the experience definition above:

query
def most_experienced_athlete = argmax[experience]
def output = most_experienced_athlete

Relation: output

"Allyson Felix"

We can see that from our limited data set, Allyson Felix has participated in five Olympic Games, and is by far our most experienced athlete. In this snippet, we use another built-in function called argmax. This function chooses from the list above the athlete associated with the largest number of Olympic Games, which is stored in the relation experience.
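The count-then-argmax pattern can be sketched in Python as follows. This is an analogy rather than a description of how Rel evaluates the query. The sketch keeps every athlete tied for the maximum, which is an assumption about how ties should be treated; with this data there is no tie.

```python
from collections import Counter

# The participated_in edges from the tutorial, as (athlete, game) pairs.
participated_in = {
    ("Eddy Alvarez", "Tokyo 2020"), ("Eddy Alvarez", "Sochi 2014"),
    ("Allyson Felix", "Tokyo 2020"), ("Allyson Felix", "Rio 2016"),
    ("Allyson Felix", "London 2012"), ("Allyson Felix", "Beijing 2008"),
    ("Allyson Felix", "Athens 2004"),
    ("Tom Daley", "Tokyo 2020"), ("Tom Daley", "Rio 2016"),
    ("Tom Daley", "London 2012"), ("Tom Daley", "Beijing 2008"),
    ("Simone Biles", "Tokyo 2020"), ("Simone Biles", "Rio 2016"),
}

# Count the Games per athlete (the role played by count in the Rel code).
experience = Counter(person for (person, _game) in participated_in)

# Keep the athletes whose count equals the maximum (the role of argmax).
top = max(experience.values())
most_experienced = sorted(p for p, n in experience.items() if n == top)

print(dict(sorted(experience.items())))
# {'Allyson Felix': 5, 'Eddy Alvarez': 2, 'Simone Biles': 2, 'Tom Daley': 4}
print(most_experienced)  # ['Allyson Felix']
```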

Most Diverse Athlete

Now, let’s ask who is the most diverse athlete. That is, from the list of athletes above, who has participated in both Winter and Summer Games?

query
def most_diverse_athlete(person) =
    participated_in(person, g1) and game(g1, :type, "Winter")
    and participated_in(person, g2) and game(g2, :type, "Summer")
    from g1, g2

def output = most_diverse_athlete

Relation: output

"Eddy Alvarez"

We get one result back and we see that Eddy Alvarez is our most diverse athlete. He has participated in both Winter and Summer Games: Sochi in 2014 and Tokyo in 2020. Let’s see how!

In the code above, we define a relation, most_diverse_athlete, that returns those athletes, if any, who have participated in both Summer and Winter Games. The query is a little more complex than the examples we’ve seen so far, so let’s look closely at it. It’s a navigational query, as it traverses a graph, finding the athletes from the connections we specify.

The variable g1 refers to Olympic Games with the property :type = "Winter" and g2 refers similarly to Olympic Games with the property :type = "Summer". We’re asking Rel to find us the athletes who have participated in both g1 and g2 Games – that is, athletes who have participated in both Winter and Summer Games. The from g1, g2 part of the query specifies that we don’t want the specific values of g1 and g2 in the result. For our purposes, it is sufficient that they exist.
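The "it is sufficient that they exist" idea corresponds to an existence check. In the Python sketch below, any() plays that role: we only ask whether some Winter game and some Summer game exist for a person, without returning them. The names game_type and took_part_in are hypothetical helpers for this illustration.

```python
# Analogy only: the type property of each game from the tutorial.
game_type = {
    "Tokyo 2020": "Summer", "Rio 2016": "Summer", "London 2012": "Summer",
    "Beijing 2008": "Summer", "Athens 2004": "Summer",
    "PyeongChang 2018": "Winter", "Sochi 2014": "Winter",
}

participated_in = {
    ("Eddy Alvarez", "Tokyo 2020"), ("Eddy Alvarez", "Sochi 2014"),
    ("Allyson Felix", "Tokyo 2020"), ("Allyson Felix", "Rio 2016"),
    ("Allyson Felix", "London 2012"), ("Allyson Felix", "Beijing 2008"),
    ("Allyson Felix", "Athens 2004"),
    ("Tom Daley", "Tokyo 2020"), ("Tom Daley", "Rio 2016"),
    ("Tom Daley", "London 2012"), ("Tom Daley", "Beijing 2008"),
    ("Simone Biles", "Tokyo 2020"), ("Simone Biles", "Rio 2016"),
}

def took_part_in(person, season):
    """True if at least one game of the given season exists for person."""
    return any(
        p == person and game_type[g] == season
        for (p, g) in participated_in
    )

athletes = {p for (p, _g) in participated_in}
most_diverse = sorted(
    p for p in athletes
    if took_part_in(p, "Winter") and took_part_in(p, "Summer")
)
print(most_diverse)  # ['Eddy Alvarez']
```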

By creating and querying our knowledge graph, we have solved the business problem for our fictional airline. They can run their humorous commercial with two popular athletes to show that they are the best airline for year-round travel. Allyson Felix will claim that she is the most well traveled, flying to multiple destinations on many occasions. Eddy Alvarez can poke fun at her accomplishments and suggest she is lazy, since she travels for only one half of the year while he, like our airline, travels during both the winter and summer seasons. The airline gets its lighthearted campaign, showing that they are the ones to fly with no matter the time of year.

Why Use Knowledge Graphs?

When we build a knowledge graph, stating or drawing the relationships between our data makes it possible to answer complex and powerful questions. It also makes it easy to add information and to expand our understanding of how the data is related.

Because complex relationships can be described – nodes that are related to other nodes that themselves are related to other nodes – we’re now able to ask more complicated questions. In visual form such as a diagram, you can use your finger to trace the paths. With Rel, these relations can be written logically in expressions, and we can construct queries that return answers to our questions.

While our examples have represented one “hop” on a knowledge graph, we can start to imagine how discovering relationships through further hops along the graph can be done. In future tutorials we will show that this is possible because Rel allows us to describe queries across multiple hops, and the Rel engine figures out how to traverse the graph. Expressing queries across interesting, complex, knowledge graphs is what Rel does best.
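To give a feel for what a second hop means, here is a small Python sketch that walks from an athlete to their Games and then on to the other athletes at those Games. It is an illustration of the idea only and does not show how Rel expresses or evaluates multi-hop queries; the helper name co_participants is hypothetical.

```python
# The participated_in edges from the tutorial, as (athlete, game) pairs.
participated_in = {
    ("Eddy Alvarez", "Tokyo 2020"), ("Eddy Alvarez", "Sochi 2014"),
    ("Allyson Felix", "Tokyo 2020"), ("Allyson Felix", "Rio 2016"),
    ("Allyson Felix", "London 2012"), ("Allyson Felix", "Beijing 2008"),
    ("Allyson Felix", "Athens 2004"),
    ("Tom Daley", "Tokyo 2020"), ("Tom Daley", "Rio 2016"),
    ("Tom Daley", "London 2012"), ("Tom Daley", "Beijing 2008"),
    ("Simone Biles", "Tokyo 2020"), ("Simone Biles", "Rio 2016"),
}

def co_participants(person):
    """Athletes reachable in two hops: person -> game -> other athlete."""
    # Hop 1: the Games this person took part in.
    games = {g for (p, g) in participated_in if p == person}
    # Hop 2: everyone else who took part in any of those Games.
    return {p for (p, g) in participated_in if g in games and p != person}

print(sorted(co_participants("Simone Biles")))
# ['Allyson Felix', 'Eddy Alvarez', 'Tom Daley']
```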

Summary

This tutorial has introduced us to the concept of knowledge graphs, entities, and edges. We have learned how to talk about relationships between pieces of data, and how to express these using Rel. We understand the value of using knowledge graphs to ask complex questions of our data, and learned how to ask questions by writing simple queries with Rel.