
# Elements of a Relational Knowledge Graph

This concept guide explains what a relational knowledge graph (RKG) is. It also covers the semantic implication of graph components, the steps to build an RKG, and how to run simple queries over it.

## Introduction

A relational knowledge graph (RKG) is a knowledge graph represented in accordance with the relational data model. What does this mean? A relational knowledge graph represents each component of a knowledge graph (node, edge, and hyperedge) in the form of relations. In this way a relational knowledge graph can be seen as a relational database composed of multiple relations, which fulfill specific roles.

To make full use of the relational data model, the relational knowledge graph is stored in a fully normalized form and each data point contains semantic meaning. This data modeling strategy is called Graph Normal Form (GNF). See the Graph Normal Form guide for more details.

This guide focuses on the schema and the semantic meaning of each graph component. Understanding these foundations is essential to successfully building a relational knowledge graph that represents your domain of interest.

This guide also illustrates the discussion of each graph component with object-role modeling (ORM) diagrams. For more details see Schema Visualization.

Here is an overview of all graph components, their definitions, and how they appear in Rel:

| Graph Component | Definition | Example |
| --- | --- | --- |
| graph | A module of relations containing labeled nodes, edges, and hyperedges. | graph |
| node | A value or identifier representing a concept in the graph. Each node can be referenced by a unary (arity-1) relation named by a node label. | NodeLabel(node) |
| edge | A pair of nodes in the graph. Each edge can be referenced by a binary (arity-2) relation named by an edge label. | edge_label(source_node, target_node) |
| hyperedge | A tuple of three or more nodes in the graph. Each hyperedge can be referenced by a ternary or larger (arity-3+) relation named by a hyperedge label. | hyperedge_label(node_1, node_2, node_3) |

Those familiar with labeled-property graph (LPG) diagrams will recognize two additional graph component names, “node property” and “edge property.” In Rel, these are just special cases of edges and hyperedges respectively.

| LPG Component | Definition | Example |
| --- | --- | --- |
| node property | A value node connected to an entity node via an edge. The edge label is the property name. In a Rel graph, property values are nodes in their own right. | Name(person, "Alice") |
| edge property | A value connected to two or more nodes via a hyperedge. The hyperedge label is the property name. In a Rel graph, edge property values are nodes in their own right. | employs_since(company, person, 2020) |
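
To make the two tables concrete outside Rel, here is a minimal Python sketch that models each graph component as a relation, that is, a set of equal-length tuples. It is an illustration only, not Rel code: the string keys such as "person:alice" and the helper `arity` are hypothetical stand-ins for Rel entity keys and relation arity.

```python
# Illustrative sketch only: a relation is modeled as a Python set of tuples.
# The keys "person:alice" and "company:rai" are hypothetical stand-ins for
# the unique entity keys that Rel would generate.

def arity(relation):
    """Return the common tuple length of a non-empty relation."""
    lengths = {len(row) for row in relation}
    if len(lengths) != 1:
        raise ValueError("all tuples in a relation must have the same length")
    return lengths.pop()

graph = {
    # node labels: unary relations
    "Person": {("person:alice",)},
    "Company": {("company:rai",)},
    # edge label: binary relation (source, target)
    "employs": {("company:rai", "person:alice")},
    # node property: an edge whose target is a value node
    "has_name": {("person:alice", "Alice")},
    # edge property: a hyperedge whose last element is a value node
    "employs_since": {("company:rai", "person:alice", 2020)},
}

arities = {label: arity(relation) for label, relation in graph.items()}
```

In this sketch the only structural difference between a node, an edge, and a hyperedge is the arity of the relation, which is the point the tables make.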

## Graph

A relational knowledge graph is a knowledge graph where each component (node, edge, and hyperedge) of the graph is described by a relation.

In Rel, a relational knowledge graph is defined in the scope of a single module. This module acts as a container that consists of relations that represent the different graph components. The names of the relations act as labels for the graph components, grouping together sets of nodes, edges, or hyperedges.

For example, you can declare a module CompanyGraph as follows:

module CompanyGraph

//Defining nodes
def Person = . . .

//Defining edges
def employs = . . .

//Defining hyperedges
def employs_since = . . .
end

Data defined in a module can then be queried either by prefixing the relation name with module_name: or by using the with <module> use <relation> syntax. You will find a detailed example of how to populate and query a graph at the end of this guide.

## Nodes

Each node represents a “thing” or “noun” in your schema. A node can represent a physical thing. For example, in retail, a “product” you hold in your hand would be represented as a node. Alternatively, a node may represent an abstract but well-understood concept, such as a “customer” or “supplier”.

Some nodes do not translate to “things” at all. Nonetheless, treating these abstractions as nodes is useful for understanding how your data are connected.

Nodes in a relational knowledge graph are either entity nodes or value nodes. Only entity nodes are required to be referenced with labeled node relations. Value nodes are usually used to represent object properties.

🔎

As a rule of thumb, entity nodes are represented by graph-unique keys, and value nodes contain human-readable data. More information on how to represent data using entity and value types can be found in Things Not Strings in Rel in the Graph Normal Form guide.

### Entity Nodes

The sort of people, objects, or concepts described above are represented as entity nodes.

Defining Entity Nodes

For example, you can declare an entity type Person by specifying the schema of the identifying attributes of a person who is an employee in a company:

entity type Person = String

This specifies that the person’s name (String) identifies them.

Every entity type declaration creates an entity constructor relation ^Person(id..., node) that maps the identifiers, id..., to a unique key, node, which will be used to reference the entity node. The data types of the identifiers are specified in the entity type declaration (see code above). The leading caret ^ (^Person) indicates that it is a constructor.
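
The idea of a constructor that maps identifying values to a unique key can be sketched in plain Python. This is a sketch under stated assumptions, not a description of how Rel computes keys: the function name `construct_entity`, the use of SHA-256, and the 16-character truncation are all hypothetical choices made for illustration.

```python
import hashlib

def construct_entity(type_name, *identifiers):
    """Map a type name and its identifying values to a deterministic key.

    Hypothetical stand-in for a constructor such as ^Person[id...]; the
    hashing scheme is an assumption of this sketch.
    """
    payload = repr((type_name, identifiers)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

alice = construct_entity("Person", "Alice Zhao")
alice_again = construct_entity("Person", "Alice Zhao")
bob = construct_entity("Person", "Bob Yablonsky")
company_alice = construct_entity("Company", "Alice Zhao")
```

The same identifiers always yield the same key, and including the type name in the hashed payload keeps a Person and a Company with the same identifying string apart.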

Populating Entity Nodes

The constructor ^Person can be used to populate the graph with concrete persons. For example:

def Person = {
^Person["Alice Zhao"];
^Person["Bob Yablonsky"];
^Person["Ava Nguyen"]
}

The relation Person now contains the node keys of the three person nodes. Here you can use point-free syntax, where expression ^Person[id...] evaluates to the unique node key.

🔎

Typically, node labels in Rel graphs start with an uppercase letter, and multiword labels are joined directly with no spaces (UpperCamelCase).

What Should an Entity Node Represent?

Deciding which aspects of your domain should be modeled as entities is ultimately a design decision that is up to the developer.

Generally speaking, the following concepts are great examples of entity nodes:

• Objects with many properties.
• Abstract concepts that have nontrivial substructures.
• Entities in an Entity Relationship (ER) diagram.
• Primary keys in a relational table.
• Subjects in an RDF triplestore.

### Value Nodes

A node that represents data — like a node property value — is represented as a value node.

Defining Value Nodes

For example, you can define a new value type Quarter that represents three-month periods:

value type Quarter = Date, Date

A value type is defined by specifying the schema. Here, Quarter is defined by two Date values: a start date and an end date.

Similar to entity declarations, the value type declarations create a constructor relation ^Quarter(id..., value), whose name starts with a caret, ^. The identifying attributes id... map to the value, value, which captures all the information in id... in one value. For more details, see Value Types.

Populating Value Nodes

Unlike entity nodes, value nodes do not need to be explicitly defined in the graph. It is sufficient to just link to them when creating the edges of the graph. In Edges, you will see how to ensure that nodes have the appropriate value type.

You can also model value nodes in the same way as entity nodes, by defining the relation Quarter:

def Quarter = {
^Quarter[2022-01-01, 2022-03-31];
^Quarter[2022-04-01, 2022-06-30]
}

Here again, the constructor relation ^Quarter is used to populate the nodes.

🔎

The name of the value node doesn’t need to match the name of the underlying value type, but matching them is recommended unless there is a modeling reason not to. For instance:

def FiscalQuarter = {
(^Quarter[2022-01-01, 2022-03-31])
}

This would also be a valid value node declaration.

What Should a Value Node Represent?

As with entities, deciding what should be modeled as a value node is ultimately a design decision and up to the developer.

Generally speaking, the following concepts are great examples of value nodes:

• Properties of an object.
• Attributes of an entity in an ER diagram.
• Columns of a relational table that are not primary or foreign keys.
• Objects in an RDF triplestore.
• Concepts with very few properties.

## Edges

If nodes are the “things” of your graph, edges describe “relationships” between things. They are the verbs that connect your nouns. Edges connect entity nodes to each other, to value nodes, and can even connect value nodes to each other.

### Edges Between Entity Nodes

You can add an edge between two entity nodes by adding a relation to the graph.

In the company example, you can define the edge employs between the entity nodes Person and Company, and populate it with the node identifiers ^Person and ^Company:

def employs = {
(^Company["RAI"], ^Person["Ava Nguyen"]);
(^Company["RAI"], ^Person["Alice Zhao"]);
(^Company["Microsoft"], ^Person["Alice Zhao"]);
(^Company["Microsoft"], ^Person["Bob Yablonsky"])
}
🔎

Typically, Rel graph edges are lowercase, and multiword labels are linked with underscores _ (snake_case). Strings can also be used as edge names, for example, :"employed by".

Edge definitions are inherently directional, from source to target. A binary edge may be made undirected by making the underlying relation symmetric. This can be done by using transpose:

def spouse = transpose[spouse]
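
The effect of adding the transposed tuples can be sketched in Python with a relation modeled as a set of pairs. This is an illustration only: the helpers `transpose` and `make_symmetric` and the keys "person:alice" and "person:bob" are hypothetical, and the explicit union stands in for the way the Rel rule above adds tuples to spouse.

```python
def transpose(relation):
    """Swap source and target in every pair of a binary relation."""
    return {(target, source) for (source, target) in relation}

def make_symmetric(relation):
    """Return the relation together with all of its reversed pairs."""
    return relation | transpose(relation)

# Only one direction is stated explicitly.
spouse = {("person:alice", "person:bob")}
spouse = make_symmetric(spouse)
```

After the call, the edge can be traversed from either node, and applying `make_symmetric` a second time changes nothing.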

### Node Properties

Adding an entity node property is equivalent to connecting an entity and a value node via an edge. The process is very similar to connecting two entity nodes. However, instead of the edge’s target node containing an entity node identifier, it contains the property value itself.

Here’s how to populate the property born_on for each Person node:

def born_on = {
(^Person["Alice Zhao"], 1982-03-15);
(^Person["Bob Yablonsky"], 1991-11-22);
(^Person["Ava Nguyen"], 1979-09-09)
}
🔎

For property values, you can safely omit a value type declaration as long as the property name uniquely identifies the data. Value nodes are leaf nodes in the graph.

If the node property has a value type, add the edge just as you would an edge between two entity nodes. For example, you can declare a value type Name, based on String, for the name property as follows:

value type Name = String

Then you can add a Name property to each Person entity node, using the constructors ^Person and ^Name, as follows:

def has_name = {
(^Person["Alice Zhao"], ^Name["Alice Grace Zhao"]);
(^Person["Bob Yablonsky"], ^Name["Bob Matthew Yablonsky"]);
(^Person["Ava Nguyen"], ^Name["Ava Marie Nguyen"])
}
🔎

The purple bar in the ORM diagram above indicates a uniqueness constraint. Here the purple bar above the edge has_name means “each Person has at most one Name.”

The purple dot indicates a mandatory role constraint. In this case it means “each Person has at least one Name.”

Combining the constraints means “each Person has exactly one Name.”
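
The two constraints can be expressed as simple checks over a binary relation. The following Python sketch is an illustration only, not how Rel enforces constraints: the function names `has_at_most_one` and `has_at_least_one` are hypothetical, and plain strings stand in for entity keys.

```python
def has_at_most_one(edge):
    """Uniqueness: no source node appears with two different targets."""
    seen = {}
    for source, target in edge:
        # setdefault returns the first target recorded for this source
        if seen.setdefault(source, target) != target:
            return False
    return True

def has_at_least_one(nodes, edge):
    """Mandatory role: every node appears as a source in the edge."""
    sources = {source for source, _ in edge}
    return nodes <= sources

Person = {"Alice Zhao", "Bob Yablonsky", "Ava Nguyen"}
has_name = {
    ("Alice Zhao", "Alice Grace Zhao"),
    ("Bob Yablonsky", "Bob Matthew Yablonsky"),
    ("Ava Nguyen", "Ava Marie Nguyen"),
}

exactly_one_name = has_at_most_one(has_name) and has_at_least_one(Person, has_name)
```

Both checks pass for the data in this guide. A second name for the same person would break uniqueness, and a person without any name would break the mandatory role.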

### Edges Between Value Nodes

For most applications, source nodes will be entity nodes. However, in some circumstances, it makes sense to link value nodes together. For example, consider the link between a Date value type and a Year value type:

In Rel, you can define this edge as follows:

def year(date, year) {
^Date(year, _, _, date)
}

In the company example, you can connect the properties of dates. For example:

def start_month(quarter_node, month) {
(^Quarter[start_date, _, quarter_node], date_month[start_date, month])
from start_date
}

## Hyperedges

A hyperedge connects three or more nodes — usually multiple entity nodes — with each other. However, any mix of entity or value nodes is possible.

Hyperedges are relations with composite keys that are made up of two or more individual keys. Hyperedge relations — as any relation — may or may not have a value column. If they do, each composite key points to only one value, which is located in the last column in the relation. If the hyperedge relation describes a boolean-like relationship — one that can be answered with true or false — only the composite key is stored in the relation and no value column exists.

One common use case for hyperedges is to store higher-dimensional data like embeddings.

### Edge Properties

Just as nodes can have properties, so can edges. Edge properties are represented in Rel using hyperedges.

In the sentence analogy, if the source node is the subject, and the target node is the direct object, an edge property is the indirect object. When speaking about your data in natural language, edge properties are often objects of prepositions like “since,” “on,” “for,” “by,” etc.

In Rel, an edge property is a value node connected to two (or more) entity nodes via a hyperedge of arity 3 or more.

For example, “RelationalAI has employed Alice since 2020.” This can be understood as:

1. A binary edge employs connecting the Company entity node for “RelationalAI” and the Person entity node for “Alice.”
2. The ternary edge (hyperedge) employs_since connecting the “RelationalAI” and “Alice” entity nodes to a value node, a Year with the value 2020. The value node is the edge property.

Here is an ORM diagram for this relationship. Note that the ORM diagram can be read across the set of role boxes, just like a sentence: “Company employs Person since Date.”

Defining and Populating Edge Properties

Hyperedges are defined similarly to binary edges. For the example above, the edge property employs_since can be defined as:

def employs_since = {
(^Company["RAI"], ^Person["Ava Nguyen"], 2019-12-01);
(^Company["RAI"], ^Person["Alice Zhao"], 2020-05-10);
(^Company["Microsoft"], ^Person["Alice Zhao"], 2021-08-24);
(^Company["Microsoft"], ^Person["Bob Yablonsky"], 2018-02-28)
}

What Should an Edge Property Represent?

Edge properties can represent a range of information about the graph. They often reflect a condition or degree under which the edge is valid.

The following are good examples of edge properties:

• Reliability (as a weight).
• Information source.
• Time or location.
• Time- or location-dependent datum, as a quaternary (arity-4) hyperedge.

### Optional Edge Properties and Incomplete Data

Note that the hyperedge employs_since only connects entities for which the edge property has a value. If a particular edge has no property value, employs_since contains no tuple for it, but the employs edge still connects the two nodes. In the vast majority of cases, this is the graph connectivity you want to represent. However, if it is important to the schema to represent that the edge property is missing, you can use Rel’s Missing type.
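
A short Python sketch shows how an edge can exist without its property. The data here is hypothetical: Alice Zhao's start date has been left out on purpose to illustrate an incomplete record, and plain strings stand in for entity keys.

```python
from datetime import date

# Binary edge: who is employed by whom.
employs = {
    ("RAI", "Ava Nguyen"),
    ("RAI", "Alice Zhao"),
}

# Hyperedge: only edges with a known start date appear here
# (Alice Zhao's date is omitted for this illustration).
employs_since = {
    ("RAI", "Ava Nguyen", date(2019, 12, 1)),
}

# Project the hyperedge onto its key, then compare with the plain edge.
edges_with_property = {(company, person) for company, person, _ in employs_since}
edges_missing_property = employs - edges_with_property
```

The employs edge for Alice Zhao is still part of the graph; it simply has no matching tuple in employs_since.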

### Hyperedges of Any Size

Edge properties are just one subset of hyperedges. Rel is agnostic to the dimension of a hyperedge or what combination of entity and value nodes are contained in the hyperedge.

Consider the following example fact: “Alice met Bob at the Strange Loop conference in 2021.” This fact can be readily modeled as an arity-4 hyperedge met_at_place_in_year:

def met_at_place_in_year(person1, person2, conference, year) {
person1 = ^Person["Alice Zhao"],
person2 = ^Person["Bob Yablonsky"],
conference = ^Conference["Strange Loop"],
year = 2021
}

### Reification

While Rel supports hypergraphs of any dimension, graph traversal algorithms are optimized for binary graphs. Reification is a process by which hyperedges can be transformed into binary relationships. Reifying a hyperedge is a two-step process:

1. Define a new reified entity using the key of the hyperedge relation (every node but the last, target node):

entity type ReifiedEntity = String, Date
def ReifiedEntity(e) {
exists(s, d :
e = ^ReifiedEntity[s, d]
and a_hyperedge(s, d, _)
)
}

2. Connect a new labeled edge reified_edge from the new reified entity ReifiedEntity to the target node target_node:

def reified_edge(source_node, target_node) {
exists(s, d :
source_node = ^ReifiedEntity[s, d],
target_node = a_hyperedge[s, d]
)
}

New labeled edges can also be defined to connect to each node in the hyperedge key. The example “Company employs Person since Date” represented below can be reified following the two steps.

1. Define and populate a new node (Employment) that captures the relationship “Company employs Person”:

entity type Employment = String, String
def Employment(employment) {
exists(company, person :
employment = ^Employment[company, person]
and employs_since(company, person, _)
)
}

2. Connect the reified Employment node to each node (Company, Person, and Date) of the employs_since hyperedge. This requires defining three new labeled edges: has_employer, has_employee, and employment_since. For example:

def has_employer(employment, company) {
exists(person:
employment = ^Employment[company, person]
and employs_since(company, person, _)
)
}

def has_employee(employment, person) {
exists(company:
employment = ^Employment[company, person]
and employs_since(company, person, _)
)
}

def employment_since(employment, dt) {
exists(company, person:
employment = ^Employment[company, person]
and employs_since(company, person, dt)
)
}

Here is the ORM diagram representation of the reification:

The new edges has_employer, has_employee, and employment_since are all binary. The entity node label Employment may not be obvious in the source data, but the construction of hyperedges can point to natural places where new concepts, and new node labels, can be generated.
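
The reification steps can also be sketched in Python, with relations modeled as sets of tuples. This is an illustration only: the function `reify` is hypothetical, the tuple `(label, a, b)` stands in for the key that ^Employment[company, person] would produce, and plain strings stand in for entity keys.

```python
from datetime import date

def reify(hyperedge, label):
    """Turn a ternary relation (a, b, value) into a node set and three binary edges."""
    nodes, to_first, to_second, to_value = set(), set(), set(), set()
    for a, b, value in hyperedge:
        node = (label, a, b)  # hypothetical stand-in for ^Employment[a, b]
        nodes.add(node)
        to_first.add((node, a))
        to_second.add((node, b))
        to_value.add((node, value))
    return nodes, to_first, to_second, to_value

employs_since = {
    ("RAI", "Ava Nguyen", date(2019, 12, 1)),
    ("RAI", "Alice Zhao", date(2020, 5, 10)),
    ("Microsoft", "Alice Zhao", date(2021, 8, 24)),
    ("Microsoft", "Bob Yablonsky", date(2018, 2, 28)),
}

Employment, has_employer, has_employee, employment_since = reify(
    employs_since, "Employment"
)

# Joining the three binary edges on the reified node recovers the hyperedge.
rebuilt = {
    (company, person, dt)
    for node, company in has_employer
    for node2, person in has_employee if node2 == node
    for node3, dt in employment_since if node3 == node
}
```

Because the join over the reified node reproduces the original tuples, no information is lost by replacing the hyperedge with binary edges in this sketch.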

### Embeddings

Vectors, matrices, or any tensors can be represented as relations with arity 2, 3, or $n+1$, respectively, where $n$ is the rank of the tensor.

The schema of the underlying relation — representing the embedding — is (key_graph..., key_tensor..., value). The graph-related key key_graph... identifies the node or edge associated with the embedding. The tensor indices are captured by key_tensor.... Both together make up the composite key of the relation. Note that both keys may be composite keys themselves, which is indicated by ....

It’s good practice to store embeddings in a separate module, rather than as hyperedges. A single graph may have many sets of embeddings associated with it.

The following example initializes a 10-dimensional vector embedding for the Person entities with zeros:

def mygraph_embedding:Person(person, dimension, value) {
person = mygraph:Person,
dimension = range[1, 10, 1],
value = 0.0
}

Here, the graph-related key is the entity ID person and the tensor index is dimension.

Similarly, a 10x10 matrix embedding can be defined for an edge with uniformly distributed random values between 0 and 1 using the Threefry pseudorandom number generator random_threefry_float64:

def mygraph_embedding:employed_by(person, company, index1, index2, value) {
mygraph:employed_by(person, company),
index1 = range[1, 10, 1],
index2 = range[1, 10, 1],
value = random_threefry_float64[index1, index2]
}

Here, both the graph-related key (person, company) and the matrix indices (index1, index2) are composite keys.

Modeling embeddings this way provides great flexibility: any kind of embedding, from a single value, vector, or matrix to an abstract tensor, can be represented. Because GNF closely resembles sparse matrix representations, embeddings in Rel are stored in the sparse coordinate (COO) format and don’t have to be dense.
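
The (key_graph..., key_tensor..., value) layout can be sketched in Python as a set of (key, index, value) tuples. This is an illustration only: the helpers `to_coo` and `component` are hypothetical, "person:alice" stands in for an entity key, and treating absent entries as 0.0 is an assumption of this sketch rather than documented Rel behavior.

```python
def to_coo(key, dense_vector):
    """Keep only nonzero entries as (key, index, value) tuples, with 1-based indices."""
    return {
        (key, index, value)
        for index, value in enumerate(dense_vector, start=1)
        if value != 0.0
    }

def component(embedding, key, index):
    """Look up one component; entries that are not stored are read as 0.0."""
    for stored_key, stored_index, value in embedding:
        if stored_key == key and stored_index == index:
            return value
    return 0.0

# A 5-dimensional vector with only two nonzero components.
embedding = to_coo("person:alice", [0.0, 0.5, 0.0, 0.0, 1.5])
```

Only two tuples are stored for the five-dimensional vector, which is what makes the coordinate layout attractive for sparse embeddings.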

## Building and Querying a Graph

### Building a Graph

With the insights on the elements of a relational knowledge graph you can now build a simple graph and query it. Here is an ORM diagram of the knowledge graph you are about to build:

#### Defining the Schema

The first step is to define the schema. You can define the types of the value and entity nodes as follows:

// model

// entity nodes
entity type Company = String
entity type Person = String

// value nodes
value type Name = String

You can define the schema of the edges as well:

// model

module CompanyGraph

// entity nodes
bound Company = Entity
bound Person = Entity

// node attribute / edge: has_name
bound has_name = Company, Name
bound has_name = Person, Name

// edge: born_on
bound born_on = Person, Date

// edge: employs
bound employs = Company, Person

// edge attribute / hyperedge: employs_since
bound employs_since = Company, Person, Date
end

For more details on the bound syntax, see Bound Declarations in the Rel Reference manual.

#### Populating the Graph

Now that you have defined the schema, you can insert some data into your database.

It is best practice to organize data with modules and group information together. When building a knowledge graph, it makes sense to insert all the data into a module where the module represents the knowledge graph.

Updating the data within a module requires two steps:

1. Defining the data within a temporary module (company_graph).
2. Storing the data in a base relation (CompanyGraph), which persists in the database.

// write query

// defining the data within a temporary company_graph module
module company_graph

// entity node: Company
def Company = {
^Company["RAI"];
^Company["Microsoft"]
}

// entity node: Person
def Person = {
^Person["Alice Zhao"];
^Person["Bob Yablonsky"];
^Person["Ava Nguyen"]
}

// edge: has_name
def has_name = {
(^Company["RAI"], ^Name["RelationalAI"]);
(^Company["Microsoft"], ^Name["Microsoft Corporation"]);
(^Person["Alice Zhao"], ^Name["Alice Grace Zhao"]);
(^Person["Bob Yablonsky"], ^Name["Bob Matthew Yablonsky"]);
(^Person["Ava Nguyen"], ^Name["Ava Marie Nguyen"])
}

// edge: born_on
def born_on = {
(^Person["Alice Zhao"], 1982-03-15);
(^Person["Bob Yablonsky"], 1991-11-22);
(^Person["Ava Nguyen"], 1979-09-09)
}

// edge: employs
def employs = {
(^Company["RAI"], ^Person["Ava Nguyen"]);
(^Company["RAI"], ^Person["Alice Zhao"]);
(^Company["Microsoft"], ^Person["Alice Zhao"]);
(^Company["Microsoft"], ^Person["Bob Yablonsky"])
}

// hyperedge: employs_since
def employs_since = {
(^Company["RAI"], ^Person["Ava Nguyen"], 2019-12-01);
(^Company["RAI"], ^Person["Alice Zhao"], 2020-05-10);
(^Company["Microsoft"], ^Person["Alice Zhao"], 2021-08-24);
(^Company["Microsoft"], ^Person["Bob Yablonsky"], 2018-02-28)
}
end

// storing the data in the CompanyGraph base relation
def insert:CompanyGraph = company_graph


You can find a detailed explanation of the code above in My First Knowledge Graph.

### Querying a Graph

Querying a relational knowledge graph allows you to find specific entities. In the following examples you will see a query based on attributes, an aggregation query, and a query with conditions.

What Company Does Alice Zhao Work For?

// read query

def output(company_name) = {
exists(company :
CompanyGraph:employs(company, ^Person["Alice Zhao"])
and CompanyGraph:has_name(company, company_name)
)
}

The variables in the parentheses after def output indicate which elements are displayed in the output. In this case it is one value: the name of the company that satisfies the expression on the right-hand side of the equal sign. The right-hand side of the definition can be verbalized as “there exists a company such that the company employs Alice Zhao and the company has the name company_name.”

The conjunct and CompanyGraph:has_name(company, company_name) maps the company’s entity key to its name, making the output more readable.

🔎

Notice that employs and has_name are both preceded by CompanyGraph:. This indicates that the information will have to be retrieved from the CompanyGraph module.

How Many Employees Work at Each Company?

// read query

def output(company_name, head_count) {
exists(company :
head_count = count[CompanyGraph:employs[company]]
and company_name = CompanyGraph:has_name[company]
)
}

This query uses the count relation.

The output will display two values: the name of the company and its headcount. The right-hand side of the definition can be verbalized as “there exists a company such that the company has a name and the headcount is the number of employs edges for that company.”

🔎

Note that company_name = CompanyGraph:has_name[company] is the equivalent of CompanyGraph:has_name(company, company_name) in the previous query.
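
For readers who want to check the expected headcounts by hand, the same aggregation can be sketched in Python over the data populated earlier in this guide. This is an illustration only, not Rel: plain strings stand in for entity keys, and `Counter` plays the role of the count aggregation.

```python
from collections import Counter

# Edges copied from the example data, with strings standing in for entity keys.
employs = {
    ("RAI", "Ava Nguyen"),
    ("RAI", "Alice Zhao"),
    ("Microsoft", "Alice Zhao"),
    ("Microsoft", "Bob Yablonsky"),
}
has_name = {
    ("RAI", "RelationalAI"),
    ("Microsoft", "Microsoft Corporation"),
}

# Count the employs edges per company.
head_count_by_company = Counter(company for company, _ in employs)

# Pair each company name with its headcount; companies without
# any employs edge are left out of this sketch's output.
output = {
    (name, head_count_by_company[company])
    for company, name in has_name
    if company in head_count_by_company
}
```

With the example data, each of the two companies has two employs edges, so both names are paired with a headcount of 2.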

Who Was Hired at RelationalAI After 2020?

// read query

with CompanyGraph use Company, Person, employs_since, has_name

def RAI_hires_after_2020(person_name in Name, dt in Date) {
exists(company in Company, person in Person :
employs_since(company, person, dt)
and has_name(company, ^Name["RelationalAI"])
and dt > 2019-12-31
and has_name(person, person_name)
)
}
def output = RAI_hires_after_2020

The statement def RAI_hires_after_2020(person_name in Name, dt in Date) indicates that the relation RAI_hires_after_2020 contains pairs of values, person_name and dt, where person_name is a value node of type Name and dt is a value node of type Date.

The right-hand side of the definition can be verbalized as “there exists a company and a person such that the company employs the person since a date, the name of the company is RelationalAI, the date is after 2019-12-31 (that is, in 2020 or later), and the person has a name.”

🔎

Starting the queries with a with <module> use <relation> statement is another way of retrieving the information stored in the example module. In this case, the module is CompanyGraph and the relations are the ones required for the query definition: Company, Person, employs_since, and has_name.
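
The filter and joins in this query can be traced in Python over the example data. This is an illustration only, not Rel: plain strings stand in for entity keys, and the dictionary `name_of` relies on the “exactly one Name” constraint discussed earlier.

```python
from datetime import date

# Data copied from the example graph, with strings standing in for entity keys.
employs_since = {
    ("RAI", "Ava Nguyen", date(2019, 12, 1)),
    ("RAI", "Alice Zhao", date(2020, 5, 10)),
    ("Microsoft", "Alice Zhao", date(2021, 8, 24)),
    ("Microsoft", "Bob Yablonsky", date(2018, 2, 28)),
}
has_name = {
    ("RAI", "RelationalAI"),
    ("Microsoft", "Microsoft Corporation"),
    ("Alice Zhao", "Alice Grace Zhao"),
    ("Bob Yablonsky", "Bob Matthew Yablonsky"),
    ("Ava Nguyen", "Ava Marie Nguyen"),
}

# Each node has exactly one name, so the edge can be read as a lookup table.
name_of = dict(has_name)

rai_hires_after_2020 = {
    (name_of[person], dt)
    for company, person, dt in employs_since
    if name_of[company] == "RelationalAI" and dt > date(2019, 12, 31)
}
```

Of the two RelationalAI employees in the example data, only Alice Zhao has a start date after 2019-12-31, so the result contains a single pair.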

## Summary

A relational knowledge graph represents each component of a knowledge graph (nodes, edges, and hyperedges) in the form of relations. Those relations are defined within modules, and the data populating them are stored in base relations, creating a graph.