Graph Schema

This concept guide explains how to define the graph schema of a relational knowledge graph in Rel.

Introduction

A graph schema defines the nodes, edges and hyperedges of a relations, as well as the logical rules applied to them. Defining a schema provides information about the nature of the data and describes how the data is organized. It makes no statement about the size of the data. In particular, it does not indicate the number of occurrences there will be for each category. For example, if your schema has the category Company, it will not give any indication of the amount of companies in your dataset.

A graph schema can be represented by an ORM diagram. Below is an ORM diagram for a Company example. Parts of it are used throughout this guide to illustrate the concepts presented.

💡

You can find the Rel code expressing the complete schema in the example section.

This guide explains the meaning of the elements in the diagram and associates Rel code to each one. See the Schema Visualization guide to learn more about ORM and LPG diagrams.

Defining the schema is a three step process, involving data types, data schema, and semantic constraints.

Step	Rel Code Example	ORM Representation
Data Types	`entity type Node1Label = String bound Node1Label = Entity value type Node2Label = Int`
Data Schema	`bound edge_label = Node1Label, Node2Label`
Semantic Constraints	`// Functional dependency constraint ic Unique_Node2(nd1, nd2A, nd2B) { node2label(nd1, nd2A) and node2label(nd1, nd2B) implies nd2A = nd2B }`

Step

Rel Code Example

ORM Representation

Data Types

entity type Node1Label = String
bound Node1Label = Entity
value type Node2Label = Int

Data Schema

bound edge_label = Node1Label, Node2Label

Semantic Constraints

 
// Functional dependency constraint
ic Unique_Node2(nd1, nd2A, nd2B) {
  node2label(nd1, nd2A)
  and node2label(nd1, nd2B)
  implies nd2A = nd2B
}

ORM graph schema - step1 - uniqueness constraint

Defining the Data Types

The first step in defining graph schema is to write the definitions of the data types for each node of the relational knowledge graph.

The following example shows how to do this in Rel:

// Define an entity node.
entity type Node1Label = String
bound Node1Label = Entity
 
// Define a value node.
value type Node2Label = Int

Defining the nodes requires entity type and value type declarations. It is preferable to introduce a bound declaration for a relation (here Node1Label) that matches the entity name. The purpose of this relation is to hold all the entities that have this label.

🔎

The entity type declaration defines an entity label. This declaration does not introduce a new data type. All entities are represented by a hash/UUID value that has the data type Entity.

In the example above, the identifying property of the Node1Label entities is a String, and the corresponding entity node (defined by the bound declaration) holds concrete instances of this entity type.

For the value node Node2Label, the value type declaration is sufficient. It creates the constructor ^Node2Label and the corresponding type relation Node2Label, which can also act as the value node. No bound declaration is needed. Unlike entity types, each value type is a separate data type. See the Rel reference for details.

Both entity and value type declarations create a constructor relation, which starts with a caret ^, that will be used to populate the graph.

🔎

It is advisable to use UpperCamelCase to name entity and value types.

How to designate entity and value nodes is a decision left to the developer. More information on how to represent data using entity and value types can be found in the Graph Normal Form guide.

Following the Company example, you can express definitions of the data types for the Person and Name nodes in Rel like this:

// Entity node: `Person`
entity type Person = String
bound Person = Entity
 
// Value node: `Name`
value type Name = String

In ORM, entity and value nodes are represented as rounded boxes, distinguished by solid and dashed borders respectively.

Defining the Data Schema

The second step is to define the data schema of the relational knowledge graph. This will specify the data types involved in the various edge and hyperedge relations.

You can do this in Rel as follows:

bound edge_label = Node1Label, Node2Label

The data schemas of the relations in the relational knowledge graph are defined by bound declarations. Bound declarations apply to relations specified on the left-hand side (here edge_label). The right-hand side indicates the required data schema for the relation (here (Node1Label, Node2Label)).

Bound declarations generate integrity constraints (ICs), which in the case of schema rules are type constraints. Integrity constraints ensure that the data populating the graph are clean and comply with the rules stated in the schema.

🔎

It is advisable to use lowercase to name edges. Multiword labels can be linked with underscores _ (snake_case).

In the Company example, you can define the schema for the edge has_name, which describes the Name property of the entity Person, as follows:

// Edge: `has_name`
bound has_name = Person, Name

The bound declaration requires any relation with the name has_name to have arity 2 and contain only tuples whose first element is Person and the second is Name.

Note that in Rel the schema of hyperedges is defined straightforwardly by stating the nodes involved:

from ::std::datetime import Date
 
// Edge: `employs_since`
bound employs_since = Company, Person, Date

It is advantageous to represent names as values of a special value type Name and not simply as Strings. This ensures that the property refers only to a name, and not to something else that is also represented by a String.

In ORM, edges are represented as a series of role boxes that connect two or more nodes. The number of role boxes in the diagram is the same as the arity of the edge relation. Defining schema rules is equivalent to adding role boxes between nodes.

Expressing Semantic Constraints

The third step when defining a graph schema is to express semantic constraints. You can do this by declaring integrity constraint (ICs). Here are a few commonly used constraints between nodes and edges.

In ORM, semantic constraints are represented by purple icons.

Constraints Between Nodes

Some semantic constraints, such as mandatory role, functional dependency, and subtype constraints, apply to the connection between nodes.

Mandatory Role Constraints

You can express mandatory role constraints in Rel as follows:

ic a_must_have_b(a) {
    A(a)
    implies
    exists(b: B(b) and edge_x(a, b))
}

🔎

This IC means ”A has at least one B“.

Those nodes are linked by an edge in edge_x.

Following the Company example, you can express the mandatory role constraint “A Person has at least one Name” in Rel like this:

ic person_must_have_name(pers) {
    Person(pers)
    implies
    exists(name: Name(name) and has_name(pers, name))
}

In ORM, a mandatory role constraint is illustrated by a large dot on the role box connector.

Functional Dependency Constraints

A functional dependency constraint is a uniqueness constraint that expresses a many-to-one connection. Here is how to express it in Rel:

ic A_has_one_B(a, b1, b2) {
    edge_x(a, b1) and edge_x(a, b2)
    implies
    b1 = b2
}

🔎

This IC means ”A has at most one B“.

It is also possible the write this IC using the built-in relation function:

ic edge_x_is_a_function {
    function(edge_x)
}

This will check that the initial element A of the tuple in the relation edge_x maps to only one value for each B.

Following the Company example, you can express the mandatory role constraint “A Person has at most one Name” in Rel like this:

ic person_has_one_name(pers, name1, name2) {
    has_name(pers, name1) and has_name(pers, name2)
    implies
    name1 = name2
}

🔎

Combining a mandatory and a functional dependency constraint expresses that a Person has exactly one Name.

In ORM, a functional dependency constraint is illustrated by a purple bar above the first role box.

ORM graph schema - uniqueness constraint

Exclusive-or Subtype Constraints

You can use the keyword xor to express exclusive or subtype constraints in Rel, as shown in the following example:

ic xor_subtype(x) {
    Supertype(x)
    implies
    Subtype1(x) xor Subtype2(x)
}

🔎

This IC means ”Subtype1 and Subtype2 are mutually exclusive”.

Following the Company example, you can express the exclusive or subtype constraint ”Team and Department are mutually exclusive” like this:

ic xor_suborg(x) {
    Suborganization(x)
    implies
    Department(x) xor Team(x)
}

In ORM, an exclusive or subtype constraint is illustrated by a dot over an “X” in a large circle.

💡

The subtype symbol above is linked to purple arrows. A purple arrow links two nodes that have a subtype relation.

Constraints Between Edges

Constraints can also involve multiple edge relations, allowing you to express more complex dependencies. Common examples are value-comparison and subset constraints, which are discussed in more detail below.

Value-Comparison Constraints

The following example shows how to express value-comparison constraints in Rel:

ic edge_x_sup_edge_y(a, b1, b2) {
    edge_x(a, b1) and edge_y(a, b2)
    implies
    b1 > b2
}

🔎

This IC means “the value of B for edge_x is superior to the value of B for edge_y“.

Following the Company example, you can express the value-comparison constraint “A Person must be employed_since a date that is more recent than the date they were born, born_on” in Rel as follows:

ic employs_since_sup_born_on(pers, dt1, dt2) {
    employs_since(pers, dt1) and born_on(pers, dt2)
    implies
    dt1 > dt2
}

In ORM, a value-comparison constraint is illustrated by a comparison operator in a large circle.

ORM graph schema - comparison constraint

💡

The value-comparison arrow starts and points to role boxes on the same side of the edges the constraint imposes on.

Subset Constraints

The following example shows how to express subset constraints in Rel:

ic edge_y_subset_edge_x(a, b) {
    edge_y(a, b)
    implies edge_x(a, b)
}

🔎

This IC means “every pair (a, b) in edge_y is also a pair in edge_x“.

Following the Company example, you can express the subset constraint “A Person who has been employed_since a Date at a Company is a Person employed at that Company” in Rel like this:

ic emp_since_sub_employs(comp, pers) {
  employs_since(comp, pers, _)
  implies employs(comp, pers)
}

The underscore (_) indicates that the last element of the tuple for employs_since may contain any value. The expression inside the braces effectively means “the set of pairs from the first two columns of employs_since is a subset of the set of pairs in employs.”

A subset constraint is symbolized by a ⊆ symbol in a large circle.

💡

The subset arrows goes from and to a junction of two role boxes. This shows that the constraint is on that pair of nodes.

Example

The ORM diagram below represents the complete schema for the Company example.

In order to express this schema in Rel, follow the steps presented at the beginning of this guide. The schema is expressed within a module called CompanyGraph.

First you define the data types for the value node Name and the entity nodes Company, Person, Suborganization, Department, and Team. Note that the value node Date is not defined at this stage. This is because Date is a Rel data type.

Next you define the data schema for the hyperedge employs_since and the edges has_name, born_on, is_member_of, employs, and has_suborg.

Finally, you express the semantic constraints between nodes and between edges.

// model
 
module CompanyGraph
    from ::std::datetime import Date
 
    // STEP 1 - Data types of nodes
    // Entity nodes
    entity type Company = String
    bound Company = Entity
 
    entity type Person = String
    bound Person = Entity
 
    entity type Suborganization = String
    bound Suborganization = Entity
 
    entity type Department = String
    bound Department = Entity
 
    entity type Team = String
    bound Team = Entity
 
    // Value nodes
    value type Name = String
 
    // STEP 2 - Data schema of edges and hyperedges
    // Edge / Node attribute
    bound has_name = Person, Name
 
    bound born_on = Person, Date
 
    bound is_member_of = Person, Suborganization
 
    bound employs = Company, Person
 
    bound has_suborg = Company, Suborganization
 
    // Hyperedge / Edge attribute
    bound employs_since = Company, Person, Date
 
    // STEP 3 - Semantic constraints
    // Mandatory constraints
    ic company_must_have_person(comp) {
        Company(comp)
        implies
        exists(pers: Person(pers) and employs(comp, pers))
    }
 
    ic person_must_have_name(pers) {
        Person(pers)
        implies
        exists(name: Name(name) and has_name(pers, name))
    }
 
    // Functional dependency constraints
    ic person_has_one_name(pers, name1, name2) {
        has_name(pers, name1) and has_name(pers, name2)
        implies
        name1 = name2
    }
 
    ic person_has_one_DOB(pers, dt1, dt2) {
        born_on(pers, dt1) and born_on(pers, dt2)
        implies
        dt1 = dt2
    }
 
    ic company_person_has_one_hiring_date(comp, pers, dt1, dt2) {
        employs_since(comp, pers, dt1) and employs_since(comp, pers, dt2)
        implies
        dt1 = dt2
    }
 
    // Exclusive or subtype constraint
    ic team_dept_exclor(x) {
       Suborganization(x)
       implies
       Team(x) xor Department(x)
    }
 
    // Value-comparisons constraint
    ic employs_since_sup_born_on(comp, pers, dt1, dt2) {
        employs_since(comp, pers, dt1) and born_on(pers, dt2)
        implies
        dt1 > dt2
    }
 
    // Subset constraint
    ic emp_since_sub_employs(comp, pers) {
        employs_since(comp, pers, _)
        implies employs(comp, pers)
    }
 
end

Summary

Defining the schema of a relational knowledge graph involves three steps:

Define the data types for the entity and value nodes of the relational knowledge graph.
Define the schema rules creating the edges and hyperedges of the relational knowledge graph.
Express logical constraints between nodes and between edges.

Graph Schema

Introduction

Defining the Data Types

Defining the Data Schema

Expressing Semantic Constraints

Constraints Between Nodes

Mandatory Role Constraints

Functional Dependency Constraints

Exclusive-or Subtype Constraints

Constraints Between Edges

Value-Comparison Constraints

Subset Constraints

Example

Summary

See Also