The Lehigh University Benchmark

This how-to guide demonstrates how to express reasoning using the Lehigh University Benchmark (LUBM).

Goal

The Lehigh University Benchmark (LUBM) is a popular benchmark in the Web Ontology Language (OWL) domain designed to test the reasoning capabilities of a system. In this guide, you’ll learn how to express OWL constructs such as classes and properties, as well as advanced concepts like hierarchies and equivalent classes, in Rel.

You’ll build a knowledge graph of the data in the LUBM and use Rel to:

Express concepts from the data as entity types and value types.
Model properties and hierarchies as relations in Graph Normal Form (GNF).
Express prescriptive reasoning concepts like equivalent classes or transitive properties.
Translate LUBM test queries from SPARQL to Rel.
Validate the query results using integrity constraints.

The Dataset

The LUBM defines a knowledge graph on synthetic data representing universities and their constituents, including students, faculty members, organizations, and departments.

For example:

Every university in the LUBM knowledge graph consists of 15 to 25 departments.
Each department has 7 to 10 full professors, one of whom is the head of the department, 10 to 14 associate professors, and 8 to 11 assistant professors.
Every faculty member teaches one or two courses.
Every department has 3 to 4 graduate students and 8 to 14 undergraduate students per faculty member.

The LUBM knowledge graph defines an ontology that describes the various entities in the data and how they relate to one another. The benchmark tests a system’s ability to express hierarchies of entities, transitive and inverse properties, and other ontological concepts. A detailed profile of the LUBM data is available here.

🔎

Note: In this how-to guide, you’ll work with data from just one of the universities in the LUBM dataset.

There are nine CSV files in the LUBM dataset:

University.csv.
Department.csv.
Faculty.csv.
ResearchGroup.csv.
Publication.csv.
Course.csv.
Graduatestudent.csv.
Undergraduatestudent.csv.
edges.csv.

The data for each entity class are stored in separate CSV files, such as Faculty.csv or Course.csv. Columns in the CSV files represent entity properties, such as a faculty member’s name or a student’s email address. Each CSV file for an entity class contains an id column. Entities in the LUBM dataset are identified with Uniform Resource Identifiers (URIs). The edges.csv file stores relationships between entities in the knowledge graph.

In the next section, you’ll import these CSV files into a RAI database.

Data Import

The following code declares a module called configs that contains configuration relations for each of the nine CSV files in the LUBM dataset:

// write query
 
def path = "s3://relationalai-documentation-public/lubm/sf0.1/"
 
module configs
    def edges:path = concat[path, "edges.csv"]
 
    def university:path = concat[path, "University.csv"]
    def university:syntax:header {
        (1, :id); (2, :name); (3, :type); (4, :uri)
    }
 
    def department:path = concat[path, "Department.csv"]
    def department:syntax:header {
        (1, :id); (2, :name); (3, :type); (4, :uri)
    }
 
    def faculty:path = concat[path, "Faculty.csv"]
    def faculty:syntax:header {
        (1, :id); (2, :name); (3, :email); (4, :type);
        (5, :telephone); (6, :uri); (7, :research)
    }
 
    def researchgroup:path = concat[path, "ResearchGroup.csv"]
    def researchgroup:syntax:header {
        (1, :id); (2, :type); (3, :uri)
    }
 
    def publication:path = concat[path, "Publication.csv"]
    def publication:syntax:header {
        (1, :id); (2, :name); (3, :type); (4, :uri)
    }
 
    def course:path = concat[path, "Course.csv"]
    def course:syntax:header {
        (1, :id); (2, :name); (3, :type); (4, :uri)
    }
 
    def graduate_student:path = concat[path, "Graduatestudent.csv"]
    def graduate_student:syntax:header {
        (1, :id); (2, :research_assistant); (3, :name);
        (4, :email); (5, :type); (6, :telephone); (7, :uri)
    }
    def graduate_student:schema = (:research_assistant, "boolean")
 
    def undergraduate_student:path = concat[path, "Undergraduatestudent.csv"]
    def undergraduate_student:syntax:header {
        (1, :id); (2, :research_assistant); (3, :name);
        (4, :email); (5, :type); (6, :telephone); (7, :uri)
    }
    def undergraduate_student:schema = (:research_assistant, "boolean")
end
 
def insert:lubm_csv[config] = load_csv[configs[config]]

Each configuration relation in the configs module defines the path to and the header information for each CSV file in the LUBM dataset. For instance, the configs:undergraduate_student relation sets the path to the Undergraduatestudent.csv file, defines the order and names of seven fields in the CSV file header, and sets the datatype for the :research_assistant field to "boolean".

The load_csv[] relation loads the CSV data, and insert stores the data in a base relation called lubm_csv. This relation is in GNF: Values are indexed by the name, column, and row number of their corresponding CSV file.

For example, here are the first three rows of CSV data for the University class as they appear in the lubm_csv:university relation:

// model
 
@inline
def top_csv[n, CSV](col, row, val) {
    CSV(col, row, val)
    and top[n, second[CSV]](_, row)
}

// read query
 
def output = top_csv[3, lubm_csv:university]

🔎

Note: The top_csv relation is installed and persisted in the database so that it can be reused in subsequent queries.

The relation top_csv mimics top in the Standard Library. Instead of returning the top n tuples, however, top_csv returns all tuples corresponding to the top n rows of the CSV file.

The table structure of the CSV file can be viewed using the table relation:

// read query
 
def output = ::std::display::table[top_csv[3, lubm_csv:university]]
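
To make the GNF layout concrete, here is a minimal Python sketch, written outside of Rel, of how one CSV table decomposes into (column, row, value) triples and how a top_csv-style filter keeps whole rows. The sample rows and the helper names to_gnf and top_rows are hypothetical, and the simple row counter stands in for whatever row identifier the loader actually assigns.

```python
import csv
import io

# Hypothetical sample in the shape of University.csv (four columns, no header row).
SAMPLE = (
    "http://www.University0.edu,University0,University,http://www.University0.edu\n"
    "http://www.University1.edu,University1,University,http://www.University1.edu\n"
)
HEADER = ["id", "name", "type", "uri"]


def to_gnf(text, header):
    """Return a set of (column, row_number, value) triples, one per CSV cell."""
    triples = set()
    for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        for column, value in zip(header, row):
            triples.add((column, row_number, value))
    return triples


def top_rows(n, triples):
    """Keep every triple that belongs to the first n rows, in the spirit of top_csv."""
    keep = sorted({row for _, row, _ in triples})[:n]
    return {t for t in triples if t[1] in keep}


gnf = to_gnf(SAMPLE, HEADER)
print(sorted(top_rows(1, gnf)))
```

Each cell becomes its own fact, so a missing value is simply an absent triple rather than a null.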

Now that you’ve imported the LUBM data, the next step is to represent the LUBM ontology as a knowledge graph.

Ontology

OWL ontologies have a universal class called Thing to which all entities belong. You can encode this in Rel by defining an entity type called Thing inside of a module called LUBM:

// model
 
module LUBM
    entity type Thing = String
end

Instances of Thing are constructed via strings. Entities in the LUBM are identified with URIs, so it makes sense to use the URI string. For instance, ^Thing["http://www.University0.edu"] creates a Thing instance representing the university with URI http://www.University0.edu.

⚠

Note: In this guide, you’ll add more relations to the LUBM module with multiple module declarations. This helps facilitate discussion of the code. In practice, however, you would typically write the LUBM module in a single declaration.

In order to model the LUBM ontology in Rel, you need to transform the lubm_csv data into relations involving Thing instances.

Data Transformation

In this section, you’ll write a template module called Transform that transforms the data in the lubm_csv base relation into relations involving entity and value types.

Entities

The Transform:filename_to_entity relation builds Thing instances from the URI strings in the id column for a given CSV file:

// model
 
@outline
module Transform[CSV]
    def filename_to_entity(csv_name, e) {
        e = LUBM:^Thing[CSV[csv_name, :id, _]]
    }
end

⚠

Important: The Transform module is parameterized so that it can be used on any relation with a structure similar to lubm_csv. This means that the Transform module must be annotated with the @outline annotation.

The relation filename_to_entity relates entities to the name of the CSV file that their URI came from. In other words, filename_to_entity:university is the set of all entities from the University.csv file:

// read query
 
@inline
def T = Transform[lubm_csv]
 
def output = top[5, T:filename_to_entity:university]

You can use filename_to_entity to create a Thing entity relation containing the entity hashes for every URI in the LUBM dataset:

// model
 
module LUBM
    with Transform[lubm_csv] use filename_to_entity
 
    def Thing(e) { filename_to_entity(_, e) }
end

Next, you’ll transform the values from the columns of each CSV file into value types and assign them to the right entity.

Properties

Each entity corresponds to a row of data in one of the LUBM CSV files. Each column in the row represents a property of that entity. For example, University entities have :id, :name, :type, and :uri properties.

The Transform:assign_property relation assigns entities to values for their properties:

// model
 
@outline
module Transform[CSV]
    @ondemand
    def assign_property[property_name, PropertyType](e, v) {
        exists(csv_name, property_val, filepos, id :
            CSV(csv_name, property_name, filepos, property_val)
            and CSV(csv_name, :id, filepos, id)
            and LUBM:^Thing(id, e)
            and PropertyType(property_val, v)
        )
    }
end

This property assignment not only associates a property value with its entity, but also gives the property value a semantic meaning. It does so by first converting the property value into a value type via v = PropertyType[property_val]. Later in this guide, you’ll define value types for the various properties that entities can have.

Here’s an example of how to use assign_property to give the URI values semantic meaning and assign them to their corresponding entities:

// read query
 
@inline
def T = Transform[lubm_csv]
 
value type Uri = String
def has_uri = T:assign_property[:uri, ^Uri]
 
def output = top[5, has_uri]

The relation assign_property is useful for transforming string- and numeric-valued properties into value types, but it isn’t ideal for properties with boolean values.

Boolean Properties

The :research_assistant property is boolean. It can either have the value boolean_true or boolean_false. It’s idiomatic in Rel to store only facts that are true, rather than define a boolean value type and track true and false values.

To facilitate this, the Transform:assign_boolean_property relation contains the entity keys only for entities that have the property with the name property_name:

// model
 
@outline
module Transform[CSV]
    @ondemand
    def assign_boolean_property(property_name, e) {
        exists(csv_name, filepos, id:
            CSV(csv_name, property_name, filepos, boolean_true)
            and CSV(csv_name, :id, filepos, id)
            and LUBM:^Thing(id, e)
        )
    }
end

For example, the following query displays the hashes of five entities with the :research_assistant property:

// read query
 
@inline
def T = Transform[lubm_csv]
 
def output = top[5, T:assign_boolean_property:research_assistant]
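
The idiom of storing only true facts can be illustrated outside of Rel with a minimal Python sketch. The rows and the Python helper below are hypothetical stand-ins that borrow the name assign_boolean_property for readability; they are not the LUBM data or a Rel API.

```python
# Hypothetical rows: (entity id, value of the research_assistant column).
ROWS = [
    ("http://example.org/GraduateStudent0", True),
    ("http://example.org/GraduateStudent1", False),
    ("http://example.org/GraduateStudent2", True),
]


def assign_boolean_property(rows):
    """Keep only the entities whose flag is true; false rows are simply absent."""
    return {entity for entity, flag in rows if flag}


research_assistants = assign_boolean_property(ROWS)

# Membership answers the boolean question; no separate set of false values is stored.
print("http://example.org/GraduateStudent1" in research_assistants)
```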

Now that you have relations for transforming data into entity and value types, you can write a relation to link data together with edges.

Edges

The final relation to install in the Transform module transforms the data in the edges CSV file into edges in the LUBM knowledge graph:

// model
 
@outline
module Transform[CSV]
    def make_edge[edge_name](e_source, e_target) {
        exists(source_id, target_id, filepos:
            CSV:edges(:TYPE, filepos, edge_name)
            and CSV:edges(:SOURCE, filepos, source_id)
            and CSV:edges(:TARGET, filepos, target_id)
            and LUBM:^Thing(source_id, e_source)
            and LUBM:^Thing(target_id, e_target)
        )
    }
end

Edges in the lubm_csv relation are defined with two tuples:

(:SOURCE, filepos, source_id) tuples define the source ID of an edge. filepos is the row number in the CSV file and source_id is the URI of the source entity.
(:TARGET, filepos, target_id) tuples define the target ID for an edge.

A :SOURCE tuple and a :TARGET tuple with the same filepos describe the same edge, which points from the source entity towards the target entity.

Edges have names, which are stored in the :TYPE column of the edges CSV file. For example, "takesCourse" edges relate student entities to course entities for courses in which the student is enrolled:

// read query
 
@inline
def T = Transform[lubm_csv]
 
def output = top[5, T:make_edge["takesCourse"]]
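
The way make_edge pairs up tuples that share a filepos can be sketched in Python, outside of Rel. This is only an illustration under stated assumptions: the EDGES rows and the example.org identifiers are hypothetical, and the Python function merely mirrors the join; it skips the step that converts IDs into Thing entities.

```python
# Hypothetical rows in the (column, filepos, value) layout of lubm_csv:edges.
EDGES = {
    ("TYPE", 1, "takesCourse"),
    ("SOURCE", 1, "http://example.org/student0"),
    ("TARGET", 1, "http://example.org/course0"),
    ("TYPE", 2, "advisor"),
    ("SOURCE", 2, "http://example.org/student0"),
    ("TARGET", 2, "http://example.org/professor0"),
}


def make_edge(edges, edge_name):
    """Join TYPE, SOURCE, and TARGET tuples on filepos for one edge name."""
    by_column = {}
    for column, filepos, value in edges:
        by_column.setdefault(column, {})[filepos] = value
    sources = by_column.get("SOURCE", {})
    targets = by_column.get("TARGET", {})
    return {
        (sources[pos], targets[pos])
        for pos, name in by_column.get("TYPE", {}).items()
        if name == edge_name and pos in sources and pos in targets
    }


print(make_edge(EDGES, "takesCourse"))
```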

In Edge Definitions, you’ll use make_edge to link the LUBM data together. But first, you must define the various entities and properties that make up the nodes of the LUBM knowledge graph.

Entity Definitions

Everything to do with the LUBM knowledge graph is declared in a module called LUBM. You can start by defining entity relations for each of the entity classes in the LUBM dataset:

// model
 
module LUBM
    with Transform[lubm_csv] use filename_to_entity
 
    def GraduateStudent = filename_to_entity[:graduate_student]
    def UndergraduateStudent = filename_to_entity[:undergraduate_student]
    def Course = filename_to_entity[:course]
    def University = filename_to_entity[:university]
    def Department = filename_to_entity[:department]
    def Publication = filename_to_entity[:publication]
    def Faculty = filename_to_entity[:faculty]
    def ResearchGroup = filename_to_entity[:researchgroup]
end

Every entity relation in LUBM is a subset of Thing:

// read query
 
def output = subset[LUBM:GraduateStudent, LUBM:Thing]

Each entity relation also represents a distinct class of entities. In other words, the entity relations in LUBM are disjoint:

// read query
 
def output = disjoint[LUBM:GraduateStudent, LUBM:UndergraduateStudent]

The next step is to pair entities in LUBM with their properties.

Property Definitions

In Data Import you defined a configs module with the header syntax for each CSV file in the LUBM dataset.

There are eight different properties that entities may have:

:id.
:type.
:uri.
:name.
:email.
:telephone.
:research.
:research_assistant.

Add seven value types to the LUBM module to represent all of the properties, except :research_assistant. You can also define seven property relations that pair entity hashes with the values of their properties:

// model
 
module LUBM
    with Transform[lubm_csv] use assign_property
 
    value type Id = String
    value type Type = String
    value type Uri = String
    value type Name = String
    value type Email = String
    value type Telephone = String
    value type Research = String
 
    def has_id = assign_property[:id, ^Id]
    def has_type = assign_property[:type, ^Type]
    def has_uri = assign_property[:uri, ^Uri]
    def has_name = assign_property[:name, ^Name]
    def has_email = assign_property[:email, ^Email]
    def has_telephone = assign_property[:telephone, ^Telephone]
    def researches = assign_property[:research, ^Research]
end

Here, :research_assistant is a boolean property, so you can use the assign_boolean_property relation to get all of the entity hashes for which the property is boolean_true:

// model
 
module LUBM
    with Transform[lubm_csv] use assign_boolean_property
 
    def ResearchAssistant = assign_boolean_property[:research_assistant]
end

Just like the LUBM entity relations, ResearchAssistant is a subset of Thing:

// read query
 
def output = subset[LUBM:ResearchAssistant, LUBM:Thing]

However, ResearchAssistant isn’t disjoint from all of the other entity relations. For example, some GraduateStudent entities are also members of ResearchAssistant:

// read query
 
def grad_research_assistants = intersect[LUBM:GraduateStudent, LUBM:ResearchAssistant]
def output = top[5, grad_research_assistants]

You’ll add more hierarchical relationships between entities later in this guide.

Edge Definitions

You can use the Transform:make_edge relation to define edges in the LUBM graph:

// model
 
module LUBM
    with Transform[lubm_csv] use make_edge
 
    def takes_course = make_edge["takesCourse"]
    def member_of = make_edge["memberOf"]
    def sub_organization_of = make_edge["subOrganizationOf"]
    def undergraduate_degree_from = make_edge["undergraduateDegreeFrom"]
    def graduate_degree_from = make_edge["mastersDegreeFrom"]
    def phd_degree_from = make_edge["doctoralDegreeFrom"]
    def publication_author = make_edge["publicationAuthor"]
    def works_for = make_edge["worksFor"]
    def advisor = make_edge["advisor"]
    def head_of = make_edge["headOf"]
    def teacher_of = make_edge["teacherOf"]
end

Each edge relation is a binary relation named after the type of edge in the LUBM dataset. For example, the takes_course relation contains takesCourse edges of the form (e1, e2) where e1 is an entity hash representing a student who takes the course represented by the e2 hash:

// read query
 
def output = top[5, LUBM:takes_course]

Entity hashes aren’t very useful to humans. In the next section, you’ll see how to display user-friendly strings for entities and value types.

Displaying Entities

When you inspect query results, it’s more useful to see string identifiers for entities, rather than their hash. To that end, you can define a show relation in the LUBM module for displaying entity and value types:

// model
 
module LUBM
    def show[e in LUBM:Thing](str) { LUBM:has_uri(e, LUBM:^Uri[str]) }
    def show[name in LUBM:Name](str) { LUBM:^Name(str, name) }
    def show[email in LUBM:Email](str) { LUBM:^Email(str, email) }
    def show[tel in LUBM:Telephone](str) { LUBM:^Telephone(str, tel) }
end

Here, show is split into four definitions. The first maps an entity hash in LUBM:Thing to its URI in the LUBM dataset. The remaining definitions for show map Name, Email, and Telephone values to the strings used to construct them.

🔎

Note: There are seven value types defined in the LUBM module, but only three of them — Name, Email, and Telephone — are handled by the show relation.

You may want to write a show definition for every entity and value type in your model. In this guide, show is only defined for the value types used in the LUBM test queries.

For example, the following query displays the email addresses of five entities:

// read query
 
def output(uri, email) {
    exists(e, v:
        top[5, LUBM:has_email](_, e, v)
        and LUBM:show(e, uri)
        and LUBM:show(v, email)
    )
}

Now that all of the entities, properties, and edges have been implemented, you can apply the rules related to the reasoning capabilities that the LUBM tests.

Reasoning

In OWL, reasoning is the process of inferring new facts about an individual based on the ontology and the data. One of the main goals of the LUBM benchmark is to test the reasoning capability of the underlying system.

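This section shows how some of the main reasoning requirements of the LUBM ontology can be realized in Rel. In particular, it covers entity hierarchies, equivalent classes, edge hierarchies, transitive properties, and inverse properties.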

Entity Hierarchies

Every entity in the LUBM is a member of Thing. In other words, classes of entities like UndergraduateStudent and Course are all subentities of Thing.

There are many such hierarchies in the LUBM ontology. For example, entities in the Faculty class have a :type property with four possible values:

// read query
 
def output = LUBM:Faculty . LUBM:has_type

You can define new subentities based on these types:

// model
 
module LUBM
    def AssistantProfessor(e) { Faculty(e) and has_type(e, ^Type["AssistantProfessor"]) }
    def AssociateProfessor(e) { Faculty(e) and has_type(e, ^Type["AssociateProfessor"]) }
    def FullProfessor(e) { Faculty(e) and has_type(e, ^Type["FullProfessor"]) }
    def Lecturer(e) { Faculty(e) and has_type(e, ^Type["Lecturer"]) }
end

Here are the URIs of some AssistantProfessor entities:

// read query
 
def output(uri) {
    exists(e:
        top[5, LUBM:AssistantProfessor](_, e)
        and uri = LUBM:show[e]
    )
}

The following are other hierarchies from the LUBM ontology that need to be modeled:

University, Department, and ResearchGroup are subentities of Organization.
Student and Faculty are subentities of Person.
AssistantProfessor, AssociateProfessor, and FullProfessor are subentities of Professor.

In Rel, you can model these three hierarchies as the union of previously defined entity relations:

// model
 
module LUBM
    def Organization = University; Department; ResearchGroup
    def Person = GraduateStudent; UndergraduateStudent; Faculty
    def Professor = AssistantProfessor; AssociateProfessor; FullProfessor
end

🔎

Here, ; acts as the union operator. For more information about ; see Semicolon in the Rel Reference manual.

Organization is defined as the union of University, Department, and ResearchGroup. It contains all the universities, departments, and research groups defined earlier. Person and Professor are defined similarly.

Equivalent Classes

In OWL, some classes are defined with necessary and sufficient conditions. These classes are called equivalent classes. In the LUBM ontology, there are two equivalent classes needed for answering the LUBM queries:

Every UndergraduateStudent is a Student, and any Person who takes_course is also a Student.
A Chair is any Person who is the head_of a Department.

You can model the Student and Chair classes in Rel as follows:

// model
 
module LUBM
    def Student(e) {
        UndergraduateStudent(e)
        or (Person(e) and takes_course(e, _))
    }
 
    def Chair(e) { Person(e) and head_of(e, _) }
end

Here are the URIs of five entities in the Chair class:

// read query
 
def output(uri) {
    exists(e:
        top[5, LUBM:Chair](_, e)
        and LUBM:show(e, uri)
    )
}

Edge Hierarchies

Just like entity relations, edge relations may have a hierarchy. For instance, the LUBM ontology defines a superproperty called degree_from that contains all of the undergraduate_degree_from, graduate_degree_from, and phd_degree_from edges.

You can model this in Rel as the union of the three edge relations:

// model
 
module LUBM
    def degree_from {
        undergraduate_degree_from;
        graduate_degree_from;
        phd_degree_from
    }
end

The LUBM ontology also defines some subproperties. The two subproperties needed for the LUBM test queries are:

works_for, which is a subproperty of member_of.
head_of, which is a subproperty of works_for.

This makes sense, because a person who works_for an organization is also a member_of that organization, and the person who is the head_of an organization also works_for that organization.

Here’s how to model this in Rel:

// model
 
module LUBM
    def member_of = works_for
    def works_for = head_of
end

⚠

Important: In Rel, = doesn’t work as an assignment operator like it does in many other languages. In this context, def member_of = works_for states that member_of contains works_for, but doesn’t guarantee the converse. The same observation applies to def works_for = head_of. See Multiple Definitions in the Rel Reference manual for more details.
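
The effect of these additional definitions can be pictured with plain sets. The following Python sketch is an analogy only, with hypothetical placeholder tuples: each extra definition contributes tuples to a relation and never removes any. Unlike Rel, which is declarative, Python needs works_for to be computed before member_of.

```python
# Hypothetical base data, standing in for the tuples loaded from the edges file.
head_of = {("prof0", "dept0")}
works_for_base = {("prof1", "dept0")}
member_of_base = {("student0", "dept0")}

# Analogue of `def works_for = head_of`: add head_of tuples to works_for.
works_for = works_for_base | head_of

# Analogue of `def member_of = works_for`: add works_for tuples to member_of.
member_of = member_of_base | works_for

print(sorted(member_of))
```

Here member_of contains every works_for tuple, but works_for does not contain the student0 tuple, which matches the one-directional containment described above.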

In Data Consistency, you’ll add integrity constraints that enforce this hierarchy on any new data.

Transitive Properties

If organization A is a sub_organization_of organization B, and organization B is a sub_organization_of organization C, then organization A should also be a sub_organization_of organization C. That is, the sub_organization_of relation should be transitive, and indeed the LUBM ontology requires this.

While the CSV data do contain sub_organization_of, they don’t contain all of the edges required for sub_organization_of to be transitive. You can see this when you check the number of tuples in sub_organization_of:

// read query
 
def output = count[LUBM:sub_organization_of]

You can infer which tuples are missing and add them to sub_organization_of using Rel’s dot join operator:

// model
 
module LUBM
    def sub_organization_of(x, y) {
        (sub_organization_of.sub_organization_of)(x, y)
    }
end

Checking the count of tuples in sub_organization_of again reveals that the number of tuples has nearly doubled:

// read query
 
def output = count[LUBM:sub_organization_of]
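
As a rough illustration outside of Rel, the following Python sketch computes the same kind of closure by joining a relation with itself until no new pairs appear. The three-level chain is hypothetical sample data, and the loop is one simple way to reach the fixpoint, not a description of how Rel evaluates the recursive definition.

```python
def transitive_closure(edges):
    """Repeatedly join the relation with itself until no new pairs appear."""
    closure = set(edges)
    while True:
        # Analogue of sub_organization_of.sub_organization_of: join on the middle value.
        derived = {(x, z) for (x, y) in closure for (w, z) in closure if y == w}
        if derived <= closure:
            return closure
        closure |= derived


# Hypothetical chain: research group -> department -> university.
base = {("group0", "dept0"), ("dept0", "univ0")}
closed = transitive_closure(base)
print(len(base), len(closed))
```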

Inverse Properties

The LUBM ontology defines a property has_alumnus that links University entities with Person entities that earned a degree from that University.

The relation has_alumnus isn’t one of the edge types in the edges.csv file you imported earlier. You must infer it from the degree_from property described in the data. In particular, has_alumnus is the inverse of degree_from. That is, if (person, university) is a tuple in degree_from, then (university, person) is a tuple in has_alumnus.

You can accomplish this in Rel using the built-in transpose relation:

// model
 
module LUBM
    def has_alumnus = transpose[degree_from]
end
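
For readers less familiar with relational inverses, here is a minimal Python sketch of the same idea. The transpose function below is a hypothetical Python helper that mimics the behavior described for binary relations, and the sample pairs are placeholders rather than LUBM data.

```python
def transpose(pairs):
    """Swap the two positions of every pair in a binary relation."""
    return {(b, a) for (a, b) in pairs}


# Hypothetical degree_from pairs of the form (person, university).
degree_from = {("alice", "univ0"), ("bob", "univ1")}
has_alumnus = transpose(degree_from)

print(sorted(has_alumnus))
```

Transposing twice returns the original relation, which is why either property can be derived from the other.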

Data Consistency

The rules specified in the LUBM module describe the LUBM ontology, but there’s nothing in place to ensure that future changes to the data maintain this description. You can enforce the ontology on all database transactions by installing integrity constraints in the database.

⚠

Warning: You can install integrity constraints in the LUBM module because it’s not parameterized. Integrity constraints in parameterized modules are currently not supported.

For example, the following integrity constraint ensures that the degree_from relation maps Person entities to University entities:

// model
 
module LUBM
    ic degree_from_type(e1, e2) {
        degree_from(e1, e2)
        implies
        Person(e1) and University(e2)
    }
end

Here, degree_from_type checks that all pairs (e1, e2) in the degree_from relation contain a Person entity in the first position and a University entity in the second position. However, this integrity constraint doesn’t ensure that degree_from only contains tuples of length two.

🔎

Note: To learn more about writing integrity constraints, including how to enforce a relation’s arity, see the Integrity Constraints concept guide.

You can also write an integrity constraint to ensure that has_alumnus is always the inverse of degree_from:

// model
 
module LUBM
    ic degree_from_inverse(person, university) {
        degree_from(person, university)
        iff
        has_alumnus(university, person)
    }
end

The keyword iff is short for “if and only if.” It’s used here to check that for every pair in degree_from the inverse pair exists in has_alumnus and vice versa.

You can also ensure that head_of is a subset of works_for:

// model
 
module LUBM
    ic head_works_for(person, organization) {
        head_of(person, organization)
        implies
        works_for(person, organization)
    }
end

The three integrity constraints described here illustrate how to enforce rules on every transaction. They do not completely enforce the LUBM ontology. With the right integrity constraints, though, you can guarantee that every transaction maintains the structure described by the LUBM ontology.
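
One way to think about the implies and iff constraints above is as queries that must return no violating tuples. The Python sketch below illustrates that reading with plain sets. The helper names and the sample data are hypothetical, and the sketch says nothing about how the database actually evaluates integrity constraints.

```python
def implies_violations(premise, conclusion):
    """Tuples for which the premise holds but the conclusion does not."""
    return premise - conclusion


def iff_violations(left, right):
    """Tuples that appear on exactly one side of the equivalence."""
    return left ^ right


# Hypothetical data with one deliberate inconsistency (bob).
degree_from = {("alice", "univ0")}
has_alumnus = {("univ0", "alice"), ("univ1", "bob")}
head_of = {("carol", "dept0")}
works_for = {("carol", "dept0"), ("dave", "dept0")}

# degree_from(person, university) iff has_alumnus(university, person)
inverse_of_alumnus = {(person, univ) for (univ, person) in has_alumnus}
print(iff_violations(degree_from, inverse_of_alumnus))

# head_of(person, organization) implies works_for(person, organization)
print(implies_violations(head_of, works_for))
```

An empty result means the constraint holds. A nonempty result lists the tuples that would cause a transaction to be rejected.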

Test Queries

The original 14 LUBM test queries are specified in SPARQL. In this section, you’ll write the test queries in Rel.

Each query is presented with its original explanation quoted from the LUBM SPARQL queries document. The original SPARQL query is shown here so that it can be compared to Rel.

Query 1 presents two ways to translate the SPARQL query into Rel. The first shows how to translate the SPARQL query line-by-line into Rel, and the second shows how to write the Rel query more idiomatically. The remaining queries present only the idiomatic translation.

Additionally, each query is visualized as a subgraph in the LUBM knowledge graph. The diagram for each query is adapted from the diagrams provided by the authors of the LUBM.

Query 1

This query bears large input and high selectivity. It queries just one class and one property and does not assume any hierarchy information or inference.

Query 1 retrieves graduate students (GraduateStudent) who take the course (takesCourse) with the ID "http://www.Department0.University0.edu/GraduateCourse0".

🔎

Note: In the description above, the terms GraduateStudent and takesCourse refer to their concepts in the LUBM ontology. You can model these in Rel with the LUBM:GraduateStudent and LUBM:takes_course relations.

In terms of the LUBM knowledge graph, Query 1 asks for nodes ?X that are of type GraduateStudent and are the source node in a takesCourse edge with a specific target entity. The following diagram illustrates this:

Here’s what the original SPARQL query looks like:

  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
  SELECT ?X
  WHERE
  {?X rdf:type ub:GraduateStudent .
  ?X ub:takesCourse
  <http://www.Department0.University0.edu/GraduateCourse0>}

The query returns the IDs for all GraduateStudent entities that satisfy the query.

In Rel, you can write Query 1 as follows:

// model
 
def answer:q1(x) {
    exists(course:
        LUBM:GraduateStudent(x)
        and LUBM:takes_course(x, course)
        and LUBM:has_id(course, LUBM:^Id["http://www.Department0.University0.edu/GraduateCourse0"])
    )
}

There are a few things to note about the Rel definition above:

The first three lines in the body of the definition are translated almost directly from the three lines of the WHERE clause in the SPARQL query.
The answer:q1 relation contains entity hashes, whereas the SPARQL query returns URI identifiers. Entity hashes are the preferred way to identify entities in Rel.
answer:q1 is installed so that you can compare the Rel results with the test query answer set that you imported in the Data Import section.

Here are the answers Rel computed for Query 1:

// read query
 
def output(uri) {
    exists(e:
        answer:q1(e)
        and LUBM:show(e, uri)
    )
}

You can write Query 1 more compactly in Rel, although the code no longer translates line-by-line from SPARQL. For example, you can bind the x parameter to LUBM:GraduateStudent in the definition’s header and use LUBM:^Thing to get the entity hash for the course:

// read query
 
// Alternative way to write Query 1 in Rel
def q1_alt(x in LUBM:GraduateStudent) {
    LUBM:takes_course(x, LUBM:^Thing["http://www.Department0.University0.edu/GraduateCourse0"])
}
 
def output(uri) {
    exists(e: q1_alt(e) and uri = LUBM:show[e])
}

You can see that q1_alt contains the same four GraduateStudent entities as answer:q1.

Query 2

This query increases in complexity: Three classes and three properties are involved. Additionally, there is a triangular pattern of relationships between the objects involved.

Query 2 retrieves the list of graduate students (GraduateStudent), universities (University), and departments (Department), such that the graduate student is a member of a department (memberOf), where this Department is a part of the University (subOrganizationOf) from which the graduate student obtained an undergraduate degree (undergraduateDegreeFrom).

The result of Query 2 is the set of all triples ?X, ?Y, and ?Z that have the following structure in the LUBM knowledge graph:

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X, ?Y, ?Z
    WHERE
    {?X rdf:type ub:GraduateStudent .
    ?Y rdf:type ub:University .
    ?Z rdf:type ub:Department .
    ?X ub:memberOf ?Z .
    ?Z ub:subOrganizationOf ?Y .
    ?X ub:undergraduateDegreeFrom ?Y}

You can write Query 2 in Rel as follows:

// model
 
def answer:q2(x in LUBM:GraduateStudent, y in LUBM:University, z in LUBM:Department) {
    LUBM:member_of(x, z)
    and LUBM:sub_organization_of(z, y)
    and LUBM:undergraduate_degree_from(x, y)
}
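
The triangular pattern in Query 2 can also be read as a three-way join. The Python sketch below shows that join on a hypothetical miniature graph. The identifiers are placeholders rather than LUBM URIs, and the function only illustrates the pattern; it is not a translation that Rel produces.

```python
# Hypothetical miniature graph; identifiers are placeholders, not LUBM URIs.
graduate_students = {"grad0", "grad1"}
universities = {"univ0", "univ1"}
departments = {"dept0", "dept1"}

member_of = {("grad0", "dept0"), ("grad1", "dept1")}
sub_organization_of = {("dept0", "univ0"), ("dept1", "univ1")}
undergraduate_degree_from = {("grad0", "univ0"), ("grad1", "univ0")}


def query2():
    """Return (student, university, department) triples that close the triangle."""
    return {
        (x, y, z)
        for (x, z) in member_of
        for (z2, y) in sub_organization_of
        if z == z2
        and x in graduate_students
        and y in universities
        and z in departments
        and (x, y) in undergraduate_degree_from
    }


print(query2())
```

In this sample, grad1 is excluded because the department they belong to is part of univ1, while their undergraduate degree is from univ0.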

Query 3

This query is similar to Query 1 but class Publication has a wide hierarchy.

Query 3 retrieves the list of publications (Publication), where the author (publicationAuthor) is "http://www.Department0.University0.edu/AssistantProfessor0".

The result of Query 3 is the set of all ?X that have the following structure in the LUBM knowledge graph:

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:Publication .
    ?X ub:publicationAuthor
        <http://www.Department0.University0.edu/AssistantProfessor0>}

Here’s how to write Query 3 in Rel:

// model
 
def answer:q3(x in LUBM:Publication) {
    LUBM:publication_author(x, LUBM:^Thing["http://www.Department0.University0.edu/AssistantProfessor0"])
}

Query 4

This query has small input and high selectivity. It assumes subClassOf relationship between Professor and its subclasses. Class Professor has a wide hierarchy. Another feature is that it queries about multiple properties of a single class.

Query 4 retrieves the ID, name, email address, and telephone of professors (Professor) who work for (worksFor) the department "http://www.Department0.University0.edu".

The result of Query 4 is the set of all ?X, ?Y1, ?Y2, and ?Y3 that have the following structure in the LUBM knowledge graph:

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X, ?Y1, ?Y2, ?Y3
    WHERE
    {?X rdf:type ub:Professor .
    ?X ub:worksFor <http://www.Department0.University0.edu> .
    ?X ub:name ?Y1 .
    ?X ub:emailAddress ?Y2 .
    ?X ub:telephone ?Y3}

Here’s how to write Query 4 in Rel:

// model
 
def answer:q4(x in LUBM:Professor, y1 in LUBM:Name, y2 in LUBM:Email, y3 in LUBM:Telephone) {
    LUBM:works_for(x, LUBM:^Thing["http://www.Department0.University0.edu"])
    and LUBM:has_name(x, y1)
    and LUBM:has_email(x, y2)
    and LUBM:has_telephone(x, y3)
}

Query 5

This query assumes subClassOf relationship between Person and its subclasses and subPropertyOf relationship between memberOf and its subproperties. Moreover, class Person features a deep and wide hierarchy.

Query 5 retrieves the list of people (Person) who are members of (memberOf) the department "http://www.Department0.University0.edu".

The result of Query 5 is the set of all ?X that have the following structure in the LUBM knowledge graph:

Here’s the SPARQL query:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:Person .
    ?X ub:memberOf <http://www.Department0.University0.edu>}

You can write the query above in Rel as follows:

// model
 
def answer:q5(p in LUBM:Person) {
    LUBM:member_of(p, LUBM:^Thing["http://www.Department0.University0.edu"])
}
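
To illustrate why Query 5 depends on both hierarchies, here is a minimal Python sketch rather than Rel code. The subproperty chain (headOf under worksFor under memberOf) follows the LUBM ontology, but the class hierarchy shown is a simplified assumption, and the individuals (`prof0`, `gs0`, `gs1`) and the helper `ancestors` are made up for the example.

```python
# Simplified, assumed class hierarchy: child class -> parent class.
SUB_CLASS_OF = {
    "FullProfessor": "Professor",
    "Professor": "Faculty",
    "Faculty": "Person",
    "GraduateStudent": "Person",
}
# Property hierarchy: child property -> parent property.
SUB_PROPERTY_OF = {"headOf": "worksFor", "worksFor": "memberOf"}


def ancestors(name, parent_of):
    """Return `name` together with everything reachable by following `parent_of`."""
    result = {name}
    while name in parent_of:
        name = parent_of[name]
        result.add(name)
    return result


# Hypothetical stated facts: only the most specific class and property are given.
stated_types = {"prof0": "FullProfessor", "gs0": "GraduateStudent", "dept0": "Department"}
stated_edges = {
    ("prof0", "headOf", "dept0"),
    ("gs0", "memberOf", "dept0"),
    ("gs1", "memberOf", "dept1"),
}

# Lift the stated facts along both hierarchies, then answer the query.
persons = {x for x, cls in stated_types.items() if "Person" in ancestors(cls, SUB_CLASS_OF)}
member_of = {(s, o) for (s, p, o) in stated_edges if "memberOf" in ancestors(p, SUB_PROPERTY_OF)}
q5 = {x for x in persons if (x, "dept0") in member_of}
print(sorted(q5))  # ['gs0', 'prof0']
```

Without the two hierarchies, `prof0` would be missed twice over: it is stated as a FullProfessor, not a Person, and its link to the department is stated as headOf, not memberOf.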

Query 6

This query involves only one class, but it assumes both the explicit subClassOf relationship between UndergraduateStudent and Student and the implicit one between GraduateStudent and Student. In addition, it has large input and low selectivity.

Query 6 retrieves all the individuals that are of class Student. The result of Query 6 is the set of all ?X that have the following structure in the LUBM knowledge graph:

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X WHERE {?X rdf:type ub:Student}

You can write the query above in Rel as follows:

// model
 
def answer:q6 = LUBM:Student

Query 7

This query is similar to Query 6 in terms of class Student but it increases in the number of classes and properties and its selectivity is high.

Query 7 retrieves the list of students (Student) and courses (Course) such that these students take the courses (takesCourse) and that these courses are taught by faculty "http://www.Department0.University0.edu/AssociateProfessor0".

The result of Query 7 is the set of all ?X and ?Y with the following structure in the LUBM knowledge graph:

Here’s the SPARQL query:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X, ?Y
    WHERE
    {?X rdf:type ub:Student .
    ?Y rdf:type ub:Course .
    ?X ub:takesCourse ?Y .
    <http://www.Department0.University0.edu/AssociateProfessor0>
        ub:teacherOf ?Y}

You can write the query above in Rel as follows:

// model
 
def answer:q7(x in LUBM:Student, y in LUBM:Course) {
    LUBM:takes_course(x, y)
    and LUBM:teacher_of(LUBM:^Thing["http://www.Department0.University0.edu/AssociateProfessor0"], y)
}

Query 8

This query is more complex than Query 7 because it includes one more property.

Query 8 retrieves the list of students (Student), departments (Department), and email addresses (emailAddress) of the students, such that the student is a member of the department (memberOf) and the department is a suborganization of (subOrganizationOf) the university "http://www.University0.edu".

The result of Query 8 is the set of all ?X, ?Y, and ?Z with the following structure in the LUBM knowledge graph:

Here’s what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X, ?Y, ?Z
    WHERE
    {?X rdf:type ub:Student .
    ?Y rdf:type ub:Department .
    ?X ub:memberOf ?Y .
    ?Y ub:subOrganizationOf <http://www.University0.edu> .
    ?X ub:emailAddress ?Z}

In Rel, you can write the query above as follows:

// model
 
def answer:q8(x in LUBM:Student, y in LUBM:Department, z in LUBM:Email) {
    LUBM:member_of(x, y)
    and LUBM:sub_organization_of(y, LUBM:^Thing["http://www.University0.edu"])
    and LUBM:has_email(x, z)
}

Query 9

Besides the aforementioned features of class Student and the wide hierarchy of class Faculty, like Query 2, this query is characterized by the most classes and properties in the query set and there is a triangular pattern of relationships.

Query 9 retrieves the list of students (Student), faculty (Faculty), and courses (Course) such that the students take courses (takesCourse) that are taught (teacherOf) by their advisors (advisor).

The result of Query 9 is the set of all ?X, ?Y, and ?Z with the following structure in the LUBM knowledge graph:

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X, ?Y, ?Z
    WHERE
    {?X rdf:type ub:Student .
    ?Y rdf:type ub:Faculty .
    ?Z rdf:type ub:Course .
    ?X ub:advisor ?Y .
    ?Y ub:teacherOf ?Z .
    ?X ub:takesCourse ?Z}

Here’s how to write the query above in Rel:

// model
 
def answer:q9(x in LUBM:Student, y in LUBM:Faculty, z in LUBM:Course) {
    LUBM:advisor(x, y)
    and LUBM:teacher_of(y, z)
    and LUBM:takes_course(x, z)
}

Query 10

This query differs from Queries 6, 7, 8, and 9 in that it only requires the (implicit) subClassOf relationship between GraduateStudent and Student; that is, the subClassOf relationship between UndergraduateStudent and Student does not add to the results.

Query 10 retrieves the list of the students (Student) who take (takesCourse) the course "http://www.Department0.University0.edu/GraduateCourse0". The result of Query 10 is the set of all ?X with the following structure in the LUBM knowledge graph:

Here’s the SPARQL query:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:Student .
    ?X ub:takesCourse
        <http://www.Department0.University0.edu/GraduateCourse0>}

You can write the query above in Rel as follows:

// model
 
def answer:q10(x in LUBM:Student) {
    LUBM:takes_course(x, LUBM:^Thing["http://www.Department0.University0.edu/GraduateCourse0"])
}

Query 11

Queries 11, 12, and 13 are intended to verify the presence of certain OWL reasoning capabilities in the system. In this query, property subOrganizationOf is defined as transitive. In the benchmark data, instances of ResearchGroup are stated as suborganizations of a Department individual, and the latter as a suborganization of a University individual. Answering this query therefore requires inferring the subOrganizationOf relationship between instances of ResearchGroup and University. Additionally, its input is small.

Query 11 retrieves the list of research groups (ResearchGroup) that are suborganizations of (subOrganizationOf) the university "http://www.University0.edu".

The result of Query 11 is the set of all ?X with the following structure in the LUBM knowledge graph:

You can write Query 11 in SPARQL as follows:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:ResearchGroup .
    ?X ub:subOrganizationOf <http://www.University0.edu>}

Here’s how to write Query 11 in Rel:

// model
 
def answer:q11(x in LUBM:ResearchGroup) {
    LUBM:sub_organization_of(x, LUBM:^Thing["http://www.University0.edu"])
}
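
The inference that Query 11 relies on can be sketched as a transitive closure. The following is a minimal Python illustration, not the guide's Rel definition; the identifiers (`group0`, `dept0`, `univ0`) and the helper `transitive_closure` are hypothetical.

```python
def transitive_closure(edges):
    """Return the smallest transitive relation that contains `edges`."""
    closure = set(edges)
    while True:
        # Join the relation with itself and keep only pairs not seen before.
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


# Hypothetical stated facts: the group-to-university edge is never stated.
stated = {("group0", "dept0"), ("dept0", "univ0")}
research_groups = {"group0"}

inferred = transitive_closure(stated)
q11 = {x for x in research_groups if (x, "univ0") in inferred}
print(q11)  # {'group0'}
```

Running the same query against `stated` alone returns an empty set, which is why a system without support for transitive properties cannot answer Query 11.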

Query 12

The benchmark data do not produce any instances of class Chair. Instead, each Department individual is linked to the chair professor of that department by property headOf. Hence this query requires realization, that is, the inference that a professor is an instance of class Chair because he or she is the head of a department. The input of this query is small as well.

Query 12 retrieves the list of chairs (Chair) and their departments (Department) such that the chair works for (worksFor) the department and the department is a suborganization of (subOrganizationOf) the university "http://www.University0.edu".

The result of Query 12 is the set of all ?X and ?Y with the following structure in the LUBM knowledge graph:

This is what Query 12 looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X, ?Y
    WHERE
    {?X rdf:type ub:Chair .
    ?Y rdf:type ub:Department .
    ?X ub:worksFor ?Y .
    ?Y ub:subOrganizationOf <http://www.University0.edu>}

In Rel, you can write Query 12 as follows:

// model
 
def answer:q12(x in LUBM:Chair, y in LUBM:Department) {
    LUBM:works_for(x, y)
    and LUBM:sub_organization_of(y, LUBM:^Thing["http://www.University0.edu"])
}
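
Realization can be illustrated with a small Python sketch, which is not the guide's Rel code. It assumes, as the paragraph above describes, that anyone who heads a department counts as a chair. All identifiers (`prof0`, `prof1`, `dept0`, `univ0`) are hypothetical.

```python
# Hypothetical stated facts; no individual is ever stated to be a Chair.
departments = {"dept0"}
head_of = {("prof0", "dept0")}
works_for = {("prof0", "dept0"), ("prof1", "dept0")}
sub_organization_of = {("dept0", "univ0")}

# Realization: derive class membership from a property the individual has.
chair = {p for (p, d) in head_of if d in departments}

q12 = {
    (x, y)
    for (x, y) in works_for
    if x in chair and y in departments and (y, "univ0") in sub_organization_of
}
print(q12)  # {('prof0', 'dept0')}
```

`prof1` also works for `dept0` but is excluded because nothing makes `prof1` a chair.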

Query 13

Property hasAlumnus is defined in the benchmark ontology as the inverse of property degreeFrom, which has three subproperties: undergraduateDegreeFrom, mastersDegreeFrom, and doctoralDegreeFrom. The benchmark data state a person as an alumnus of a university using one of these three subproperties instead of hasAlumnus. Therefore, this query assumes subPropertyOf relationships between degreeFrom and its subproperties, and also requires inference about inverseOf.

Query 13 retrieves the list of people (Person) who are alumni (hasAlumnus) of university "http://www.University0.edu".

The result of Query 13 is the set of all ?X with the following structure in the LUBM knowledge graph:

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:Person .
    <http://www.University0.edu> ub:hasAlumnus ?X}

Here’s how to write the query above in Rel:

// model
 
def answer:q13(x in LUBM:Person) {
    LUBM:has_alumnus(LUBM:^Thing["http://www.University0.edu"], x)
}
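
The two inferences behind Query 13 can be sketched in a few lines of Python. This is an illustration under made-up data, not the guide's Rel model: the people and universities (`alice`, `bob`, `carol`, `dave`, `univ0`, `univ1`) are hypothetical.

```python
# Hypothetical stated facts, using only the three subproperties.
undergraduate_degree_from = {("alice", "univ0")}
masters_degree_from = {("bob", "univ0"), ("carol", "univ1")}
doctoral_degree_from = {("dave", "univ1")}

# subPropertyOf: every edge of a subproperty is also a degreeFrom edge.
degree_from = undergraduate_degree_from | masters_degree_from | doctoral_degree_from

# inverseOf: hasAlumnus holds exactly when degreeFrom holds in the other direction.
has_alumnus = {(university, person) for (person, university) in degree_from}

q13 = {person for (university, person) in has_alumnus if university == "univ0"}
print(sorted(q13))  # ['alice', 'bob']
```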

Query 14

This query is the simplest in the test set. This query represents those with large input and low selectivity and does not assume any hierarchy information or inference.

Query 14 retrieves the list of all the undergraduate students (UndergraduateStudent).

The result of Query 14 is the set of all ?X with the following structure in the LUBM knowledge graph:

Here’s how to write Query 14 in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE {?X rdf:type ub:UndergraduateStudent}

You can write Query 14 in Rel as follows:

// model
 
def answer:q14(x) = LUBM:UndergraduateStudent(x)

Query Validation

In this section you’ll write Rel code that checks that the query results computed by the system match the answers provided in the LUBM benchmark. After you import the CSV files containing the benchmark answers into a base relation called answer_ref, you’ll perform the validation in two steps:

Convert the answer relation to Graph Normal Form (GNF) so that it matches the structure of the answer_ref base relation.
Confirm that all of the results match the reference answers using integrity constraints.

Data Import

The following declarations import the reference answers to the test queries and store them as a base relation called answer_ref:

// write query
 
def path_answer = "s3://relationalai-documentation-public/lubm/answers/"
 
module answer_configs
    def q1:path = concat[path_answer, "answers_query1.txt"]
    def q1:syntax:header = (1, :student_id)
 
    def q2:path = concat[path_answer, "answers_query2.txt"]
    def q2:syntax:header {
        (1, :student_id); (2, :university_id); (3, :department_id)
    }
    def q2:syntax:delim = '\t'
 
    def q3:path = concat[path_answer, "answers_query3.txt"]
    def q3:syntax:header = (1, :publication_id)
 
    def q4:path = concat[path_answer, "answers_query4.txt"]
    def q4:syntax:header {
        (1, :prof_id); (2, :prof_name); (3, :prof_email); (4, :prof_tele)
    }
    def q4:syntax:delim = '\t'
 
    def q5:path = concat[path_answer, "answers_query5.txt"]
    def q5:syntax:header = (1, :person_id)
 
    def q6:path = concat[path_answer, "answers_query6.txt"]
    def q6:syntax:header = (1, :student_id)
 
    def q7:path = concat[path_answer, "answers_query7.txt"]
    def q7:syntax:header = (1, :student_id); (2, :course_id)
    def q7:syntax:delim = '\t'
 
    def q8:path = concat[path_answer, "answers_query8.txt"]
    def q8:syntax:header {
        (1, :student_id); (2, :department_id); (3, :student_email)
    }
    def q8:syntax:delim = '\t'
 
    def q9:path = concat[path_answer, "answers_query9.txt"]
    def q9:syntax:header {
        (1, :student_id); (2, :faculty_id); (3, :course_id)
    }
    def q9:syntax:delim = '\t'
 
    def q10:path = concat[path_answer, "answers_query10.txt"]
    def q10:syntax:header = (1, :student_id)
 
    def q11:path = concat[path_answer, "answers_query11.txt"]
    def q11:syntax:header = (1, :researchgroup_id)
 
    def q12:path = concat[path_answer, "answers_query12.txt"]
    def q12:syntax:header = (1, :chair_id); (2, :dept_id)
    def q12:syntax:delim = '\t'
 
    def q13:path = concat[path_answer, "answers_query13.txt"]
    def q13:syntax:header = (1, :person_id)
 
    def q14:path = concat[path_answer, "answers_query14.txt"]
    def q14:syntax:header = (1, :undergraduate_student_id)
end
 
def insert:answer_ref[id] = load_csv[answer_configs[id]]

There’s a bit of a problem now, though. The answer_ref relation is in GNF, but the answer relation is not.

Data Normalization

Take a look at the structure of the answer_ref:q7 relation:

// read query
 
def output = top_csv[3, answer_ref:q7]

The output represents the first three rows of the reference answers CSV file. Each tuple corresponds to a single “cell” in the CSV file, indexed by its column name and row number. The first two elements of each tuple are the column name and row number. The third element is the value at that column and row position. Note that the row numbers begin with 2 because the first row — the header row — is schema, not data.

In order to compare the query results in Rel to the reference query answers, you need to transform the answer relation into the same structure as the answer_ref relation. To achieve this, first you define a relation answer_column_order, which captures the column names of all the queries.

// model
 
module answer_column_order
    def q1 = (:student_id, 1)
    def q2 = (:student_id, 1); (:university_id, 2); (:department_id, 3)
    def q3 = (:publication_id, 1)
    def q4 = (:prof_id, 1); (:prof_name, 2); (:prof_email, 3); (:prof_tele, 4)
    def q5 = (:person_id, 1)
    def q6 = (:student_id, 1)
    def q7 = (:student_id, 1); (:course_id, 2)
    def q8 = (:student_id, 1); (:department_id, 2); (:student_email, 3)
    def q9 = (:student_id, 1); (:faculty_id, 2); (:course_id, 3)
    def q10 = (:student_id, 1)
    def q11 = (:researchgroup_id, 1)
    def q12 = (:chair_id, 1); (:dept_id, 2)
    def q13 = (:person_id, 1)
    def q14 = (:undergraduate_student_id, 1)
end

Next, you define answer_gnf, which converts all the query results in the answer relation into GNF.

// model
 
def answer_gnf[q](column_name, index, text) {
    exists(val:
        answer_column_order[q].(
            pivot[x... : enumerate[answer[q]](index, x...)]
        )(column_name, val)
        and text = LUBM:show[val]
    )
}

Here, answer_gnf uses the Standard Library relations enumerate and pivot to convert high-arity relations to high-cardinality relations. You can use the dot join operator to assign column names to each value in the query results. The entity hashes contained in the answer relation are converted to the correct strings using the show relation.
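
The reshaping that answer_gnf performs can be pictured with a short Python sketch. It is an analogy, not a translation of the Rel definition: the helper `to_gnf` and the sample rows are hypothetical, and numbering the rows in sorted order is an assumption made here to keep the result deterministic.

```python
def to_gnf(rows, column_names):
    """Turn each row into one (column_name, row_index, text) triple per cell."""
    return {
        (column_names[position], index, str(value))
        for index, row in enumerate(sorted(rows), start=1)
        for position, value in enumerate(row)
    }


# Two hypothetical answer tuples of arity 2, in the shape of Query 7.
rows = {("student1", "course0"), ("student0", "course0")}
gnf = to_gnf(rows, ["student_id", "course_id"])

for triple in sorted(gnf):
    print(triple)
# ('course_id', 1, 'course0')
# ('course_id', 2, 'course0')
# ('student_id', 1, 'student0')
# ('student_id', 2, 'student1')
```

Two tuples of arity 2 become four tuples of arity 3, which is the trade of arity for cardinality described above.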

Answer Validation

You can validate the query answers using two integrity constraints.

The first integrity constraint, all_answers_correct, checks that all found answers are present in the reference answers. The second integrity constraint, all_answers_found, checks that all reference answers have indeed been found.

The row numbers in answer_ref do not necessarily match the indices of the tuples in answer_gnf. For this reason, each integrity constraint checks that, for every query and every individual result indexed by i in one relation, there exists a result indexed by j in the other relation that agrees with it on every column. all_answers_correct takes the results from answer_gnf and looks for the match in answer_ref; all_answers_found does the reverse.

// read query
 
ic all_answers_correct(query, i) {
    answer_gnf(query, _, i, _)
    implies
    exists(j:
        forall(col where answer_ref(query, col, j, _):
            answer_gnf[query, col, i] = answer_ref[query, col, j]
        )
    )
}
 
ic all_answers_found(query, i) {
    answer_ref(query, _, i, _)
    implies
    exists(j:
        forall(col where answer_gnf(query, col, j, _):
            answer_gnf[query, col, j] = answer_ref[query, col, i]
        )
    )
}

These queries run without triggering any integrity constraint violations, which confirms that the results are correct and complete.
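
The intent of the two constraints can be mirrored outside Rel as a pair of set differences. The following Python sketch is illustrative only: the helpers `rows_of` and `validate` and the sample data are hypothetical, and the comparison deliberately ignores row indices, as the constraints do.

```python
def rows_of(gnf):
    """Group (column, index, text) triples into one frozenset of cells per row."""
    by_index = {}
    for column, index, text in gnf:
        by_index.setdefault(index, set()).add((column, text))
    return {frozenset(cells) for cells in by_index.values()}


def validate(found, reference):
    """Report rows that were found but not expected, and rows expected but not found."""
    found_rows, reference_rows = rows_of(found), rows_of(reference)
    return {
        "incorrect": found_rows - reference_rows,  # would violate all_answers_correct
        "missing": reference_rows - found_rows,    # would violate all_answers_found
    }


# Same two rows, but numbered differently on each side.
found = {
    ("student_id", 1, "s0"), ("course_id", 1, "c0"),
    ("student_id", 2, "s1"), ("course_id", 2, "c0"),
}
reference = {
    ("student_id", 2, "s1"), ("course_id", 2, "c0"),
    ("student_id", 3, "s0"), ("course_id", 3, "c0"),
}

report = validate(found, reference)
print(report)  # {'incorrect': set(), 'missing': set()}
```

Both sets are empty here even though no row index agrees between the two inputs, which corresponds to both integrity constraints holding.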

Summary

Rel is a powerful modeling language when it comes to expressing complex ontologies and their data. In particular, the OWL ontology of the LUBM benchmark can be modeled naturally in Rel.

In this guide, you saw how to:

Import the LUBM dataset as a base relation.
Transform the base data to a relational knowledge graph.
Model entity and edge hierarchies.
Infer new facts in the knowledge graph through reasoning.
Translate the LUBM test queries from SPARQL to Rel.
Transform query results to Graph Normal Form.
Write integrity constraints to validate the query results.
