Modeling and Reasoning: The Lehigh University Benchmark

This how-to guide demonstrates how to express reasoning using the Lehigh University Benchmark (LUBM).

Goal

This guide shows you how to express reasoning aspects like hierarchy and equivalence in Rel.

This guide uses the Lehigh University Benchmark (LUBM), a widely used benchmark in the Web Ontology Language (OWL) domain, to test the reasoning capabilities of a system. You will learn how OWL constructs such as concepts (classes), relationships (properties) and their hierarchies, and equivalent classes translate into Rel.

Preliminaries

This how-to guide assumes that you are familiar with the Rel language. Related guides cover concepts that are useful, but not required, for understanding this how-to guide.

Introduction

In general, a Knowledge Graph (KG) consists of an ontology (schema) and data. The ontology describes the domain of interest including the different kinds of entities and how they relate to each other. Reasoning is an integral part of working with an ontology. It is through reasoning that new relationships and concepts are discovered.

LUBM is a KG that consists of a university domain ontology, customizable and repeatable synthetic data, and a set of test queries.

With Rel, you can use relations, entities, and integrity constraints to model the LUBM ontology, data, and queries.

This guide is organized as follows:

  • Data Import: In this first section, you import the synthetic data and the actual answers for the 14 test queries.
  • Conceptual Modeling: This section explains how the LUBM ontology and data are represented in Rel using entities and reasoning.
  • LUBM Queries: This section illustrates the 14 queries in Rel and how they compare to SPARQL.
  • Query Validation: The final section validates the results of the 14 queries.

Before you get to work, it’s useful to briefly go over the LUBM ontology.

It has 43 concepts and 25 object properties. For this guide, the reference data pertain to a single university.

  • This university has approximately 15 to 25 departments.
  • Each department has full professors (7 to 10), associate professors (10 to 14), and assistant professors (8 to 11).
  • One of the full professors is the head of the department.
  • Every faculty member (full professor, associate professor, assistant professor) teaches courses (1 to 2).
  • Every department has graduate students (3 to 4 per faculty member) and undergraduate students (8 to 14 per faculty member).

A detailed profile of the data is presented here.

Data Import

The data is loaded as shown below using load_csv and insert. You can find more information on data import in the CSV Import how-to guide. First, you define the CSV data loading configurations, and then you load the data:

// update

// data location
def path = "s3://relationalai-documentation-public/lubm/sf0.1/"

def configs = {

    (:edges, :path, concat[path, "edges.csv"]);

    (:university, :path, concat[path, "University.csv"]);
    (:university, :syntax, :header,
        {(1, :id); (2, :name); (3, :type); (4, :uri)}
    );

    (:department, :path, concat[path, "Department.csv"]);
    (:department, :syntax, :header,
        {(1, :id); (2, :name); (3, :type); (4, :uri)}
    );

    (:faculty, :path, concat[path, "Faculty.csv"]);
    (:faculty, :syntax, :header,
        {(1, :id); (2, :name); (3, :email); (4, :type);
        (5, :telephone); (6, :uri); (7, :research)}
    );

    (:researchgroup, :path, concat[path, "ResearchGroup.csv"]);
    (:researchgroup, :syntax, :header,
        {(1, :id); (2, :type); (3, :uri)}
    );

    (:publication, :path, concat[path, "Publication.csv"]);
    (:publication, :syntax, :header,
        {(1, :id); (2, :name); (3, :type); (4, :uri)}
    );

    (:course, :path, concat[path, "Course.csv"]);
    (:course, :syntax, :header,
        {(1, :id); (2, :name); (3, :type); (4, :uri)}
    );

    (:graduate_student, :path, concat[path, "Graduatestudent.csv"]);
    (:graduate_student, :syntax, :header,
        {(1, :id); (2, :research_assistant); (3, :name);
        (4, :email); (5, :type); (6, :telephone); (7, :uri)}
    );

    (:undergraduate_student, :path, concat[path, "Undergraduatestudent.csv"]);
    (:undergraduate_student, :syntax, :header,
        {(1, :id); (2, :research_assistant); (3, :name);
        (4, :email); (5, :type); (6, :telephone); (7, :uri)}
    )
}

def csv[i] = load_csv[configs[i]]
def insert[:lubm_csv] = csv

You load the data with load_csv[], and insert[] indicates that you want to store the data as a base relation. For more details on base relations, see the Updating Data: Working with Base Relations concept guide.

Similarly, you define config relations to load the 14 LUBM query answer files below. You also define a relation answer_ref to encapsulate the results of the 14 LUBM queries. Lastly, you use insert to add the load_csv results to this relation:

// update

// location of the reference answer
def path_answer = "s3://relationalai-documentation-public/lubm/answers/"

def answer_configs = {

    (:q1, :path, concat[path_answer, "answers_query1.txt"]);
    (:q1, :syntax, :header, {(1, :student_id)});

    (:q2, :path, concat[path_answer, "answers_query2.txt"]);
    (:q2, :syntax, :header,
        {(1, :student_id); (2, :university_id); (3, :department_id)}
    );
    (:q2, :syntax, :delim, '\t');

    (:q3, :path, concat[path_answer, "answers_query3.txt"]);
    (:q3, :syntax, :header, {(1, :publication_id)});

    (:q4, :path, concat[path_answer, "answers_query4.txt"]);
    (:q4, :syntax, :header,
        {(1, :prof_id); (2, :prof_name); (3, :prof_email); (4, :prof_tele)}
    );
    (:q4, :syntax, :delim, '\t');

    (:q5, :path, concat[path_answer, "answers_query5.txt"]);
    (:q5, :syntax, :header, {(1, :person_id)});

    (:q6, :path, concat[path_answer, "answers_query6.txt"]);
    (:q6, :syntax, :header, {(1, :student_id)});

    (:q7, :path, concat[path_answer, "answers_query7.txt"]);
    (:q7, :syntax, :header, {(1, :student_id); (2, :course_id)});
    (:q7, :syntax, :delim, '\t');

    (:q8, :path, concat[path_answer, "answers_query8.txt"]);
    (:q8, :syntax, :header,
        {(1, :student_id); (2, :department_id); (3, :student_email)}
    );
    (:q8, :syntax, :delim, '\t');

    (:q9, :path, concat[path_answer, "answers_query9.txt"]);
    (:q9, :syntax, :header,
        {(1, :student_id); (2, :faculty_id); (3, :course_id)}
    );
    (:q9, :syntax, :delim, '\t');

    (:q10, :path, concat[path_answer, "answers_query10.txt"]);
    (:q10, :syntax, :header, {(1, :student_id)});

    (:q11, :path, concat[path_answer, "answers_query11.txt"]);
    (:q11, :syntax, :header, {(1, :researchgroup_id)});

    (:q12, :path, concat[path_answer, "answers_query12.txt"]);
    (:q12, :syntax, :header, {(1, :chair_id); (2, :dept_id)});
    (:q12, :syntax, :delim, '\t');

    (:q13, :path, concat[path_answer, "answers_query13.txt"]);
    (:q13, :syntax, :header, {(1, :person_id)});

    (:q14, :path, concat[path_answer, "answers_query14.txt"]);
    (:q14, :syntax, :header, {(1, :undergraduate_student_id)});
}

def answers_csv[i] = load_csv[answer_configs[i]]
def insert[:answer_ref] = answers_csv

You have now finished importing the data needed for this guide. Next, you will learn how to properly model the data you have just imported.

Conceptual Modeling

Conceptual modeling is the process of capturing the ontological schema and the data. This guide shows you how to represent the LUBM ontology in Rel. In particular, you will focus on the following aspects: creating entities, assigning entity attributes, linking entities, and reasoning.

Creating Entities

Based on the concepts defined in the LUBM ontology and the generated synthetic data, you can now define 16 concepts using entities, along with the overarching Thing entity type:

// install

entity GraduateStudent
graduate_student_from_id = lubm_csv:graduate_student[:id, _]

entity UndergraduateStudent
undergraduate_student_from_id = lubm_csv:undergraduate_student[:id, _]

entity Course
course_from_id = lubm_csv:course[:id, _]

entity University
university_from_id = lubm_csv:university[:id, _]

entity Department
department_from_id = lubm_csv:department[:id, _]

entity Publication
publication_from_id = lubm_csv:publication[:id, _]

entity Faculty
faculty_from_id = lubm_csv:faculty[:id, _]

entity ResearchGroup
researchgroup_from_id = lubm_csv:researchgroup[:id, _]

As shown above, you defined eight of the 16 entity types. The constructors (for example, department_from_id) take the id from the CSV data (for example, lubm_csv:department) and generate entity keys for the corresponding entity type (for example, Department). The remaining entity types are defined in the Reasoning section. For more details on entities and their construction, see the Entities concept guide.

Below is the constructor department_from_id mapping the department id to the Department entity keys:

// query

department_from_id

The entity key is unique for each entity (instance). This is similar to having an Internationalized Resource Identifier (IRI) assigned to every individual of a concept in OWL.

Assigning Entity Attributes

After you have created the entities, you need to assign the attributes (i.e., data properties in OWL) to the entities.

To do so, you can define a relation (for example, graduate_student) using Graph Normal Form that holds all attributes for a given entity type (for example, GraduateStudent).

In particular, after the assignments below, you will have a relation graduate_student that holds the triplet (e, :email, v) saying graduate student e has the email v:

// install

// Helper relation to assign entities to attributes
@inline
def assign_entity_attributes[CSV, MAP, column](e, col, v) =
    CSV[:id, row].MAP(e) and
    CSV(column, row, v) and
    col = column
    from row

def graduate_student = assign_entity_attributes[
    lubm_csv:graduate_student,
    graduate_student_from_id,
    {:id; :research_assistant; :name; :email; :type; :telephone; :uri}
]

def undergraduate_student = assign_entity_attributes[
    lubm_csv:undergraduate_student,
    undergraduate_student_from_id,
    {:id; :research_assistant; :name; :email; :type; :telephone; :uri}
]

def course = assign_entity_attributes[
    lubm_csv:course,
    course_from_id,
    {:id; :name; :type; :uri}
]

def university = assign_entity_attributes[
    lubm_csv:university,
    university_from_id,
    {:id; :name; :type; :uri}
]

def department = assign_entity_attributes[
    lubm_csv:department,
    department_from_id,
    {:id; :name; :type; :uri}
]

def publication = assign_entity_attributes[
    lubm_csv:publication,
    publication_from_id,
    {:id; :name; :type; :uri}
]

def faculty = assign_entity_attributes[
    lubm_csv:faculty,
    faculty_from_id,
    {:id; :name; :email; :type; :telephone; :uri; :research}
]

def researchgroup = assign_entity_attributes[
    lubm_csv:researchgroup,
    researchgroup_from_id,
    {:id; :type; :uri}
]

You defined a helper relation, assign_entity_attributes, which maps each entity to its attribute values. You can read the parameters of assign_entity_attributes as follows:

  • CSV: relation holding the imported CSV data.
  • MAP: entity constructor.
  • column: attribute name (or a set of attribute names).

The parameters above can be understood as “input” variables. The result is a collection of tuples (e, col, v), each mapping the attribute col of the entity e to its value v.
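To make the shape of the resulting triples concrete, the following is a minimal Python sketch of the same idea. Python is used here only for illustration and is not part of the Rel model. The column-oriented layout of `csv`, the `entity_key` helper, and the sample row are assumptions made for this example.

```python
# Illustrative sketch only: mimics the effect of assign_entity_attributes.
# The data layout and the sample values are hypothetical.

# Column-oriented CSV data: column name -> {row number -> value}
csv = {
    "id": {1: "http://example.org/GraduateStudent10"},
    "name": {1: "GraduateStudent10"},
    "email": {1: "GraduateStudent10@example.org"},
}


def entity_key(row_id):
    """Stand-in for an entity constructor such as graduate_student_from_id."""
    return ("GraduateStudent", row_id)


def assign_entity_attributes(csv_data, constructor, columns):
    """Return a set of (entity, column, value) triples."""
    triples = set()
    for row, row_id in csv_data["id"].items():
        e = constructor(row_id)
        for column in columns:
            # Skip cells that are missing for this row.
            if row in csv_data.get(column, {}):
                triples.add((e, column, csv_data[column][row]))
    return triples


graduate_student = assign_entity_attributes(
    csv, entity_key, ["id", "name", "email"]
)
```

Each element of `graduate_student` is one (entity, attribute, value) triple, which is the shape the Rel relation graduate_student has after the assignments above.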

Again, consider the relation graduate_student. It is defined by using the helper relation assign_entity_attributes. The relation lubm_csv:graduate_student holds the imported CSV data, graduate_student_from_id is the constructor for the entity type GraduateStudent, and {:id; :research_assistant; :name; :email; :type; :telephone; :uri} are the attributes you want to assign to graduate students. Below, all attributes of a graduate student with id = "http://www.Department0.University0.edu/GraduateStudent10" are shown:

// query

graduate_student[e]
from e
where graduate_student(e, :id, "http://www.Department0.University0.edu/GraduateStudent10")

All other entities are defined in a similar fashion.

Linking Entities

OWL uses object properties to link two individuals that belong to two different concepts. In Rel, this translates to defining binary relations that connect the corresponding entities with each other.

You can start by defining the overarching entity type Thing. Doing this before defining the connections between entities is useful because it lets you generalize over all entity types when mapping the :id information from the edges table to the corresponding entity.

Generally, every individual in OWL is of type Thing. Here, you initially define the relation thing followed by the definition of entity Thing. The relation thing is the union of the following relations: undergraduate_student, graduate_student, course, university, department, publication, faculty, researchgroup. The ; acts as the union operator. See the Semicolon section of the Language Reference for more details.

Next, the entity Thing is defined using the relation first, which selects the first element of each tuple in the relation thing. In this case, the first elements of the thing relation are the entity keys.

// install

// In the LUBM OWL ontology, every concept is a subclass of "Thing":
def thing =
    course; department; faculty; graduate_student;
    publication; researchgroup; undergraduate_student; university

def Thing = first[thing]

A detailed explanation on thing and how to reason about it is given below in the Class/Entity Hierarchy section.

Now you are ready to establish the connections between the entities:

// install

// One relation table for all OWL Object properties
def edges(predicate, e_source, e_target) =
    lubm_csv:edges(:TYPE, pos, predicate) and
    lubm_csv:edges(:SOURCE, pos, a) and
    lubm_csv:edges(:TARGET, pos, b) and
    thing(e_source, :id, a) and
    thing(e_target, :id, b)
    from a, b, pos

/* Defining each individual relation from edges relation */
def takes_course = edges["takesCourse"]
def member_of = edges["memberOf"]
def sub_organization_of = edges["subOrganizationOf"]
def undergraduate_degree_from = edges["undergraduateDegreeFrom"]
def graduate_degree_from = edges["mastersDegreeFrom"]
def phd_degree_from = edges["doctoralDegreeFrom"]
def publication_author = edges["publicationAuthor"]
def works_for = edges["worksFor"]
def advisor = edges["advisor"]
def head_of = edges["headOf"]
def teacher_of = edges["teacherOf"]

The edges relation is an arity-3 relation. It includes all binary edges that connect two entities, as specified in the imported edges data (lubm_csv:edges). Each individual edge is keyed by its name (for example, “takesCourse”).

Here, you can see that using thing is quite handy as you can generalize over all entity types and don’t need to explicitly state each individual entity type.

Finally, each individual edge is then defined from the edges relation and brought into the global scope for easier use.
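The join performed by the edges relation can be sketched in Python as follows. This is an illustration under assumptions, not the Rel implementation: the entity keys, the IDs, and the `edge_rows` layout are hypothetical sample data.

```python
# Illustrative sketch only; all sample data below is hypothetical.

# thing: (entity, attribute, value) triples for all entity types
thing = {
    ("student-1", "id", "http://example.org/Student1"),
    ("course-7", "id", "http://example.org/Course7"),
}

# Rows of the edges table: (type, source id, target id)
edge_rows = [
    ("takesCourse", "http://example.org/Student1", "http://example.org/Course7"),
    ("takesCourse", "http://example.org/Student1", "http://example.org/Unknown"),
]

# Index from the id attribute value to the entity key
entity_by_id = {v: e for (e, attr, v) in thing if attr == "id"}

# Keep only edges whose source and target both resolve to a known entity,
# mirroring the conjunction of conditions in the edges definition.
edges = {
    (predicate, entity_by_id[a], entity_by_id[b])
    for (predicate, a, b) in edge_rows
    if a in entity_by_id and b in entity_by_id
}

# Partial application on the edge name, like edges["takesCourse"]
takes_course = {(s, t) for (p, s, t) in edges if p == "takesCourse"}
```

Because the lookup goes through `thing`, the sketch does not need to know which entity type the source or target belongs to.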

A sample subset is shown below:

// query

top[5, takes_course]

Reasoning

In OWL, reasoning is the process of inferring new facts about an individual based on the ontology and the data (explicitly stated facts). One of the main goals of the LUBM benchmark is to test the reasoning capability of the underlying system.

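This section shows how some of the main reasoning requirements of the LUBM ontology can be realized in Rel. In particular, it covers class and entity hierarchies, equivalent classes, edge hierarchies, and transitive and inverse properties.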

Class/Entity Hierarchy

As shown earlier in Linking Entities, Thing is a superclass/superentity that encapsulates all the individual classes/entities.

In the LUBM ontology, there are many such hierarchies. For example, AssistantProfessor, AssociateProfessor, and FullProfessor are the subentities of Faculty, which was directly derived from the CSV data. This hierarchy can be modeled in Rel in the following way:

// install

def assistant_professor[x] = faculty[x], faculty(x, :type, "AssistantProfessor")
def associate_professor[x] = faculty[x], faculty(x, :type, "AssociateProfessor")
def full_professor[x] = faculty[x], faculty(x, :type, "FullProfessor")

def AssistantProfessor = first[assistant_professor]
def AssociateProfessor = first[associate_professor]
def FullProfessor = first[full_professor]

The code above first defines assistant_professor by extracting the faculty members whose :type is “AssistantProfessor”. The relations associate_professor and full_professor are defined similarly from the faculty relation. The unary relations AssistantProfessor, AssociateProfessor, and FullProfessor are then defined by capturing the entity keys from the aforementioned relations using the relation first.

Here are some entity keys of AssistantProfessor:

// query

last[top[5, AssistantProfessor]]

Here are their attributes:

// query

table[
    assistant_professor[e] for e in last[top[5, AssistantProfessor]]
]

The following are other hierarchies from the LUBM ontology that need to be modeled:

  • University, Department, and ResearchGroup are subentities of Organization.
  • Student and Faculty are subentities of Person.
  • AssistantProfessor, AssociateProfessor, and FullProfessor are subentities of Professor.

In Rel, you can model these three hierarchies as follows:

// install

def organization = university; department; researchgroup
def person = graduate_student; undergraduate_student; faculty
def professor = assistant_professor; associate_professor; full_professor

def Organization = first[organization]
def Person = first[person]
def Professor = first[professor]

As you can see, organization is defined as the union of university, department, and researchgroup. Here, ; acts as the union operator. For more information about ; see the language reference. The relation organization contains all the universities, departments, and research groups defined earlier. The relations person and professor are defined similarly.
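The two modeling steps used in this section, deriving subentities by filtering on the :type attribute and building a superentity as a union, can be sketched in Python. This is only an illustration: the `subentity` helper and the sample faculty triples are hypothetical.

```python
# Illustrative sketch only; the faculty triples are hypothetical sample data.
faculty = {
    ("f1", "type", "FullProfessor"),
    ("f1", "name", "Alice"),
    ("f2", "type", "AssistantProfessor"),
    ("f2", "name", "Bob"),
    ("f3", "type", "Lecturer"),
}


def subentity(relation, type_name):
    """All triples of the entities whose type attribute equals type_name."""
    keys = {e for (e, attr, v) in relation if attr == "type" and v == type_name}
    return {(e, attr, v) for (e, attr, v) in relation if e in keys}


assistant_professor = subentity(faculty, "AssistantProfessor")
associate_professor = subentity(faculty, "AssociateProfessor")
full_professor = subentity(faculty, "FullProfessor")

# Superentity as the union of its subentities (the role ; plays in Rel)
professor = assistant_professor | associate_professor | full_professor

# Entity keys only, analogous to first[professor]
Professor = {e for (e, _, _) in professor}
```

Here `f3` stays out of `professor` because its type matches none of the three subentities, while all attributes of `f1` and `f2` are carried over.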

Equivalent Class

In OWL, some concepts are defined with necessary and sufficient conditions. These concepts/classes are called equivalent classes. In the LUBM ontology, there are two equivalent classes needed for answering the LUBM queries:

  • Every UndergraduateStudent is a Student, and any Person who takes_course is also a Student.
  • A Chair is any Person who is the head_of a Department.

In Rel, they can be modeled in the following way:

// install

def student[x] = person[x], takes_course(x, _)
def student = undergraduate_student
def chair[x] = person[x], head_of(x, _)

def Student = first[student]
def Chair = first[chair]

Initially, you define a student relation as a person who takes_course (any course). Next, you specify that any undergraduate_student is a student. Similarly, the relation chair is defined as a person who is the head_of a department. Lastly, the corresponding unary entity relations Student and Chair are defined.

Note that the generated data has no explicitly defined Chair instance. However, you can use an integrity constraint to check the equivalence with respect to the Chair entity type as follows:

// install

ic chair_equivalence(e) {
    Person(e) and head_of(e, _) iff Chair(e)
}
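The equivalent-class definitions and the equivalence check can be sketched in Python as follows. As a simplification, the sketch works directly on sets of entity keys rather than on attribute triples, and all keys are hypothetical.

```python
# Illustrative sketch only; entity keys are hypothetical and the relations
# are simplified to sets of keys instead of attribute triples.
Person = {"p1", "p2", "p3", "p4"}
UndergraduateStudent = {"p1"}
takes_course = {("p2", "c1")}
head_of = {("p3", "d1")}


def takes_some_course(p):
    return any(source == p for (source, _) in takes_course)


def heads_something(p):
    return any(source == p for (source, _) in head_of)


# A person who takes any course is a student,
# and every undergraduate student is a student.
Student = {p for p in Person if takes_some_course(p)} | UndergraduateStudent

# A chair is any person who is the head of something.
Chair = {p for p in Person if heads_something(p)}

# Equivalence check in both directions, in the spirit of chair_equivalence
chair_equivalence_holds = all(
    (p in Person and heads_something(p)) == (p in Chair)
    for p in Person | Chair
)
```

The check holds by construction in this sketch. Its value shows when `Chair` is also populated from another source, because any mismatch in either direction would then be flagged.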

A sample output of the chair relation with id = http://www.Department13.University0.edu/FullProfessor7 is shown below:

// query

chair[e] from e where chair(e, :id, "http://www.Department13.University0.edu/FullProfessor7")

Edge Hierarchy

Similar to entities (classes in OWL), edge relations can also have a hierarchy. The LUBM ontology defines one of the superproperties as:

  • degree_from is a superproperty of undergraduate_degree_from, graduate_degree_from, and phd_degree_from.

In Rel, this can be modeled as:

// install

def degree_from = undergraduate_degree_from; graduate_degree_from; phd_degree_from

The degree_from relation captures all the entities that are linked with undergraduate_degree_from, graduate_degree_from, and phd_degree_from by performing a union over these individual edge relations.

Moreover, in the LUBM ontology some of the subproperties are defined as follows:

  • head_of is a subproperty of works_for.
  • works_for is a subproperty of member_of.

In Rel, these subproperties can be modeled as:

// install

def works_for = head_of
def member_of = works_for

The definitions above state that the entities related by head_of are also related by works_for. The converse, however, does not hold: a person can work for an institution without being its head. The member_of relation is extended in the same way.
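A small Python sketch can illustrate the effect of these subproperty definitions, assuming that each additional definition contributes tuples to the relation it names. The sample pairs are hypothetical.

```python
# Illustrative sketch only; the sample pairs are hypothetical.
head_of = {("prof1", "dept1")}
works_for = {("prof2", "dept1")}       # pairs taken directly from the data
member_of = {("student1", "dept1")}    # pairs taken directly from the data

# head_of is a subproperty of works_for: every head_of pair is added.
works_for = works_for | head_of

# works_for is a subproperty of member_of: every works_for pair is added.
member_of = member_of | works_for
```

After both steps, the head of the department is also a member of it, but a student who is only a member does not appear in `works_for`.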

Transitive and Inverse Properties

In the LUBM ontology, sub_organization_of is a transitive property; has_alumnus is an inverse property of degree_from, which is a superset of undergraduate_degree_from, graduate_degree_from, and phd_degree_from, as discussed above in the Edge Hierarchy section. In Rel, these properties can be modeled as shown below:

// install

def sub_organization_of = sub_organization_of.sub_organization_of
def has_alumnus = transpose[degree_from]

As demonstrated above, the relation sub_organization_of initially only connected Department with University entities and ResearchGroup with Department entities. After the recursive definition, the relation sub_organization_of will also directly connect ResearchGroup with University entities.

The has_alumnus relation captures the entities in the inverse direction of the degree_from relation.
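The two properties can be sketched in Python with a simple fixpoint loop for the transitive closure and a swap of positions for the inverse. This only illustrates the semantics and says nothing about how Rel evaluates recursion. The `transitive_closure` helper and the sample pairs are hypothetical.

```python
# Illustrative sketch only; the sample pairs are hypothetical.

def transitive_closure(pairs):
    """Repeatedly compose the relation with itself until nothing new appears."""
    closure = set(pairs)
    while True:
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        }
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


sub_organization_of = {
    ("group1", "dept1"),   # ResearchGroup -> Department
    ("dept1", "univ1"),    # Department -> University
}
closed = transitive_closure(sub_organization_of)

# Inverse property: swap the two positions of every pair.
degree_from = {("alice", "univ1")}
has_alumnus = {(u, p) for (p, u) in degree_from}
```

In this sample, the closure adds the single pair that links the research group directly to the university.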

Data Consistency Verification

The following integrity constraints are used to verify the facts associated with and derived from the data and the LUBM ontology:

// install

ic degree_from_type(p, u) {
    degree_from(p, u) implies Person(p) and University(u)
}

In the code above, degree_from_type(p, u) verifies that degree_from is a binary relation that connects Person with University entities.

// install

ic degree_from_inverse(p, u) {
    degree_from(p, u) iff has_alumnus(u, p)
}

In the second integrity constraint shown above, degree_from_inverse(p, u) uses iff to verify that the degree_from and has_alumnus relations are inverses of each other.

The third integrity constraint, head_works_for, checks that every pair of entities in head_of is also present in the works_for relation.

// install

ic head_works_for {
    head_of ⊆ works_for
}

This affirms that head_of is a subset of the works_for relation.
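The three checks can be sketched in Python as functions over sets. This is an illustration of what the constraints test, not of how Rel enforces them. The `check_constraints` helper and the sample data are hypothetical.

```python
# Illustrative sketch only; the sample data is hypothetical.
Person = {"alice", "bob"}
University = {"univ1"}
degree_from = {("alice", "univ1")}
has_alumnus = {("univ1", "alice")}
head_of = {("bob", "dept1")}
works_for = {("bob", "dept1"), ("alice", "dept1")}


def check_constraints():
    """Return the names of the constraints that are violated."""
    violations = []
    # degree_from_type: sources are persons, targets are universities
    if not all(p in Person and u in University for (p, u) in degree_from):
        violations.append("degree_from_type")
    # degree_from_inverse: has_alumnus is exactly degree_from with swapped positions
    if {(u, p) for (p, u) in degree_from} != has_alumnus:
        violations.append("degree_from_inverse")
    # head_works_for: non-strict subset test, so equal relations would also pass
    if not head_of <= works_for:
        violations.append("head_works_for")
    return violations


violations = check_constraints()
```

An empty `violations` list corresponds to all three integrity constraints being satisfied.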

LUBM Queries

This section shows how the 14 LUBM benchmark queries can be written and verified in Rel. Each query begins with a quote of its original explanation/purpose. More information about the queries can be found here.

Additionally, this section compares how these queries are written in SPARQL and in Rel. It focuses on demonstrating how similarly the SPARQL queries can be written in Rel. You will see that SPARQL queries can essentially be translated line by line into Rel.

As stated earlier in Creating Entities, it is important to remember that the entity key for each entity (instance) is unique within the database. This is similar to having an Internationalized Resource Identifier (IRI) assigned to every individual in OWL.

Queries 1, 4, and 6 are discussed in more detail as these three queries cover all the key aspects. The logic behind the remaining queries can easily be understood from these examples.

Query 1

This query bears large input and high selectivity. It queries just one class and one property and does not assume any hierarchy information or inference.

Query 1 retrieves graduate students (GraduateStudent) who take the course (takesCourse) with the ID "http://www.Department0.University0.edu/GraduateCourse0".

🔎

In the description above, the term “GraduateStudent” refers to the concept in the OWL ontology. This keeps the illustration of Query 1 close to the LUBM ontology.


This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:GraduateStudent .
    ?X ub:takesCourse
    <http://www.Department0.University0.edu/GraduateCourse0>}

In Rel, the query above is written as follows:

// install

def answer[:q1] = x :
    graduate_student(e_x, :id, x)
    and course(e_y, :id, "http://www.Department0.University0.edu/GraduateCourse0")
    and takes_course(e_x, e_y)
    from e_x, e_y

Note that the identifying :id of the graduate student is used here instead of the entity key e_x. Entity keys are generated by Rel and are the preferred way to identify students within Rel. However, identifying students by their entity keys won’t let you compare the query results with the reference answers, which use the :id attribute instead.
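The join expressed by Query 1 can be sketched in Python as a set comprehension. The entity keys and IDs below are hypothetical sample data, chosen only to show that the result contains :id values rather than entity keys.

```python
# Illustrative sketch only; entity keys and ids are hypothetical.
graduate_student = {
    ("gs1", "id", "http://example.org/GraduateStudent1"),
    ("gs2", "id", "http://example.org/GraduateStudent2"),
}
course = {
    ("c0", "id", "http://example.org/GraduateCourse0"),
    ("c1", "id", "http://example.org/GraduateCourse1"),
}
takes_course = {("gs1", "c0"), ("gs2", "c1")}

TARGET_COURSE = "http://example.org/GraduateCourse0"

# Join the three relations on the entity keys e_x and e_y, then project
# onto the id of the student, which is what the reference answers contain.
answer_q1 = {
    x
    for (e_x, attr_x, x) in graduate_student
    if attr_x == "id"
    for (e_y, attr_y, y) in course
    if attr_y == "id" and y == TARGET_COURSE
    if (e_x, e_y) in takes_course
}
```

The entity keys `e_x` and `e_y` are used only to connect the relations and do not appear in the result, mirroring the role of the variables listed after from in the Rel query.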

Another interesting point is the use of the relation answer to collect the answers of all LUBM queries. To know which answers belong to which query, you can key the answers by their query ID:

// query

answer[:q1]

Query 2

This query increases in complexity: Three classes and three properties are involved. Additionally, there is a triangular pattern of relationships between the objects involved.

Query 2 retrieves the list of graduate students (GraduateStudent), universities (University), and departments (Department), such that the graduate student is a member of a department (memberOf), where this Department is a part of the University (subOrganizationOf) from which the graduate student obtained an undergraduate degree (undergraduateDegreeFrom).


This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X, ?Y, ?Z
    WHERE
    {?X rdf:type ub:GraduateStudent .
    ?Y rdf:type ub:University .
    ?Z rdf:type ub:Department .
    ?X ub:memberOf ?Z .
    ?Z ub:subOrganizationOf ?Y .
    ?X ub:undergraduateDegreeFrom ?Y}

In Rel, the query above is written as follows:

// install

def answer[:q2](x, y, z) {
    graduate_student(e_x, :id, x)
    and university(e_y, :id, y)
    and department(e_z, :id, z)
    and member_of(e_x, e_z)
    and sub_organization_of(e_z, e_y)
    and undergraduate_degree_from(e_x, e_y)
    from e_x, e_y, e_z
}

The output of Query 2 is shown below:

// query

answer[:q2]
🔎

There is no output for LUBM Query 2.

Query 3

This query is similar to Query 1 but class Publication has a wide hierarchy.

Query 3 retrieves the list of publications (Publication), where the author (publicationAuthor) is "http://www.Department0.University0.edu/AssistantProfessor0".


This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:Publication .
    ?X ub:publicationAuthor
        <http://www.Department0.University0.edu/AssistantProfessor0>}

In Rel, the query above is written as follows:

// install

def answer[:q3](x) {
    publication(e_x, :id, x)
    and faculty(e_y, :id, "http://www.Department0.University0.edu/AssistantProfessor0")
    and publication_author(e_x, e_y)
    from e_x, e_y
}

The output of Query 3 is shown below:

// query

answer[:q3]

Query 4

This query has small input and high selectivity. It assumes subClassOf relationship between Professor and its subclasses. Class Professor has a wide hierarchy. Another feature is that it queries about multiple properties of a single class.

Query 4 retrieves the ID, name, email address, and telephone of professors (Professor) who work for (worksFor) the department "http://www.Department0.University0.edu".


This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X, ?Y1, ?Y2, ?Y3
    WHERE
    {?X rdf:type ub:Professor .
    ?X ub:worksFor <http://www.Department0.University0.edu> .
    ?X ub:name ?Y1 .
    ?X ub:emailAddress ?Y2 .
    ?X ub:telephone ?Y3}

In Rel, the query above is written as follows:

// install

def answer[:q4](x, y1, y2, y3) {
    professor(e_x, :id, x)
    and professor(e_x, :name, y1)
    and professor(e_x, :email, y2)
    and professor(e_x, :telephone, y3)
    and department(e_z, :id, "http://www.Department0.University0.edu")
    and works_for(e_x, e_z)
    from e_x, e_z
}

All requested attributes associated with the Professor entity are accessed via the attribute relation professor, which contains all attributes that relate to any professor. The query result is captured in the answer[:q4] relation. A sample output of Query 4 is shown below:

// query

top[5, answer[:q4]]

Query 5

This query assumes subClassOf relationship between Person and its subclasses and subPropertyOf relationship between memberOf and its subproperties. Moreover, class Person features a deep and wide hierarchy.

Query 5 retrieves the list of people (Person) who are members of (memberOf) the department "http://www.Department0.University0.edu".


This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:Person .
    ?X ub:memberOf <http://www.Department0.University0.edu>}

In Rel, the query above is written as follows:

// install

def answer[:q5](x) {
    person(e_x, :id, x)
    and department(e_y, :id, "http://www.Department0.University0.edu")
    and member_of(e_x, e_y)
    from e_x, e_y
}

A sample output of Query 5 is shown below:

// query

top[5, answer[:q5]]

Query 6

This query queries about only one class. But it assumes both the explicit subClassOf relationship between UndergraduateStudent and Student and the implicit one between GraduateStudent and Student. In addition, it has large input and low selectivity.

Query 6 retrieves all the individuals (instances) that are of class Student.


This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X WHERE {?X rdf:type ub:Student}

In Rel, the query above is written as follows:

// install

def answer[:q6](x) {
    student(_, :id, x)
}

This query uses _ (underscore) as an anonymous, existentially quantified variable, which avoids naming the variable and keeps the query compact. See the language reference for more information. A sample output of Query 6 is shown below:

// query

top[5, answer[:q6]]

Query 7

This query is similar to Query 6 in terms of class Student but it increases in the number of classes and properties and its selectivity is high.

Query 7 retrieves the list of students (Student) and courses (Course) such that these students take the courses (takesCourse) and that these courses are taught by faculty "http://www.Department0.University0.edu/AssociateProfessor0".


This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X, ?Y
    WHERE
    {?X rdf:type ub:Student .
    ?Y rdf:type ub:Course .
    ?X ub:takesCourse ?Y .
    <http://www.Department0.University0.edu/AssociateProfessor0>,
        ub:teacherOf, ?Y}

In Rel, the query above is written as follows:

// install

def answer[:q7](x, y) {
    student(e_x, :id, x)
    and course(e_y, :id, y)
    and takes_course(e_x, e_y)
    and teacher_of(e_z, e_y)
    and faculty(e_z, :id, "http://www.Department0.University0.edu/AssociateProfessor0")
    from e_x, e_y, e_z
}

A sample output of Query 7 is shown below:

// query

top[5, answer[:q7]]

Query 8

This query is more complex than Query 7 because it includes one more property.

Query 8 retrieves the list of students (Student), departments (Department) and email addresses (emailAddress) of the students, such that the student is a member of a department (memberOf) and the department is a suborganization of (subOrganizationOf) the university "http://www.University0.edu".

Query 8

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X ?Y ?Z
    WHERE
    {?X rdf:type ub:Student .
    ?Y rdf:type ub:Department .
    ?X ub:memberOf ?Y .
    ?Y ub:subOrganizationOf <http://www.University0.edu> .
    ?X ub:emailAddress ?Z}

In Rel, the query above is written as follows:

// install

def answer[:q8](x, y, z) {
    student(e_x, :id, x)
    and department(e_y, :id, y)
    and member_of(e_x, e_y)
    and sub_organization_of(e_y, e_o)
    and organization(e_o, :id, "http://www.University0.edu")
    and student(e_x, :email, z)
    from e_x, e_y, e_o
}

A sample output of Query 8 is shown below:

// query

top[5, answer[:q8]]

Query 9

In addition to the features of class Student noted above and the wide hierarchy of class Faculty, this query, like Query 2, involves the most classes and properties in the query set, and its relationships form a triangular pattern.

Query 9 retrieves the list of students (Student), faculty (Faculty), and courses (Course) such that these students take the courses (takesCourse) taught (teacherOf) by their advisors (advisor).

Query 9

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X ?Y ?Z
    WHERE
    {?X rdf:type ub:Student .
    ?Y rdf:type ub:Faculty .
    ?Z rdf:type ub:Course .
    ?X ub:advisor ?Y .
    ?Y ub:teacherOf ?Z .
    ?X ub:takesCourse ?Z}

In Rel, the query above is written as follows:

// install

def answer[:q9](x, y, z) {
    student(e_x, :id, x)
    and faculty(e_y, :id, y)
    and course(e_z, :id, z)
    and advisor(e_x, e_y)
    and teacher_of(e_y, e_z)
    and takes_course(e_x, e_z)
    from e_x, e_y, e_z
}

A sample output of Query 9 is shown below:

// query

top[5, answer[:q9]]
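
The triangular pattern in Query 9 means that the three relationships must close a cycle over the same student, faculty member, and course. The following is a minimal sketch on toy data; all relation names prefixed with toy_ and the string identifiers are hypothetical and are not part of the LUBM model.

```rel
// query

// Hypothetical toy data (not the LUBM dataset).
def toy_advisor = {("s0", "f0"); ("s1", "f1")}
def toy_teacher_of = {("f0", "c0"); ("f1", "c1")}
def toy_takes_course = {("s0", "c0"); ("s1", "c0")}

// All three edges must agree on x, y, and z for a tuple to qualify.
def toy_triangle(x, y, z) {
    toy_advisor(x, y)
    and toy_teacher_of(y, z)
    and toy_takes_course(x, z)
}

// Only ("s0", "f0", "c0") closes the triangle: s1 takes c0, but s1's advisor f1 teaches c1.
def output = toy_triangle
```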

Query 10

This query differs from Queries 6, 7, 8, and 9 in that it requires only the (implicit) subClassOf relationship between GraduateStudent and Student. The subClassOf relationship between UndergraduateStudent and Student does not add to the results.

Query 10 retrieves the list of the students (Student) who take (takesCourse) the course "http://www.Department0.University0.edu/GraduateCourse0".

Query 10

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:Student .
    ?X ub:takesCourse
    <http://www.Department0.University0.edu/GraduateCourse0>}

In Rel, the query above is written as follows:

// install

def answer[:q10](x) {
    student(e_x, :id, x)
    and course(e_y, :id, "http://www.Department0.University0.edu/GraduateCourse0")
    and takes_course(e_x, e_y)
    from e_x, e_y
}

A sample output of Query 10 is shown below:

// query

answer[:q10]

Query 11

Queries 11, 12, and 13 are intended to verify the presence of certain OWL reasoning capabilities in the system. In this query, the property subOrganizationOf is defined as transitive. In the benchmark data, instances of ResearchGroup are stated as suborganizations of a Department individual, and the latter as a suborganization of a University individual. Answering this query therefore requires inferring the subOrganizationOf relationship between instances of ResearchGroup and University. Additionally, its input is small.

Query 11

Query 11 retrieves the list of research groups (ResearchGroup) that are suborganizations of (subOrganizationOf) the university "http://www.University0.edu".

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:ResearchGroup .
    ?X ub:subOrganizationOf <http://www.University0.edu>}

In Rel, the query above is written as follows:

// install

def answer[:q11](x) {
    researchgroup(e_x, :id, x)
    and university(e_y, :id, "http://www.University0.edu")
    and sub_organization_of(e_x, e_y)
    from e_x, e_y
}

A sample output of Query 11 is shown below:

// query

top[5, answer[:q11]]
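
The transitivity that Query 11 depends on can be written in Rel as a recursive definition. The following is a minimal sketch under stated assumptions: toy_direct_sub_org and toy_sub_org are hypothetical names, the data is made up, and the guide's own sub_organization_of relation may be defined differently.

```rel
// query

// Hypothetical toy data: only the direct links are stated.
def toy_direct_sub_org = {("Group0", "Dept0"); ("Dept0", "Univ0")}

// Transitive closure: a base rule plus a recursive rule.
def toy_sub_org(x, y) { toy_direct_sub_org(x, y) }
def toy_sub_org(x, z) { toy_sub_org(x, y) and toy_direct_sub_org(y, z) from y }

// Includes ("Group0", "Univ0"), which is never stated directly.
def output = toy_sub_org
```

The derived tuple linking the research group to the university is what lets a query like Query 11 match without that link being present in the data.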

Query 12

The benchmark data do not produce any instances of class Chair. Instead, each Department individual is linked to the chair professor of that department by the property headOf. Hence, this query requires realization, that is, the inference that a professor is an instance of class Chair because he or she is the head of a department. The input of this query is small as well.

Query 12 retrieves the list of chairs (Chair) and their departments (Department) such that the chair works for (worksFor) the department and the department is a suborganization of (subOrganizationOf) the university "http://www.University0.edu".

Query 12

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X ?Y
    WHERE
    {?X rdf:type ub:Chair .
    ?Y rdf:type ub:Department .
    ?X ub:worksFor ?Y .
    ?Y ub:subOrganizationOf <http://www.University0.edu>}

In Rel, the query above is written as follows:

// install

def answer[:q12](x, y) {
    chair(e_x, :id, x)
    and department(e_y, :id, y)
    and university(e_z, :id, "http://www.University0.edu")
    and works_for(e_x, e_y)
    and sub_organization_of(e_y, e_z)
    from e_x, e_y, e_z
}

A sample output of Query 12 is shown below:

// query

top[5, answer[:q12]]
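
Realization, as required by Query 12, amounts to deriving class membership from a property. The following is a minimal sketch on toy data; toy_head_of and toy_chair are hypothetical names, and the guide's own chair relation may be defined differently.

```rel
// query

// Hypothetical toy data: the data only states who heads which department.
def toy_head_of = {("Prof0", "Dept0")}

// Realization: anyone who heads some department is inferred to be a chair.
def toy_chair(x) { toy_head_of(x, _) }

// Contains "Prof0", although no chair instance was stated explicitly.
def output = toy_chair
```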

Query 13

Property hasAlumnus is defined in the benchmark ontology as the inverse of property degreeFrom, which has three subproperties: undergraduateDegreeFrom, mastersDegreeFrom, and doctoralDegreeFrom. The benchmark data state a person as an alumnus of a university using one of these three subproperties instead of hasAlumnus. Therefore, this query assumes subPropertyOf relationships between degreeFrom and its subproperties, and also requires inference about inverseOf.

Query 13 retrieves the list of people (Person) who are alumni (hasAlumnus) of university "http://www.University0.edu".

Query 13

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE
    {?X rdf:type ub:Person .
    <http://www.University0.edu> ub:hasAlumnus ?X}

In Rel, the query above is written as follows:

// install

def answer[:q13](x) {
    person(e_x, :id, x)
    and university(e_y, :id, "http://www.University0.edu")
    and has_alumnus(e_y, e_x)
    from e_x, e_y
}

The output of Query 13 is shown below:

// query

answer[:q13]
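
The two inferences behind Query 13, subPropertyOf and inverseOf, can be sketched in Rel as a union followed by an argument swap. This is an illustration under stated assumptions: every name prefixed with toy_ and all data values are hypothetical, and the guide's own has_alumnus relation may be defined differently.

```rel
// query

// Hypothetical toy data: degrees are stated only through the three subproperties.
def toy_undergraduate_degree_from = {("Alice", "Univ0")}
def toy_masters_degree_from = {("Bob", "Univ0")}
def toy_doctoral_degree_from = {("Carol", "Univ1")}

// subPropertyOf: each subproperty contributes its pairs to toy_degree_from.
def toy_degree_from = toy_undergraduate_degree_from
def toy_degree_from = toy_masters_degree_from
def toy_degree_from = toy_doctoral_degree_from

// inverseOf: swap the arguments.
def toy_has_alumnus(u, p) { toy_degree_from(p, u) }

// Alumni of "Univ0": "Alice" and "Bob".
def output(p) { toy_has_alumnus("Univ0", p) }
```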

Query 14

This query is the simplest in the test set. It represents queries with large input and low selectivity, and it does not assume any hierarchy information or inference.

Query 14 retrieves the list of all the undergraduate students (UndergraduateStudent).

Query 14

This is what the query looks like in SPARQL:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X
    WHERE {?X rdf:type ub:UndergraduateStudent}

In Rel, the query above is written as follows:

// install

def answer[:q14](x) {
    undergraduate_student(_, :id, x)
}

A sample output of Query 14 is shown below:

// query

top[5, answer[:q14]]

Query Validation

Lastly, you can validate the correctness of the LUBM queries using integrity constraints. See the Integrity Constraints concept guide for details.

You can perform the validation in three steps:

  1. Convert the answers in the answer relation to Graph Normal Form (GNF).
  2. Confirm that all found answers are correct.
  3. Confirm that all answers have been found.

Here’s how to start with the GNF conversion:

// install

def answer_column_order[:q1] = {(:student_id, 1)}
def answer_column_order[:q2] = {(:student_id, 1); (:university_id, 2); (:department_id, 3)}
def answer_column_order[:q3] = {(:publication_id, 1)}
def answer_column_order[:q4] =
    {(:prof_id, 1); (:prof_name, 2); (:prof_email, 3); (:prof_tele, 4)}
def answer_column_order[:q5] = {(:person_id, 1)}
def answer_column_order[:q6] = {(:student_id, 1)}
def answer_column_order[:q7] = {(:student_id, 1); (:course_id, 2)}
def answer_column_order[:q8] = {(:student_id, 1); (:department_id, 2); (:student_email, 3)}
def answer_column_order[:q9] = {(:student_id, 1); (:faculty_id, 2); (:course_id, 3)}
def answer_column_order[:q10] = {(:student_id, 1)}
def answer_column_order[:q11] = {(:researchgroup_id, 1)}
def answer_column_order[:q12] = {(:chair_id, 1); (:dept_id, 2)}
def answer_column_order[:q13] = {(:person_id, 1)}
def answer_column_order[:q14] = {(:undergraduate_student_id, 1)}

def answer_gnf[q](row, column, value) =
    answer_column_order[q].(
        pivot[x... : hash[answer[q]](x..., row)]
    )(column, value)

To achieve this, first you define a relation answer_column_order, which captures the column names of all the queries. Next, you define answer_gnf, which converts all the query results into the Graph Normal Form.

The relation answer_gnf uses relations like hash and pivot to generate unique hash values for the answers and convert high-arity relations to high-cardinality relations. These relations are defined in the Rel Standard Library.

The validation of the results of Query 4 is shown below. The other query results can also be validated in a similar manner.

The first integrity constraint, all_answers_correct, checks that all found answers are present in the reference answers. The second integrity constraint, all_answers_found, checks that all reference answers have indeed been found. The details of equal can be found in the standard library.

// query

ic all_answers_correct(q, i, col) {
    answer_gnf(q, i, col, _) and
    q = :q4
    implies
    exists(j in first[answer_ref[q, col]]:
        equal(answer_gnf[q, i, col], answer_ref[q, col, j])
    )
}


ic all_answers_found(q, i, col) {
    i = first[answer_ref[q, col]] and
    q = :q4
    implies
    exists(j in first[answer_gnf[q]]:
        equal(answer_gnf[q, j, col], answer_ref[q, col, i])
    )
}

Note that if you remove the condition q = :q4 from both integrity constraints, the correctness of all queries (:q1 to :q14) is checked in one go.
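
Taken together, the two integrity constraints check containment in both directions, which is the same as requiring that the found and reference answers coincide. The following is a minimal sketch of that idea on toy data; toy_found, toy_reference, and both constraint names are hypothetical and much simpler than the GNF-based relations used above.

```rel
// query

// Hypothetical toy data standing in for found and reference answers.
def toy_found = {"a"; "b"}
def toy_reference = {"a"; "b"}

// Every found answer must appear in the reference answers.
ic toy_all_answers_correct(x) { toy_found(x) implies toy_reference(x) }

// Every reference answer must have been found.
ic toy_all_answers_found(x) { toy_reference(x) implies toy_found(x) }
```

If a value appeared in only one of the two relations, the corresponding constraint would be violated, and the violating value would point to the wrong or missing answer.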

Summary

This guide has demonstrated that Rel is a powerful language for modeling data and expressing complex ontologies. In particular, it has shown how naturally the OWL ontology of the LUBM benchmark can be modeled in Rel. You have also seen how SPARQL queries translate into Rel, which eases the transition from RDF/OWL/SPARQL to Rel.

Furthermore, Rel supports modeling and reasoning concepts that are considerably more complex than those needed for the LUBM ontology and that go well beyond the scope of this how-to guide.