The Lehigh University Benchmark
This how-to guide demonstrates how to express reasoning using the Lehigh University Benchmark (LUBM).
Goal
The Lehigh University Benchmark (LUBM) is a popular benchmark in the Web Ontology Language (OWL) domain designed to test the reasoning capabilities of a system. In this guide, you’ll learn how to express OWL constructs such as classes and properties, as well as advanced concepts like hierarchies and equivalent classes, in Rel.
You’ll build a knowledge graph of the data in the LUBM and use Rel to:
- Express concepts from the data as entity types and value types.
- Model properties and hierarchies as relations in Graph Normal Form (GNF).
- Express prescriptive reasoning concepts like equivalent classes or transitive properties.
- Translate LUBM test queries from SPARQL to Rel.
- Validate the query results using integrity constraints.
The Dataset
The LUBM defines a knowledge graph on synthetic data representing universities and their constituents, including students, faculty members, organizations, and departments.
For example:
- Every university in the LUBM knowledge graph consists of 15 to 25 departments.
- Each department has 7 to 10 full professors, one of whom is the head of the department, 10 to 14 associate professors, and 8 to 11 assistant professors.
- Every faculty member teaches one or two courses.
- Every department has three to four graduate students and 8 to 14 undergraduate students per faculty member.
The LUBM knowledge graph defines an ontology that describes the various entities in the data and how they relate to one another. The benchmark tests a system’s ability to express hierarchies of entities, transitive and inverse properties, and other ontological concepts. A detailed profile of the LUBM data is available on the LUBM project website.
Note: In this how-to guide, you’ll work with data from just one of the universities in the LUBM dataset.
There are nine CSV files in the LUBM dataset:
- University.csv
- Department.csv
- Faculty.csv
- ResearchGroup.csv
- Publication.csv
- Course.csv
- Graduatestudent.csv
- Undergraduatestudent.csv
- edges.csv
The data for each entity class are stored in separate CSV files, such as Faculty.csv or Course.csv. Columns in the CSV files represent entity properties, such as a faculty member’s name or a student’s email address. Each CSV file for an entity class contains an id column. Entities in the LUBM dataset are identified with Uniform Resource Identifiers (URIs). The edges.csv file stores relationships between entities in the knowledge graph.
In the next section, you’ll import these CSV files into a RAI database.
Data Import
The following code declares a module called configs
that contains configuration relations for each of the nine CSV files in the LUBM dataset:
// write query
def path = "s3://relationalai-documentation-public/lubm/sf0.1/"
module configs
def edges:path = concat[path, "edges.csv"]
def university:path = concat[path, "University.csv"]
def university:syntax:header {
(1, :id); (2, :name); (3, :type); (4, :uri)
}
def department:path = concat[path, "Department.csv"]
def department:syntax:header {
(1, :id); (2, :name); (3, :type); (4, :uri)
}
def faculty:path = concat[path, "Faculty.csv"]
def faculty:syntax:header {
(1, :id); (2, :name); (3, :email); (4, :type);
(5, :telephone); (6, :uri); (7, :research)
}
def researchgroup:path = concat[path, "ResearchGroup.csv"]
def researchgroup:syntax:header {
(1, :id); (2, :type); (3, :uri)
}
def publication:path = concat[path, "Publication.csv"]
def publication:syntax:header {
(1, :id); (2, :name); (3, :type); (4, :uri)
}
def course:path = concat[path, "Course.csv"]
def course:syntax:header {
(1, :id); (2, :name); (3, :type); (4, :uri)
}
def graduate_student:path = concat[path, "Graduatestudent.csv"]
def graduate_student:syntax:header {
(1, :id); (2, :research_assistant); (3, :name);
(4, :email); (5, :type); (6, :telephone); (7, :uri)
}
def graduate_student:schema = (:research_assistant, "boolean")
def undergraduate_student:path = concat[path, "Undergraduatestudent.csv"]
def undergraduate_student:syntax:header {
(1, :id); (2, :research_assistant); (3, :name);
(4, :email); (5, :type); (6, :telephone); (7, :uri)
}
def undergraduate_student:schema = (:research_assistant, "boolean")
end
def insert:lubm_csv[config] = load_csv[configs[config]]
Each configuration relation in the configs module defines the path to one CSV file in the LUBM dataset and, for every file except edges.csv, its header information. For instance, the configs:undergraduate_student relation sets the path to the Undergraduatestudent.csv file, defines the order and names of the seven fields in the CSV file header, and sets the datatype of the :research_assistant field to "boolean".
The load_csv[] relation loads the CSV data, and insert stores the data in a base relation called lubm_csv. This relation is in GNF: each value is indexed by the configuration name of its CSV file (such as :university), its column, and the position of its row in the file.
For example, here are the first three rows of CSV data for the University
class as they appear in the lubm_csv:university
relation:
// model
@inline
def top_csv[n, CSV](col, row, val) {
CSV(col, row, val)
and top[n, second[CSV]](_, row)
}
// read query
def output = top_csv[3, lubm_csv:university]
Note: The top_csv
relation is installed and persisted in the database so that it can be reused in subsequent queries.
The relation top_csv
mimics top
in the Standard Library.
Instead of returning the top n
tuples, however, top_csv
returns all tuples corresponding to the top n
rows of the CSV file.
The table structure of the CSV file can be viewed using the table
relation:
// read query
def output = ::std::display::table[top_csv[3, lubm_csv:university]]
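The following plain-Python sketch makes the GNF layout and the behavior of top_csv concrete. It is not RAI code. A Python set of tuples stands in for the lubm_csv base relation, and the data are hypothetical. Small integers stand in for the row positions that load_csv produces. The helper names csv_for and top_csv are illustrative only.

```python
# Plain-Python model of the GNF shape of lubm_csv (hypothetical toy data).
# Each tuple is (config_name, column, row_key, value); small integers stand in
# for the row positions produced by load_csv.
lubm_csv = {
    ("university", "id", 1, "http://www.University0.edu"),
    ("university", "name", 1, "University0"),
    ("university", "id", 2, "http://www.University1.edu"),
    ("university", "name", 2, "University1"),
    ("university", "id", 3, "http://www.University2.edu"),
    ("university", "name", 3, "University2"),
    ("university", "id", 4, "http://www.University3.edu"),
    ("university", "name", 4, "University3"),
}


def csv_for(relation, config):
    """Like lubm_csv:<config>: keep (column, row_key, value) for one file."""
    return {(col, row, val) for (cfg, col, row, val) in relation if cfg == config}


def top_csv(n, csv):
    """Like top_csv[n, CSV]: keep every tuple whose row key is among the n smallest."""
    top_rows = set(sorted({row for (_, row, _) in csv})[:n])
    return {t for t in csv if t[1] in top_rows}


first_three = top_csv(3, csv_for(lubm_csv, "university"))
for col, row, val in sorted(first_three):
    print(col, row, val)
```

Selecting rows first and then keeping every column of those rows is what distinguishes top_csv from top. Applying top directly to the tuples would count individual cells rather than whole rows.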
Now that you’ve imported the LUBM data, the next step is to represent the LUBM ontology as a knowledge graph.
Ontology
OWL ontologies have a universal class (opens in a new tab) called Thing
to which all entities belong.
You can encode this in Rel by defining an entity type called Thing
inside of a module called LUBM
:
// model
module LUBM
entity type Thing = String
end
Instances of Thing are constructed from strings. Entities in the LUBM are identified with URIs, so it makes sense to use the URI string. For instance, ^Thing["http://www.University0.edu"] creates a Thing instance representing the university with URI http://www.University0.edu.
Note: In this guide, you’ll add more relations to the LUBM
module with multiple module
declarations. This helps facilitate discussion of the code. In practice, however, you would typically write the LUBM
module in a single declaration.
In order to model the LUBM ontology in Rel, you need to transform the lubm_csv
data into relations involving Thing
instances.
Data Transformation
In this section, you’ll write a template module called Transform
that transforms the data in the lubm_csv
base relation into relations involving entity and value types.
Entities
The Transform:filename_to_entity
relation builds Thing
instances from the URI strings in the id
column for a given CSV file:
// model
@outline
module Transform[CSV]
def filename_to_entity(csv_name, e) {
e = LUBM:^Thing[CSV[csv_name, :id, _]]
}
end
Important: The Transform module is parameterized so that it can be used on any relation with a structure similar to lubm_csv. This means that the Transform module must be annotated with the @outline annotation.
The relation filename_to_entity
relates entities to the name of the CSV file that their URI came from.
In other words, filename_to_entity:university
is the set of all entities from the University.csv
file:
// read query
@inline
def T = Transform[lubm_csv]
def output = top[5, T:filename_to_entity:university]
You can use filename_to_entity
to create a Thing
entity relation containing the entity hashes for every URI in the LUBM dataset:
// model
module LUBM
with Transform[lubm_csv] use filename_to_entity
def Thing(e) { filename_to_entity(_, e) }
end
Next, you’ll transform the values from the columns of each CSV file into value types and assign them to the right entity.
Properties
Each entity corresponds to a row of data in one of the LUBM CSV files. Each column in the row represents a property of that entity. For example, University entities have :id, :name, :type, and :uri properties.
The Transform:assign_property
relation assigns entities to values for their properties:
// model
@outline
module Transform[CSV]
@ondemand
def assign_property[property_name, PropertyType](e, v) {
exists(csv_name, property_val, filepos, id :
CSV(csv_name, property_name, filepos, property_val)
and CSV(csv_name, :id, filepos, id)
and LUBM:^Thing(id, e)
and PropertyType(property_val, v)
)
}
end
This property assignment not only associates a property value with its entity but also gives the property value a semantic meaning. It does so by first converting the property value into a value type via v = PropertyType[property_val]. Later in this guide, you’ll define value types for the various properties that entities can have.
Here’s an example of how to use assign_property
to give the URI values semantic meaning and assign them to their corresponding entities:
// read query
@inline
def T = Transform[lubm_csv]
value type Uri = String
def has_uri = T:assign_property[:uri, ^Uri]
def output = top[5, has_uri]
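The join that assign_property performs can be sketched in plain Python. This is an illustration under simplifying assumptions and does not show how Rel evaluates the join. The data are hypothetical, and an entity is represented by its id string instead of an entity hash. A value type is imitated by tagging the raw string with the type's name through a hypothetical to_name helper.

```python
# Hypothetical rows from one CSV file, in the (column, row_key, value) shape
# of lubm_csv:<config>.
course_csv = {
    ("id", 10, "http://www.Department0.University0.edu/Course0"),
    ("name", 10, "Course0"),
    ("id", 11, "http://www.Department0.University0.edu/Course1"),
    ("name", 11, "Course1"),
}


def assign_property(csv, property_name, make_value):
    """Pair each row's entity (its id) with a typed value for property_name.

    The id column and the property column are joined on the shared row key,
    mirroring the two CSV(...) conditions in Transform:assign_property.
    """
    entity_of_row = {row: val for (col, row, val) in csv if col == "id"}
    return {
        (entity_of_row[row], make_value(val))
        for (col, row, val) in csv
        if col == property_name and row in entity_of_row
    }


def to_name(raw):
    # Hypothetical stand-in for the ^Name constructor: tag the string with its "type".
    return ("Name", raw)


has_name = assign_property(course_csv, "name", to_name)
print(sorted(has_name))
```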
The relation assign_property
is useful for transforming string-
and numeric-valued properties into value types,
but it isn’t ideal for properties with boolean values.
Boolean Properties
The :research_assistant property is boolean: it has either the value boolean_true or the value boolean_false. It’s idiomatic in Rel to store only facts that are true, rather than define a boolean value type and track both true and false values. To facilitate this, the Transform:assign_boolean_property relation contains only the entities whose property_name property is boolean_true:
// model
@outline
module Transform[CSV]
@ondemand
def assign_boolean_property(property_name, e) {
exists(csv_name, filepos, id:
CSV(csv_name, property_name, filepos, boolean_true)
and CSV(csv_name, :id, filepos, id)
and LUBM:^Thing(id, e)
)
}
end
For example, the following query displays the hashes of five entities with the :research_assistant
property:
// read query
@inline
def T = Transform[lubm_csv]
def output = top[5, T:assign_boolean_property:research_assistant]
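The "store only true facts" idiom can be illustrated the same way. In this hypothetical plain-Python sketch, rows whose research_assistant value is False produce no entry. Membership in the resulting set therefore plays the role of boolean_true. Entities are again represented by their id strings rather than hashes.

```python
# Hypothetical graduate-student rows: (column, row_key, value).
grad_csv = {
    ("id", 1, "gs0"), ("research_assistant", 1, True),
    ("id", 2, "gs1"), ("research_assistant", 2, False),
    ("id", 3, "gs2"), ("research_assistant", 3, True),
}


def assign_boolean_property(csv, property_name):
    """Return only the entities whose property_name value is True."""
    entity_of_row = {row: val for (col, row, val) in csv if col == "id"}
    return {
        entity_of_row[row]
        for (col, row, val) in csv
        if col == property_name and val is True and row in entity_of_row
    }


research_assistants = assign_boolean_property(grad_csv, "research_assistant")
print(sorted(research_assistants))  # ['gs0', 'gs2']
```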
Now that you have relations for transforming data into entity and value types, you can write a relation to link data together with edges.
Edges
The final relation to install in the Transform
module transforms the data in the edges
CSV file into edges in the LUBM knowledge graph:
// model
@outline
module Transform[CSV]
def make_edge[edge_name](e_source, e_target) {
exists(source_id, target_id, filepos:
CSV:edges(:TYPE, filepos, edge_name)
and CSV:edges(:SOURCE, filepos, source_id)
and CSV:edges(:TARGET, filepos, target_id)
and LUBM:^Thing(source_id, e_source)
and LUBM:^Thing(target_id, e_target)
)
}
end
Edges in the lubm_csv relation are described by three kinds of tuples:
- (:TYPE, filepos, edge_name) tuples give the name of an edge.
- (:SOURCE, filepos, source_id) tuples define the source ID of an edge. Here, filepos identifies the row in the CSV file and source_id is the URI of the source entity.
- (:TARGET, filepos, target_id) tuples define the target ID of an edge.
:TYPE, :SOURCE, and :TARGET tuples with the same filepos describe the same edge, which points from the source entity towards the target entity.
Edges have names.
For example, "takesCourse"
edges relate student entities to course entities for courses in which the student is enrolled:
// read query
@inline
def T = Transform[lubm_csv]
def output = top[5, T:make_edge["takesCourse"]]
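Here is a plain-Python sketch of the grouping that make_edge performs. The TYPE, SOURCE, and TARGET tuples that share a row key are combined into a single (source, target) pair. The edge rows are hypothetical, and short placeholder strings stand in for both URIs and entity hashes.

```python
# Hypothetical edges.csv contents in (column, row_key, value) form.
edges_csv = {
    ("TYPE", 1, "takesCourse"),
    ("SOURCE", 1, "ug0"),
    ("TARGET", 1, "course0"),
    ("TYPE", 2, "memberOf"),
    ("SOURCE", 2, "ug0"),
    ("TARGET", 2, "dept0"),
    ("TYPE", 3, "takesCourse"),
    ("SOURCE", 3, "gs0"),
    ("TARGET", 3, "course1"),
}


def make_edge(edges, edge_name):
    """Return (source, target) pairs for rows whose TYPE equals edge_name."""
    columns_by_row = {}
    for col, row, val in edges:
        columns_by_row.setdefault(row, {})[col] = val
    return {
        (cols["SOURCE"], cols["TARGET"])
        for cols in columns_by_row.values()
        if cols.get("TYPE") == edge_name and "SOURCE" in cols and "TARGET" in cols
    }


takes_course = make_edge(edges_csv, "takesCourse")
print(sorted(takes_course))  # [('gs0', 'course1'), ('ug0', 'course0')]
```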
In Edge Definitions, you’ll use make_edge to link the LUBM data together. But first, you must define the various entities and properties that make up the nodes of the LUBM knowledge graph.
Entity Definitions
Everything to do with the LUBM knowledge graph is declared in a module called LUBM
.
You can start by defining entity relations for each of the entity classes in the LUBM dataset:
// model
module LUBM
with Transform[lubm_csv] use filename_to_entity
def GraduateStudent = filename_to_entity[:graduate_student]
def UndergraduateStudent = filename_to_entity[:undergraduate_student]
def Course = filename_to_entity[:course]
def University = filename_to_entity[:university]
def Department = filename_to_entity[:department]
def Publication = filename_to_entity[:publication]
def Faculty = filename_to_entity[:faculty]
def ResearchGroup = filename_to_entity[:researchgroup]
end
Every entity relation in LUBM
is a subset of Thing
:
// read query
def output = subset[LUBM:GraduateStudent, LUBM:Thing]
Each entity relation also represents a distinct class of entities.
In other words, the entity relations in LUBM
are disjoint:
// read query
def output = disjoint[LUBM:GraduateStudent, LUBM:UndergraduateStudent]
The next step is to pair entities in LUBM
with their properties.
Property Definitions
In Data Import, you defined a configs module with the header syntax for each CSV file in the LUBM dataset. There are eight different properties that entities may have:
- :id
- :type
- :uri
- :name
- :email
- :telephone
- :research
- :research_assistant
Add seven value types to the LUBM module to represent all of the properties except :research_assistant. You can also define seven property relations that pair entity hashes with the values of their properties:
// model
module LUBM
with Transform[lubm_csv] use assign_property
value type Id = String
value type Type = String
value type Uri = String
value type Name = String
value type Email = String
value type Telephone = String
value type Research = String
def has_id = assign_property[:id, ^Id]
def has_type = assign_property[:type, ^Type]
def has_uri = assign_property[:uri, ^Uri]
def has_name = assign_property[:name, ^Name]
def has_email = assign_property[:email, ^Email]
def has_telephone = assign_property[:telephone, ^Telephone]
def researches = assign_property[:research, ^Research]
end
Here, :research_assistant is a boolean property, so you can use the assign_boolean_property relation to get all of the entity hashes for which the property is boolean_true:
// model
module LUBM
with Transform[lubm_csv] use assign_boolean_property
def ResearchAssistant = assign_boolean_property[:research_assistant]
end
Just like the LUBM
entity relations, ResearchAssistant
is a subset of Thing
:
// read query
def output = subset[LUBM:ResearchAssistant, LUBM:Thing]
However, ResearchAssistant
isn’t disjoint from all of the other entity relations.
For example, some GraduateStudent
entities are also members of ResearchAssistant
:
// read query
def grad_research_assistants = intersect[LUBM:GraduateStudent, LUBM:ResearchAssistant]
def output = top[5, grad_research_assistants]
You’ll add more hierarchical relationships between entities later in this guide.
Edge Definitions
You can use the Transform:make_edge
relation to define edges in the LUBM
graph:
// model
module LUBM
with Transform[lubm_csv] use make_edge
def takes_course = make_edge["takesCourse"]
def member_of = make_edge["memberOf"]
def sub_organization_of = make_edge["subOrganizationOf"]
def undergraduate_degree_from = make_edge["undergraduateDegreeFrom"]
def graduate_degree_from = make_edge["mastersDegreeFrom"]
def phd_degree_from = make_edge["doctoralDegreeFrom"]
def publication_author = make_edge["publicationAuthor"]
def works_for = make_edge["worksFor"]
def advisor = make_edge["advisor"]
def head_of = make_edge["headOf"]
def teacher_of = make_edge["teacherOf"]
end
Each edge relation is a binary relation named after the type of edge in the LUBM dataset.
For example, the takes_course
relation contains takesCourse
edges of the form (e1, e2)
where e1
is an entity hash representing a student who takes the course represented by the e2
hash:
// read query
def output = top[5, LUBM:takes_course]
Entity hashes aren’t very useful to humans. In the next section, you’ll see how to display user-friendly strings for entities and value types.
Displaying Entities
When you inspect query results, it’s more useful to see string identifiers for entities rather than their hashes.
To that end, you can define a show
relation in the LUBM
module for displaying entity and value types:
// model
module LUBM
def show[e in LUBM:Thing](str) { LUBM:has_uri(e, LUBM:^Uri[str]) }
def show[name in LUBM:Name](str) { LUBM:^Name(str, name) }
def show[email in LUBM:Email](str) { LUBM:^Email(str, email) }
def show[tel in LUBM:Telephone](str) { LUBM:^Telephone(str, tel) }
end
Here, show is split into four definitions. The first maps an entity hash in LUBM:Thing to its URI in the LUBM dataset. The remaining definitions map Name, Email, and Telephone values to the strings used to construct them.
Note: There are seven value types defined in the LUBM module, but only three of them (Name, Email, and Telephone) are handled by the show relation. You may want to write a show definition for every entity and value type in your model. In this guide, show is only defined for the value types used in the LUBM test queries.
For example, the following query displays the email addresses of five entities:
// read query
def output(uri, email) {
exists(e, v:
top[5, LUBM:has_email](_, e, v)
and LUBM:show(e, uri)
and LUBM:show(v, email)
)
}
Now that all of the entities, properties, and edges have been implemented, you can apply the rules related to the reasoning capabilities that the LUBM tests.
Reasoning
In OWL, reasoning is the process of inferring new facts about an individual based on the ontology and the data. One of the main goals of the LUBM benchmark is to test the reasoning capability of the underlying system.
This section shows how some of the main reasoning requirements of the LUBM ontology can be realized in Rel. In particular, it covers:
- Entity Hierarchies.
- Equivalent Classes.
- Edge Hierarchies.
- Transitive Properties.
- Inverse Properties.
- Data Consistency.
Entity Hierarchies
Every entity in the LUBM is a member of Thing. In other words, classes of entities like UndergraduateStudent and Course are all subentities of Thing.
There are many such hierarchies in the LUBM ontology.
For example, entities in the Faculty
class have a :type
property with four possible values:
// read query
def output = LUBM:Faculty . LUBM:has_type
You can define new subentities based on these types:
// model
module LUBM
def AssistantProfessor(e) { Faculty(e) and has_type(e, ^Type["AssistantProfessor"]) }
def AssociateProfessor(e) { Faculty(e) and has_type(e, ^Type["AssociateProfessor"]) }
def FullProfessor(e) { Faculty(e) and has_type(e, ^Type["FullProfessor"]) }
def Lecturer(e) { Faculty(e) and has_type(e, ^Type["Lecturer"]) }
end
Here are the URIs of some AssistantProfessor
entities:
// read query
def output(uri) {
exists(e:
top[5, LUBM:AssistantProfessor](_, e)
and uri = LUBM:show[e]
)
}
The following are other hierarchies from the LUBM ontology that need to be modeled:
- University, Department, and ResearchGroup are subentities of Organization.
- Student and Faculty are subentities of Person.
- AssistantProfessor, AssociateProfessor, and FullProfessor are subentities of Professor.
In Rel, you can model these three hierarchies as the union of previously defined entity relations:
// model
module LUBM
def Organization = University; Department; ResearchGroup
def Person = GraduateStudent; UndergraduateStudent; Faculty
def Professor = AssistantProfessor; AssociateProfessor; FullProfessor
end
Here, ; acts as the union operator. For more information about ;, see Semicolon in the Rel Reference manual. Organization is defined as the union of University, Department, and ResearchGroup. It contains all the universities, departments, and research groups defined earlier. Person and Professor are defined similarly.
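If Python sets stand in for unary Rel relations, the two patterns in this section reduce to a filter and a union. The entities, the types, and the helper name subentity below are hypothetical. The sketch only illustrates the set semantics, not how Rel stores or evaluates the relations.

```python
# Hypothetical unary relations (sets of entity ids) and a :type property.
faculty = {"f0", "f1", "f2", "f3"}
has_type = {
    ("f0", "AssistantProfessor"),
    ("f1", "AssociateProfessor"),
    ("f2", "FullProfessor"),
    ("f3", "Lecturer"),
}


def subentity(base, has_type, type_name):
    """Entities of `base` whose :type property equals type_name."""
    return {e for e in base if (e, type_name) in has_type}


assistant_professor = subentity(faculty, has_type, "AssistantProfessor")
associate_professor = subentity(faculty, has_type, "AssociateProfessor")
full_professor = subentity(faculty, has_type, "FullProfessor")

# Rel's `;` on unary relations behaves like set union here.
professor = assistant_professor | associate_professor | full_professor
print(sorted(professor))  # ['f0', 'f1', 'f2']
```

Lecturers belong to Faculty but not to Professor, because the union only covers the three professor subentities.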
Equivalent Classes
In OWL, some classes are defined with necessary and sufficient conditions. These classes are called equivalent classes. In the LUBM ontology, there are two equivalent classes needed for answering the LUBM queries:
- Every UndergraduateStudent is a Student, and any Person who takes_course is also a Student.
- A Chair is any Person who is the head_of a Department.
You can model the Student
and Chair
classes in Rel as follows:
// model
module LUBM
def Student(e) {
UndergraduateStudent(e)
or (Person(e) and takes_course(e, _))
}
def Chair(e) { Person(e) and head_of(e, _) }
end
Here are the URIs of five entities in the Chair
class:
// read query
def output(uri) {
exists(e:
top[5, LUBM:Chair](_, e)
and LUBM:show(e, uri)
)
}
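The Student definition lets a graduate student count as a student even though no edge or CSV file states this directly. The plain-Python sketch below shows the effect, using made-up entities and sets in place of Rel relations. A GraduateStudent who takes a course is inferred to be a Student, while one who takes no course is not.

```python
# Hypothetical toy data; strings stand in for entity hashes.
undergraduate_student = {"ug0"}
graduate_student = {"gs0", "gs1"}
faculty = {"fp0", "ap0"}
person = undergraduate_student | graduate_student | faculty

takes_course = {("gs0", "course0")}
head_of = {("fp0", "dept0")}

# Student: every undergraduate, plus any Person that takes some course.
course_takers = {s for (s, _) in takes_course}
student = undergraduate_student | {p for p in person if p in course_takers}

# Chair: any Person who is the head of something.
heads = {h for (h, _) in head_of}
chair = {p for p in person if p in heads}

print(sorted(student), sorted(chair))  # ['gs0', 'ug0'] ['fp0']
```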
Edge Hierarchies
Just like entity relations, edge relations may have a hierarchy. For instance, the LUBM ontology defines a superproperty called degree_from that contains all of the undergraduate_degree_from, graduate_degree_from, and phd_degree_from edges.
You can model this in Rel as the union of the three edge relations:
// model
module LUBM
def degree_from {
undergraduate_degree_from;
graduate_degree_from;
phd_degree_from
}
end
The LUBM ontology also defines some subproperties. The two subproperties needed for the LUBM test queries are:
- works_for, which is a subproperty of member_of.
- head_of, which is a subproperty of works_for.
This makes sense, because a person who works_for an organization is also a member_of that organization, and a person who is the head_of an organization also works_for that organization.
Here’s how to model this in Rel:
// model
module LUBM
def member_of = works_for
def works_for = head_of
end
Important: In Rel, = doesn’t work as an assignment operator like it does in many other languages. In this context, def member_of = works_for states that member_of contains works_for, but doesn’t guarantee the converse. The same observation applies to def works_for = head_of. See Multiple Definitions in the Rel Reference manual for more details.
In Data Consistency, you’ll add integrity constraints that enforce this hierarchy on any new data.
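Because Rel combines multiple definitions of the same relation with a union, the subproperty rules amount to the set operations below. This is a hypothetical plain-Python sketch with made-up edges. Unlike Rel, Python needs the relations to be computed in dependency order: head_of, then works_for, then member_of.

```python
# Hypothetical edges loaded from edges.csv (strings stand in for entity hashes).
head_of = {("fp0", "dept0")}
works_for_edges = {("ap0", "dept0")}
member_of_edges = {("gs0", "dept0")}

# def works_for = head_of: works_for also contains every head_of pair.
works_for = works_for_edges | head_of
# def member_of = works_for: member_of also contains every works_for pair.
member_of = member_of_edges | works_for

print(sorted(member_of))  # [('ap0', 'dept0'), ('fp0', 'dept0'), ('gs0', 'dept0')]
```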
Transitive Properties
If organization A is a sub_organization_of organization B, and organization B is a sub_organization_of organization C, then organization A should also be a sub_organization_of organization C. That is, the sub_organization_of relation should be transitive, and indeed the LUBM ontology requires this. While the CSV data do contain sub_organization_of edges, they don’t contain all of the edges required for sub_organization_of to be transitive. You can see this when you check the number of tuples in sub_organization_of:
// read query
def output = count[LUBM:sub_organization_of]
You can infer which tuples are missing and add them to sub_organization_of
using Rel’s dot join operator:
// model
module LUBM
def sub_organization_of(x, y) {
(sub_organization_of.sub_organization_of)(x, y)
}
end
Checking the count of tuples in sub_organization_of
again reveals that the number of tuples has nearly doubled:
// read query
def output = count[LUBM:sub_organization_of]
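The definition above is recursive, because sub_organization_of appears in its own body. Rel therefore keeps applying the rule until no new tuples appear, which yields the full transitive closure rather than only paths of length two. The loop below is a plain-Python sketch of that fixpoint computation on hypothetical organizations. It illustrates the result, not Rel's evaluation strategy.

```python
def transitive_closure(pairs):
    """Repeatedly add (x, z) whenever (x, y) and (y, z) are present."""
    closure = set(pairs)
    while True:
        # One round of the join sub_organization_of . sub_organization_of
        joined = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if joined <= closure:  # fixpoint reached: nothing new was derived
            return closure
        closure |= joined


# Hypothetical chain: research group -> department -> university -> org0.
sub_organization_of = {
    ("rg0", "dept0"),
    ("dept0", "univ0"),
    ("univ0", "org0"),
}
closed = transitive_closure(sub_organization_of)
print(len(sub_organization_of), len(closed))  # 3 6
```

A single application of the join would only add ("rg0", "univ0") and ("dept0", "org0"). The pair ("rg0", "org0") appears only in the second round.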
Inverse Properties
The LUBM ontology defines a property has_alumnus that links University entities with Person entities that earned a degree from that University. The relation has_alumnus isn’t one of the edge types in the edges.csv file you imported earlier. You must infer it from the degree_from property described in the data. In particular, has_alumnus is the inverse of degree_from. That is, if (person, university) is a tuple in degree_from, then (university, person) is a tuple in has_alumnus. You can accomplish this in Rel using the built-in transpose relation:
// model
module LUBM
def has_alumnus = transpose[degree_from]
end
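In set terms, transpose swaps the two positions of every pair. The plain-Python sketch below uses hypothetical degree edges. It first builds degree_from as the union of the three degree relations, as in Edge Hierarchies. It then derives has_alumnus by swapping each pair.

```python
# Hypothetical degree edges (strings stand in for entity hashes).
undergraduate_degree_from = {("gs0", "univ1")}
graduate_degree_from = {("fp0", "univ2")}
phd_degree_from = {("fp0", "univ3")}

# Edge hierarchy: degree_from is the union of its three subproperties.
degree_from = undergraduate_degree_from | graduate_degree_from | phd_degree_from

# Inverse property: has_alumnus = transpose[degree_from].
has_alumnus = {(university, person) for (person, university) in degree_from}

print(sorted(has_alumnus))  # [('univ1', 'gs0'), ('univ2', 'fp0'), ('univ3', 'fp0')]
```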
Data Consistency
The rules specified in the LUBM module describe the LUBM ontology, but there’s nothing in place to ensure that future changes to the data maintain this description. You can enforce the ontology on all database transactions by installing integrity constraints in the database.
Warning: You can install integrity constraints in the LUBM module because it’s not parameterized. Integrity constraints in parameterized modules are currently not supported.
For example, the following integrity constraint ensures that the degree_from
relation maps Person
entities to University
entities:
// model
module LUBM
ic degree_from_type(e1, e2) {
degree_from(e1, e2)
implies
Person(e1) and University(e2)
}
end
Here, degree_from_type
checks that all pairs (e1, e2)
in the degree_from
relation contain a
Person
entity in the first position and a University
entity in the second position.
However, this integrity constraint doesn’t ensure that degree_from
only contains tuples of length two.
Note: To learn more about writing integrity constraints, including how to enforce a relation’s arity, see the Integrity Constraints concept guide.
You can also write an integrity constraint to ensure that has_alumnus
is always the inverse of degree_from
:
// model
module LUBM
ic degree_from_inverse(person, university) {
degree_from(person, university)
iff
has_alumnus(university, person)
}
end
The keyword iff
is short for “if and only if.”
It’s used here to check that for every pair in degree_from
the inverse pair exists in has_alumnus
and vice versa.
You can also ensure that head_of
is a subset of works_for
:
// model
module LUBM
ic head_works_for(person, organization) {
head_of(person, organization)
implies
works_for(person, organization)
}
end
The three integrity constraints described here illustrate how to enforce rules on every transaction. They do not completely enforce the LUBM ontology. With the right integrity constraints, though, you can guarantee that every transaction maintains the structure described by the LUBM ontology.
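Each of the three integrity constraints is a condition that must hold for every tuple. As a rough plain-Python analogue, the hypothetical checker functions below return the tuples that violate each condition, so an empty result means the constraint holds. Unlike a Rel ic, this sketch does not abort a transaction; it only reports violations.

```python
def degree_from_type_violations(degree_from, person, university):
    # degree_from(e1, e2) implies Person(e1) and University(e2)
    return {(e1, e2) for (e1, e2) in degree_from
            if not (e1 in person and e2 in university)}


def degree_from_inverse_violations(degree_from, has_alumnus):
    # degree_from(p, u) iff has_alumnus(u, p): check both directions
    forward = {(p, u) for (p, u) in degree_from if (u, p) not in has_alumnus}
    backward = {(p, u) for (u, p) in has_alumnus if (p, u) not in degree_from}
    return forward | backward


def head_works_for_violations(head_of, works_for):
    # head_of(p, o) implies works_for(p, o)
    return head_of - works_for


# Hypothetical data: one degree edge points at a non-university entity.
person = {"gs0", "fp0"}
university = {"univ0"}
degree_from = {("gs0", "univ0"), ("fp0", "dept0")}
has_alumnus = {(u, p) for (p, u) in degree_from}
head_of = {("fp0", "dept0")}
works_for = {("fp0", "dept0")}

print(degree_from_type_violations(degree_from, person, university))  # {('fp0', 'dept0')}
```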
Test Queries
The original 14 LUBM test queries are specified in SPARQL. In this section, you’ll write the test queries in Rel.
Each query is presented with its original explanation quoted from the LUBM SPARQL queries document. The original SPARQL query is also shown so that it can be compared to the Rel translation.
Query 1 presents two ways to translate the SPARQL query into Rel. The first shows how to translate the SPARQL query line-by-line into Rel, and the second shows how to write the Rel query more idiomatically. The remaining queries present only the idiomatic translation.
Additionally, each query is visualized as a subgraph in the LUBM knowledge graph. The diagram for each query is adapted from the diagrams provided by the authors of the LUBM.
Query 1
This query bears large input and high selectivity. It queries just one class and one property and does not assume any hierarchy information or inference.
Query 1 retrieves graduate students (GraduateStudent) who take the course (takesCourse) with the ID "http://www.Department0.University0.edu/GraduateCourse0".
Note: In the description above, the terms GraduateStudent
and takesCourse
refer to their concepts in the LUBM ontology.
You can model these in Rel with the LUBM:GraduateStudent
and LUBM:takes_course
relations.
In terms of the LUBM knowledge graph, Query 1 asks for nodes ?X
that are of type GraduateStudent
and are the source node in a takesCourse
edge
with a specific target entity.
The following diagram illustrates this:
Here’s what the original SPARQL query looks like:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X
WHERE
{?X rdf:type ub:GraduateStudent .
?X ub:takesCourse
http://www.Department0.University0.edu/GraduateCourse0}
The query returns the IDs for all GraduateStudent
entities that satisfy the query.
In Rel, you can write Query 1 as follows:
// model
def answer:q1(x) {
exists(course:
LUBM:GraduateStudent(x)
and LUBM:takes_course(x, course)
and LUBM:has_id(course, LUBM:^Id["http://www.Department0.University0.edu/GraduateCourse0"])
)
}
There are a few things to note about the Rel definition above:
- The three conditions in the body of the definition correspond almost directly to the WHERE clause of the SPARQL query. LUBM:GraduateStudent(x) matches ?X rdf:type ub:GraduateStudent, and LUBM:takes_course together with LUBM:has_id matches the ub:takesCourse pattern for the given course.
- The answer:q1 relation contains entity hashes, whereas the SPARQL query returns URI identifiers. Entity hashes are the preferred way to identify entities in Rel. answer:q1 is installed so that you can compare the Rel results with the LUBM test query answer set.
Here are the answers Rel computed for Query 1:
// read query
def output(uri) {
exists(e:
answer:q1(e)
and LUBM:show(e, uri)
)
}
You can write Query 1 more compactly in Rel, although the code no longer translates line-by-line from SPARQL. For example, you can bind the x parameter to LUBM:GraduateStudent inside the definition’s header, and you can use LUBM:^Thing to get the entity hash for the course:
// read query
// Alternative way to write Query 1 in Rel
def q1_alt(x in LUBM:GraduateStudent) {
LUBM:takes_course(x, LUBM:^Thing["http://www.Department0.University0.edu/GraduateCourse0"])
}
def output(uri) {
exists(e: q1_alt(e) and uri = LUBM:show[e])
}
You can see that q1_alt
contains the same four GraduateStudent
entities as answer:q1
.
Query 2
This query increases in complexity: Three classes and three properties are involved. Additionally, there is a triangular pattern of relationships between the objects involved.
Query 2 retrieves the list of graduate students (GraduateStudent), universities (University), and departments (Department), such that the graduate student is a member of a department (memberOf), where this department is a part of the university (subOrganizationOf) from which the graduate student obtained an undergraduate degree (undergraduateDegreeFrom).
The result of Query 2 is the set of all triples ?X
, ?Y
, and ?Z
that have the following structure in the LUBM knowledge graph:
This is what the query looks like in SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X, ?Y, ?Z
WHERE
{?X rdf:type ub:GraduateStudent .
?Y rdf:type ub:University .
?Z rdf:type ub:Department .
?X ub:memberOf ?Z .
?Z ub:subOrganizationOf ?Y .
?X ub:undergraduateDegreeFrom ?Y}
You can write Query 2 in Rel as follows:
// model
def answer:q2(x in LUBM:GraduateStudent, y in LUBM:University, z in LUBM:Department) {
LUBM:member_of(x, z)
and LUBM:sub_organization_of(z, y)
and LUBM:undergraduate_degree_from(x, y)
}
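Query 2 is a join of three edge relations that must close a triangle through the same department z and university y. The plain-Python sketch below spells out the SPARQL query's triangle pattern over made-up data. The function name query2 and all of the data are hypothetical. The sketch shows the pattern only and does not show how Rel evaluates it.

```python
def query2(graduate_student, university, department,
           member_of, sub_organization_of, undergraduate_degree_from):
    """Return (x, y, z) triples that close the memberOf /
    subOrganizationOf / undergraduateDegreeFrom triangle."""
    return {
        (x, y, z)
        for (x, z) in member_of
        if x in graduate_student and z in department
        for (z2, y) in sub_organization_of
        if z2 == z and y in university and (x, y) in undergraduate_degree_from
    }


# Hypothetical toy data: gs0 studied at univ0 and is now in a univ0 department;
# gs1 is in the same department but got their undergraduate degree elsewhere.
answers = query2(
    graduate_student={"gs0", "gs1"},
    university={"univ0", "univ1"},
    department={"dept0"},
    member_of={("gs0", "dept0"), ("gs1", "dept0")},
    sub_organization_of={("dept0", "univ0")},
    undergraduate_degree_from={("gs0", "univ0"), ("gs1", "univ1")},
)
print(answers)  # {('gs0', 'univ0', 'dept0')}
```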
Query 3
This query is similar to Query 1 but class
Publication
has a wide hierarchy.
Query 3 retrieves the list of publications (Publication), where the author (publicationAuthor) is "http://www.Department0.University0.edu/AssistantProfessor0".
The result of Query 3 is the set of all ?X
that have the following structure in the LUBM knowledge graph:
This is what the query looks like in SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X
WHERE
{?X rdf:type ub:Publication .
?X ub:publicationAuthor
http://www.Department0.University0.edu/AssistantProfessor0}
Here’s how to write Query 3 in Rel:
// model
def answer:q3(x in LUBM:Publication) {
LUBM:publication_author(x, LUBM:^Thing["http://www.Department0.University0.edu/AssistantProfessor0"])
}
Query 4
This query has small input and high selectivity. It assumes subClassOf relationship between Professor and its subclasses. Class Professor has a wide hierarchy. Another feature is that it queries about multiple properties of a single class.
Query 4 retrieves the ID, name, email address, and telephone of professors (Professor) who work for (worksFor) the department "http://www.Department0.University0.edu".
The result of Query 4 is the set of all ?X, ?Y1, ?Y2, and ?Y3 that have the following structure in the LUBM knowledge graph:
This is what the query looks like in SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X, ?Y1, ?Y2, ?Y3
WHERE
{?X rdf:type ub:Professor .
?X ub:worksFor <http://www.Department0.University0.edu> .
?X ub:name ?Y1 .
?X ub:emailAddress ?Y2 .
?X ub:telephone ?Y3}
Here’s how to write Query 4 in Rel:
// model
def answer:q4(x in LUBM:Professor, y1 in LUBM:Name, y2 in LUBM:Email, y3 in LUBM:Telephone) {
LUBM:works_for(x, LUBM:^Thing["http://www.Department0.University0.edu"])
and LUBM:has_name(x, y1)
and LUBM:has_email(x, y2)
and LUBM:has_telephone(x, y3)
}
Query 5
This query assumes the subClassOf relationship between Person and its subclasses and the subPropertyOf relationship between memberOf and its subproperties. Moreover, class Person features a deep and wide hierarchy.
Query 5 retrieves the list of people (Person) who are members of (memberOf) the department "http://www.Department0.University0.edu".
The result of Query 5 is the set of all ?X that have the following structure in the LUBM knowledge graph:
Here’s the SPARQL query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X
WHERE
{?X rdf:type ub:Person .
?X ub:memberOf <http://www.Department0.University0.edu>}
You can write the query above in Rel as follows:
// model
def answer:q5(p in LUBM:Person) {
LUBM:member_of(p, LUBM:^Thing["http://www.Department0.University0.edu"])
}
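In the LUBM ontology, worksFor is declared a subproperty of memberOf (and headOf a subproperty of worksFor), so a professor who is only stated to work for a department still counts as a member of it. The following Python sketch shows this subproperty inference on hypothetical triples. The entity names and helper functions are illustrative only, and the Person type check is omitted for brevity.

```python
# Hypothetical toy triples. Subproperty edges follow the LUBM ontology:
# headOf is a subproperty of worksFor, and worksFor of memberOf.
SUBPROPERTY_OF = {"headOf": "worksFor", "worksFor": "memberOf"}

TRIPLES = {
    ("student0", "memberOf", "dept0"),
    ("prof0", "worksFor", "dept0"),
    ("prof1", "headOf", "dept0"),
    ("prof2", "worksFor", "dept1"),
}


def super_properties(prop):
    """Return prop followed by every property it is (transitively) a subproperty of."""
    chain = [prop]
    while prop in SUBPROPERTY_OF:
        prop = SUBPROPERTY_OF[prop]
        chain.append(prop)
    return chain


def holds(subject, prop, obj):
    """True if the triple is asserted or implied through a subproperty."""
    return any(
        s == subject and o == obj and prop in super_properties(p)
        for s, p, o in TRIPLES
    )


def q5(dept):
    # Every subject is assumed to be a Person here, so the class check is omitted.
    subjects = {s for s, _, _ in TRIPLES}
    return {s for s in subjects if holds(s, "memberOf", dept)}


print(sorted(q5("dept0")))  # ['prof0', 'prof1', 'student0']
```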
Query 6
This query involves only one class, but it assumes both the explicit subClassOf relationship between UndergraduateStudent and Student and the implicit one between GraduateStudent and Student. In addition, it has large input and low selectivity.
Query 6 retrieves all the individuals of class Student.
The result of Query 6 is the set of all ?X that have the following structure in the LUBM knowledge graph:
This is what the query looks like in SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X WHERE {?X rdf:type ub:Student}
You can write the query above in Rel as follows:
// model
def answer:q6 = LUBM:Student
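The implicit subClassOf link between GraduateStudent and Student comes from class definitions rather than from an explicit axiom. Assuming, as in the LUBM ontology, that Student is defined as a Person who takes some Course and that GraduateCourse is a subclass of Course, a graduate student who takes a graduate course satisfies the Student definition. The following Python sketch illustrates that inference; all entities and sets are hypothetical.

```python
# Hypothetical toy data. Assumed definitions (per the LUBM ontology):
#   Student == Person and (takesCourse some Course)
#   GraduateCourse is a subclass of Course
#   UndergraduateStudent is explicitly declared a subclass of Student
PERSON = {"grad0", "ugrad0", "prof0"}
GRADUATE_STUDENT = {"grad0"}  # no explicit axiom links this class to Student
UNDERGRADUATE_STUDENT = {"ugrad0"}
GRADUATE_COURSE = {"gcourse0"}
COURSE = {"course0"} | GRADUATE_COURSE
TAKES_COURSE = {("grad0", "gcourse0"), ("ugrad0", "course0")}


def students():
    # Implicit membership: any Person who takes some Course satisfies the definition.
    by_definition = {p for p, c in TAKES_COURSE if p in PERSON and c in COURSE}
    # Explicit membership through the declared subclass axiom.
    return by_definition | UNDERGRADUATE_STUDENT


print(sorted(students()))  # ['grad0', 'ugrad0']
```

Here grad0 becomes a Student only because gcourse0 is a Course, which is exactly the implicit relationship Query 6 relies on.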
Query 7
This query is similar to Query 6 in terms of class Student, but it involves more classes and properties, and its selectivity is high.
Query 7 retrieves the list of students (Student) and courses (Course) such that the students take the courses (takesCourse) and the courses are taught by the faculty member "http://www.Department0.University0.edu/AssociateProfessor0".
The result of Query 7 is the set of all ?X and ?Y with the following structure in the LUBM knowledge graph:
Here’s the SPARQL query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X ?Y
WHERE
{?X rdf:type ub:Student .
?Y rdf:type ub:Course .
?X ub:takesCourse ?Y .
<http://www.Department0.University0.edu/AssociateProfessor0> ub:teacherOf ?Y}
You can write the query above in Rel as follows:
// model
def answer:q7(x in LUBM:Student, y in LUBM:Course) {
LUBM:takes_course(x, y)
and LUBM:teacher_of(LUBM:^Thing["http://www.Department0.University0.edu/AssociateProfessor0"], y)
}
Query 8
This query is more complex than Query 7 because it includes one more property.
Query 8 retrieves the list of students (Student), departments (Department), and the students' email addresses (emailAddress) such that each student is a member of (memberOf) the department and the department is a suborganization of (subOrganizationOf) the university "http://www.University0.edu".
The result of Query 8 is the set of all ?X, ?Y, and ?Z with the following structure in the LUBM knowledge graph:
Here’s what the query looks like in SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X ?Y ?Z
WHERE
{?X rdf:type ub:Student .
?Y rdf:type ub:Department .
?X ub:memberOf ?Y .
?Y ub:subOrganizationOf <http://www.University0.edu> .
?X ub:emailAddress ?Z}
In Rel, you can write the query above as follows:
// model
def answer:q8(x in LUBM:Student, y in LUBM:Department, z in LUBM:Email) {
LUBM:member_of(x, y)
and LUBM:sub_organization_of(y, LUBM:^Thing["http://www.University0.edu"])
and LUBM:has_email(x, z)
}
Query 9
Besides the aforementioned features of class Student and the wide hierarchy of class Faculty, this query is characterized by the most classes and properties in the query set, and, like Query 2, it has a triangular pattern of relationships.
Query 9 retrieves the list of students (Student), faculty (Faculty), and courses (Course) such that the students take the courses (takesCourse) taught (teacherOf) by their advisors (advisor).
The result of Query 9 is the set of all ?X, ?Y, and ?Z with the following structure in the LUBM knowledge graph:
This is what the query looks like in SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X ?Y ?Z
WHERE
{?X rdf:type ub:Student .
?Y rdf:type ub:Faculty .
?Z rdf:type ub:Course .
?X ub:advisor ?Y .
?Y ub:teacherOf ?Z .
?X ub:takesCourse ?Z}
Here’s how to write the query above in Rel:
// model
def answer:q9(x in LUBM:Student, y in LUBM:Faculty, z in LUBM:Course) {
LUBM:advisor(x, y)
and LUBM:teacher_of(y, z)
and LUBM:takes_course(x, z)
}
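The triangular pattern means that each answer must close a cycle of three edges: student to advisor, advisor to course, and student to the same course. A minimal Python sketch of that join, using hypothetical edge sets and omitting the Student, Faculty, and Course type checks, indexes teacherOf by faculty member and then closes the triangle with a membership test on takesCourse:

```python
from collections import defaultdict

# Hypothetical toy edges.
ADVISOR = {("s0", "f0"), ("s1", "f0"), ("s2", "f1")}
TEACHER_OF = {("f0", "c0"), ("f0", "c1"), ("f1", "c2")}
TAKES_COURSE = {("s0", "c0"), ("s1", "c2"), ("s2", "c2")}


def q9(advisor, teacher_of, takes_course):
    """Return every (student, faculty, course) triangle in the three edge sets."""
    # Index teacherOf by faculty member so each advisor edge is extended cheaply.
    courses_by_teacher = defaultdict(set)
    for y, z in teacher_of:
        courses_by_teacher[y].add(z)
    # Extend each advisor edge (x, y) to (x, y, z), then close the triangle
    # by checking that x takes z.
    return {
        (x, y, z)
        for x, y in advisor
        for z in courses_by_teacher[y]
        if (x, z) in takes_course
    }


print(sorted(q9(ADVISOR, TEACHER_OF, TAKES_COURSE)))
# [('s0', 'f0', 'c0'), ('s2', 'f1', 'c2')]
```

Indexing one relation before joining avoids comparing every advisor edge with every teacherOf edge. This is a generic hash-join idea and is not meant to describe how Rel evaluates the query.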
Query 10
This query differs from Queries 6, 7, 8, and 9 in that it only requires the (implicit) subClassOf relationship between GraduateStudent and Student; that is, the subClassOf relationship between UndergraduateStudent and Student does not add to the results.
Query 10 retrieves the list of students (Student) who take (takesCourse) the course "http://www.Department0.University0.edu/GraduateCourse0".
The result of Query 10 is the set of all ?X with the following structure in the LUBM knowledge graph:
Here’s the SPARQL query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X
WHERE
{?X rdf:type ub:Student .
?X ub:takesCourse
<http://www.Department0.University0.edu/GraduateCourse0>}
You can write the query above in Rel as follows:
// model
def answer:q10(x in LUBM:Student) {
LUBM:takes_course(x, LUBM:^Thing["http://www.Department0.University0.edu/GraduateCourse0"])
}
Query 11
Queries 11, 12, and 13 are intended to verify the presence of certain OWL reasoning capabilities in the system. In this query, property subOrganizationOf is defined as transitive. In the benchmark data, instances of ResearchGroup are stated as suborganizations of a Department individual, and that Department is in turn stated as a suborganization of a University individual. Answering this query therefore requires inferring the subOrganizationOf relationship between instances of ResearchGroup and University. Additionally, its input is small.
Query 11 retrieves the list of research groups (ResearchGroup) that are suborganizations of (subOrganizationOf) the university "http://www.University0.edu".
The result of Query 11 is the set of all ?X with the following structure in the LUBM knowledge graph:
You can write Query 11 in SPARQL as follows:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X
WHERE
{?X rdf:type ub:ResearchGroup .
?X ub:subOrganizationOf <http://www.University0.edu>}
Here’s how to write Query 11 in Rel:
// model
def answer:q11(x in LUBM:ResearchGroup) {
LUBM:sub_organization_of(x, LUBM:^Thing["http://www.University0.edu"])
}
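The transitivity of subOrganizationOf can be pictured as a fixpoint computation: keep composing known pairs until no new pair appears. The Python sketch below does this naively on hypothetical research groups, departments, and universities, and then answers a Query 11-style lookup.

```python
# Hypothetical toy subOrganizationOf facts: research groups belong to
# departments, and departments belong to universities.
SUB_ORGANIZATION_OF = {
    ("group0", "dept0"), ("group1", "dept0"), ("group2", "dept1"),
    ("dept0", "univ0"), ("dept1", "univ1"),
}
RESEARCH_GROUP = {"group0", "group1", "group2"}


def transitive_closure(edges):
    """Naive fixpoint: keep composing pairs until no new pair appears."""
    closure = set(edges)
    while True:
        new_pairs = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def q11(university):
    closure = transitive_closure(SUB_ORGANIZATION_OF)
    return {x for x in RESEARCH_GROUP if (x, university) in closure}


print(sorted(q11("univ0")))  # ['group0', 'group1']
```

Each pass recomputes every composition, which is wasteful on large graphs. Datalog-style engines commonly use semi-naive evaluation, which only joins newly derived pairs. The sketch does not describe how Rel evaluates recursion.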
Query 12
The benchmark data do not produce any instances of class Chair. Instead, each Department individual is linked to the chair professor of that department by property headOf. Hence this query requires realization, that is, inferring that the professor is an instance of class Chair because he or she is the head of a department. The input of this query is small as well.
Query 12 retrieves the list of chairs (Chair) and their departments (Department) such that the chair works for (worksFor) the department and the department is a suborganization of (subOrganizationOf) the university "http://www.University0.edu".
The result of Query 12 is the set of all ?X and ?Y with the following structure in the LUBM knowledge graph:
This is what Query 12 looks like in SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X ?Y
WHERE
{?X rdf:type ub:Chair .
?Y rdf:type ub:Department .
?X ub:worksFor ?Y .
?Y ub:subOrganizationOf <http://www.University0.edu>}
In Rel, you can write Query 12 as follows:
// model
def answer:q12(x in LUBM:Chair, y in LUBM:Department) {
LUBM:works_for(x, y)
and LUBM:sub_organization_of(y, LUBM:^Thing["http://www.University0.edu"])
}
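Realization here means deriving Chair membership from headOf facts. The following Python sketch assumes, as in the LUBM ontology, that Chair is defined as a Person who is headOf some Department and that headOf is a subproperty of worksFor. The professors, departments, and universities are hypothetical.

```python
# Hypothetical toy data.
PERSON = {"prof0", "prof1", "prof2"}
DEPARTMENT = {"dept0", "dept1"}
HEAD_OF = {("prof0", "dept0"), ("prof2", "dept1")}
WORKS_FOR = {("prof1", "dept0")}  # asserted directly
SUB_ORGANIZATION_OF = {("dept0", "univ0"), ("dept1", "univ1")}


def chairs():
    # Realization: nothing is typed Chair in the data, so infer it from headOf.
    return {p for p, d in HEAD_OF if p in PERSON and d in DEPARTMENT}


def works_for():
    # headOf is a subproperty of worksFor, so every headOf fact implies worksFor.
    return WORKS_FOR | HEAD_OF


def q12(university):
    chair_set = chairs()
    return {
        (x, y)
        for x, y in works_for()
        if x in chair_set and y in DEPARTMENT and (y, university) in SUB_ORGANIZATION_OF
    }


print(sorted(q12("univ0")))  # [('prof0', 'dept0')]
```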
Query 13
Property hasAlumnus is defined in the benchmark ontology as the inverse of property degreeFrom, which has three subproperties: undergraduateDegreeFrom, mastersDegreeFrom, and doctoralDegreeFrom. The benchmark data state that a person is an alumnus of a university using one of these three subproperties instead of hasAlumnus. Therefore, this query assumes the subPropertyOf relationships between degreeFrom and its subproperties, and it also requires inference about inverseOf.
Query 13 retrieves the list of people (Person) who are alumni (hasAlumnus) of the university "http://www.University0.edu".
The result of Query 13 is the set of all ?X with the following structure in the LUBM knowledge graph:
This is what the query looks like in SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X
WHERE
{?X rdf:type ub:Person .
<http://www.University0.edu> ub:hasAlumnus ?X}
Here’s how to write the query above in Rel:
// model
def answer:q13(x in LUBM:Person) {
LUBM:has_alumnus(LUBM:^Thing["http://www.University0.edu"], x)
}
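Query 13 combines two inferences: subproperty facts are lifted to degreeFrom, and degreeFrom is then flipped to obtain hasAlumnus. The following Python sketch shows both steps on hypothetical triples; the people, universities, and function names are illustrative only.

```python
# Hypothetical toy triples. The three subproperties of degreeFrom are the ones
# named in the Query 13 description; hasAlumnus is the inverse of degreeFrom.
DEGREE_SUBPROPERTIES = {"undergraduateDegreeFrom", "mastersDegreeFrom", "doctoralDegreeFrom"}

TRIPLES = {
    ("alice", "undergraduateDegreeFrom", "univ0"),
    ("bob", "doctoralDegreeFrom", "univ0"),
    ("carol", "mastersDegreeFrom", "univ1"),
}


def degree_from():
    # subPropertyOf: each subproperty fact implies a degreeFrom fact.
    return {(s, o) for s, p, o in TRIPLES if p in DEGREE_SUBPROPERTIES}


def has_alumnus():
    # inverseOf: hasAlumnus(u, x) holds exactly when degreeFrom(x, u) holds.
    return {(u, x) for x, u in degree_from()}


def q13(university):
    return {x for u, x in has_alumnus() if u == university}


print(sorted(q13("univ0")))  # ['alice', 'bob']
```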
Query 14
This query is the simplest in the test set. It represents queries with large input and low selectivity, and it does not assume any hierarchy information or inference.
Query 14 retrieves the list of all undergraduate students (UndergraduateStudent).
The result of Query 14 is the set of all ?X with the following structure in the LUBM knowledge graph:
Here’s how to write Query 14 in SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X
WHERE {?X rdf:type ub:UndergraduateStudent}
You can write Query 14 in Rel as follows:
// model
def answer:q14(x) = LUBM:UndergraduateStudent(x)
Query Validation
In this section, you’ll write Rel code that checks that the query results computed by the system match the answers provided in the LUBM benchmark.
After you import the CSV files containing the benchmark answers into a base relation called answer_ref, you’ll perform the validation in two steps:
- Convert the answer relation to Graph Normal Form (GNF) so that it matches the structure of the answer_ref base relation.
- Confirm that all of the results match the reference answers using integrity constraints.
Data Import
The following declarations import the reference answers to the test queries and store them in a base relation called answer_ref:
// write query
def path_answer = "s3://relationalai-documentation-public/lubm/answers/"
module answer_configs
def q1:path = concat[path_answer, "answers_query1.txt"]
def q1:syntax:header = (1, :student_id)
def q2:path = concat[path_answer, "answers_query2.txt"]
def q2:syntax:header {
(1, :student_id); (2, :university_id); (3, :department_id)
}
def q2:syntax:delim = '\t'
def q3:path = concat[path_answer, "answers_query3.txt"]
def q3:syntax:header = (1, :publication_id)
def q4:path = concat[path_answer, "answers_query4.txt"]
def q4:syntax:header {
(1, :prof_id); (2, :prof_name); (3, :prof_email); (4, :prof_tele)
}
def q4:syntax:delim = '\t'
def q5:path = concat[path_answer, "answers_query5.txt"]
def q5:syntax:header = (1, :person_id)
def q6:path = concat[path_answer, "answers_query6.txt"]
def q6:syntax:header = (1, :student_id)
def q7:path = concat[path_answer, "answers_query7.txt"]
def q7:syntax:header = (1, :student_id); (2, :course_id)
def q7:syntax:delim = '\t'
def q8:path = concat[path_answer, "answers_query8.txt"]
def q8:syntax:header {
(1, :student_id); (2, :department_id); (3, :student_email)
}
def q8:syntax:delim = '\t'
def q9:path = concat[path_answer, "answers_query9.txt"]
def q9:syntax:header {
(1, :student_id); (2, :faculty_id); (3, :course_id)
}
def q9:syntax:delim = '\t'
def q10:path = concat[path_answer, "answers_query10.txt"]
def q10:syntax:header = (1, :student_id)
def q11:path = concat[path_answer, "answers_query11.txt"]
def q11:syntax:header = (1, :researchgroup_id)
def q12:path = concat[path_answer, "answers_query12.txt"]
def q12:syntax:header = (1, :chair_id); (2, :dept_id)
def q12:syntax:delim = '\t'
def q13:path = concat[path_answer, "answers_query13.txt"]
def q13:syntax:header = (1, :person_id)
def q14:path = concat[path_answer, "answers_query14.txt"]
def q14:syntax:header = (1, :undergraduate_student_id)
end
def insert:answer_ref[id] = load_csv[answer_configs[id]]
There’s a bit of a problem now, though: the answer_ref relation is in GNF, but the answer relation is not.
Data Normalization
Take a look at the structure of the answer_ref:q7 relation:
// read query
def output = top_csv[3, answer_ref:q7]
The output represents the first three rows of the reference answers CSV file. Each tuple corresponds to a single “cell” in the CSV file, indexed by its column name and row number. The first two elements of each tuple are the column name and row number. The third element is the value at that column and row position. Note that the row numbers begin with 2 because the first row — the header row — is schema, not data.
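To make this layout concrete, the following Python sketch turns a small tab-delimited file into (column name, row number, value) triples, numbering data rows from 2 because line 1 is the header. It only mimics the shape of the result and is not how load_csv works internally. The function name, the sample file, and the column names are hypothetical, and plain strings stand in for Rel symbols such as :student_id.

```python
def csv_to_gnf(text, column_names, delimiter="\t"):
    """Return {(column_name, row_number, value)}, skipping the header on line 1."""
    triples = set()
    lines = text.splitlines()
    # Data rows start on line 2 of the file, so number them from 2.
    for row_number, line in enumerate(lines[1:], start=2):
        for column, value in zip(column_names, line.split(delimiter)):
            triples.add((column, row_number, value))
    return triples


# Hypothetical two-row answer file for a query with two columns.
SAMPLE = "X\tY\ns0\tc0\ns1\tc1\n"
for triple in sorted(csv_to_gnf(SAMPLE, ["student_id", "course_id"])):
    print(triple)
```

Each "cell" becomes one tuple, so a file with n data rows and k columns yields n times k tuples.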
To compare the query results in Rel to the reference query answers, you need to transform the answer relation into the same structure as the answer_ref relation.
To achieve this, first you define a relation answer_column_order, which maps each column name of each query to its position in the query result.
// model
module answer_column_order
def q1 = (:student_id, 1)
def q2 = (:student_id, 1); (:university_id, 2); (:department_id, 3)
def q3 = (:publication_id, 1)
def q4 = (:prof_id, 1); (:prof_name, 2); (:prof_email, 3); (:prof_tele, 4)
def q5 = (:person_id, 1)
def q6 = (:student_id, 1)
def q7 = (:student_id, 1); (:course_id, 2)
def q8 = (:student_id, 1); (:department_id, 2); (:student_email, 3)
def q9 = (:student_id, 1); (:faculty_id, 2); (:course_id, 3)
def q10 = (:student_id, 1)
def q11 = (:researchgroup_id, 1)
def q12 = (:chair_id, 1); (:dept_id, 2)
def q13 = (:person_id, 1)
def q14 = (:undergraduate_student_id, 1)
end
Next, you define answer_gnf, which converts all the query results in the answer relation into GNF.
// model
def answer_gnf[q](column_name, index, text) {
exists(val:
answer_column_order[q].(
pivot[x... : enumerate[answer[q]](index, x...)]
)(column_name, val)
and text = LUBM:show[val]
)
}
Here, answer_gnf uses the Standard Library relations enumerate and pivot to convert high-arity relations into high-cardinality relations: enumerate assigns an index to each result tuple, and pivot splits each tuple into (position, value) pairs.
The dot join with answer_column_order then replaces each position with the corresponding column name.
Finally, the show relation converts the entity hashes contained in the answer relation into the correct strings.
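The same transformation can be sketched in Python to show what each step contributes. This is an illustration, not a translation of the Rel code: sorting the rows is only a way to make the indices deterministic, str stands in for LUBM:show, and the function name and sample data are hypothetical.

```python
def answer_to_gnf(rows, column_order, show=str):
    """Flatten a set of result tuples into {(column_name, index, text)}.

    rows: set of equal-length tuples, one per query result.
    column_order: {column_name: position}, with positions starting at 1.
    show: converts each value to a string (a stand-in for LUBM:show).
    """
    name_at = {position: name for name, position in column_order.items()}
    triples = set()
    # Number the results (the role of enumerate); sorting only makes it deterministic.
    for index, row in enumerate(sorted(rows), start=1):
        # Split each tuple into (position, value) pairs (the role of pivot),
        # then swap each position for its column name (the dot join).
        for position, value in enumerate(row, start=1):
            triples.add((name_at[position], index, show(value)))
    return triples


Q7_ORDER = {"student_id": 1, "course_id": 2}
print(sorted(answer_to_gnf({("s1", "c1"), ("s0", "c0")}, Q7_ORDER)))
# [('course_id', 1, 'c0'), ('course_id', 2, 'c1'), ('student_id', 1, 's0'), ('student_id', 2, 's1')]
```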
Answer Validation
You can validate the query answers using two integrity constraints.
The first integrity constraint, all_answers_correct, checks that every answer found is present in the reference answers.
The second integrity constraint, all_answers_found, checks that every reference answer has indeed been found.
Because the row numbers in answer_ref do not necessarily match the indices of the tuples in answer_gnf, the constraints do not compare rows by position. Instead, for every query and every result i in answer_gnf, all_answers_correct requires some row j in answer_ref that agrees with it on every column. The constraint all_answers_found applies the same check in the opposite direction, from each row i in answer_ref to some result j in answer_gnf.
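// read query
// Every computed result i must match some reference row j on every column.
// The answer_ref(query, _, j, _) conjunct grounds j to an existing reference row,
// so the forall cannot be satisfied vacuously by a nonexistent j.
ic all_answers_correct(query, i) {
answer_gnf(query, _, i, _)
implies
exists(j:
answer_ref(query, _, j, _)
and forall(col where answer_ref(query, col, j, _):
answer_gnf[query, col, i] = answer_ref[query, col, j]
)
)
}
// Every reference row i must match some computed result j on every column.
ic all_answers_found(query, i) {
answer_ref(query, _, i, _)
implies
exists(j:
answer_gnf(query, _, j, _)
and forall(col where answer_gnf(query, col, j, _):
answer_gnf[query, col, j] = answer_ref[query, col, i]
)
)
}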
These queries run without triggering any integrity constraint violations, which confirms that the results are correct and complete.
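The same two-way check can be sketched in Python. Here, unmatched(found, reference) plays the role of all_answers_correct, and unmatched(reference, found) plays the role of all_answers_found; both must return an empty set. The function names and sample data are hypothetical. The sketch compares whole column-to-value dictionaries, which is slightly stricter than the Rel constraints because it also requires both rows to have the same set of columns.

```python
from collections import defaultdict


def rows_by_index(triples):
    """Group (query, column, index, value) tuples into {(query, index): {column: value}}."""
    rows = defaultdict(dict)
    for query, column, index, value in triples:
        rows[(query, index)][column] = value
    return rows


def unmatched(source, target):
    """Return the (query, index) keys of source rows that match no target row.

    Indices are ignored when matching, because row numbers in the reference
    files need not line up with the indices of the computed results.
    """
    target_rows = rows_by_index(target)
    missing = set()
    for (query, i), row in rows_by_index(source).items():
        if not any(q == query and other == row for (q, _), other in target_rows.items()):
            missing.add((query, i))
    return missing


# Hypothetical data: one computed answer and two reference rows for q7.
FOUND = {("q7", "student_id", 1, "s0"), ("q7", "course_id", 1, "c0")}
REFERENCE = {
    ("q7", "student_id", 2, "s0"), ("q7", "course_id", 2, "c0"),
    ("q7", "student_id", 3, "s1"), ("q7", "course_id", 3, "c1"),
}
print(unmatched(FOUND, REFERENCE))  # set()  (every found answer is correct)
print(unmatched(REFERENCE, FOUND))  # {('q7', 3)}  (one reference answer was not found)
```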
Summary
Rel is a powerful modeling language when it comes to expressing complex ontologies and their data. In particular, the OWL ontology of the LUBM benchmark can be modeled naturally in Rel.
In this guide, you saw how to:
- Import the LUBM dataset as a base relation.
- Transform the base data to a relational knowledge graph.
- Model entity and edge hierarchies.
- Infer new facts in the knowledge graph through reasoning.
- Translate the LUBM test queries from SPARQL to Rel.
- Transform query results to Graph Normal Form.
- Write integrity constraints to validate the query results.
See Also
Rel’s modeling capabilities go well beyond the concepts needed to model the LUBM ontology. For more details on modeling with Rel, see Relational Data Modeling and Graph Normal Form.