# Entities

This concept guide introduces entities in Rel.

## Goal

The goal of this concept guide is to introduce entities and show how they are defined and used throughout Rel.

## Introduction

An entity is something that exists, independently of its representation in the database, and can be uniquely identified. It can be a concrete object, such as a car or a person, or more abstract, such as an organization. A database often describes sets of entities, their properties, and the relationships between them, forming a knowledge graph, where entities are the nodes of the graph.

Imagine a database where we have professionals and their countries of origin. A common approach is to assign integer IDs to each person, country, and occupation, and then express queries using these IDs. However, we might inadvertently join on person and country IDs. We might use IDs that don’t exist, or make typos when querying a particular occupation name.

Entities alleviate these problems. They give us a way to describe distinct objects uniquely and unambiguously, keeping distinct concepts separate when needed. Entities let us decouple the details of how we identify concepts from how we use them.

## Basics

### Constructing Entities

As a simple example of constructing entities, consider a database where we want to refer to several countries. To create unique entities for each one, we can write the following entity definition:

// install

entity Country
country_from_name = {
"United States";
"England";
"Mexico";
"France";
"Iceland";
"Russia"
}

Here, Country refers to the entity type we are defining. On the right-hand side, we have an (anonymous) relation that contains the names (or more generally, the features) that uniquely identify the different Country instances. The constructor, country_from_name, will be a relation that maps the names to the respective entities, each of which will have a unique ID. In our case, this ID will be an integer or a hash; see the section Desugaring Entity Definitions, below, for how entities are constructed from the identifying features.

Throughout this concept guide, we will use the term “entity” to refer both to the unique ID in the database, and the real-world entity that it refers to.

For clarity, we use the convention X_from_Y for the constructor, where X refers to the entity type and Y to the identifying features. Also as a convention, we start the entity name (e.g., Country) with an uppercase letter, following the naming convention of other type predicates like String or Number (see stdlib). This naming convention is not mandatory.

In this example, the key is a single string value. As we will see in Section Compound Identifying Features, entities can also be identified by multiple values, which is useful for more complex entities or recursive entity creation (see Section Entities and Recursion).

With the above definition in place, Country will be a unary relation containing the corresponding six entities (in this case, unique hash IDs):

// query

Country

The constructor, country_from_name, will be a binary relation that maps each string to the corresponding entity:

// query

country_from_name

Applying the constructor to a string not in the list will not yield any result. Thus, country_from_name["United States"] will refer to an entity, but country_from_name["USA"] will be empty.

Behind the scenes, Country is actually derived from the constructor country_from_name (see the section Desugaring Entity Definitions for details).

### Entity Properties

We now discuss how to assign properties to the newly constructed entity instances. We prefer to do this in a fully normalized way, with separate definitions for each property, keyed by the property name.

As a convention, and to avoid overloading the entity type relation Country, we define an all-lowercase country(:prop, e, value) relation that maps the property :prop of entity e to its value value. Thanks to Rel’s Module Syntax, this can be written as country:prop(e,value).

For example, let’s assign a :name to Country entities — which is particularly easy in this case, since we can find it in the constructor:

// install

def country:name(e, value) = country_from_name(value, e)

At first glance, it might seem redundant to assign the identifying feature in this way, since we can already access it via the constructor country_from_name. However, the advantage is that all properties can then be accessed and queried in the same way.

Let’s define some more (non-identifying) properties.

// install

def country:population(e, 364134) =  country:name(e, "Iceland")
def country:population(e, 328239523) = country:name(e, "United States")

def country:name_alias[e] = {"USA"; "United States of America"},
country:name(e, "United States")

This assigns population information to Iceland and the United States, and two name aliases for the United States. Note that we do NOT need missing values for properties we don’t want to assign, or don’t know.

Note also how we used the identifying feature from our previous definition, country:name(e, value), to select the entity e. We could have used the constructor country_from_name(value, e) instead.

It is time to look at all the information we know about our Country entities:

// query

country


As we can see, all country entities have a :name. We added :population values for Iceland and the United States, and two :name_alias values for the United States.
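
To make the normalized layout concrete outside of Rel, here is a minimal Python sketch that models the country relation as a set of (property, entity, value) triples. The entity IDs (e_iceland, e_us, e_france) and the lookup helper are hypothetical stand-ins, not part of Rel; the point is only that an unknown property is simply absent, so no placeholder value is needed.

```python
# Hypothetical model of the normalized `country` relation as
# (property, entity, value) triples. The entity IDs are made-up labels.
country = {
    ("name", "e_iceland", "Iceland"),
    ("name", "e_us", "United States"),
    ("name", "e_france", "France"),
    ("population", "e_iceland", 364134),
    ("population", "e_us", 328239523),
    ("name_alias", "e_us", "USA"),
    ("name_alias", "e_us", "United States of America"),
}


def lookup(prop, entity):
    """Return all values of `prop` for `entity` (may be empty or hold several values)."""
    return {v for (p, e, v) in country if p == prop and e == entity}


print(lookup("population", "e_iceland"))     # one value
print(lookup("population", "e_france"))      # empty set: nothing stored, nothing missing
print(sorted(lookup("name_alias", "e_us")))  # two values for the same property
```

Because every property lives in the same triple shape, adding a new property never requires changing existing tuples.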

### Displaying Entities

It is useful to define a show relation that displays a more human-readable string than the internal entity ID. The identifying feature(s) are usually a good choice, and, by definition, the entity will be uniquely identifiable. For our Country entity, this is the name:

// install

def show[e] = country:name[e]

Let’s list all countries where we have population information, using the human-readable show relation to identify the entities:

// query

def output = show[e], country:population[e] from e

### Compound Identifying Features (n-ary constructors)

Often, more than one value is required to uniquely identify an entity. A simple example is a person who is identified by a first and last name that are stored as two separate values (as should be the case in a normalized modeling approach).

This is how we can define an Actor entity, uniquely identified by first and last names:

// install

def actor_names = {
("Sharon", "Stone");
("Tim", "Curry");
("Robert", "DeNiro");
("Michael", "Jordan")
}

entity Actor
actor_from_name = actor_names

This time, the relation that contains the identifying features is a named relation, actor_names. This relation has arity 2 and was defined outside the entity construction statement. The constructor actor_from_name is now a relation of arity 3, mapping each name pair to an entity ID that uniquely references the actor.

// query

actor_from_name

In general, entities can be created from an $n$-ary relation, giving a constructor that has $n+1$ arguments:

entity E
E_from_values(x1,...,xn) = identifying_features_relation(x1,...,xn)

### Multiple Constructors

It might be necessary to construct entities of the same type in different ways, if the set of identifying features is not uniform across all of the entities. To do so, we can define multiple constructors for the same entity type. For example, we can add an entity for “Cher” where she is uniquely identified by her single artist name.

// install

entity Actor
actor_from_name = {"Cher"}

The entity type Actor now has two constructors. The binary actor_from_name defines one Actor entity. The ternary actor_from_name defines four Actor entities.

It is also possible to overload the same constructor by arity and/or type. Let’s demonstrate that by defining a few more entity types:

// install

entity Musician
musician_from_name = {
"Bjork";
"Prince"
}

entity Musician
musician_from_name = {
("Paul", "McCartney");
("John", "Lennon")
}

entity Athlete
athlete_from_name = {
("Michael", "Jordan")
}

entity Professor
professor_from_name = {
("Michael", "Jordan")
}

Here, we defined four constructors for three entity types.

• The entity type Musician has two constructors of the same name, musician_from_name, but of different arities. One constructor has arity 2 and the other has arity 3.
• We defined a Professor entity and an Athlete entity with the same identifying features (i.e., name pair), ("Michael", "Jordan"), and earlier an Actor too. These three entities are, however, distinct. This is possible because the constructor name is also taken into account when generating the entity ID.
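
The effect of including the constructor name in the ID can be illustrated with a small Python sketch. This is not Rel's actual hashing scheme, which is internal and unspecified; the entity_id function and the use of SHA-256 are assumptions chosen only to show why identical features can still yield distinct entities.

```python
import hashlib


def entity_id(entity_type, constructor, *features):
    """Derive a deterministic ID from entity type, constructor name, and features.

    Illustrative assumption only: Rel's real ID generation is not specified here.
    """
    parts = [entity_type, constructor] + [repr(f) for f in features]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


# Same identifying features, three different constructors.
actor = entity_id("Actor", "actor_from_name", "Michael", "Jordan")
athlete = entity_id("Athlete", "athlete_from_name", "Michael", "Jordan")
professor = entity_id("Professor", "professor_from_name", "Michael", "Jordan")

print(len({actor, athlete, professor}))  # number of distinct IDs
```

Calling entity_id again with the same arguments returns the same ID, which mirrors the fact that a constructor always maps the same features to the same entity.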

Constructors can also be overloaded by data type, and will generate distinct entities for each type. For example:

// install

entity Num num_constructor = 2
entity Num num_constructor = "2"
entity Num num_constructor = 2.0

defines three different Num entities.

// query

num_constructor

### Entities of Multiple Types

Cher is mostly known for her music, so we may want to classify her not just as Actor but also as Musician. We can do so by referring to the entity ID we have already created and adding her to the Musician relation.

// install

def Musician(e) = actor_from_name("Cher", e)

We could have also created a new entity for Cher as a Musician by adding her identifying features to the musician_from_name relation. However, this would have resulted in two separate entities referring to the same underlying person. This would defeat the purpose of having entities in the first place, and should be avoided as much as possible. (When joining two knowledge bases, or importing large amounts of raw data, this situation may be unavoidable; resolving this ambiguity is known as entity resolution.)

### Entity Hierarchies

Hierarchies are a key semantic concept, ubiquitous in modeling real-world data (think OWL). Examples are any organizational structure, such as a corporation (e.g.: CEO, CTO, …, intern), a government (e.g.: President, Vice-President, …), the classification of life (life, domain, kingdom, …), or a family tree (children, parents, grandparents, …).

To model hierarchies between entities, we can, for example, define the concept of a Professional as a supertype that includes all professions we have defined earlier:

// install

def Professional = Actor; Athlete; Musician; Professor

This definition does not preclude creating additional constructors for Professional directly. For example, we can add Roger Penrose directly as a Professional:

// install

entity Professional
professional_from_name = ("Roger", "Penrose")

Since Roger Penrose is a Professor in Mathematics, we can add the entity we just created to the Professor relation:

// install

def Professor(x) = professional_from_name("Roger", "Penrose", x)

In hindsight, we notice it would have been easier to add him first as Professor because the definition def Professional = Actor; Athlete; Musician; Professor would have automatically added him as Professional.

### Entity Properties (Continued)

How do we model properties in the presence of a hierarchy? Do we need to assign them at each hierarchical level again? You might guess the answer — we don’t.

However, we have a choice to make.

• Bottom-up: Define the entity properties at the lowest level of the hierarchy (Actor, Athlete, …) and propagate them up the hierarchy (to Professional).

• Top-down: Define the properties at the top level of the hierarchy (Professional) and propagate them down (to Actor, Athlete, …).

• Mixed: Define the properties on the level where the entity IDs are defined and propagate the information up and down. In our case, most people were defined on the bottom level, but Roger Penrose was defined on the higher Professional level.

We demonstrate here the bottom-up and top-down approaches.

#### Bottom-Up

In the bottom-up approach, we assign the properties at the lowest level, and we define relations actor, athlete, …, to collect the entity properties of each type. We follow the naming convention from Section Entity Properties, where the name of the relation containing the properties is all-lowercase.

// install

def actor:first_name(e, x) = actor_from_name(x, y..., e) from y...
def actor:last_name(e, x) = actor_from_name(_, x, e)

def athlete:first_name(e, x) = athlete_from_name(x, y..., e) from y...
def athlete:last_name(e, x) = athlete_from_name(_, x, e)

def musician:first_name(e, x) = musician_from_name(x, y..., e) from y...
def musician:last_name(e, x) = musician_from_name(_, x, e)
def musician(prop, e in Musician, x) = actor(prop, e, x)

def professor:first_name(e, x) = professor_from_name(x, _, e)
def professor:first_name(e in Professor, x) = professional_from_name(x, _, e)
def professor:last_name(e, x) = professor_from_name(_, x, e)
def professor:last_name(e in Professor, x) = professional_from_name(_, x, e)

Note that some of our :first_name definitions use the y... varargs notation, to allow for 0 or more arguments in its place. This way, the definition uses the first argument regardless of whether the x_from_name constructor has one, two, or more arguments. (The last operator from the stdlib can be used similarly.)

Note also that since we know that some actors are musicians, our last rule for musician refers to actor — but only for those cases where we have a Musician entity.

We created two properties, first_name and last_name, for the entity types Actor, Athlete, Musician, and Professor.

For Actor and Athlete, we define them directly from their constructors. Because the constructors are unique for each entity type, we don’t need to check that the entity ID e is of the correct type.

The situation is a bit more complicated for the Musician and Professor entity types because they contain entities (more precisely, entity instances) that were initially defined for another entity type: “Cher” and “Roger Penrose” were initially defined as Actor and Professional and later also as Musician and Professor, respectively. Therefore, in the code above we use additional checks on the left-hand side: e in Musician and e in Professor ensure that the entity IDs also refer to the intended entity type.

Here, you see one main drawback of this modeling approach: if specific instances have multiple assigned entity types, special treatment is needed, which complicates the logic.

Inspecting the musician and professor relations shows that only the properties that we know are assigned:

// query

def output:musician = musician
def output:professor = professor


Notice how “Cher” and “Roger Penrose” are included in these relations, even though we defined them initially as entities of a different type.

To propagate these definitions up, we simply repeat the same union as we did for the entity type relations (e.g., Actor) but this time for the entity relations (e.g.: actor):

// install

def professional = actor; athlete; musician; professor

Let’s look at all properties associated with the first three entity instances in professional:

// query

def output[prop] = professional[prop, e] for e in last[top[3, Professional]]


As we can see, professional does include names that were initially assigned to actor, athlete, musician, or professor. This shows that we successfully propagated these properties to the professional relation without having to redefine them. Furthermore, we can access the properties exactly the same way in professional as we do in actor, et al.

#### Top-Down

We can also model the properties in a top-down fashion, where we first define professional_topdown and then define actor_topdown, athlete_topdown, musician_topdown, and professor_topdown. Note that we use the suffix _topdown to keep these relations separate from the ones defined in the “bottom-up” approach above.

// install

def professional_topdown(:first_name, e, first) = {
actor_from_name; athlete_from_name; musician_from_name;
professor_from_name; professional_from_name
}(first, x..., e) from x...

def professional_topdown(:last_name, e, last) = {
actor_from_name; athlete_from_name; musician_from_name;
professor_from_name; professional_from_name
}(_, last, e)

The union over all the entity constructors can be more compactly expressed in the top-down approach than in the bottom-up approach above. However, this is only possible because all entity constructors have a very similar structure. If the individual constructors vary too much, their union must be expressed more verbosely and may require delicate handling.

Now, in the second step, we can propagate the entity properties down to the individual (sub)-entity level.

// install

def actor_topdown[p, x] = professional_topdown[p, x], Actor(x)
def athlete_topdown[p, x] = professional_topdown[p, x], Athlete(x)
def musician_topdown[p, x] = professional_topdown[p, x], Musician(x)
def professor_topdown[p, x] = professional_topdown[p, x], Professor(x)

Let’s compare the results for musician (bottom-up approach) and musician_topdown (top-down approach)

// query

equal(musician, musician_topdown)

and we see that the two relations are in perfect agreement (see equal), demonstrating that the two modeling approaches give the same result here.
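
The same comparison can be mirrored on a toy data set in Python. The sketch below is an analogy, not Rel: relations are Python sets, the entity IDs are made-up labels, and the names helper is a hypothetical stand-in for the first_name and last_name rules above.

```python
# Type relations (sets of made-up entity IDs).
Actor = {"cher", "stone"}
Musician = {"cher", "bjork", "lennon"}

# Constructor relations: identifying features followed by the entity ID.
actor_from_name = {("Cher", "cher"), ("Sharon", "Stone", "stone")}
musician_from_name = {("Bjork", "bjork"), ("John", "Lennon", "lennon")}


def names(constructor):
    """Turn constructor tuples into (property, entity, value) triples."""
    out = set()
    for *features, e in constructor:
        out.add(("first_name", e, features[0]))
        if len(features) == 2:
            out.add(("last_name", e, features[1]))
    return out


# Bottom-up: per-type properties, plus actor properties for actors who are musicians.
actor = names(actor_from_name)
musician = names(musician_from_name) | {t for t in actor if t[1] in Musician}

# Top-down: define properties once for everyone, then filter by type.
professional_topdown = names(actor_from_name | musician_from_name)
musician_topdown = {t for t in professional_topdown if t[1] in Musician}

print(musician == musician_topdown)
```

Note how the bottom-up version needs an explicit extra rule for Cher, while the top-down version gets her through the type filter alone.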

### Displaying Entities (Continued)

Now we have defined several types of entities, and some entities even belong to multiple types. Fortunately, this does not complicate defining a show relation for all of them.

Of course, we could define a show relation for each entity type individually. However, it is actually much easier to define show for all of them at once. The supertype Professional comes in handy as it includes all people we have defined so far; we only need to define one show relation for all entities of type Professional:

// install

def show[p in Professional] =
concat[
professional[:first_name, p],
(concat[" ",
professional[:last_name, p]
] <++ "")
]

Note that we use the left_override (<++) from stdlib to include professionals with no last name.
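
As a rough illustration of what the override does in this definition, the following Python sketch models the single-value case: if the left side has a value, use it, otherwise fall back to the right side. This is a simplification (the stdlib operator works per key on relations), and the dictionaries and IDs p1 and p2 are hypothetical sample data.

```python
def left_override(left, right):
    """Single-value model of `<++`: keep `left` unless it is empty, else use `right`."""
    return left if left else right


first_name = {"p1": "Sharon", "p2": "Cher"}
last_name = {"p1": "Stone"}  # no last name stored for p2


def show(p):
    """Build a display name; the empty-string fallback keeps people without a last name."""
    last = {" " + last_name[p]} if p in last_name else set()
    suffix = next(iter(left_override(last, {""})))
    return first_name[p] + suffix


print(show("p1"))
print(show("p2"))
```

Without the fallback, the concatenation would have nothing to append for p2, and that person would drop out of show entirely.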

Let’s look at the entity instances from section Bottom-Up. We use show to return human-readable strings instead of the internal entity ID:

// query

def output[prop] = show[e], professional[prop, e] from e in last[top[3, Professional]]

// query

def output[prop] = show[e], actor[prop, e] from e in last[top[3, Actor]]

So far, we have defined a number of entity types and even an entity hierarchy, which can be used as the starting point for a Knowledge Graph, but we have not yet created any edges to connect different entities.

In Rel, an edge is just a binary relation, relating pairs of entities with each other. (We can also create hyper-edges, that connect three or more entities together.)

Edge information often comes from an external source like a CSV file. It can also come from domain knowledge. For instance, a father edge could be derived from a parent edge and a condition that the parent needs to be a Man.

In this concept guide, we will limit ourselves to a binary edge relation, has_nationality, that connects the Professional entities to the Country entities defined in Section Constructing Entities.

For convenience (and to avoid loading external data), we now load a string constant corresponding to a CSV file with three columns and eight data rows, and save the data in the base relation nationality_csv:

// update

def config:data = """
first_name,last_name,country
John,Lennon,England
Bjork,,Iceland
Cher,,United States
Sharon,Stone,United States
Prince,,United States
Tim,Curry,United States
Roger,Penrose,England
Robert,DeNiro,United States"""
def delete[:nationality_csv] = nationality_csv
def insert[:nationality_csv] = load_csv[config]

(See the CSV Import how-to guide for more on importing CSV data.)

The next step is to match the properties to their corresponding entities and collect the entity pairs in the relation has_nationality.

// install

def has_nationality(pent, cent) =
professional:first_name[pent] = nationality_csv[:first_name, pos]
// use "equal" to include null case:
and equal(professional:last_name[pent], nationality_csv[:last_name, pos])
and country:name[cent] = nationality_csv[:country, pos]
from pos

The entries in has_nationality are as expected:

// query

def output = p, e : has_nationality(pent, cent) and p = show[pent] and e = show[cent] from pent, cent


We can use our show relation to confirm that we have connected the right entities:

// query

def output = show[p], show[c]
from p, c where has_nationality(p, c)
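
The join that builds has_nationality can be mirrored in plain Python to show the matching logic, including the case of a missing last name. This is a sketch under stated assumptions: the entity IDs, the professional and country dictionaries, and the shortened CSV text are hypothetical sample data, and an empty CSV field is treated as "no value".

```python
import csv
import io

data = """first_name,last_name,country
John,Lennon,England
Bjork,,Iceland
Cher,,United States
Sharon,Stone,United States
"""

# Hypothetical entity tables: ID -> identifying features (None = no last name).
professional = {
    "p_lennon": ("John", "Lennon"),
    "p_bjork": ("Bjork", None),
    "p_cher": ("Cher", None),
    "p_stone": ("Sharon", "Stone"),
}
country = {"c_en": "England", "c_is": "Iceland", "c_us": "United States"}

has_nationality = set()
for row in csv.DictReader(io.StringIO(data)):
    # An empty field counts as missing, so it matches a professional without a last name.
    last = row["last_name"] or None
    for pent, (first, plast) in professional.items():
        for cent, cname in country.items():
            if first == row["first_name"] and plast == last and cname == row["country"]:
                has_nationality.add((pent, cent))

print(sorted(has_nationality))
```

Comparing the last names as "both present and equal, or both missing" plays the role that equal plays in the Rel definition.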

In a RAI Notebook, we can visualize our small Knowledge Graph with this cell:

module graph
def node = x : has_nationality(x, _) or has_nationality(_, x)
def edge = has_nationality
def node_attribute[n, "label"] = show[n]
def edge_attribute[x, y, "label"] = "nationality", has_nationality(x,y)
end
def output = graphviz[graph]

## Entities and Integrity Constraints

You can use integrity constraints to check the consistency of your entity definitions. For example, to make sure that has_nationality connects the right types of entities, we can add this integrity constraint:

// query

ic has_nationality_types(x,y) {
has_nationality(x,y) implies Professional(x) and Country(y)
}

To make sure that first_name is defined for all professionals, we can add:

// query

ic all_have_first_name(e) {
Professional(e) implies professional:first_name(e, _)
}

We can check that actor and musician are only defined for entities of the corresponding type:

// query

ic actor_type(prop, e) {(actor(prop, e, x...) from x...) implies Actor(e)}
ic musician_type(prop, e) {(musician(prop, e, x...)  from x...) implies Musician(e)}

We can also write an integrity constraint that verifies that all Actor entities, for instance, are also of type Professional. Mathematically, this means that Actor should be a subset (⊆) of Professional.

// query

ic Actor_subset_Professional {
Actor ⊆ Professional
}
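
One way to read these constraints is as queries for violations that must come back empty. The Python sketch below models that reading on made-up sets; it is an analogy for the logic of the constraints, not a description of how Rel evaluates them.

```python
# Made-up sample relations.
Actor = {"a1", "a2"}
Professional = {"a1", "a2", "p1"}
Country = {"c1"}
has_nationality = {("a1", "c1"), ("p1", "c1")}


def nationality_type_violations(pairs):
    """Pairs that break: has_nationality(x, y) implies Professional(x) and Country(y)."""
    return {(x, y) for (x, y) in pairs if not (x in Professional and y in Country)}


def subset_violations(sub, sup):
    """Elements that break: sub ⊆ sup."""
    return sub - sup


print(nationality_type_violations(has_nationality))               # empty: constraint holds
print(nationality_type_violations(has_nationality | {("c1", "a1")}))  # one bad pair
print(subset_violations(Actor, Professional))                     # empty: Actor ⊆ Professional
```

A constraint holds exactly when its violation set is empty, which is also a convenient way to report which tuples are at fault.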

## Entities and Recursion

Entities can be constructed from other entities, and created recursively. For example, we might want to represent paths through a directed acyclic graph, starting from a given set of source nodes, treating each path as a distinct entity. (You can find more details on recursion and how to write recursive logic in the Recursion concept guide.)

We could represent paths as strings, and concatenate new node names onto existing ones. However, this would result in arbitrarily long strings, of different lengths for different paths, and any change to the string representation would require updating all of the path names.

Instead, we define an entity for each path, recursively, as follows:

// install

// define the graph
def edge = {(1,2); (2,3); (3,5); (1,4); (4,5)}

// starting nodes
def source = {1; 3}

// construct the Path entities
entity Path nil() = true
entity Path cons(path, x) = nil(path) and source(x)
entity Path cons(path, y) = cons(_, x, path) and edge(x, y) from x

The path constructors define three different kinds of paths: First, nil constructs a single entity for the empty path. Next, we define paths that are constructed from the nil path and one of the source nodes. Finally, we can recursively construct a new path from an existing path and a node y, if the existing path ends at x and there is an edge from x to y.

Note that this example does not lead to an infinite recursion because there are no cycles in the graph.
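
To see which paths the three rules generate, here is a minimal Python sketch that computes the same least fixpoint. It represents each path as a tuple of nodes purely for illustration (in Rel each path is an opaque entity ID), and build_paths is a hypothetical helper name.

```python
edge = {(1, 2), (2, 3), (3, 5), (1, 4), (4, 5)}
source = {1, 3}


def build_paths(edge, source):
    """Least fixpoint of the three Path rules; () stands for the nil path."""
    paths = {()}                      # rule 1: nil
    paths |= {(x,) for x in source}   # rule 2: cons(nil, x) for each source node
    while True:
        # rule 3: extend any non-empty path ending at x along an edge (x, y)
        new = {p + (y,) for p in paths if p for (x, y) in edge if p[-1] == x}
        if new <= paths:              # nothing new: fixpoint reached
            return paths
        paths |= new


paths = build_paths(edge, source)
for p in sorted(paths):
    print(p)
```

The loop terminates for the same reason the Rel recursion does: the graph has no cycles, so paths cannot grow indefinitely.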

We can now define more complex properties of path entities, such as their number of nodes and edges:

// install

def path:numnodes(p, 0) = nil(p)
def path:numnodes[p] = path[:numnodes, prev] + 1
from prev where cons(prev, _, p)

def path:length(p, 0) = cons(nilpath, _, p) and nil(nilpath)
from nilpath

def path:length[p] = path:length[prev] + 1
from prev where cons(prev, _, p)

Note that in this formulation, the nil path has a :numnodes property of 0, but the :length property is not defined for it.

// query

path

We can assign string names to path entities with a recursive definition as well:

// install

def show[p in nil] = "nil"

def show[p in Path] = string[end_node]
from end_node where cons(nil, end_node, p)

def show[p in Path] = concat[show[prefix], concat["-", string[end_node]]]
from end_node, prefix where cons(prefix, end_node, p) and not nil(prefix)

We can now use show to see this representation of the Path entities:

// query

def sample_paths = show[p], p from p in Path

def output = sample_paths

## Behind the Scenes

### @hash Entities

By default, hashes are used to construct the entities (which, in the DB, are just unique internal IDs).

You should make no assumptions about the entities themselves, other than their uniqueness, and they should only be accessed via the Entity and constructor relations. Their specific values, ordering, etc. remain unspecified.

### Desugaring Entity Definitions

It might be useful to see the Rel definitions that correspond to entity definitions – we are really defining a unary relation (the set that contains the entities), and a relation from keys to elements of that set.

Using point-free notation,

entity E c = r

is (roughly) equivalent to

def c = hash[r]
def E = last[c]

(the actual hashes will be different since the entity declaration also hashes the constructor and entity names). Note that hash adds the hash as the $(n+1)^\text{th}$ element when given a relation of arity $n$.
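
The arity bookkeeping in this desugaring can be sketched in Python. The functions hash_relation and last below are hypothetical models of hash and last, and SHA-256 is an arbitrary stand-in; the actual hash values Rel produces are unspecified and will differ.

```python
import hashlib


def hash_relation(r):
    """Model of hash[r]: append a hash of each tuple as an extra, last element."""
    out = set()
    for t in r:
        digest = hashlib.sha256(repr(t).encode("utf-8")).hexdigest()[:12]
        out.add(t + (digest,))
    return out


def last(c):
    """Model of last[c]: the set of final elements of the tuples in c."""
    return {t[-1] for t in c}


r = {("Sharon", "Stone"), ("Tim", "Curry")}  # identifying features, arity 2
c = hash_relation(r)                         # constructor, arity 3
E = last(c)                                  # entity type, arity 1

print(sorted(len(t) for t in c))
print(len(E))
```

Each tuple of arity n in r becomes a tuple of arity n+1 in c, and E collects only the appended IDs.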

Similarly,

@auto_number
entity E c = r

is (roughly) equivalent to

def c = auto_number[r]
def E = last[c]

(auto_number will give different results each time it is called).

### Syntax Note

As shown in Desugaring Entity Definitions, the entity constructor is a relation as well. This has the benefit that we can use the same syntax as for normal relations, to emphasize different aspects of the query definition or simply to write the definition more compactly.

For example, we could write a professional_from_name_occupation constructor that defines three different professionals with the same name, but different occupations, as follows:

entity Professional professional_from_name_occupation["Michael", "Jordan"] =
{"Athlete"; "Actor"; "Professor"}

or also as

entity Professional professional_from_name_occupation("Michael", "Jordan", x) =
{"Athlete"; "Actor"; "Professor"}(x)

If we wanted to account for the basketball player Michael Jordan acting in the movie “Space Jam” (1996) which makes him both an Actor and an Athlete, we might want to add a middle initial, to distinguish him from the actor Michael B. Jordan (“Black Panther”, “Creed”).