Entities
This concept guide introduces entities and shows how they are defined and used in Rel.
Introduction
An entity is something that exists in the real world and can be uniquely identified independently of its properties or other relationships. An entity may be a concrete object, such as a car or a person, or more abstract, such as an organization.
To say that a company, for example, can be uniquely identified independently of its properties means that a company may make changes to its products, its affiliates, or even its name while remaining the same company.
By contrast, a product’s price is not an entity, because a price is fully characterized by a number and a currency denomination. In other words, if a product’s price is 12 dollars, then changing either “12” or “dollars” means that you are no longer referring to the same price[1]. Non-entity elements like price are described using value types. A RelationalAI database is a knowledge graph where each node is an entity or value type and where the edges represent relationships between them.
Rel’s entities are a construct designed to model real-world entities. They offer a way to refer to objects independently of identifying real-world information.
To see why this is useful, consider a database representing a set of professionals and their countries of origin. One possible approach is to represent each entity using an identifying string, like a person’s full name or a country’s name. However, this approach makes it difficult to handle a name change.
For example, when the country Türkiye changed its name from Turkey, you would have had to be careful to make the change everywhere the name appears in the database. This could lead to inconsistencies if the name were not changed everywhere, or if the string "Turkey" were also updated in a relation in which it refers to the bird.
Rel’s solution to this problem is to provide elements to refer to countries rather than country names. Then you can store the relationship between each country and its name in a base relation. You can also use the entity elements — rather than names or some other identifying information — to reference countries elsewhere in the database.
With this design, a name change can be effected by updating a single relation.
Furthermore, you can write queries without worrying about spurious equalities, such as the equality between the string "Turkey" as a country name and the string "Turkey" as the name of a bird.
For further discussion of the advantages of using entities, see Graph Normal Form.
Basics
This section describes how to define entities in Rel and how to represent properties of entities.
Constructing Entities
The following declaration makes Country an entity type for which each country is identified by a string:
// model
entity type Country = String
This declaration defines a single binary relation, denoted ^Country, that maps identifying strings to country entities.
For example, ^Country["Iceland"] is a special element in the database that can be used to represent the country of Iceland[2]. The identifying value "Iceland" is called the key of the entity ^Country["Iceland"].
Typically, when working with an entity type, you want to define two other relations: one that maps entities to their identifying property and another that contains the entities themselves. In this case, you might call the former relation country_name, and you would conventionally call the latter relation Country:
// write query
def original_country = {
"United States";
"England";
"Mexico";
"France";
"Iceland";
"Russia";
"Turkey"
}
def insert:country_name = {
^Country[name], name from name in original_country
}
// model
def Country(c) = country_name(c, _)
The following cell’s output shows that the cells above created seven Country entities:
// read query
def output = count[Country]
It is necessary to define country_name rather than relying on ^Country alone, because Rel does not provide a way to extract an entity’s key from the entity. For example, if c is defined by def c = ^Country["Mexico"], then it is not possible to “ask” c to report that the string "Mexico" was used to construct it.
In other words, the following query is not permitted and results in an error:
// read query
def c = ^Country["Mexico"]
def output(name) = {
^Country[name] = c
}
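Although the key cannot be read back from the entity, the same question can be answered by looking the entity up in country_name, which is exactly what that relation is for. A minimal sketch, assuming the country_name data inserted above:

```rel
// read query
def c = ^Country["Mexico"]
// Look the name up in country_name instead of trying to invert ^Country.
def output = country_name[c]
```
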
This language behavior is important because it allows the database user to accommodate changes in identifying data. Suppose you want to update the database to reflect Türkiye’s name change from Turkey, for example. You can do that without having to change the entity everywhere it appears in the database:
// write query
def delete:country_name(e, name) = {
country_name(e, name) and name = "Turkey"
}
def insert:country_name(e, "Türkiye") = {
e = ^Country["Turkey"]
}
With this update in place, country_name reports the updated country name. For example, consider the following query for the countries whose names start with "T":
// read query
def output = country_name[{
c : substring[country_name[c], 1, 1] = "T"
}]
The database is still using the entity generated from the original name, but the relation country_name has been updated so that it contains the correct name.
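A related consequence can be checked directly. The entity for Türkiye is still the one constructed from the key "Turkey", so applying ^Country to the new name produces a different element, one that has no name in country_name. The query below is a small sketch of that check; it relies only on the update made above:

```rel
// read query
// The existing entity was built from the old key but now carries the new name.
def output:from_old_key = country_name[^Country["Turkey"]]
// Constructing from the new name yields a different, unused element,
// so this definition contributes no tuples.
def output:from_new_key = country_name[^Country["Türkiye"]]
```

This is one reason not to treat the constructor as a lookup by name.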
An entity-constructing relation like ^Country should only be used to create entities. The relationship between an entity and its identifying information should be managed through other relations like country_name.
In this example, each country’s key is a single string value. Entities can also be identified by multiple values. See the section Compound Keys for more details.
Properties of Entities
Working with entities in a real-world application means associating them with values or other entities.
The section Constructing Entities illustrates one example of such a property relation, namely country_name, which maps each country entity to its name. Likewise, you might define another relation called country_population that maps each entity to its population, and so on.
A simpler approach is to define a module called country such that country[:name] maps each country entity to its name, country[:population] maps each country to its population, and so on.
In other words, you can store properties of countries in an arity-3 relation called country whose elements are of the form (prop, c, value), where:
- prop is the name of a property, like :population.
- c is the country entity.
- value is the value of the property prop for the country c.
The following cell copies each country’s name from country_name into country[:name] and adds four new tuples to the relation country. The new tuples assign population information to Iceland and the United States and define two name aliases for the United States:
// write query
def new_data = {
:population, "Iceland", 364134;
:population, "United States", 328239523;
:name_alias, "United States", "USA";
:name_alias, "United States", "United States of America"
}
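// Copy each country's name from country_name into country[:name].
def insert:country(:name, e, value) = {
country_name(e, value)
}
// Attach each new property by looking up the country entity via its name.
def insert:country(prop, e, value) = {
new_data(prop, name, value) and country_name(e, name) from name
}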
You can see all of the properties defined so far by displaying the contents of the relation country:
// read query
def output(prop, name, value) = {
country(prop, e, value) and country(:name, e, name) from e
}
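Individual properties can also be queried by joining through the name property, without touching the constructor. For example, the following sketch lists the aliases recorded above for the United States:

```rel
// read query
// All aliases recorded for the country whose name is "United States".
def output(alias) = {
    country:name(c, "United States") and country:name_alias(c, alias) from c
}
```
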
Displaying Entities
Note that the table above shows the name of each country in the second column, rather than showing the entity value itself. It’s best to avoid directly displaying entities in output tables, because entities are represented internally as large integers that are not very useful to look at:
// read query
def output = country:name
Note that country:name is shorthand for country[:name].
At first glance it might seem preferable to use entity identifiers with real-world meaning rather than large integers. However, since the purpose of entities is to represent objects whose identity persists despite changes in identifying information, it is actually essential to use a representation that is independent of real-world attributes.
Identification systems used in the real world use the same approach. A Social Security Number, for example, refers to an individual without describing any information like their name, date of birth, etc. These features are maintained in databases instead.
For this reason, it can be helpful to define a dedicated relation with a name like show whose purpose is to make the entity human-readable in the table:
// model
def show[e] = "^Country[“%(country:name[e])”]"
One succinct way to use show to create new relations involving the entity type is to use from to project out the entity:
// read query
def output = show[e], country:population[e] from e
This query says, “Find all database elements e such that the product show[e], country:population[e] is nonempty, and collect all such products.”
For more details on the from keyword in Rel, see the From Expressions section in the Rel Language Reference.
The reason it’s helpful to use a generic name like show rather than a specific relation like country:name for this purpose is that show can be used for multiple entity types at the same time. This idea is illustrated in the section Displaying Entities (Continued) below.
Compound Keys
Often, more than one value is required to uniquely identify an entity. A simple example is a person who is identified by a first and last name that are stored as two separate values, as should be the case in a normalized modeling approach.
The code cells below illustrate how to define an Actor entity, uniquely identified by first and last name:
// model
entity type Actor = String, String
The constructor ^Actor is a relation of arity 3 that maps each name pair to an entity value that uniquely references the actor.
// write query
def actor_names = {
("Sharon", "Stone");
("Tim", "Curry");
("Robert", "DeNiro");
("Michael", "Jordan")
}
def insert:actor:name = {
^Actor[name...], name... from name... in actor_names
}
Note the use of varargs to represent the (first_name, last_name) pair with a single variable. Continuing:
// model
def Actor(a) = actor:name(a, name...) from name...
// read query
def output(name...) = actor:name(e, name...) from e
This example identifies actors using two strings, but an entity declaration may be n-ary for any value of n. For example, the entity key may contain 3, 4, or 5 elements, in which case the corresponding constructor would have arity 4, 5, or 6, respectively.
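As an illustration of a three-part key, the following sketch declares a hypothetical Film entity type, not used elsewhere in this guide, identified by a title, a release year, and a director’s last name. Its constructor ^Film therefore has arity 4. The pattern mirrors the Actor example above:

```rel
// model
// Hypothetical entity type with a three-part key: (title, year, director).
entity type Film = String, Int, String

// write query
def insert:film:key = {
    ^Film[k...], k... from k... in {("Metropolis", 1927, "Lang")}
}

// model
def Film(f) = film:key(f, k...) from k...
```
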
Advanced Entity Construction
Multiple Constructors
In some cases it might be necessary to construct entities of the same type in different ways. For example, the set of identifying features may not be uniform across all of the entities. The actress Cher, for example, goes by a single name rather than two names.
To accommodate this situation, Rel allows multiple declarations for the same entity type:
// model
entity type Actor = String
The entity type Actor now has two constructors: a ternary one, from the earlier declaration with two strings, and a binary one, from the declaration above.
The binary constructor can be used to insert a tuple for Cher into the relation actor:name:
// write query
def insert:actor:name(e, "Cher") = {
^Actor("Cher", e)
}
// read query
def output(name...) = actor:name(e, name...) from e
Note that this example must use parenthetical relational application, as in ^Actor("Cher", e), rather than square brackets, as in ^Actor["Cher"]. That’s because the presence of the ternary constructor means that there are infinitely many tuples in ^Actor whose first element is "Cher". Thus ^Actor["Cher"] refers to infinitely many tuples and cannot be resolved by the system. Using parentheses with exactly two arguments lets Rel know that the expression refers to the binary constructor only.
It is also possible to overload the same constructor by type. Consider the following additional declarations:
// model
entity type Musician = String
entity type Musician = String, String
entity type Athlete = String, String
entity type Professor = String, String
This cell defines four constructors for three entity types. The entity type Musician has two constructors of different arities: one has arity 2 and the other has arity 3.
// write query
def musician_names = {
("Bjork");
("Prince");
("Paul", "McCartney");
("John", "Lennon")
}
def insert:musician:name(e, n) = {
musician_names(n) and ^Musician(n, e)
}
def insert:musician:name(e, fn, ln) = {
musician_names(fn, ln) and ^Musician(fn, ln, e)
}
def insert:athlete:name = {
^Athlete[name...], name... from name... in {("Michael", "Jordan")}
}
def insert:professor:name = {
^Professor[name...], name... from name... in {("Michael", "Jordan")}
}
// model
def Musician(e) = musician:name(e, name...) from name...
def Athlete(e) = athlete:name(e, name...) from name...
def Professor(e) = professor:name(e, name...) from name...
// read query
def output:musician(name...) = musician:name(e, name...) from e
def output:athlete(name...) = athlete:name(e, name...) from e
def output:professor(name...) = professor:name(e, name...) from e
Note that the Professor entity and the Athlete entity have the same identifying features: a first name of Michael and a last name of Jordan. However, these entities are distinct because the entity types are different.
Constructors can also be overloaded by data type and will generate distinct entities for each type. For example, suppose that an organization changes its employee IDs from numbers to strings but still needs to support both in its database:
// model
entity type EmployeeID = Int
entity type EmployeeID = String
// read query
^EmployeeID[128471294];
^EmployeeID["128471294"]
As expected, the two entities are distinct.
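One way to confirm this inside a query is to count the union of the two constructed elements: if the entities are distinct, the count is 2 rather than 1. A minimal sketch, using the EmployeeID declarations above:

```rel
// read query
// Same digits, different key types: the union should contain two entities.
def output = count[{^EmployeeID[128471294]; ^EmployeeID["128471294"]}]
```
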
Entities of Multiple Types
Cher is known for her music as well as her acting, so it makes sense to classify her not just as an Actor but also as a Musician.
Since the relation Musician is managed by the user and not by the system, Cher’s multiple roles can be accommodated by adding the previously created entity to the musician:name relation:
// write query
def insert:musician:name(e, n) = {
actor:name(e, n) and n = "Cher"
}
It would also have been possible to create a new entity for Cher as a Musician using the ^Musician constructor. However, this would have resulted in two separate entities referring to the same underlying person, which defeats the purpose of entities and should be avoided whenever possible.
Ensuring that individuals are consistently identified in the database is known as entity resolution. When joining two knowledge bases or importing large amounts of raw data, it may not be feasible to do entity resolution perfectly.
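When perfect entity resolution is not feasible, a query can at least surface likely duplicates for review. The following sketch is one simple heuristic, not a general entity-resolution method: it lists names that are attached both to an actor entity and to a different musician entity. With the data in this guide, Cher was added as a musician by reusing her actor entity, so no names should be reported:

```rel
// read query
// Names shared by an actor entity and a *different* musician entity.
// Each row would be a candidate for manual entity resolution.
def output(name...) = {
    actor:name(a, name...) and musician:name(m, name...) and a != m
    from a, m
}
```
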
Entity Hierarchies
Hierarchies arise often when modeling using entities. For example:
- Corporation leadership structure: executive, manager, …, and intern.
- The classification of life (life, domain, kingdom, …).
- A family tree (children, parents, grandparents, …).
The preceding sections have introduced several professions: actor, athlete, musician, and professor. To model a Professional type that encompasses all of these, you can use a union:
// model
def Professional = Actor; Athlete; Musician; Professor
You can also declare Professional as an entity type so that new professionals can be created without referring to any particular subtype like actor or athlete:
// model
entity type Professional = String, String
// write query
def insert:professional:name(e, first, last) = {
^Professional(first, last, e) and first = "Roger" and last = "Penrose"
}
// model
def Professional(e) = professional:name(e, name...) from name...
// read query
def output(name...) = professional:name(e, name...) from e
def output:count = count[Professional]
Since Roger Penrose is a professor of mathematics, he can be added to the Professor relation:
// write query
def insert:professor:name(e, first, last) = {
professional:name(e, first, last) and first = "Roger" and last = "Penrose"
}
Note that it would have been easier to add him as a Professor first, because the definition def Professional = Actor; Athlete; Musician; Professor would then have added him to Professional automatically. Nevertheless, the example illustrates the flexibility that comes from managing the unary relations Professional and Professor yourself.
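To confirm that Roger Penrose is represented by a single entity that belongs to both unary relations, you can join professional:name with both of them. A minimal sketch:

```rel
// read query
// One entity, present in both Professional and Professor.
def output(first, last) = {
    professional:name(e, first, last) and Professional(e) and Professor(e) from e
}
```
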
Properties of Entities
There are a few approaches to defining relations involving entities in a way that respects hierarchical levels:
- Bottom-up: Define the entity properties at the lowest level of the hierarchy (Actor, Athlete, etc.) and propagate them up the hierarchy to Professional.
- Top-down: Define the properties at the top level of the hierarchy (Professional) and propagate them down to Actor, Athlete, etc.
- Mixed: Define the properties at the level where the entity IDs are defined and propagate the information up and down. For example, Roger Penrose was defined at the higher Professional level, so his properties would be defined there. The other individuals in the database were defined using their profession, so their properties would be defined at that level.
The next two sections illustrate the first two of these three approaches.
Bottom-Up
The bottom-up approach involves assigning properties at the level of actor, athlete, etc.:
// write query
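// Actor: split actor:name into first and last names.
// The vararg last_name... also matches one-part names such as "Cher".
def insert:actor:first_name(e, first_name) = {
actor:name(e, first_name, last_name...) from last_name...
}
def insert:actor:last_name(e, last_name) = {
actor:name(e, first_name, last_name) from first_name
}
// Athlete: split athlete:name into first and last names.
def insert:athlete:first_name(e, f) = athlete:name(e, f, _)
def insert:athlete:last_name(e, l) = athlete:name(e, _, l)
// Musician: split musician:name, and copy actor properties
// for entities that are also musicians.
def insert:musician:first_name(e, f) = musician:name(e, f, l...) from l...
def insert:musician:last_name(e, l) = musician:name(e, _, l)
def insert:musician(prop, e in Musician, x) = actor(prop, e, x)
// Professor: split professor:name, and also use professional:name
// for entities that are also professors.
def insert:professor:first_name(e, n) = professor:name(e, n, _)
def insert:professor:first_name(e in Professor, n) = professional:name(e, n, _)
def insert:professor:last_name(e, n) = professor:name(e, _, n)
def insert:professor:last_name(e in Professor, n) = professional:name(e, _, n)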
Note that since some actors are musicians, the last rule for musician refers to actor, but only for those cases where the entity is an element of the relation Musician.
Thus the code above defines two properties, first_name and last_name, for the entity types Actor, Athlete, Musician, and Professor.
For Actor and Athlete, these properties are derived directly from the name relations populated through their constructors. Because the constructors are unique for each entity type, it isn’t necessary to check that the entity ID e is of the correct type.
The situation is a bit more complicated for the Musician and Professor entity types because they contain entities that were initially defined for another entity type: “Cher” and “Roger Penrose” were initially defined as Actor and Professional, respectively, and later also as Musician and Professor. Therefore, the code above uses additional checks on the left-hand side: the bindings e in Musician and e in Professor ensure that the entity IDs also refer to the intended entity type.
The example illustrates the main drawback of this modeling approach. If specific instances have multiple assigned entity types, then special treatment is needed, which complicates the logic.
Inspecting the musician and professor relations shows that only known properties are assigned:
// read query
def output:musician = musician
def output:professor = professor
Notice how “Cher” and “Roger Penrose” are included in these relations, even though they were initially defined as entities of a different type.
To propagate these definitions up, perform an analogous union for property relations:
// model
def insert:professional = actor; athlete; musician; professor
You can look at all properties associated with the first three entity instances in professional:
// read query
def output[prop] = professional[prop, e] for e in last[top[3, Professional]]
This output shows that professional includes names that were initially assigned to actor, athlete, musician, or professor, which means that these properties were successfully propagated to the professional relation. Furthermore, the properties can be accessed in exactly the same way for professional as for actor, etc.
Top-Down
The properties may also be modeled in a top-down fashion, where professional_topdown is defined first and then actor_topdown, athlete_topdown, musician_topdown, and professor_topdown are defined. Note that the suffix _topdown is used here to keep these relations separate from those defined in the bottom-up approach above.
// model
def professional_topdown:name = {
actor:name; athlete:name; musician:name;
professor:name; professional:name
}
def professional_topdown(:first_name, e, first) = {
actor:name; athlete:name; musician:name;
professor:name; professional:name
}(e, first, n...) from n...
def professional_topdown(:last_name, e, last) = {
actor:name; athlete:name; musician:name;
professor:name; professional:name
}(e, _, last)
The union over all the entity constructors can be expressed more compactly in the top-down approach than in the bottom-up approach above. However, this is only possible because all entity constructors have a very similar structure. If the individual constructors vary too much, then their union must be expressed more verbosely and may require delicate handling.
In the second step, the entity properties are propagated down to the individual (sub)entity level:
// model
def actor_topdown[p, x] = professional_topdown[p, x], Actor(x)
def athlete_topdown[p, x] = professional_topdown[p, x], Athlete(x)
def musician_topdown[p, x] = professional_topdown[p, x], Musician(x)
def professor_topdown[p, x] = professional_topdown[p, x], Professor(x)
Comparing the results for musician (bottom-up approach) and musician_topdown (top-down approach) with the relation equal yields the following:
// read query
equal(musician, musician_topdown)
The cell output reveals that the two relations are in perfect agreement, which shows that, for this data, the two modeling approaches yield the same musician properties.
Displaying Entities (Continued)
Several types of entities were defined in the preceding sections, and some of the entities belonged to multiple types.
This section illustrates how to define a show relation for all of these entities.
Instead of defining show for each entity type individually, you can take advantage of the type Professional, which includes all people defined so far:
// model
def show[p in Professional] = {
"%(professional[:first_name, p]) %(professional[:last_name, p] <++ "")"
}
Note the use of left_override (<++) from the Standard Library to handle professionals with no last name.
Consider the entity instances from the Bottom-Up section. The show relation can be used to return human-readable strings instead of the internal entity IDs:
// read query
def output[prop] = show[e], professional[prop, e] from e in last[top[3, Professional]]
// read query
def output[prop] = show[e], actor[prop, e] from e in last[top[3, Actor]]
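Because show now has definitions for both country entities and professional entities, the same relation can render either kind. A small sketch listing the rendered strings, grouped by entity type:

```rel
// read query
// One show relation, two entity types.
def output:countries(s) = show(e, s) and Country(e) from e
def output:people(s) = show(e, s) and Professional(e) from e
```
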
Linking Entities
The preceding sections show how to define entity types in a hierarchy and how to represent entity properties. This section illustrates how to define relations involving multiple entities.
Consider, for example, a binary parent relation with a tuple of the form (child, parent) for each fact of the form “The parent of child is parent.”
Each tuple in such a relation corresponds to an edge in a knowledge graph.
Information about knowledge graph edges often comes from an external source like a CSV file.
An edge relation may also be derived in part from definitions: the father edges in an ancestry database would be implied by the rule that a father is a parent who is also a man, assuming that relations specifying the parent edges and the instances of the type Man are available.
The code below shows how to define a binary relation has_nationality that connects the Professional entities to the Country entities defined in the Constructing Entities section.
Start with the contents of a CSV file:
// write query
def config:data = """
first_name,last_name,country
John,Lennon,England
Bjork,,Iceland
Cher,,United States
Sharon,Stone,United States
Prince,,United States
Tim,Curry,United States
Roger,Penrose,England
Robert,DeNiro,United States"""
def insert:nationality_csv = load_csv[config]
See the CSV Import guide for more on importing CSV data.
The next step is to match the properties to their corresponding entities and collect the entity pairs in the relation has_nationality.
// model
def has_nationality(p, c) = {
professional:first_name[p] = nationality_csv[:first_name, line_number]
// use "equal" to account for null case:
and equal(professional:last_name[p], nationality_csv[:last_name, line_number])
and country:name[c] = nationality_csv[:country, line_number]
from line_number
}
The entries in has_nationality are as expected:
// read query
def output(p, e) = {
has_nationality(pent, cent) and p = show[pent] and e = show[cent] from pent, cent
}
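Once has_nationality is defined, aggregations over it work like those over any other relation. As an illustration, the following sketch counts the professionals linked to each country, keyed by the country’s name rather than by the entity itself:

```rel
// read query
// Number of professionals per country, displayed by country name.
def output(name, n) = {
    country:name(c, name) and n = count[{p : has_nationality(p, c)}] from c
}
```
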
The graphviz module can be used to visualize this relation:
// read query
module graph
def node(x) = has_nationality(x, _) or has_nationality(_, x)
def edge = has_nationality
def node_attribute[n, "label"] = show[n]
def edge_attribute[x, y, "label"] = "nationality", has_nationality(x,y)
end
def output = ::std::display::graphviz[graph]
Entities and Integrity Constraints
You can use integrity constraints to check the consistency of your entity definitions.
For example, to make sure that has_nationality connects the right types of entities, add this integrity constraint:
// read query
ic has_nationality_types(x,y) {
has_nationality(x, y) implies Professional(x) and Country(y)
}
This integrity constraint ensures that first_name is defined for all professionals:
// read query
ic all_have_first_name(e) {
Professional(e) implies professional:first_name(e, _)
}
This one checks that actor and musician are only defined for entities of the corresponding type:
// read query
ic actor_type(prop, e) {(actor(prop, e, x...) from x...) implies Actor(e)}
ic musician_type(prop, e) {(musician(prop, e, x...) from x...) implies Musician(e)}
You can also write an integrity constraint that verifies that all Actor entities, for instance, are also of type Professional. Mathematically, this means that Actor should be a subset (⊆) of Professional:
// read query
ic Actor_subset_Professional {
Actor ⊆ Professional
}
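Constraints can also guard the identifying data itself. For example, the rename from "Turkey" to "Türkiye" in the Constructing Entities section was done by deleting the old name and inserting the new one. The following constraint, a sketch rather than part of the original model, checks that every country still has exactly one name in country_name:

```rel
// read query
// Each country entity should carry exactly one current name.
ic one_name_per_country(c) {
    Country(c) implies count[country_name[c]] = 1
}
```
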
Summary
In brief, you can use entities in Rel to model concrete or abstract entities in the real world. To construct entities, you typically declare the entity type and define two other relations: one that maps each entity to its identifying property and one that contains the entities. Moreover, you can have entities of multiple types or entities that require a hierarchical structure; the latter can be modeled with the top-down or bottom-up approach. Finally, integrity constraints can help ensure that the entity definitions are consistent.
Footnotes
[1] Note that even if the British pound is currently trading against the US dollar at a 2-1 ratio, a price of 12 dollars may be regarded as different from a price of six pounds, because a 12-dollar purchase and a six-pound purchase would have different currency-conversion implications. For the purposes of this discussion, two monetary amounts specify the same price only if the currency is the same and the number of currency units is the same.
[2] The object computed by the expression ^Country["Iceland"] is a hash of the name Country and the string "Iceland".