Entities

This concept guide introduces entities and shows how they are defined and used in Rel.

Introduction

An entity is something that exists in the real world and can be uniquely identified independently of its properties or other relationships. An entity may be a concrete object, such as a car or a person, or more abstract, such as an organization.

To say that a company, for example, can be uniquely identified independently of its properties means that a company may make changes to its products, its affiliates, or even its name while remaining the same company.

By contrast, a product’s price is not an entity, because a price is fully characterized by a number and a currency denomination. In other words, if a product’s price is 12 dollars, then changing either “12” or “dollars” means that you are no longer referring to the same price (see footnote 1). Non-entity elements like price are described using value types. A RelationalAI database is a knowledge graph where each node is an entity or value type and where the edges represent relationships between them.

Rel’s entities are a construct designed to model real-world entities. They offer a way to refer to objects independently of identifying real-world information.

To see why this is useful, consider a database representing a set of professionals and their countries of origin. One possible approach is to represent each entity using an identifying string, like a person’s full name or a country’s name. However, this approach makes it difficult to handle a name change.

For example, when the country Türkiye changed its name from Turkey, you would have to be careful to make the change everywhere it appears in the database. This could lead to inconsistencies if the name were not changed everywhere, or if the string "Turkey" were updated in a relation in which it refers to a bird.

Rel’s solution to this problem is to provide elements to refer to countries rather than country names. Then you can store the relationship between each country and its name in a base relation. You can also use the entity elements — rather than names or some other identifying information — to reference countries elsewhere in the database.

With this design, a name change can be effected by updating a single relation. Furthermore, you can write queries without worrying about spurious equalities, such as the string equality between "Turkey" the country name and "Turkey" the bird. For further discussion of the advantages of using entities, see Graph Normal Form.

Basics

This section describes how to define entities in Rel and how to represent properties of entities.

Constructing Entities

The following declaration makes Country an entity type for which each country is identified via a string:

// model
 
entity type Country = String

This declaration defines a single binary relation, denoted ^Country, that maps identifying strings to country entities. For example, ^Country["Iceland"] is a special element in the database that can be used to represent the country of Iceland (see footnote 2). The identifying value "Iceland" is called the key of the entity ^Country["Iceland"].
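
To make the constructor's behavior concrete, here is a minimal Python sketch (not Rel, and not RelationalAI's actual algorithm). It models an entity as a hash of the entity type name and the key, which is how footnote 2 describes the value. The function name make_entity, the choice of SHA-256, and the 128-bit width are assumptions made for illustration only.

```python
import hashlib

def make_entity(type_name: str, *key) -> int:
    """Model of an entity constructor: hash the type name together with the key."""
    # Join the type name and key parts with a separator that is unlikely to appear in names.
    text = "\x1f".join([type_name, *map(str, key)])
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep 128 bits (an assumed width) so the result is a large integer.
    return int.from_bytes(digest[:16], "big")

iceland = make_entity("Country", "Iceland")

# The same type name and key always yield the same entity.
print(iceland == make_entity("Country", "Iceland"))   # True
# A different key yields a different entity.
print(iceland == make_entity("Country", "Mexico"))    # False
```

The sketch also suggests why the key cannot be read back from the entity: a hash is a one-way mapping, so any association between an entity and its name has to be stored separately.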

Typically when working with an entity type, you want to define two other relations: one that maps entities to their identifying property and another that contains entities. You might call the former relation country_name in this case, and you would conventionally call the latter relation Country:

// write query
 
def original_country = {
    "United States";
    "England";
    "Mexico";
    "France";
    "Iceland";
    "Russia";
    "Turkey"
}
 
def insert:country_name = {
  ^Country[name], name from name in original_country
}

// model
 
def Country(c) = country_name(c, _)

Entity names begin with a capital letter by convention, following the naming convention of other type relations like String or Number.

The following cell’s output shows that the cells above created seven Country entities:

// read query
 
def output = count[Country]
 

It is necessary to define country_name rather than using ^Country because Rel does not provide a way to extract an entity’s key from the entity. For example, if c is defined using the definition def c = ^Country["Mexico"], then it is not possible to “ask” the value c to report that the string "Mexico" was used to construct it. In other words, the following query is not permitted and results in an error:

// read query
 
def c = ^Country["Mexico"]
 
def output(name) = {
    ^Country[name] = c
}
EXCEPTION
We cannot infer that the following definition can be evaluated. Typically, this happens when there are unground variables in the definition. (A variable is unground if Rel does not know how to iterate through values to bind the variable.) The definition is deemed "inadmissible" by our groundness inference algo.
an unexpected exception occurred while executing the transaction

This language behavior is important because it allows the database user to accommodate changes in identifying data. Suppose you want to update the database to reflect Türkiye’s name change from Turkey, for example. You can do that without having to change the entity everywhere it appears in the database:

// write query
 
def delete:country_name(e, name) = {
    country_name(e, name) and name = "Turkey"
}
 
def insert:country_name(e, "Türkiye") = {
    e = ^Country["Turkey"]
}

With this update in place, country_name reports the updated country name. For example, consider the following query for the countries whose names start with "T":

// read query
 
def output = country_name[{
    c : substring[country_name[c], 1, 1] = "T"
}]
 

The database is still using the entity generated from the original name, but the relation country_name has been updated to ensure that it contains the correct name.
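
The effect of this update can be sketched outside Rel. In the Python model below, entities are opaque integers, country_name is the only place that stores a name, and a second relation refers to the entity rather than to the name. The helper make_entity and the relation capital_of are hypothetical and exist only for this illustration.

```python
import hashlib

def make_entity(type_name: str, key: str) -> int:
    # Assumed stand-in for ^Country[...]: an opaque integer derived from the key.
    digest = hashlib.sha256(f"{type_name}\x1f{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")

turkey = make_entity("Country", "Turkey")

# country_name: entity -> name. This is the only relation that stores the name.
country_name = {turkey: "Turkey"}
# capital_of: a hypothetical relation that references the entity, not the name.
capital_of = {turkey: "Ankara"}

# The rename touches country_name only.
country_name[turkey] = "Türkiye"

print(country_name[turkey])   # Türkiye
print(capital_of[turkey])     # Ankara
# The entity is still the one built from the original key.
print(turkey == make_entity("Country", "Turkey"))   # True
```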

An entity-constructing relation like ^Country should only be used to create entities. The relationship between an entity and its identifying information should be managed through other relations like country_name.

In this example, each country’s key is a single string value. Entities can also be identified by multiple values. See the section Compound Keys for more details.

Properties of Entities

Working with entities in a real-world application means associating them with values or other entities. The section Constructing Entities illustrates one example of such a property relation, namely the relation country_name that maps each country entity to its name. Likewise, you might want to define another relation called country_population that maps each entity to its population, etc.

A simpler approach is to define a module called country such that country[:name] maps each country entity to its name and country[:population] maps each country to its population, etc.

In other words, you can store properties of countries in an arity-3 relation called country whose elements are of the form (prop, c, value), where:

  • prop is the name of a property, like :population.
  • c is the country entity.
  • value is the value of the property prop for the country c.
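
As a rough illustration of this layout, the following Python sketch stores properties as a set of (prop, c, value) triples and selects one property at a time. It is a model of the idea rather than Rel code: the small integers stand in for entities, and the helper name prop is an assumption.

```python
# Each tuple is (prop, entity, value); small integers stand in for entities here.
ICELAND, USA = 1, 2

country = {
    ("name", ICELAND, "Iceland"),
    ("name", USA, "United States"),
    ("population", ICELAND, 364134),
    ("name_alias", USA, "USA"),
    ("name_alias", USA, "United States of America"),
}

def prop(relation, prop_name):
    """Rough analogue of country[:prop]: the (entity, value) pairs for one property."""
    return {(e, v) for (p, e, v) in relation if p == prop_name}

print(sorted(prop(country, "name")))
# [(1, 'Iceland'), (2, 'United States')]
print(sorted(v for (e, v) in prop(country, "name_alias") if e == USA))
# ['USA', 'United States of America']
```

Keeping every property in one relation means that a new property, such as an alias, is just another set of tuples and needs no new relation name.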

The following cell copies each country name into the relation country and adds four new tuples. The new tuples assign population information to Iceland and the United States and define two name aliases for the United States:

// write query
 
def new_data = {
    :population, "Iceland", 364134;
    :population, "United States", 328239523;
    :name_alias, "United States", "USA";
    :name_alias, "United States", "United States of America"
}
 
def insert:country(:name, e, value) = {
    country_name(e, value)
}
 
def insert:country(prop, e, value) = {
    new_data(prop, name, value) and country_name(e, name) from name
}

You can see all of the properties defined so far by displaying the contents of the relation country:

// read query
 
def output(prop, name, value) = {
  country(prop, e, value) and country(:name, e, name) from e
}
 

Displaying Entities

Note that the table above shows the name of each country in the second column, rather than showing the entity value itself. It’s best to avoid directly displaying entities in output tables, because entities are represented internally as large integers that are not very useful to look at:

// read query
 
def output = country:name
 

Note that country:name is shorthand for country[:name].

At first glance it might seem preferable to use entity identifiers with real-world meaning rather than large integers. However, since the purpose of entities is to represent objects whose identity persists despite changes in identifying information, it is actually essential to use a representation that is independent of real-world attributes.

Identification systems used in the real world use the same approach. A Social Security Number, for example, refers to an individual without describing any information like their name, date of birth, etc. These features are maintained in databases instead.

For this reason, it can be helpful to define a dedicated relation with a name like show whose purpose is to make the entity human-readable in the table:

// model
 
def show[e] = "^Country[“%(country:name[e])”]"

One succinct way to use show to create new relations involving the entity type is to use from to project out the entity:

// read query
 
def output = show[e], country:population[e] from e

This query says, “Find all database elements e such that the product show[e], country:population[e] is nonempty, and collect all such products.” For more details on the from keyword in Rel, see the From Expressions section in Rel Language Reference.

The reason it’s helpful to use a generic name like show rather than a specific relation like country:name for this purpose is that show can be used for multiple entity types at the same time. This idea is illustrated in the section on hierarchies below.

Compound Keys

Often, more than one value is required to uniquely identify an entity. A simple example is a person who is identified by a first and last name that are stored as two separate values, as should be the case in a normalized modeling approach.

The code cells below illustrate how to define an Actor entity, uniquely identified by first and last name:

// model
 
entity type Actor = String, String

The constructor ^Actor is a relation of arity 3 that maps each name pair to an entity value that uniquely references the actor.

// write query
 
def actor_names = {
  ("Sharon", "Stone");
  ("Tim", "Curry");
  ("Robert", "DeNiro");
  ("Michael", "Jordan")
}
 
def insert:actor:name = {
  ^Actor[name...], name... from name... in actor_names
}

Note the use of varargs to represent the (first_name, last_name) pair with a single variable. Continuing:

// model
 
def Actor(a) = actor:name(a, name...) from name...

// read query
 
def output(name...) = actor:name(e, name...) from e

This example identifies actors using two strings, but an entity declaration may be n-ary for any value of n. For example, the entity key may contain 3, 4, or 5 elements, in which case the corresponding constructor would have arity 4, 5, or 6, respectively.

Advanced Entity Construction

Multiple Constructors

In some cases it might be necessary to construct entities of the same type in different ways. For example, the set of identifying features may not be uniform across all of the entities. The actress Cher, for example, goes by a single name rather than two names.

To accommodate this situation, Rel allows multiple declarations for the same entity type:

// model
 
entity type Actor = String

The entity type Actor has two constructors: a ternary one and a binary one. The binary constructor can be used to insert a tuple for Cher into the relation actor:name:

// write query
 
def insert:actor:name(e, "Cher") = {
  ^Actor("Cher", e)
}

// read query
 
def output(name...) = actor:name(e, name...) from e
 

Note that this example must use parenthesized relational application, ^Actor("Cher", e), rather than square brackets, as in ^Actor["Cher"]. Because of the ternary constructor, ^Actor contains infinitely many tuples whose first element is "Cher", so ^Actor["Cher"] refers to infinitely many tuples and cannot be resolved by the system. Supplying both arguments in parentheses tells Rel that the expression refers to the binary constructor only.
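
The ambiguity can be illustrated with a small Python model. It replaces the infinite set of strings with a four-element list and represents each entity as a plain tuple. Both are simplifying assumptions, and the names actor_entity, partial, and full are hypothetical.

```python
from itertools import product

def actor_entity(*key):
    # Stand-in for an opaque entity value; real entities are large integers.
    return ("Actor",) + key

# A tiny finite stand-in for "all strings"; the real domain is infinite.
strings = ["Cher", "Tim", "Curry", "Stone"]

# Tuples of the binary constructor: (name, entity).
binary = {(s, actor_entity(s)) for s in strings}
# Tuples of the ternary constructor: (first, last, entity).
ternary = {(f, l, actor_entity(f, l)) for f, l in product(strings, repeat=2)}
actor_ctor = binary | ternary

# Square-bracket style: every tuple whose first element is "Cher", of any arity.
partial = [t[1:] for t in actor_ctor if t[0] == "Cher"]
# Parenthesized style with two arguments: only arity-2 tuples can match.
full = [t for t in actor_ctor if len(t) == 2 and t[0] == "Cher"]

print(len(partial))  # 5: one from the binary constructor, four from the ternary
print(len(full))     # 1
```

In this model the number of partial matches grows with the size of the string domain, which is why the partial application cannot be enumerated when the domain is unbounded.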

It is also possible to declare constructors of different arities for one entity type and to reuse the same key structure across different entity types. Consider the following additional declarations:

// model
 
entity type Musician = String
entity type Musician = String, String
entity type Athlete = String, String
entity type Professor = String, String

This cell defines four constructors for three entity types. The entity type Musician has two constructors of different arities. One constructor has arity 2 and the other has arity 3.

// write query
 
def musician_names = {
  ("Bjork");
  ("Prince");
  ("Paul", "McCartney");
  ("John", "Lennon")
}
 
def insert:musician:name(e, n) = {
  musician_names(n) and ^Musician(n, e)
}
 
def insert:musician:name(e, fn, ln) = {
  musician_names(fn, ln) and ^Musician(fn, ln, e)
}
 
def insert:athlete:name = {
  ^Athlete[name...], name... from name... in {("Michael", "Jordan")}
}
 
def insert:professor:name = {
  ^Professor[name...], name... from name... in {("Michael", "Jordan")}
}

// model
 
def Musician(e) = musician:name(e, name...) from name...
def Athlete(e) = athlete:name(e, name...) from name...
def Professor(e) = professor:name(e, name...) from name...

// read query
 
def output:musician(name...) = musician:name(e, name...) from e
def output:athlete(name...) = athlete:name(e, name...) from e
def output:professor(name...) = professor:name(e, name...) from e
 

Note that the Professor entity and the Athlete entity have the same identifying features: a first name of Michael and a last name of Jordan. However, these entities are distinct because the entity types are different.

Constructors can also be overloaded by data type and will generate distinct entities for each type. For example, suppose that an organization changes its employee IDs from numbers to strings but still needs to support both in its database:

// model
 
entity type EmployeeID = Int
entity type EmployeeID = String

// read query
 
^EmployeeID[128471294];
^EmployeeID["128471294"]

As expected, the two entities are distinct.
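
One way to picture why these entities differ is to assume that both the entity type name and the data type of each key value take part in the hash. The Python sketch below makes that assumption explicit. It is an illustration only, not a description of how Rel computes entity values, and make_entity is a hypothetical helper.

```python
import hashlib

def make_entity(type_name, *key):
    # Assumption: the entity type name and each key's data type are hashed too.
    parts = [type_name] + [f"{type(k).__name__}:{k}" for k in key]
    digest = hashlib.sha256("\x1f".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")

athlete = make_entity("Athlete", "Michael", "Jordan")
professor = make_entity("Professor", "Michael", "Jordan")
print(athlete != professor)   # True: same key, different entity types

by_int = make_entity("EmployeeID", 128471294)
by_str = make_entity("EmployeeID", "128471294")
print(by_int != by_str)       # True: same digits, different key data types
```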

Entities of Multiple Types

Cher is known for her music as well as her acting, so it makes sense to classify her not just as Actor but also as Musician. Since the relation Musician is managed by the user and not by the system, Cher’s multiple roles may be accommodated by adding the previously created entity to the musician:name relation:

// write query
 
def insert:musician:name(e, n) = {
  actor:name(e, n) and n = "Cher"
}

It also would have been possible to create a new entity for Cher as a Musician using the ^Musician constructor. However, this would have resulted in two separate entities referring to the same underlying person. This would defeat the purpose of entities and should be avoided as much as possible.

Ensuring that individuals are consistently identified in the database is known as entity resolution. When joining two knowledge bases or importing large amounts of raw data, it may not be feasible to do entity resolution perfectly.

Entity Hierarchies

Hierarchies arise often when modeling using entities. For example:

  1. Corporation leadership structure: executive, manager, …, and intern.
  2. The classification of life (life, domain, kingdom, …).
  3. A family tree (children, parents, grandparents, …).

The preceding sections, for example, have introduced several examples of professions: actor, athlete, musician, and professor. To model the Professional type that encompasses all of these, you can use a union:

// model
 
def Professional = Actor; Athlete; Musician; Professor

You can also declare Professional as an entity type so that new professionals can be created without referring to any particular subtype like actor or athlete:

// model
 
entity type Professional = String, String

// write query
 
def insert:professional:name(e, first, last) = {
    ^Professional(first, last, e) and first = "Roger" and last = "Penrose"
}

// model
 
def Professional(e) = professional:name(e, name...) from name...

// read query
 
def output(name...) = professional:name(e, name...) from e
def output:count = count[Professional]

Since Roger Penrose is a professor of mathematics, he can be added to the Professor relation:

// write query
 
def insert:professor:name(e, first, last) = {
  professional:name(e, first, last) and first = "Roger" and last = "Penrose"
}

Note that it would have been easier to add him first as Professor because the definition def Professional = Actor; Athlete; Musician; Professor would have automatically added him as Professional. Nevertheless, the example illustrates the flexibility afforded by being able to manage the unary relations Professional and Professor yourself.

Properties of Entities

There are a few approaches to defining relations involving entities in a way that respects hierarchical levels:

  • Bottom-up: Define the entity properties at the lowest level of the hierarchy — Actor, Athlete, etc. — and propagate them up the hierarchy to Professional.

  • Top-down: Define the properties at the top level of the hierarchy — Professional — and propagate them down to Actor, Athlete, etc.

  • Mixed: Define the properties on the level where the entity IDs are defined and propagate the information up and down. For example, Roger Penrose was defined on the higher Professional level, so his properties would be defined there. The other individuals in the database were defined using their profession, so their properties would be defined at that level.

The next two sections illustrate the first two of these three approaches.

Bottom-Up

The bottom-up approach involves assigning properties at the level of actor, athlete, etc.:

// write query
 
def insert:actor:first_name(e, first_name) = {
    actor:name(e, first_name, last_name...) from last_name...
}
 
def insert:actor:last_name(e, last_name) = {
    actor:name(e, first_name, last_name) from first_name
}
 
def insert:musician:first_name(e, f) = musician:name(e, f, l...) from l...
def insert:musician:last_name(e, l) = musician:name(e, _, l)
def insert:musician(prop, e in Musician, x) = actor(prop, e, x)

def insert:athlete:first_name(e, n) = athlete:name(e, n, _)
def insert:athlete:last_name(e, n) = athlete:name(e, _, n)
 
def insert:professor:first_name(e, n) = professor:name(e, n, _)
def insert:professor:first_name(e in Professor, n) = professional:name(e, n, _)
def insert:professor:last_name(e, n) = professor:name(e, _, n)
def insert:professor:last_name(e in Professor, n) = professional:name(e, _, n)

Note that since some actors are musicians, the last rule for musician refers to actor — but only for those cases where the entity is an element of the relation Musician.

Thus the code above defines two properties, first_name and last_name, for the entity types Actor, Athlete, Musician, and Professor.

For Actor and Athlete, these properties are defined directly from their constructors. Because the constructors are unique for each entity type, it isn’t necessary to check that the entity is of the correct type.

The situation is a bit more complicated for the Musician and Professor entity types because they contain entities that were initially defined for another entity type: “Cher” and “Roger Penrose” were initially defined as Actor and Professional and later also as Musician and Professor, respectively. Therefore, the code above uses additional checks on the left-hand side: The bindings e in Musician and e in Professor ensure that the entities also belong to the intended entity type.

The example illustrates the main drawback of this modeling approach. If specific instances have multiple assigned entity types, then special treatment is needed, which complicates the logic.

Inspecting the musician and professor relations shows that only known properties are assigned:

// read query
 
def output:musician = musician
def output:professor = professor
 

Notice how “Cher” and “Roger Penrose” are included in these relations, even though they were initially defined as entities of a different type.

To propagate these definitions up, perform an analogous union for property relations:

// model
 
def insert:professional = actor; athlete; musician; professor

You can look at all properties associated with the first three entity instances in professional:

// read query
 
def output[prop] = professional[prop, e] for e in last[top[3, Professional]]
 

This output shows that professional does include names that were initially assigned to actor, athlete, musician, or professor. This shows that these properties were successfully propagated to the professional relation. Furthermore, the properties may be accessed exactly the same way for professional as for actor, etc.

Top-Down

The properties may also be modeled in a top-down fashion where professional_topdown is defined first and then actor_topdown, athlete_topdown, musician_topdown, and professor_topdown are defined. Note the suffix _topdown is used here to keep these relations separate from those defined in the bottom-up approach above.

// model
 
def professional_topdown:name = {
    actor:name; athlete:name; musician:name;
    professor:name; professional:name
}
 
def professional_topdown(:first_name, e, first) = {
    actor:name; athlete:name; musician:name;
    professor:name; professional:name
}(e, first, n...) from n...
 
def professional_topdown(:last_name, e, last) = {
    actor:name; athlete:name; musician:name;
    professor:name; professional:name
}(e, _, last)

The union over all the entity constructors can be expressed more compactly in the top-down approach than in the bottom-up approach above. However, this is only possible because all entity constructors have a very similar structure. If the individual constructors vary too much, then their union must be expressed more verbosely and may require delicate handling.

In the second step, the entity properties are propagated down to the individual (sub)entity level:

// model
 
def actor_topdown[p, x] = professional_topdown[p, x], Actor(x)
def athlete_topdown[p, x] = professional_topdown[p, x], Athlete(x)
def musician_topdown[p, x] = professional_topdown[p, x], Musician(x)
def professor_topdown[p, x] = professional_topdown[p, x], Professor(x)

Comparing the results for musician (bottom-up approach) and musician_topdown (top-down approach) with the relation equal yields the following:

// read query
 
equal(musician, musician_topdown)

The cell output reveals that the two relations are in perfect agreement, demonstrating that the two modeling approaches produce the same result for this data.
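
The two propagation directions can be compared in a compact Python model. The sketch uses readable strings in place of opaque entities and only a first_name property. The data and all names in it are simplified assumptions, not the relations defined in this guide.

```python
# Entities are plain strings here for readability; real entities are opaque.
Actor = {"stone", "cher"}
Musician = {"cher", "lennon"}

# Names recorded where each entity was first created.
actor_name = {"stone": "Sharon", "cher": "Cher"}
musician_name = {"lennon": "John"}

# Bottom-up: properties are stored per subtype as (prop, entity, value) ...
actor = {("first_name", e, n) for e, n in actor_name.items()}
musician = {("first_name", e, n) for e, n in musician_name.items()}
# ... and an entity with several types needs an extra rule (like Cher).
musician |= {t for t in actor if t[1] in Musician}

# Top-down: build the top-level relation first, then restrict by type membership.
all_names = {**actor_name, **musician_name}
professional_topdown = {("first_name", e, n) for e, n in all_names.items()}
musician_topdown = {t for t in professional_topdown if t[1] in Musician}

print(musician == musician_topdown)  # True
```

The extra rule in the bottom-up half is the special treatment that multi-type entities require. The top-down half avoids it because the type check is applied uniformly when propagating down.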

Displaying Entities (Continued)

Several types of entities were defined in the preceding sections, and some of the entities belonged to multiple types. This section illustrates how to define a show relation for all of these entities.

Instead of defining show for each entity type individually, you can take advantage of the type Professional that includes all people defined so far:

// model
 
def show[p in Professional] = {
    "%(professional[:first_name, p]) %(professional[:last_name, p] <++ "")"
}

Note the use of the left_override operator (<++) from the Standard Library to handle professionals with no last name.

Consider the entity instances from the Bottom-Up section. The show relation may be used to return human-readable strings instead of the internal entity ID:

// read query
 
def output[prop] = show[e], professional[prop, e] from e in last[top[3, Professional]]

// read query
 
def output[prop] = show[e], actor[prop, e] from e in last[top[3, Actor]]

Linking Entities

The preceding sections show how to define entity types in a hierarchy and how to represent entity properties. This section illustrates how to define relations involving multiple entities.

Consider, for example, a binary parent relation with a tuple of the form (child, parent) for each fact of the form “The parent of child is parent.” Each tuple in such a relation corresponds to an edge in a knowledge graph.

Information about knowledge graph edges often comes from an external source like a CSV file. An edge relation may also be derived in part from definitions: The father edges in an ancestry database would be implied by the rule that a father is a parent who is also a man, assuming that relations are available which specify the parent edges and instances of the type Man.
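
The father rule described above can be written as a one-line derivation. The Python sketch below uses made-up identifiers for the people involved. Only the shape of the rule, a parent edge filtered by membership in Man, comes from the text.

```python
# parent holds (child, parent) pairs; Man is a unary type relation.
parent = {("ana", "ben"), ("ana", "carla"), ("ben", "dov")}
Man = {"ben", "dov"}

# A father is a parent who is also a man.
father = {(child, p) for (child, p) in parent if p in Man}

print(sorted(father))  # [('ana', 'ben'), ('ben', 'dov')]
```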

The code below shows how to define a binary relation has_nationality that connects the Professional entities to the Country entities defined in the Constructing Entities section. Start with the contents of a CSV file:

// write query
 
def config:data = """
first_name,last_name,country
John,Lennon,England
Bjork,,Iceland
Cher,,United States
Sharon,Stone,United States
Prince,,United States
Tim,Curry,United States
Roger,Penrose,England
Robert,DeNiro,United States"""
def insert:nationality_csv = load_csv[config]

See the CSV Import guide for more on importing CSV data.

The next step is to match the properties to their corresponding entities and collect the entity pairs in the relation has_nationality.

// model
 
def has_nationality(p, c) = {
        professional:first_name[p] = nationality_csv[:first_name, line_number]
    // use "equal" to account for null case:
    and equal(professional:last_name[p], nationality_csv[:last_name, line_number])
    and country:name[c] = nationality_csv[:country, line_number]
    from line_number
}
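
The matching logic in the definition above, including the treatment of a missing last name, can be sketched in Python with the standard csv module. The entity identifiers p1, p2, c1, and c2 are placeholders, and the two-row data set is a shortened version of the CSV shown earlier.

```python
import csv
import io

data = """first_name,last_name,country
John,Lennon,England
Bjork,,Iceland
"""

# (entity, first_name, last_name); None marks a missing last name.
professionals = [("p1", "John", "Lennon"), ("p2", "Bjork", None)]
countries = {"England": "c1", "Iceland": "c2"}   # name -> entity

has_nationality = set()
for row in csv.DictReader(io.StringIO(data)):
    last = row["last_name"] or None   # an empty CSV field means "no last name"
    for p, first, p_last in professionals:
        # Comparing None with None succeeds, which covers the missing-name case.
        if first == row["first_name"] and p_last == last and row["country"] in countries:
            has_nationality.add((p, countries[row["country"]]))

print(sorted(has_nationality))  # [('p1', 'c1'), ('p2', 'c2')]
```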

The entries in has_nationality are as expected:

// read query
 
def output(p, e) = {
  has_nationality(pent, cent) and p = show[pent] and e = show[cent] from pent, cent
}
 

The graphviz module can be used to visualize this relation:

// read query
 
module graph
   def node(x) = has_nationality(x, _) or has_nationality(_, x)
   def edge = has_nationality
   def node_attribute[n, "label"] = show[n]
   def edge_attribute[x, y, "label"] = "nationality", has_nationality(x,y)
end
def output = ::std::display::graphviz[graph]

Entities and Integrity Constraints

You can use integrity constraints to check the consistency of your entity definitions. For example, to make sure that has_nationality connects the right types of entities, add this integrity constraint:

// read query
 
ic has_nationality_types(x,y) {
    has_nationality(x, y) implies Professional(x) and Country(y)
}

This integrity constraint ensures that first_name is defined for all professionals:

// read query
 
ic all_have_first_name(e) {
    Professional(e) implies professional:first_name(e, _)
}

This one checks that actor and musician are only defined for entities of the corresponding type:

// read query
 
ic actor_type(prop, e) {(actor(prop, e, x...) from x...) implies Actor(e)}
ic musician_type(prop, e) {(musician(prop, e, x...)  from x...) implies Musician(e)}

You can also write an integrity constraint that verifies that all Actor entities, for instance, are also of type Professional. Mathematically, this means that Actor should be a subset (⊆) of Professional:

// read query
 
ic Actor_subset_Professional {
    Actor ⊆ Professional
}
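
Conceptually, each of these constraints holds when the set of violating elements is empty. The Python sketch below computes those violation sets for a subset check and for an edge-type check. The sample data and the function names are assumptions made for illustration.

```python
Actor = {"a1", "a2"}
Professional = {"a1", "a2", "p1"}
Country = {"c1"}
has_nationality = {("a1", "c1"), ("p1", "c1")}

def violations_subset(sub, sup):
    """Elements that break `sub ⊆ sup`; an empty result means the constraint holds."""
    return sub - sup

def violations_types(edges, left, right):
    """Edges whose endpoints are not of the expected types."""
    return {(x, y) for (x, y) in edges if not (x in left and y in right)}

print(violations_subset(Actor, Professional))                     # set()
print(violations_types(has_nationality, Professional, Country))   # set()
# Adding an actor that is not a professional produces a violation.
print(violations_subset(Actor | {"zz"}, Professional))            # {'zz'}
```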

Summary

In brief, you can use entities in Rel to model concrete or abstract entities in the real world. To construct entities, you typically need to declare the entity type and define two other relations. Moreover, you can have entities of multiple types or ones that require a certain hierarchical structure. The latter can be modeled via the top-down or bottom-up approaches. Finally, using integrity constraints can ensure that the entity definitions are consistent.

Footnotes

  1. Note that even if the British pound is currently trading against the US dollar at a 2-1 ratio, a price of 12 dollars may be regarded as different from a price of six pounds, because a 12-dollar purchase and a six-pound purchase would have different currency-conversion implications. For the purposes of this discussion, two monetary amounts specify the same price only if the currency is the same and the number of currency units is the same.

  2. The object computed by the expression ^Country["Iceland"] is a hash of the name Country and the string "Iceland".