Rel
REL CONCEPTS
Integrity Constraints

# Integrity Constraints

This concept guide covers integrity constraints in Rel.

## Goal#

Integrity Constraints (ICs) are used to ensure accuracy and consistency of the data in a relational database. They can guarantee that data insertion, updating, and other processes are performed in such a way that data integrity is not affected. The goal of this concept guide is to familiarize the reader with Rel’s IC functionality through a number of examples. After reading this concept guide you should expect to know how to write Integrity Constraints in Rel.

## Introduction#

An integrity constraint (IC) ensures, as the name suggests, the integrity of the database, requiring that its relations and data obey the constraint specified. We can broadly classify ICs into three categories:

• Type Constraints are used to check data types (e.g.: String(name)). These constraints are inherent to the data model (aka schema) and usually can be evaluated statically. They are discussed in detail in Section Type Constraints.
• Value Constraints are used to check that individual entries in a relation hold a specific value or lie within a specific range (e.g.: age >= 0). They are discussed in detail in Section Value Constraints.
• Logical Constraints are used to ensure that relations adhere to specific logical rules. They often involve multiple relations. Logical constraints are also a great way to express expert knowledge in our RKGMS. They are discussed in detail in Section Logical Constraints.

In Rel, an integrity constraint is defined in a similar way as any relation. Consequently, the same language constructs (i.e.: Relational Abstraction, Relational Application, and more) can be used. The technical details of how integrity constraints are defined can be found in the Integrity Constraints section in the Language reference.

In contrast to defining a relation, where variables are kept only if they adhere to the logical statement, an IC probes the opposite, i.e., variable combinations are kept only when they don’t fulfill the logical statement.

ICs may involve a single fact, a single relation, or multiple relations. In the Simple Example section, you can find the basics for writing integrity constraints.

More advanced ICs are discussed in the Common Use Cases section. More specifically, this discusses type, value, and logical constraints.

When an IC is violated, the running transaction is aborted and the appropriate information is logged in the output relation. It is even possible to write ICs that report back counter-examples that violate the IC.

In the Case study: Mini-IMDb Dataset section, you can see how ICs can be enforced on real-world data.

As we will see, the best practices when writing ICs are:

• to give descriptive names to ICs and
• to use arguments in the IC declaration.

Following both points will be most helpful when analyzing IC violations. Naming the IC will help us identify which IC is violated; and IC arguments will help us understand the reason for an IC violation, because counter-examples will be reported back for the specified arguments. In Section Best Practices we discuss in detail the advantages of these two points.

## Simple Example#

A very simple integrity constraint is:

// query

ic {
1 < 2
}

which checks the correctness of the formula 1 < 2. This IC, of course, holds independently of any particular data in the database.

Generally, an IC involves one or more relations because its purpose is to ensure the integrity of the data, or the correctness of definitions and rules. To check, for instance, that two relations are the same we can write:

// query

def R = {1;2}, {'a'; 'b'}
def S = {(1, 'a'); (1, 'b'); (2, 'a'); (2, 'b')}

ic {
equal(R, S)
}

Note that you should use equal(R, S) to check that two relations are the same, instead of =. For more details see the definition of equal.

To ensure that the reasoning between two facts holds you can use implies:

// query

def square(x, y) {
x = range[-3, 3, 1] and
y = x^2
}

ic is_square_positive(x, y) {
square(x, y) implies y >= 0
}

def output = square

In this example, the IC guarantees that all squares y are positive (as all squares are positive numbers, y>0).

The IC name/identifier is_square_positive(x, y) will be displayed in the output in case the IC doesn’t hold, in other words if y is negative. This is optional: If no name is provided by the user, the system assigns a unique identifier to the IC.

Giving descriptive names to ICs will help identify them if they are violated, as we discuss next.

## Common Use Cases#

This section demonstrates the various use cases for integrity constraints. You can start by defining a person relation that contains people and their properties. As you progress, you will define more and more person properties as needed to learn different IC use cases.

### Type Constraints#

You can start by defining the name property — in particular, person[:name, :first] and person[:name, :last] — to store first and last names.

// install

def person[:name, :first] = {
(1, "Fred");
(2, "David");
(3, "Brian");
(4, "Jacques");
(5, "Julie");
}

def person[:name, :last] = {
(1, "Smith");
(2, "Johnson");
(3, "Williams");
(4, "Brown");
(5, "Jones");
}

Now you can query it to see how your data looks.

// query

person

The name property is modeled in the highly normalized Graph Normal Form (GNF). GNF encourages dividing :name into the more basic properties :first and :last. You can now check that the first and last names are strings:

// query

ic person_type_check(id, sub_property){
forall(x:
person[:name, sub_property](id, x) implies String(x)
)
}

The IC person_type_check checks that all sub-properties under person[:name] have value type String. Let’s say that you want to add another sub-property to person[:name] called :MI, with the person’s middle initial. You would also have to store it as String to avoid violating the IC. For instance, you could not store it as Char without violating or modifying the IC.

Type ICs can often be evaluated statically, based only on the schema, without having to actually evaluate the relation. With the @static annotation, you can tell the RKGMS that you expect that this IC can be statically evaluated.

// query

@static
ic person_type_check(id, sub_property) {
forall(x :
person[:name, sub_property](id, x)  implies  String(x)
)
}

If an IC named ic_name is annotated as @static but cannot be statically evaluated, the system will issue a warning:

warn: declaration 'ic_name' is marked @static but not computed statically


### Value Constraints#

The next thing to check is value constraint. For that, add an age property, :age, to the person relation:

// install

def person[:age] = {
(1, 37);
(2, 31);
(3, 47);
(4, 27);
(5, 22);
}
def person_age = person[:age]

An IC can check that all individuals are 18 or older:

// query

person[:age](id, age)
implies
age >= 18
}

Value constraints can also be used to check the arity of relations. You can check that person[:age] has arity 2:

// query

ic arity_check(x){
arity[person_age] = x implies x = 2
}

Note that by using the IC argument x, you will be informed of the actual value of any arity other than 2 if the IC is violated.

Note that arity is in general an over-approximation that is statically evaluated. In practice, sometimes false alarms can happen if predicates are overloaded by arity. In such cases, the IC can be adjusted accordingly.

The cardinality of a relation can also be checked using value constraints. You can check that person[:age] has five records:

// query

ic cardinality_check(x){
count[person_age] = x implies x = 5
}

### Logical Constraints#

#### Disjointness#

Next, you can define a :gender property in person and check the disjoint property of the derived male (‘M’), female (‘F’), and non_binary (‘X’) relations.

// install

def person[:gender] = {
(1, 'M');
(2, 'M');
(3, 'F');
(4, 'F');
(5, 'X');
}

def male(p) = person(:gender, p, 'M')
def female(p) = person(:gender, p, 'F')
def non_binary(p) = person(:gender, p, 'X')
def gender(p) = person(:gender, p, _)

// query

ic is_disjoint() {
disjoint(female, male) and
disjoint(non_binary, male ∪ female)
}

This checks that male and female are disjoint relations and that non_binary is disjoint from the union, indicated by ∪ of the male and female relations.

#### Subset#

You can use the same gender property and check whether the derived gender relations are subsets of gender.

// query

ic is_subset() {
subset(female, gender) and
male ⊆ gender and
non_binary ⊆ gender
}

In Rel, the built-in subset relation can be called either by its name or by the symbol ⊆.

#### Unique Value#

Using the :age property in person you can define a unique value constraint to check that each person only has one age value.

// install

def person[:age] = {
(1, 37);
(2, 31);
(3, 47);
(4, 27);
(5, 22);
}
// query

ic is_unique(id) {
forall(age1, age2:
(person[:age](id, age1) and
person[:age](id, age2))
implies
age1 = age2
)
}

In the IC above, the id is an IC argument and in case of a violation, the error report returns all id values that violate the IC.

The uniqueness of a person’s age can also be checked using the built-in function relation:

// query

ic is_unique {
function(person[:age])
}

Another alternative to check the uniqueness is to count all ages associated with a person:

// query

ic is_unique(id, x) {
count[person[:age, id]] = x implies x = 1
}

Caution: Intuitively we may want to write the IC this way:

ic (id) {
count[person[:age, id]] = 1
}

However, this will lead to an unbound error for the argument id, as discussed in the Avoiding Unbound Argument Errors When Evaluating ICs section.

#### Symmetry#

Now we want to add a logical constraint to check the symmetry of a relation. To demonstrate this, we define an is_friends relation that collects pairs of people that are friends with each other. Obviously, friendship is mutual and consequently the relation is_friends should reflect that. We do so by checking that the relation is symmetric w.r.t. both arguments:

// install

def is_friends = {
(1, 2);
(1, 5);
(2, 1);
(3, 4);
(4, 3);
(5, 1);
}

ic is_symmetric(id1, id2) {
is_friends(id1, id2) implies is_friends(id2, id1)
}

The symmetry of a relation can also be checked in a point-free way (which lets us skip the id1 and id2 variables) with the built-in transpose relation defined in the Standard Library.

// query

ic is_symmetric {
equal(is_friends, transpose[is_friends])
}

While this formulation is more compact, it has no arguments, so it will not show us the counter-example values for id1, id2 if the IC is violated.

#### Transitivity#

Another logical constraint that might be of interest is transitivity. To demonstrate this, define a property :dob which refers to the date of birth of a person, and an is_older relation containing pairs (a, b) where person a is older than person b.

// install

def person[:dob] = {
(1, 1984);
(2, 1990);
(3, 1974);
(4, 1994);
(5, 1999);
}

def is_older(id1, id2) { person[:dob][id1] < person[:dob][id2]}

We know that the relation is_older should be transitive: if person a is older than person b which is older than person c, then a is also older than c. So let’s check that this is true by writing the following IC:

// query

ic is_transitive {
is_older.is_older ⊆ is_older
}

A relation that doesn’t have to be transitive is the friendship relation. If a is friends with b and b is friends with c then we can’t conclude that a is (or isn’t) friends with c. We can check whether or not our is_friends relation is transitive.

ic is_transitive(id1, id2, id3) {
is_friends(id1, id2) and is_friends(id2, id3)
implies
is_friends(id1, id3)
}

We use the point-wise notation, where we state explicitly the relation variables id1, id2, and id3, so we can capture friend combinations where the transitivity condition doesn’t hold, and indeed we get the following IC violation:

ERROR: LoadError: ClientException: Transaction aborted. The following problems caused the abort:
-- INTEGRITY_CONSTRAINT_VIOLATION ---------------------------------------------------
Integrity constraints are violated. The ICs are defined as follows:

15| ic is_transitive(id1, id2, id3) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16|     is_friends(id1, id2) and is_friends(id2, id3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17|     implies
^^^^^^^^^^^^
18|     is_friends(id1,id3)
^^^^^^^^^^^^^^^^^^^^^^^^
19| }
^^

error: ICs violated
Transaction output:
abort() => [()]
output(:is_transitive, Int64, Int64, Int64) => [(1, 2, 1), (1, 5, 1), (2, 1, 2), (2, 1, 5), (3, 4, 3), (4, 3, 4), (5, 1, 2), (5, 1, 5)]

From the error report you can see that there are eight tuples (a, b, c) that violate the transitivity condition. In particular, there are two classes of violations:

• Trivial violations like (1, 2, 1) where the first and last persons are the same and you can’t be friends with yourself.
• More interesting triples like (2, 1, 5) where all three people are different.

This example clearly highlights why arguments in your IC definition can be very helpful in understanding the data.

## Integrity Constraint Violations#

This section will discuss IC violations and what information about the violation is reported back to the user. The Best Practices section shows how to ensure you get the most actionable information back when an IC is violated.

Look what happens when you define an IC with a statement that evaluates to false:

ic violation_check {
1 = 2
}

As you can see below, the transaction is aborted due to the IC violation. In general, any transaction that triggers an IC violation will be aborted. This may happen because (1) you try to define an IC that does not hold given the current data, or (2) you try to change the data in a way that violates an existing IC.

ERROR: LoadError: ClientException: Transaction aborted. The following problems caused the abort:
-- INTEGRITY_CONSTRAINT_VIOLATION ---------------------------------------------------
Integrity constraints are violated. The ICs are defined as follows:

2| ic violation_check = {
^^^^^^^^^^^^^^^^^^^^^^^
3|     1 = 2
^^^^^^^^^^^^^^^^
4| }
^^

error: ICs violated
Transaction output:
abort() => [()]
output(:violation_check) => [()]

The error message highlights which IC is violated. Violated ICs are also reported in the output relation. The line output(:violation_check) => [()] states that the relation violation_check, called the IC relation, holds the tuple (), which represents true. Because it is not empty, the IC is violated.

🔎

An IC is violated when its IC relation is not empty: the IC relation is the negation of the actual expression defined in the IC declaration.

Concretely, in the above example, the IC relation violation_check is equivalent to the expression: not 1 = 2, which is true. Hence violation_check holds a logical true, represented by the tuple ().

The Integrity Constraints with Arguments section shows that if an IC is defined with arguments then the IC relation contains all counter-examples that violate the IC.

## Case Study: Mini-IMDb Dataset#

This section uses the Mini-IMDb data as an example of a real world dataset and shows how ICs can enforce a schema and encode expert knowledge in your database.

To load the Mini-IMDb dataset, we use the load_csv functionality. First we define a schema and a path to our data files:

// install

def url = "s3://relationalai-documentation-public/dataset/imdb_job/data_mini/"

def config_name[:path] = concat[url, "name.csv"]
def config_name[:schema, :person_id] = "int"
def config_name[:schema, :name] = "string"

def config_title[:path] = concat[url, "title.csv"]
def config_title[:schema, :movie_id] = "string"
def config_title[:schema, :title] = "string"
def config_title[:schema, :year] = "int"

def config_cast_info[:path] = concat[url, "cast_info.csv"]
def config_cast_info[:schema, :movie_id] = "int"
def config_cast_info[:schema, :person_id] = "int"
def cast_info_csv = load_csv[config_cast_info]

Next we define relations to access the data in Rel:

// install

def movie[:title] = title_csv[:movie_id, pos], title_csv[:title, pos] from pos
def movie[:year]  = title_csv[:movie_id, pos], title_csv[:year, pos] from pos
def movie[:id]    = first[movie[_]]

def actor[:name] =  name_csv[:person_id, pos], name_csv[:name, pos] from pos

def cast = cast_info_csv[:movie_id, pos], cast_info_csv[:person_id, pos] from pos

def co_actors(id1, id2) =
cast(movie_id, id1) and
cast(movie_id, id2) and
id1 != id2
from movie_id

### Defining Mini-IMDb Integrity Constraints#

Now, let’s define ICs on this Mini-IMDb dataset. First we check that the actor name is a String and the actor identifier is a Number.

Note that there is an advantage in creating entities for concepts like actors or movies and using the entity keys to refer to them compared to using simple identifiers like integers. For details see the Entities concept guide.

// query

@static
ic person_type_check(name, id) {
actor(:name, id, name)
implies
Number(id) and String(name)
}

In general, we should check all the other data types. For the purpose of this concept guide, we will consider only this one example.

As discussed above, we can also insert expert knowledge into our database using ICs. For instance, we can check that no movie year is made before 1874 because we know the oldest entry in IMDb is the documentary “Passage de Venus” made in 1874.

// query

ic value_check(mid, year) {
movie(:year, mid, year)
implies
year >= 1874
}

Next, you can also check that the co-author relation, co_actors, is symmetric as it should be:

// query

ic is_symmetric {
equal(co_actors, transpose[co_actors])
}

and indeed it is.

The Mini-IMDb dataset is quite small and the number of useful ICs is quite small. As the dataset grows and more information is stored in the knowledge graph, the need for integrity constraints also grows, to ensure that the knowledge is stored in a correct and consistent manner.

## Best Practices#

This section describes some best practices for writing ICs so that you have the most actionable information to hand when an IC is violated.

### Integrity Constraints with Arguments#

In the previous examples, each IC had no arguments and the IC violation could only show that the IC was violated. To gain more information about why and for which values the IC was violated, you need to add arguments to the IC.

For example, when checking that the relation person[:age]

def person[:age] = {
(1, 37);
(2, 31);
(3, 47);
(4, 27);
(5, 12);
}

only contains people who are older than 18

ic value_violation(id, age) {
person[:age](id, age)
implies
age >= 18
}

you get the following error report:

ERROR: LoadError: ClientException: Transaction aborted. The following problems caused the abort:
-- INTEGRITY_CONSTRAINT_VIOLATION ---------------------------------------------------
Integrity constraints are violated. The ICs are defined as follows:

10| ic value_violation(id, age) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11|     person[:age](id, age)
^^^^^^^^^^^^^^^^^^^^^^^^^^
12|     implies
^^^^^^^^^^^^
13|     age >= 18
^^^^^^^^^^^^^^
14| }
^^

error: ICs violated
Transaction output:
value_violation(Int64, Int64) => [(5, 12)]

The report points out that the person with id = 5 has age 12, and this fact is the cause of the IC violation, since 12 < 18.

If you write the same IC without arguments, expecting it to hold for all id and age values,

ic value_violation{
forall(id, age:
person[:age](id, age)
implies
age >= 18
)
}

the IC violation is reported as follows:

ERROR: LoadError: ClientException: Transaction aborted. The following problems caused the abort:
-- INTEGRITY_CONSTRAINT_VIOLATION ---------------------------------------------------
Integrity constraints are violated. The ICs are defined as follows:

10| ic value_violation() {
^^^^^^^^^^^^^^^^^^^^^^^
11|     forall(id, age:
^^^^^^^^^^^^^^^^^^^^
12|         person[:age](id, age)
^^^^^^^^^^^^^^^^^^^^^^^^^^
13|         implies
^^^^^^^^^^^^^^^^
14|         age >= 18)
^^^^^^^^^^^^^^^^^^^
15|     }
^^^^^^

error: ICs violated
Transaction output:
value_violation() => [()]

There is no additional information about which data record caused the IC violation.

In particular, the only information reported back is that the IC evaluated to true (the relation {()}), which means the expression in the IC evaluated to false.

### Avoiding Unbound Argument Errors in ICs#

When checking an IC, the system evaluates its negation. Any instance that satisfies the negation will be a counter-example. For the evaluation to take place, the variables must be bound to a finite domain.

Otherwise, it is possible that an infinite number of values can violate the IC. This happens when arguments are not bound, which triggers an unbound variable error.

Consider this example:

def R = {1; 2}

ic check_R_values(id) {
R(id) and id <= 2
}

It results in an error that tells us that (1) the IC is violated and (2) the variable id is unbound:

INTEGRITY_CONSTRAINT_VIOLATION
error: ICs violated
-- UNBOUND_VARIABLE -----------------------------------------------------------------
3| ic check_R_values(id) {
^^^^^^^^^^^^^^^^^^^^^^^^^^
4|     R(id) and id <= 2
^^^^^^^^^^^^^^^^^^^^^^
5| }
^^

error: expression is not bound. Maybe this is a typo, or consider adding '@inline' if the definition is intentionally infinite?
-- UNBOUND_VARIABLE -----------------------------------------------------------------
3| ic check_R_values(id) {
^^^

error: 'id' is not bound. Maybe this is a typo, or consider adding '@inline' if the definition is intentionally infinite?
Transaction output:
abort() => [()]

At first glance, these error reports might be surprising, because all values in R fulfill the condition id <= 2. However, because an IC relation is the negation of the specified expression, the IC relation check_R_values(id) is equivalent to

id: not (R(id) and id <= 2)

and there are an infinite number of id values that make the statement not (R(id) and id <= 2) true. In particular, nothing here says that the values of id must be in R. For instance, values like id = 3 or id = "abc" are valid counter-examples that violate the IC.

You can rewrite the above IC using implies to bind the variable id to relation R and get the desired integrity check, which is valid:

// query

def R = {1; 2}

ic check_R_values(id) {
R(id) implies id <= 2
}

### Named Integrity Constraints#

Similar considerations apply to naming ICs. The system creates a unique identifier for ICs that lack a user-provided name. If you have multiple ICs without a name, it will be hard to know which IC was violated.

Take the following toy example:

ic (id){
id = {1;2;3}
implies
id <= 2
}

ic (id){
id = {1;2;3}
implies
id < 2
}

Now both ICs are violated, and the system reports the id values that make them fail:

ERROR: LoadError: ClientException: Transaction aborted. The following problems caused the abort:
-- INTEGRITY_CONSTRAINT_VIOLATION ---------------------------------------------------
Integrity constraints are violated. The ICs are defined as follows:

1| ic (id){
^^^^^^^^^
2|     id = {1;2;3}
^^^^^^^^^^^^^^^^^
3|     implies
^^^^^^^^^^^^
4|     id <= 2
^^^^^^^^^^^^
5| }
^^

7| ic (id){
^^^^^^^^^
8|     id = {1;2;3}
^^^^^^^^^^^^^^^^^
9|     implies
^^^^^^^^^^^^
10|     id < 2
^^^^^^^^^^^
11| }
^^

error: ICs violated
Transaction output:
abort() => [()]
output(:rel-query-action##3385#constraint#0, Int64) => [(3,)]
output(:rel-query-action##3385#constraint#1, Int64) => [(2,), (3,)]

However, the ICs were not named and therefore you can’t infer which counter-examples correspond to which IC.

To summarize the best practices:

• Give ICs descriptive names and
• Define ICs with useful arguments such that the error reports, when IC violations occur, provide the most actionable information for analyzing why the IC violation was triggered.

## Tips and Tricks#

### IC Warnings#

You may encounter a NON_EMPTY_INTEGRITY_CONSTRAINT warning. This is to alert you about cases where an IC is trivially satisfied because a relation is empty, but is not an error. For example:

// query

def minors(id) = person[:age](id, age) and age < 18 from age

ic minors_IC {
forall(id : minors(id) implies Int(person:name:first[id]))
}

Given the person relation data we installed above, minors is empty, so it is true that all minors have a first name of type Int — which is not what we might have intended to write. Hence, the warning: “warn: integrity constraint ‘minors_IC’ is (trivially) true when the relation minors is empty”.

This warning can also appear when we require a relation to be false that we do expect to be false (e.g. load errors after a CSV import). In these cases, it can be safely ignored.

### Disabling Integrity Constraints#

All integrity constraints can be disabled with the :disable_integrity_constraints control relation, by adding it to relconfig in an update transaction:

def insert[:relconfig] = {:disable_integrity_constraints}

After this, all ICs are ignored, including any previously installed ICs, and future ones. This feature should be used with care, but can be useful, for example, to improve performance if very expensive ICs are present.

Because we can update base relations for the duration of a read-only query, we can also use this control relation to disable ICs for a particular query, rather than permanently. For instance, this query will not fail, but any installed ICs will still be active afterwards, because the query is not an update:

// query

def insert[:relconfig] = {:disable_integrity_constraints}

def foo = 1; 2; 3
ic { count[foo] = 7 }

def output = foo