Rel Integrity Constraints Reference

Reference guide for Rel Integrity Constraints.

(This is a reference document. See the Integrity Constraints Concept Guide for an introduction to ICs.)

Introduction

Integrity Constraints define a requirement that is imposed within a transaction. If a defined IC is violated, the running transaction is aborted and the appropriate information is logged in the :output relation.

An Integrity Constraint can cover a single fact, a single relation, or multiple relations. When any IC is violated, the transaction is aborted and an informative and structured output is presented to the user of the database. For example, consider a constraint on a graph stating that there cannot be nodes with self-edges. A violation of this IC can report, in a structured form, the nodes that violate this constraint. The output relation will include counter-examples to the constraint; in this case, the nodes with self-edges.

Semantics

IC expressions are evaluated to either true or false depending on the state of the database. If the constraint is violated (false), then the transaction is aborted and the database returns information about the constraint violation to the user. In the current design, this always causes the transaction to abort.

Grammar and Syntax

Declaration :=
    Attribute* "ic" Id? FormalParamsBracket* FormalParamsParen? "=" Expr
    Attribute* "ic" Id? FormalParamsBracket* FormalParamsParen? "{" Expr "}"

Commentary

  • Integrity Constraints are declarations, and we use the keyword ic to declare them.
  • ICs are positive statements: they state what needs to be true.
  • ICs have optional names (the identifier). If an identifier is not declared, a random name is generated internally. The system uses the name for reporting constraint violations to the application.
  • ICs can have optional formal parameters. The system uses these parameters for reporting constraint violations; in particular, demonstrating the actual counterexamples.
  • An arbitrary Expression is used for the constraint evaluation.

The following examples give a high-level overview of how to declare integrity constraints:

ExampleDescription
ic no_self_edge(x, y) {edge(x, y) implies x != y}no_self_edge is a constraint defined on the ‘edge’ relation
ic = forall(x, y: edge(x, y) implies x != y)You can omit the name of the constraint
ic(a, b) = edge(a, b) implies a != bYou can declare free variables
ic age_constraint = forall(p, a, b where age(p, a) and age(p, b): a=b)The same person cannot have different ages
ic all_age = forall(x: person(x) implies age(x, _))Every person has an age

Further IC Examples

ic() {
	forall(x : person(x) implies age(x, _))
}
  • Logic: Every person needs to have an age
  • Anonymous constraint without parameters
  • Violations do not contain information that can be used by an application.

ic total_age() = forall(x: person(x) implies age(x, _))
  • This is a named IC without parameters.
  • Violations do not contain information that can be used by an application.

ic total_age(x) = person(x) implies age(x, _)
  • Named IC, making x a parameter, which is used in identifying the actual violations. Note that x is already introduced, so it is no longer quantified in the body of the constraint.

ic total_age(x) {
    person(x) implies age(x, _)
}
  • Named IC.
  • Convenience of using curly braces to indicate the body of the constraint. This mimics Alloy syntax.

ic {
    total(person, age)
}
  • Anonymous IC that uses a higher-order abstraction from the standard library, total.

ic(x) {
    person(x) implies age(x, _))
}
  • Anonymous IC.
  • Does contain information about the violation (formal parameter x).

Functional dependency

The following is an anonymous IC enforcing a functional dependency on age, making sure that nothing can have two different ages:

ic() = forall(p, a, b: age(p, a) and age(p, b) implies a = b)

This IC could also be written using stdlib’s function:

ic { function[age] }

An equivalent named IC, including parameters that will explain the violation, is:

ic fun_age_ic(p, a, b) {
    age(p, a) and age(p, b) implies a = b
}

Type constraints

We can constrain the types of relation using type predicates such as Int, String, etc. For example,

ic age_int_ic(x) { age(_, x) implies Int(x) }

We can also use other unary relations as types. For example, this IC checks that every student is also a person:

ic() {
    forall(x: student(x) implies person(x))
}

The following named IC also checks that every student is a person, but avoids the quantifier:

ic student_type(x) {
    student(x) implies person(x)
}

Point-free version using the higher-order infix operator (subset):

ic() {
    student ⊆ person
    // Can also be written as `subset(student, person)`.
}

This IC checks that the parent relation hold only for values of person type:

ic(x, y) {
    parent(x, y) implies person(x) and person(y)
}

Point-free version using the higher-order infix operator (subset):

ic() {
    parent ⊆ (person, person)
    // Can also be written as `subset(parent, (person, person))`.
}

Deriving Constraint Violations

Constraints could be implemented by confirming that the IC formula evaluates to true, but because constraints are almost always universally quantified, it is easier to derive true if the constraint is violated, i.e., it is sufficient and simpler to check the existence of counterexamples. For this, rAI internally evaluates the negation of the formula into a relation definition, which changes the universal quantification that is common in constraints to an existential quantification.

Step-by-step derivation example

To demonstrate this process, here is a step-by-step example of the internal transformations applied:

  1. Define an IC that enforces no self-edges:
     ic no_self_edge() = forall(x: node(x) implies not edge(x, x))
  1. Convert the IC to a relation defined as the negation of the expression body:
    // ic rel() E   =>   def rel() = not(E)
	=> def no_self_edge_def() = not forall(x: node(x) implies not edge(x, x))
  1. Apply logical transformation rules on the negation of universal quantification and double negations:
   // not(forall(domain, expr)) => exists(domain and not(expr))
   => def no_self_edge_def() = exists(x: node(x) and not(not edge(x, x)))
   // not(not F) => F
   => def no_self_edge_def() = exists(x: node(x) and edge(x, x))

Constraint violation derivation examples

We now give more examples for the derived negated definitions. For demonstration purposes, we name the negated relation violates, but it does not necessarily have this name internally.

      ic() {forall(x: student(x) implies person(x))}
  =>  def violates() = exists(x: student(x) and not person(x))
      ic() {forall(x, y: parent(x,y) implies ( person(x) and person(y)))}
  =>  def violates() = exists(x, y: parent(x,y) and not(person(x) and person(y)))
      ic() {forall(x, y: age(x,y) implies ( person(x) and Int(y)))}
  =>  def violates() = exists(x, y: age(x,y) and not(person(x) and Int(y)))
      ic() {forall(x: node(x) implies not edge(x, x))}
  =>  def violates() = exists(x: node(x) and edge(x, x))
      ic() {forall(x: person(x) implies age(x, _))}
  =>  def violates() = exists(x: person(x) and not age(x, _))
	  ic() {forall(x, y: parent(x, y) implies ( person(x) and person(y) and x!=y and not parent(y, x)))}
  =>  def violates() = exists(parent(x, y) and (not person(x) or
						  not person(y) or
						  x = y or
						  parent(y,x)))
	  ic() {forall(p, a, b: (age(p,a) and age(p, b)) implies a=b)}
  =>  def violates() = exists(p, a, b: age(p,a) and age(p,b) and a!=b)
	  ic() {exists(x: president(x))}
  =>  def violates() = not exists(x: president(x))

Handling of formal parameters

In definitions, formal parameters are moved into the body. For example:

 	 def has_self_edge(x) = edge(x, x)
  => def has_self_edge = x: edge(x, x)

We follow the same approach for constraints, as the following example shows:

	 ic no_self_edge(x) node(x) implies not edge(x, x)
  => // Move formal parameters to the body expression:
     ic no_self_edge = x: node(x) implies not edge(x,x)
  => // Rewrite:
	def no_self_edge_def = x: node(x) and edge(x, x)

Representing IC Violations

One of our main design decisions is to separate the database logic from transaction control. Therefore, we introduce two relations during the course of evaluating Integrity Constraints:

  • The :abort relation, which evaluates to true if a constraint is violated. Currently, when an abort is triggered an exception is thrown to the end user. In the future, it will be up to the transaction control semantics to handle the abort.
  • The :output relation, which represents the IC violations relationally. The :output relation includes information about the violating constraints.

Currently we implement a mechanism that expresses the abort and output relations as disjunctions of constraint-violation definitions, as shown below:

def edge = {(2,2); (5,78); (3,1); (2,5); (6,1)}
def node = {1; 2; 3; 5; 78; 6; -1}
ic positive_node(x) = node(x) implies x > 0
ic no_self_edge(a, b) = edge(a, b) implies a != b

Internally these constraints would be translated to the following relation definitions:

def positive_node_def(x) = not(node(x) implies x > 0)
def no_self_edge_def(x, y) = not(edge(a, b) implies a != b)

def abort = exists(positive_node_def) or exists(no_self_edge_def)
def output(icname) =
   (icname = :positive_node_def and exists(positive_node_def)) or
   (icname = :no_self_edge_def and exists(no_self_edge_def))