Rel Primer: Basic Syntax
This Primer is an introduction to the basic syntax of Rel.
Introduction
This Rel Primer is an introduction to the main features of Rel. It assumes basic knowledge of database concepts and programming.
Rel is RelationalAI’s language for building models and querying data.
Rel has a simple but powerful syntax designed to express a large set of relational operations, build declarative models, and capture general domain knowledge, including, in particular, knowledge graphs.
Relations and Tuples
The most important concept of Rel is that of a relation. Rel imports data into relations, combines relations to define new ones, and queries relations to get answers back — in the form of new relations. Expressions in Rel denote relations, too.
Before describing expressions and other constructs of Rel, this Primer begins with a general overview of relations and their building blocks.
Basic Concepts
In this section, the bold font highlights examples of abstract concepts. These examples are written in an ad hoc notation and are not valid Rel expressions. Rel syntax will be introduced in the next section, Literal Expressions in Rel.
Data Values
The basic building blocks of relations are data values. These are values that represent numbers, strings, dates, and the like. Every data value is a member of some data type. For instance, the type of an integer number 1 is different from the type of a string such as “abc.”
Tuples
A tuple is an ordered immutable sequence of data values. Each of the data values in a tuple is called an element of the tuple. In written form, the sequence is typically enclosed in parentheses and its elements are separated by commas.
For example, the empty tuple () contains no elements. The length of this tuple is 0. The tuple (0, “b”, “c”, 0) contains four elements: the integer 0, the string “b”, the string “c”, and the integer 0. The length of this tuple is 4.
The order of elements is important. For instance, the tuple (“abc”, 10, “abc”) is different from (10, “abc”, “abc”). Elements that are identical are indistinguishable, so if you swap the elements of (1, 1) you will still have the same tuple.
In Rel a tuple can contain only data values. Tuples cannot contain other tuples.
Two tuples of lengths and can be concatenated to form a new tuple of length . For example, the concatenation of tuples (1, 2, 3) and (4, 3) is the tuple (1, 2, 3, 4, 3).
A tuple of length n is often called an n-tuple. There are some convenient special terms for smaller values of n:
- The 0-tuple () is called the empty tuple (there is only one).
- A 1-tuple, such as (0), is called a unary tuple.
- A 2-tuple, such as (0, 1), is called a pair.
- A 3-tuple, such as (0, 1, 2), is called a triple.
- A 4-tuple, such as (0, 1, 2, 3), is called a quadruple.
- A 5-tuple, such as (0, 1, 2, 3, 4), is called a quintuple.
Relations
A relation is an unordered set of tuples. Each of the tuples contained in a relation is called its member. A member of a relation is said to belong to the relation.
The members of a set are customarily written within curly braces and separated with commas. When you write down some representation of a set, you must order the members somehow, but the order does not matter. For instance, { (1, 2), (3, 4) } and { (3, 4), (1, 2) } both represent the same set. Its members are the tuples (1, 2) and (3, 4).
A set can contain only unique elements. You cannot have a relation such as { (1), (2), (1) }, because it contains two identical tuples. Though you can write an expression in Rel that seems to denote { (1), (2), (1) }, it is treated as an expression denoting { (1), (2) }. That is, the uniqueness of members is automatically taken care of when evaluating expressions.
There are two important attributes of a relation: its cardinality and its arity.
The cardinality of a relation is the number of its members. The empty relation {} has cardinality 0. A relation with cardinality 1 — that is, with only one member — is called a singleton relation, or just a singleton. For instance, { (1, 2) } is a singleton.
The arity of a relation is the length of its members. If all the tuples in a relation are of the same length n, then the relation is said to have arity n. In Rel a relation can contain tuples of different lengths. In this case, the relation is said to have (or to be of) a mixed arity.
A relation of arity n is often called an n-ary relation. There are some convenient special terms for smaller values of n:
- The 0-ary relation { () } is called the nullary relation. The empty relation {} is often considered to be a nullary relation, too.
- A 1-ary relation, such as { (0), (1), (2) }, is called a unary relation.
- A 2-ary relation, such as { (0, 1) }, is called a binary relation.
- A 3-ary relation, such as { (0, 1, 2), (1, 2, 0) }, is called a ternary relation.
It’s useful to think of a relation as a database table. Each tuple is then a row of the table, the arity is the number of columns, and the cardinality is the number of rows. The k-th position in the tuples of a relation is sometimes called the k-th column of the relation.
Since the value of a correctly formed expression in Rel is always a relation, it is often convenient to refer to the arity of an expression. When you read that “an expression has arity n,” you should interpret it as “the relation that is the value of the expression has arity n.” Similarly, you can say that an expression “is a unary expression,” etc.
A Rel relation is a generalization of a mathematical relation. A mathematical relation is a Cartesian product of some number of sets. Unlike a Rel relation, a mathematical relation always contains tuples of the same length.
Union of Relations
Let P and Q be two relations. Then the union of P and Q is the relation that contains all the members of P and all of the members of Q and no other tuples.
For example, the union of relations { (1, 2), (3, 4) } and { (5, 6), (1, 2) } is the relation { (1, 2), (3, 4), (5, 6) }.
Product of Relations
Let P be a relation with members and let Q be a relation with members. Then the product of P and Q is the relation that contains members, each of which is a tuple from P concatenated with a tuple from Q.
For example, the product of relations { (1, 2), (3, 4) } and { (5, 6), (1, 2) } is { (1, 2, 5, 6), (1, 2, 1, 2), (3, 4, 5, 6), (3, 4, 1, 2) }.
In the context of databases, the product is sometimes called a cross join, an outer product, or a Cartesian product. The last two terms are inspired by mathematics, but are used somewhat differently. In conventional mathematics the Cartesian product of { (1, 2), (3, 4) } and { (5, 6), (1, 2) } is { ((1, 2), (5, 6)), ((1, 2), (1, 2)), ((3, 4), (5, 6)), ((3, 4), (1, 2)) }. That is, the members of the Cartesian product would contain pairs of tuples belonging to the two sets, not their concatentations. The term outer product, in turn, refers to an operation on two vectors.
Literal Expressions in Rel
This section is an introduction to literal expressions, that is, Rel expressions that describe constants.
Just as in the previous section, the bold font highlights examples of abstract concepts.
The use of inline code
formatting indicates examples of proper Rel syntax.
The Simplest Expressions With Data Values
A Rel expression denotes a relation.
When you write a number such as 17
, the expression denotes the unary singleton relation {(17)}.
Similarly, the Rel expression "abc"
denotes the unary singleton relation {(“abc”)}.
Just as in conventional arithmetic, you can use parentheses and curly brackets for grouping.
So to underscore the fact that 17
is a relation you can write {(17)}
.
But this has exactly the same meaning as 17
, (17)
, {17}
, (({17}))
, etc.
The Product Operator
In Rel, the comma is an infix operator that denotes the product of two relations.
For instance, 17, "abc"
denotes the binary singleton relation { (17, “abc”) }, which is the product of {(17)} and {(“abc”)}.
The comma is an associative operator, that is, (1, 2), 3
and 1, (2, 3)
denote the same relation.
This is a consequence of how “product” is defined.
So you can freely use sequences of commas without considering the order of operations.
For instance, 1, 2, 2, 1
denotes a relation whose only member is a quadruple, namely { (1, 2, 2, 1) }.
It is customary to write such an expression in parentheses: (1, 2, 2, 1)
.
This resembles the conventional notation for a tuple.
The expressions ()
and {()}
denote a nullary singleton relation that contains the empty tuple.
The name “product” suggests that there is a certain similarity between this operation and arithmetic multiplication.
Just like the multiplication operator, the product operator is associative.
Unlike multiplication, the product is not commutative, that is, 1, 2
is not the same as 2, 1
.
However, the relations {}
and {()}
are analogous to the arithmetic 0 and 1, respectively.
- For any relation
P
, both{}, P
andP, {}
are equivalent to{}
. - For any relation
P
, both{()}, P
andP, {()}
are equivalent toP
.
In Rel the expression false
is a synonym for {}
and true
is a synonym for {()}
.
This is further explained in Logical Operators.
The Union Operator
In Rel, the semicolon is an infix operator that denotes the union of two relations.
For instance, 17; "abc"
denotes the unary relation { (17), (“abc”) }.
The semicolon is an associative operator, that is, (1; 2); 3
and 1; (2; 3)
denote the same relation.
This is a consequence of how “union” is defined.
So you can freely use sequences of semicolons without considering the order of operations.
For instance, 1; 2; 2; 1
denotes the relation { (1), (2) } — recall that duplicates are removed when expressions are evaluated.
It is customary to write such an expression as {1; 2}
.
This resembles the conventional notation for a set, except that you use a semicolon instead of a comma.
The expression {}
denotes the empty relation.
Be aware of the difference between ,
and ;
.
If you use the conventional mathematical notation for the set of the first three natural numbers, { 1, 2, 3 }
, then the result will be a set that contains only one member: a tuple of length 3.
The semicolon has a lower precedence — that is, binds less tightly — than the comma, so the expression 1, 2; 3, 4
denotes a set of two pairs.
The operation of computing the union of two relations is somewhat similar to arithmetic addition. As already noted, the semicolon operator is associative. Moreover:
- The semicolon is commutative. For instance, the expressions
1; 2
and2; 1
denote the same set. - The comma distributes over the semicolon. For instance,
1, (2; 3), 4
is equivalent to1, 2, 4; 1, 3, 4
. - The empty relation
{}
plays a role analogous to the arithmetic 0. For any relationP
,{}; P
andP; {}
are both identical toP
.
Giving a Relation a Name
You can give a name to an expression that denotes a relation. This allows you to refer to the relation without having to repeatedly write out the defining expression.
To give a relation a name, you use a construct called a rule, also known as a definition. For example:
def firstFive { 1; 2; 3; 4; 5 }
The expression firstFive
now refers to the unary relation that contains a tuple for each of the first five natural numbers.
You can also write:
def firstFive = 1; 2; 3; 4; 5
This is just an alternative syntax for the previous example.
In both cases def firstFive
is the left-hand side of the definition, and 1; 2; 3; 4; 5
is the right-hand side of the definition.
The left-hand and right-hand sides are also called the head and the body of the definition, respectively.
The =
that separates the left-hand side of a rule from the right-hand side is neither an assignment operator nor — in this context — an equality operator.
It is, rather, an implication operator.
If =
is not present — as in the first example above — then the implication operator is implicit.
A rule is an implication in the following sense: If a tuple is in the relation described by the body, then that tuple should be in the relation described by the head.
As you will see in Relational Application, the statement above is not quite precise. In general, the tuples from the body may be modified before inserting them into the relation described by the head.
Moreover, a relation can be defined by a number of different rules. For more details, see Base and Derived Relations.
If a relation has been given a name, you can use the name in Rel expressions. For instance, you can use it as the right-hand side of another definition, like this:
def five_naturals = firstFive
From now on, the name five_naturals
refers to the same relation as firstFive
.
See Dependencies Between Rules for more details.
You can also write the above as:
def five_naturals { firstFive }
Here is a somewhat more realistic example. Suppose you want to define a relation that gives the clubs for some of your favorite soccer players. One way to do it is like this:
// read query
def club {
("Neymar", "PSG");
("Messi", "BFC");
("Werner", "Chelsea");
("Pulisic", "Chelsea")
}
def output = club
The system automatically displays the contents of the relation output
.
In other respects it’s a relation like any other.
Rel chooses a default order to display the tuples in the relation,
which may be different from the order used to write the relation down, as in this example.
You should not assume that tuples will appear in any particular order; but you can use utilities like enumerate
, top
, and bottom
to sort and rank them.
When a relation is given a name such as club
, it is usual to say that ”club
is a relation.”
However, this is not precise.
During the computation, the name club
can be made to refer to another relation.
For example, it can refer to a relation containing information about additional soccer players — ones not included in the initial set.
Though it is convenient to say informally that “a new player has been added to the relation club
,” it’s important to remember that a relation is immutable.
Comments
Comments in Rel are similar to the ones found in the programming languages C and C++.
Writing //
indicates that the following text is treated as a comment that extends until the end of the current line.
Writing /*
indicates that the following text is treated as a comment that extends until the nearest occurrence of */
.
For example:
def primes = 2; 3; 5; 7 // The single-digit primes
def odd = 1; 3; 5; 7; 9 // The single-digit odd numbers
/*
* NOTE: If you want a lot more numbers of a particular kind, you might be better off
* generating them algorithmically, rather than writing them by hand.
*/
Relational Application
Assume you have the following relation with information about the surnames, first names, and ages of a group of people:
// model
def age {
("Smith", "Jane", 20);
("Smith", "John", 21);
("Smith", "Jane", 22);
("Miller", "Jane", 21);
("Miller", "John", 20)
}
You can give the entire relation another name:
// read query
def output = age
Recall also that output
is special only in that its contents are printed out.
Otherwise, it’s just like any other relation.
Square Brackets on the Right
If you are interested only in those tuples in age
that begin with a particular surname, you can use square brackets, like this:
// read query
def output = age["Smith"]
The expression age["Smith"]
is an example of a partial relational application.
In this case the relation age
is applied to the argument "Smith"
.
Here’s one way to describe the mechanism of evaluating such an expression:
- It selects those tuples in the applied relation whose initial element matches the argument.
- It removes the first element from each selected tuple (in a way that does not affect the applied relation).
- Finally, it inserts the modified tuples into a new relation. This relation is the value of the expression.
In the example above the resulting relation was given the name output
.
If there are no tuples whose initial element matches the argument, then the resulting relation is empty.
In Rel the empty relation {}
is equivalent to false
, so it is natural to interpret an empty result as “No, there are no tuples whose initial element matches the argument”:
// read query
def output = age["Schmidt"]
Square Brackets on the Left
Square brackets can also be written on the left-hand side of a definition. When you add something in square brackets after the name in the head of a rule, Rel extends each tuple from the body with an additional initial element specified in the square brackets, and gives the result a name.
You can use this feature to preserve the elements removed during partial relational application, thus restricting its effect to selecting the interesting tuples:
// read query
def output["Smith"] = age["Smith"]
In general, you can add anything you want:
// read query
def R["Cadet"] = age["Smith"]
def output = R
A relation name followed by square brackets in the head of a definition is also a partial relational application. Consider the following:
// read query
def R["Cadet"] = age["Smith"]
def output = R["Cadet"]
Here, R
is a relation such that the partial application R[Cadet]
is the relation described by the body of the definition of R
.
Multiple Square Brackets
The result of a partial relational application is a relation. If the arity of that relation is not zero, you can use it in a partial relational application, for example like this:
// read query
def output = age["Smith"]["Jane"]
Here you got a relation that contains the ages of those people whose surname is "Smith"
and whose first name is "Jane"
.
Before giving a name to the relation, you can extend it by using multiple square brackets on the left-hand side:
// read query
def output["Space"]["Cadet"]["Jane"]["Smith"] = age["Smith"]["Jane"]
Variable Arguments
The argument of a partial relational application can also be a variable. The written form of a variable is an identifier. You can think of it as a placeholder that can match a data value. Once it matches something, it becomes instantiated to that data value and is essentially indistinguishable from it.
For example:
// read query
def output[y] = age["Smith"][y]
What happened here? It might be helpful to trace it out in detail:
- Recall that the value of
age["Smith"]
is the relation{("Jane", 20), ("Jane", 22), ("John", 21)}
. - The variable
y
matched the first element of("Jane", 20)
, so it was instantiated to"Jane"
. - The expression
age["Smith"][y]
thus becameage["Smith"]["Jane"]
, whose value is the relation{ (20); (22) }
. - So
output
must contain the tuples of this relation, extended with an additional initial element specified in the head of the rule: the instantiation ofy
, that is,"Jane"
. - The variable
y
was then instantiated to the other value that occurs among the first elements of the relationage["Smith"]
, which is"John"
. - The expression
age["Smith"][y]
thus becameage["Smith"]["John"]
, whose value is the relation{ (21) }
. - So
output
must contain the tuples of this relation, extended with the instantiation ofy
, that is,"John"
.
In this case the result was as if you had written def output = age["Smith"]
.
But in general you can use variables much more flexibly.
For instance, you can find out the surnames and ages of people whose first name is “Jane”:
// read query
def output[x] = age[x]["Jane"]
You might find it instructive to figure out the details of how the above works. But doing so is rarely needed, thanks to the declarative nature of the language.
As another example, you can define a new relation in which the order of the names is different from the one in age
:
// read query
def output[y][x] = age[x][y]
For every tuple in age
there is a corresponding tuple in output
, but with the first two elements swapped.
A variable is identified by its name, so the partial relational application age[x][x]
would construct a relation with the ages of those people whose both names are identical.
In this case, the result would be the empty relation {}
.
Note, however, that repeating a variable on the left-hand side would not have the expected effect:
def output[x][y][x] = age[x][y]
The compiler will treat this as the introduction of two variables named x
and will generate an error message about one of them being “unbound.”
The workaround looks like this:
// read query
def output[z][y][x] = age[x][y], z = x
See Comma and Semicolon as Logical Operators for an explanation.
You must properly introduce the variable used in the body of a rule, otherwise the compiler will raise an “undefined variable” error. One way to introduce a variable is to include it in the head. The subsequent sections discuss other ways of introducing variables. For now just remember to use the variables on the left-hand side.
A List of Items in Single Square Brackets
The value of age["Smith"]["Jane"]
is a unary relation, so you can use it in a partial relational application:
// read query
def output = age["Smith"]["Jane"][20]
The first element in the copy of tuple (20)
was removed, leaving the empty tuple ()
.
So the resulting relation is {()}
.
In Rel, the nonempty nullary relation {()}
is equivalent to true
.
The intuitive meaning of the result is “Yes, the relation age["Smith"]["Jane"]
does contain the tuple (20)
.”
You can also interpret this as “The relation age["Smith"]
contains the tuple ("Jane", 20)
,” or “The relation age
contains the tuple ("Smith", "Jane", 20)
.”
Rel allows you to write [x, y]
instead of [x][y]
.
So, depending on the way in which you want to look at your result, you can choose one of the following equivalent definitions:
def output = age["Smith"]["Jane"][20]
def output = age["Smith", "Jane"][20]
def output = age["Smith"]["Jane", 20]
def output = age["Smith", "Jane", 20]
The form with only one set of square brackets is the most frequently used in practice. You can also use this notation on the left-hand side of a definition.
You can apply this notation even if the result is not an empty tuple.
For instance, age["Smith", "Jane"]
is equivalent to age["Smith"]["Jane"]
.
You can see it as a partial relational application with an argument in the form of a tuple. The explanation of the mechanism for evaluating a partial relational application could be amended as follows:
- It selects those tuples in the relation that have a prefix matching the argument tuple.
- It removes the matching prefix from each selected tuple (in a way that does not affect the applied relation).
- Finally, it inserts the modified tuples into a new relation. This relation is the value of the expression.
Similarly, by using the form with commas on the left-hand side of a definition you are specifying a tuple that will be added as a prefix to the tuples of the relation that is being given a name.
Saying “this partial relational application has three arguments” is customary, and perhaps more convenient than saying “the argument of this relational application is a triple.”
Parentheses
In the last example of a partial relational application the number of arguments was equal to the arity of the relation.
Such an application is no longer really a “partial” one.
Moreover, as you saw, its value can only be {}
or {()}
, that is, either false
or true
.
Rel provides special syntax for this important case. Instead of square brackets you can use parentheses. Such a construct is called a relational application:
// read query
def output = age("Smith", "Jane", 20)
A boolean expression is an expression whose value must be false
or true
.
In the context of Rel, a boolean expression is often called a formula.
So a relational application is an example of a formula.
If the right-hand side of a definition is a formula, you can also use parentheses on the left-hand side of a definition. In fact, this is the recommended style:
// read query
def output(y, x, z) = age(x, y, z)
The difference between parentheses and square brackets is more than just a notational convention.
If the number of arguments in a relational application is smaller than the number of elements in any tuple, then the relational application is always evaluated to false
, and the compiler warns you that there is a mismatch in arity.
// read query
def output = age("Smith", "Jane")
If the relation has a mixed arity, that is, contains tuples of different lengths, then a relational application picks up only those tuples whose length is equal to the number of arguments. If the number of arguments does not match the size of any tuples, no error is reported. Compare the following four examples:
// read query
def mixed {("Wilson", "Adam", 20); ("Willis", "Joseph", 50, "boss") }
def output[x, y] = mixed[x, y]
// read query
def mixed {("Wilson", "Adam", 20); ("Willis", "Joseph", 50, "boss") }
def output(x, y) = mixed(x, y)
// read query
def mixed {("Wilson", "Adam", 20); ("Willis", "Joseph", 50, "boss") }
def output(x, y, z) = mixed(x, y, z)
// read query
def mixed {("Wilson", "Adam", 20); ("Willis", "Joseph", 50, "boss") }
def output(x, y, z, v) = mixed(x, y, z, v)
Note that you can obtain a formula even with a partial relational application, for instance by using the equality operator =
.
If R
is a unary relation and x
is a variable, then the expression x = R
— or R = x
— is equivalent to R[x]
, that is, its value is true
or false
, and if it’s true
then it may additionally instantiate x
, as discussed in Variable Arguments.
So, for example, the right-hand side of the following definition is a formula, and the variable x
ranges over the unary relation age["Smith", "Jane"]
:
// read query
def output(x) { x = age["Smith", "Jane"] }
You can combine this with a variable that ranges over the first names:
// read query
def output(firstName, x) { age["Smith", firstName] = x }
A more conventional way of writing this would be:
// read query
def output(firstName, x) { age("Smith", firstName, x) }
Varargs allow you to express it more succinctly:
// read query
def output(x...) { age("Smith", x...) }
By using varargs you can allow your relational application to pick up tuples of any length:
// read query
def mixed {("Wilson", "Adam", 20); ("Willis", "Joseph", 50, "boss"); ("what?"); () }
def output("LHS", x...) = mixed(x...)
As you can see, with varargs you can use a relational application to simulate a partial relational application in its full generality. To give another example:
// read query
def output(x...) = age("Smith", x...)
This, however, is not the recommended style. Varargs are best reserved for situations in which the arity of the applied relation is not fixed or unknown.
Logical Operators
Say you have defined and installed the following relations:
// model
def character { ("cat", "Felix"); ("beagle", "Spike"); ("duck", "Scrooge") }
def mammal { "pig"; "cat"; "beagle"; "whale" }
def bird { "sparrow"; "duck"; "parakeet"; "vulture" }
As explained in Relational Application, a relational application evaluates to either {}
or {()}
, depending on whether the arguments match some tuple in the applied relation.
Compare, for instance, these two examples:
// read query
def output = bird("duck")
// read query
def output = mammal("duck")
In the first example, output
was the relation {()}
, which was displayed as the empty tuple ()
.
In the second example, output
was the empty relation {}
, which was not displayed.
As noted in Product Operator, in Rel the relations {}
and {()}
are given the names false
and true
, respectively.
The two examples above show why this makes sense.
Indeed, a duck is a bird and is not a mammal.
Rel features logical operators for negation (not
), conjunction (and
), and disjunction (or
).
Each of the logical operators is defined so that every operand is either {}
or {()}
and so is the result.
Here is an example of negation:
// read query
def output = not mammal("duck")
Notice that the operand of not
is a relational application, and therefore a formula.
The relational application is evaluated first, therefore not
is applied to either false
or true
.
The same principle applies to the other logical operators: Their operands are evaluated first.
Here are two examples of conjunction:
// read query
def output = character("cat", "Felix") and not mammal("cat")
// read query
def output = character("cat", "Felix") and mammal("cat")
As noted in Variable Arguments, you can use variables instead of literals. This makes the examples somewhat more interesting.
The following computes the kinds and names of those characters that are mammals:
// read query
def output(kind, name) = character(kind, name) and mammal(kind)
Roughly speaking, the evaluation is as follows:
- The tuple
(kind, name)
is successively matched against each of the tuples incharacter
. This instantiates the variableskind
andname
, thus instantiating the tuple(kind, name)
to a tuple incharacter
. Thereforecharacter(kind, name)
evaluates totrue
. - If the instantiation of
kind
is in the relationmammal
, thenmammal(kind)
evaluates totrue
; otherwise it evaluates tofalse
. - The entire body evaluates to
true
if and only if both the operands ofand
evaluate totrue
. If the body evaluates totrue
, then the current instantiation of the tuple(kind, name)
is added tooutput
.
The following computes the kinds and names of those characters that are not mammals:
// read query
def output(kind, name) = character(kind, name) and not mammal(kind)
The process is very similar to the one described above.
Notice that kind
becomes instantiated only to one of the first elements in the tuples of character
, not to anything in the universe that is not a mammal.
If you want to list characters that are either mammals or birds, you can use disjunction as the second operand of a conjunction:
// read query
def output(kind, name) = character(kind, name) and (mammal(kind) or bird(kind))
Disjunction is true if one or both of the operands are true. For example:
// read query
def p = 1; 2
def q = 2; 3
def output = (p(1) or q(1)) and (p(2) or q(2))
The expression p(1) or q(1)
is true, because p(1)
is true, even though q(1)
is not.
The expression p(2) or q(2)
is true, because both p(2)
and q(2)
are true.
Comma and Semicolon as Logical Operators
Recall that, for any relation P, the product of {()}
and P — or P and {()}
— is P, while the product of {}
and P — or P and {}
— is {}
.
In particular, if the operands of the product operator ,
are formulas, the result is true
if and only if both the operands are true
.
Similarly, if the operands of the union operator ;
are formulas, the result is true
if and only if at least one of the arguments is true
.
Notice that true; true
is true
, because the union of {()}
and {()}
is {()}
.
So you can always replace conjunction and disjunction by product and union, respectively:
// read query
def output(x, y) = character(x, y) , (mammal(x) ; bird(x))
The opposite, however, is not true. Product and union are more general than conjunction and disjunction.
This is because you can only use conjunction and disjunction if both the operands are formulas, that is, evaluate to true
or false
.
If that is not the case, the result will be an error:
// read query
def output[x] = character[x] and mammal(x)
You can obtain the desired effect by using a comma and a semicolon, as shown in the examples below:
// read query
def output[x] = character[x], mammal(x)
// read query
def output[x] = character[x], mammal(x) ; character[x], bird(x)
When only one operand of ,
is a formula, and the other is a relation whose arity is greater than zero, you can say that the relation is filtered by the formula.
In the first example above, the formula mammal(x)
filters the unary relation character[x]
.
That is, the result is a unary relation containing the names of those characters that are mammals.
This filtered relation is then extended — on the left-hand side — by the corresponding instantiations of x
.
The second example computes a union of two unary relations. One contains the names of those characters that are mammals and the other one contains the names of those that are birds.
The principles outlined above apply also to infinite relations defined in the Standard Library.
For example, the infinite relation gt
, whose name is an acronym for “greater than,” contains the tuple (3, 1)
, but it does not contain the tuple (1, 3)
.
You can write a relational application such as gt(3, 1)
in infix form as 3 > 1
.
In the following example, the relation R
is a product of { 1; 2; 3 }
and { 2; 1 }
:
// read query
def R = { 1; 2; 3 }, { 2; 1 }
def output = R
A relational application of R
is a formula, so you can extract a subset of its tuples by using a conjunction:
// read query
def R = { 1; 2; 3 }, { 2; 1 }
def output(x, y) = R(x, y) and x > y
If you use R
in a partial relational application, you can filter it by using the product operator:
// read query
def R = { 1; 2; 3 }, { 2; 1 }
def output[x] = R[x], x < 3
Notice, however, that in this case the filter could not express the condition that the second element is smaller than the first, because there is no variable that ranges over the second element.
Partial Applications Are Not Functions
In many settings, []
looks like a function call — as in add[1, 2]
or cos[2 * pi_float64]
— but unlike a function call, such an expression may evaluate to a relation that contains zero elements or more than one element.
In the example below, neighbor[x]
can have zero, one, or two elements for each x
:
// read query
def neighbor(x, y) {
(y = x + 1 or y = x - 1) and x > 0 and y > 0
}
def output["zero"] = neighbor[0] // Has zero elements because `x` cannot be `0`.
def output["one" ] = neighbor[1] // Has one element because `y` = `2` or `y` = `0`, but `y` cannot be `0`.
def output["two" ] = neighbor[2] // Has two elements because `y` = `3` or `y` = `1`, which are both valid.
The relation neighbor
— just like add
, multiply
, etc. — is infinite.
In this example, neighbor
restricts the domain of x
, which makes the partial relational application neighbor[x]
evaluate to a finite relation.
Similarly, the evaluation of 3 + 4
— which is an alternative form of the partial relational application add[3, 4]
— yields the finite relation {(7)}
.
Note that add[x, y]
will not work if the domains of x
or y
are not finite.
Operator Distribution
If R
is a relation whose arity is not zero, and S
is a unary relation, Rel can evaluate R[S]
.
The result is the union of R[x]
over all the tuples x
in S
.
For example:
def output = neighbor[{1; 3}]
The result is {2; 4}
, since neighbor[1]
is {2}
and neighbor[3]
is {2; 4}
.
Here’s another example:
// read query
def output = {1; 2} + {3; 4; 5}
The expression {1; 2} + {3; 4; 5}
is equivalent to add[{1;2}, {3;4;5}]
, which is equivalent to add[{1;2}][{3;4;5}]
.
The relation add
is partially applied to the unary relation {1; 2}
, and the resulting relation is partially applied to the unary relation {3; 4; 5}
.
The result would have been the same if you had written:
def output = 1 + 3; 1 + 4; 1 + 5; 2 + 3; 2 + 4; 2 + 5
You can say that operators such as +
distribute over the union of unary relations.
Note: In the expression R[S]
, the relation S
must have arity 1, unless R
is a higher order definition.
See @inline
Definitions in the Rel Primer: Advanced Syntax guide.
Base and Derived Relations
The RelationalAI documentation often has small relations with sample data built into the Rel code, as in the definition of club
above.
Typically, though, “raw” data are stored in relations by ingesting them from outside sources, such as JSON or CSV files.
No definitions are necessary for these relations, called base relations.
See Working With Base Relations.
By contrast, relations defined by rules are derived relations, also known in the database world as views.
Derived relations are transient, local only to a query, unless:
- You use their contents to create new base relations.
- You install their definitions — hence the term “installed view.”
Database logic is defined by a set of rules, also known as definitions. Each derived relation can refer to other derived relations, relations defined in libraries such as the Standard Library, and base relations.
Some quick things to know about rules:
- Their order does not matter.
- A relation in Rel is the union of the relations defined by the rules that name the relations in their heads.
The rules in the following example define
myrel
to be{ 1; 2; 3 }
:
// read query
def myrel = {}
def myrel = 1; 2
def myrel = 2; 3
-
Rules for a particular relation — that is, those that name the relation in their heads — can be interspersed with rules for other relations. It is best to avoid this when possible, because it affects readability.
-
Definitions can be recursive — see Recursion.
-
Relations can be overloaded by arity and type — see Advanced Syntax.
A rule is an example of a declaration. Other kinds of declarations are integrity constraints, bound declarations, and module declarations.
A collection of declarations that are intended to be logically related is often called a Rel model.
The evaluation of a query is based on the entire collection of:
- The base relations in the database.
- The declarations in all of the models that are installed in the database.
- The declarations in the query.
Dependencies Between Rules
A very important property of rules is this:
- If you define a relation Q in terms of relation R, then subsequent changes to R will be reflected in Q.
The following example defines three relations.
R
is defined in terms of P
and Q
(see Operator Distribution for an explanation of P[Q]
).
// model
def P = (1, "a", "A"); (2, "b", "B")
def Q = 2; 3; 4
def R["letter"] = P[Q]
The contents of R
are as expected:
// read query
def output = R
If you now add new tuples to P
and Q
, relation R
will be automatically updated:
// read query
def P = (3, "c", "C")
def Q = 1
def output = R
Optional Schema Declaration
Rel does not require predefined schemas. When you create a base relation or a derived relation in Rel, the system automatically tracks the arity and type of the tuples in the relation. Unlike in traditional database systems, you do not have to specify this information beforehand as a “database schema.”
You can, however, enforce schemas if you want. You can introduce integrity constraints that restrict the arity and types of relations, for instance. You can also use bound declarations.
Relational Abstraction
Recall that a relation is a set of tuples.
One way to describe a relation is to write down a formula that is true
exactly for those tuples in the relation and false
for all others.
You can see this most clearly in definitions that introduce a variable for each column in the left-hand side of the definition, as in def myrel(x1, x2, ..., xn) = ...
, and then give a boolean expression on the right-hand side that constrains those variables:
// read query
def mydomain(x) = range(1, 7, 1, x) // `x` will range over 1; 2; 3; 4; 5; 6; 7.
def myrel(x, y) {
mydomain(x)
and mydomain(y)
and x + y = 9
}
def output = myrel
Note that the Rel Standard Library has many utility relations including the one used here: range(start, stop, step, x) is true
if x
appears in the sequence of integers from start
to stop
(inclusive), counting in increments of step
.
This style is preferred, since it often leads to more readable definitions.
However, relations can also be specified using relational abstraction, indicated by :
in Rel.
The definitions above are equivalent to:
// read query
def mydomain = x : range(1, 7, 1, x)
def myrel {
x, y : mydomain(x)
and mydomain(y)
and x + y = 9
}
def output = myrel
The items listed on the left of the colon are called bindings. In this example each binding is just the name of a variable, but it can be more. See Binding Shortcuts.
The general pattern used here is bindings :
expression.
This is called a relational abstraction, because it defines an anonymous relation that is derived from — but, in general, not identical to — the relation that is the value of the expression that follows the colon.
This use of :
is similar to the vertical bar used for describing mathematical sets (opens in a new tab).
You can read it as “such that.”
The bar separates the variables from the conditions: for example, .
There are variables on the left of the :
, a boolean condition over those variables on the right,
and the set contains all the combinations of elements for those variables — that is, tuples — that make the condition true.
But in Rel, :
can do much more.
The expression Expr in vars :
Expr does not have to be a formula, that is, its value can be a relation whose arity is greater than 0.
If Expr has arity , the tuples from the value of Expr are appended to the instantiations of vars, and the total arity will be the number of variables in vars plus .
For example:
// read query
def mydomain = x : range(1, 4, 1, x)
def myrel = x : x * 5, mydomain(x)
def output = myrel
This is similar to how ,
works. For both :
and ,
, when the right-hand side is a boolean expression, it works as a filter.
For example, x * 5
is computed only for those values of x
that satisfy mydomain(x)
.
If the right-hand side is not a boolean expression, then the value of the expression is appended to the tuple.
This offers a way to do group-by aggregations. For more details, see Group-By in the Rel Primer: Aggregations, Group-By, and Joins guide.
The arity of bindings :
expression is the number of variables in bindings plus the arity of expression.
The mechanism described above is essentially identical to that for partial relational applications. In fact, you can rewrite the last example as follows:
// read query
def mydomain = x : range(1, 4, 1, x)
def myrel[x, y] { y = x * 5, mydomain(x) }
def output = myrel
In this formulation it is clear that every tuple obtained from mydomain(x)
(for some instantiation of x
) is extended on the left with the instantiation of x
and the value of x * 5
.
Since mydomain(x)
happens to be a formula, each such tuple happens to be empty, so the result will consist of just the extensions.
Binding Shortcuts
It is often useful to put filters in the bindings, that is, on the left-hand side of :
in a relational abstraction.
For this, you can use in
and where
, which can be used directly wherever variables are introduced, to restrict the domain of the variables. For example:
// read query
def mydomain(x in range[1, 7, 1]) = true // Same as: `def mydomain = range[1, 7, 1]`
def myrel = x in mydomain, y in mydomain where x + y = 9 : x * y
def output = myrel
Note that while in
restricts the domain of a single variable, where
can restrict combinations of variables.
Moreover, where
must appear after all the variables in a sequence of bindings, whereas in
must appear after each individual variable.
Note also that you cannot use where
on the left-hand side of a definition, but you can use in
.
Note that in
— and its equivalent symbol ∈
— is not a stand-alone relation and can be used only in bindings. If you want to express x in R
in a formula, you can write R(x)
.
A Note About Style
When you define a relation in the style of def myrel = x1, ..., xn : Expr
,
the arity of myrel
depends on the arity of the expression on the right (Expr
), which might not be clear at first sight.
You only know that the arity of myrel
will be at least .
By contrast, the definition def myrel(x1, ..., xn) = ...
makes it explicit that the arity of the relation is exactly , which is why it is usually preferred.
Similar considerations apply to the choice between square brackets and parentheses on the left-hand side of a definition.
You can always replace the left-hand side def myrel(x, y)
by def myrel[x, y]
.
But when you use parentheses you are thereby declaring that myrel
is a binary relation.
Moreover, the compiler will detect an error if the right-hand side of the definition is not a formula.
Using round parentheses on the left-hand side of a definition is not always preferable. For instance, you might have a relation that contains the list prices and the discounts for various items of merchandise, like this:
// model
def list_price {
"couch", 500;
"table", 400;
"chair", 50
}
def discount {
"table", 50;
"chair", 13;
"couch", 75
}
If you want to compute the final prices of all the items in the generally preferred style, you can write:
// read query
def output(item, price) {
exists( listed, reduced : list_price(item, listed)
and discount(item, reduced)
and price = listed - reduced)
}
The example features existential quantification, which will be explained in the second part of this Primer.
Without quantification, you would have to mention the variables listed
and reduced
on the left-hand side, in order to include their instantiation in the resulting relation.
You can simplify this example by using partial relational application, as well as square brackets on the left-hand side:
// read query
def output[item] = list_price[item] - discount[item]
This form is more succinct and more idiomatic.
To store the entire relation with final prices, you can follow the pattern above, using a name such as prices
instead of output
.
But if you want, say, only to show the prices for the first two items in this relation, you don’t have to give it a name.
Just use relational abstraction and pass it directly to top
:
// read query
def output = top[2, (item : list_price[item] - discount[item])]
Notice that the relational abstraction must be enclosed in parentheses or curly brackets to ensure that 2
is treated as the first argument of top
, and not as a part of the bindings.
A Note About the Example
It is worth noting that the example above works only because each item appears exactly once in each of the source relations. If you extend the relations with an item whose name is not unique, you will get all the possible combinations of list price and discount for this item:
// read query
def list_price = "couch", 700
def discount = "couch", 150
def output[item] = list_price[item] - discount[item]
If this is not clear, you can revisit the discussion of partial relational application and of operator distribution.
Finally, in case an item appears only in list_price
, but not in discount
, its price will not be computed:
// read query
def list_price = "credenza", 990
def output[item] = list_price[item] - discount[item]
What happened here?
The expression list_price["credenza"] - discount["credenza"]
evaluates to the empty relation, because the value of discount["credenza"]
is the empty relation.
The first — more complicated — definition of the price relation would have the same effect, because discount("credenza", price)
would evaluate to false
.
There are two ways to fix this. One way is to use a conditional expression:
// read query
def list_price = "credenza", 990
def output[item] =
if discount[item] then list_price[item] - discount[item] else list_price[item] end
Notice that you can use discount[item]
as a boolean condition, because {}
is equivalent to false
.
The more idiomatic solution is to use a left override.
The expression to the right of <++
provides a default value if the expression to the left evaluates to the empty relation:
// read query
def list_price = "credenza", 990
def output[item] = list_price[item] - discount[item] <++ list_price[item]
Computing Arity and Cardinality
As noted in Relations, if you think of relations as tables, the arity of a relation is the number of columns, and the cardinality is the number of rows.
Arity
To compute the arity of a relation, you can use arity
:
// read query
def R = {(1, "a"); (1, "b"); (1, "c")}
def output = arity[R]
In fact, arity
computes the lengths of all tuples in a relation.
If a relation contains tuples of various lengths, arity
will return all tuple lengths that occur in the relations:
// read query
def R {
("SKI CHAMPIONSHIPS", 2009);
("SKI CHAMPIONSHIPS", 2009, "Lindsey", "Vonn", "Downhill")
}
def output = arity[R]
In some relatively rare cases, arity
might give an over-approximation of the actual arity.
The only guarantee is that no tuple in a relation R
has more than arity[R]
elements.
Cardinality
You can calculate the cardinality of a relation with count
:
// read query
def R = {1; 3; 5; 7; 9}
def output = count[R]
It does not matter whether a relation has tuples with various lengths: count
returns the number of tuples, regardless of length.
You only have to watch out when calculating the cardinality of the empty relation {}
.
For the empty relation, count
evaluates to false
.
To get the zero for the cardinality, you can use the Standard Library relations: either left_override
or right_override
, which can be written in infix form as <++
and ++>
, respectively.
Consider the two examples below:
// read query
def output = count[{}]
// read query
def output = (count[{}] <++ 0)
Notice that in the first example there was no output, because count[{}]
evaluates to false
.
Formatting
In Rel neither indentation nor line breaks matter. The style conventions adopted on the website are for readability purposes.
Consider the following code:
// read query
def club {
("Neymar", "PSG");
("Messi", "BFC");
("Werner", "Chelsea");
("Pulisic", "Chelsea")
}
def output = club
ic {count[club] = 4}
The declaration ic {count[club] = 4}
is an example of an integrity constraint.
You can also write the above like this:
// read query
def club {
("Neymar", "PSG");
("Messi", "BFC");
("Werner", "Chelsea"); ("Pulisic", "Chelsea") }
def output = club ic {count[club] = 4}
Both yield the same result. Indeed, the two code blocks are not just correct in Rel, they are also syntactically identical and render the same output.
Even though the second code block is syntactically identical to the first, it is highly recommended to use spaces and line breaks to make your code readable and maintainable.
Standard Library
Rel includes a Standard Library with many functions and utilities; for example, range
, maximum
,
minimum, sin, cos,
and many other mathematical operations.
The Standard Library also includes higher order definitions that take relations as arguments,
such as argmax
, argmin
, and first
.
Help
The query help[:name]
— or, in module syntax, just help:name
— will display a brief docstring for each relation.
For example, help:argmax
, help:range
, or help:min
.
These relations are currently imported into the main Rel namespace.
Therefore, unexpected outcomes may occur when users define their own relations with the same names.
Some names to avoid are: domain
, function
, total
, first
, last
, and equal
.
A quick way to check whether a name is reserved is to try help:name
.
Summary
This article has covered the basics of Rel syntax. The next in the Rel Primer series focuses on aggregations, group-by, and joins, followed by more advanced features of Rel.