Relational Abstractions
Relational abstraction expressions in Rel
Bindings
Bindings := FormalParams
Bindings := FormalParams "where" Expr
FormalParams := FormalParam
FormalParams := "(" FormalParams ")"
FormalParams := FormalParams "," FormalParams
FormalParam := FormalId
| FormalId "in" Expr
| FormalId "∈" Expr
| Constant
FormalId := Id
| Id "..."
Bindings are the mechanism by which variables are introduced in Rel. It is often useful to immediately specify the expected domain of the new variables, and the binding syntax supports that with where
and in
.
Bindings are not expressions themselves, but are used in many language constructs that are expressions. It is best to understand the bindings syntax separately from the language constructs where it is used.
Examples of bindings:
-
x
- Introduces a single variable named
x
.
- Introduces a single variable named
-
1
- A constant is also a binding.
-
x, y, z
- Introduces three variables:
x
,y
, andz
.
- Introduces three variables:
-
x in A
- Introduces a single variable
x
whose domain is restricted toA
.
- Introduces a single variable
-
x ∈ A
- Equivalent to the previous example.
-
x where A(x)
- Equivalent to the previous example.
-
x ∈ Int
- Introduces a single variable
x
, with domainInt
. Domains do not have to be finite.
- Introduces a single variable
-
z in {1; 2; 3; 4; 5}
- Domains can also use literal relations.
-
z in 1
- A domain can be a singleton (or even empty, but that is not often useful).
-
t where team(t) and soccer(t)
- A domain can be a more complex logical formula.
-
x in A, y in B, z in C
or equivalentlyx ∈ A, y ∈ B, z ∈ C
- Introduces three variables, where
x
has domainA
,y
hasB
, andz
hasC
.A
,B
, andC
must have arity 1.
- Introduces three variables, where
-
x ∈ Red, y ∈ Green where edge(x, y)
in
andwhere
can be combined.
-
x, y where edge(x, y) and Red(x) and Green(y)
- Equivalent to the previous example.
-
x, y where edge(x, y)
- Introduces two variables
x
andy
in theedge
domain.
- Introduces two variables
For the where
case, the Expr
must be arity 0 (a formula).
The formula can use all the variables in the binding.
For the in
/∈
case, the expression should have arity 1.
Relational Abstraction
Expr := Bindings ":" Expr
If :
is followed by an identifier, you should separate the two with white space.
Otherwise the colon will be interpreted as beginning a Symbol.
For example, def output {x: x = 7}
will produce 7
, but def output {x:x = 7}
will result in an error message.
Relational abstraction defines a relation without a name. It is the most important construct in the language. You will see later that all syntactic variations translate into relational abstractions.
The expression that follows the colon is often called the body of the relational abstraction. The arity of a relational abstraction is the number of variables in the bindings plus the arity of the body. If the body is a formula — that is, has arity 0 — then only the variables from the bindings remain.
The examples in this section are based on the following definitions:
def p = {1; 2; 3}
.def q = {2; 3; 4}
.def r = {(1, "a"); (2, "b"); (3, "c")}
.def abc = {"a"; "b"; "c"}
.def parent = {("John", "Mary"); ("Mary", "Felix"); ("Felix", "George")}
.
It is common for a relational abstraction to have a formula as the expression. This is similar to a Datalog rule, but without a name for the relation that is defined. Each of the following examples use a formula:
x: p(x) and q(x)
- Unary relation (one variable
x
). - Value:
{2; 3}
.
- Unary relation (one variable
x: p(x) or q(x)
- Unary relation.
- Value:
{1; 2; 3; 4}
.
x, s: q(x) and r(x, s)
- Binary relation.
- Value:
{(2, "b"); (3, "c")}
.
s: exists(x: q(x) and r(x, s))
- Unary relation. The
x
is existentially quantified in the body of the relational abstraction. - Value:
{"b"; "c"}
.
- Unary relation. The
x, y: p(x) and q(y)
- Binary relation.
- This is the Cartesian product of
p
andq
, which is{(1, 2); (1, 3); (1, 4); (2, 2); (2, 3); (2, 4), (3, 2); (3, 3); (3, 4)}
x, t, y: parent(x, t) and parent(t, y)
- Ternary relation.
- Value:
{("John", "Mary", "Felix"); ("Mary", "Felix", "George")}
.
x, y: parent(x, t) and parent(t, y)
- This is an error because variable
t
is never introduced.
- This is an error because variable
x, y: ∃(t: parent(x, t) and parent(t, y))
- Value:
{("John", "Felix"); ("Mary", "George")}
.
- Value:
x: Int(x) and x > 0
- Value is the infinite relation
{1; 2; 3; 4; ...}
.
- Value is the infinite relation
x, y: x = y
- Binary relation of all values that are equal, for example,
(1, 1)
and('a', 'a')
.
- Binary relation of all values that are equal, for example,
x: x = 1
- Singleton relation
{(1,)}
.
- Singleton relation
x: true
- Infinite unary relation (for example, contains
1
and"abc"
).
- Infinite unary relation (for example, contains
x: false
- Empty unary relation.
x...: p(x...) and q(x...)
- Varargs are explained in Rel Primer: Advanced Syntax.
In this case,
x...
will only represent a single variable, becausep
andq
have arity 1. The value is{2; 3}
.
- Varargs are explained in Rel Primer: Advanced Syntax.
In this case,
{x: p(x) and q(x)}
- If an expression contains curly braces that are not used to form a relation literal or parentheses that are not used to form a relational application, then the curly braces or parentheses are used solely to indicate precedence. Sometimes a relational abstraction is more clear when it is enclosed in curly braces. This does not change the semantics in any way.
The previous examples all use formulas for the body of the relational abstraction. Expressions with arity bigger than 0 are allowed as well:
x, y: x + y
- Ternary, infinite relation.
- Example tuples:
(1, 1, 2)
,(2, 3, 5)
, and(3.5, 1.2, 4.7)
.
x: x + 1
- Binary, infinite relation.
- Example tuples:
(1, 2)
,(1.0, 2.0)
, and(3.14, 4.14)
.
x in Int: x + 1
- Binary, infinite relation.
- Example tuples:
(1, 2)
,(2, 3)
, and(3, 4)
.
x where Int(x): x + 1
- Equivalent to the previous example.
Here are some more advanced examples:
p: today - date_of_birth[p]
- Binary relation of person and their age.
The arity is 2: 1 for
p
plus 1 for the body.
- Binary relation of person and their age.
The arity is 2: 1 for
x: sum[v: scores(x, v)]
- Sum scores
v
, grouped byx
. - Note that there are two relational abstractions here: one outermost, and one that is an argument of
sum
. - See Aggregation for more details.
- Sum scores
mean[x, y: (actual[x, y] - prediction[x, y]) ^ 2]
- Compute the
mse
for a machine learning solution. Note that the Standard Library definesmse
, which can be used asmse[actual, prediction]
.
- Compute the
x: edge(x, x)
- Unary relation of all nodes that have a self edge.
∃(c ∈ course: ∀(s ∈ student: enrolled(c, s)))
- The arguments to
∃
and∀
are relational abstractions.
- The arguments to
sum[date, store_nbr, item_nbr: train_unit_sales[date, store_nbr, item_nbr]]
- Sum up the sales for every unique
(date, store_nbr, item_nbr)
triple in thetrain_unit_sales
relation.
- Sum up the sales for every unique
Bindings can also be constants, as an alternative to introducing a separate variable. Examples of constants in a binding of a relational abstraction:
-
x, 1: abc(x)
- Value:
{("a", 1); ("b", 1); ("c", 1)}
.
- Value:
-
def count_abc = sum[x, 1: abc(x)]
- Sums the last elements of all tuples.
Since the last element of each tuple is
1
, this defines the count of the number of tuples inabc
, which is3
. - The Standard Library defines
count
in this way, and that relation can be used ascount[abc]
in this example.
- Sums the last elements of all tuples.
Since the last element of each tuple is
-
1: p(_)
- Value:
1
(given thatp
is not empty). - Note that the left-hand side of the relational abstraction does not have to contain an explicit variable.
- The underscore expression is syntactic sugar for an anonymous variable. See Underscore for more details.
- Value:
Note that relational abstraction can be nested, and that :
associates to the right.
Some simple examples:
// read query
@inline
def is_true[R] = if R then "yes" else "no" end
def output:A = is_true[equal((1 : 2 : 3), (1 : (2 : 3)))]
def output:B = is_true[equal((1 : 2 : 3), (1, 2, 3))]
// read query
def P = 2; 3; 20; 30
def output = x in P where 9 < x : y in P where y < 10 : x * y
Finally, it is important to note that a relational abstraction can be used directly in a relational application or a partial relational application:
// read query
{x in Int, y in Int : x * x + y}[2, 3]
The bindings play the role of the formal parameter specifications in a definition.
You can think of the variables introduced in the bindings — x
and y
in this example — as the parameters of the relational abstraction.
If you are familiar with programming languages, you will notice that in some ways relational abstractions are similar to lambda expressions.
An Alternative Form of Relational Abstraction
Relational abstractions are often used as parameters to higher-order abstractions, such as aggregations.
Typically the most important thing to see first in an aggregation is the value that is being aggregated.
To make this easier to see immediately, the order of the bindings and the body of a relational abstraction can be inverted.
The :
becomes a |
(or the keyword for
) in this case. This syntax is designed to mimic the common mathematical notation:
Expr := Expr "|" Bindings
Expr := Expr "for" Bindings
This is purely a syntactic convenience: e | b
is equivalent to e for b
which is equivalent to b : e
.
Examples:
-
x+1 | x in {1; 2; 3}
- Value
{(1, 2); (2, 3); (3, 4)}
. - Equivalent to
x in {1; 2; 3}: x+1
.
- Value
-
x+1 for x in {1; 2; 3}
- Equivalent to the previous example.
-
x^2 for x in {1; 2; 3}
- Value
{(1, 1); (2, 4); (3, 9)}
.
- Value
-
{x^2 | x ∈ {1; 2; 3}}
- Equivalent to the previous example.
- Just as in normal relational abstraction, optional curly braces can be used for clarity.
-
x^2, x^3 for x in {1; 2; 3}
- The initial expression is now a tuple.
The value of the entire expression is a ternary relation
{(1, 1, 1); (2, 4, 8); (3, 9, 27)}
. The comma operator is explained in more detail in Comma (Cartesian Product and Conjunction).
- The initial expression is now a tuple.
The value of the entire expression is a ternary relation
-
(x^2, x^3) for x in {1; 2; 3}
- Equivalent to the previous example. Parentheses are used for precedence and have no other semantic here.
To users familiar with list comprehension in programming languages such as Python, Julia, or Haskell it may seem strange that the variable introduced in the comprehension (for example, x
in x^2 for x in {1; 2; 3}
) is a part of the result.
In these programming languages, the value of the comprehension is a list of the values of the expression before the for
.
The distinction with Rel is that in Rel every value is a relation, and a relation is a set of tuples. This means that duplicates are eliminated. For some computations this is the intent, but often it is not. Some examples to illustrate this:
-
age[p] for p in employee
- This is a binary relation of
Person
andInt
, not a unary relation of ages. The latter would contain every age value only once.
- This is a binary relation of
-
max[age[p] for p in employee]
- This is the previous example used as an argument to the
max
aggregation. The result is a singleton containing the age of the oldest employee. Note that for this particular aggregation it does not matter whether age values are kept for all employees.
- This is the previous example used as an argument to the
-
sum[age[p] for p in employee]
- The same example, but now in a
sum
aggregation. Here it is very important that even if two different personsp1
andp2
have the same age, the age of both persons contributes to the sum. This is ensured by havingp
be a part of the relation that is being aggregated.
- The same example, but now in a
Note that max
and sum
compute, respectively, the maximum and the sum of the set of last elements of the tuples in the relation.
The relation can have any arity, or even contain tuples of different arities.
The next section introduces “from” expressions, which do not add the introduced variables to the result, and are thus more similar to comprehensions in programming languages.
“From” Expressions
“From” expressions are similar to relational abstractions, but the variables introduced by the binding are not a part of the resulting value. The arity of the expression is entirely determined by the expression.
Expr := Expr "from" Bindings
The key difference between for
(introduced in the previous section) and from
(introduced
here) with some examples:
Example | Description |
---|---|
x+1 for x in {10; 20; 30} | Binary relation {(10, 11); (20, 21); (30, 31)} . |
x+1 from x in {10; 20; 30} | Unary relation {11; 21; 31} . |
x, x+1 from x in {10; 20; 30} | Binary relation {(10, 11); (20, 21); (30, 31)} . |
x, x+1 for x in {10; 20; 30} | Ternary relation {(10, 10, 11); (20, 20, 21); (30, 30, 31)} . |
The third example above shows that from
can be used in a similar way as the for
construct by explicitly including the variables that are introduced in the expression that precedes the from
.
Expression | Value |
---|---|
x+1 from x in {1; 2; 3} | {2; 3; 4} |
x^2 from x in {1; 2; 3} | {1; 4; 9} |
{x^2 from x in {1; 2; 3}} | {1; 4; 9} |
x^2, x^3 from x in {1; 2; 3} | {(1, 1); (4, 8); (9, 27)} |
(x^2, x^3) from x in {1; 2; 3} | {(1, 1); (4, 8); (9, 27)} |
The from
construct is particularly useful if auxiliary variables are needed in some bigger definition.
Typically this comes up in the domain of a binding. For example, suppose that you want to define a binary relation that associates a person x
with the name of the father of x
.
The problem is that you must find the father of x
, but the father is not a part of the resulting relation.
-
def fathers_name = x where exists(f: father(x, f)): name[f]
(wrong)- This definition is incorrect. The variable
f
is introduced with an existential quantifier, but it cannot be used outside the scope of the quantifier, so you will get a compilation error with the information thatf
is undefined inname[f]
.
- This definition is incorrect. The variable
-
def fathers_name = name[f] | x: exists(f: father(x, f))
(wrong)- Using the reversed syntax for relational abstraction will not solve the problem.
There will be an error, because the occurrence of
f
inname[f]
is not in the scope of the existential quantifier.
- Using the reversed syntax for relational abstraction will not solve the problem.
There will be an error, because the occurrence of
-
def fathers_name = x: {name[f] from f where father(x, f)}
(correct)- This correct example introduces the variable
f
in a separatefrom
. The curly braces are optional, but may help to understand the precedence and scope off
.
- This correct example introduces the variable
-
def fathers_name = x, name[f] from x, f where father(x, f)
(correct)- If what is actually computed gets confusing, it can also help to just
explicitly list the entire desired result, in this case
x, name[f]
. The variablex
must now be introduced in thefrom
binding (otherwise it would be undefined).
- If what is actually computed gets confusing, it can also help to just
explicitly list the entire desired result, in this case
-
def fathers_name[x] = name[father[x]]
(correct)- In this example you actually did not need the variable
f
at all, so it could also have been written this way. The logic is not always this straightforward, though.
- In this example you actually did not need the variable
The construct from x ...
is technically a form of existential quantification.
The following two definitions are equivalent:
def fathers_name = x, f from x, f where father(x, f)
def fathers_name = y, nm : exists(x, f: father(x, f) and y = x and nm = f)
From-Where
To make Rel more accessible to users who are familiar with LINQ (Language-Integrated Query) or SQL, Rel supports a similar syntax for queries:
Expr := Expr "from" Bindings
Although where
does not occur in this grammar rule, it is a part of Bindings
and leads to a syntax very similar to LINQ.
s.name from s in students
where s.age > 12 and s.age < 20
p.partkey from p in part
where like[p.name, filename]
This example uses composite relations (for example, s.age
and p.name
).
For more details on the semantics of relational composition, also called navigation, see Composition.
Relational Abstraction Alternatives
Since there are a few variants of relational abstractions, it is important to know which one to use. When writing an abstraction, first consider what would be more readable: introducing the variables first, or beginning with the value that is computed. This splits the options into two categories with only a few alternatives.
In the following examples p
is defined by def p = {1; 2; 3}
.
Abstractions that have Bindings
first and Expr
second:
x in p: x+1 = {(1, 2); (2, 3); (3, 4)}
x in p where x < 3: x+1 = {(1, 2); (2, 3)}
Abstractions that have Expr
first and Bindings
second:
x+1 | x in p = {(1, 2); (2, 3); (3, 4)}
x+1 for x in p = {(1, 2); (2, 3); (3, 4)}
x+1 from x in p = {2; 3; 4}
x+1 from x in p where x < 3 = {2; 3}
The first and second are identical. The third variant is more SQL-like and lists the computed value separately.
Next: Partial Relational Application