Relational Abstractions

# Relational Abstractions

Relational abstraction expressions in Rel

## Bindings

``````Bindings := FormalParams
Bindings := FormalParams "where" Expr

FormalParams := FormalParam
FormalParams := "(" FormalParams ")"
FormalParams := FormalParams "," FormalParams

FormalParam := FormalId
| FormalId "in" Expr
| FormalId "∈" Expr
| Constant

FormalId := Id
| Id "..."``````

Bindings are the mechanism by which variables are introduced in Rel. It is often useful to immediately specify the expected domain of the new variables, and the binding syntax supports that with `where` and `in`.

Bindings are not expressions themselves, but are used in many language constructs that are expressions. It is best to understand the bindings syntax separately from the language constructs where it is used.

Examples of bindings:

• `x`

• Introduces a single variable named `x`.
• `1`

• A constant is also a binding.
• `x, y, z`

• Introduces three variables: `x`, `y`, and `z`.
• `x in A`

• Introduces a single variable `x` whose domain is restricted to `A`.
• `x ∈ A`

• Equivalent to the previous example.
• `x where A(x)`

• Equivalent to the previous example.
• `x ∈ Int`

• Introduces a single variable `x`, with domain `Int`. Domains do not have to be finite.
• `z in {1; 2; 3; 4; 5}`

• Domains can also use literal relations.
• `z in 1`

• A domain can be a singleton (or even empty, but that is not often useful).
• `t where team(t) and soccer(t)`

• A domain can be a more complex logical formula.
• `x in A, y in B, z in C` or equivalently `x ∈ A, y ∈ B, z ∈ C`

• Introduces three variables, where `x` has domain `A`, `y` has `B`, and `z` has `C`. `A`, `B`, and `C` must have arity 1.
• `x ∈ Red, y ∈ Green where edge(x, y)`

• `in` and `where` can be combined.
• `x, y where edge(x, y) and Red(x) and Green(y)`

• Equivalent to the previous example.
• `x, y where edge(x, y)`

• Introduces two variables `x` and `y` in the `edge` domain.

For the `where` case, the `Expr` must be arity 0 (a formula). The formula can use all the variables in the binding.

For the `in`/`∈` case, the expression must have arity 1 if the left-hand side is just an identifier, that is, the name of a variable rather than a vararg. If it is an identifier followed by three dots, that is, a vararg, then the arity of the expression can be greater. For example, in `x... in {(1, 2, 3)} : true` the expression has arity 3. If a vararg is used, the arity of the expression determines how many variables the vararg represents.

This is illustrated by the example below. The body of the relational abstraction is equivalent to `true`:

``````// read query

def P = 1; 2, 3; 4, 5, 6
def output = x... in P : ()``````

You can also consider an expression such as `x in P` — or `x... in P` — as a kind of relational application. For example, the following three definitions are equivalent:

``````def Q = x... in P : true
def Q(x... in P) = true
def Q(x...) = P(x...)``````

## Relational Abstraction

``Expr := Bindings ":" Expr``

If `:` is followed by an identifier, you should separate the two with white space. Otherwise the colon will be interpreted as beginning a Symbol.

For example, `def output {x: x = 7}` will produce `7`, but `def output {x:x = 7}` will result in an error message.

Relational abstraction defines a relation without a name. It is the most important construct in the language. You will see later that all syntactic variations translate into relational abstractions.

The expression that follows the colon is often called the body of the relational abstraction. The arity of a relational abstraction is the number of variables in the bindings plus the arity of the body. If the body is a formula — that is, has arity 0 — then only the variables from the bindings remain.

The examples in this section are based on the following definitions:

• `def p = {1; 2; 3}`.
• `def q = {2; 3; 4}`.
• `def r = {(1, "a"); (2, "b"); (3, "c")}`.
• `def abc = {"a"; "b"; "c"}`.
• `def parent = {("John", "Mary"); ("Mary", "Felix"); ("Felix", "George")}`.

It is common for a relational abstraction to have a formula as the expression. This is similar to a Datalog rule, but without a name for the relation that is defined. Each of the following examples use a formula:

• `x: p(x) and q(x)`
• Unary relation (one variable `x`).
• Value: `{2; 3}`.
• `x: p(x) or q(x)`
• Unary relation.
• Value: `{1; 2; 3; 4}`.
• `x, s: q(x) and r(x, s)`
• Binary relation.
• Value: `{(2, "b"); (3, "c")}`.
• `s: exists(x: q(x) and r(x, s))`
• Unary relation. The `x` is existentially quantified in the body of the relational abstraction.
• Value: `{"b"; "c"}`.
• `x, y: p(x) and q(y)`
• Binary relation.
• This is the Cartesian product of `p` and `q`, which is `{(1, 2); (1, 3); (1, 4); (2, 2); (2, 3); (2, 4), (3, 2); (3, 3); (3, 4)}`
• `x, t, y: parent(x, t) and parent(t, y)`
• Ternary relation.
• Value: `{("John", "Mary", "Felix"); ("Mary", "Felix", "George")}`.
• `x, y: parent(x, t) and parent(t, y)`
• This is an error because variable `t` is never introduced.
• `x, y: ∃(t: parent(x, t) and parent(t, y))`
• Value: `{("John", "Felix"); ("Mary", "George")}`.
• `x: Int(x) and x > 0`
• Value is the infinite relation `{1; 2; 3; 4; ...}`.
• `x, y: x = y`
• Binary relation of all values that are equal, for example, `(1, 1)` and `('a', 'a')`.
• `x: x = 1`
• Singleton relation `{(1,)}`.
• `x: true`
• Infinite unary relation (for example, contains `1` and `"abc"`).
• `x: false`
• Empty unary relation.
• `x...: p(x...) and q(x...)`
• Varargs are explained in Rel Primer: Advanced Syntax. In this case, `x...` will only represent a single variable, because `p` and `q` have arity 1. The value is `{2; 3}`.
• `{x: p(x) and q(x)}`
• If an expression contains curly braces that are not used to form a relation literal or parentheses that are not used to form a relational application, then the curly braces or parentheses are used solely to indicate precedence. Sometimes a relational abstraction is more clear when it is enclosed in curly braces. This does not change the semantics in any way.

The previous examples all use formulas for the body of the relational abstraction. Expressions with arity bigger than 0 are allowed as well:

• `x, y: x + y`
• Ternary, infinite relation.
• Example tuples: `(1, 1, 2)`, `(2, 3, 5)`, and `(3.5, 1.2, 4.7)`.
• `x: x + 1`
• Binary, infinite relation.
• Example tuples: `(1, 2)`, `(1.0, 2.0)`, and `(3.14, 4.14)`.
• `x in Int: x + 1`
• Binary, infinite relation.
• Example tuples: `(1, 2)`, `(2, 3)`, and `(3, 4)`.
• `x where Int(x): x + 1`
• Equivalent to the previous example.

Here are some more advanced examples:

• `p: today - date_of_birth[p]`
• Binary relation of person and their age. The arity is 2: 1 for `p` plus 1 for the body.
• `x: sum[v: scores(x, v)]`
• Sum scores `v`, grouped by `x`.
• Note that there are two relational abstractions here: one outermost, and one that is an argument of `sum`.
• See Aggregation for more details.
• `mean[x, y: (actual[x, y] - prediction[x, y]) ^ 2]`
• Compute the `mse` for a machine learning solution. Note that the Standard Library defines `mse`, which can be used as `mse[actual, prediction]`.
• `x: edge(x, x)`
• Unary relation of all nodes that have a self edge.
• `∃(c ∈ course: ∀(s ∈ student: enrolled(c, s)))`
• The arguments to `∃` and `∀` are relational abstractions.
• `sum[date, store_nbr, item_nbr: train_unit_sales[date, store_nbr, item_nbr]]`
• Sum up the sales for every unique `(date, store_nbr, item_nbr)` triple in the `train_unit_sales` relation.

Bindings can also be constants, as an alternative to introducing a separate variable. Examples of constants in a binding of a relational abstraction:

• `x, 1: abc(x)`

• Value: `{("a", 1); ("b", 1); ("c", 1)}`.
• `def count_abc = sum[x, 1: abc(x)]`

• Sums the last elements of all tuples. Since the last element of each tuple is `1`, this defines the count of the number of tuples in `abc`, which is `3`.
• The Standard Library defines `count` in this way, and that relation can be used as `count[abc]` in this example.
• `1: p(_)`

• Value: `1` (given that `p` is not empty).
• Note that the left-hand side of the relational abstraction does not have to contain an explicit variable.
• The underscore expression is syntactic sugar for an anonymous variable. See Underscore for more details.

Note that relational abstraction can be nested, and that `:` associates to the right. Some simple examples:

``````// read query

@inline
def is_true[R] = if R then "yes" else "no" end
def output:A = is_true[equal((1 : 2 : 3), (1 : (2 : 3)))]
def output:B = is_true[equal((1 : 2 : 3), (1, 2, 3))]``````
``````// read query

def P = 2; 3; 20; 30
def output = x in P where 9 < x : y in P where y < 10 : x * y``````

Finally, it is important to note that a relational abstraction can be used directly in a relational application or a partial relational application:

``````// read query

{x in Int, y in Int : x * x + y}[2, 3]``````

The bindings play the role of the formal parameter specifications in a definition. You can think of the variables introduced in the bindings — `x` and `y` in this example — as the parameters of the relational abstraction.

If you are familiar with programming languages, you will notice that in some ways relational abstractions are similar to lambda expressions.

## An Alternative Form of Relational Abstraction

Relational abstractions are often used as parameters to higher-order abstractions, such as aggregations. Typically the most important thing to see first in an aggregation is the value that is being aggregated. To make this easier to see immediately, the order of the bindings and the body of a relational abstraction can be inverted. The `:` becomes a `|` (or the keyword `for`) in this case. This syntax is designed to mimic the common mathematical notation:

``````Expr := Expr "|" Bindings
Expr := Expr "for" Bindings``````

This is purely a syntactic convenience: `e | b` is equivalent to `e for b` which is equivalent to `b : e`.

Examples:

• `x+1 | x in {1; 2; 3}`

• Value `{(1, 2); (2, 3); (3, 4)}`.
• Equivalent to `x in {1; 2; 3}: x+1`.
• `x+1 for x in {1; 2; 3}`

• Equivalent to the previous example.
• `x^2 for x in {1; 2; 3}`

• Value `{(1, 1); (2, 4); (3, 9)}`.
• `{x^2 | x ∈ {1; 2; 3}}`

• Equivalent to the previous example.
• Just as in normal relational abstraction, optional curly braces can be used for clarity.
• `x^2, x^3 for x in {1; 2; 3}`

• The initial expression is now a tuple. The value of the entire expression is a ternary relation `{(1, 1, 1); (2, 4, 8); (3, 9, 27)}`. The comma operator is explained in more detail in Comma (Cartesian Product and Conjunction).
• `(x^2, x^3) for x in {1; 2; 3}`

• Equivalent to the previous example. Parentheses are used for precedence and have no other semantic here.

To users familiar with list comprehension in programming languages such as Python, Julia, or Haskell it may seem strange that the variable introduced in the comprehension (for example, `x` in `x^2 for x in {1; 2; 3}`) is a part of the result. In these programming languages, the value of the comprehension is a list of the values of the expression before the `for`.

The distinction with Rel is that in Rel every value is a relation, and a relation is a set of tuples. This means that duplicates are eliminated. For some computations this is the intent, but often it is not. Some examples to illustrate this:

• `age[p] for p in employee`

• This is a binary relation of `Person` and `Int`, not a unary relation of ages. The latter would contain every age value only once.
• `max[age[p] for p in employee]`

• This is the previous example used as an argument to the `max` aggregation. The result is a singleton containing the age of the oldest employee. Note that for this particular aggregation it does not matter whether age values are kept for all employees.
• `sum[age[p] for p in employee]`

• The same example, but now in a `sum` aggregation. Here it is very important that even if two different persons `p1` and `p2` have the same age, the age of both persons contributes to the sum. This is ensured by having `p` be a part of the relation that is being aggregated.

Note that `max` and `sum` compute, respectively, the maximum and the sum of the set of last elements of the tuples in the relation. The relation can have any arity, or even contain tuples of different arities.

The next section introduces “from” expressions, which do not add the introduced variables to the result, and are thus more similar to comprehensions in programming languages.

## “From” Expressions

“From” expressions are similar to relational abstractions, but the variables introduced by the binding are not a part of the resulting value. The arity of the expression is entirely determined by the expression.

``Expr := Expr "from" Bindings``

The key difference between `for` (introduced in the previous section) and `from` (introduced here) with some examples:

ExampleDescription
`x+1 for x in {10; 20; 30}`Binary relation `{(10, 11); (20, 21); (30, 31)}`.
`x+1 from x in {10; 20; 30}`Unary relation `{11; 21; 31}`.
`x, x+1 from x in {10; 20; 30}`Binary relation `{(10, 11); (20, 21); (30, 31)}`.
`x, x+1 for x in {10; 20; 30}`Ternary relation `{(10, 10, 11); (20, 20, 21); (30, 30, 31)}`.

The third example above shows that `from` can be used in a similar way as the `for` construct by explicitly including the variables that are introduced in the expression that precedes the `from`.

ExpressionValue
`x+1 from x in {1; 2; 3}``{2; 3; 4}`
`x^2 from x in {1; 2; 3}``{1; 4; 9}`
`{x^2 from x in {1; 2; 3}}``{1; 4; 9}`
`x^2, x^3 from x in {1; 2; 3}``{(1, 1); (4, 8); (9, 27)}`
`(x^2, x^3) from x in {1; 2; 3}``{(1, 1); (4, 8); (9, 27)}`

The `from` construct is particularly useful if auxiliary variables are needed in some bigger definition. Typically this comes up in the domain of a binding. For example, suppose that you want to define a binary relation that associates a person `x` with the name of the father of `x`. The problem is that you must find the father of `x`, but the father is not a part of the resulting relation.

• `def fathers_name = x where exists(f: father(x, f)): name[f]` (wrong)

• This definition is incorrect. The variable `f` is introduced with an existential quantifier, but it cannot be used outside the scope of the quantifier, so you will get a compilation error with the information that `f` is undefined in `name[f]`.
• `def fathers_name = name[f] | x: exists(f: father(x, f))` (wrong)

• Using the reversed syntax for relational abstraction will not solve the problem. There will be an error, because the occurrence of `f` in `name[f]` is not in the scope of the existential quantifier.
• `def fathers_name = x: {name[f] from f where father(x, f)}` (correct)

• This correct example introduces the variable `f` in a separate `from`. The curly braces are optional, but may help to understand the precedence and scope of `f`.
• `def fathers_name = x, name[f] from x, f where father(x, f)` (correct)

• If what is actually computed gets confusing, it can also help to just explicitly list the entire desired result, in this case `x, name[f]`. The variable `x` must now be introduced in the `from` binding (otherwise it would be undefined).
• `def fathers_name[x] = name[father[x]]` (correct)

• In this example you actually did not need the variable `f` at all, so it could also have been written this way. The logic is not always this straightforward, though.

The construct `from x ...` is technically a form of existential quantification. The following two definitions are equivalent:

``````def fathers_name = x, f from x, f where father(x, f)
def fathers_name = y, nm : exists(x, f: father(x, f) and y = x and nm = f)``````

## From-Where

To make Rel more accessible to users who are familiar with LINQ (Language-Integrated Query) or SQL, Rel supports a similar syntax for queries:

``Expr := Expr "from" Bindings``

Although `where` does not occur in this grammar rule, it is a part of `Bindings` and leads to a syntax very similar to LINQ.

``````s.name from s in students
where s.age > 12 and s.age < 20``````
``````p.partkey from p in part
where like[p.name, filename]``````

This example uses composite relations (for example, `s.age` and `p.name`). For more details on the semantics of relational composition, also called navigation, see Composition.

## Relational Abstraction Alternatives

Since there are a few variants of relational abstractions, it is important to know which one to use. When writing an abstraction, first consider what would be more readable: introducing the variables first, or beginning with the value that is computed. This splits the options into two categories with only a few alternatives.

In the following examples `p` is defined by `def p = {1; 2; 3}`.

Abstractions that have `Bindings` first and `Expr` second:

``````     x in p:              x+1       = {(1, 2); (2, 3); (3, 4)}
x in p where x < 3:  x+1       = {(1, 2); (2, 3)}``````

Abstractions that have `Expr` first and `Bindings` second:

``````       x+1 |    x in p       = {(1, 2); (2, 3); (3, 4)}
x+1 for  x in p       = {(1, 2); (2, 3); (3, 4)}
x+1 from x in p       = {2; 3; 4}
x+1 from x in p where x < 3  = {2; 3}``````

The first and second are identical. The third variant is more SQL-like and lists the computed value separately.