Skip to content
Aggregation, Group-by, and Joins

Rel Primer: Aggregations, Group-By, and Joins

This Primer covers quantification, , group-by aggregations, and in .

Introduction

This Rel Primer section focuses on common database operations in Rel: quantification, aggregations, group-by aggregations, and joins.

See Basic Syntax for an introduction to the basic of Rel, a prerequisite for this Rel Primer section.

Quantification

Existential Quantification

Logical statements often require a to exist with a specific value such that the statement holds. You can interpret this statement in English as “there exists” or “for some.” Expressing such a requirement is called .

In Rel, existential quantification is expressed with exists. The following example expresses the logical statement:

There exist some values for x and y such that the tuple (x, y) can be found in the myrel and y is larger than x:

// read query
 
def myrel = { (1, 8); (2, 4); (3, 2); (4, 1) }
 
def output {
    exists(x, y :
        myrel(x, y) and y > x
    )
}

Other ways to indicate existential quantification are:

  • The keyword from,
  • The symbol _ () or the relation Any.

The example above using from reads:

// read query
 
def myrel = { (1, 8); (2, 4); (3, 2); (4, 1) }
 
def output {
    myrel(x, y) and y > x
    from x, y
}

The symbol _ can be used as an existentially quantified new variable that will match anything.

This is useful if you are not interested in all columns in a relation.

// read query
 
def myrel = { (1, 8); (2, 4); (3, 2); (4, 1) }
 
def output(x) {
    myrel(_, x)
}
🔎

Separate occurrences of _ in an expression are independent and don’t affect each other.

The example above requests all values x that occur in the second column in myrel. The symbol _ can be replaced with Any, because myrel(Any, x) is equivalent to myrel(y, x) and Any(y). Using Any is the recommended way to express information such that the data type can be anything.

Universal Quantification

Rel also supports , provided the quantified has a finite domain:

// read query
 
def output("yes") {
    forall(x in {4; 5; 6} : x < 10)
}

In general, to restrict the domain being quantified over, use in to restrict single variables, and where to restrict combinations of variables.

// read query
 
def output {
    forall(
        x in range[1, 10, 1], y in range[1, 80, 1] where y = x^2 :
        9*x - y > 0
    )
}

In the example, the forall statement tests that, for all numbers x up to 10 and y up to 80 where y is the square of x, the expression 9*x-y is positive. This is the case, so the response is true (()).

If the domain of the variable(s) is empty (false), the result is always true, regardless of the quantified formula:

// read query
 
def mydomain = {1; 2; 3; 4}
def output {
    forall(x where mydomain(x) and x < 0 : x > 10)
}

This must be so, because forall(x in D : F) is always equivalent to not exists(x in D: not F). If the domain D is empty, then exists(x in D: F) is false, regardless of F.

Aggregations

To demonstrate , you will import some basic information about soccer players from Spring 2021, store these data in a (player_csv), and install a few , making them available for future use.

First, you can import the data and save them in a base relation:

// write query
 
def config:data = """
    name,salary,age,plays_for,nationality
    Messi,70,32,BFC,Argentina
    Griezmann,45,28,BFC,France
    Busquets,15,31,BFC,Spain
    Pique,12,32,BFC,Spain
    Dembele,12,22,BFC,France
    Umtiti,12,25,BFC,France
    Cortois,7,28,RM,Belgium
    Carvajal,7,28,RM,Spain
    Ramos,15,34,RM,Spain
    Varane,7,27,RM,France
    Marcelo,7,32,RM,Brazil
    Kroos,10,30,RM,Germany
    Modric,10,35,RM,Croatia
    """
 
def config:schema = {
    (:name, "string");
    (:salary, "int");
    (:age, "int");
    (:plays_for, "string");
    (:nationality, "string")
}
 
def csv = load_csv[config]
def delete[:player_csv] = player_csv
def insert[:player_csv] = csv
def output = player_csv

In practice, config:data would be a file uploaded to a RAI worksheet as a , or config:path would be specified to point to a CSV file location. See CSV Import for details.

Now you can install some derived relations:

// model
 
def player(prop, name, val) =
    player_csv(prop, row, val)
    and player_csv(:name, row, name) from row
def name(x) = player(_, x, _)
def salary = player:salary
def age = player:age
def plays_for = player:plays_for
def nationality = player:nationality
def team(t) = player:plays_for(_, t) // Set of teams

Once installed, these definitions are available for — see Working With Models. For an explanation of _ and from see Existential Quantification. Expressions like player:salary use the Rel module syntax.

🔎

player_csv is a , created by the write query that loaded the CSV data and inserted them into player_csv, above. The player, name, salary, etc. are , sometimes known as views, based on the base relation data. The model above makes them available to subsequent queries.

Since player is now installed, you can query it:

// read query
 
def output = player

The Last Argument

When writing a table, it is natural to put the first and the last: players and their age (as above); players and their salary; or graph and their weight.

Rel provides a number of common operations that operate on the last of a , For example, the Standard Library includes utilities for the max, min, sum, and average of a relation, taking the last argument of the relation as the value to be aggregated. The basic salary stats for the installed player relation can be computed as follows:

// read query
 
def output = { (:sum, sum[salary]);
               (:average, average[salary]);
               (:count, count[salary]);
               (:argmax, argmax[salary]) }
 

Note that these two numbers are different:

// read query
 
def output = count[salary], count[x : salary(_, x)]

count[salary] is the number of rows in the salary relation. The relation x : salary(_, x) contains the unique values found in the second argument. As different players have the same salary, the second number is smaller. This is important to keep in mind when computing averages and sums, as you will see below in the section on group-by aggregations.

🔎

When computing , make sure you include , to avoid conflating equal values.

Aggregating Over Empty Relations

In most cases, aggregating over an empty gives an empty set, rather than, say, 0. For example, count[x in name : salary[x] < 0] is {}, rather than {0}. This is a design choice that simplifies the semantics of the language, and often results in more sparse intermediate data, where the default (0) does not have to be represented.

If you want to include the default, you can use the override operator, left_override, also known as <++, from the Rel Standard Library.

For example:

// read query
 
def output = c in {"RM"; "BFC"; "Chelsea"} :
             sum[p where plays_for(p, c) : salary[p] ] <++ 0

Without <++ 0, the row for “Chelsea” would not be included in the results. For more on <++, see Advanced Syntax.

Bindings

Many expressions result in a that has value keys first, and one or more metrics that follow. When writing these expressions in a Rel or , you can choose to have the go first and the values — or metrics — go last, or vice versa, depending on what feels more natural. This will not change the result, as shown below.

Consider an example from Basic Syntax, which uses :, so the bindings go first and the values go last:

// read query
 
def mydomain = range[1, 5, 1]
def output = x in mydomain, y in mydomain where x + y = 5 : x-y, x+y, x*y

You can move the bindings, which include the where and in constraints, to the other side of the : and still have an equivalent expression. The following section shows how to do this.

Using for or |

The construct for can be used instead of : to put the values first and the last, which sometimes makes things easier to read — for example, when the where condition is itself a complex expression. Note that the result is exactly the same.

// read query
 
def mydomain = range[1, 5, 1]
def output = x-y, x+y, x*y for x in mydomain, y in mydomain where x+y = 5

Even though they read differently, these two are equivalent. In both cases, the values of x and y will appear first in the resulting .

Going back to , suppose you want to compute the average salary of players under 30.

By choosing to use : or for, Rel lets you put the condition first and the metric — the value being aggregated, in this case, salary — last, or the metric first and the condition last.

Condition first, metric second, using : :

// read query
 
def output = average[x in name where age[x] < 30 : salary[x] ]

Metric first, condition last, using for (or its alias |) :

// read query
 
def output = average[salary[x] for x in name where age[x] < 30]

The results are the same.

In general, for an expression Expr and b, Expr for b is equivalent to b : Expr. For a more mathematical notation, Expr for b can also be written as Expr | b. For example:

// read query
 
def output = 100 * (x + y) | x in {1; 2}, y in {1; 3} where x + y = 3

You can read | as “such that”, remembering that the bound are included at the beginning of the resulting tuples, which is what you want for correct aggregation results.

🔎

In , use b : e ( at the left) or e for b (or e | b, bindings at the right) to make sure you are not conflating that have the same metric value. Use from only if you want to quantify away keys, and possibly remove duplicate values as a result.

🔎

Summary: e | b is equivalent to e for b, which is equivalent to b : e . In all three cases, the result keeps the from b, making it safe for aggregation.

Group-By

The : operator lets you do group-by aggregations easily. For example, to see the average age for each team:

// read query
 
def output = x in team : average[p where plays_for(p, x): age[p] ]

Note: The in team clause is not really needed, since plays_for constrains the values of x in the same way.

If you prefer, you can write this as:

def output[x in team] = average[p where plays_for(p, x): age[p] ]

As this is more readable, this Primer adopts this style below.

To see the average salary and count, grouped by age:

// read query
 
def output[a] = average[p where age(p, a) : salary[p]], count[p : age(p, a)]

Average salary and count, grouped by nationality:

// read query
 
def output[n] = average[p where nationality(p, n) : salary[p]],
                count[p : nationality(p, n)]

for and from

When aggregating, you will usually need for, or its equivalent |. For example:

// read query
 
def output:right = sum[salary[x] for x in name
                       where plays_for(x, "RM") and age[x] < 30]
def output:wrong = sum[salary[x] from x in name
                       where plays_for(x, "RM") and age[x] < 30]

notation is used here to make right and wrong subrelations of output. See Modules.

There are three players satisfying the condition, all with a salary of 7. The first takes the sum of this :

// read query
 
def output = salary[x] for x in name where plays_for(x, "RM") and age[x] < 30

The second aggregation, which away the player, just takes the sum of the relation {7}.

🔎

Use for to keep the and from to existentially quantify them away.

Joining Relations

In database parlance, a combines columns from different tables to build a new one, based on common in the rows of each table. In , this corresponds to defining new , based on common elements between the in the joined relations.

It is simple to join relations in Rel using and or ,.

For example, to get a list of players with their team and nationality, you can join the plays_for and nationality relations:

// read query
 
def output(player, team, country) = plays_for(player, team) and
                                    nationality(player, country)

You can use from to away variables you do not want in the result — see Existential Quantification. For example, if you just want to see the nationalities playing for each team, you can write:

// read query
 
def output(team, country) = plays_for(player, team) and
                            nationality(player, country) from player

Summary

This article has covered common database operations as expressed in : quantification, , group-by aggregations, and . For more in this Rel Primer series, see Advanced Syntax.

Was this doc helpful?