Updating Data: Working with EDB Relations

This concept guide describes how to save and update data in a RAI relational database using the Rel language.

Download this guide as a RAI notebook by clicking here.

Goal

Most databases are not just static collections of data, but evolve over time. New data is added, and old data is removed or changed. This concept guide describes how to save and update the data in a RAI database using the Rel language.

Prerequisites

This concept guide assumes that you are familiar with the Rel language. The how-to guides on importing CSV and JSON data cover some related material:

Introduction: IDB and EDB Relations

Generally speaking, a database can change in two ways:

  1. The data itself is updated (changed, added, or deleted), or
  2. The rules that define derived relations are updated (changed, added, or deleted).

It is important to distinguish relations stored on disk (often, raw data imported from other sources) from relations that are derived from them via rules. We use the technical terms from the Datalog literature:

  • EDB (“extensional”) relations are stored on disk (or other non-volatile media).
  • IDB (“intensional”) relations are defined by rules, computed on-demand, and are not persisted (though their definitions can be persisted, as installed code). These are analogous to views in other database systems.

Normally, EDB relations contain data from external sources that cannot be otherwise be captured; they are the “raw data” as far as the database is concerned. In contrast, IDB relations are derived, and can be reconstructed by applying rules to the EDB relations.

EDB relations often contain data loaded from CSV files or other external sources. Here we directly create a small EDB relation using insert, which we describe in more detail below:

update
def insert[:parent] = {("James", "Harry"); ("Harry", "Lily")}

Relation: parent

"Harry""Lily"
"James""Harry"

While parent(x,y) is an EDB binary relation, whose tuples are explicitly stored in the database, grandparent(x,y) might be an IDB relation, derived from parent and defined by the rule:

query
def grandparent(x, z) = parent(x,y) and parent(y, z) from y

Relation: grandparent

"James""Lily"

IDB relations can change if the rules change, or the underlying EDB relations change.

EDB relations only change when the underlying data changes. How to interact with and change EDBs is the main topic of this concept guide.

We will cover the basics of inserting and deleting data, describe how transactions update the database, and then cover the more advanced topic of reactive updates.

Adding EDB Data: insert

To add data to a new or existing EDB relation, we use the reserved insert relation. The first argument for insert should be a relation symbol, indicated by : . The right hand of the definition is a relation, with the data to add. Other than the (important) side effect of updating EDB relations, these definitions are just like any other rule in Rel, and all the usual Rel constructs can be used.

For example, consider a balance EDB relation, where balance(x, y) says that account x has balance y. To add data to balance, we can execute this update query:

update
def insert[:balance] = {("john", 20); ("jack", 30); ("jill", 60)}

Relation: balance

"jack"30
"jill"60
"john"20

This will add three pairs to the the balance relation, creating the relation if it does not already exist.

The insert relation is computed from the current state of the database, determining what gets added to the EDB relation.

It is even possible to query insert (see example below), but it is not persisted itself. The same is true for the delete relation discussed below.

When requested as a query output, these relations are computed from the new state of the DB (see Section Transactions below). After the above, insert will have the tuples {(:balance, "john", 20) ; (:balance, "jack", 30) ; (:balance, "jill", 60)}.

In general, insert is defined as any other relation, so the right-hand side of the definition can be a relational expression, defined from other EDB and IDB relations in the database. For example,

update
def insert[:pairs](x, y) = range(1, 10, 1, x) and x % 2 = 0 and y = x * 2

Relation: insert

:pairs24
:pairs48
:pairs612
:pairs816
:pairs1020

Relation: pairs

24
48
612
816
1020

Read-only vs. Write Transactions

The above discussions holds even if the transaction is read-only — the only difference is that the new DB state will not be saved at the end. (See Diagram X below.)

Removing EDB Data: delete

Removing data from the DB is analogous, using the reserved delete relation. For example, we can remove all balances lower than 50 with:

update
def delete[:balance](x, y) = balance(x,y) and y < 50

Relation: balance

"jill"60

This has removed the tuples ("jack", 30) and ("john", 20) from the balance relation. We can clear the entire balance relation with:

update
def delete[:balance] = balance

Relation: balance

In general, you can include def delete[:r] = r in a transaction to make sure that r is reset to empty before any new inserts happen. (This can produce an error the first time, if r is not previously defined, but the error can be ignored.)

Inserts and Deletes in the Same Transaction

Regardless of the order of the definitions, within a single transaction (described in more detail below), deletions are always executed before insertions.

When we issue overlapping inserts and deletes together, the inserts “win,” as can be expected if we think about the deletes happening first.

As an example, let’s first initialize a unary EDB relation t:

update
def insert[:t] = {1; 2}

Relation: t

1
2

We will now do an insert and a delete. (Note: in our guides, blocks of Rel code are executed as a single transaction.) Since they are in the same transaction, the order of these two lines does not matter:

update
def delete[:t] = {1; 2; 10; 20}
def insert[:t] = {1; 2; 3}

Relation: t

1
2
3

Relation: insert

:t1
:t2
:t3

Relation: delete

:t1
:t2
:t10
:t20

As we can see, 1 and 2 are still in t.

Deleting elements that don’t exist is not an error. This is why we can include extra elements in delete, such as 10 and 20 in this example, that are not in the t relation in the first place.

Updating Existing Values

In general, to update an existing value, the old value must be deleted — otherwise, the new value will simply be added, and the old and new values will co-exist.

For example, let’s give "jack" a balance of 60:

update
def delete[:balance] = balance
def insert[:balance]["jack"] = 60

Relation: balance

"jack"60

We can update the balance for "jack" with the following transaction, which also removes the old balance:

update
def delete[:balance]("jack", x) = balance("jack", x)
def insert[:balance]("jack", new) = balance("jack", old) and new = old + 40 from old

Relation: balance

"jack"100

Deleting from relations that don’t yet exist results in an error. It is not an error to include elements in delete[:r] that are not in r.

If balance is meant to be a function, we can add an integrity constraint for it. A functional dependency on a relation (indicating that the last argument is uniquely determined by the values of the previous ones) can be specified with stdlib’s function :

install
ic balance_is_a_function { function(balance) }

This IC will make sure that no customer ever has two different account balances — any transaction that violates it will fail. In Section Integrity Constraints we discuss a few other examples of ICs relevant to data updates.

Transactions

A database transaction is a unit of work that takes the database from one state S to a new state S_new. Transactions happen atomically, meaning that either the entire transaction succeeds, with no externally visible intermediate states, and no interference from other transactions, or else, the DB remains at S. (In the case of a read-only transaction, the two states are the same.)

Internally, a transaction can be divided into the following steps:

  1. Installed sources are updated, if applicable (reverted if the transaction fails).
  2. Deletes are applied, evaluating delete in state S.
  3. Inserts are applied, evaluating insert in state S.
  4. Integrity constraints are checked, in the state S_new resulting from applying the inserts and deletes
  5. If the integrity constraints fail, the entire transaction fails, and the database remains at state S.
  6. Outputs, if any, are evaluated at state S_new.
  7. Commit, if the transaction is not read-only. The database state is now S_new.
Transaction Steps
Transaction Steps

Fine print: Current and Next (Updated) DB State

Note that the body of an update refers to the “current” state of the DB, before any of the updates are carried out. For example, consider a DB where we initialize r to {1;2}, as follows:

update
def delete[:r] = r // clear r
def delete[:t] = t // clear t
def insert[:r] = {1; 2}

Relation: t

Relation: r

1
2

Let’s add some elements to r, but also set the new relation t to be equal to r:

update
def insert[:r] = {3; 4; 5}
def insert[:t] = r

Relation: insert

:r3
:r4
:r5
:t1
:t2
:t3
:t4
:t5

Relation: t

1
2

Relation: r

1
2
3
4
5

As we can see, the updated r will contain {1;2;3;4;5}, but t will contain the previous value of r, that is, {1;2}. This is because when evaluating def insert[:t] = r, the relation r is the previous one: the current transaction has not been committed, and r has not been changed yet.

If we wanted t to get the latest values of r, we should execute def insert[:t] = r in a separate transaction, after r has been updated.

update
def insert[:t] = r

Relation: t

1
2
3
4
5

Relation: r

1
2
3
4
5

The situation is different when we evaluate the output relation. As we can see, insert[:t] in the output will contain the new value of r ({1; 2; 3; 4; 5}), rather than the old one. Future versions of the system will let us distinguish between the new and old values, but for now, this means we cannot rely on the output value of insert and delete alone to see deltas. See the section Change Tracking for alternatives.

Loading and Updating CSV and JSON Data

CSV files are loaded using insert as well. See the CSV Import how-to guide for details; here we give a quick recap:

def insert[:my_csv_relation] = load_csv["/my/filename.csv"]
def insert[:my_json_relation] = load_json["/my/filename.json"]

Note that the above will add new data to any previously existing relation. To fully replace previously loaded data, we can do:

def delete[:my_csv_relation] = my_csv_relation
def insert[:my_csv_relation] = load_csv["/my/updated_filename.csv"]

The loaded relation can be transformed before being persisted — for example, if only a subset of the external data is persisted. Here, we keep only columns A and B:

@inline
def schema_map[CSV](k, v) = CSV(:A, pos, k) and CSV(:B, pos, v) from pos
def insert[:my_csv_relation] = schema_map[load_csv["/my/filename.csv"]]

Updating a Single Field in JSON Data

Updates can be used to modify loaded JSON data, as expected. To demonstrate, we first load a simple JSON object:

update
def config[:data] = """{ "a": {"b": {"c": 1, "d": 2}} }"""
def insert[:myjson] = load_json[config]

Relation: myjson

:a:b:c1
:a:b:d2

We can update one of its fields as follows:

update
def delete[:myjson][:a, :b, :c] = myjson[:a, :b, :c]
def insert[:myjson][:a, :b, :c] = 1234

Relation: myjson

:a:b:c1234
:a:b:d2

See the JSON Import and Export how-to guide for more on JSON data.

Reactive Updates: Example

Installed update rules can be used to implement reactive database logic, where the values in the database are updated according to external events.

As an example, consider the balance EDB relation again. We can clear it and give "jack" and "jill" a balance of 100, with:

update
def delete[:balance] = balance
def insert[:balance] = {("jack", 100); ("jill", 100)}

Relation: balance

"jack"100
"jill"100

To transfer money from one account to another, but only if balances do not become negative, we can install the following update logic:

install
def insert[:balance] = {(p1, b1_new) ; (p2, b2_new) }
from
p1, p2, val,
b1_new, b2_new, b1_old, b2_old
where
transfer(p1, p2, val)
and val != 0
and balance(p1, b1_old)
and balance(p2, b2_old)
and b1_new = b1_old - val
and b2_new = b2_old + val
and b1_new >= 0
and b2_new >= 0

def delete[:balance](x, y) = balance(x,y) and insert(:balance, x, new) from new

If the IDB relation transfer(p1, p2, val) holds a non-zero value val, the above logic “triggers” and we transfer an amount val from account p1 to account p2, provided neither account falls below 0.

Note that even though we have not defined transfer yet, we are able to install in the database the logic that will update the EDB relation balance if transfer is present. Once the relation transfer is defined, the balance transfer happens. The transfer relation is known as the trigger in some formulations, since the update does not happen without it.

Let’s trigger a balance transfer of 40 from "jack" to "jill" by simply doing:

update
def transfer = ("jack", "jill", 40)

Relation: balance

"jack"60
"jill"140

Since the installed code runs with every query, this tuple in the transfer relation caused the updates to balance to happen.

We will sanity-check our balance updates by installing these integrity constraints:

install
ic balance_check {
balance["jack"] + balance["jill"] = 200
}

ic nonnegative_balances(x, y) {
balance(x, y) implies y >= 0
}

Executing the same query again triggers another update to balance:

update
def transfer=("jack", "jill", 40)

Relation: balance

"jack"20
"jill"180

If we execute it one more time, balance will be unchanged, since there are not enough funds in the "jack" account:

update
def transfer=("jack", "jill", 40)

Relation: balance

"jack"20
"jill"180

The condition b1_new >= 0 prevented the insert from being triggered. Note that no error happened: the transaction succeeded, but no updates were made.

Installed Unconditional Updates

It is important to remember that installed insert or delete definitions are always executed with each transaction. This might lead to some confusing behavior, if one forgets that the code is installed.

If you install an unconditional insert, as in insert[:p] = 1, that value will always be present in the relation and cannot be removed, unless you uninstall the insert. Note that an installed delete can possibly be overridden by an insert, since inserts are executed after deletes. An unconditional insert can only be undone by uninstalling or modifying the corresponding source code.

This behavior might be useful to ensure that certain elements of a relation can never be deleted.

The console notebook lets you browse and edit installed sources.

If using the API, you can use RelationalAI.list_source(<connection>) to see what is installed in the database, and RelationalAI.delete_source(<connection>, <source_name>) to remove a particular source.

Change Tracking

We can debug our updates by using auxiliary EDB relations to see what was inserted and deleted. For example, we can install these rules for inserted and deleted:

install
def delete[:inserted] = inserted
def delete[:deleted] = deleted
def insert[:inserted](p, x) = insert[p](x) and p != :inserted
def insert[:deleted](p, x) = delete[p](x) and p != :deleted

If we update t, we can check for inserted and deleted afterwards:

update
def delete[:t] = t
def insert[:t] = {10;11;12}

Relation: inserted

:t10
:t11
:t12

Relation: deleted

:t1
:t2
:t3
:t4
:t5

Note that inserted and deleted are really supersets of what was changed in the relation. For an exact accounting, we can use a snapshot, as described below.

Snapshotting Relations

We can always snapshot an IDB or EDB relation, saving it to disk. Currently, the relation t is {10; 11; 12} (see above). We now snapshot and update it:

update
def delete[:t_snapshot] = t_snapshot
def insert[:t_snapshot] = t
def insert[:t] = {100; 101}
def delete[:t] = t

Relation: t_snapshot

10
11
12

Note that the relation is captured before the updates are applied. Later, we can compare the snapshot with the latest version of the relation. For example:

query
def changed[:added](x...) = t(x...) and not t_snapshot(x...)
def changed[:deleted](x...) = t_snapshot(x...) and not(t(x...))

Relation: changed

:added100
:added101
:deleted10
:deleted11
:deleted12

Note that this gives us more accurate information about what has actually changed in t, compared to looking only at insert and delete.

Integrity Constraints

Recall that integrity constraints are evaluated after all deletes and inserts are applied. If an update violates an integrity constraint, the violation is reported and the entire transaction fails. Integrity constraints can therefore ensure that properties of the database are maintained by any update. (See the Integrity Constraints concept guide for details.)

For example, we can make sure that a relation contains only integers with:

ic p_int { forall(x: p(x) implies Int(x) ) }

We can ensure that a relation is never empty by installing the constraint

ic p_nonempty { not empty(p) }

As the diagram in the section Transactions shows, integrity constraints are checked after all the updates are applied, and before the transaction commits. Thus, for example, the p_nonempty IC will not be triggered if, in the same transaction, we delete all the elements of p and then insert a new one:

def delete[:p] = p
def insert[:p] = 1

We can disallow all inserts or deletes with this integrity constraint:

ic readonly_database { not exists(x...: insert(x...)) and not exists(x... : delete(x...))}

See the Integrity Constraints concept guide for more examples of ICs.

Recursive Inserts and Deletes

The insert and delete relations can be defined recursively, to handle “cascading updates,” where an update can depend on previous ones. For example:

def insert[:ancestor](x, y) = parent(x, y)
def insert[:ancestor](x, y) = parent(x, t) and insert[:ancestor](t, y) from t

This populates an ancestor EDB relation, starting with the existing parent relation, and then recursively adding new pairs if the inserts imply it. (See the Recursion Tutorial for more details on recursion in Rel.)

As another example, assume that we have a binary part_of(x, y) relation among products, where x is a component of y, and that we want to mark products as imported if one of their parts is imported. Given a new_import fact, we can then update

def insert[:imported](x) = new_import(x)
def insert[:imported](y) = part_of(x, y) and insert[:imported](x) from x

Note that this only updates the imported relation for products connected to the new_import fact.

Other Notes

Listing and Removing EDB Relations

The SDK APIs can list all (or some) EDB relations. (Future releases will support this in Rel.) For example, in the Julia SDK:

RelationalAI.list_edb(conn::DBConnection, [relname]) -> Vector{RelKey}

The optional relname argument can be used to list the EDB relations with that name Otherwise all EDB relations are returned.

All EDB relations with a given name can be erased with

RelationalAI.delete_edb(conn::DBConnection, relname) -> Vector{RelKey}

Overloaded EDB relations

Creating both IDB and EDB relations with the same name is currenty not allowed. For example, this will fail:

def insert[:r] = {1;2}
def r = 3

Like IDB relations, EDB relations can be overloaded by arity and type. If we insert different types or arities into the same relation name, overloaded versions of the relation will be created.

update
def insert[:q] = 3.0
def insert[:q] = "a"
def insert[:q] = (1, "b")

Relation: q

"a"
3.0
1"b"