{
"cells": [
{
"source": "# Recursion",
"id": "0",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "## Goal\n\nThe goal of this concept guide is to demonstrate how to write and successfully use recursive relations (that is, relations that depend on themselves) in Rel.\n\n## Preliminaries\n\nThis concept guide uses the following Rel features:\n\n- Integrity constraints (ICs) (see the [IC concept guide](https://docs.relational.ai/rel/concepts/integrity-constraints))\n\nIt is good to know how ICs work, but not required for understanding this guide.\n\n## Introduction\n\nA program or a query is called *recursive* when a predicate is defined in terms of itself.\nFor many computational problems, recursion reduces complexity and expresses solutions in a cleaner and more understandable way.\nA common side effect is that recursive solutions are often shorter, more compact, and easier to maintain.\n\nRel supports recursive computations, which is also a key ingredient in making the language Turing complete.\nThis means Rel can be used to perform any computation that any other programming language supports.\nIt does not, by itself, mean that expressions in Rel are more compact or easier to write, although we believe that is generally the case.\n\nWith the ability to express queries recursively, Rel can perform a variety of computations that, at first glance, one would not expect from a database language:\n\n- Iterative algorithms can be expressed as recursive Rel programs (e.g., Fibonacci, Prim's minimum spanning tree);\n\n- Relations requiring a variable recursion depth (aka an unbounded query length) can be expressed more compactly and efficiently (e.g., transitive closure, PageRank, moving averages).\n\nTo define recursive computations, we need the following ingredients:\n\n1. A base case, and\n2. A recursive rule.\n\nThe base case (or cases) is the starting point of the recursive computation.\nThe recursive rule (or rules) is the heart of the recursive computation, referencing the relation itself.\n\n### Termination of Recursive Computations\n\nComputationally, we need a third ingredient for a successful recursive algorithm: *termination conditions*, which ensure that the recursive computation doesn't run forever.\n\nGenerally, a recursive computation in Rel terminates when a fixpoint is reached,\nmeaning that the elements in a relation don't change anymore from one iteration to the next\n(see the section [Behind the Scenes](#behind-the-scenes-recursion-and-fixpoints) for more details).\n\nMore formally, this means the solution to the recursive computation is a relation (or a set of relations) `R`, which is a fixpoint of the following definition:",
"id": "1",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "```\ndef R(x...) = base_case(x...) or recursive_rule[R](x...)\n```\n",
"id": "2",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
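{
"source": "As a minimal, concrete instance of this schema (a sketch, using a hypothetical relation name `count_up`): the base case seeds the relation with `1`, and the recursive rule adds successors up to a bound so that a fixpoint is reached:",
"id": "2a",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def count_up = 1 // base case\ndef count_up(n) = exists(m : count_up(m) and n = m + 1) and n <= 5 // recursive rule\n\ndef output = count_up",
"id": "2b",
"type": "query",
"inputs": []
},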
{
"source": "A recursive computation defining an infinite relation (or infinite function), such as the [Fibonacci sequence](#single-recursion), currently requires a domain restriction for the computation to terminate (see the examples below).\n\nFor certain recursive computations, no explicit domain restriction is needed because the definitions naturally reach a fixpoint.\nGraph reachability (see [below](#recursive-computation-on-graphs)) is such a problem because the output domain, which is all possible pairs of graph nodes, is finite and the recursion is monotonically increasing.\n\n(We say that the recursion is *monotonically increasing* if at each iteration we can only add new elements, and never remove any.)\n\nFuture releases of Rel will relax this restriction and compute the desired results in an *on-demand* fashion,\nwithout having to compute the entire (infinite) relation.\n\n## Introductory Examples\n\n### Recursive Computation on Graphs\n\nOne of the most fundamental graph queries asks if there is a path between two nodes.\nSuch questions appear frequently in the real world.\nIn terms of international travel, this question could be:\n\n> Can I fly from Providence to Honolulu?\n\nLet's try to answer that question in Rel. First, we need a few airports:",
"id": "3",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"name": "src_airport",
"source": "def airport = {\n \"Providence (PVD)\"; \"New York (JFK)\"; \"Boston (BOS)\";\n \"San Francisco (SFO)\"; \"Los Angeles (LAX)\"; \"Honolulu (HNL)\";\n}",
"id": "4",
"type": "install",
"inputs": []
},
{
"source": "and a flight network,",
"id": "5",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"name": "src_flight_network",
"source": "def directional_flight = {\n (\"Providence (PVD)\", \"Boston (BOS)\"); (\"Boston (BOS)\", \"San Francisco (SFO)\");\n (\"San Francisco (SFO)\", \"Honolulu (HNL)\"); (\"New York (JFK)\", \"Los Angeles (LAX)\");\n}\n\ndef flight(a, b) = directional_flight(a, b) or directional_flight(b, a)\n\n// integrity constraint\nic flight_between_airports(a, b) = flight(a, b) implies airport(a) and airport(b)",
"id": "6",
"type": "install",
"inputs": []
},
{
"source": "where we introduced directional flights, `directional_flight`, and bidirectional flights, `flight`.\nThe integrity constraint checks that flights can only occur between airports.\n\nNow, we can calculate all pairs of airports that are (directly or indirectly) connected with each other,",
"id": "7",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"name": "src_connected",
"source": "def connected(a, b) = flight(a, b) // the base case\ndef connected(a, b) = exists(c : connected(a, c) and flight(c, b))",
"id": "8",
"type": "install",
"inputs": []
},
{
"source": "which is the **recursive part** of the problem.\n\nThe original question now reads:",
"id": "9",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def output =\n if connected(\"Providence (PVD)\", \"Honolulu (HNL)\") then\n \"yes\"\n else\n \"no\"\n end\n\n// This integrity constraint will fail if we don't compute that\n// PVD and HNL are connected:\nic {equal(output, \"yes\")}",
"id": "10",
"type": "query",
"inputs": []
},
{
"source": "Let's ask another question and check if we can fly from `Boston (BOS)` to `Los Angeles (LAX)`.",
"id": "11",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def output =\n if connected(\"Boston (BOS)\", \"Los Angeles (LAX)\") then\n \"yes\"\n else\n \"no\"\n end\n\n// This integrity constraint will fail if we compute that\n// BOS and LAX are connected:\nic {equal(output, \"no\")}",
"id": "12",
"type": "query",
"inputs": []
},
{
"source": "The answer is no, because in our toy world LAX and JFK are connected to each other but isolated from all the other airports.\n\nFor mathematical enthusiasts: the problem of computing all pairs of connected nodes in a graph is called the *transitive closure* problem.\nIn our example, the edges of the graph are defined in the `flight` relation.\n\n### Single Recursion\n\nOne of the simplest mathematical recursion formulas is the definition of the **[factorial](https://en.wikipedia.org/wiki/Factorial)** $n! = n \\cdot (n-1) \\cdot \\ldots \\cdot 1$ for $n\\in \\mathbb{N}_+$, which satisfies the recursive rule\n$n! = n \\cdot (n-1)!$.\nIn Rel this reads:",
"id": "13",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def F[1] = 1 // the base case\ndef F[n] = n * F[n - 1], n <= 10\n\ndef output = F",
"id": "14",
"type": "query",
"inputs": []
},
{
"source": "There are several interesting aspects worth mentioning about this recursive definition:\n\nRelation `F` has arity 2, where the first entry is the iteration step `n` and the second entry is the value of `n!`.\n\nThe definition also contains all the ingredients for a successful recursion:\n\n- Base case: `def F[1] = 1` is the starting point and translates to `1! = 1`.\n- Recursive rule: `def F[n] = n * F[n - 1]` translates to $n! = n\\cdot(n-1)!$.\n- Termination condition: `n <= 10` makes the output domain finite, forces the fixpoint to be reached when $n$ hits 10, and consequently terminates the recursive evaluation. (In the future, once demand transformations are implemented, explicitly stating the termination condition will no longer be required.)\n\n### Multiple Recursion\n\n#### Fibonacci Sequence\n\nMultiple recursion, where the recursive rule contains multiple self-references,\ncan also be easily expressed in Rel.\n\nThe most famous example is the **[Fibonacci sequence](https://en.wikipedia.org/wiki/Fibonacci_number)** $F(n) = F(n-1) + F(n-2)$ with $F(0)=0$ and\n$F(1)=1$.\nIn Rel this relation reads:",
"id": "15",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def F[0] = 0 // base cases\ndef F[1] = 1 // base cases\ndef F[n] = F[n-1] + F[n-2], n<=10\n\ndef output = F",
"id": "16",
"type": "query",
"inputs": []
},
{
"source": "We can easily see that all ingredients for a successful recursive computation are present.\n\n#### Connected Nodes in a Graph\n\nIn our previous international travel example, we could have defined the `connected` relation using multiple recursion:",
"id": "17",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def connected(a, b) = flight(a, b) // the base case\ndef connected(a, b) = connected[a].connected(b) // recursive rule",
"id": "18",
"type": "query",
"inputs": []
},
{
"source": "Here the recursive rule contains two self-references and reads:\ntwo airports `a` and `b` are connected if there exists a third airport that is connected to both of them.\n\n### Negation in Recursion\n\nRecursive dependencies can be made as complex as needed, and can go well beyond asking for a single element of a previous iteration (e.g., `F[n - 1]`).\n\nThe **[Recamán sequence](https://en.wikipedia.org/wiki/Recam%C3%A1n%27s_sequence)** is a nice example to demonstrate a more complex recursive computation. The sequence is defined as:\n\n$$a_n = \\left\\\\{ \\begin{array}{ll} 0 & \\text{if}\\ n=0\n\\newline a_{n-1}-n & \\text{if}\\ a_{n-1}-n>0\\ \\text{and the value isn't already in the sequence}\n\\newline a_{n-1}+n & \\text{otherwise} \\newline \\end{array}\\right.$$\n\nThe recursive case requires checking that the potential next value isn't already in the sequence and, if it is, assigning a different value.\n\nThe Recamán sequence can be visualized in a very neat way by connecting subsequent values with an arch.\n\n![](https://docs.relational.ai/rel/concepts/recaman-sequence-drawing.png)\n\n[(CC license)](https://en.wikipedia.org/wiki/File:Reacam%C3%A1nSequenceDrawing.png)\n\nOne way to define the Recamán sequence in Rel is as follows.",
"id": "19",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "@inline\ndef reject(n,x) = x < 0 or exists(m: recaman(m, x) and m < n)\n\ndef recaman[0] = 0\ndef recaman[n](x) = recaman[n - 1] - n = x and not reject(n, x) and 0 < n < 20\ndef recaman[n](x) = recaman[n - 1] + n = x and reject(n, x - 2 * n) and 0 < n < 20\n\ndef output = recaman\n// This IC verifies the correctness of the results by comparing with hand-computed results.\nic recaman_sequence = equal(\n recaman[_],\n {0;1;3;6;2;7;13;20;12;21;11;22;10;23;9;24;8;25;43;62;}\n)",
"id": "20",
"type": "query",
"inputs": []
},
{
"source": "where we limited ourselves to calculating only the first 20 terms.\nWe used several logical elements here, namely:\n\n- negation,\n- existential quantification,\n- disjunction,\n\nto express the recursive Recamán rule.\n\nAdditionally, we used the `@inline` functionality to factor out the condition evaluation into a separate relation (`reject`), helping to keep the main recursive rule compact and readable.\n\nFinally, we added an integrity constraint (IC) to check that the set of computed values is correct.\n(Note that this IC does not check that the values are in the correct order, but it still gives us additional confidence in the results.)\n\n## Advanced Examples\n\nRel supports complex recursive rules that may involve multiple relations (aka mutual recursion) that depend on each other and may be recursive themselves.\n\nBefore diving into them, let's look at relations whose cardinality is not monotonically increasing as the iteration advances.\nThis is possible because elements can be not just added but also *deleted* during an iterative step.\n\n### Recursive Rules that Eliminate Elements\n\nFor all the examples we have seen so far, the number of elements in the relation (aka its cardinality) is larger after we have exited the recursive evaluation than before.\nHowever, thanks to Rel's ability to also delete (not just add) elements during an iteration, we may end up with fewer elements in the relation than we started with.\n\nIn the following Rel program, we have two recursive rules for `D`.\nThe first one adds the elements 1 through 10 if `D` is empty.\nThe second one keeps only elements that are smaller than or equal to 3, or different from the maximum element in `D`.\nThe effect is that the\nlargest element is iteratively removed until only the elements `1; 2; 3` remain:",
"id": "21",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def D(n) = range[1, 10, 1](n) and empty(D)\ndef D(n) = D(n) and (n <= 3 or n != max[D])\n\ndef output = D\n@inline\ndef empty(R) = not exists(x... : R(x...))",
"id": "22",
"type": "query",
"inputs": []
},
{
"source": "Notice that we defined only the recursive rules. We did not explicitly state a base case,\nwhich means that the base case is, by default, an empty relation `D`.\n\nThe numbers 1 through 10 get added if and only if `D` is empty.\nWe used the opportunity to define a custom relation `empty(D)` that checks whether a relation `D` is empty.\n\nThe key difference between the two recursive rules is that the first rule probes the *non-existence* of elements, whereas the second rule depends on an *aggregated view of existing elements*.\nEither of these aspects (*negation* or *aggregation*) is enough to achieve a non-monotonic recursion.\nConversely, in the absence of both features, recursion is always monotonic.\n\n### Asymptotic Convergence: PageRank\n\nAnother class of recursive formulas that we can easily implement in Rel is infinite series that converge asymptotically\n(in the infinite limit, $n→∞$).\nThat is in contrast to the example in the section [Recursive Computation on Graphs](#recursive-computation-on-graphs),\nwhere we reached convergence after a finite number of steps.\n\nIn practice, we define a convergence criterion,\nwhere we stop the iteration if the value(s) between consecutive iterations are closer than a\n(small) user-defined $ϵ$:\n\n$$\\| a_{n+1} - a_n \\|^2 < ϵ \\quad \\text{for}\\ n>N_\\text{converged} \\enspace,$$\n\nwhere $N_\\text{converged}$ is the number of iterations needed to achieve our desired accuracy.\n\nMany machine learning algorithms fall into this category, where we declare the model as trained once the model parameters and/or the accuracy of the model change only marginally between iterations.\n\nLet's take the **[PageRank](https://en.wikipedia.org/wiki/PageRank)** algorithm with damping as an example.\nIt is defined as\n\n$$PR_a = \\frac{1-d}{N} + d \\sum_{b \\in \\mathcal{S}_a} \\frac{PR_b}{L_b} \\enspace ,$$\n\nwhere $PR_a$ is the page rank of website $a$, $L_b$ is the number of outbound links from site $b$, $\\mathcal{S}_a$ is the collection of websites that have a link to website $a$, $N$ is the total number of websites, and $d$ is a damping parameter, usually set around 0.85.\n\nFirst, we need to define some pages and links between them.\nLet's build a toy example, inspired by [Wikipedia](https://en.wikipedia.org/wiki/PageRank).",
"id": "23",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"name": "pagerank_setup",
"source": "module graphA\n def edge = {\n (\"N1\", \"B\"); (\"N1\", \"E\");\n (\"N2\", \"B\"); (\"N2\", \"E\");\n (\"N3\", \"B\"); (\"N3\", \"E\");\n (\"N4\", \"E\"); (\"N5\", \"E\");\n (\"E\", \"F\"); (\"F\", \"E\");\n (\"E\", \"B\"); (\"E\", \"D\");\n (\"D\", \"A\"); (\"D\", \"B\");\n (\"B\", \"C\"); (\"C\", \"B\");\n (\"F\", \"B\"); }\n\n def node = x : edge(x, _) or edge(_, x)\nend",
"id": "24",
"type": "install",
"inputs": []
},
{
"source": "Here is a visualization of the graph,\n\n![](https://docs.relational.ai/rel/concepts/page-rank.svg)\n\nwhere the size of a node is proportional to its PageRank.\n\nThe PageRank algorithm assumes that *sink* nodes, which have no outgoing edges,\nare connected to all other nodes in the graph.\nWe can derive a new graph from the original as follows:",
"id": "25",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"name": "derived_graph",
"source": "module graphB\n def node = graphA:node\n def edge = graphA:edge\n def sink(n in node) = empty(graphA:edge[n])\n def edge(a in node, b in node) = sink(a) // add new edges\nend",
"id": "26",
"type": "install",
"inputs": []
},
{
"source": "In Rel, the PageRank algorithm may read:",
"id": "27",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"name": "pagerank",
"source": "// parameters\ndef eps = 10.0^(-10)\ndef damping = 0.85\ndef node = graphB:node\ndef edge = graphB:edge\n\ndef node_count = count[node]\ndef outdegree[x] = count[edge[x]]\n\ndef MAX_ITER = 100\n\n// a version of sum that gives 0.0 for the empty set,\n// needed if there are nodes with no incoming edges:\n@inline\ndef sum_default[R] = sum[R] <++ 0.0\n\n// pagerank(iteration, site, rank)\ndef pagerank[0, a in node] = 1.0 / node_count // base case: equal rank\ndef pagerank[n, a in node] = // recursive rule\n (1-damping)/node_count + damping * sum_default[\n pagerank[n-1, b]/outdegree[b] for b where edge(b, a)\n ],\n pagerank(n-1, a, _) // grounding `n` and `a`\n and not converged(n-1) // termination condition\n and n<=MAX_ITER // safeguard\n\n// track convergence:\ndef converged(n) =\n forall(a in node : (pagerank[n, a] - pagerank[n-1, a])^2 < eps)\n and range(1, MAX_ITER, 1, n)",
"id": "28",
"type": "install",
"inputs": []
},
{
"source": "The relation `converged` stores the iteration numbers for which the convergence criterion is fulfilled.\nThe `not converged(n-1)` condition in the main recursive definition for `pagerank` ensures that the iteration stops once convergence is reached.\nWe also added a safeguard condition that stops the recursive computation once 100 iterations have been reached.\n\nNotice that the relations `pagerank` and `converged` depend on each other and must be solved together.\n\nWe use the `sum_default` relation to ensure that the sum over an empty set returns 0.0 instead of `false`,\nwhich is the default behavior of `sum[]`.\n(This is needed when a node has no incoming edges, which can happen if the original graph contains no sink nodes.)\n\nFinally, let's look at the results.\nAfter 62 iterations we have reached our desired accuracy level, with the following results:",
"id": "29",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def max_iteration = max[n : pagerank(n, _, _)]\ndef did_converge = \"yes\", converged = max_iteration\n\ndef ranks = pagerank[max_iteration]\ndef sumranks = sum[pagerank[max_iteration]]",
"id": "30",
"type": "query",
"inputs": []
},
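{
"source": "As a sanity check (a sketch, assuming the standard library's `abs`), we can assert with an IC that the final ranks sum to approximately 1.0:",
"id": "30a",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def last_iteration = max[n : pagerank(n, _, _)]\ndef total_rank = sum[pagerank[last_iteration]]\n\n// fail if the final ranks don't (approximately) sum to 1:\nic {abs[total_rank - 1.0] < 10.0^(-6)}",
"id": "30b",
"type": "query",
"inputs": []
},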
{
"source": "Since PageRank corresponds to a probability distribution over nodes, the ranks should sum to (approximately) 1.0.\n\n### Prim’s Minimum Spanning Tree\n\nPrim’s algorithm is a [greedy algorithm](https://en.wikipedia.org/wiki/Greedy_algorithm) for calculating the [minimum spanning tree (MST)](https://en.wikipedia.org/wiki/Minimum_spanning_tree) by performing the following steps:\n\n> 1. Initialize a tree with a single vertex, chosen arbitrarily from the graph.\n> 2. Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.\n> 3. Repeat step 2 (until all vertices are in the tree).\n\nAs a note, it is possible to implement this algorithm with one recursive relation.\nTo demonstrate the interplay of several recursive relations, we will implement the algorithm using two recursive relations that track the nodes that are and aren't already in the MST.\n\nFirst, let's define an undirected weighted graph inspired by [www.geeksforgeeks.org](https://www.geeksforgeeks.org/prims-minimum-spanning-tree-mst-greedy-algo-5/).\n\n![](https://docs.relational.ai/rel/concepts/graph-prims.svg)",
"id": "31",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"name": "prim_setup",
"source": "def graph[:vertex] = range[0, 8, 1]\n\ndef graph[:edge_weight] = {\n (0, 1, 4); (1, 2, 9); (2, 3, 7); (3, 4, 9);\n (4, 5, 10);(5, 6, 2); (6, 7, 1); (7, 8, 7);\n (7, 0, 8); (1, 7, 11); (5, 2, 4);(5, 3, 14);\n (2, 8, 2); (6, 8, 6);\n}\n\n// make edges undirected\ndef graph[:edge_weight](x, y, w) = graph[:edge_weight](y, x, w)",
"id": "32",
"type": "install",
"inputs": []
},
{
"source": "where `graph[:vertex]` contains the graph nodes and `graph[:edge_weight]` contains the edge information, i.e., the two nodes connected by the edge and the edge weight, which is the last element in the tuple.\n\nBelow is one way to implement Prim's algorithm:",
"id": "33",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def mst[:vertex] = 0 // initialize MST with a vertex\n// collect all vertices that have been already reached by the MST\ndef mst[:vertex](x) = mst[:edge](x, _) or mst[:edge](_, x)\n\n// main recursive rule\ndef mst[:edge](x, y) =\n mst[:vertex](x) and\n unvisited(y) and\n graph[:edge_weight, x, y] = lightest_edge_weight[unvisited, mst[:vertex]]\n\n// don't remove edges that have been added to the MST\ndef mst[:edge](x, y) = mst[:edge](x, y)\n\n// list of vertices not in the MST yet\ndef unvisited(x) = graph[:vertex](x) and not mst[:vertex](x)\n\n// weight of the lightest edge\n// between visited and unvisited vertices\n@inline\ndef lightest_edge_weight[S, T] = min[v : v=graph[:edge_weight][S, T]]\n\n// check that all vertices are visited:\nic {empty(unvisited)}",
"id": "34",
"type": "query",
"inputs": []
},
{
"source": "The main recursive part of this algorithm involves the three relations `mst[:vertex]`, `mst[:edge]`, and `unvisited`.\nAt each iteration, `mst[:edge]` includes all edges we already computed, `mst[:edge](x, y) = mst[:edge](x, y)`, plus one new edge, which is the lightest edge connecting a visited vertex, `mst[:vertex](x)`, to an unvisited vertex, `unvisited(y)`.\n\nThe recursive dependency structure is quite non-trivial here because the three relations refer to each other in a cyclical way:\n`unvisited` depends on `mst[:vertex]`, which depends on `mst[:edge]`, which in turn depends on `mst[:vertex]` and `unvisited` again.\nThe only direct recursive dependence is `def mst[:edge](x, y) = mst[:edge](x, y)`.\nThe purpose of this rule is to ensure that edge entries, once inserted, aren't removed in a future iteration.\n\nThis safeguard is needed because the edge tuple `(x,y)` in `mst[:edge]` requires that `y` is not in `mst[:vertex]`, but `y` will be in `mst[:vertex]` in a subsequent iteration.\nWithout the safeguard, the edge would then be removed again from `mst[:edge]`, and potentially also the vertices attached to this edge.\nThe condition that `y` should not be in `mst[:vertex]` might then be fulfilled again, and we would enter a cycle that never ends.\n(See the section [Common Pitfalls](#common-pitfalls) for more details on never-ending recursive computations.)\nThis safeguard prevents all that.\n\nFurthermore, `unvisited` is a nice example of a relation that decreases in size/cardinality as the recursive evaluation proceeds\n(see also the section [Recursive Rules that Eliminate Elements](#recursive-rules-that-eliminate-elements)).\nInitially, `unvisited` contains all vertices in the graph,\nbut once the recursive evaluation is done, `unvisited` is actually empty,\nwhich we double-check with the integrity constraint `ic {empty(unvisited)}`.\n\nThe edge weights of the MST are not stored in `mst` because we can look them up in `graph[:edge_weight]`.\nLet's visualize the minimal spanning tree we just calculated.\n\n![](https://docs.relational.ai/rel/concepts/graph-prims-mst.svg)\n\n\n### Logging Recursive Progress\n\nNormally, we can only access a relation after the recursive evaluation has finished (aka converged).\nIt would be great to also see the intermediate progress, to better understand and debug the logic we write.\n\nLet's, for example, take the example above where elements are successively removed until only three elements are left\n(see the section [Recursive Rules that Eliminate Elements](#recursive-rules-that-eliminate-elements)).\nIf we remove the condition `n <= 3`, we create a recursive algorithm that never converges (see the section [Common Pitfalls](#common-pitfalls)).\n\nLet's see if this is actually what is happening by creating a logging relation `Log` that records the size of `D` after each iteration.\nTo avoid the endless loop, we add a condition that forces the recursive evaluation to terminate after 10 iterations have been reached.",
"id": "35",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"name": "log",
"source": "// number of iterations\ndef Niter = 10\n\n// endless loop\ndef D(n) = n=range[1, 3, 1] and not D(_) and continue\ndef D(n) = D(n) and n!=max[D] and continue\n\n// logging\ndef Log = 0,0 : not Log(_, _) // dummy initialization\ndef Log(n, D_size) =\n n = max[first[Log]]+1 and\n D_size = sum[count[D]; 0] and\n continue\n\ndef Log(x, y) = Log(x, y) and x > 0 // don't remove entries\n\n// termination condition\ndef continue = count[Log] < Niter",
"id": "36",
"type": "install",
"inputs": []
},
{
"source": "Note that we also modified the relation a bit: we insert `1;2;3` instead of `1;2;...;10` from the original version above.\nLet's have a look to see whether `Log` really contains the size/cardinality of relation `D` after each iteration:",
"id": "37",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "Log",
"id": "38",
"type": "query",
"inputs": []
},
{
"source": "This confirms that `D` starts with 3 elements and decreases to 0 (aka empty), at which point the first recursive rule holds again and `D` is again initialized with `1;2;3`.\nThis cycle would repeat forever, except that in our case we terminate the iteration when the relation `continue` turns false, which is the case when our logging relation `Log` contains at least `Niter=10` entries.\n\n\n## Common Pitfalls\n\nProbably the **single most common problem** users will face when using recursion is **non-convergent computation**.\n\nThere are two main types of non-convergence:\n\n- **Unbounded domain**: Recursive computations where there is no fixpoint, or where the fixpoint takes infinitely many iterations to reach.\n [PageRank](#asymptotic-convergence-pagerank) is an example where the fixpoint lies at infinity.\n The [Fibonacci Sequence](#single-recursion) and calculating the [Length of All Paths in a Cyclical Graph](#length-of-all-paths-in-a-cyclical-graph) are examples of computations with no fixpoint.\n\n This requires restricting the domain to make sure that the recursive computation terminates after a finite number of iterations.\n\n (This will be addressed in the future with the *demand transformation* feature ---\n see the section [Running to Infinity](#running-to-infinity).)\n\n- **Oscillations**: No fixpoint exists even though the output domain is finite. This can occur if the recursive computation gets stuck in a cycle.\n This situation is much harder to spot and can be quite subtle, as the discussion in the [MST example](#prims-minimum-spanning-tree) above demonstrated.\n In the future, the query evaluator will be able to detect this behavior and report a failure once a previously seen solution repeats without convergence.\n (See the section [Stuck in a Cycle](#stuck-in-a-cycle).)\n\n\n### Running to Infinity\n\n#### Factorial\n\nHow can one end up with an infinite loop?\nOne obvious way is to forget to restrict the output domain in a recursive rule that explores a larger and larger domain space as the recursive computation progresses.\nAn example is the factorial, $n!$.\nImagine that we write the factorial but don't state at which $n$ to stop:",
"id": "39",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def F[0] = 1\ndef F[n] = n * F[n-1]",
"id": "40",
"type": "query",
"inputs": []
},
{
"source": "Since Rel computes fixpoints in a *bottom-up* fashion, deriving new facts from existing ones,\nthe iteration will go on forever, as $n$ grows larger and larger.\nIn the future, this issue will be solved with *on-demand* evaluation,\nwhich only performs the computations needed when a specific set of values is requested.\n\n#### Length of All Paths in a Cyclical Graph\n\nIn graph problems, too, we can run into infinite recursion even though the graph itself is finite.\nFor cyclical graphs, for instance, calculating the lengths of all paths will not terminate, because the cyclical nature of the graph ensures that one can always find a path of arbitrary length by passing through the loop (that exists in the graph) multiple times.\n\nTo demonstrate that point, let's construct a minimal graph with 3 vertices\nand one loop between nodes `1` and `2`.",
"id": "41",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"name": "path_length_example",
"source": "def cyclical_graph[:vertex] = {1; 2; 3}\ndef cyclical_graph[:edge] = {(1, 2); (2, 1); (2, 3)}\n\n// path_length(start_node, end_node, distance)\ndef path_length = cyclical_graph[:edge], 1 // base case\ndef path_length(a, c, len) =\n path_length(a, b, len1) and\n path_length(b, c, len2) and\n len = len1 + len2 and\n len <= 4 // termination to avoid infinite iteration\n from b, len1, len2",
"id": "42",
"type": "install",
"inputs": []
},
{
"source": "To avoid an endless loop, we insert the condition `len <= 4`, so that only paths of length at most 4 are included.\nInspecting the relation `path_length`\nshows that, between each connected node pair, we have multiple path lengths increasing in steps of 2, which is the length of the loop in our cyclical graph:",
"id": "43",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "path_length",
"id": "44",
"type": "query",
"inputs": []
},
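{
"source": "As a further illustration (a sketch, using the hypothetical helper name `length_count`), we can count how many distinct path lengths were recorded per node pair; thanks to the `len <= 4` cutoff, each pair has only finitely many:",
"id": "44a",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "// number of recorded path lengths per node pair (hypothetical helper)\ndef length_count[a, b] = count[len : path_length(a, b, len)]\n\ndef output = length_count",
"id": "44b",
"type": "query",
"inputs": []
},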
{
"source": "### Stuck in a Cycle\n\nThere exists another way that one can get stuck in an infinite recursive computation.\nElements can be added *and* removed during the iteration.\nHence, it is possible to end up in a *cycle* that never ends.\n\nLet's take the example in the section [Recursive Rules that Eliminate Elements](#recursive-rules-that-eliminate-elements) where we start with the elements 1 to 10 and iteratively remove the largest one, but only if it is larger than 3. If we take this condition away,",
"id": "45",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def D = n : n=range[1, 10, 1] and not D(_)\ndef D(n) = D(n) and n != max[D]",
"id": "46",
"type": "query",
"inputs": []
},
{
"source": "we will end up in an endless cycle.\nWhy?\nBecause once we have removed the last element from `D`, we again fulfill the initial condition used to populate the relation in the first place.\nNow we are back at the beginning, and the cycle repeats itself forever.\nSuch a recursive relation has no fixpoint and will never terminate.\n\nAs mentioned above, this problem will be solved in the future: Rel will be able to recognize that a solution has already been visited but the recursive computation has not converged yet.\n\n## Behind the Scenes: Recursion and Fixpoints\n\nIn this section, we discuss in more detail how recursive rules are evaluated in Rel.\nThe content discussed here is not needed to write successful recursive relations.\nIt can be, however, very helpful for writing complex recursive algorithms and understanding why Rel returns what it does.\n\nAt a high level, Rel solves a recursive relation $R$ by starting at step 0 with the base case(s), $R_0$, which may be the empty set (in case no explicit base case is defined).\nThe relation $R_0$ represents the starting point of our recursive computation.\nIf $f$ stands for the function that maps the relation at one iteration to the relation at the next, then we can express the converged relation $R$ in the following way:\n\n$R = f(R_{N-1}) = (f \\circ f)(R_{N-2}) = \\underbrace{(f \\circ \\ldots \\circ f)}_{N}(R_{0})$,\n\nwhere after $N$ iterative applications of $f$ we have converged to the final result by reaching the fixpoint, $f(R) = R$, and the iteration stops since repeated application of $f$ doesn't change the content of $R$.\n\nRel solves recursive computations in a bottom-up fashion, where we start with the base case(s) and then iteratively apply the recursive rule until we reach a fixpoint.\nHence, it is currently difficult to implement a top-down algorithm.\nMany (naive) implementations of [divide-and-conquer algorithms](https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm) (e.g., [mergesort](https://en.wikipedia.org/wiki/Merge_sort), [FFT](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm), and the [Tower of Hanoi](https://en.wikipedia.org/wiki/Tower_of_Hanoi)) work in a top-down fashion.\n\nBesides the [common pitfalls](#common-pitfalls) discussed above,\nwe also need to be aware of scenarios where we might have multiple fixpoints.\n\n### Multiple Fixpoints\n\nA function $f$ can have multiple fixpoints, that is, more than one value of $x$ where $f(x)=x \\enspace.$\nThis is easy to see with mathematical functions, such as polynomials.\nTake, for instance, the function $f(x) = -2(x-1)^2+2 \\enspace.$\nIt has the fixpoints $x=0$ and $x=\\frac{3}{2}$, which are the two points where the function intersects the $y=x$ line.\nWhether iterative methods can find these values may depend on the initial conditions chosen.\n\n\nThis applies to relations as well, and we can come across it in Rel, particularly when using negation,\nas we now describe.\n\n#### Negations and Cycles\n\nConsider the following example, featuring recursion and negation:",
"id": "47",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def man = \"Dilbert\"\ndef single(x) = man(x) and not husband(x)\ndef husband(x) = man(x) and not single(x)\ndef output:single = single\ndef output:husband = husband",
"id": "48",
"type": "query",
"inputs": []
},
{
"source": "We have two different least fixpoints:\n\n- Dilbert is single and not a husband: `single(Dilbert) and not husband(Dilbert)`\n- Dilbert is a husband and not single: `husband(Dilbert) and not single(Dilbert)`\n\nThey are *incomparable*: neither is smaller than the other.\nRel computes one of them, but it can be argued that they are both equally valid.\n\nNote that `single` depends on `husband`, which in turn depends on `single` again, so we have a\ndependency cycle, with intervening negations.\n\nModels with a cyclical definition that invoves negation are called *non-stratifiable*, and should be avoided when possible.\nIn this case, they lead to multiple least fixpoints. In others, the cycles can lead to non-termination,\nwhere Rel will throw an exception.\n\n\n\nIn general,**negations and aggregations should be used with caution in recursive definitions**,\nsince they can result in non-monotonic recursion and unintended behaviors.\n\n### Symmetry-Induced Fixpoints\n\nSymmetry is a fundamental concept in mathematics and physics,\nand is relevant to properties like [the law of conservation of energy](https://en.wikipedia.org/wiki/Conservation_of_energy).\nSymmetries in the problem definition are a common cause for multiple fixpoints.\nThe multiple fixpoints in the [Dilbert example](#negations-and-cycles) above come from\nthe symmetry between the definitions of `husband` and `single`.\n\nIn more complex situations the underlying symmetry might be not that easy to spot.\nFor example:",
"id": "49",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def circle = 1, 0\ndef circle = -y, x from x, y where circle(x, y)",
"id": "50",
"type": "query",
"inputs": []
},
{
"source": "We will reach the fixpoint after four iterations.\nWhy? Because we will be back at $(1,0)$ after visiting the points $(0,1), (-1,0)$, and $(0,-1)$.\n\nThe reason for this cyclical behavior is that we have encoded a rotational symmetry in our problem.\nWith each iteration we perform a $90^\\circ$ rotation starting at $(1,0)$.\n\nHence, once we visited all the points of the rotational group $C_4$ (modulo the starting point)\nwe will have reached the fixpoint and no new elements will be added to `circle`.\n\nThis toy example shows how the underlying symmetry in our algorithm enforces that we stay within a restricted domain and all elements in our relation are elements of that domain.\nOnly symmetry-breaking operations will be able to escape this domain.\n\nIf the existence of the symmetry is unwanted, or the developer is unaware, this restriction may lead to unintended behaviors since not the entire argument domain will be explored.\nHowever, this behavior is often beneficial, because this restrictions can be exploited and can lead to significant speed-ups in calculations without loss of generality.",
"id": "51",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
}
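,
{
"source": "As a quick sanity check of the fixpoint reasoning above, we can count the tuples of the converged `circle` relation (a small sketch using the `count` aggregation from the Rel standard library): since the relation converges to the orbit of $(1,0)$ under the $C_4$ rotation, the count should be 4.",
"id": "51a",
"isCodeFolded": true,
"type": "markdown",
"inputs": []
},
{
"source": "def circle = 1, 0\ndef circle = -y, x from x, y where circle(x, y)\n\n// after the fixpoint is reached, circle holds (1,0), (0,1), (-1,0), (0,-1)\ndef output = count[circle]",
"id": "51b",
"type": "query",
"inputs": []
}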
],
"metadata": {
"notebookFormatVersion": "0.0.1"
}
}