Graphviz

This how-to guide demonstrates how to create graphical representations of data with Rel and Graphviz.

Goal

This how-to guide shows you how to visualize your graph data in Rel using GraphViz and RAI worksheets.

Introduction

In this how-to guide, learn how to use Graphviz (opens in a new tab) from within Rel in order to visualize graph data. The code presented here can be easily adapted to different kinds of chart types and graphs.

Creating a chart using Rel and Graphviz (opens in a new tab) is very simple. All you have to do is use the appropriate chart type, specifically graphviz, and apply it to some data, which in Rel take the form of a relation, representing a graph.

// read query
 
// Define the `graph` relation.
module graph
    def node = {"A"; "B"; "C"; "D"}
    def edge = {("A", "B") ; ("B", "C"); ("C", "D"); ("A", "D")}
end
 
// Plot it.
def output = ::std::display::graphviz[graph]

The following sections discuss in more detail how to define and use graph data from different sources as well as how to configure and plot different graphs over the data.

Note that the code presented in this how-to guide produces graphical output for the RAI Console worksheets. Everywhere else, for example, the RelationalAI SDKs, relations are returned in the standard, non-graphical form.

Graph Options

Consider the different options available for plotting graphs with Graphviz (opens in a new tab).

At a high level, graphviz needs a module graph as shown in the previous section. As a minimum, graph contains information on the nodes (the graph:node relation) and the edges (the graph:edge relation). The default output is a directed graph containing the given nodes and edges. To create an undirected graph, you need to specify additional options to graphviz, again, through the graph module. For example, here is the same graph data plotted as an undirected graph:

// read query
 
// Define the `graph` relation.
module graph
    def node = {"A"; "B"; "C"; "D"}
    def edge = {("A", "B") ; ("B", "C"); ("C", "D"); ("A", "D")}
    def directed = boolean_false
end
 
// Plot it.
def output = ::std::display::graphviz[graph]

Graphviz (opens in a new tab) provides a variety of parameters used to configure the graph visualization. This section gives an overview of the parameters that can be added to the graph module, followed by a set of examples for drawing different graphs.

Key	Description
`:node`	A unary relation of `node_id` identifiers (usually strings or integers) that represent the node IDs.
`:edge`	A binary relation of edges, represented as (`from`, `to`) pairs of node IDs. `from` and `to` need to match an identifier in the `:node` relation.
`:node_attribute`	A ternary relation of node attributes as (`node_id`, `attribute`, `value`), where `node_id` matches an identifier in the `:node` relation.
`:edge_attribute`	An arity-4 relation of edge attributes as (`from`, `to`, `attribute`, `value`), where (`from`, `to`) matches an identifier pair in the `:edge` relation.
`:attribute:graph`	A binary relation of graph attributes as (`attribute`, `value`). These attributes serve as the default for the graph/subgraphs.
`:attribute:node`	A binary relation of node attributes as (`attribute`, `value`). These attributes serve as the default for all nodes in the graph.
`:attribute:edge`	A binary relation of edge attributes as (`attribute`, `value`). These attributes serve as the default for all edges in the graph.
`:directed`	A boolean (`boolean_true` or `boolean_false`) indicating whether the graph is directed or undirected. The default value is `boolean_true`, i.e., directed.
`:clustered`	A boolean (`boolean_true` or `boolean_false`) indicating whether the subgraphs in the graph should have their ID prepended with `cluster_` so that the subgraphs are each rendered in a bounding box. The default is `boolean_true`, i.e., clustered.
`:subgraph`	A relation with an identifier followed by a graph definition (`:node`, `:edge`, `:attribute`, `:node_attribute`, `:edge_attribute`). There are the following exceptions: `:directed` is not supported because it applies to the whole graph and not a subgraph. `:clustered` is not supported because it applies to the whole graph and not a subgraph. If you want to only have a subset of the subgraphs be clusters, you should set the root key `:clustered` to `boolean_false` and prepend `"cluster_"` to the subgraph ID that you would like to be clusters. `:subgraph` cannot be recursively defined. `:parent` can be added only to subgraphs with the value set to the ID of another subgraph. This indicates that this subgraph should be nested within that parent subgraph.
`:layout`	A string indicating the layout engine (opens in a new tab) to use. Valid options are: `"dot"`: The dot (opens in a new tab) engine creates hierarchical or layered directed graphs. The engine attempts to create groups without edge overlapping. This is the default engine. `"neato"`: The neato (opens in a new tab) engine creates spring model layouts. `"twopi"`: The twopi (opens in a new tab) engine creates radial layouts with nodes placed in concentric circles. `"circo"`: The circo (opens in a new tab) engine creates a circular layout. This engine is useful for graphs with cyclic structures, such as telecommunication networks. `"fdp"`: The fdp (opens in a new tab) engine creates spring model layouts similar to `"neato"`. `"osage"`: The osage (opens in a new tab) engine draws clustered graphs. `"patchwork"`: The patchwork (opens in a new tab) engine draws clustered graphs using a squarified treemap layout. `"sfdp"`: The sfdp (opens in a new tab) engine is a multiscale version of `"fdp"` for large graphs.

For the full list of supported attributes and values, see the Graphviz Attribute Documentation (opens in a new tab).

Feature Examples

Node Width, Font, and Color

The first example creates a small graph of five nodes and varies the size and color of the nodes. More specifically, the width of the nodes increases with the ID of the node. Even-ID nodes are painted blue and odd-ID nodes are painted red. Finally, the font size of the red nodes is increased to 40 and their font changed to "Comic Sans MS".

// read query
 
module graph
    def node = range[1,5,1]
    def edge(a in node, b in node) = b = a + 1
    def node_attribute[n in graph:node, "width"] = n
    def node_attribute[n in graph:node, "color"] = "blue", n%2 = 0
    def node_attribute[n in graph:node, "color"] = "red", n%2 = 1
    def node_attribute[n in graph:node, "fontsize"] = "40", n%2 = 1
    def node_attribute[n in graph:node, "fontname"] = "Comic Sans MS", n%2 = 1
end
 
def output = ::std::display::graphviz[graph]

Different Layout Engines

The next example shows how the same graph can be displayed using different engines. You can begin with an example using the default (i.e., "dot") engine for an artificial graph with 20 nodes:

// read query
 
module graph
    def node = n : range(1,20,1,n)
    def edge(a in node, b in node) = a%b = 0 and a!=b
end
 
def output = ::std::display::graphviz[graph]

You can now plot the same graph in a radial format:

// read query
 
module graph
    def node = n : range(1,20,1,n)
    def edge(a in node, b in node) = a%b = 0 and a!=b
    def layout = "circo"
end
 
def output = ::std::display::graphviz[graph]

Subgraphs

The subgraphs feature of Graphviz (opens in a new tab) can also be useful when you want to highlight subsets of the graph. Here is an example of a graph with ten nodes that separates out the nodes with odd IDs into their own subgraph, enclosed by a box:

// read query
 
module oddsubgraph
    def node = n : range(1,10,1,n) and n % 2 = 1
end
 
module biggraph
    def node = n : range(1,10,1,n)
    def edge(a in node, b in node) = a%b = 0 and a!=b
    def subgraph = "myoddsubgraph", oddsubgraph
end
 
def output = ::std::display::graphviz[biggraph]

Clustered Subgraphs

The subgraphs feature also allows you to display clustered (that is, hierarchically grouped) graphs. The next example defines two small graphs, with a parent relation between them, and then specifies that they are part of a larger graph:

// read query
 
module subgraph1
    def node = n : range(10,16,1,n)
    def edge(a in node, b in node) = a % b = 3 or a % b = 5
end
 
module subgraph2
    def node = n : range(2,6,1,n)
    def edge(a in node, b in node) = a % b = 2 or a % b = 4
    def parent = "subgraph1"
end
 
module combinedgraph
    def node = n : range(6,11,1,n)
    def edge(a in node, b in node) = a % b = 2 or a % b = 3
    def subgraph = "subgraph1", subgraph1
    def subgraph = "subgraph2", subgraph2
end
 
def output = ::std::display::graphviz[combinedgraph]

Test Case: Visualizing PageRank

The next example computes and visualizes the PageRank (opens in a new tab) values of a graph. At a high level, PageRank captures the popularity of a set of interlinked Web pages, which can be represented as a graph. This guide doesn’t go into the details of implementing PageRank in Rel. You can implement PageRank using a standard matrix multiplication approach. For an alternative recursive implementation of PageRank, see Asymptotic Convergence: PageRank.

You can begin by defining the graph that you will later compute PageRank for:

// model
 
module graph
    def edge = {
        ("books.com", "chess.com"); ("chess.com", "books.com");
        ("chess.com", "chess.com");
        ("docker.com", "amazon.com"); ("docker.com", "books.com");
        ("edmunds.com", "f.org"); ("f.org", "edmunds.com");
        ("edmunds.com", "books.com"); ("edmunds.com", "docker.com");
        ("f.org", "books.com");
        ("groupon.com", "books.com"); ("groupon.com", "edmunds.com");
        ("health.gov", "books.com"); ("health.gov", "edmunds.com");
        ("itunes.com", "books.com"); ("itunes.com", "edmunds.com");
        ("jobs.com", "edmunds.com"); ("kbb.com", "edmunds.com");
    }
 
    def node = x : edge(x, _) or edge(_, x)
end

// read query
 
::std::display::graphviz[graph]

PageRank assigns “importance” from one page to other pages that it points to, and it does so recursively until convergence. In this approach, however, Web pages that don’t have outgoing edges (called sink nodes) are problematic and are therefore considered to point to all other pages. You can add these additional edges to your graph from nodes that don’t have any sink nodes. You can draw the additional edges in gray.

PageRank also assigns “importance” to the outgoing pages of a given Web page, where the “importance” is proportional to the number of outgoing links. For example, if a Web page has two outgoing links, each of the two Web pages it points to will get 0.5 of the “importance”. In addition to the new edges, you can display on each edge the fraction of the “importance” that will be going from the Web page to the other pages it points to:

// model
 
module graph_nosink
    def node = graph:node
    def edge = graph:edge
    def sink(n in node) = empty(graph:edge[n])
    def edge(a in node, b in node) = sink(a)
    def edge_attribute[a in sink, b in node , "color"] = "gray"
    def edge_attribute[
        a in graph_nosink:node,
        b in graph_nosink:edge[a],
        "label"
    ] = decimal[64, 3, 1.0 / count[graph_nosink:edge[a]]]
end

// read query
 
::std::display::graphviz[graph_nosink]

Note that the layout has changed. This is because Graphviz (opens in a new tab) is attempting to lay out the graph as best as possible so that everything is clearly visible.

As a next step, you can compute the PageRank of the nodes – Web pages — in your graph:

// model
 
// Helper relations on vectors
@inline def norm[V] = sqrt[sum[i in domain[V]: V[i]^2]]
@inline def mult[M, V][j] = sum[M[i,j] * V[i] for i]
@inline def vdiff[V1, V2][x] = V1[x] - V2[x]
@inline def vdistance[V1,V2] = norm[vdiff[V1,V2]]
 
// Setup: definition of graph and parameters
def edge = graph_nosink:edge
def node = graph_nosink:node
def numnodes = count[node]
def damping = 0.85
def outdegree[n] = count[edge[n]]
def k = 64	// Number of iterations
 
// Matrix formulation
def prmatrix[a in node, b in node] =
    damping/n + (1-damping)/numnodes
    from n where edge(a,b) and outdegree[a] = n
def prmatrix[a in node, b in node] =
    (1.0-damping)/numnodes, not edge(a,b)
 
def rank[0, i in node] = 1.0/numnodes
def rank[iteration in range[1, k, 1]] = mult[prmatrix, rank[iteration-1]]
def pagerank = rank[k]

You can now take a look at the computed rank values and check that the rank values sum up to 1.0 (or close to it):

// read query
 
pagerank

// read query
 
sum[pagerank]

You can now visualize the PageRank results for your original graph. As a first example, you can change the size of the node according to its PageRank, i.e., the higher the value the larger the size of the node:

// model
 
module graph_PR_size
    def node = graph:node
    def edge = graph:edge
 
    def node_attribute[n in graph_PR_size:node, "width"] = 10*pagerank[n]
    def node_attribute[n in graph_PR_size:node, "height"] = 10*pagerank[n]
    def node_attribute[n in graph_PR_size:node, "fontsize"] = 30
end

// read query
 
def output = ::std::display::graphviz[graph_PR_size]

Since all ranks are less than 1.0, the example above multiplied them by 10 in order to have useful numbers for sizes.

As a second example, you can paint each node with a different shade depending on its PageRank. You can use darker blue for nodes with higher values and lighter hues of blue for nodes with lower values. For this, you can use the blue9 Graphviz color scheme (opens in a new tab). This color scheme has nine shades of blue. To paint the nodes of your graph accordingly, you can sort them based on their PageRank values and then scale their ranks in the range 1 to 9. Finally, for better visibility, you can change the font of the nodes to white (i.e., 1 in the blue9 color scheme (opens in a new tab)) for the darker hues:

// model
 
module graph_PR_color
    def num_of_colors = 9
    def node = graph:node
    def edge = graph:edge
    def hues = p, i : sort[transpose[pagerank]](i, _, p)
    def attribute:node = {"style", "filled"; "colorscheme", "blues9";}
    def node_attribute[n in graph_PR_color:node, "fillcolor"] =
        floor[
            1+(num_of_colors-1)*(graph_PR_color:hues[n]-1)/(numnodes - 1)
        ]
    def node_attribute[
        n in (m : graph_PR_color:hues[m] > num_of_colors/2), "fontcolor"
    ] = 1
end

// read query
 
def output = ::std::display::graphviz[graph_PR_color]

Handling Large Graphs

There are limits on how useful a Graphviz rendering of a very large graph can be. In cases like this, you can use Rel to extract a subgraph of interest and render that instead.

For example:

// model
 
module largegraph
    def node = range[1,10000,1]
    def edge(x in node, y in node) = x != y, y % x = 0
end
 
module smallsubset
    def node = 150; 200
    def edge = largegraph:edge :> node
end

Rel’s suffix_join operator, :>, is useful to restrict the set of edges to those originating in the chosen subset of nodes.

The first graph has 100,000 nodes and 83,668 edges; the second is much smaller.

// read query
 
def output = ::std::display::graphviz[smallsubset]

Summary

This guide has shown how to generate graph visualizations with Rel. Rel, combined with Graphviz, provides a powerful and convenient tool for visualizing and exploring data. More information on Rel can be found in the Rel Language Reference. More information on the different types of Graphviz charts and their parameters can be found in the Graphviz documentation (opens in a new tab).