Data Visualization: Graphviz
This howto guide demonstrates how to create graphical representations of data with Rel and Graphviz.
Goal
This howto guide shows you how to visualize your graph data in Rel using GraphViz and the RAI notebook.
Introduction
In this howto guide, learn how to use Graphviz from within Rel in order to visualize graph data. The code presented here can be easily adapted to different kinds of chart types and graphs.
Creating a chart using Rel and Graphviz is very simple.
All you have to do is use the appropriate chart type,
specifically graphviz
, and apply it to some data, which in Rel take the form of a relation, representing a graph.
// query
// define graph relation
module graph
def node = {"A"; "B"; "C"; "D"}
def edge = {("A", "B") ; ("B", "C"); ("C", "D"); ("A", "D")}
end
// plot it
def output = graphviz[graph]
The following sections discuss in more detail how to define and use graph data from different sources as well as how to configure and plot different graphs over the data.
Note that the code presented in this howto guide produces graphical output in the RelationalAI notebook. Everywhere else, for example, RelationalAI SDK, relations are returned in the standard, nongraphical form.
Graph Options
Consider the different options available for plotting graphs with Graphviz.
At a high level, graphviz
needs a module graph
as shown in the previous section.
As a minimum, graph
contains information on the nodes (the graph:node
relation) and the edges (the graph:edge
relation).
The default output is a directed graph containing the given nodes and edges.
To create an undirected graph, you need to specify additional options to graphviz
, again, through the graph
module.
For example, here is the same graph data plotted as an undirected graph:
// query
// define graph relation
module graph
def node = {"A"; "B"; "C"; "D"}
def edge = {("A", "B") ; ("B", "C"); ("C", "D"); ("A", "D")}
def directed = boolean_false
end
// plot it
def output = graphviz[graph]
Graphviz provides a variety of parameters used to configure the graph visualization.
This section gives an overview of the parameters that can be added to the graph
module,
followed by a set of examples for drawing different graphs.
Key  Description 

:node  A unary relation of node_id identifiers (usually strings or integers) that represent the node IDs. 
:edge  A binary relation of edges, represented as (from , to ) pairs of node IDs. from and to need to match an identifier in the :node relation. 
:node_attribute  A ternary relation of node attributes as (node_id , attribute , value ), where node_id matches an identifier in the :node relation. 
:edge_attribute  An arity4 relation of edge attributes as (from , to , attribute , value ), where (from , to ) matches an identifier pair in the :edge relation. 
:attribute:graph  A binary relation of graph attributes as (attribute , value ). These attributes serve as the default for the graph/subgraphs. 
:attribute:node  A binary relation of node attributes as (attribute , value ). These attributes serve as the default for all nodes in the graph. 
:attribute:edge  A binary relation of edge attributes as (attribute , value ). These attributes serve as the default for all edges in the graph. 
:directed  A boolean (boolean_true or boolean_false ) indicating whether the graph is directed or undirected. The default value is boolean_true , i.e., directed. 
:clustered  A boolean (boolean_true or boolean_false ) indicating whether the subgraphs in the graph should have their ID prepended with cluster_ so that the subgraphs are each rendered in a bounding box. The default is boolean_true , i.e., clustered. 
:subgraph  A relation with an identifier followed by a graph definition (:node , :edge , :attribute , :node_attribute , :edge_attribute ). There are the following exceptions:

:layout  A string indicating the layout engine to use. Valid options are:

For the full list of supported attributes and values, see the Graphviz Attribute Documentation.
Feature Examples
Node Width, Font, and Color
The first example creates a small graph of five nodes and varies the size and color of the nodes.
More specifically, the width of the nodes increases with the ID of the node.
EvenID nodes are painted blue and oddID nodes are painted red.
Finally, the font size of the red nodes is increased to 40
and their font changed to "Comic Sans MS"
.
// query
module graph
def node = range[1,5,1]
def edge(a in node, b in node) = b = a + 1
def node_attribute[n in graph:node, "width"] = n
def node_attribute[n in graph:node, "color"] = "blue", n%2 = 0
def node_attribute[n in graph:node, "color"] = "red", n%2 = 1
def node_attribute[n in graph:node, "fontsize"] = "40", n%2 = 1
def node_attribute[n in graph:node, "fontname"] = "Comic Sans MS", n%2 = 1
end
def output = graphviz[graph]
Different Layout Engines
The next example shows how the same graph can be displayed using different engines.
You can begin with an example using the default (i.e., "dot"
) engine for an artificial graph with 20
nodes:
// query
module graph
def node = n : range(1,20,1,n)
def edge(a in node, b in node) = a%b = 0 and a!=b
end
def output = graphviz[graph]
You can now plot the same graph in a radial format:
// query
module graph
def node = n : range(1,20,1,n)
def edge(a in node, b in node) = a%b = 0 and a!=b
def layout = "circo"
end
def output = graphviz[graph]
Subgraphs
The subgraphs
feature of Graphviz can also be useful when you want to
highlight subsets of the graph.
Here is an example of a graph with ten nodes that separates out the nodes with odd IDs
into their own subgraph, enclosed by a box:
// query
module oddsubgraph
def node = n : range(1,10,1,n) and n % 2 = 1
end
module biggraph
def node = n : range(1,10,1,n)
def edge(a in node, b in node) = a%b = 0 and a!=b
def subgraph = "myoddsubgraph", oddsubgraph
end
def output = graphviz[biggraph]
Clustered Subgraphs
The subgraphs
feature also allows you to display clustered (that is, hierarchically grouped) graphs.
The next example defines two small graphs, with a parent relation between them,
and then specifies that they are part of a larger graph:
// query
module subgraph1
def node = n : range(10,16,1,n)
def edge(a in node, b in node) = a % b = 3 or a % b = 5
end
module subgraph2
def node = n : range(2,6,1,n)
def edge(a in node, b in node) = a % b = 2 or a % b = 4
def parent = "subgraph1"
end
module combinedgraph
def node = n : range(6,11,1,n)
def edge(a in node, b in node) = a % b = 2 or a % b = 3
def subgraph = "subgraph1", subgraph1
def subgraph = "subgraph2", subgraph2
end
def output = graphviz[combinedgraph]
Test Case: Visualizing PageRank
The next example computes and visualizes the PageRank values of a graph. At a high level, PageRank captures the popularity of a set of interlinked Web pages, which can be represented as a graph. This guide doesn’t go into the details of implementing PageRank in Rel. You can implement PageRank using a standard matrix multiplication approach. For an alternative recursive implementation of PageRank, see the Asymptotic Convergence: PageRank section of the Rel Recursion Concept Guide.
You can begin by defining the graph that you will later compute PageRank for:
// install
module graph
def edge = {
("books.com", "chess.com"); ("chess.com", "books.com");
("chess.com", "chess.com");
("docker.com", "amazon.com"); ("docker.com", "books.com");
("edmunds.com", "f.org"); ("f.org", "edmunds.com");
("edmunds.com", "books.com"); ("edmunds.com", "docker.com");
("f.org", "books.com");
("groupon.com", "books.com"); ("groupon.com", "edmunds.com");
("health.gov", "books.com"); ("health.gov", "edmunds.com");
("itunes.com", "books.com"); ("itunes.com", "edmunds.com");
("jobs.com", "edmunds.com"); ("kbb.com", "edmunds.com");
}
def node = x : edge(x, _) or edge(_, x)
end
// query
graphviz[graph]
PageRank assigns “importance” from one page to other pages that it points to, and it does so recursively until convergence. In this approach, however, Web pages that don’t have outgoing edges (called sink nodes) are problematic and are therefore considered to point to all other pages. You can add these additional edges to your graph from nodes that don’t have any sink nodes. You can draw the additional edges in gray.
PageRank also assigns “importance” to the outgoing pages of a given Web page, where the “importance” is proportional to the number of outgoing links.
For example, if a Web page has two outgoing links, each of the two Web pages it points to will get 0.5
of the “importance”.
In addition to the new edges, you can display on each edge the fraction of the “importance” that will be going from the Web page to the other pages it points to:
// install
module graph_nosink
def node = graph:node
def edge = graph:edge
def sink(n in node) = empty(graph:edge[n])
def edge(a in node, b in node) = sink(a)
def edge_attribute[a in sink, b in node , "color"] = "gray"
def edge_attribute[
a in graph_nosink:node,
b in graph_nosink:edge[a],
"label"
] = decimal[64, 3, 1.0 / count[graph_nosink:edge[a]]]
end
// query
graphviz[graph_nosink]
Note that the layout has changed. This is because Graphviz is attempting to lay out the graph as best as possible so that everything is clearly visible.
As a next step, you can compute the PageRank of the nodes – Web pages — in your graph:
// install
// helper relations on vectors
@inline def norm[V] = sqrt[sum[i in domain[V]: V[i]^2]]
@inline def mult[M, V][j] = sum[M[i,j] * V[i] for i]
@inline def vdiff[V1, V2][x] = V1[x]  V2[x]
@inline def vdistance[V1,V2] = norm[vdiff[V1,V2]]
// setup: definition of graph and parameters
def edge = graph_nosink:edge
def node = graph_nosink:node
def numnodes = count[node]
def damping = 0.85
def outdegree[n] = count[edge[n]]
def k = 64 // number of iterations
// matrix formulation
def prmatrix[a in node, b in node] =
damping/n + (1damping)/numnodes
from n where edge(a,b) and outdegree[a] = n
def prmatrix[a in node, b in node] =
(1.0damping)/numnodes, not edge(a,b)
def rank[0, i in node] = 1.0/numnodes
def rank[iteration in range[1, k, 1]] = mult[prmatrix, rank[iteration1]]
def pagerank = rank[k]
You can now take a look at the computed rank values and check that the rank values sum up to 1.0
(or close to it):
// query
pagerank
// query
sum[pagerank]
You can now visualize the PageRank results for your original graph. As a first example, you can change the size of the node according to its PageRank, i.e., the higher the value the larger the size of the node:
// install
module graph_PR_size
def node = graph:node
def edge = graph:edge
def node_attribute[n in graph_PR_size:node, "width"] = 10*pagerank[n]
def node_attribute[n in graph_PR_size:node, "height"] = 10*pagerank[n]
def node_attribute[n in graph_PR_size:node, "fontsize"] = 30
end
// query
def output = graphviz[graph_PR_size]
Since all ranks are less than 1.0
, the example above multiplied them by 10
in order to have useful numbers for sizes.
As a second example, you can paint each node with a different shade depending on its PageRank.
You can use darker blue for nodes with higher values and lighter hues of blue for nodes with lower values.
For this, you can use the blue9 Graphviz color scheme.
This color scheme has nine shades of blue.
To paint the nodes of your graph accordingly, you can sort them based on their PageRank values and then scale their ranks in the range 1
to 9
.
Finally, for better visibility, you can change the font of the nodes to white (i.e., 1
in the blue9 color scheme) for the darker hues:
// install
module graph_PR_color
def num_of_colors = 9
def node = graph:node
def edge = graph:edge
def hues = p, i : sort[transpose[pagerank]](i, _, p)
def attribute:node = {"style", "filled"; "colorscheme", "blues9";}
def node_attribute[n in graph_PR_color:node, "fillcolor"] =
floor[
1+(num_of_colors1)*(graph_PR_color:hues[n]1)/(numnodes  1)
]
def node_attribute[
n in (m : graph_PR_color:hues[m] > num_of_colors/2), "fontcolor"
] = 1
end
// query
def output = graphviz[graph_PR_color]
Handling Large Graphs
There are limits on how useful a Graphviz rendering of a very large graph can be. In cases like this, you can use Rel to extract a subgraph of interest and render that instead.
For example:
// install
module largegraph
def node = range[1,10000,1]
def edge(x in node, y in node) = x != y, y % x = 0
end
module smallsubset
def node = 150; 200
def edge = largegraph:edge :> node
end
Rel’s suffix_join
operator, :>
, is useful to restrict the set of edges to those originating in the chosen
subset of nodes.
The first graph has 100,000 nodes and 83,668 edges; the second is much smaller.
// query
def output = graphviz[smallsubset]
Summary
This guide has shown how to generate graph visualizations with Rel. Rel, combined with Graphviz, provides a powerful and convenient tool for visualizing and exploring data. More information on Rel can be found in the Rel Language Reference. More information on the different types of Graphviz charts and their parameters can be found in the Graphviz documentation.
See Also
To make charts, Rel also provides a relational interface to the VegaLite library, described in the VegaLite Gallery HowTo Guide.
If you need to import data into Rel, see the CSV Import Guide and the JSON Import and Export Guide.