• REL
• HOW-TO GUIDES
• Data Visualization: Graphviz

Data Visualization: Graphviz

This how-to guide demonstrates how to create graphical representations of data with Rel and Graphviz.

Goal

This how-to guide shows you how to visualize your graph data in Rel using GraphViz and the RAI notebook.

Introduction

In this how-to guide, learn how to use Graphviz from within Rel in order to visualize graph data. The code presented here can be easily adapted to different kinds of chart types and graphs.

Creating a chart using Rel and Graphviz is very simple. All you have to do is use the appropriate chart type, specifically graphviz, and apply it to some data, which in Rel take the form of a relation, representing a graph.

// query

// define graph relation
module graph
def node = {"A"; "B"; "C"; "D"}
def edge = {("A", "B") ; ("B", "C"); ("C", "D"); ("A", "D")}
end

// plot it
def output = graphviz[graph]

The following sections discuss in more detail how to define and use graph data from different sources as well as how to configure and plot different graphs over the data.

Note that the code presented in this how-to guide produces graphical output in the RelationalAI notebook. Everywhere else, for example, RelationalAI SDK, relations are returned in the standard, non-graphical form.

Graph Options

Consider the different options available for plotting graphs with Graphviz.

At a high level, graphviz needs a module graph as shown in the previous section. As a minimum, graph contains information on the nodes (the graph:node relation) and the edges (the graph:edge relation). The default output is a directed graph containing the given nodes and edges. To create an undirected graph, you need to specify additional options to graphviz, again, through the graph module. For example, here is the same graph data plotted as an undirected graph:

// query

// define graph relation
module graph
def node = {"A"; "B"; "C"; "D"}
def edge = {("A", "B") ; ("B", "C"); ("C", "D"); ("A", "D")}
def directed = boolean_false
end

// plot it
def output = graphviz[graph]

Graphviz provides a variety of parameters used to configure the graph visualization. This section gives an overview of the parameters that can be added to the graph module, followed by a set of examples for drawing different graphs.

KeyDescription
:nodeA unary relation of node_id identifiers (usually strings or integers) that represent the node IDs.
:edgeA binary relation of edges, represented as (from, to) pairs of node IDs. from and to need to match an identifier in the :node relation.
:node_attributeA ternary relation of node attributes as (node_id, attribute, value), where node_id matches an identifier in the :node relation.
:edge_attributeAn arity-4 relation of edge attributes as (from, to, attribute, value), where (from, to) matches an identifier pair in the :edge relation.
:attribute:graphA binary relation of graph attributes as (attribute, value). These attributes serve as the default for the graph/subgraphs.
:attribute:nodeA binary relation of node attributes as (attribute, value). These attributes serve as the default for all nodes in the graph.
:attribute:edgeA binary relation of edge attributes as (attribute, value). These attributes serve as the default for all edges in the graph.
:directedA boolean (boolean_true or boolean_false) indicating whether the graph is directed or undirected. The default value is boolean_true, i.e., directed.
:clusteredA boolean (boolean_true or boolean_false) indicating whether the subgraphs in the graph should have their ID prepended with cluster_ so that the subgraphs are each rendered in a bounding box. The default is boolean_true, i.e., clustered.
:subgraphA relation with an identifier followed by a graph definition (:node, :edge, :attribute, :node_attribute, :edge_attribute). There are the following exceptions:
• :directed is not supported because it applies to the whole graph and not a subgraph.
• :clustered is not supported because it applies to the whole graph and not a subgraph. If you want to only have a subset of the subgraphs be clusters, you should set the root key :clustered to boolean_false and prepend "cluster_" to the subgraph ID that you would like to be clusters.
• :subgraph cannot be recursively defined.
• :parent can be added only to subgraphs with the value set to the ID of another subgraph. This indicates that this subgraph should be nested within that parent subgraph.
:layoutA string indicating the layout engine to use. Valid options are:
• "dot": The dot engine creates hierarchical or layered directed graphs. The engine attempts to create groups without edge overlapping. This is the default engine.
• "neato": The neato engine creates spring model layouts.
• "twopi": The twopi engine creates radial layouts with nodes placed in concentric circles.
• "circo": The circo engine creates a circular layout. This engine is useful for graphs with cyclic structures, such as telecommunication networks.
• "fdp": The fdp engine creates spring model layouts similar to "neato".
• "osage": The osage engine draws clustered graphs.
• "patchwork": The patchwork engine draws clustered graphs using a squarified treemap layout.
• "sfdp": The sfdp engine is a multiscale version of "fdp" for large graphs.

For the full list of supported attributes and values, see the Graphviz Attribute Documentation.

Feature Examples

Node Width, Font, and Color

The first example creates a small graph of five nodes and varies the size and color of the nodes. More specifically, the width of the nodes increases with the ID of the node. Even-ID nodes are painted blue and odd-ID nodes are painted red. Finally, the font size of the red nodes is increased to 40 and their font changed to "Comic Sans MS".

// query

module graph
def node = range[1,5,1]
def edge(a in node, b in node) = b = a + 1
def node_attribute[n in graph:node, "width"] = n
def node_attribute[n in graph:node, "color"] = "blue", n%2 = 0
def node_attribute[n in graph:node, "color"] = "red", n%2 = 1
def node_attribute[n in graph:node, "fontsize"] = "40", n%2 = 1
def node_attribute[n in graph:node, "fontname"] = "Comic Sans MS", n%2 = 1
end

def output = graphviz[graph]

Different Layout Engines

The next example shows how the same graph can be displayed using different engines. You can begin with an example using the default (i.e., "dot") engine for an artificial graph with 20 nodes:

// query

module graph
def node = n : range(1,20,1,n)
def edge(a in node, b in node) = a%b = 0 and a!=b
end

def output = graphviz[graph]

You can now plot the same graph in a radial format:

// query

module graph
def node = n : range(1,20,1,n)
def edge(a in node, b in node) = a%b = 0 and a!=b
def layout = "circo"
end

def output = graphviz[graph]

Subgraphs

The subgraphs feature of Graphviz can also be useful when you want to highlight subsets of the graph. Here is an example of a graph with ten nodes that separates out the nodes with odd IDs into their own subgraph, enclosed by a box:

// query

module oddsubgraph
def node = n : range(1,10,1,n) and n % 2 = 1
end

module biggraph
def node = n : range(1,10,1,n)
def edge(a in node, b in node) = a%b = 0 and a!=b
def subgraph = "myoddsubgraph", oddsubgraph
end

def output = graphviz[biggraph]

Clustered Subgraphs

The subgraphs feature also allows you to display clustered (that is, hierarchically grouped) graphs. The next example defines two small graphs, with a parent relation between them, and then specifies that they are part of a larger graph:

// query

module subgraph1
def node = n : range(10,16,1,n)
def edge(a in node, b in node) = a % b = 3 or a % b = 5
end

module subgraph2
def node = n : range(2,6,1,n)
def edge(a in node, b in node) = a % b = 2 or a % b = 4
def parent = "subgraph1"
end

module combinedgraph
def node = n : range(6,11,1,n)
def edge(a in node, b in node) = a % b = 2 or a % b = 3
def subgraph = "subgraph1", subgraph1
def subgraph = "subgraph2", subgraph2
end

def output = graphviz[combinedgraph]

Test Case: Visualizing PageRank

The next example computes and visualizes the PageRank values of a graph. At a high level, PageRank captures the popularity of a set of interlinked Web pages, which can be represented as a graph. This guide doesn’t go into the details of implementing PageRank in Rel. You can implement PageRank using a standard matrix multiplication approach. For an alternative recursive implementation of PageRank, see the Asymptotic Convergence: PageRank section of the Rel Recursion Concept Guide.

You can begin by defining the graph that you will later compute PageRank for:

// install

module graph
def edge = {
("books.com", "chess.com"); ("chess.com", "books.com");
("chess.com", "chess.com");
("docker.com", "amazon.com"); ("docker.com", "books.com");
("edmunds.com", "f.org"); ("f.org", "edmunds.com");
("edmunds.com", "books.com"); ("edmunds.com", "docker.com");
("f.org", "books.com");
("groupon.com", "books.com"); ("groupon.com", "edmunds.com");
("health.gov", "books.com"); ("health.gov", "edmunds.com");
("itunes.com", "books.com"); ("itunes.com", "edmunds.com");
("jobs.com", "edmunds.com"); ("kbb.com", "edmunds.com");
}

def node = x : edge(x, _) or edge(_, x)
end
// query

graphviz[graph]

PageRank assigns “importance” from one page to other pages that it points to, and it does so recursively until convergence. In this approach, however, Web pages that don’t have outgoing edges (called sink nodes) are problematic and are therefore considered to point to all other pages. You can add these additional edges to your graph from nodes that don’t have any sink nodes. You can draw the additional edges in gray.

PageRank also assigns “importance” to the outgoing pages of a given Web page, where the “importance” is proportional to the number of outgoing links. For example, if a Web page has two outgoing links, each of the two Web pages it points to will get 0.5 of the “importance”. In addition to the new edges, you can display on each edge the fraction of the “importance” that will be going from the Web page to the other pages it points to:

// install

module graph_nosink
def node = graph:node
def edge = graph:edge
def sink(n in node) = empty(graph:edge[n])
def edge(a in node, b in node) = sink(a)
def edge_attribute[a in sink, b in node , "color"] = "gray"
def edge_attribute[
a in graph_nosink:node,
b in graph_nosink:edge[a],
"label"
] = decimal[64, 3, 1.0 / count[graph_nosink:edge[a]]]
end
// query

graphviz[graph_nosink]

Note that the layout has changed. This is because Graphviz is attempting to lay out the graph as best as possible so that everything is clearly visible.

As a next step, you can compute the PageRank of the nodes – Web pages — in your graph:

// install

// helper relations on vectors
@inline def norm[V] = sqrt[sum[i in domain[V]: V[i]^2]]
@inline def mult[M, V][j] = sum[M[i,j] * V[i] for i]
@inline def vdiff[V1, V2][x] = V1[x] - V2[x]
@inline def vdistance[V1,V2] = norm[vdiff[V1,V2]]

// setup: definition of graph and parameters
def edge = graph_nosink:edge
def node = graph_nosink:node
def numnodes = count[node]
def damping = 0.85
def outdegree[n] = count[edge[n]]
def k = 64	// number of iterations

// matrix formulation
def prmatrix[a in node, b in node] =
damping/n + (1-damping)/numnodes
from n where edge(a,b) and outdegree[a] = n
def prmatrix[a in node, b in node] =
(1.0-damping)/numnodes, not edge(a,b)

def rank[0, i in node] = 1.0/numnodes
def rank[iteration in range[1, k, 1]] = mult[prmatrix, rank[iteration-1]]
def pagerank = rank[k]

You can now take a look at the computed rank values and check that the rank values sum up to 1.0 (or close to it):

// query

pagerank
// query

sum[pagerank]

You can now visualize the PageRank results for your original graph. As a first example, you can change the size of the node according to its PageRank, i.e., the higher the value the larger the size of the node:

// install

module graph_PR_size
def node = graph:node
def edge = graph:edge

def node_attribute[n in graph_PR_size:node, "width"] = 10*pagerank[n]
def node_attribute[n in graph_PR_size:node, "height"] = 10*pagerank[n]
def node_attribute[n in graph_PR_size:node, "fontsize"] = 30
end
// query

def output = graphviz[graph_PR_size]

Since all ranks are less than 1.0, the example above multiplied them by 10 in order to have useful numbers for sizes.

As a second example, you can paint each node with a different shade depending on its PageRank. You can use darker blue for nodes with higher values and lighter hues of blue for nodes with lower values. For this, you can use the blue9 Graphviz color scheme. This color scheme has nine shades of blue. To paint the nodes of your graph accordingly, you can sort them based on their PageRank values and then scale their ranks in the range 1 to 9. Finally, for better visibility, you can change the font of the nodes to white (i.e., 1 in the blue9 color scheme) for the darker hues:

// install

module graph_PR_color
def num_of_colors = 9
def node = graph:node
def edge = graph:edge
def hues = p, i : sort[transpose[pagerank]](i, _, p)
def attribute:node = {"style", "filled"; "colorscheme", "blues9";}
def node_attribute[n in graph_PR_color:node, "fillcolor"] =
floor[
1+(num_of_colors-1)*(graph_PR_color:hues[n]-1)/(numnodes - 1)
]
def node_attribute[
n in (m : graph_PR_color:hues[m] > num_of_colors/2), "fontcolor"
] = 1
end
// query

def output = graphviz[graph_PR_color]

Handling Large Graphs

There are limits on how useful a Graphviz rendering of a very large graph can be. In cases like this, you can use Rel to extract a subgraph of interest and render that instead.

For example:

// install

module largegraph
def node = range[1,10000,1]
def edge(x in node, y in node) = x != y, y % x = 0
end

module smallsubset
def node = 150; 200
def edge = largegraph:edge :> node
end

Rel’s suffix_join operator, :>, is useful to restrict the set of edges to those originating in the chosen subset of nodes.

The first graph has 100,000 nodes and 83,668 edges; the second is much smaller.

// query

def output = graphviz[smallsubset]

Summary

This guide has shown how to generate graph visualizations with Rel. Rel, combined with Graphviz, provides a powerful and convenient tool for visualizing and exploring data. More information on Rel can be found in the Rel Language Reference. More information on the different types of Graphviz charts and their parameters can be found in the Graphviz documentation.