Graphviz
This how-to guide demonstrates how to create graphical representations of data with Rel and Graphviz.
Goal
This how-to guide shows you how to visualize your graph data in Rel using GraphViz and RAI worksheets.
Introduction
In this how-to guide, learn how to use Graphviz (opens in a new tab) from within Rel in order to visualize graph data. The code presented here can be easily adapted to different kinds of chart types and graphs.
Creating a chart using Rel and Graphviz (opens in a new tab) is very simple.
All you have to do is use the appropriate chart type,
specifically graphviz
, and apply it to some data, which in Rel take the form of a relation, representing a graph.
// read query
// Define the `graph` relation.
module graph
def node = {"A"; "B"; "C"; "D"}
def edge = {("A", "B") ; ("B", "C"); ("C", "D"); ("A", "D")}
end
// Plot it.
def output = ::std::display::graphviz[graph]
The following sections discuss in more detail how to define and use graph data from different sources as well as how to configure and plot different graphs over the data.
Note that the code presented in this how-to guide produces graphical output for the RAI Console worksheets. Everywhere else, for example, the RelationalAI SDKs, relations are returned in the standard, non-graphical form.
Graph Options
Consider the different options available for plotting graphs with Graphviz (opens in a new tab).
At a high level, graphviz
needs a module graph
as shown in the previous section.
As a minimum, graph
contains information on the nodes (the graph:node
relation) and the edges (the graph:edge
relation).
The default output is a directed graph containing the given nodes and edges.
To create an undirected graph, you need to specify additional options to graphviz
, again, through the graph
module.
For example, here is the same graph data plotted as an undirected graph:
// read query
// Define the `graph` relation.
module graph
def node = {"A"; "B"; "C"; "D"}
def edge = {("A", "B") ; ("B", "C"); ("C", "D"); ("A", "D")}
def directed = boolean_false
end
// Plot it.
def output = ::std::display::graphviz[graph]
Graphviz (opens in a new tab) provides a variety of parameters used to configure the graph visualization.
This section gives an overview of the parameters that can be added to the graph
module,
followed by a set of examples for drawing different graphs.
Key | Description |
---|---|
:node | A unary relation of node_id identifiers (usually strings or integers) that represent the node IDs. |
:edge | A binary relation of edges, represented as (from , to ) pairs of node IDs. from and to need to match an identifier in the :node relation. |
:node_attribute | A ternary relation of node attributes as (node_id , attribute , value ), where node_id matches an identifier in the :node relation. |
:edge_attribute | An arity-4 relation of edge attributes as (from , to , attribute , value ), where (from , to ) matches an identifier pair in the :edge relation. |
:attribute:graph | A binary relation of graph attributes as (attribute , value ). These attributes serve as the default for the graph/subgraphs. |
:attribute:node | A binary relation of node attributes as (attribute , value ). These attributes serve as the default for all nodes in the graph. |
:attribute:edge | A binary relation of edge attributes as (attribute , value ). These attributes serve as the default for all edges in the graph. |
:directed | A boolean (boolean_true or boolean_false ) indicating whether the graph is directed or undirected. The default value is boolean_true , i.e., directed. |
:clustered | A boolean (boolean_true or boolean_false ) indicating whether the subgraphs in the graph should have their ID prepended with cluster_ so that the subgraphs are each rendered in a bounding box. The default is boolean_true , i.e., clustered. |
:subgraph | A relation with an identifier followed by a graph definition (:node , :edge , :attribute , :node_attribute , :edge_attribute ). There are the following exceptions:
|
:layout | A string indicating the layout engine (opens in a new tab) to use. Valid options are:
|
For the full list of supported attributes and values, see the Graphviz Attribute Documentation (opens in a new tab).
Feature Examples
Node Width, Font, and Color
The first example creates a small graph of five nodes and varies the size and color of the nodes.
More specifically, the width of the nodes increases with the ID of the node.
Even-ID nodes are painted blue and odd-ID nodes are painted red.
Finally, the font size of the red nodes is increased to 40
and their font changed to "Comic Sans MS"
.
// read query
module graph
def node = range[1,5,1]
def edge(a in node, b in node) = b = a + 1
def node_attribute[n in graph:node, "width"] = n
def node_attribute[n in graph:node, "color"] = "blue", n%2 = 0
def node_attribute[n in graph:node, "color"] = "red", n%2 = 1
def node_attribute[n in graph:node, "fontsize"] = "40", n%2 = 1
def node_attribute[n in graph:node, "fontname"] = "Comic Sans MS", n%2 = 1
end
def output = ::std::display::graphviz[graph]
Different Layout Engines
The next example shows how the same graph can be displayed using different engines.
You can begin with an example using the default (i.e., "dot"
) engine for an artificial graph with 20
nodes:
// read query
module graph
def node = n : range(1,20,1,n)
def edge(a in node, b in node) = a%b = 0 and a!=b
end
def output = ::std::display::graphviz[graph]
You can now plot the same graph in a radial format:
// read query
module graph
def node = n : range(1,20,1,n)
def edge(a in node, b in node) = a%b = 0 and a!=b
def layout = "circo"
end
def output = ::std::display::graphviz[graph]
Subgraphs
The subgraphs
feature of Graphviz (opens in a new tab) can also be useful when you want to
highlight subsets of the graph.
Here is an example of a graph with ten nodes that separates out the nodes with odd IDs
into their own subgraph, enclosed by a box:
// read query
module oddsubgraph
def node = n : range(1,10,1,n) and n % 2 = 1
end
module biggraph
def node = n : range(1,10,1,n)
def edge(a in node, b in node) = a%b = 0 and a!=b
def subgraph = "myoddsubgraph", oddsubgraph
end
def output = ::std::display::graphviz[biggraph]
Clustered Subgraphs
The subgraphs
feature also allows you to display clustered (that is, hierarchically grouped) graphs.
The next example defines two small graphs, with a parent relation between them,
and then specifies that they are part of a larger graph:
// read query
module subgraph1
def node = n : range(10,16,1,n)
def edge(a in node, b in node) = a % b = 3 or a % b = 5
end
module subgraph2
def node = n : range(2,6,1,n)
def edge(a in node, b in node) = a % b = 2 or a % b = 4
def parent = "subgraph1"
end
module combinedgraph
def node = n : range(6,11,1,n)
def edge(a in node, b in node) = a % b = 2 or a % b = 3
def subgraph = "subgraph1", subgraph1
def subgraph = "subgraph2", subgraph2
end
def output = ::std::display::graphviz[combinedgraph]
Test Case: Visualizing PageRank
The next example computes and visualizes the PageRank (opens in a new tab) values of a graph. At a high level, PageRank captures the popularity of a set of interlinked Web pages, which can be represented as a graph. This guide doesn’t go into the details of implementing PageRank in Rel. You can implement PageRank using a standard matrix multiplication approach. For an alternative recursive implementation of PageRank, see Asymptotic Convergence: PageRank.
You can begin by defining the graph that you will later compute PageRank for:
// model
module graph
def edge = {
("books.com", "chess.com"); ("chess.com", "books.com");
("chess.com", "chess.com");
("docker.com", "amazon.com"); ("docker.com", "books.com");
("edmunds.com", "f.org"); ("f.org", "edmunds.com");
("edmunds.com", "books.com"); ("edmunds.com", "docker.com");
("f.org", "books.com");
("groupon.com", "books.com"); ("groupon.com", "edmunds.com");
("health.gov", "books.com"); ("health.gov", "edmunds.com");
("itunes.com", "books.com"); ("itunes.com", "edmunds.com");
("jobs.com", "edmunds.com"); ("kbb.com", "edmunds.com");
}
def node = x : edge(x, _) or edge(_, x)
end
// read query
::std::display::graphviz[graph]
PageRank assigns “importance” from one page to other pages that it points to, and it does so recursively until convergence. In this approach, however, Web pages that don’t have outgoing edges (called sink nodes) are problematic and are therefore considered to point to all other pages. You can add these additional edges to your graph from nodes that don’t have any sink nodes. You can draw the additional edges in gray.
PageRank also assigns “importance” to the outgoing pages of a given Web page, where the “importance” is proportional to the number of outgoing links.
For example, if a Web page has two outgoing links, each of the two Web pages it points to will get 0.5
of the “importance”.
In addition to the new edges, you can display on each edge the fraction of the “importance” that will be going from the Web page to the other pages it points to:
// model
module graph_nosink
def node = graph:node
def edge = graph:edge
def sink(n in node) = empty(graph:edge[n])
def edge(a in node, b in node) = sink(a)
def edge_attribute[a in sink, b in node , "color"] = "gray"
def edge_attribute[
a in graph_nosink:node,
b in graph_nosink:edge[a],
"label"
] = decimal[64, 3, 1.0 / count[graph_nosink:edge[a]]]
end
// read query
::std::display::graphviz[graph_nosink]
Note that the layout has changed. This is because Graphviz (opens in a new tab) is attempting to lay out the graph as best as possible so that everything is clearly visible.
As a next step, you can compute the PageRank of the nodes – Web pages — in your graph:
// model
// Helper relations on vectors
@inline def norm[V] = sqrt[sum[i in domain[V]: V[i]^2]]
@inline def mult[M, V][j] = sum[M[i,j] * V[i] for i]
@inline def vdiff[V1, V2][x] = V1[x] - V2[x]
@inline def vdistance[V1,V2] = norm[vdiff[V1,V2]]
// Setup: definition of graph and parameters
def edge = graph_nosink:edge
def node = graph_nosink:node
def numnodes = count[node]
def damping = 0.85
def outdegree[n] = count[edge[n]]
def k = 64 // Number of iterations
// Matrix formulation
def prmatrix[a in node, b in node] =
damping/n + (1-damping)/numnodes
from n where edge(a,b) and outdegree[a] = n
def prmatrix[a in node, b in node] =
(1.0-damping)/numnodes, not edge(a,b)
def rank[0, i in node] = 1.0/numnodes
def rank[iteration in range[1, k, 1]] = mult[prmatrix, rank[iteration-1]]
def pagerank = rank[k]
You can now take a look at the computed rank values and check that the rank values sum up to 1.0
(or close to it):
// read query
pagerank
// read query
sum[pagerank]
You can now visualize the PageRank results for your original graph. As a first example, you can change the size of the node according to its PageRank, i.e., the higher the value the larger the size of the node:
// model
module graph_PR_size
def node = graph:node
def edge = graph:edge
def node_attribute[n in graph_PR_size:node, "width"] = 10*pagerank[n]
def node_attribute[n in graph_PR_size:node, "height"] = 10*pagerank[n]
def node_attribute[n in graph_PR_size:node, "fontsize"] = 30
end
// read query
def output = ::std::display::graphviz[graph_PR_size]
Since all ranks are less than 1.0
, the example above multiplied them by 10
in order to have useful numbers for sizes.
As a second example, you can paint each node with a different shade depending on its PageRank.
You can use darker blue for nodes with higher values and lighter hues of blue for nodes with lower values.
For this, you can use the blue9 Graphviz color scheme (opens in a new tab).
This color scheme has nine shades of blue.
To paint the nodes of your graph accordingly, you can sort them based on their PageRank values and then scale their ranks in the range 1
to 9
.
Finally, for better visibility, you can change the font of the nodes to white (i.e., 1
in the blue9 color scheme (opens in a new tab)) for the darker hues:
// model
module graph_PR_color
def num_of_colors = 9
def node = graph:node
def edge = graph:edge
def hues = p, i : sort[transpose[pagerank]](i, _, p)
def attribute:node = {"style", "filled"; "colorscheme", "blues9";}
def node_attribute[n in graph_PR_color:node, "fillcolor"] =
floor[
1+(num_of_colors-1)*(graph_PR_color:hues[n]-1)/(numnodes - 1)
]
def node_attribute[
n in (m : graph_PR_color:hues[m] > num_of_colors/2), "fontcolor"
] = 1
end
// read query
def output = ::std::display::graphviz[graph_PR_color]
Handling Large Graphs
There are limits on how useful a Graphviz rendering of a very large graph can be. In cases like this, you can use Rel to extract a subgraph of interest and render that instead.
For example:
// model
module largegraph
def node = range[1,10000,1]
def edge(x in node, y in node) = x != y, y % x = 0
end
module smallsubset
def node = 150; 200
def edge = largegraph:edge :> node
end
Rel’s suffix_join
operator, :>
, is useful to restrict the set of edges to those originating in the chosen
subset of nodes.
The first graph has 100,000 nodes and 83,668 edges; the second is much smaller.
// read query
def output = ::std::display::graphviz[smallsubset]
Summary
This guide has shown how to generate graph visualizations with Rel. Rel, combined with Graphviz, provides a powerful and convenient tool for visualizing and exploring data. More information on Rel can be found in the Rel Language Reference. More information on the different types of Graphviz charts and their parameters can be found in the Graphviz documentation (opens in a new tab).
See Also
To make charts, Rel also provides a relational interface to the Vega-Lite (opens in a new tab) library, described in the Vega-Lite Gallery how-to guide.
If you need to import data into Rel, see the CSV Import and the JSON Import guides.