# Working With Algorithms

## Goal

This guide demonstrates how to execute graph algorithms and work with their results using RelationalAI’s SQL Graph Library for Snowflake.

## Introduction

After you create a graph, you may execute algorithms on that graph using the various functions described in the Overview of Graph Algorithms. This guide explains how to:

You can find several examples of the patterns in practice in the Examples section, including how to:

## Executing Algorithms

When you call an algorithm function from RAI’s SQL Graph Library, all of the computation happens in RAI — not in Snowflake. The function blocks until the results of the algorithm are returned to Snowflake from RAI.

This section describes the concepts associated with running graph algorithms in RAI from Snowflake.

### RAI Engines

Executing algorithms on a graph requires a RAI engine. You can either set the engine in a RAI context with the `use_rai_engine` procedure, or provide the name of the engine to the function’s `arguments` parameter:

``````-- Set the RAI context.
CALL RAI.use_rai_database('my_rai_db');
CALL RAI.use_rai_engine('my_rai_engine');

-- Create an undirected graph with three nodes and two edges.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
CALL RAI.create_data_stream('my_edges');
CALL RAI.create_graph('my_graph', 'my_edges');

-- Get the number of nodes in 'my_graph'.
-- The engine 'my_rai_engine' from the RAI context is used,
-- since no engine name is explicitly provided.
SELECT RAI.num_nodes('my_graph')
/*+---+
| 3 |
+---+  */

-- Get the number of edges in 'my_graph' using an engine that is different
-- from the engine set in the RAI context. The engine name 'my_other_rai_engine'
-- is explicitly provided to the function.
SELECT RAI.num_edges('my_graph', {'rai_engine': 'my_other_rai_engine'});
/*+---+
| 2 |
+---+  */``````

If no engine with the provided name exists, an error is returned and the algorithm is not executed. See Required Versus Optional Arguments for more information on function arguments.

### Scalar Versus Tabular Functions

Some algorithms return a scalar value — such as `min_degree`, which returns the minimum degree of a graph. Other functions return a table — such as `degree`, which returns the degree of each node. Tabular functions must be called with `SELECT * FROM TABLE(...)`, as in the following example:

``````-- Create an undirected graph with three nodes and two edges.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
CALL RAI.create_data_stream('my_edges');
CALL RAI.create_graph('my_graph', 'my_edges');

-- Compute the minimum degree of 'my_graph'.
-- The return value is a scalar.
SELECT RAI.min_degree('my_graph');
/*+---+
| 1 |
+---+  */

-- Find the degree of each node in 'my_graph'.
-- The return value is a table.
SELECT * FROM TABLE(RAI.degree('my_graph'));
/*+------+--------+
| NODE | DEGREE |
|------+--------+
| 1    | 1      |
| 2    | 2      |
| 3    | 1      |
+------+--------+ */``````

The table returned by tabular functions has named columns to help identify the meaning of each row. For example, the table returned by the `degree` function has a `node` column and a `degree` column, since each row pairs a node with its degree. See Results Schema for more information.

### Required Versus Optional Arguments

Every algorithm function has two parameters:

1. `graph_name`: the name of the graph on which to execute the algorithm.
2. `arguments`: a JSON object containing additional arguments.

The `graph_name` is always required. Some functions, like the `min_degree` and `degree` functions in the preceding section’s example, do not require an `arguments` object. Other algorithms, such as `shortest_path_length`, do require `arguments` to be provided:

``````-- Create an undirected graph with three nodes and two edges.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
CALL RAI.create_data_stream('my_edges');
CALL RAI.create_graph('my_graph', 'my_edges');

-- Compute the distance between node 1 and every other node in the graph.
-- shortest_path_length requires a 'source' node to be specified in the
-- arguments object.
SELECT * FROM TABLE(RAI.shortest_path_length('my_graph', {'source': 1}));
/*+--------+--------+--------+
| SOURCE | TARGET | LENGTH |
|--------+--------+--------|
| 1      | 1      | 0      |
| 1      | 2      | 1      |
| 1      | 3      | 2      |
+--------+--------+--------+ */``````

If you call a function that requires an `arguments` object with only the graph name, Snowflake displays an error:

``````-- The required 'source' node is missing, so an error is returned.
SELECT * FROM TABLE(RAI.shortest_path_length('my_graph'));
/* +----------------------------------------------------------------+
| Invalid argument types for function 'RAI.SHORTEST_PATH_LENGTH' |
+----------------------------------------------------------------+ */``````
🔎

Required arguments for each function are documented in the SQL Library Reference. You may access each algorithm’s reference documentation by clicking the function name in the Overview of Graph Algorithms.

### Array Arguments

In many cases, arguments that accept a node — such as the `shortest_path_length` function’s `source` argument, or the `degree` function’s `node` argument — can also accept an array of nodes. For instance, when an array of two or more nodes is passed to the `degree` function’s `node` argument, the degrees of each node in the array are returned:

``````-- Create an undirected graph with three nodes and two edges.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
CALL RAI.create_data_stream('my_edges');
CALL RAI.create_graph('my_graph', 'my_edges');

-- Get the degree of node 2 in 'my_graph'.
-- A single node is provided to the 'node' argument.
SELECT * FROM TABLE(RAI.degree('my_graph'), {'node': 2})
/*+------+--------+
| NODE | DEGREE |
|------+--------+
| 2    | 2      |
+------+--------+ */

-- Get the degree of nodes 1 and 3 in 'my_graph'.
-- Both nodes are provided to the 'node' argument in an array.
SELECT * FROM TABLE(RAI.degree('my_graph'), {'node': [1, 3]});
/*+------+--------+
| NODE | DEGREE |
|------+--------+
| 1    | 1      |
| 3    | 1      |
+------+--------+ */``````

Some functions, such as the `shortest_path_length` function, can accept multiple array arguments. For example, you may call `shortest_path_length` with arrays provided to both its `source` and `target` arguments:

``````-- Compute the length of a shortest path in 'my_graph' starting at either
-- node 1 or node 2 and ending at either node 3 or node 4.
SELECT * FROM TABLE(RAI.shortest_path_length('my_graph', {'source': [1, 2], 'target': [3, 4]}));
/*+--------+--------+--------+
| SOURCE | TARGET | LENGTH |
|--------+--------+--------|
| 1      | 3      | 2      |
| 1      | 4      | 3      |
| 2      | 3      | 1      |
| 2      | 4      | 2      |
+--------+--------+--------+ */``````

When multiple arrays are passed to two or more arguments, rows containing all possible node combinations are returned. For example, in the preceding statement, the array `[1, 2]` is passed to the `source` argument and the array `[3, 4]` is passed to the `target` argument. The table contains rows for all possible `(source, target)` pairs containing nodes from the arrays: `(1, 3)`, `(1, 4)`, `(2, 3)`, and `(2, 4)`.

Many functions accept a `tuples` argument that can be used to compute the desired value for specific node combinations. For example, to compute the shortest path length between the `(source, target)` pairs `(1, 3)` and `(2, 3)`, you need to pass the array `[[1, 3], [2, 3]]` to the `tuples` argument:

``````-- Compute the shortest path length in 'my_graph' between nodes 1 and 3
-- and nodes 2 and 3 using the 'tuples' argument.
SELECT * FROM TABLE(RAI.shortest_path_length('my_graph', {'tuples': [[1, 3], [2, 3]]]}));
/*+--------+--------+--------+
| SOURCE | TARGET | LENGTH |
|--------+--------+--------|
| 1      | 3      | 2      |
| 2      | 3      | 1      |
+--------+--------+--------+ */``````

Arguments that accept arrays are documented in the SQL Library Reference. See the Overview of Graph Algorithms to access each function’s reference documentation.

Algorithms may be executed as a Snowflake task (opens in a new tab). As such, you may schedule algorithms to run on specific dates and times or repeat with predetermined intervals. In the following example, the `pagerank` algorithm is executed every five minutes and the results are stored in a Snowflake table:

``````-- Create a task.
WAREHOUSE = my_sf_warehouse
SCHEDULE = '5 MINUTE'
AS
BEGIN
SELECT * FROM TABLE(RAI.pagerank('my_graph', {
'rai_engine': 'my_rai_engine',
'result_table': '<my_sf_db>.<my_sf_schema>.pagerank_results'
}));
END;

-- After five minutes, the results are available.
SELECT * FROM pagerank_results;``````

When executing an algorithm in a task, you must:

• Specify the Snowflake warehouse that executes the task.

• Provide an engine name to the algorithm’s `rai_engine` argument, even if you have previously set a RAI context since the task cannot access the current session variables when it is executed.

• Provide the fully qualified (opens in a new tab) name of a Snowflake table in which the results of the algorithm will be stored to the algorithm’s `result_table` argument.

### Limitations

Executing algorithms takes time and, for large enough graphs, the execution time may exceed statement timeouts set for your Snowflake account. If you encounter timeouts, contact your Snowflake administrator to request an increase to your timeout limits.

## Working With Results

Once an algorithm has run and the results are in Snowflake, you’ll want to do something with those results. This section describes concepts associated with using the results of algorithms run in RAI.

### Results Schema

Tabular functions, such as `degree`, return tables with named columns. For instance, `degree` returns a table with a `node` column and a `degree` column:

``````-- Create an undirected graph with three nodes and two edges.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
CALL RAI.create_data_stream('my_edges');
CALL RAI.create_graph('my_graph', 'my_edges');

-- Find the degree of each node in 'my_graph'.
-- The return value is a table with two columns named NODE and DEGREE.
SELECT * FROM TABLE(RAI.degree('my_graph'));
/*+------+--------+
| NODE | DEGREE |
|------+--------+
| 1    | 1      |
| 2    | 2      |
| 3    | 1      |
+------+--------+ */``````
🔎

The schema for each algorithm’s return table is available in the Overview of Graph Algorithms.

### Saving Results in Snowflake

There are two ways to store the results of an algorithm in a Snowflake table:

#### `CREATE_TABLE`

You can execute an algorithm and create a table from the results using the `CREATE TABLE` (opens in a new tab) command:

``````-- Compute the degree of each node in 'my_graph' and store the results
-- in a Snowflake table named 'my_results_table'. Rather than using
-- the `result_table` argument, the table is created directly.
CREATE OR REPLACE TABLE my_result_table(node INT, degree INT) AS (
SELECT * FROM RAI.degree('my_graph')
);

-- The result table has the correct schema.
SELECT * FROM my_result_table;
/*+------+--------+
| NODE | DEGREE |
|------+--------+
| 1    | 1      |
| 2    | 2      |
| 3    | 1      |
+------+--------+ */``````

The schema of `my_result_table` should match the algorithm’s result schema. In the preceding example, this means that the table should have a `node` and `degree` column, both of which are integers, since that is the schema of the table returned by the `degree` function.

#### `result_table`

You may execute an algorithm and store the results in a Snowflake table by providing the table’s fully qualified (opens in a new tab) name to the algorithm’s `result_table` argument:

``````-- Compute the degree of each node in 'my_graph' and store the results
-- in a Snowflake table named 'my_result_table'. Note that the function
-- returns an empty table when `result_table` is set.
SELECT * FROM TABLE(RAI.degree(
'my_graph', {'result_table': '<my_sf_db>.<my_sf_schema>.my_result_table'}
));
/*+------+--------+
| NODE | DEGREE |
|------+--------+
| Empty table.  |
+------+--------+ */

-- Note that the columns of the result table are given the generic names
-- COL1 and COL2, instead of NODE and DEGREE.
SELECT * FROM my_result_table;
/*+------+------+
| COL1 | COL2 |
|------+------+
| 1    | 1    |
| 2    | 2    |
| 3    | 1    |
+------+------+ */``````

When you send the results of an algorithm to a Snowflake table using the `result_table` argument, the provided table is either created if it does not already exist, or is replaced with the new results. The columns of the table are given generic names, such as `COL1` and `COL2`.

All data sent to a table using the `result_table` argument are converted to text.

The `result_table` argument is necessary for storing results of an algorithm executed in a scheduled task.

The table generated with `result_table` loses the algorithm’s schema and is not well suited for storing intermediate results in multi-step workloads. Using the `CREATE TABLE` command is a better option.

Storing the results returned by an algorithm in a Snowflake table allows you to work with the results in other queries. In particular, you may create multi-step graph workloads that build off of previous results.

For example, the following snippet computes the PageRank of the neighbors of a given node:

``````-- Create an undirected graph with three nodes and two edges.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
CALL RAI.create_data_stream('my_edges');
CALL RAI.create_graph('my_graph', 'my_edges');

-- Find the neighbors of node 2 in 'my_graph' and store the results
-- in a table called neighbor_results.
CREATE OR REPLACE TABLE neighbor_results(node1 INT, node2 INT) AS (
SELECT * FROM TABLE(RAI.neighbor('my_graph', {'node': 2}))
)

-- Compute the PageRank of the neighbors of node 2. The array of
-- neighbors is built from the neighbor_results table.
SELECT * FROM TABLE(RAI.pagerank('my_graph', {
'node': (SELECT array_agg(node2) FROM neighbor_results)
}));
/*+------+--------------+
| NODE | VALUE        |
|------+--------------+
| 1    | 0.2567563672 |
| 3    | 0.2567563672 |
+------+--------------+ */

-- Remove the intermidate results table.
DROP TABLE neighbor_results;``````

In the preceding example, the array provided to the `pagerank` function’s `node` argument is prepared by selecting the `neighbor_results` table’s `node2` column and using Snowflake’s built-in `array_agg` (opens in a new tab) function to build an array.

## Examples

This section illustrates the concepts described in Executing Algorithms and Working With Results using several examples of common scenarios.

### Plot a Degree Histogram

Viewing a graph’s degree histogram shows you how the degrees are distributed among nodes and may provide some insight into the structure of the graph.

You can count the number of nodes with each degree in a graph using the `degree_histogram` function:

``````USE ROLE my_sf_role;
USE WAREHOUSE my_sf_warehouse;
USE DATABASE my_sf_database;

CREATE OR REPLACE SCHEMA degree_histogram;
USE SCHEMA degree_histogram;

-- Set the RAI context.
CALL use_rai_engine('my_rai_engine');
CALL use_rai_database('my_rai_db');

-- Create an undirected graph.
CREATE OR REPLACE TABLE my_edges(node1 INT, node2 INT) AS (
SELECT * FROM VALUES
(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 3),
(2, 6), (3, 7), (4, 5), (4, 8), (5, 9)
);
CALL RAI.create_data_stream('my_edges');
CALL RAI.create_graph('my_graph', 'my_edges');

-- Count the number of nodes with each degree.
SELECT * FROM TABLE(RAI.degree_histogram('my_graph'));
/*+--------+-------+
| DEGREE | COUNT |
|--------+-------+
| 1      | 3     |
| 2      | 1     |
| 3      | 4     |
| 5      | 1     |
+--------+-------+ */``````

To view a histogram, click Chart above the results table for your query. You can use the configuration options on the right-hand panel to view the results as a bar chart with bars from the `COUNT` column plotted against the `DEGREE` column on the X-Axis:

You may customize the histogram using the Snowflake UI, including adding buckets. See Visualizing Worksheet Data (opens in a new tab) in the Snowflake documentation for more details.

### Rank Nodes by Centrality

Centrality algorithms are used to measure a node’s importance or influence in a graph. The RAI SQL Graph Library implements several centrality algorithms, including:

This example uses the `eigenvector_centrality` function to sort a subset of nodes in the graph by importance. However, you can replace it with any of the other centrality functions. The example illustrates concepts such as array arguments and working with results schema.

``````USE ROLE my_sf_role;
USE WAREHOUSE my_sf_warehouse;
USE DATABASE my_sf_database;

CREATE OR REPLACE SCHEMA centrality_rank;
USE SCHEMA centrality_rank;

-- Set the RAI context.
CALL RAI.use_rai_database('my_rai_db');
CALL RAI.use_rai_engine('my_rai_engine');

-- Create an undirected graph with five nodes and five edges.
-- Visually, the graph looks like:  1 -- 3 -- 5
--                                  |    |
--                                  2 -- 4
CREATE OR REPLACE TABLE my_edges(node1 INT, node2 INT)
AS (
SELECT * FROM VALUES
(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)
);
CALL RAI.create_data_stream('my_edges');
CALL RAI.create_graph('my_graph', 'my_edges');

-- Sort nodes 1, 3, and 5 by their eigenvector centrality
-- from highest to lowest.
SELECT node
FROM TABLE(RAI.eigenvector_centrality(
'my_graph', {'node': [1, 3, 5]}
))
ORDER BY value DESC;
/*+------+
| NODE |
|------+
| 3    |
| 1    |
| 5    |
+------+ */``````

You can find more information about centrality algorithms in the Centrality section of the Overview of Graph Algorithms.

### Find Similar Nodes

The RAI SQL Graph Library implements several similarity algorithms, including:

This example uses the `jaccard_similarity` function to find the most similar node that is not a neighbor of a given node. However, you can replace it with any of the other similarity functions.

``````USE ROLE my_sf_role;
USE WAREHOUSE my_sf_warehouse;
USE DATABASE my_sf_database;

CREATE OR REPLACE SCHEMA similar_nodes;
USE SCHEMA similar_nodes;

-- Set the RAI context.
CALL RAI.use_rai_database('my_rai_db');
CALL RAI.use_rai_engine('my_rai_engine');

-- Create an undirected graph with five nodes and five edges.
-- Visually, the graph looks like:  1 -- 3 -- 5
--                                  |    |
--                                  2 -- 4
CREATE OR REPLACE TABLE my_edges(node1 INT, node2 INT) AS (
SELECT * FROM VALUES
(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)
);
CALL RAI.create_data_stream('my_edges');
CALL RAI.create_graph('my_graph', 'my_edges');

-- Compute the most similar node to node 1 that is not already a neighbor of node 1.
SELECT jacc.node2
FROM
TABLE(RAI.jaccard_similarity('my_graph', {'node1': 1})) as jacc
WHERE
jacc.node2 NOT IN (
SELECT node2 FROM TABLE(RAI.neighbor('my_graph', {'node1': 1}))
)
AND jacc.node2 != 1
ORDER BY jacc.score DESC LIMIT 1;
/*+------------+
| JACC.NODE2 |
|------------+
| 4          |
+------------+ */``````