Skip to content

Overview

RelationalAI’s Integration for Snowflake contains functions for creating and managing graphs, as well as a SQL Graph Analytics Library with a wide range of functions for common graph analytics tasks.

Getting Started

You can create graphs in three steps:

-- Set a RAI context.
CALL RAI.use_rai_engine(<rai_engine_name>)
CALL RAI.use_rai_database(<rai_db_name>)
 
-- STEP 1: Create a Snowflake table (or view) that represents the graph's edges.
-- Note that node IDs must be integers.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
 
-- STEP 2: Create a data stream.
CALL RAI.create_data_stream('my_edges');
 
-- STEP 3: Create a graph. By default, `my_graph` is undirected.
CALL RAI.create_graph('my_graph', 'my_edges');

Both undirected and directed graphs are supported. See Managing Graphs for more information about creating, updating, and deleting graphs.

💡

Typically, the edge table or view is created from existing Snowflake data that have been transformed into integer nodes using the Schema Mapping Library. See Creating Graphs for more details.

You can run algorithms on graphs by passing the graph’s name to a SQL function, such as degree_histogram, which returns a table containing the number of nodes with each degree in a graph:

SELECT * FROM TABLE(RAI.degree_histogram('my_graph'));
/*+----------------+
  | DEGREE | COUNT |
  +--------+-------+
  | 1      | 2     |
  | 2      | 1     |
  +--------+-------+  */

You can configure algorithms through various optional arguments. See Working With Algorithms for more information on calling algorithms and working with results.

Basic Operations

The following functions are used to find basic properties of graphs, such as counting the number of nodes and edges and finding neighbors and degrees of nodes:

FunctionDescriptionReturn Value
num_nodesReturns the number of nodes in the graph.INT
num_edgesReturns the number of edges in the graph.INT
neighborFinds the neighbors of each node in the graph.TABLE(node1 INT, node2 INT)
degreeFinds the degree of each node in the graph.TABLE(node INT, degree INT)
min_degreeReturns the minimum degree of the graph.INT
max_degreeReturns the maximum degree of the graph.INT
average_degreeReturns the average degree of the graph.FLOAT
degree_histogramCounts the number of nodes with each degree.TABLE(degree INT, count INT)

Paths

The Graph Library implements the following algorithms related to paths in graphs:

FunctionDescriptionReturn Value
shortest_path_lengthComputes the distance (length of the shortest path) between nodes.TABLE(source INT, target INT, length INT)
transitive_closureComputes the transitive closure of the edges in a graph. It may be used to determine which nodes are reachable from each other.TABLE(source INT, target INT)

Centrality

Centrality algorithms assign ranks to nodes, often for the purpose of measuring a node’s influence or importance in the graph. The Graph Library implements three centrality algorithms:

FunctionDescriptionReturn Value
eigenvector_centralityMeasures a node’s importance in such a way that connections to more important nodes contribute more to a node’s score than connections to less important nodes.TABLE(node INT, value FLOAT)
pagerankMeasures a node’s importance in a graph. pagerank is similar to eigenvector_centrality, but with an additional scaling factor.TABLE(node INT, value FLOAT)
degree_centralityMeasures a node’s importance based on its degree. Unlike pagerank and eigenvector_centrality, degree_centrality does not consider the importance of a node’s neighbors when ranking a node.TABLE(node INT, value FLOAT)
betweenness_centrlityMeasure a node’s importance in a graph based on how many shortest paths go through it.TABLE(node INT, value FLOAT)

Similarity

Similarity algorithms are used to cluster nodes and predict links between nodes. The Graph Library implements a number of algorithms related to similarity:

FunctionDescriptionReturn Value
jaccard_similarityMeasures the similarity of two nodes based on the number of neighbors common to both nodes. Values range from 0 to 1, inclusive.TABLE(node1 INT, node2 INT, score FLOAT)
cosine_similarityMeasures the similarity of two nodes as a function of the angle between vector representations of their neighborhoods. Values range from -1 to 1, inclusive.TABLE(node1 INT, node2 INT, score FLOAT)
preferential_attachmentComputes the “closeness” of two nodes u and v as the number of neighbors of u times the number of neighbors of v. Higher scores indicate that two nodes are “closer” than lower scores.TABLE(node1 INT, node2 INT, score INT)
adamic_adarComputes the “closeness” of two nodes by computing the inverse logarithmic sum of the degrees of neighbors common to both nodes. Higher scores indicate that two nodes are “closer” than lower scores.TABLE(node1 INT, node2 INT, score FLOAT)
common_neighborFinds common neighbors of nodes in a graph.TABLE(node1 INT, node2 INT, node3 INT)

Community

These algorithms are used to determine how nodes are clustered in a graph:

FunctionDescriptionReturn Value
is_connectedComputes whether or not a graph is connected.BOOLEAN
weakly_connected_componentComputes the weakly connected components of a graph.TABLE(node INT, component INT)
triangle_communityFinds K-clique communities (with K=3) using the percolation method.TABLE(node INT, community INT)
label_propagationFind communities using the label propagation algorithm.TABLE(node INT, community INT)
unique_triangleComputes triples of nodes, unique up to order, that form a triangle in the graph. Use unique_triangle to find unique triangles containing a given node or pair of nodes.TABLE(node1 INT, node2 INT, node3 INT)
num_trianglesComputes the number of unique triangles in a graph.INT
triangle_countComputes the number of unique triangles each node belongs to.TABLE(node INT, count INT)
diameter_rangeEstimates the diameter of a graph by giving a minimum and maximum bound.OBJECT
local_clustering_coefficientMeasures how close the neighbors of a node are to forming a clique.TABLE(node INT, value FLOAT)
Was this doc helpful?