Overview
RelationalAI’s Integration for Snowflake contains functions for creating and managing graphs, as well as a SQL Graph Analytics Library with a wide range of functions for common graph analytics tasks.
Getting Started
You can create graphs in three steps:
-- Set a RAI context.
CALL RAI.use_rai_engine(<rai_engine_name>)
CALL RAI.use_rai_database(<rai_db_name>)
-- STEP 1: Create a Snowflake table (or view) that represents the graph's edges.
-- Note that node IDs must be integers.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
-- STEP 2: Create a data stream.
CALL RAI.create_data_stream('my_edges');
-- STEP 3: Create a graph. By default, `my_graph` is undirected.
CALL RAI.create_graph('my_graph', 'my_edges');
Both undirected and directed graphs are supported. See Managing Graphs for more information about creating, updating, and deleting graphs.
Typically, the edge table or view is created from existing Snowflake data that have been transformed into integer nodes using the Schema Mapping Library. See Creating Graphs for more details.
You can run algorithms on graphs by passing the graph’s name to a SQL function,
such as degree_histogram
, which returns a table containing
the number of nodes with each degree in a graph:
SELECT * FROM TABLE(RAI.degree_histogram('my_graph'));
/*+----------------+
| DEGREE | COUNT |
+--------+-------+
| 1 | 2 |
| 2 | 1 |
+--------+-------+ */
You can configure algorithms through various optional arguments. See Working With Algorithms for more information on calling algorithms and working with results.
Basic Operations
The following functions are used to find basic properties of graphs, such as counting the number of nodes and edges and finding neighbors and degrees of nodes:
Function | Description | Return Value |
---|---|---|
num_nodes | Returns the number of nodes in the graph. | INT |
num_edges | Returns the number of edges in the graph. | INT |
neighbor | Finds the neighbors of each node in the graph. | TABLE(node1 INT, node2 INT) |
degree | Finds the degree of each node in the graph. | TABLE(node INT, degree INT) |
min_degree | Returns the minimum degree of the graph. | INT |
max_degree | Returns the maximum degree of the graph. | INT |
average_degree | Returns the average degree of the graph. | FLOAT |
degree_histogram | Counts the number of nodes with each degree. | TABLE(degree INT, count INT) |
Paths
The Graph Library implements the following algorithms related to paths in graphs:
Function | Description | Return Value |
---|---|---|
shortest_path_length | Computes the distance (length of the shortest path) between nodes. | TABLE(source INT, target INT, length INT) |
transitive_closure | Computes the transitive closure of the edges in a graph. It may be used to determine which nodes are reachable from each other. | TABLE(source INT, target INT) |
Centrality
Centrality algorithms assign ranks to nodes, often for the purpose of measuring a node’s influence or importance in the graph. The Graph Library implements three centrality algorithms:
Function | Description | Return Value |
---|---|---|
eigenvector_centrality | Measures a node’s importance in such a way that connections to more important nodes contribute more to a node’s score than connections to less important nodes. | TABLE(node INT, value FLOAT) |
pagerank | Measures a node’s importance in a graph. pagerank is similar to eigenvector_centrality , but with an additional scaling factor. | TABLE(node INT, value FLOAT) |
degree_centrality | Measures a node’s importance based on its degree. Unlike pagerank and eigenvector_centrality , degree_centrality does not consider the importance of a node’s neighbors when ranking a node. | TABLE(node INT, value FLOAT) |
betweenness_centrlity | Measure a node’s importance in a graph based on how many shortest paths go through it. | TABLE(node INT, value FLOAT) |
Similarity
Similarity algorithms are used to cluster nodes and predict links between nodes. The Graph Library implements a number of algorithms related to similarity:
Function | Description | Return Value |
---|---|---|
jaccard_similarity | Measures the similarity of two nodes based on the number of neighbors common to both nodes. Values range from 0 to 1, inclusive. | TABLE(node1 INT, node2 INT, score FLOAT) |
cosine_similarity | Measures the similarity of two nodes as a function of the angle between vector representations of their neighborhoods. Values range from -1 to 1, inclusive. | TABLE(node1 INT, node2 INT, score FLOAT) |
preferential_attachment | Computes the “closeness” of two nodes u and v as the number of neighbors of u times the number of neighbors of v . Higher scores indicate that two nodes are “closer” than lower scores. | TABLE(node1 INT, node2 INT, score INT) |
adamic_adar | Computes the “closeness” of two nodes by computing the inverse logarithmic sum of the degrees of neighbors common to both nodes. Higher scores indicate that two nodes are “closer” than lower scores. | TABLE(node1 INT, node2 INT, score FLOAT) |
common_neighbor | Finds common neighbors of nodes in a graph. | TABLE(node1 INT, node2 INT, node3 INT) |
Community
These algorithms are used to determine how nodes are clustered in a graph:
Function | Description | Return Value |
---|---|---|
is_connected | Computes whether or not a graph is connected. | BOOLEAN |
weakly_connected_component | Computes the weakly connected components of a graph. | TABLE(node INT, component INT) |
triangle_community | Finds K-clique communities (with K=3 ) using the percolation method. | TABLE(node INT, community INT) |
label_propagation | Find communities using the label propagation algorithm. | TABLE(node INT, community INT) |
unique_triangle | Computes triples of nodes, unique up to order, that form a triangle in the graph. Use unique_triangle to find unique triangles containing a given node or pair of nodes. | TABLE(node1 INT, node2 INT, node3 INT) |
num_triangles | Computes the number of unique triangles in a graph. | INT |
triangle_count | Computes the number of unique triangles each node belongs to. | TABLE(node INT, count INT) |
diameter_range | Estimates the diameter of a graph by giving a minimum and maximum bound. | OBJECT |
local_clustering_coefficient | Measures how close the neighbors of a node are to forming a clique. | TABLE(node INT, value FLOAT) |