Managing Graphs
Goal
This guide demonstrates how to create, view, update, and delete graphs using RelationalAI’s SQL Graph Library for Snowflake.
Introduction
Graphs have two components:
- A data stream, which sends data for the graph’s edges to RAI.
- A graph object, which is built from the data stream and can be used to perform graph analyses.
Although edge data streams and graphs are stored in a RAI database, you can create and manage graphs without leaving Snowflake. In this guide, you’ll learn how to:
Creating Graphs
There are three steps to creating a graph in RAI:
Prepare Snowflake Data
Before you create a graph, you must have a Snowflake table or view containing the edge data. The table must have two columns: each row represents an edge and each column contains an integer value representing a node in the graph.
Typically, edge views are created using RAI’s Schema Mapping Library to map entities in your Snowflake data to nodes that can be used to describe the graph’s edges:
USE DATABASE my_snowflake_db;
USE SCHEMA my_snowflake_schema;
-- Create a table representing products purchased by clients.
CREATE TABLE Purchase(Client_ID INT, Product_ID INT)
AS SELECT * FROM VALUES
(1, 1), (1, 3), (2, 1);
-- Create `'Client'` and `'Product'` entities.
CALL RAI.create_entity('Client', 'id');
CALL RAI.create_entity('Product', 'id');
-- Create the entity lookup table.
CALL RAI.create_lookup('Client', 'Purchase', 'Client_ID');
CALL RAI.create_lookup('Product', 'Purchase', 'Product_ID');
-- Create a view with the edge set for the graph.
CREATE VIEW my_edges(source, target) AS (
SELECT RAI.node('Client', Client_ID), RAI.node('Product', Product_ID)
FROM Purchase
);
The advantage of transforming your data this way is that nodes
are automatically represented as integers.
In the preceding example, the Purchase
table has
two integer columns and looks like it could represent edges in a graph.
However, the domains of the nodes overlap,
since there is both a Product
and a Client
with ID 1
.
This overlap is automatically handled by the
RAI.node
function.
See Schema Mapping for details on transforming Snowflake data into entities and working with lookup tables.
Create an Edge Data Stream
The next step is to create a data stream for the graph’s edges using the
create_data_stream
procedure.
Data streams require a RAI database,
which can be set using the
use_rai_database
procedure.
-- Choose the RAI database in which to store the edge data stream.
CALL RAI.use_rai_database('my_rai_db');
/*+-------------------+
| USE_RAI_DATABASE |
+-------------------+
| my_rai_database |
+-------------------+ */
-- Create a RAI data stream from the `'my_edges'` view.
-- By default, the data stream is named `'my_edges'`.
CALL RAI.create_data_stream('my_edges');
/*+------------------------------------------------------------------------------+
| {"account": ***, "id": ***, |
| "name": "my_snowflake_db.rai-my_snowflake_db.my_snowflake_schema.my_edges", |
| ... |
| } |
+------------------------------------------------------------------------------+ */
By default, the functions and procedures in the SQL Graph Library for Snowflake
are contained in the RAI
schema in your Snowflake database.
Note that this schema name may vary
depending on how your administrator set up the RAI integration.
The my_edges
data stream sends data from the my_edges
Snowflake table
to the RAI database my_rai_db
.
Any graph created from the data stream is stored in the same RAI database.
Note that it may take several minutes for the data stream to become available.
See
Create a Data Stream
for more information on creating data streams and checking their state.
Currently, the my_edges
data stream describes a graph with two nodes
— for the clients with IDs 1
and 2
—
and three edges representing the two products purchased by client 1
and the product purchased by client 2
.
If the data contained in the Snowflake source view my_edges
changes,
then the my_edges
data stream, and any graph created from it,
is automatically updated.
Create a Graph
Once the data stream is established,
you can create a graph from the stream using the
create_graph
procedure.
A RAI engine is required to create a graph and may be set with the
use_rai_engine
procedure:
-- Choose the RAI engine that will be used to create the graph.
CALL RAI.use_rai_engine('my_rai_engine');
/*+-----------------+
| USE_RAI_ENGINE |
+-----------------+
| my_rai_engine |
+-----------------+ */
-- Create a graph in RAI called `'my_graph'`
-- with edges from the `'my_edges'` data stream.
CALL RAI.create_graph('my_graph', 'my_edges');
/*+--------------------------------------------------------------------------------+
| {"message":"Graph 'my_graph' created in database 'my_rai_db'.","success":true} |
+--------------------------------------------------------------------------------+ */
By default, my_graph
is undirected.
To create a directed graph, pass {'directed': true}
to the create_graph
procedure’s optional third parameter:
-- Create a directed graph in RAI called `'my_directed_graph'` from the `'my_edges'` data stream.
CALL RAI.create_graph('my_directed_graph', 'my_edges', {'directed': true});
/*+-----------------------------------------------------------------------------------------+
| {"message":"Graph 'my_directed_graph' created in database 'my_rai_db'.","success":true} |
+-----------------------------------------------------------------------------------------+ */
It’s possible to create multiple graphs from the same data stream.
All graphs created with the same schema,
such as the RAI
schema in the preceding examples,
must have unique names,
even if they are created from different data streams.
Graph names are case sensitive.
How Graphs Are Stored
When you create a graph,
the actual graph object on which algorithms are run
is stored in a RAI database.
A catalog of available graphs is maintained in a Snowflake database in the
Snowflake schema from which the SQL Graph Library functions are called,
such as the RAI
schema in the preceding examples.
Graphs stored in this catalog are identified by their name,
so graph names must be unique to the Snowflake schema in which they are created.
Viewing Graph Information
The SQL Graph Library provides three functions for checking whether or not graphs exist and viewing information about graphs:
is_graph_created
, for checking whether or not a graph exists.get_graph
, for viewing information about an existing graph.list_graphs
, for viewing all available graphs.
Check If a Graph Exists
To check whether or not a graph exists,
pass the graph name as a string to the
is_graph_created
function:
SELECT RAI.is_graph_created('my_graph');
/*+------+
| TRUE |
+------+ */
Calling the RAI.is_graph_created
function returns the BOOLEAN
value TRUE
if a graph with the given name has been created using the RAI
schema,
and FALSE
otherwise.
View Info About a Graph
You can view info about a specific graph
by passing the graph’s name to the
get_graph
function:
SELECT RAI.get_graph('my_graph');
/*+------------------------------------------------------------------+
| { |
| "DIRECTED": false, |
| "EDGE_STREAM": "MY_SNOWFLAKE_DB.MY_SNOWFLAKE_SCHEMA.my_edges", |
| "NAME": "my_graph", |
| "RAI_DATABASE": "my_rai_db" |
| } |
+------------------------------------------------------------------+ */
The output is a JSON object that tells you:
- Whether or not the graph is directed.
- The fully qualified name of the Snowflake table or view linked to the graph’s edge data stream.
- The name of the graph.
- The RAI database that the graph is stored in.
If no graph with the given name exists, get_graph
returns NULL
.
List All Available Graphs
To view all available graphs,
use the
list_graphs
function:
SELECT RAI.list_graphs();
/*+--------------------------------------------------------------------+
| [ |
| { |
| "DIRECTED": false, |
| "EDGE_STREAM": "MY_SNOWFLAKE_DB.MY_SNOWFLAKE_SCHEMA.my_edges", |
| "NAME": "my_graph", |
| "RAI_DATABASE": "my_rai_db" |
| }, |
| { |
| "DIRECTED": true, |
| "EDGE_STREAM": "MY_SNOWFLAKE_DB.MY_SNOWFLAKE_SCHEMA.my_edges", |
| "NAME": "my_directed_graph", |
| "RAI_DATABASE": "my_rai_db" |
| } |
| ] |
+--------------------------------------------------------------------+ */
The output is an array of JSON objects.
Updating Graphs
After a graph has been created, you can alter some of its properties. For example, you can:
In each case, you will use the
update_graph
procedure.
This procedure removes the existing graph and replaces it with a new graph
with the same properties as the old graph except for the requested changes.
It’s not possible to change a graph’s name.
The update_graph
procedure is not meant to be used
to add or remove edges from a graph.
This is achieved by adding or removing rows in the Snowflake source table
of the graph’s edge data stream.
These changes are automatically synced between Snowflake and RAI
and do not require a manual update.
Updating a graph requires a RAI engine.
You can either pass the engine name to the update_graph
procedure’s rai_engine
argument
or set the engine in a
RAI context.
Change the Edge Data Stream
You may change the data stream that is associated with a graph
by passing the name of the new data stream to the
edge_stream_name
argument.
-- Update the graph `'my_graph'` to use the data stream `'my_data_stream'`.
-- Note that `update_graph` requires a RAI engine.
CALL RAI.update_graph(
'my_graph', {'rai_engine': 'my_engine', 'edge_stream_name': 'my_data_stream'}
);
/*+-------------------------------------------------------------------------+
| { |
| "message": "Graph 'my_graph' created in RAI database 'my_rai_db'.", |
| "success": true |
| } |
+-------------------------------------------------------------------------+ */
The output message indicates that the new, updated graph has been created
in the RAI database my_rai_db
.
This is the same RAI database that the original graph was stored in
and may not be changed.
Switch a Graph From Undirected to Directed
To change an undirected graph to a directed graph,
set the the update_graph
procedure’s directed
argument to true
:
-- Update the graph `'my_graph'` to be a directed graph.
-- Note that `update_graph` requires a RAI engine.
CALL RAI.update_graph(
'my_graph', {'rai_engine': 'my_engine', 'directed': 'true'}
);
/*+-------------------------------------------------------------------------+
| { |
| "message": "Graph 'my_graph' created in RAI database 'my_rai_db'.", |
| "success": true |
| } |
+-------------------------------------------------------------------------+ */
You can change multiple properties of the graph simultaneously
by providing values to both the directed
and edge_stream_name
arguments.
Deleting Graphs
You can delete a graph by passing the graph’s name to the
delete_graph
procedure.
Deleting a graph requires a RAI engine.
You can either pass the engine name to the delete_graph
procedure’s
rai_engine
argument or set the engine in a
RAI context:
-- Delete `'my_graph'`. Note that `delete_graph` requires a RAI engine.
CALL RAI.delete_graph('my_graph', {'rai_engine': 'my_rai_engine'});
/*+---------------------------------------------------------------------------+
| { |
| "message": "Graph 'my_graph' removed from RAI database 'my_rai_db'.", |
| "success": true |
| } |
+---------------------------------------------------------------------------+ */
You should delete every graph associated with a data stream before deleting the stream.
See Also
See Working With Algorithms to learn more about running graph analytics workloads. See the Overview of Graph Algorithms to explore every algorithm implemented in RelationalAI’s SQL Graph Library for Snowflake.