Supported Data Sources
This guide describes the supported data sources for importing data into the Relational Knowledge Graph System (RKGS).
Goal
By following this guide, you’ll learn which data sources are supported by the system, allowing you to import data into the RKGS.
This guide complements the other Data Import and Export guides.
Supported Data Sources
These are the different sources you can use to import data into the RKGS:
Source | Description | Data Type |
---|---|---|
Strings | Load data from strings formatted in a supported data type. | String in Rel source code. |
Local files | Load files directly from your computer. | Binary, CSV, JSON, JSON Lines, and Parquet files. |
Cloud | Load data from the supported cloud providers. | Binary, CSV, JSON, JSON Lines, and Parquet files, as well as Iceberg tables. |
Snowflake | Load data from Snowflake through RAI data streams. | Snowflake object, which can be a SQL table or a view. |
The Configuration Module
When importing data from the cloud or strings, loading is controlled via a configuration module. This module is passed as an argument to the loading relations:
module config
    // ...
end

def my_json = load_json[config]
See Data Types for more details on supported data types and the available loading relations.
The configuration relation is usually named config, but any other name can be used.
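To illustrate, here is a minimal sketch that loads the same kind of inline JSON under a differently named configuration module (the name json_config and the sample data are arbitrary):

```rel
module json_config
    def data = """{"a": 1}"""
end

def output = load_json[json_config]
```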
The following configuration options are common to all Rel data-loading relations and control where the data are loaded from.
Option | Description | Used For |
---|---|---|
data | A formatted string value. | String. |
path | A URL for the data. | Cloud. |
integration | Credentials, if needed, to access private data. | Cloud. |
Data-loading relations for specific data types, like CSV, may have additional configuration options for import and export.
The path option is a string that specifies the location and name of the file you want to import. It can be combined with integration to specify credentials when accessing private data in cloud storage. See Cloud for more details.

The data option is a string containing the actual data to be imported. You use this to load data from strings.

The path option takes precedence over the data option: if both are present, data is ignored.
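As a minimal sketch of this precedence rule (the bucket URL and inline data are hypothetical), a module that defines both options loads from path and silently ignores data:

```rel
module config
    // path takes precedence: the file at this URL is loaded
    def path = "s3://my-s3-bucket/path/to/data.csv"

    // ignored, because path is also present
    def data = "a,b\n1,2"
end

def my_csv = load_csv[config]
```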
Strings
You can import data as a formatted string in Rel source code. This option is convenient for quick experiments and small datasets.
Here’s an example of loading some JSON data:
// read query
module config
    def data =
        """
        {"name": "John", "phone": [{"no.":"1234"}, {"no.":"4567"}]}
        """
end

def output = load_json[config]
Local Files
You can load files directly from your computer using:
RAI Interface | Details |
---|---|
RAI Console | Load data using the import functionality. |
CLI | Load CSV and JSON data using the loading data commands. |
RelationalAI SDKs | Load CSV and JSON data using the loading data functionality. |
RelationalAI VS Code Extension | Load CSV and JSON data either via RelationalAI View > Admin > Commands or the Command Palette. |
Here’s an example of loading a CSV file using the Python SDK:
from railib import api, config

ctx = api.Context(**config.read())  # credentials from ~/.rai/config

file_name = "my_data_file.csv"
with open(file_name) as fp:
    data = fp.read()

api.load_csv(ctx, "my_database", "my_engine", "my_csv", data)
You can load local files up to 64 MB in size. To upload larger files, use cloud storage instead.
Cloud
You can load data from several cloud providers. The system supports importing both public and private data. Importing private data requires setting up certain credentials within the loading configuration module.
Here’s an example of loading an Iceberg table from AWS S3:
module config
def path = "s3://my-s3-bucket/path/to/table_iceberg"
end
def my_iceberg = load_iceberg[config]
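For private data, the integration option carries the credentials alongside path. The sketch below assumes an Azure storage account and a SAS token bound to the relation my_sas_token; the exact provider names and credential keys for each cloud are described in the Accessing the Cloud guide:

```rel
module config
    def path = "azure://myaccount.blob.core.windows.net/container/data.csv"

    module integration
        def provider = "azure"
        def credentials = (:azure_sas_token, my_sas_token)
    end
end

def my_csv = load_csv[config]
```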
See Accessing the Cloud for more details on supported providers, regions, and examples.
Snowflake
The RAI Integration Services are a collection of features designed to enable RAI’s advanced modeling and querying capabilities to analyze your Snowflake data in new ways, including graph analytics.
A RAI data stream synchronizes your data from a Snowflake database to a RAI database. You can think of a RAI data stream as a materialized view that connects your Snowflake data with RAI. To use it, you need access to a RAI integration.
To load data this way, you will use Snowflake’s interface and write some SQL.
Here’s an example that creates a RAI data stream between the Snowflake table product and the RAI relation rai_product:
CREATE OR REPLACE TABLE product (ID INT, Name STRING, Category STRING)
AS SELECT * FROM VALUES
(11, 'Laptop', 'Electronics'),
(27, 'Shirt', 'Clothing');
CALL RAI.create_data_stream('my_sf_db.my_sf_schema.product', 'my_rai_db', 'rai_product');
See RAI Data Streams for more details.
See Also
Now that you know which data sources you can use to import data into RAI, check out the other Data Import and Export guides to learn how to interact with your data.