Skip to content
Working With JSON Data
JSON Data Formats

JSON Data Formats

This concept guide introduces two approaches to working with JSON data in the Relational Knowledge Graph System (RKGS): JSON with a general schema and JSON with a data-defined schema.


The RAI RKGS currently supports two ways of representing and working with JSON data. The two representations store data in the RGKS using different approaches, but they are otherwise equivalent in terms of the operations and computations that you can perform over the data.

Consider the following JSON data , which you will use as a running example for the remainder of this guide:

  "first_name": "Jack",
  "last_name": "Bauer",
  "address": { "city": "Los Angeles",
               "state": "CA" },
  "phone": [
    { "type": "home",
      "number": "(310) 242 4242" },
    { "type": "cell",
      "number": "(310) 280 3992" }

To load the data in Rel, you can specify their location in the cloud as follows:

// model
def config:path = "azure://"

The RAI RKGS expects the JSON data that are going to be loaded to have the correct format. The data loading process is atomic. This means that if there is an error in the data, the system does not attempt to recover any of the data. In this case, the whole transaction is aborted with an error message and the data are not loaded.

JSON With a General Schema

This representation of JSON data allows for arbitrary schemas and stores the data in Graph Normal Form (GNF). This creates relations with more depth, i.e., more rows than the previous format. In this case, each node in the JSON tree is represented by a unique ID (a hash), and each node is connected to its children through these IDs.

You can still envision the data in a tree structure. Here, however, each node can either be the root, a child, a value, or an object. For example, consider the following data:

  "first_name": "Jack",
  "address": { "city": "Los Angeles",
               "state": "CA" }

The image below represents these JSON data as a tree with different levels: the name (green portion), the address (blue portion), the city (orange portion), and the state (purple portion).

JSON dag

Here is a detailed explanation of the relations used in this representation:

root(Entity)This is the root of the loaded JSON file.
child(Entity, Entity, Entity)This relation contains JSON objects and arrays. In this case, the first argument is the parent, and the second argument is the name (for objects) or index (for arrays). The actual string name or index is in the :value relation. The third argument is the child node.
value(Entity, <value>)This relation captures data stored in the leaves of the example tree. In this case, :value is one of the supported JSON data types, i.e., String, Int64, Float64, Boolean, and Missing.
array(Entity)This relation captures JSON arrays and can be used to distinguish an empty array from an empty object.
object(Entity)This relation captures JSON objects. In a given JSON document, identical subtrees use the same entity. In this case, the overall representation is a directed acyclic graph (DAG). When analyzing a JSON document, you should use caution and identify cases where there are shared subtrees containing the same entity.

Although the representation may seem elaborate, it is generic and allows for better scaling. Essentially, each relation described in the table above has a “pointer” — in terms of an entity ID — that points to the parent nodes in the JSON tree.

To load and work with JSON data that use a general schema and are represented in Graph Normal Form, you can use the built-in Rel relation load_json_general. This is useful for cases when you require JSON data of arbitrary depth. In this case, load_json_general allows you to load such data without any scaling issues even when there are several distinct property names in the data.


Note that loading and representing data in GNF is the recommended way to work with JSON data in the RAI RKGS.

When you load data in this way, the RAI RGKS creates six subrelations under the my_json_general relation, namely root, child, value, array, index, and object.

Here is an example that uses a general schema to load the exact same data as before:

// write query
def insert:my_json_general = load_json_general[config]
def output = my_json_general

The output now has many more rows, but fewer columns, compared to the representation with a data-defined schema that you will see in the next section. At a high level, each individual node in the JSON tree is represented in a separate row.

Here’s how to get the output of the root relation:

// read query
with my_json_general use root
def output = root

You can get the output of the rest of the relations in a similar way:

// read query
with my_json_general use value, child, root, array, object, index
def output:root = root
def output:child = child
def output:value = value
def output:array = array
def output:object = object
def output:index = index

To work in this general schema you can use Rel joins.

Here is an example that queries the two phone numbers in the data:

// read query
with my_json_general use value, child, root
def index[v, x] = value(x, v) and Int(v)
def name[s, x] = value(x, s) and String(s)
def phones[id] = root.child[name["phone"]].child[index[id]].child[name["number"]].value
def output = phones

JSON With a Data-Defined Schema

To represent JSON data in the RKGS, the system extracts the schema from the data then stores the data in a wide format. In other words, each key within the JSON data will have its respective values and children arranged in a wide relation.

Here’s a conceptual representation of the data above in a tree structure:


In this case, each value at the leaf of the JSON tree is represented by one row in the relation.

To load and work with JSON data using the schema within the data, you can use the built-in Rel relation load_json. This loads the data either from a given cloud URL — currently only Azure is supported — or from a string constant, and represents them in a wide format. Conceptually, this format is similar to the one in the image below :

JSON wide relation

Here is an example that loads the JSON data from the previous section:

// write query
def insert:my_json = load_json[config]
def output = my_json

This approach creates a representation using the data’s schema. The output shows that each path in the JSON tree above is represented as one row in the my_json relation.

Note that when the data have an array, the :[] symbol is used in one of the columns in the relation, immediately followed by an integer. The integer indicates the position of the remaining items within the array. The data in the previous example have one array, but nested arrays are also supported.


There are two approaches to representing JSON data in the RAI RKGS. Each approach depends on the type of schema used. In general, however, loading and representing data in GNF is the recommended way to work with JSON data in the RAI RKGS. As a next step, see the JSON With a General Schema and the JSON With a Data-Defined Schema concept guides.

Was this doc helpful?