JSON Representations

This guide presents the supported JSON representations in the Relational Knowledge Graph System (RKGS).

Introduction

Currently, there are two schema representations you can use to work with JSON data in the RKGS:

JSON Schema | Description
Data-defined | Extracts the schema from the data and stores the data in a wide format.
General | Stores the JSON data as a tree, the nodes of which are treated as entities.

The two representations store data in the RKGS using different approaches, but they are otherwise equivalent in terms of the operations and computations that you can perform over the data.

Data Types
In both formats, data type conversions generally happen automatically during loading and exporting.

See the JSON Data Types guide for more information on how JSON native types map to Rel types.

Loading and Exporting
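Each format has its own set of data loading and exporting operations. The data-defined schema currently supports more of them than the general schema: it can be loaded, exported, and converted to a string, whereas the general schema can only be loaded.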

Currently, you can’t export JSON data in the general format.

Task | Data-Defined Schema | General Schema
Loading | load_json and load_jsonlines | load_json_general and load_jsonlines_general
Exporting | export_json | n/a
String conversion | json_string | n/a
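
As an illustration of the string conversion task, the following is a minimal sketch rather than a definitive recipe. It assumes that json_string takes a relation loaded with the data-defined schema as its single argument and returns the corresponding JSON text. The file URL is the example file used elsewhere in this guide.

```rel
// read query

// Load the example file with the data-defined schema.
def my_json = load_json["azure://raidocs.blob.core.windows.net/working-with-json/json_example.json"]

// Assumption: json_string accepts the loaded relation directly
// and produces a single string holding the JSON text.
def output = json_string[my_json]
```

Because the general schema has no string conversion operation, this sketch applies only to data loaded with load_json or load_jsonlines.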

Data-Defined Schema

Consider the following JSON data:

{
    "first_name": "John",
    "last_name": "Smith",
    "address":
    {
        "city": "Seattle",
        "state": "WA"
    },
    "phone":
    [
        {
            "type": "home",
            "number": "206-456"
        },
        {
            "type": "work",
            "number": "206-123"
        }
    ]
}

Here’s their tree representation:

[Figure: JSON Tree]

You can load them using a data-defined schema as follows:

// read query
 
def my_json = load_json["azure://raidocs.blob.core.windows.net/working-with-json/json_example.json"]
def output = my_json

Note how each key within the JSON data has its respective values and children arranged in a wide relation within my_json. See the JSON Data With a Data-Defined Schema guide for more details.

See also the JSON Data Types guide for more information on how JSON native types map to Rel types.
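
To make the wide layout more concrete, here is a hedged sketch of how individual values might be read from my_json by following key paths. It assumes that object keys appear as Rel symbols (such as :address) and that array elements are addressed with the :[] marker followed by a 1-based position. Both are assumptions about the loaded relation, not something this guide states. The names first_name, city, and home_number under output are chosen only for illustration.

```rel
// read query

def my_json = load_json["azure://raidocs.blob.core.windows.net/working-with-json/json_example.json"]

// Top-level scalar field.
def output:first_name = my_json[:first_name]

// Nested object: follow the key path one symbol at a time.
def output:city = my_json[:address, :city]

// Array element (assumption: the :[] marker plus a 1-based position
// selects one entry of the "phone" array).
def output:home_number = my_json[:phone, :[], 1, :number]
```

Each definition is a partial application of my_json, so it returns whatever remains of the matching tuples after the given key path.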

General Schema

Here are the same JSON data, represented using the general schema:

// read query
 
def my_json = load_json_general["azure://raidocs.blob.core.windows.net/working-with-json/json_example.json"]
def output = my_json

Note how the general representation creates relations with more rows than the data-defined schema. This is because each node in the JSON tree is represented by a unique ID (a hash) and stored as a separate entry in the relation.

Additionally, when loading the data into the general schema, the relationships between nodes — most notably child and index — are also stored within the relation my_json. See the JSON Data With a General Schema guide for more details.
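
The node relationships described above can be inspected separately. The following is a minimal sketch, assuming that they are exposed as symbol-keyed parts of my_json named :child and :index. The exact names and the arity of each part are assumptions and should be checked against the JSON Data With a General Schema guide.

```rel
// read query

def my_json = load_json_general["azure://raidocs.blob.core.windows.net/working-with-json/json_example.json"]

// Assumption: parent-to-child links between node IDs are stored under :child.
def output:child = my_json:child

// Assumption: positions of array elements are stored under :index.
def output:index = my_json:index
```

Projecting one part at a time keeps the output small enough to see how the hashed node IDs connect to one another.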

Summary

There are two representations of JSON data in the RKGS. The data-defined schema extracts the schema from the data and represents the data in a wide relation with many columns. The general schema approach stores the JSON data as a tree, where the nodes of the tree are treated as entities.

For more information, see the JSON With a General Schema and the JSON With a Data-Defined Schema concept guides.
