JSON Import
This how-to guide demonstrates how to import JSON data using Rel.
Goal
By following this guide, you will learn how to import JSON data into a database using Rel.
This guide complements the JSON Export guide and the Working With JSON Data guides, where you can find all the relevant information about supported JSON data types and representations as well as how to work with them.
See the Data Import and Export guides for more information on how to import and export other types of data.
Importing JSON Data
The RKGS provides two JSON representations. Each representation has its own data loading operation:
| Rel Relation | JSON Schema | Description |
|---|---|---|
| `load_json` | Data-defined | Extracts the schema from the data and stores them in a wide format. |
| `load_json_general` | General | Allows for arbitrary schemas and stores the data in Graph Normal Form (GNF). |
The two JSON loading operations share the same configuration options, which are typically defined in a relation called `config`:
def config = ...
def my_json = load_json[config]
def my_json_general = load_json_general[config]
The relation `config`, which is an arbitrary name, is used to define all JSON load configurations.
For details on all the available options, see Import Options below.
See the Working With JSON Data guides for more details on how to work with JSON data using these two schema representations.
Getting Started
Consider the following JSON file `person.json` hosted in the cloud:
{
  "first_name": "John",
  "last_name": "Smith",
  "id": 10E+02,
  "gender": null,
  "address": {
    "city": "Seattle",
    "state": "WA"
  },
  "phone": [
    {
      "type": "home",
      "number": 206456
    },
    {
      "type": "work",
      "number": 206123
    }
  ]
}
You can load this JSON file into the RKGS as follows:
// read query
def json = load_json["azure://raidocs.blob.core.windows.net/csv-import/person.json"]
def output = json
Note how the nested data structure around the object `address` is represented in the data-defined schema with additional columns in the relation. The field `phone` is an array.

Also, some data type conversions happen automatically during loading. For example, the `null` value in the field `gender` is converted to `missing`, while the scientific notation in the field `id` is converted to a `Float64`. See the JSON Data Types guide for more information on how JSON native types map to Rel types.

Using the data-defined schema in Rel turns JSON keys into relation names of type `RelName`. Arrays are relations as well, using `:[]` as the relation name. Positional information within an array is indicated explicitly with an integer argument, as shown in the example above.
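Because JSON keys become `RelName`s, you can drill into nested data by chaining them. Here's a minimal sketch reusing the `json` relation loaded above; it should return the string `"Seattle"` from the nested `address` object:

// read query
def json = load_json["azure://raidocs.blob.core.windows.net/csv-import/person.json"]

// Chain the RelNames produced by the data-defined schema
// to select the nested city field.
def output = json:address:city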
You can load the same file using the general schema as follows:
// read query
def json = load_json_general["azure://raidocs.blob.core.windows.net/csv-import/person.json"]
def output = json
Notice how the general representation creates deeper relations with more rows than the data-defined schema. This is because each node in the JSON tree is represented by a unique ID (a hash), and nodes are connected to their children through these IDs. See the JSON Data With a General Schema guide for more details.
Import Options
When it's not necessary to specify a configuration, you can just provide the file path:
def my_json = load_json["azure://...."]
Or:
def my_json_general = load_json_general["azure://...."]
For more customized imports, `load_json` and `load_json_general` take a configuration relation as an argument:
module config
// ...
end
def my_json = load_json[config]
def my_json_general = load_json_general[config]
The configuration relation is typically named `config`, but you can also use any other name.
Within the load configuration, you can specify:
| Option | Description |
|---|---|
| `path` | A URL for the JSON data. |
| `data` | An alternative way to specify the data itself: a JSON-formatted string value. |
| `integration` | Credentials, if needed, to access the data. |
You can use the `path` option, together with `integration` when credentials are required, to import data from the cloud, and the `data` option to import local files.
Path
The import option `path` is a string that specifies the location and name of the file you want to import. Currently, this string can point to `azure://...` (Microsoft Azure) or `s3://...` (Amazon S3) URLs. See the Accessing the Cloud guide for more details.
module config
def path = "azure://myaccount.blob.core.windows.net/container/file.json"
end
Data
The import option `data` is a string specifying the actual data to be imported. The `path` option takes precedence over the `data` option: if both are present, the `data` option is ignored.
module config
def data = """{"a": 1, "b": 2, "c": 3}"""
end
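For example, in the following hypothetical configuration both options are present, so the inline `data` value is ignored and the file at `path` is loaded instead:

module config
def path = "azure://myaccount.blob.core.windows.net/container/file.json"
// Ignored: path takes precedence over data.
def data = """{"ignored": true}"""
end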
Integration
You may need to specify credentials for accessing private JSON data in cloud storage,
using the import option integration
.
See the Accessing the Cloud guide for more details.
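As an illustration, here's a hedged sketch of a configuration for private Azure data. The credential keys shown here are assumptions for illustration only; consult the Accessing the Cloud guide for the exact fields supported by your provider:

module config
def path = "azure://myaccount.blob.core.windows.net/container/private.json"
// Hypothetical credential fields; see the Accessing the Cloud guide:
def integration:provider = "azure"
def integration:credentials:azure_sas_token = "sv=..."
end

def my_json = load_json[config]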
Data Sources
This section covers the available options to import JSON data.
Strings
You can import JSON data directly from a string.
Here’s an example using the data-defined schema:
// read query
def config[:data] =
"""
{"name": "John", "phone": [{"no.":"1234"}, {"no.":"4567"}]}
"""
def output = load_json[config]
This is the same example using the general representation:
// read query
def config[:data] =
"""
{"name": "John", "phone": [{"no.":"1234"}, {"no.":"4567"}]}
"""
def output = load_json_general[config]
Local Files
You can load local JSON data using the import functionality from the RAI Console. These are the required steps:
1. Upload the file as a string to a named relation, for example, `my_data`.
2. Use `def config:data = my_data`, and set other configuration options as needed.
3. Run `load_json[config]` or `load_json_general[config]`, depending on the schema representation you are using (see the sketch after this list).
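Here's a minimal sketch of this flow, assuming the uploaded file contents live in a relation named `my_data`:

// my_data holds the uploaded JSON string.
def config:data = my_data
def my_json = load_json[config]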
See the Loading JSON Data guide for more details.
You can import files up to 64MB in size. If you try to import a larger file, you will receive a validation error. This limit applies to local files only; no limit is imposed for imports from cloud sources.
Cloud
You can import public and private JSON data from the supported cloud providers.
Here’s an example that loads public data:
module config
def path = "s3://my-s3-bucket/myfile.json"
end
def my_json = load_json[config]
def my_json_general = load_json_general[config]
See the Accessing the Cloud guide for more details.
Compressed Files
You can import compressed JSON files from the cloud. No specific options are required, so you can just provide the file path with the compression extension. For example:
module config
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.json.gz"
end
def my_json_compressed = load_json[config]
def my_json_general_compressed = load_json_general[config]
See Compressed Data for more details.
Loading JSON Lines
The RKGS supports the JSON Lines file format, which treats each line of a text file as a separate JSON object.
Rel provides the following built-in relations depending on the schema representation you are working with:
| Rel Relation | JSON Schema |
|---|---|
| `load_jsonlines` | Data-defined |
| `load_jsonlines_general` | General |
Here’s an example using the data-defined schema:
// read query
def data =
"""
{"name": "Anton", "age":56}
{"name": "Alex", "age":44}
"""
def config[:data] = data
def output = load_jsonlines[config]
This is the same example using the general representation:
// read query
def data =
"""
{"name": "Anton", "age":56}
{"name": "Alex", "age":44}
"""
def config[:data] = data
def output = load_jsonlines_general[config]
Import Errors
The RKGS expects the JSON data you import to be correctly formatted.
The data import process is atomic: if there is an error in the data, the system does not attempt to load any of it. In such cases, the whole transaction is aborted with an error message and no data are loaded. See the Loading Errors guide for more details.
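For example, the following sketch aborts the whole transaction, because the trailing comma makes the payload malformed JSON, and no data are loaded:

def config[:data] = """{"name": "John",}"""
def output = load_json[config]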
See Also
For more information on the supported JSON data types and formats as well as how to work with them, see the Working With JSON Data concept guides.
See the JSON Export guide for exporting JSON and the Data Import and Export guides for more information on how to import and export other types of data.