Skip to content
REL
HOW-TO GUIDES
Data I/O: CSV Export

Data I/O: CSV Export

This how-to guide demonstrates how to export data to a CSV file using Rel.

Goal

The goal of this how-to guide is to demonstrate how to export data to a CSV format through a number of examples. After reading this guide, you should expect to know how to export CSV data using the different options available in Rel.

Simple Data Export

You can start by defining a simple data relation with just one column and exporting it to a cloud_URI location as a CSV file:

def config:data[:beverage] = {"water"; "orange juice"; "apple juice"}
def config:path = cloud_URI
def export = export_csv[config]

The cloud_URI parameter refers to any resource within your cloud service provider where you would like to export the data. See Accessing the Cloud for more details.

The export configuration is defined by a module called, by convention, config. See the Modules concept guide to learn more about modules in Rel.

🔎

Note that config is an arbitrary module name and will be used throughout this guide, but you can give it any other name.

The relation config:data contains a key, :beverage, that defines the CSV column name. Later, you’ll see how to define the column names in a more general way. The CSV file generated by this query will look like this:

beverage
apple juice
orange juice
water

In RAI, a data export is triggered when the control relation export is properly populated. To export into CSV files, you can use export_csv, which takes as input a configuration relation that defines all the appropriate options. This config relation maps the :path configuration key to its value and :data to the data. Section Export Options explains all available configuration options in detail.

The order in which the data are written to the file is not guaranteed. In the previous example, the output data happened to be sorted, but this is not necessarily always the case. As with relations in general, users should not rely on the rows being in any particular order.

The rest of the examples in this tutorial assume that cloud_URI and config are defined, so that the export operation can write to the cloud_URI location. To learn more about the cloud parameters and how to configure your cloud_URI, see Accessing the Cloud.

Exporting Data With Multiple Columns

When there is more than one column, you can expect the data relation to have tuples of the form (column, pos..., v), where pos... is a set of keys that group corresponding values into the same row.

In the simple example above, it was possible to omit the pos... fields because only a single column was exported. When you have multiple columns to export, you need pos..., which may be a single or compound key (see section Compound Keys).

Here is an example of exporting an arity-2 relation, which contains cocktail names and their corresponding prices:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

As you can see, before exporting, you need to collect all the data in one relation, i.e., data in this case. The relation data must follow the Coordinate Format (COO), i.e. : (column, row_id, value), where column should be a Symbol (of typeRelName). The positional key (row_id in this case) is not exported.

The generated CSV file for this example reads:

cocktail,price
sazerac,15
cosmopolitan,20
martini,12

Export Options

When exporting data to a CSV file

def export = export_csv[config]

all configurations are controlled via the relation config, which maps configuration keys, for example path, to their value, i.e. a particular cloud_URI. To export the data in the exact format needed, the following configuration options are available:

OptionDescription
pathA string specifying the URI (location and name) where the data will be exported. See Accessing the Cloud for more details.
dataThe relation that should be exported. It generally has the form (:column_name, row_key..., value). If no relation is provided, the output file defined by :path will be empty.
syntaxA relation containing syntax configuration.
integrationCredentials needed to export data to the cloud. See Accessing the Cloud for more details.

Syntax

The syntax field can be used to specify formatting options and contains the following options:

keydescription
syntax:headerA relation with (Int, RelName) pairs, specifying the column names in the exported file, where the Int indices indicate the column order, and the column name is provided as a Symbol (of type RelName). This option overrides the default (or existing) header names.
syntax:header_rowAn Int to control whether or not a header row, i.e., column names, is exported. For values of 1 or greater, the header is written in the first row, which is also the default case when this option is not defined.
syntax:missingstringA String that will be used to represent missing values.
syntax:delimA Char that represents the delimiters used in the file to separate the columns. The default delimiter is a comma ','.
syntax:quotecharA Char that indicates a quoted field. Such a field can contain textual delimiters or newline characters. The default quote char is '"'.
syntax:escapecharThe Char used to escape the quote characters within a quoted (e.g., text) field. The default escape char is a backslash '\\'.

Export Examples

This section demonstrates how to use the most common configuration options.

Delimiter and Quotes

The first example shows how to export string data that contain the default quote character ". To parse the string properly, the quote character must be escaped with a leading backslash, that is, \".

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini \"shaken not stirred\"")}
def price = {(1, 15); (2, 20); (3, 12)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

The resulting CSV file reads:

cocktail,price
sazerac,15
cosmopolitan,20
"martini \"shaken not stirred\"",12

You can also take a look at this simple example changing the default delimiter to | instead of the comma ,.

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
 
// configure the csv export:
def config[:syntax, :delim] = '|'
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

The CSV output file reads:

cocktail|price
sazerac|15
cosmopolitan|20
martini|12

As a more complex example of tweaking special characters, you can redefine all special CSV characters: the delimiter, the quote character, and the escape character.

def cocktail = {(1, "_sazerac_"); (2, "cosmopolitan (or cosmo)"); (3, "_martini!__drink_")}
def price = {(1, 15); (2, 20); (3, 12)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
 
// configure the csv export:
def config[:syntax, :delim] = ';'
def config[:syntax, :quotechar] = '_'
def config[:syntax, :escapechar] = '!'
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

This example changed the delimiter to ;, the quote character to _, and the escape character to !. As a result, all _ and ! characters are escaped with a leading !.

The generated CSV file looks like this:

cocktail;price
_!_sazerac!__;15
cosmopolitan (or cosmo);20
_!_martini!!!_!_drink!__;12

Headers

You might often want to export the data but not the column names. You can do this by setting syntax:header_row to a value less than 1. For example:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
 
// configure the csv export
def config[:syntax, :header_row] = -1
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

The CSV file reads:

sazerac,15
cosmopolitan,20
martini,12

Missing Values

In the previous examples, the data output was a relation where all the values were present. This section explores what happens when some of the values are missing. It begins with a simple example of exporting the data but with some values missing across all columns:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini"); (5, "bellini")}
def price = {(1, 15); (2, 20); (3, 12); (4, 10); (6,19)}
def rating = {(1, 3.0); (3, 4.0); (5, 3.5); (7, 2.0)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

Note that the data have been added for another relation named rating. Here’s the output:

cocktail,price,rating
sazerac,15,3.0
cosmopolitan,20,
martini,12,4.0
,10,
bellini,,3.5
,19,
,,2.0

As you can observe, whatever values were missing from the relations cocktail, price, and rating are simply empty in the final output. In essence, you are performing an outer join in this case when exporting the data.

If you would like to actually require a specific column to appear in the output, you would need to add a condition in the logic (see section Order and Presence of Columns).

In the case that you would like to use a specific string to indicate the absence of values instead of showing them as empty as in the example above, you can use the missingstring option. Here is the same example using the ? string to indicate missing values:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini"); (5, "bellini")}
def price = {(1, 15); (2, 20); (3, 12); (4, 10); (6,19)}
def rating = {(1, 3.0); (3, 4.0); (5, 3.5); (7, 2.0)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
 
// configure the csv export
def config[:syntax, :missingstring] = "?"
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

Giving the output:

cocktail,price,rating
sazerac,15,3.0
cosmopolitan,20,?
martini,12,4.0
?,10,?
bellini,?,3.5
?,19,?
?,?,2.0

Order and Presence of Columns

In all previous cases, the columns of the output have been alphabetically ordered. This section discusses how to reorder, omit, and even add new, empty columns in the output. This is controlled by the config:syntax:header configuration. It should be a relation of the form: (column_position, column_name).

Here is an example of how to manually define the column order:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (2, 4.0); (3, 3.5)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
 
// configure the csv export
def config[:syntax, :header] = { (1, :rating) ; (2, :price) ; (3, :cocktail) }
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

giving the following output:

rating,price,cocktail
3.0,15,sazerac
4.0,20,cosmopolitan
3.5,12,martini

It is also easy to skip a column if you don’t want to include it in the output. The following example outputs only two of the three columns in the relation data:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (2, 4.0); (3, 3.5)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
 
// configure the csv export
def config[:syntax, :header] = {(1, :rating); (2, :cocktail)}
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

giving the output:

rating,cocktail
3.0,sazerac
4.0,cosmopolitan
3.5,martini

You can also specify columns that may not exist. The following example directs a column nonexistent, which is totally empty and has no reference in data, to appear in the exported CSV file:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (4, 4.0); (3, 3.5)}
 
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
 
// configure the csv export
def config[:syntax, :header] = {(1, :rating); (2, :nonexistent); (3, :cocktail)}
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

giving the output:

rating,nonexistent,cocktail
3.0,,sazerac
,,cosmopolitan
3.5,,martini
4.0,,

Positional Keys

So far, you have used integer values as a positional key to control the order of the data records (or rows) in the CSV file. If the integer values start at 1 and increase by 1 (without any gaps), the positional key corresponds to the data row in the CSV file.

It’s best not to have overlapping key values. For example, the following code:

def cocktail = {(1, "sazerac"); (2, "sazerac")}
def data(:cocktail, pos, v) = cocktail(pos, v)
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

will generate the output:

cocktail
sazerac
sazerac

where you can observe that the same value, i.e., sazerac, appears twice since it has different keys. Compare with the case where the same key appears twice:

def cocktail = {(1, "sazerac"); (1, "sazerac")}
def data(:cocktail, pos, v) = cocktail(pos, v)
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

the tuple (1, "sazerac") appears only once:

cocktail
sazerac

Positional keys can start at any value and can have arbitrary step sizes. The positional key doesn’t even need to be an integer and can be anything that enforces an order. As the following subsections will show, you can use dates and/or compound keys as long as the data types used can be ordered.

Dates as Key

You can add the date when a particular drink was consumed and export the data.

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def mydate = {
    (1, parse_date["2020-03-03", "Y-m-d"]);
    (2, parse_date["2020-02-02", "Y-m-d"]);
    (3, parse_date["2020-01-01", "Y-m-d"])
}
 
def data(:cocktail, pos, v) = (id: cocktail(id, v) and mydate(id, pos))
def data(:price, pos, v) = (id: price(id, v) and mydate(id, pos))
def data(:mydate, pos, v) = mydate(_, v) and pos = v
 
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

The original data are still keyed by an integer, but for the relation data, which will be exported, use the dates in mydate to key the data. Because the positional key, pos, is not exported, you need to define an additional column :mydate that also contains the dates.

The generated CSV file reads:

cocktail,mydate,price
martini,2020-01-01,12
cosmopolitan,2020-02-02,20
sazerac,2020-03-03,15

Compound Keys

The positional key doesn’t need to be a single value (like an integer or a date), but it can also be a collection of values, which is known as a compound key. Compound keys are useful for highly structured datasets with many structural levels. Additionally, a compound key is an ideal tool for group-by operations where the data are organized in groups where additional structure and/or order needs to be enforced.

Consider the cocktail example. In addition to the price, the example adds the name of the bar and the order history across all bars, which you store in a normalized form. The next step is to export the cocktail order history (cocktail name, location, price, and time) across all bars as shown in the following example.

def price["foobar"] = {("sazerac", 10); ("cosmopolitan", 15);}
def price["wunderbar"] = {("cosmopolitan", 14); ("martini", 16);}
 
// order information
def order_location = {1;  2; 3}, "foobar"
def order_location = {100; 101; 102}, "wunderbar"
 
def time_format = "YYYY-mm-dd HH:MM"
def order_time = {
    (1,  parse_datetime["2019-12-31 22:00", time_format]);
    (2,  parse_datetime["2019-12-31 23:00", time_format]);
    (3,  parse_datetime["2020-01-01 00:00", time_format]);
    (100, parse_datetime["2019-12-31 22:30", time_format]);
    (101, parse_datetime["2019-12-31 23:30", time_format]);
    (102, parse_datetime["2020-01-01 00:30", time_format]);
}
 
def order_item = {
    (1, "cosmopolitan");
    (2, "sazerac");
    (3, "cosmopolitan");
    (101, "cosmopolitan");
    (100, "cosmopolitan");
    (102, "martini");
}
 
 
// join the data
def data:cocktail =
    order_item[order], order_time[order], order, order_item[order]
    from order
 
def data:time =
    order_item[order], order_time[order], order, order_time[order]
    from order
 
def data:location =
    order_item[order], order_time[order], order, order_location[order]
    from order
 
def data:price =
    order_item[order], order_time[order], order, p
    from order, p
    where p = order_location[order].price[order_item[order]]
 
// export configuration
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

The CSV file contains four columns defined by :cocktail, :location, :price, and :time in the relation data. The compounded positional key is the tuple of the three arguments stored after the column names, i.e.: order_item[order], order_time[order], order referring to the cocktail name, order time, and order ID.

As in the case for a single-valued positional key, the entries in the compound key are not exported. You can verify this by looking at the CSV file, where you can see that there is no order ID column. The cocktail name and the order time are shown in the CSV file because you explicitly constructed two columns, :time and :cocktail, which contain this information.

cocktail,location,price,time
cosmopolitan,foobar,15,2019-12-31T22:00:00
cosmopolitan,wunderbar,14,2019-12-31T22:30:00
cosmopolitan,wunderbar,14,2019-12-31T23:30:00
cosmopolitan,foobar,15,2020-01-01T00:00:00
martini,wunderbar,16,2020-01-01T00:30:00
sazerac,foobar,10,2019-12-31T23:00:00

Type-Overloaded Keys

There are certain cases where you can have data with type-overloaded keys. For instance, in the previous running example, you could have two bars where one keeps orders as an integer while the other one keeps them as string. Case in point, the code below exports the data where the order_item relation contains a key of two types, i.e., string and integer.

def order_location = {"111"; "222"}, "foobar"
def order_location = {111}, "wunderbar"
 
def order_item = {
 ("111", "cosmopolitan" );
 ("222", "sazerac" );
 (111, "martini");
}
 
def data:cocktail = order, order_item[order]     from order
def data:location = order, order_location[order] from order
 
// export configuration
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

The output file looks as follows:

cocktail,location
cosmopolitan,foobar
sazerac,foobar
martini,wunderbar

You can see that the output is similar to sequentially appending the data from both key types. In other words, Rel can handle the fact that the keys are of different types and outputs the data accordingly. This is also true for compound keys with different arities, e.g., combining arity-1 and arity-2 keys (see section Compound Keys).

Multiple Exports

You can also perform multiple exports of data. Here is an example that exports the same data to two different files and chooses a different delimiter for each file. To do this, you need to assign each export instruction a unique key within the relation export (in this case, :one and :two) in order to distinguish the exports. Note that the unique keys within the relation export need to be specialized values.

def data:cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def data:price = {(1, 15); (2, 20); (3, 12)}
 
def config_base:data = data
 
module config1
    def path = "cloud_URI1"
    def syntax:delim = '|'
end
 
module config2
    def path = "cloud_URI2"
    def syntax:delim = ';'
end
 
def export[:one] = export_csv[config1 <++ config_base]
def export[:two] = export_csv[config2 <++ config_base]

The two CSV files look as follows:

>>> csv_data_1.csv
cocktail|price
sazerac|15
cosmopolitan|20
martini|12

>>> csv_data_2.csv
cocktail;price
sazerac;15
cosmopolitan;20
martini;12

Note that through the use of the left override operator, <++, configurations that are not defined within modules config1 and config2 but defined in config_base are used as default. This prevents the case that multiple values are assigned to the same configuration, which can easily happen when using the union operator, ;, since overwriting is not applied.

Exporting Relations Sequentially

In certain cases, you may wish to export relations one after the other in a sequential fashion. This is useful, for example, if you want to export data from different relations to a single file. Here is an example that exports relations in such a way:

def cocktail = {"sazerac"; "cosmopolitan"; "martini"}
def price = {15; 20; 12}
 
def data(:cocktail, pos, v) = auto_number[cocktail](v, pos)
def data(:price, pos, v) = auto_number[price](v, pos)
 
// export configuration
def config:data = data
def config:path = cloud_URI
 
def export = export_csv[config]

giving the output:

cocktail,price
cosmopolitan,
martini,
sazerac,
,12
,15
,20

The use of auto_number generates unique (i.e., non-overlapping) keys for each relation. Please note that because of auto_number, each relation has missing values because each value in cocktail and prices will get a different position value.

How Not to Use the export Relation

As discussed, populating the relation export triggers the data export. This functionality can be problematic if export is persisted in the database, for instance, when loading a model. In this case, export will be evaluated, and data will potentially be exported every time a request is sent to the database.

Generally, this behavior is not desired, and it is therefore advisable to define the export relation only as a transient predicate in a query and not as part of an installed model. See Working With Models and Working With Base Relations for more details.

Similarly, you should be careful not to install the definition of config used in an export, since it will then be picked up by other import or exports if they use the same config name.

See Also

Was this doc helpful?