CSV Export

This how-to guide demonstrates how to export data to a CSV file using Rel.

Goal

The goal of this how-to guide is to demonstrate how to export data to a CSV format through a number of examples. After reading this guide, you should expect to know how to export CSV data using the different options available in Rel.

Simple Data Export

You can start by defining a simple data relation with just one column and exporting it to a cloud_URI location as a CSV file:

def config:data[:beverage] = {"water"; "orange juice"; "apple juice"}
def config:path = cloud_URI
def export = export_csv[config]

The export configuration is defined by a module that’s called, by convention, config. The relation config:data contains a key, :beverage, that defines the CSV column name. Later, you’ll see how to define the column names in a more general way. The CSV file generated by this query will look as follows:

beverage
apple juice
orange juice
water

In Rel, a data export is triggered when the relation export is properly populated. To export into CSV files, you can use export_csv, which takes as input a configuration relation that defines all the appropriate options. This config relation maps the :path configuration key to its value and :data to the data. Section Export Options explains all available configuration options in detail.

The order in which the data is written to the file is not guaranteed. In the previous example, the output data happened to be sorted, but this is not necessarily always the case. As with relations in general, users should not rely on the rows being in any particular order.

Cloud Configuration

To run the examples in this tutorial, you will need to include your cloud parameters in the config relation.

Here is an example that showcases how to set the parameters for exporting to Azure Blob Storage through the use of the integration parameter within the export_csv. Please note that you will have to provide a valid sas_token and azure_url for the code to work. These tokens should be retrieved by the user through their valid Azure account and provided in the code below:

def data = ...

def config[:integration, :provider] = "azure"
def config[:integration, :credentials, :azure_sas_token] = "sv=2014-02-14&sr=b&si=TestPolicy&sig=o%2B5%2F0C%2BLm3tXWftMJvQGEKHlSt%2Afs5No7FTkHk5L%2Bv0%3D"

def azure_url = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.txt"

def export = export_csv[(:path, azure_url); (:data, data); config]

The examples below assume that cloud_URI and config are defined, so that the export operation can write to the cloud_URI location.

Exporting Data with Multiple Columns

When there is more than one column, you can expect the data relation to have tuples of the form (column, pos..., v), where pos... is a set of keys that group corresponding values into the same row.

In the simple example above, it was possible to omit the pos... fields because only a single column was exported. When you have multiple columns to export, you need pos..., which may be a single or compound key (see section Compound Keys).

Here is an example of exporting an arity-2 relation, which contains cocktail names and their corresponding prices:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

def config:data = data
def config:path = cloud_URI

def export = export_csv[config]

As you can see, before exporting, you need to collect all the data in one relation, i.e., data in this case. The relation data must follow the Coordinate Format (COO), i.e. : (column, row_id, value), where column should be a RelName. The positional key (row_id in this case) is not exported.

The generated CSV file for this example reads:

cocktail,price
sazerac,15
cosmopolitan,20
martini,12

Export Options

When exporting data to a CSV file

def export = export_csv[config]

all configurations are controlled via the relation config, which maps configuration keys, for example path, to their value, i.e. a particular cloud_URI. To export the data in the exact format needed, the following configuration options are available:

Required configuration keys:

keydescription
pathA string specifying the URI (location and name) where the data will be exported.

Optional configuration keys:

keydescription
dataThe relation that should be exported. It generally has the form (:column_name, row_key..., value). If no relation is provided, the output file defined by :path will be empty.
syntaxA relation containing syntax configuration.
integrationA relation containing storage integration configuration.

Syntax Options

The :syntax field can be used to specify formatting options and contains the following options:

keydescription
syntax:headerA relation containing symbols that can be used as column names together with the desired ordering of the columns in the output. The idea here is that the user can override the default (or existing) header names.
syntax:header_rowThe row containing the header names. Values smaller than 1 indicate that no header should be written. For values of 1 or greater, write the header to the first row, which is also the default when this key is not present.
syntax:missingstringA String that will be used to represent missing values.
syntax:delimA Char that represents the delimiters used in the file to separate the columns. The default delimiter is a comma ','.
syntax:quotecharA Char that indicates a quoted field. Such a field can contain textual delimiters or newline characters. The default quote char is '"'.
syntax:escapecharThe Char used to escape the quote characters within a quoted (e.g., text) field. The default escape char is a backslash '\\'.

Integration Options

keydescription
integrationThe integration field allows for exporting to the cloud. In this case, you will also have to specify the provider and the credentials to be used for accessing the cloud.
integration:providerString, e.g., “azure”.
integration:credentials:azure_sas_tokenString with azure token.

Export Examples

This section demonstrates how to use the most common configuration options.

Delimiter and Quotes

The first example shows how to export string data that contains the default quote character ". To parse the string properly, the quote character must be escaped with a leading backslash, that is, \".

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini \"shaken not stirred\"")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

def config:data = data
def config:path = cloud_URI

def export = export_csv[config]

The resulting CSV file reads:

cocktail,price
sazerac,15
cosmopolitan,20
"martini \"shaken not stirred\"",12

You can also take a look at this simple example changing the default delimiter to | instead of the comma ,.

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

// configure the csv export:
def config[:syntax, :delim] = '|'

def config:data = data
def config:path = cloud_URI

def export = export_csv[(:path, cloud_URI); (:data, data); config]

The CSV output file reads:

cocktail|price
sazerac|15
cosmopolitan|20
martini|12

As a more complex example of tweaking special characters, you can redefine all special CSV characters: the delimiter, the quote character, and the escape character.

def cocktail = {(1, "_sazerac_"); (2, "cosmopolitan (or cosmo)"); (3, "_martini!__drink_")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

// configure the csv export:
def config[:syntax, :delim] = ';'
def config[:syntax, :quotechar] = '_'
def config[:syntax, :escapechar] = '!'

def export = export_csv[(:path, cloud_URI); (:data, data); config]

This example changed the delimiter to ;, the quote character to _, and the escape character to !. As a result, all _ and ! characters are escaped with a leading !.

The generated CSV file looks like this:

cocktail;price
_!_sazerac!__;15
cosmopolitan (or cosmo);20
_!_martini!!!_!_drink!__;12

Headers

You might often want to export the data but not the column names. You can do this by setting syntax:header_row to -1. For example:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

// configure the csv export
def config[:syntax, :header_row] = -1

def export = export_csv[(:path, cloud_URI); (:data, data); config]

The CSV file reads:

sazerac,15
cosmopolitan,20
martini,12

Missing Values

In the previous examples, the data output was a relation where all the values were present. This section explores what happens when some of the values are missing. It begins with a simple example of exporting the data but with some values missing across all columns:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini"); (5, "bellini")}
def price = {(1, 15); (2, 20); (3, 12); (4, 10); (6,19)}
def rating = {(1, 3.0); (3, 4.0); (5, 3.5); (7, 2.0)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

def export = export_csv[(:path, cloud_URI); (:data, data)]

Note that data has been added for another relation named rating. Here is the output:

cocktail,price,rating
sazerac,15,3.0
cosmopolitan,20,
martini,12,4.0
,10,
bellini,,3.5
,19,
,,2.0

As you can observe, whatever values were missing from the relations cocktail, price, and rating are simply empty in the final output. In essence, you are performing an outer join in this case when exporting the data.

If you would like to actually require a specific column to appear in the output, you would need to add a condition in the logic (see section Order and Presence of Columns).

In the case that you would like to use a specific string to indicate the absence of values instead of showing them as empty as in the example above, you can use the missingstring option. Here is the same example using the ? string to indicate missing values:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini"); (5, "bellini")}
def price = {(1, 15); (2, 20); (3, 12); (4, 10); (6,19)}
def rating = {(1, 3.0); (3, 4.0); (5, 3.5); (7, 2.0)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

// configure the csv export
def config[:syntax, :missingstring] = "?"

def export = export_csv[(:path, cloud_URI); (:data, data); config]

Giving the output:

cocktail,price,rating
sazerac,15,3.0
cosmopolitan,20,?
martini,12,4.0
?,10,?
bellini,?,3.5
?,19,?
?,?,2.0

Order and Presence of Columns

In all previous cases, the columns of the output have been alphabetically ordered. This section discusses how to reorder, omit, and even add new, empty columns in the output. This is controlled by the config:syntax:header configuration. It should be a relation of the form: (column_position, column_name).

Here is an example of how to manually define the column order:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (2, 4.0); (3, 3.5)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

// configure the csv export
def config[:syntax, :header] = { (1, :rating) ; (2, :price) ; (3, :cocktail) }

def export = export_csv[(:path, cloud_URI); (:data, data); config]

giving the following output:

rating,price,cocktail
3.0,15,sazerac
4.0,20,cosmopolitan
3.5,12,martini

It is also easy to skip a column if you don’t want to include it in the output. The following example outputs only two of the three columns in the relation data:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (2, 4.0); (3, 3.5)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

// configure the csv export
def config[:syntax, :header] = {(1, :rating); (2, :cocktail)}

def export = export_csv[(:path, cloud_URI); (:data, data); config]

giving the output:

rating,cocktail
3.0,sazerac
4.0,cosmopolitan
3.5,martini

You can also specify columns that may not exist. The following example directs a column nonexistent, which is totally empty and has no reference in data, to appear in the exported CSV file:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (4, 4.0); (3, 3.5)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

// configure the csv export
def config[:syntax, :header] = {(1, :rating); (2, :nonexistent); (3, :cocktail)}

def export = export_csv[(:path, cloud_URI); (:data, data); config]

giving the output:

rating,nonexistent,cocktail
3.0,,sazerac
,,cosmopolitan
3.5,,martini
4.0,,

Positional Keys

So far, you have used integer values as a positional key to control the order of the data records (aka rows) in the CSV file. If the integer values start at 1 and increase by 1 (without any gaps), the positional key corresponds to the data row in the CSV file.

Of course, it is advisable not to have overlapping key values. For example, the following code:

def cocktail = {(1, "sazerac"); (2, "sazerac")}
def data(:cocktail, pos, v) = cocktail(pos, v)
def export = export_csv[(:path, cloud_URI); (:data, data)]

will generate the output:

cocktail
sazerac
sazerac

where you can observe that the same value, i.e., sazerac, appears twice since it has different keys. Compare with the case where the same key appears twice:

def cocktail = {(1, "sazerac"); (1, "sazerac")}
def data(:cocktail, pos, v) = cocktail(pos, v)
def export = export_csv[(:path, cloud_URI); (:data, data)]

the tuple (1, "sazerac") appears only once:

cocktail
sazerac

Positional keys can start at any value and can have arbitrary step sizes. The positional key doesn’t even need to be an integer and can be anything that enforces an order. As the following subsections will show, you can use dates and/or compound keys as long as the data types used can be ordered.

Dates as Key

You can add the date when a particular drink was consumed and export the data.

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def mydate = {
(1, parse_date["2020-03-03", "Y-m-d"]);
(2, parse_date["2020-02-02", "Y-m-d"]);
(3, parse_date["2020-01-01", "Y-m-d"])
}

def data(:cocktail, pos, v) =(id: cocktail(id, v) and mydate(id, pos))
def data(:price, pos, v) =(id: price(id, v) and mydate(id, pos))
def data(:mydate, pos, v) = mydate(_, v) and pos = v

def export = export_csv[(:path, cloud_URI); (:data, data)]

The original data is still keyed by an integer, but for the relation data, which will be exported, use the dates in mydate to key the data. Because the positional key, pos, is not exported, you need to define an additional column :mydate that also contains the dates.

The generated CSV file reads:

cocktail,mydate,price
martini,2020-01-01,12
cosmopolitan,2020-02-02,20
sazerac,2020-03-03,15

Compound Keys

The positional key doesn’t need to be a single value (like an integer or a date), but it can also be a collection of values, which is known as a compound key. Compound keys are useful for highly structured datasets with many structural levels. Additionally, a compound key is an ideal tool for group-by operations where the data is organized in groups where additional structure and/or order needs to be enforced.

Consider the cocktail example. In addition to the price, the example adds the name of the bar and the order history across all bars, which you store in a normalized form. The next step is to export the cocktail order history (cocktail name, location, price, and time) across all bars as shown in the following example.

def price["foobar"] = {("sazerac", 10); ("cosmopolitan", 15);}
def price["wunderbar"] = {("cosmopolitan", 14); ("martini", 16);}

// order information
def order_location = {1; 2; 3}, "foobar"
def order_location = {100; 101; 102}, "wunderbar"

def time_format = "YYYY-mm-dd HH:MM"
def order_time = {
(1, parse_datetime["2019-12-31 22:00", time_format]);
(2, parse_datetime["2019-12-31 23:00", time_format]);
(3, parse_datetime["2020-01-01 00:00", time_format]);
(100, parse_datetime["2019-12-31 22:30", time_format]);
(101, parse_datetime["2019-12-31 23:30", time_format]);
(102, parse_datetime["2020-01-01 00:30", time_format]);
}

def order_item = {
(1, "cosmopolitan");
(2, "sazerac");
(3, "cosmopolitan");
(101, "cosmopolitan");
(100, "cosmopolitan");
(102, "martini");
}


// join the data
def data =
:cocktail, order_item[order], order_time[order], order, order_item[order]
from order

def data =
:time, order_item[order], order_time[order], order, order_time[order]
from order

def data =
:location, order_item[order], order_time[order], order, order_location[order]
from order

def data =
:price, order_item[order], order_time[order], order, p
from order, p
where p = order_location[order].price[order_item[order]]


def export = export_csv[(:path, cloud_URI); (:data, data)]

The CSV file contains four columns defined by :cocktail, :location, :price, and :time in the relation data. The compounded positional key is the tuple of the three arguments stored after the column names, i.e.: order_item[order], order_time[order], order referring to the cocktail name, order time, and order ID.

As in the case for a single-valued positional key, the entries in the compound key are not exported. You can verify this by looking at the CSV file, where you can see that there is no order ID column. The cocktail name and the order time are shown in the CSV file because you explicitly constructed two columns, :time and :cocktail, which contain this information.

cocktail,location,price,time
cosmopolitan,foobar,15,2019-12-31T22:00:00
cosmopolitan,wunderbar,14,2019-12-31T22:30:00
cosmopolitan,wunderbar,14,2019-12-31T23:30:00
cosmopolitan,foobar,15,2020-01-01T00:00:00
martini,wunderbar,16,2020-01-01T00:30:00
sazerac,foobar,10,2019-12-31T23:00:00

Type-Overloaded Keys

There are certain cases where you can have data with type-overloaded keys. For instance, in the previous running example, you could have two bars where one keeps orders as an integer while the other one keeps them as string. Case in point, the code below exports the data where the order_item relation contains a key of two types, i.e., string and integer.

def order_location = {"111"; "222"}, "foobar"
def order_location = {111}, "wunderbar"

def order_item = {
("111", "cosmopolitan" );
("222", "sazerac" );
(111, "martini");
}

def data = :cocktail, order, order_item[order]
from order
def data = :location, order, order_location[order]
from order

def export = export_csv[(:path, cloud_URI); (:data, data)]

The output file looks as follows:

cocktail,location
cosmopolitan,foobar
sazerac,foobar
martini,wunderbar

You can see that the output is similar to sequentially appending the data from both key types. In other words, Rel can handle the fact that the keys are of different types and outputs the data accordingly. This is also true for compound keys with different arities, e.g., combining arity-1 and arity-2 keys (see section Compound Keys).

Multiple Exports

You can also perform multiple exports of data. Here is an example that exports the same data to two different files and chooses a different delimiter for each file. To do this, you need to assign each export instruction a unique key within the relation export (in this case, :one and :two) in order to distinguish the exports.

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

def config1[:syntax, :delim] = '|'
def config2[:syntax, :delim] = ';'

def export[:one] = export_csv[(:path, cloud_URI1); (:data, data); config1]
def export[:two] = export_csv[(:path, cloud_URI2); (:data, data); config2]

The two CSV files look as follows:

>>> csv_data_1.csv
cocktail|price
sazerac|15
cosmopolitan|20
martini|12

>>> csv_data_2.csv
cocktail;price
sazerac;15
cosmopolitan;20
martini;12

Exporting Relations Sequentially

In certain cases, you may wish to export relations one after the other in a sequential fashion. This is useful, for example, if you want to export data from different relations to a single file. Here is an example that exports relations in such a way:

def cocktail = {"sazerac"; "cosmopolitan"; "martini"}
def price = {15; 20; 12}

def data(:cocktail, pos, v) = auto_number[cocktail](v, pos)
def data(:price, pos, v) = auto_number[price](v, pos)

def export = export_csv[(:path, cloud_URI); (:data, data)]

giving the output:

cocktail,price
cosmopolitan,
martini,
sazerac,
,12
,15
,20

The use of auto_number generates unique (i.e., non-overlapping) keys for each relation. Please note that because of auto_number, each relation has missing values because each value in cocktail and prices will get a different position value.

How Not to Use the export Relation

As discussed, populating the relation export triggers the data export. This functionality can be problematic if export is persisted in the database, for instance, when installing a model. In this case, export will be evaluated, and data will potentially be exported every time a request is sent to the database.

Generally, this behavior is not desired, and it is therefore advisable to define the export relation only as a transient predicate in a query and not as part of an installed model. See the Installing Models and the Updating Data: Working with EDB Relations Concept Guides for more details about installed predicates, updates, and read-only transactions.

Similarly, you should be careful not to install the definition of config used in an export, since it will then be picked up by other import or exports if they use the same config name.

See also