CSV Export

This how-to guide demonstrates how to export data to a CSV file using Rel.

Goal

The goal of this how-to guide is to demonstrate how to export data to a CSV format through a number of examples. After reading this guide, you should expect to know how to export CSV data using the different options available in Rel.

Simple Data Export

Let’s start by defining a simple data relation with just one column and exporting it to a cloud_URI location, as a CSV file

def config:data[:beverage] = {"water"; "orange juice"; "apple juice"}
def config:path = cloud_URI
def export = export_csv[config]

The export configuration is defined by a module that we call, by convention, config. The relation config:data contains a key, :beverage, that defines the CSV column name. Later, we will see how we can define the column names in a more general way. The CSV file generated by this query will look as follows:

beverage
apple juice
orange juice
water

In Rel, a data export is triggered when the relation export is properly populated. To export into CSV files, we use export_csv, which takes as input a configuration relation that defines all the appropriate options. This config relation maps the :path configuration key to its value and :data to the data. Section Export Options explains all available configuration options in detail.

The order in which the data is written to the file is not guaranteed. In the previous example, the output data happened to be sorted, but this is not necessarily always the case. As with relations in general, users should not rely on the rows being in any particular order.

Cloud Configuration

To run the examples in this tutorial, you will need to include your cloud parameters in the config relation.

Here is an example that showcases how to set the parameters for exporting to Azure Blob Storage through the use of the integration parameter within the export_csv. Please note that you will have to provide a valid sas_token and azure_url for the code to work. These tokens should be retrieved by the user through their valid Azure account and provided in the code below:

def data = ...

def config[:integration, :provider] = "azure"
def config[:integration, :credentials, :azure_sas_token] = "sv=2014-02-14&sr=b&si=TestPolicy&sig=o%2B5%2F0C%2BLm3tXWftMJvQGEKHlSt%2Afs5No7FTkHk5L%2Bv0%3D"

def azure_url = "https://myaccount.blob.core.windows.net/sascontainer/myfile.txt"

def export = export_csv[(:path, azure_url); (:data, data); config]

In the examples below, we asume that cloud_URI and config are defined so that the export operation can write to the cloud_URI location

Exporting Data with Multiple Columns

When we have more than one column, we expect the data relation to have tuples of the form (column, pos..., v), where pos... is a set of keys that group corresponding values into the same row.

In our simple example above, we were able to omit the pos... fields because we exported only a single column. When we have multiple columns to export, we need pos..., which may be a single or compound key (see section Compound Keys).

Here is an example of exporting an arity-2 relation, which contains cocktail names and their corresponding prices:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

def config:data = data
def config:path = cloud_URI

def export = export_csv[config]

As we can see, before we can export, we first need to collect all the data in one relation, i.e., data in this case. The relation data must follow the Coordinate Format (COO), i.e. : (column, row_id, value), where column should be a RelName. The positional key (row_id in this case) is not exported.

The generated CSV file for this example reads:

cocktail,price
sazerac,15
cosmopolitan,20
martini,12

Export Options

When exporting data to a CSV file

def export = export_csv[config]

all configurations are controlled via the relation config, which maps configuration keys, e.g., path, to their value, i.e., a particular cloud_URI. To export the data in the exact format needed, the following configuration options are available:

Required configuration keys:

keydescription
pathA string specifying the URI (location and name) where the data will be exported.

Optional configuration keys:

keydescription
dataThe relation that should be exported. It has generally of the form (:column_name, row_key..., value). If no relation is provided, then the output file defined by :path will be empty.
syntaxA relation containing syntax configuration.
integrationA relation containing storage integration configuration.

Syntax Options

The :syntax field can be used to specify formatting options and contains the following options:

keydescription
syntax:headerA relation containing symbols that can be used as column names, together with the desired ordering of the columns in the output. The idea here is that the user can override the default (or existing) header names.
syntax:header_rowThe row containing the header names. Values smaller than 1 indicate that no header should be written. Values of 1 or greater, write the header to the first row, which is also the default when this key is not present.
syntax:missingstringA String that will be used to represent missing values.
syntax:delimA Char that represents the delimiters used in the file to separate the columns. The default delimiter is a comma ','.
syntax:quotecharA Char that indicates a quoted field. Such a field can contain textual delimiters or newline characters. The default quote char is '"'.
syntax:escapecharThe Char used to escape the quote characters within a quoted (e.g., text) field. The default escape char is a backslash '\\'.

Integration Options

keydescription
integrationThe integration field allows for exporting to the cloud. In this case, we will also have to specify the provider and the credentials to be used for accessing the cloud.
integration:providerString, e.g., “azure”.
integration:credentials:azure_sas_tokenString with azure token.

Export Examples Using the Different Options

In this section, we demonstrate how to use the most common configuration options.

Delimiter and Quotes

In our first example, we will take a look at how to export string data that contains the default quote character ". To parse the string properly, the quote character must be escaped with a leading backslash, that is, \".

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini \"shaken not stirred\"")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

def config:data = data
def config:path = cloud_URI

def export = export_csv[config]

The resulting CSV file reads:

cocktail,price
sazerac,15
cosmopolitan,20
"martini \"shaken not stirred\"",12

Let’s also take a look at a simple example where we change the default delimiter to | instead of the comma ,.

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

// configure the csv export:
def config[:syntax, :delim] = '|'

def config:data = data
def config:path = cloud_URI

def export = export_csv[(:path, cloud_URI); (:data, data); config]

The CSV output file reads:

cocktail|price
sazerac|15
cosmopolitan|20
martini|12

As a more complex example of tweaking special characters, let’s redefine all special CSV characters: the delimiter, the quote character, and the escape character.

def cocktail = {(1, "_sazerac_"); (2, "cosmopolitan (or cosmo)"); (3, "_martini!__drink_")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

// configure the csv export:
def config[:syntax, :delim] = ';'
def config[:syntax, :quotechar] = '_'
def config[:syntax, :escapechar] = '!'

def export = export_csv[(:path, cloud_URI); (:data, data); config]

In this example, we changed the delimiter to ;, the quote character to _, and the escape character to !. As a result all _ and ! characters will be escaped by having a leading !.

The generated CSV file looks as follows:

cocktail;price
_!_sazerac!__;15
cosmopolitan (or cosmo);20
_!_martini!!!_!_drink!__;12

Headers

We often want to export the data but not the column names. This is done by setting syntax:header_row to -1. For example:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

// configure the csv export
def config[:syntax, :header_row] = -1

def export = export_csv[(:path, cloud_URI); (:data, data); config]

The CSV file reads:

sazerac,15
cosmopolitan,20
martini,12

Missing Values

In the previous examples, the data output was a relation where all the values were present. In this section, we explore what happens when some of the values are missing. Let’s start with a simple example of exporting the data, but with some values missing across all columns:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini"); (5, "bellini")}
def price = {(1, 15); (2, 20); (3, 12); (4, 10); (6,19)}
def rating = {(1, 3.0); (3, 4.0); (5, 3.5); (7, 2.0)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

def export = export_csv[(:path, cloud_URI); (:data, data)]

Note that we added data for another relation named rating. Here is the output:

cocktail,price,rating
sazerac,15,3.0
cosmopolitan,20,
martini,12,4.0
,10,
bellini,,3.5
,19,
,,2.0

As you can observe, whatever values were missing from the relations cocktail, price, and rating are simply empty in the final output. In essence, we are performing an outer join in this case when exporting the data.

If we would like to actually require a specific column to appear in the output we would need to add a condition in the logic (see section Order and Presence of Columns).

In the case that we would like to use a specific string to indicate the absence of values instead of showing them as empty as in the example above, we can use the missingstring option. Here is the same example using the ? string to indicate missing values:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini"); (5, "bellini")}
def price = {(1, 15); (2, 20); (3, 12); (4, 10); (6,19)}
def rating = {(1, 3.0); (3, 4.0); (5, 3.5); (7, 2.0)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

// configure the csv export
def config[:syntax, :missingstring] = "?"

def export = export_csv[(:path, cloud_URI); (:data, data); config]

Giving the output:

cocktail,price,rating
sazerac,15,3.0
cosmopolitan,20,?
martini,12,4.0
?,10,?
bellini,?,3.5
?,19,?
?,?,2.0

Order and Presence of Columns

In all previous cases, the columns of the output were alphabetically ordered. In this section, we discuss how we can reorder, omit, and even add new empty columns in the output. This is controlled by the config:syntax:header configuration. It should be a relation of the form: (column_position, column_name).

Here is an example were we manually define the column order:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (2, 4.0); (3, 3.5)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

// configure the csv export
def config[:syntax, :header] = { (1, :rating) ; (2, :price) ; (3, :cocktail) }

def export = export_csv[(:path, cloud_URI); (:data, data); config]

giving the following output:

rating,price,cocktail
3.0,15,sazerac
4.0,20,cosmopolitan
3.5,12,martini

It is also easy to skip a column if we don’t wish to include it in the output. The following example outputs only two of the three columns in the relation data:

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (2, 4.0); (3, 3.5)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

// configure the csv export
def config[:syntax, :header] = {(1, :rating); (2, :cocktail)}

def export = export_csv[(:path, cloud_URI); (:data, data); config]

giving the output:

rating,cocktail
3.0,sazerac
4.0,cosmopolitan
3.5,martini

We can also specify columns that may not exist. In the following example, we direct that a column nonexistent, which is totally empty and has no reference in data, will appear in the exported CSV file.

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (4, 4.0); (3, 3.5)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)

// configure the csv export
def config[:syntax, :header] = {(1, :rating); (2, :nonexistent); (3, :cocktail)}

def export = export_csv[(:path, cloud_URI); (:data, data); config]

giving the output:

rating,nonexistent,cocktail
3.0,,sazerac
,,cosmopolitan
3.5,,martini
4.0,,

Positional Keys

So far, we have used integer values as a positional key to control the order of the data records (aka rows) in the CSV file. If the integer values start at 1 and increase by 1 (without any gaps), then the positional key corresponds to the data row in the CSV file.

Of course, we need to be careful not to have overlapping key values. For example, the following code:

def cocktail = {(1, "sazerac"); (2, "sazerac")}
def data(:cocktail, pos, v) = cocktail(pos, v)
def export = export_csv[(:path, cloud_URI); (:data, data)]

will generate the output:

cocktail
sazerac
sazerac

where we observe that the same value, i.e., sazerac, appears twice since it has different keys. Compare with the case where the same key appears twice:

def cocktail = {(1, "sazerac"); (1, "sazerac")}
def data(:cocktail, pos, v) = cocktail(pos, v)
def export = export_csv[(:path, cloud_URI); (:data, data)]

the tuple (1, "sazerac") appears only once:

cocktail
sazerac

Positional keys can start at any value and can have arbitrary step sizes. The positional key doesn’t even need to be an integer and can be anything that enforces an order. As we will see in the following subsections, we can use dates and/or compound keys, as long as the data types used can be ordered.

Dates as Key

Let’s add the date when a particular drink was consumed and export the data.

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def mydate = {
(1, parse_date["2020-03-03", "Y-m-d"]);
(2, parse_date["2020-02-02", "Y-m-d"]);
(3, parse_date["2020-01-01", "Y-m-d"])
}

def data(:cocktail, pos, v) =(id: cocktail(id, v) and mydate(id, pos))
def data(:price, pos, v) =(id: price(id, v) and mydate(id, pos))
def data(:mydate, pos, v) = mydate(_, v) and pos = v

def export = export_csv[(:path, cloud_URI); (:data, data)]

The original data is still keyed by an integer, but for the relation data, which will be exported, we use the dates in mydate to key our data. Because the positional key, pos, is not exported, we need to define additionally a column :mydate that also contains the dates.

The generated CSV file reads:

cocktail,mydate,price
martini,2020-01-01,12
cosmopolitan,2020-02-02,20
sazerac,2020-03-03,15

Compound Keys

The positional key doesn’t need to be a single value (like an integer or a date), but, it can also be a collection of values which is known as a compound key. Compound keys are useful for highly structured datasets with many structural levels. Additionally, a compound key is an ideal tool for group-by operations where the data is organized in groups where additional structure and/or order needs to be enforced.

Let’s consider our cocktail example. In addition to the price, we add the name of the bar and order history across all bars, which we store in a normalized form. Next, we export the cocktail order history (cocktail name, location, price, and time) across all bars as shown in the following example.

def price["foobar"] = {("sazerac", 10); ("cosmopolitan", 15);}
def price["wunderbar"] = {("cosmopolitan", 14); ("martini", 16);}

// order information
def order_location = {1; 2; 3}, "foobar"
def order_location = {100; 101; 102}, "wunderbar"

def time_format = "YYYY-mm-dd HH:MM"
def order_time = {
(1, parse_datetime["2019-12-31 22:00", time_format]);
(2, parse_datetime["2019-12-31 23:00", time_format]);
(3, parse_datetime["2020-01-01 00:00", time_format]);
(100, parse_datetime["2019-12-31 22:30", time_format]);
(101, parse_datetime["2019-12-31 23:30", time_format]);
(102, parse_datetime["2020-01-01 00:30", time_format]);
}

def order_item = {
(1, "cosmopolitan");
(2, "sazerac");
(3, "cosmopolitan");
(101, "cosmopolitan");
(100, "cosmopolitan");
(102, "martini");
}


// join the data
def data =
:cocktail, order_item[order], order_time[order], order, order_item[order]
from order

def data =
:time, order_item[order], order_time[order], order, order_time[order]
from order

def data =
:location, order_item[order], order_time[order], order, order_location[order]
from order

def data =
:price, order_item[order], order_time[order], order, p
from order, p
where p = order_location[order].price[order_item[order]]


def export = export_csv[(:path, cloud_URI); (:data, data)]

The CSV file contains 4 columns defined by :cocktail, :location, :price, and :time in the relation data. The compounded positional key is the tuple of the three arguments stored after the column names, i.e.: order_item[order], order_time[order], order referring to the cocktail name, order time, and order ID.

As in the case for a single-valued positional key, the entries in the compound key are not exported. We can verify this by looking at the CSV file, where we see that there is no order ID column. The cocktail name and the order time are shown in the CSV file because we explicitly constructed two columns, :time and :cocktail, that contain this information.

cocktail,location,price,time
cosmopolitan,foobar,15,2019-12-31T22:00:00
cosmopolitan,wunderbar,14,2019-12-31T22:30:00
cosmopolitan,wunderbar,14,2019-12-31T23:30:00
cosmopolitan,foobar,15,2020-01-01T00:00:00
martini,wunderbar,16,2020-01-01T00:30:00
sazerac,foobar,10,2019-12-31T23:00:00

Type-Overloaded Keys

There are certain cases where we can have data with type-overloaded keys. For instance, in our running example, we could have two bars where one keeps orders as an integer while the other one keeps them as string. Case in point, the code below exports the data where the order_item relation contains a key of two types, i.e., string and integer.

def order_location = {"111"; "222"}, "foobar"
def order_location = {111}, "wunderbar"

def order_item = {
("111", "cosmopolitan" );
("222", "sazerac" );
(111, "martini");
}

def data = :cocktail, order, order_item[order]
from order
def data = :location, order, order_location[order]
from order

def export = export_csv[(:path, cloud_URI); (:data, data)]

The output file looks as follows:

cocktail,location
cosmopolitan,foobar
sazerac,foobar
martini,wunderbar

We observe that the output is similar to sequentially appending the data from both key types. In other words, Rel can handle the fact that the keys are of different types and outputs the data accordingly. This is also true for compound keys with different arities, e.g., combining arity-1 and arity-2 keys (see section Compound Keys).

Multiple Exports

We can also perform multiple exports of data. Here is an example that exports the same data to two different files and chooses a different delimiter for each file. To do this, we need to assign each export instruction a unique key within the relation export (in this case, :one and :two) in order to distinguish the exports.

def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}

def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)

def config1[:syntax, :delim] = '|'
def config2[:syntax, :delim] = ';'

def export[:one] = export_csv[(:path, cloud_URI1); (:data, data); config1]
def export[:two] = export_csv[(:path, cloud_URI2); (:data, data); config2]

The two CSV files look as follows:

>>> csv_data_1.csv
cocktail|price
sazerac|15
cosmopolitan|20
martini|12

>>> csv_data_2.csv
cocktail;price
sazerac;15
cosmopolitan;20
martini;12

Exporting Relations Sequentially

In certain cases, we may wish to export relations one after the other in a sequential fashion. This is useful, for example, if we want to export data from different relations to a single file. Here is an example that exports relations in such a way:

def cocktail = {"sazerac"; "cosmopolitan"; "martini"}
def price = {15; 20; 12}

def data(:cocktail, pos, v) = auto_number[cocktail](v, pos)
def data(:price, pos, v) = auto_number[price](v, pos)

def export = export_csv[(:path, cloud_URI); (:data, data)]

Giving the output:

cocktail,price
cosmopolitan,
martini,
sazerac,
,12
,15
,20

The use of auto_number generates unique (i.e., non-overlapping) keys for each relation. Please note that because of auto_number each relation has missing values, because each value in cocktail and prices will get a different position value.

How Not to Use the export Relation

As discussed, populating the relation export triggers the data export. This functionality can be problematic if export is persisted in the database, for instance when installing a source. In this case, export will be evaluated, and data will be potentially exported, every time a request is sent to the database.

Generally, this behavior is not desired and it is therefore advisable to define the export relation only as a query call and not as an installed source. See the Installing Model Sources and the Updating Data: Working with EDB Relations concept guides for more details about “install source” and “query” transactions.

Similarly, you should be careful not to install the definition of config used in an export, since it will then be picked up by other exports if they use the same config name.

See also