CSV Export
This how-to guide demonstrates how to export CSV data using Rel.
Goal
The goal of this how-to guide is to demonstrate how to export data to a CSV format through a number of examples. After reading this guide, you should expect to know how to export CSV data using the different options available in Rel.
Data Exporting Basics
Start by defining a data
relation with one column and exporting it to the desired location as a CSV file:
module config
def data[:beverage] = {"water"; "orange juice"; "apple juice"}
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
Here, path
refers to any resource within your cloud service provider where you intend to export the data.
See Accessing the Cloud for more details.
The export configuration is defined by a module called, by convention, config
.
See Modules for more details.
Note that config
is an arbitrary module name and will be used throughout this guide, but you can give it any other name.
The relation config:data
contains a key, :beverage
, that defines the CSV column name.
Later, you’ll see how to define the column names in a more general way.
The CSV file generated by this query will look like this:
beverage
apple juice
orange juice
water
In RAI, a data export is triggered when the control relation export
is properly populated.
To export into CSV files, you can use export_csv
,
which takes as input a configuration relation that defines all the appropriate options.
This config
relation maps the :path
configuration key
to its value and :data
to the data.
Section Export Options explains all available configuration options in detail.
The order in which the data are written to the file is not guaranteed. In the previous example, the output data were sorted, but this sorting doesn’t necessarily occur all the time. As is generally the case with relations, the rows of data can appear in any order.
The rest of the examples in this guide assume that config
and path
are defined so that the export operation can write to the path
location.
To learn more about cloud parameters and how to configure your path
, see Accessing the Cloud.
When there is more than one column,
you can expect the data
relation to have tuples of the form (column, pos..., v)
,
where pos...
is a set of keys that group corresponding values into the same row.
In the example above, it was possible to omit the pos...
fields because only a single column was exported.
When you have multiple columns to export, you need pos...
,
which may be a single or compound key (see section Compound Keys).
Here is an example of exporting an arity-2 relation, which contains cocktail names and their corresponding prices:
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
module config
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
As you can see, before exporting, you need to collect all the data in one relation, i.e., data
in this case.
The relation data
must follow the Coordinate Format (COO), i.e. : (column, row_id, value)
, where column
should be a Symbol (of typeRelName
).
The positional key (row_id
in this case) is not exported.
The generated CSV file for this example reads:
cocktail,price
sazerac,15
cosmopolitan,20
martini,12
Export Options
When exporting data to a CSV file, all configurations are controlled via the relation config
:
def export = export_csv[config]
This relation maps configuration keys, for example path
, to their value.
To ensure that the data are exported in the desired format, you can use one of these configuration options:
Option | Description |
---|---|
path | A string specifying the URI (location and name) where the data will be exported. See Accessing the Cloud for more details. |
data | The relation to be exported. It generally has the form (:column_name, row_key..., value) . If no relation is provided, the output file defined by :path will be empty. |
integration | The credentials needed to export data to the cloud. See Accessing the Cloud for more details. |
syntax | A relation containing syntax configuration. |
partition_size | An int specifying the partition size of the produced CSV file in MB. For example, with partition_size=100 , a 1GB CSV file will be exported in 10 partitions of roughly 100MB each. You can suppress the partitioning altogether, though this may impact export performance, by setting partition_size=0 — the default value. |
compression | Τhe algorithm used to compress the CSV output file. Currently, only gzip is supported. |
Exporting CSV data to local files is not currently supported.
Syntax
The syntax
field can be used to specify formatting options and contains the following options:
key | description |
---|---|
syntax:header | A relation with (Int, RelName) pairs, specifying the column names in the exported file, where the Int indices indicate the column order, and the column name is provided as a Symbol (of type RelName ). This option overrides the default (or existing) header names. |
syntax:header_row | An Int to control whether or not a header row, i.e., column names, is exported. For values of 1 or greater, the header is written in the first row, which is also the default case when this option is not defined. |
syntax:missingstring | A String that will be used to represent missing values. |
syntax:delim | A Char that represents the delimiters used in the file to separate the columns. The default delimiter is a comma ',' . |
syntax:quotechar | A Char that indicates a quoted field. Such a field can contain textual delimiters or newline characters. The default quote char is '"' . |
syntax:escapechar | The Char used to escape the quote characters within a quoted (e.g., text) field. The default escape char is a backslash '\\' . |
Export Examples
This section demonstrates how to use the most common configuration options.
Delimiter and Quotes
The first example shows how to export string data that contain the default quote character "
.
To parse the string properly, the quote character must be escaped with a leading backslash, that is, \"
.
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini \"shaken not stirred\"")}
def price = {(1, 15); (2, 20); (3, 12)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
module config
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
The resulting CSV file reads:
cocktail,price
sazerac,15
cosmopolitan,20
"martini \"shaken not stirred\"",12
You can also take a look at this simple example changing the default delimiter to |
instead of the comma ,
.
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
// Configure the CSV export.
module config
def syntax:delim = '|'
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
The CSV output file reads:
cocktail|price
sazerac|15
cosmopolitan|20
martini|12
As a more complex example of tweaking special characters, you can redefine all special CSV characters: the delimiter, the quote character, and the escape character.
def cocktail = {(1, "_sazerac_"); (2, "cosmopolitan (or cosmo)"); (3, "_martini!__drink_")}
def price = {(1, 15); (2, 20); (3, 12)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
// Configure the CSV export.
module config
def syntax:delim = ';'
def syntax:quotechar = '_'
def syntax:escapechar = '!'
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
This example changed the delimiter to ;
, the quote character to _
, and the escape character to !
.
As a result, all _
and !
characters are escaped with a leading !
.
The generated CSV file looks like this:
cocktail;price
_!_sazerac!__;15
cosmopolitan (or cosmo);20
_!_martini!!!_!_drink!__;12
Headers
You might often want to export the data but not the column names.
You can do this by setting syntax:header_row
to a value less than 1
. For example:
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
// Configure the csv export.
module config
def syntax:header_row = -1
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
The CSV file reads:
sazerac,15
cosmopolitan,20
martini,12
Missing Values
In the previous examples, the data output was a relation where all the values were present. This section explores what happens when some of the values are missing. It begins with a simple example of exporting the data but with some values missing across all columns:
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini"); (5, "bellini")}
def price = {(1, 15); (2, 20); (3, 12); (4, 10); (6,19)}
def rating = {(1, 3.0); (3, 4.0); (5, 3.5); (7, 2.0)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
module config
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
Note that the data have been added for another relation named rating
. Here’s the output:
cocktail,price,rating
sazerac,15,3.0
cosmopolitan,20,
martini,12,4.0
,10,
bellini,,3.5
,19,
,,2.0
As you can observe, whatever values were missing from the relations cocktail
, price
, and rating
are simply empty in the final output.
In essence, you are performing an outer join in this case when exporting the data.
If you would like to actually require a specific column to appear in the output, you would need to add a condition in the logic (see section Order and Presence of Columns).
In the case that you would like to use a specific string to indicate the absence of values instead of showing them as empty as in the example above, you can use the missingstring
option.
Here is the same example using the ?
string to indicate missing values:
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini"); (5, "bellini")}
def price = {(1, 15); (2, 20); (3, 12); (4, 10); (6,19)}
def rating = {(1, 3.0); (3, 4.0); (5, 3.5); (7, 2.0)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
// Configure the CSV export.
module config
def syntax:missingstring = "?"
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
Giving the output:
cocktail,price,rating
sazerac,15,3.0
cosmopolitan,20,?
martini,12,4.0
?,10,?
bellini,?,3.5
?,19,?
?,?,2.0
Order and Presence of Columns
In all previous cases, the columns of the output have been alphabetically ordered.
This section discusses how to reorder, omit, and even add new, empty columns in the output.
This is controlled by the config:syntax:header
configuration.
It should be a relation of the form: (column_position, column_name)
.
Here is an example of how to manually define the column order:
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (2, 4.0); (3, 3.5)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
// Configure the CSV export.
module config
def syntax:header = { (1, :rating) ; (2, :price) ; (3, :cocktail) }
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
giving the following output:
rating,price,cocktail
3.0,15,sazerac
4.0,20,cosmopolitan
3.5,12,martini
It is also easy to skip a column if you don’t want to include it in the output.
The following example outputs only two of the three columns in the relation data
:
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (2, 4.0); (3, 3.5)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
// Configure the CSV export.
module config
def syntax:header = {(1, :rating); (2, :cocktail)}
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
giving the output:
rating,cocktail
3.0,sazerac
4.0,cosmopolitan
3.5,martini
You can also specify columns that may not exist.
The following example directs a column nonexistent
, which is totally empty and has no reference in data
, to appear in the exported CSV file:
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def rating = {(1, 3.0); (4, 4.0); (3, 3.5)}
def data(:cocktail, pos, v) = cocktail(pos, v)
def data(:price, pos, v) = price(pos, v)
def data(:rating, pos, v) = rating(pos, v)
// Configure the CSV export.
module config
def syntax:header = {(1, :rating); (2, :nonexistent); (3, :cocktail)}
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
giving the output:
rating,nonexistent,cocktail
3.0,,sazerac
,,cosmopolitan
3.5,,martini
4.0,,
Positional Keys
So far, you have used integer values as a positional key to control the order of the data records (or rows) in the CSV file. If the integer values start at 1 and increase by 1 (without any gaps), the positional key corresponds to the data row in the CSV file.
It’s best not to have overlapping key values. For example, the following code:
def cocktail = {(1, "sazerac"); (2, "sazerac")}
def data(:cocktail, pos, v) = cocktail(pos, v)
module config
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
will generate the output:
cocktail
sazerac
sazerac
where you can observe that the same value, i.e., sazerac
, appears twice since it has different keys.
Compare with the case where the same key appears twice:
def cocktail = {(1, "sazerac"); (1, "sazerac")}
def data(:cocktail, pos, v) = cocktail(pos, v)
module config
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
the tuple (1, "sazerac")
appears only once:
cocktail
sazerac
Positional keys can start at any value and can have arbitrary step sizes. The positional key doesn’t even need to be an integer and can be anything that enforces an order. As the following subsections will show, you can use dates and/or compound keys as long as the data types used can be ordered.
Dates as Key
You can add the date when a particular drink was consumed and export the data.
def cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def price = {(1, 15); (2, 20); (3, 12)}
def mydate = {
(1, parse_date["2020-03-03", "Y-m-d"]);
(2, parse_date["2020-02-02", "Y-m-d"]);
(3, parse_date["2020-01-01", "Y-m-d"])
}
def data(:cocktail, pos, v) = ∃(id: cocktail(id, v) and mydate(id, pos))
def data(:price, pos, v) = ∃(id: price(id, v) and mydate(id, pos))
def data(:mydate, pos, v) = mydate(_, v) and pos = v
module config
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
The original data are still keyed by an integer, but the relation data
, which will be exported, uses the dates in mydate
to key the data.
Because the positional key, pos
, is not exported, you need to define an additional column :mydate
that also contains the dates.
The generated CSV file reads:
cocktail,mydate,price
martini,2020-01-01,12
cosmopolitan,2020-02-02,20
sazerac,2020-03-03,15
Compound Keys
The positional key doesn’t need to be a single value (like an integer or a date), but it can also be a collection of values, which is known as a compound key. Compound keys are useful for highly structured datasets with many structural levels. Additionally, a compound key is an ideal tool for group-by operations where the data are organized in groups where additional structure and/or order needs to be enforced.
Consider the cocktail example. In addition to the price, the example adds the name of the bar and the order history across all bars, which you store in a normalized form. The next step is to export the cocktail order history (cocktail name, location, price, and time) across all bars as shown in the following example.
def price["foobar"] = {("sazerac", 10); ("cosmopolitan", 15);}
def price["wunderbar"] = {("cosmopolitan", 14); ("martini", 16);}
// Order the information.
def order_location = {1; 2; 3}, "foobar"
def order_location = {100; 101; 102}, "wunderbar"
def time_format = "YYYY-mm-dd HH:MM"
def order_time = {
(1, parse_datetime["2019-12-31 22:00", time_format]);
(2, parse_datetime["2019-12-31 23:00", time_format]);
(3, parse_datetime["2020-01-01 00:00", time_format]);
(100, parse_datetime["2019-12-31 22:30", time_format]);
(101, parse_datetime["2019-12-31 23:30", time_format]);
(102, parse_datetime["2020-01-01 00:30", time_format]);
}
def order_item = {
(1, "cosmopolitan");
(2, "sazerac");
(3, "cosmopolitan");
(101, "cosmopolitan");
(100, "cosmopolitan");
(102, "martini");
}
// Join the data.
def data:cocktail =
order_item[order], order_time[order], order, order_item[order]
from order
def data:time =
order_item[order], order_time[order], order, order_time[order]
from order
def data:location =
order_item[order], order_time[order], order, order_location[order]
from order
def data:price =
order_item[order], order_time[order], order, p
from order, p
where p = order_location[order].price[order_item[order]]
// Set the export configuration.
module config
def data = data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
The CSV file contains four columns defined by :cocktail
, :location
, :price
, and :time
in the relation data
.
The compounded positional key is the tuple of the three arguments stored after the column names, i.e.: order_item[order], order_time[order], order
referring to the cocktail name, order time, and order ID.
As in the case for a single-valued positional key, the entries in the compound key are not exported.
You can verify this by looking at the CSV file, where you can see that there is no order ID column.
The cocktail name and the order time are shown in the CSV file because you explicitly constructed two columns, :time
and :cocktail
, which contain this information.
cocktail,location,price,time
cosmopolitan,foobar,15,2019-12-31T22:00:00
cosmopolitan,wunderbar,14,2019-12-31T22:30:00
cosmopolitan,wunderbar,14,2019-12-31T23:30:00
cosmopolitan,foobar,15,2020-01-01T00:00:00
martini,wunderbar,16,2020-01-01T00:30:00
sazerac,foobar,10,2019-12-31T23:00:00
When using multiple keys, it’s necessary that the keys are nonspecialized. This means that you cannot use RelNames.
Type-Overloaded Keys
There are certain cases where you can have data with type-overloaded keys.
For instance, in the previous running example, you could have two bars where one keeps orders as an integer
while the other one keeps them as string
.
Case in point, the code below exports the data where the order_item
relation contains a key of two types, i.e., string
and integer
.
def order_location = {"111"; "222"}, "foobar"
def order_location = {111}, "wunderbar"
def order_item = {
("111", "cosmopolitan" );
("222", "sazerac" );
(111, "martini");
}
module mydata
def cocktail[order] = order_item[order]
def location[order] = order_location[order]
end
// Export configuration
module config
def data = mydata
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
The output file looks as follows:
cocktail,location
cosmopolitan,foobar
sazerac,foobar
martini,wunderbar
You can see that the output is similar to sequentially appending the data from both key types. In other words, Rel can handle the fact that the keys are of different types and outputs the data accordingly. This is also true for compound keys with different arities, e.g., combining arity-1 and arity-2 keys (see section Compound Keys).
Multiple Exports
You can also perform multiple exports of data.
Here is an example that exports the same data to two different files and chooses a different delimiter for each file.
To do this, you need to assign each export instruction a unique key within the relation export
(in this case, :one
and :two
) in order to distinguish the exports. Note that the unique keys within the relation export
need to be specialized values.
def data:cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def data:price = {(1, 15); (2, 20); (3, 12)}
def config_base:data = data
module config1
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile1.csv"
def syntax:delim = '|'
end
module config2
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile2.csv"
def syntax:delim = ';'
end
def export[:one] = export_csv[config1 <++ config_base]
def export[:two] = export_csv[config2 <++ config_base]
The two CSV files look as follows:
>>> csv_data_1.csv
cocktail|price
sazerac|15
cosmopolitan|20
martini|12
>>> csv_data_2.csv
cocktail;price
sazerac;15
cosmopolitan;20
martini;12
Note that through the use of the left override operator, <++
, configurations that are not defined within the modules config1
and config2
but are defined in config_base
are used as default.
This ensures that multiple values are not assigned to the same configuration — a case that can easily happen when using the union operator, ;
, since overwriting is not applied.
Exporting Data in Partitions
For large relations, you can speed up the exporting time by exporting the data in partitions.
You can specify the partition size using the partition_size
option as follows:
// Export configuration
module config
def data = large_relation
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
def partition_size = 100
end
def export = export_csv[config]
In this case, there will be a set of files written using the path
name that was specified above.
Each partition will contain an underscore _
and a number.
For example, for config:path="azure://myurl.com/simple-drinks.csv"
, the following partitions will be created:
azure://myurl.com/simple-drinks_1.csv
,
azure://myurl.com/simple-drinks_2.csv
, etc.
Each file will have a size of roughly 100MB.
Exporting Compressed Data
You can export compressed data in a .gzip
file format by using the compression
option as follows:
def my_data:cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def my_data:price = {(1, 15); (2, 20); (3, 12)}
// Export configuration
module config
def data = my_data
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
def compression = "gzip"
end
def export = export_csv[config]
This code will create a compressed CSV file in the path
location specified above.
Note that you can export compressed data in partitions using both the compression
and partition_size
options.
Exporting to String
In addition to exporting data to a remote file, you can also export a relation to a CSV string within the database through csv_string
.
Using csv_string
is very similar to using export_csv
.
Both support the same exporting options.
However, instead of writing them to an external file, using csv_string
stores the CSV data in a string within the database.
Here’s an example that exports a relation to a string using |
as the delimiter:
// read query
def mydata:cocktail = {(1, "sazerac"); (2, "cosmopolitan"); (3, "martini")}
def mydata:price = {(1, 15); (2, 20); (3, 12)}
module config
def data = mydata
def syntax:delim = '|'
end
def output = csv_string[config]
As previously mentioned, when multiple keys are present, they are not exported.
For example, consider the following code that has an additional key for the states "CA"
and "NY"
:
// read query
module config
def data = {
:cocktail, 1, "CA", "sazerac" ;
:cocktail, 1, "NY", "daiquiri" ;
:price, 1, "CA", 10 ;
:price, 1, "NY", 20
}
end
def output = csv_string[config]
This will give the following output, where only the cocktail and the price are exported:
When using csv_string
with multiple keys, it’s necessary that the keys are nonspecialized. This means that you cannot use RelNames.
Exporting Non-GNF Relations
In general, export_csv
expects the relation you want to export to be in Graph Normal Form.
In other words, it expects the relation to be in the form (column, position, values)
.
Although not recommended, you may have relations that are not in GNF. For example, consider the following relation:
def R = {
("alpha", 10) ;
("beta", 20) ;
("delta", 40)
}
In order to properly export R
, you need to convert it to GNF.
The following code converts R
to GNF, defines two column names (the first called letter
and the second called value
), and exports the data:
def R = {
("alpha", 10) ;
("beta", 20) ;
("delta", 40)
}
def mydata =
{ (:letter, pos, x); (:value, pos, y) }
from pos, x, y
where
enumerate[R](pos, x, y)
// Export configuration
module config
def data = mydata
def path = "azure://myaccount.blob.core.windows.net/sascontainer/myfile.csv"
end
def export = export_csv[config]
How Not to Use the export
Relation
As discussed, populating the relation export
triggers the data export.
This functionality can be problematic if export
is persisted in the database,
for instance, when loading a model.
In this case, export
will be evaluated, and data will potentially be exported every time a request is sent to the database.
Generally, this behavior is not desired, and it is therefore advisable to define the export
relation only as a
transient predicate in a query and not as part of an installed model.
See Working With Models
and Working With Base Relations for more details.
Similarly, you should be careful not to install the definition of config
used in an export,
since it will then be picked up by other import or exports if they use the same config
name.
See Also
See the CSV Import guide to learn how to import CSV data. For more details on how to interact with data, see the rest of the Data Import and Export guides.
See also the Working With CSV Data concept guide, where you can find all the relevant information about how to work with CSV data in Rel, including querying, data analysis, and data manipulation.