Data Visualization (Vega-Lite)

This How-To Guide demonstrates how to create graphical representations of data with Rel and Vega-Lite.

Goal

This How-To Guide walks through the steps needed to visualize your data in Rel using the RAI Notebook.

Introduction

In this How-To Guide, we showcase how to use Vega-Lite from within Rel in order to visualize data. The code presented here can be easily adapted to different kinds of charts (e.g., creating a sorted bar chart instead of a regular bar chart).

Creating a chart using Rel and Vega-Lite can be as simple as using the appropriate chart type (vegalite:bar in our case) and applying it to some data:

query
// prepare data
def in_data:year = {(1, 2018); (2, 2019); (3, 2020); (4, 2021)}
def in_data:sales = {(1, 100); (2, 120); (3, 65); (4, 180)}

// plot it
def output = vegalite:plot[
vegalite:bar[:year, :sales, {:data, in_data}]
]

[Figure: Simple Bar Chart]

In the following sections, we will discuss in more detail how to prepare data from different sources as well as how to configure and plot different charts.

Please note that the code presented in this How-To Guide produces graphical output only in the RelationalAI Notebook. Everywhere else (e.g., in the RelationalAI SDK), relations are returned in the standard, non-graphical form.

Preparing the Data

Let’s begin by discussing how to prepare our data so that it can be easily plotted with Vega-Lite. At a high level, our data needs to be set up as a JSON array of the form (:[], position, attribute, value). Here is a small example, where we insert data directly into a relation called small_data.

query
def small_data[:[], 1, :category] = "Alpha"
def small_data[:[], 1, :value] = 28
def small_data[:[], 2, :category] = "Beta"
def small_data[:[], 2, :value] = 55
def small_data[:[], 3, :category] = "Gamma"
def small_data[:[], 3, :value] = 43

def output = small_data

Relation: output

:[]  1  :category  "Alpha"
:[]  1  :value     28
:[]  2  :category  "Beta"
:[]  2  :value     55
:[]  3  :category  "Gamma"
:[]  3  :value     43

In this How-To Guide, we will initially work with toy data that we will input directly. In later examples, we will leverage existing datasets that are larger.

In our first example, we have three data points in the array. Each data point contains the attributes :category and :value.
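
To make the array format concrete, here is a minimal Python sketch (an illustration outside of Rel, not part of the RAI tooling) that groups the (position, attribute, value) tuples of small_data into the list of JSON objects that Vega-Lite reads. The helper name tuples_to_rows is hypothetical.

```python
import json

# The tuples of small_data without the leading :[] marker:
# (position, attribute, value).
small_data = [
    (1, "category", "Alpha"), (1, "value", 28),
    (2, "category", "Beta"), (2, "value", 55),
    (3, "category", "Gamma"), (3, "value", 43),
]

def tuples_to_rows(tuples):
    """Group tuples by position into one dict per array element."""
    rows = {}
    for position, attribute, value in tuples:
        rows.setdefault(position, {})[attribute] = value
    # The array order follows the numeric position.
    return [rows[position] for position in sorted(rows)]

rows = tuples_to_rows(small_data)
print(json.dumps(rows))
```

Each position becomes one JSON object, and each attribute becomes a key of that object.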

We can now use the data in this array format to create plots, as we will see in later sections. Instead of providing each data item and structuring the array by hand, we can use Rel to do the same thing with fewer lines of code:

install
def in_data = {("Alpha", 28); ("Beta", 55); ("Gamma", 43)}
def small_data[:[], i] = {(:category, a); (:value, b)}
from a, b where sort[in_data](i, a, b)
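
As a rough analogy for the sort-based definition above, the following Python sketch assigns a 1-based position to every tuple in sorted order, which is the role that sort[in_data] plays in the Rel code. The variable names are only illustrative.

```python
# Stand-in for the Rel relation in_data: an unordered set of pairs.
in_data = {("Alpha", 28), ("Beta", 55), ("Gamma", 43)}

# enumerate(sorted(...), start=1) pairs each tuple with its 1-based rank,
# and that rank becomes the array position.
indexed = {
    i: {"category": a, "value": b}
    for i, (a, b) in enumerate(sorted(in_data), start=1)
}

for position, row in indexed.items():
    print(position, row)
```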

This sort-based conversion is very useful when we already have our data in an existing relation (such as in_data in this case) and want to easily convert it to the array form for plotting with Vega-Lite.

CSV Data

In our next example, we use an existing dataset that we import directly into Rel before using it for plotting. We will use the penguin dataset located in our public S3 bucket. This dataset contains data for a set of attributes (e.g., species, flipper length, sex) for 344 penguins. We discuss it in more detail in our Machine Learning (Classification) How-To Guide.

Now that we have our dataset, we write the code to load it and convert it to the appropriate format for Vega-Lite.

install
// data location
def penguin_config:path = "s3://relationalai-documentation-public/ml-classification/penguin/penguins_size.csv"

// data schema
def penguin_config:schema:species = "string"
def penguin_config:schema:island = "string"
def penguin_config:schema:culmen_length_mm = "float"
def penguin_config:schema:culmen_depth_mm = "float"
def penguin_config:schema:flipper_length_mm = "float"
def penguin_config:schema:body_mass_g = "float"
def penguin_config:schema:sex = "string"

def penguins = lined_csv[load_csv[penguin_config]]

// clean the data to remove NA and .
def row_with_error(row) =
penguins:sex(row, "NA") or
penguins:sex(row, ".") or
penguins:load_errors(row, _, _)

def penguins_clean(column, row, entry...) =
penguins(column, row, entry...) and not row_with_error(row)

// prepare data in array format
def penguin_data[:[], i, col] = penguins_clean[col, i]

We will be using both small_data and penguin_data in the following examples as we continue.
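
The cleaning logic above can be pictured with a small Python sketch. It assumes a few illustrative rows typed in by hand (not the real file) in the same column layout, and drops every row whose sex is "NA" or "." or whose numeric columns fail to parse, which loosely corresponds to row_with_error.

```python
import csv
import io

# Illustrative rows only; the real data lives in penguins_size.csv.
sample_csv = """species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Torgersen,NA,NA,NA,NA,NA
Gentoo,Biscoe,44.5,15.7,217,4875,.
Chinstrap,Dream,46.5,17.9,192,3500,FEMALE
"""

FLOAT_COLUMNS = [
    "culmen_length_mm", "culmen_depth_mm",
    "flipper_length_mm", "body_mass_g",
]

def load_clean(text):
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Skip rows whose sex is "NA" or ".".
        if row["sex"] in ("NA", "."):
            continue
        try:
            for col in FLOAT_COLUMNS:
                row[col] = float(row[col])
        except ValueError:
            continue  # counterpart of a load error in a float column
        rows.append(row)
    return rows

penguin_rows = load_clean(sample_csv)
print(len(penguin_rows), "clean rows")
```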

Configuring the Plot

Assigning Data

Once we have prepared our data in this form, we can define a chart relation and provide the chart parameters that we would like to use.

We start by providing the data for our chart, which needs to be assigned in the :values field under a :data field:

query
// set up data to plot
def chart:data:values = small_data

def output = chart

Relation: output

:data  :values  :[]  1  :category  "Alpha"
:data  :values  :[]  1  :value     28
:data  :values  :[]  2  :category  "Beta"
:data  :values  :[]  2  :value     55
:data  :values  :[]  3  :category  "Gamma"
:data  :values  :[]  3  :value     43

Or, similarly for the penguin dataset:

def chart:data:values = penguin_data

For certain Vega-Lite operations, Rel provides convenience relations that help with configuring plots. An example is vegalite_utils:data, which creates the appropriate data format for plotting:

query
def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}

def chart = vegalite_utils:data[simple_data]
def output = chart

Relation: output

:data  :values  :[]  1  :category  "Alpha"
:data  :values  :[]  1  :value     28
:data  :values  :[]  2  :category  "Beta"
:data  :values  :[]  2  :value     55
:data  :values  :[]  3  :category  "Gamma"
:data  :values  :[]  3  :value     43

This approach is useful when we have data specified as columns in a relation, such as simple_data in this case. We note that the chart relation is identical to the one created earlier using small_data as far as the data configuration is concerned.
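
The column-to-array conversion can be sketched in Python as follows. This is only an analogy for what vegalite_utils:data produces, not its implementation, and the helper name columns_to_chart is hypothetical.

```python
# Column-oriented data, keyed by column name and then by row key,
# like simple_data above.
simple_data = {
    "category": {1: "Alpha", 2: "Beta", 3: "Gamma"},
    "value": {1: 28, 2: 55, 3: 43},
}

def columns_to_chart(columns):
    # Collect every row key that appears in any column.
    keys = sorted({k for col in columns.values() for k in col})
    # Build one object per row key, holding that row's column values.
    values = [
        {name: col[k] for name, col in columns.items() if k in col}
        for k in keys
    ]
    return {"data": {"values": values}}

chart = columns_to_chart(simple_data)
print(chart)
```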

Styling the Graph

Next, we specify the chart type that we would like to use. To do this in Rel, we set the :type field under the :mark field of the chart. For example, for a bar chart:

def chart:mark:type = "bar"

This is equivalent to using vegalite:bar as we did at the beginning of this How-To Guide.

In a similar fashion, we can specify that we would like to enable tooltips:

def chart:mark:tooltip = boolean_true

Finally, we can provide a specification for the axes of our chart, using the :encoding, :x, :y, :type and :field fields. For example:

def chart:encoding:x:field = "category"
def chart:encoding:x:title = "My cool x-axis"
def chart:encoding:x:sort = "descending"
def chart:encoding:x:type = "nominal"
def chart:encoding:x:axis:labelAngle = 270
def chart:encoding:x:axis:titleColor = "blue"
def chart:encoding:y:field = "value"
def chart:encoding:y:type = "quantitative"

The code above specifies that the x-axis uses the category field from our data, that it takes nominal (i.e., categorical) values sorted in descending order, that the labels are rotated 270 degrees, and that the axis has a blue title. Similarly, we specify that the y-axis uses the value field from our data and that it takes numerical values.
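
One way to read these definitions is as paths into a nested JSON object. The Python sketch below, with a hypothetical helper named nest, turns a handful of (path, value) pairs into the nested structure that a Vega-Lite specification uses.

```python
import json

# Each Rel definition such as `def chart:encoding:x:field = "category"`
# is written here as a (path, value) pair.
definitions = [
    (("mark", "type"), "bar"),
    (("encoding", "x", "field"), "category"),
    (("encoding", "x", "type"), "nominal"),
    (("encoding", "x", "axis", "labelAngle"), 270),
    (("encoding", "x", "axis", "titleColor"), "blue"),
    (("encoding", "y", "field"), "value"),
    (("encoding", "y", "type"), "quantitative"),
]

def nest(defs):
    spec = {}
    for path, value in defs:
        node = spec
        # Walk (and create) the intermediate objects along the path.
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return spec

spec = nest(definitions)
print(json.dumps(spec, indent=2))
```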

Here, we can again use a convenience relation to set up the x and y axes properly. Here is an example of doing the same thing for the x-axis:

query
def chart = vegalite_utils:x[{
(:field, "category");
(:title, "My cool x-axis");
(:sort, "descending");
(:type, "nominal");
(:axis, {
(:labelAngle, 270);
(:titleColor, "blue");
})
}]

Examples

Once we have our chart specification set up, we can plot it using vegalite:plot as follows:

query
def chart:data:values = small_data

def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true

def chart:encoding:x:field = "category"
def chart:encoding:x:title = "My cool x-axis"
def chart:encoding:x:sort = "descending"
def chart:encoding:x:type = "nominal"
def chart:encoding:x:axis:labelAngle = 270
def chart:encoding:x:axis:titleColor = "blue"
def chart:encoding:y:field = "value"
def chart:encoding:y:type = "quantitative"

def output = vegalite:plot[chart]

[Figure: Simple Bar Chart]

Again, please note that the chart, as well as certain graphical functionality (e.g., tooltips), will be visible only in the RelationalAI Notebook environment.

You can check the Vega-Lite Documentation for more information on all the different charts as well as their parameters. In general, Rel follows the same hierarchy of fields and values as the Vega-Lite Documentation.

As we already discussed, for certain types of charts and operations, Rel also provides some convenience relations. For example, instead of setting up the chart relation in a detailed manner as in the previous examples, we can plot a simple bar chart on the same data as follows:

query
def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}

def output = vegalite:plot[
vegalite:bar[:category, :value, {:data, simple_data}]
]

[Figure: Simple Bar Chart]

In this case, by using vegalite:bar, some of the parameters of the chart (such as that the x-axis has nominal data) are already pre-filled. Similarly, instead of providing the parameters in detail, we can specify the data for the x-axis and some parameters very easily as follows:

query
def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}

def chart = vegalite_utils:data[simple_data]

def chart:mark:type = "bar"

def chart = vegalite_utils:x[{
(:field, "category");
(:title, "My cool x-axis");
(:sort, "descending");
(:type, "nominal");
(:axis, {
(:labelAngle, 270);
(:ticks, boolean_true);
(:grid, boolean_true);
(:titleColor, "blue");
})
}]

def chart = vegalite_utils:y[{
(:value);
(:type, "quantitative");
}]

def output = vegalite:plot[chart]

[Figure: Simple Bar Chart]

Note the alternative way of specifying the :field parameter for the y-axis: we pass :value as the field instead of (:field, "value"). Rel automatically understands in this case that :value is the field we are using for the y-axis. Note also that, unlike the array format used in the previous examples, the convenience relations (e.g., vegalite:bar) take the data in a (field, keys..., value) format. This is essentially the same format that is returned by relations such as load_csv, which makes the convenience relations very useful for quickly loading and plotting certain types of data.

Using JSON Strings

In addition to specifying the chart configuration in Rel, we can also directly provide it as a JSON string. This functionality is very useful when we want to develop a chart specification outside of Rel (e.g., in the Vega-Lite Editor) and then use it to directly visualize our data.

Here is an example with a JSON specification for a simple bar chart:

query
// assign data
def chart:data:values = small_data

// chart specification in JSON
def chart = parse_json["""{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {"field": "category", "type": "nominal", "axis": {"labelAngle": 270}},
"y": {"field": "value", "type": "quantitative"}
}
}"""]

// display
def output = vegalite:plot[chart]

[Figure: Simple Bar Chart]
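
For comparison, the same combination of a JSON specification with separately supplied data can be sketched in Python. This only illustrates how the two parts end up in one specification; it is not how Rel evaluates the definitions.

```python
import json

# The chart specification, as it might be exported from the Vega-Lite Editor.
spec_json = """{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "mark": {"type": "bar", "tooltip": true},
  "encoding": {
    "x": {"field": "category", "type": "nominal", "axis": {"labelAngle": 270}},
    "y": {"field": "value", "type": "quantitative"}
  }
}"""

chart = json.loads(spec_json)

# The data is attached under data.values, next to the parsed fields.
chart["data"] = {"values": [
    {"category": "Alpha", "value": 28},
    {"category": "Beta", "value": 55},
    {"category": "Gamma", "value": 43},
]}

print(sorted(chart))
```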

In our examples, we will be using Rel to configure the different Vega-Lite charts.

Example Charts

Simple Bar Chart

Let’s begin by creating a simple bar chart with real data. We will create a chart using the penguin data that we discussed in the previous section. For our first example, we will plot the number of penguins on each island in the dataset:

query
// assign data
def chart:data:values = penguin_data

def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true

def chart = vegalite_utils:x[{
(:field, "island");
(:title, "Island");
(:type, "ordinal");
(:axis, {
(:labelAngle, 45);
(:ticks, boolean_true);
(:grid, boolean_true);
})
}]

def chart = vegalite_utils:y[{
(:aggregate, "count");
(:type, "quantitative");
}]


// display
def output = vegalite:plot[chart]

[Figure: Simple Bar Chart Penguins]
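
The count aggregate on the y-axis tallies the rows that share the same x value. A minimal Python sketch of that tally, using a handful of hypothetical rows rather than the real dataset:

```python
from collections import Counter

# Hypothetical rows; only the island field matters for this chart.
rows = [
    {"island": "Biscoe"}, {"island": "Dream"}, {"island": "Biscoe"},
    {"island": "Torgersen"}, {"island": "Dream"}, {"island": "Biscoe"},
]

# One bar per island, with the bar height equal to the number of rows.
counts = Counter(row["island"] for row in rows)
bars = sorted(counts.items())
print(bars)
```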

Stacked Bar Chart

In the next example, we expand on the simple bar chart by creating a stacked version.

For our stacked bar example, we are interested in plotting the number of male and female penguins per species. More specifically, we will generate a bar chart where the horizontal axis is the species and the vertical axis will have stacked bars showing the number of male and female penguins within each species:

query
// assign data
def chart:data:values = penguin_data

def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true

def chart = vegalite_utils:x[{
(:field, "species");
(:title, "Penguin Species");
(:type, "ordinal");
(:axis, {
(:labelAngle, 45);
(:ticks, boolean_true);
(:grid, boolean_true);
})
}]

def chart = vegalite_utils:y[{
(:aggregate, "count");
(:type, "quantitative");
}]

def chart = vegalite_utils:color[{
(:field, "sex");
(:type, "nominal");
(:scale, :domain, :[], {(1, "MALE"); (2, "FEMALE")});
(:title, "Penguin Sex");
}]

// display
def output = vegalite:plot[chart]

[Figure: Stacked Bar Chart]
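
Conceptually, stacking counts the rows per (species, sex) pair and places each segment on top of the previous one within the same bar. The Python sketch below illustrates this with hypothetical rows; the order of segments within a bar is chosen here for illustration and may differ from what Vega-Lite draws.

```python
from collections import Counter

# Hypothetical (species, sex) pairs.
rows = [
    ("Adelie", "MALE"), ("Adelie", "FEMALE"), ("Adelie", "MALE"),
    ("Gentoo", "FEMALE"), ("Gentoo", "MALE"), ("Gentoo", "FEMALE"),
]
counts = Counter(rows)

segments = []
for species in sorted({s for s, _ in rows}):
    base = 0  # every bar starts at zero
    for sex in ("MALE", "FEMALE"):
        n = counts[(species, sex)]
        # Each segment spans from the running base to base + count.
        segments.append({"species": species, "sex": sex, "y0": base, "y1": base + n})
        base += n

for segment in segments:
    print(segment)
```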

Scatterplot

In our next example, we build a scatterplot of the culmen depth versus the culmen length of each penguin. The culmen is the upper ridge of a penguin’s beak. This kind of plot can reveal a potential correlation between these two values.

We generate a scatterplot where the horizontal axis shows the culmen depth in millimeters and the vertical axis shows the culmen length, also in millimeters. Each point in the plot corresponds to one instance in the data (i.e., one penguin).

Here is the code to generate this scatterplot:

query
// assign data
def chart:data:values = penguin_data

def chart:mark = "point"

def chart = vegalite_utils:x[{
(:field, "culmen_depth_mm");
(:title, "Culmen depth (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]

def chart = vegalite_utils:y[{
(:field, "culmen_length_mm");
(:title, "Culmen length (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]

// display
def output = vegalite:plot[chart]

[Figure: Scatterplot]

Please note: We altered the range of both axes by setting the zero option of the scale parameter to false, so that the axes do not have to start at zero and we get a better view of the data. For more details on the different options for both the scatterplot and the rest of the plots in this How-To Guide, please refer to the Vega-Lite documentation.
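
The effect of the zero option can be approximated with a small Python sketch. It is a simplification: the hypothetical helper axis_domain only decides whether zero is forced into the axis range, whereas Vega-Lite additionally rounds the range to convenient values.

```python
def axis_domain(values, zero):
    """Return a (low, high) range, optionally extended to include zero."""
    low, high = min(values), max(values)
    if zero:
        low, high = min(low, 0), max(high, 0)
    return low, high

# Hypothetical culmen depths in millimeters.
depths = [13.1, 17.3, 21.5]

print(axis_domain(depths, zero=True))
print(axis_domain(depths, zero=False))
```

Without the forced zero, the points fill the plotting area instead of being squeezed into one corner.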

Of course, we can easily use the power of Rel and Vega-Lite to make this last example even more elaborate. For example, we could display which of the points in the scatterplot belong to male versus female penguins:

query
// assign data
def chart:data:values = penguin_data

def chart:mark = "point"

def chart = vegalite_utils:x[{
(:field, "culmen_depth_mm");
(:title, "Culmen depth (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]

def chart = vegalite_utils:y[{
(:field, "culmen_length_mm");
(:title, "Culmen length (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]

def chart = vegalite_utils:color[{
(:field, "sex");
(:type, "nominal");
(:scale, :domain, :[], {(1, "MALE"); (2, "FEMALE")});
(:title, "Penguin Sex");
}]

// display
def output = vegalite:plot[chart]

[Figure: Colored Scatterplot]

Line Chart

In the next example, we explore how to create a line chart. For this example, we will create artificial data using a mathematical function in Rel (natural_exp). The following code generates x values from -10.0 to 10.0 with a step of 0.1, computes the sigmoid function for each x, and plots the results using a line chart:

query
def mydata = 1.0 / (1.0 + natural_exp[-x]) for x in range[-10.0, 10.0, 0.1]

def chart:data:values[:[], i] =
{(:x, a); (:sigmoid_x, b)} from a, b where sort[mydata](i, a, b)

def chart:mark = "line"

def chart = vegalite_utils:x[{
(:field, "x");
(:type, "quantitative");
}]

def chart = vegalite_utils:y[{
(:field, "sigmoid_x");
(:type, "quantitative");
}]

// display
def output = vegalite:plot[chart]

[Figure: Line Chart]
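
For readers who want to check the generated values outside of Rel, here is an equivalent Python sketch that produces the same x grid and sigmoid values as rows with the fields x and sigmoid_x.

```python
import math

# x runs from -10.0 to 10.0 in steps of 0.1; integer steps avoid
# accumulating floating-point error.
xs = [k / 10 for k in range(-100, 101)]

values = [
    {"x": x, "sigmoid_x": 1.0 / (1.0 + math.exp(-x))}
    for x in xs
]

print(len(values), values[100])
```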

We can also plot multiple lines in the same graph, with the appropriate legend on the side and with the accent color scheme:

query
def x_vals = range[0.0, 8*pi_float64, 0.1]
def n = count[x_vals]
def data:x = r, x : sort[x_vals](i, x) and (i = r or r = i+n) from i
def data:function = range[1, n, 1], "cos[x]"
def data:function = range[n+1, 2*n, 1], "sin[x]"
def data:value = cos[data[:x, i]] for i where i <= n
def data:value = sin[data[:x, i]] for i where i > n

def chart = vegalite_utils:data[data]
def chart:mark = "line"
def chart:width = 400
def chart:height = 200

def chart = vegalite_utils:x[{
(:field, "x");
(:type, "quantitative");
}]

def chart = vegalite_utils:y[{
(:field, "value");
(:type, "quantitative");
}]

def chart = vegalite_utils:color[{
(:field, "function");
(:type, "nominal");
(:scale, :scheme, "accent");
}]

// display
def output = vegalite:plot[chart]

[Figure: Multiple Line Charts]
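
The Rel code above stores both curves in a single "long" table, with a function column that tells the color encoding which line each row belongs to. A Python sketch of the same layout, with illustrative variable names:

```python
import math

# x values from 0.0 up to (at most) 8*pi in steps of 0.1.
xs = [k / 10 for k in range(int(8 * math.pi * 10) + 1)]

# One row per (function, x) pair: first all cos rows, then all sin rows.
rows = [
    {"x": x, "function": name, "value": f(x)}
    for name, f in (("cos[x]", math.cos), ("sin[x]", math.sin))
    for x in xs
]

print(len(xs), len(rows))
```

Because the series name is an ordinary field, adding a third curve only means adding more rows, not changing the chart specification.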

Overlaying Multiple Charts

The combination of Rel and Vega-Lite allows us to create interesting charts that show two different plots on the same chart. In the next example, we generate weekly sales data over a period of one year (52 weeks). We show the number of sales in each week with a bar chart, and we overlay the running total of sales as a line. We also use two separate y-axes since the scales of the two plots are different:

query
def n_weeks = 52
def mydata:x = sort[range[1, n_weeks, 1]]
def mydata:sales[i] = sin[mydata:x[i] / n_weeks * pi_float64]^2
def mydata:cumulative[1] = mydata:sales[1]
def mydata:cumulative[i] = mydata:sales[i] + mydata:cumulative[i-1]

def chart = vegalite_utils:data[mydata]
def chart:width = 500
def chart:height = 200

def chart:resolve = { (:scale, :y, "independent"); }

def chart = vegalite_utils:x[{
(:field, "x");
(:title, "week of year");
(:type, "quantitative");
}]

def chart = vegalite_utils:y[{
(:type, "quantitative");
}]

def chart[:layer, :[], 1] = {
(:mark, {(:type, "bar"); (:opacity, 0.5)});
(:encoding, :y, {(:field, "sales"); (:title, "weekly sales")});
}

def chart[:layer, :[], 2] = {
(:mark, {(:type, "line"); (:color, "orange"); (:size, 3)});
(:encoding, :y, {(:field, "cumulative"); (:title, "cumulative sales")});
}

// display
def output = vegalite:plot[chart]

[Figure: Layered Chart]
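
The data behind this chart can be reproduced with a short Python sketch, which also shows why independent y-axes help: each weekly value is at most 1, while the running total grows to about 26.

```python
import math
from itertools import accumulate

n_weeks = 52
weeks = list(range(1, n_weeks + 1))

# Weekly sales follow sin^2 over the year, as in the Rel code above.
sales = [math.sin(w / n_weeks * math.pi) ** 2 for w in weeks]

# Running total: cumulative[i] = sales[i] + cumulative[i - 1].
cumulative = list(accumulate(sales))

print(max(sales), cumulative[-1])
```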

Marginal Histograms

In our next example, we show how to create marginal histograms. These are histograms displayed at the sides (or margins) of a scatterplot’s axes to showcase the distribution of each measurement.

In the following example, we use the penguin data and plot the penguins' culmen depth and culmen length as a binned two-dimensional plot with marginal histograms:

query
// generate data
def mydata:x = penguins_clean:culmen_depth_mm
def mydata:y = penguins_clean:culmen_length_mm

// set up graph
def chart = vegalite_utils:data[mydata]

def chart:spacing = 15
def chart:bounds = "flush"

def chart[:vconcat, :[], 1] = {
(:mark, "bar");
(:height, 60);
vegalite_utils:x[{(:x); (:bin, boolean_true); (:axis, missing);}];
vegalite_utils:y[{(:y); (:aggregate, "count"); (:title, "");}];
}

def chart[:vconcat, :[], 2] = {
(:spacing, 15);
(:bounds, "flush");
(:hconcat, :[], 1, {
(:mark, "rect");
vegalite_utils:x[{(:x); (:bin, boolean_true); (:title, "culmen_depth_mm")}];
vegalite_utils:y[{(:y); (:bin, boolean_true); (:title, "culmen_length_mm")}];
vegalite_utils:color[{(:aggregate, "count");}];
});
(:hconcat, :[], 2, {
(:mark, "bar");
(:width, 60);
vegalite_utils:x[{(:x); (:aggregate, "count"); (:title, "");}];
vegalite_utils:y[{(:y); (:bin, boolean_true); (:axis, missing);}];
});

}

def chart:config:view:stroke = "transparent"

// display
def output = vegalite:plot[chart]

[Figure: Marginal Histogram]
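
The three panels all rely on the same binning idea: the top and right histograms count points per bin along one axis, and the central panel counts points per pair of bins. The Python sketch below illustrates this with hypothetical points and hand-picked bin boundaries; Vega-Lite chooses its own bins when :bin is set to true.

```python
from collections import Counter

def bin_index(value, start, width):
    """Index of the bin of the given width that contains value."""
    return int((value - start) // width)

# Hypothetical (culmen_depth_mm, culmen_length_mm) pairs.
points = [(13.2, 46.1), (14.8, 47.5), (17.9, 39.2), (18.4, 38.6), (18.9, 50.3)]

# Marginal histograms: counts per bin along each axis.
x_counts = Counter(bin_index(x, 13, 2) for x, _ in points)
y_counts = Counter(bin_index(y, 35, 5) for _, y in points)

# Central panel: counts per (x bin, y bin) cell.
cell_counts = Counter((bin_index(x, 13, 2), bin_index(y, 35, 5)) for x, y in points)

print(dict(x_counts), dict(y_counts), dict(cell_counts))
```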

Summary

We have discussed how to generate visualizations with Rel using either existing datasets or data created with mathematical functions. Rel, combined with Vega-Lite, provides a very powerful tool for visualizing and exploring data. More information on Rel can be found in the Rel Language Reference, and information on the different types of Vega-Lite charts and their parameters can be found in the Vega-Lite documentation.

See Also

In order to visualize data, Rel currently makes use of Vega and Vega-Lite. We have focused on Vega-Lite for its simplicity and good coverage of a large variety of visualization needs. For more advanced visualization needs, you may also use Vega.

Two resources that may be useful in addition to this guide are the CSV Import Guide and JSON Import and Export Guide. They contain examples and functionality useful for understanding how to appropriately load different kinds of data into the system, which can then be visualized through Vega-Lite.