# Data Visualization: Vega-Lite

This how-to guide demonstrates how to create graphical representations of data with Rel and Vega-Lite.

## Goal

This how-to guide shows you how to create Vega-Lite charts to visualize your data in Rel using the RAI notebook.

## Introduction

This how-to guide showcases how to use Vega-Lite from within Rel in order to visualize data. The code presented here can be easily adapted to different kinds of charts, for example, creating a sorted bar chart instead of a regular bar chart.

Creating a chart using Rel and Vega-Lite can be as simple as using the appropriate chart type, vegalite:bar in this case, and applying it to some data:

// query

// prepare data
def in_data:year = {(1, 2018); (2, 2019); (3, 2020); (4, 2021)}
def in_data:sales = {(1, 100); (2, 120); (3, 65); (4, 180)}

// plot it
def output = vegalite:plot[
vegalite:bar[:year, :sales, {:data, in_data}]
]

The following sections discuss in more detail how to prepare data from different sources as well as how to configure and plot different charts.

Note that the code presented in this how-to guide produces graphical output only in the RelationalAI Notebook. Everywhere else (for example, in the RelationalAI SDK), relations are returned in the standard, non-graphical form.

## Preparing the Data

Consider how to prepare data in order to easily plot them with Vega-Lite. At a high level, your data need to be set up as a JSON array of the form: (:[], position, attribute, value). Here is a small example, where data are directly inserted in a relation called small_data.

// query

def small_data[:[], 1, :category] = "Alpha"
def small_data[:[], 1, :value] = 28
def small_data[:[], 2, :category] = "Beta"
def small_data[:[], 2, :value] = 55
def small_data[:[], 3, :category] = "Gamma"
def small_data[:[], 3, :value] = 43

def output = small_data

This how-to guide initially works with toy data that you can input directly. Later examples will leverage existing datasets that are larger.

The first example has three data points in the array. Each data point contains the attributes :category and :value.

You can now use the data in this array format to create plots, as you will see in later sections. Instead of providing each data item and structuring the array by hand, you can also use Rel to do the same thing with fewer lines of code:

// install

def in_data = {("Alpha", 28); ("Beta", 55); ("Gamma", 43)}
def small_data[:[], i] = {(:category, a); (:value, b)}
from a, b where sort[in_data](i, a, b)

This is very useful when you already have your data in an existing relation, such as in_data in this case, and you want to easily convert it to the array form for plotting with Vega-Lite.
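The same pattern extends to relations with more columns. The following is a minimal sketch, assuming a hypothetical three-column relation `regional_sales` and a hypothetical attribute `:region` (neither is part of this guide's data). It only adds one more variable to the `sort` call used above:

```rel
// query

// hypothetical input: (category, value, region)
def regional_sales = {("Alpha", 28, "North"); ("Beta", 55, "South"); ("Gamma", 43, "North")}

// sort enumerates the tuples as (i, a, b, c); each i becomes one array position
def regional_data[:[], i] = {(:category, a); (:value, b); (:region, c)}
    from a, b, c where sort[regional_sales](i, a, b, c)

def output = regional_data
```

Each array position should then carry three attributes instead of two, so `:region` becomes available as a field name in the chart configuration.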

### CSV Data

The next example imports an existing dataset in CSV format into Rel and then uses it for plotting: the penguin dataset, located in this public S3 bucket. This dataset contains data for a set of attributes (for example, species, flipper length, sex) for 344 penguins. For more details, see the Rel Machine Learning (Classification) how-to guide.

The following Rel code loads these data and converts them into the appropriate format for Vega-Lite:

// install

// data location
def penguin_config:path = "s3://relationalai-documentation-public/ml-classification/penguin/penguins_size.csv"

// data schema
def penguin_config:schema:species = "string"
def penguin_config:schema:island = "string"
def penguin_config:schema:culmen_length_mm = "float"
def penguin_config:schema:culmen_depth_mm = "float"
def penguin_config:schema:flipper_length_mm = "float"
def penguin_config:schema:body_mass_g = "float"
def penguin_config:schema:sex = "string"

// load the CSV file
def penguins = load_csv[penguin_config]

// clean the data to remove rows where sex is "NA" or "."
def row_with_error(row) =
    penguins:sex(row, "NA") or
    penguins:sex(row, ".")

def penguins_clean(column, row, entry...) =
    penguins(column, row, entry...) and not row_with_error(row)

// replace file positions with line numbers
def penguins_transformed = lined_csv[penguins_clean]

// prepare data in array format
def penguin_data[:[], i, col] = penguins_transformed[col, i]

The examples that follow use both small_data and penguin_data.
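Before plotting, it can help to check how many rows survived the cleaning step. The following is a small sketch that relies only on the `penguins_clean` relation defined above; the name `n_clean` is an illustrative choice, not part of the guide:

```rel
// query

// penguins_clean:species holds one (row, value) pair per remaining CSV row,
// so counting it gives the number of penguins left after cleaning
def output:n_clean = count[penguins_clean:species]
```

If the count is unexpectedly low or empty, the path or the schema in `penguin_config` is a reasonable first place to look.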

## Configuring the Plot

### Assigning Data

Once you have prepared your data in this specific form, you can now define a chart relation and provide the chart parameters that you would like to use.

You can start by providing the data for your chart, which need to be assigned in the :values field under a :data field:

// query

// set up data to plot
def chart:data:values = small_data

def output = chart

Or, similarly for the penguin dataset:

def chart:data:values = penguin_data

For certain operations in plots using Vega-Lite, Rel provides convenience relations that can be helpful when configuring plots. An example of this is vegalite_utils:data, which can help create the appropriate data format for plotting:

// query

def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}

def chart = vegalite_utils:data[simple_data]
def output = chart

This approach is useful when you have data specified as columns in a relation, such as simple_data in this case. Note that the chart relation is identical to the one created earlier using small_data as far as the data configuration is concerned.

### Styling the Graph

Next, you can specify the chart type that you would like to use. In Rel, you do this by setting the :type field under the chart's :mark field. For example, if you want to use a bar chart:

def chart:mark:type = "bar"

This is equivalent to using vegalite:bar as you did at the beginning of this how-to guide.

In a similar fashion, you can specify that you would like to enable tooltips:

def chart:mark:tooltip = boolean_true

Finally, you can provide a specification for the axes of your chart, using the :encoding, :x, :y, :type, and :field fields. For example:

def chart:encoding:x:field = "category"
def chart:encoding:x:title = "My cool x-axis"
def chart:encoding:x:sort = "descending"
def chart:encoding:x:type = "nominal"
def chart:encoding:x:axis:labelAngle = 270
def chart:encoding:x:axis:titleColor = "blue"
def chart:encoding:y:field = "value"
def chart:encoding:y:type = "quantitative"

The code above specifies that the x-axis will be using the category field from your data, that it takes nominal (i.e., categorical) values, that you want the labels to be rotated 270 degrees, and that the axis should have a blue title. Similarly, the code specifies that the y-axis will be using the value field from your data and that it takes numerical values.

Again, you can use a convenience relation to set up the x- and y-axes. Here is the same x-axis configuration written with vegalite_utils:x:

// query

def chart = vegalite_utils:x[{
(:field, "category");
(:title, "My cool x-axis");
(:sort, "descending");
(:type, "nominal");
(:axis, {
(:labelAngle, 270);
(:titleColor, "blue");
})
}]
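The y-axis can be configured in the same way with `vegalite_utils:y`, which is used later in this guide. A minimal sketch of the counterpart to the x-axis block above, where the title text is only an illustrative value:

```rel
// query

// y-axis counterpart: same (key, value) structure as for the x-axis
def chart = vegalite_utils:y[{
    (:field, "value");
    (:title, "My cool y-axis");
    (:type, "quantitative");
}]
```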

### Examples

Once you have your chart specification set up, you can plot it using vegalite:plot as follows:

// query

def chart:data:values = small_data

def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true

def chart:encoding:x:field = "category"
def chart:encoding:x:title = "My cool x-axis"
def chart:encoding:x:sort = "descending"
def chart:encoding:x:type = "nominal"
def chart:encoding:x:axis:labelAngle = 270
def chart:encoding:x:axis:titleColor = "blue"
def chart:encoding:y:field = "value"
def chart:encoding:y:type = "quantitative"

def output = vegalite:plot[chart]

Again, note that the chart, as well as certain graphical functionality such as tooltips, is visible only in the RelationalAI Notebook environment.
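The introduction mentions adapting a regular bar chart into a sorted one. A minimal sketch of that change, assuming `small_data` from the earlier section is available: it uses Vega-Lite's `"-y"` sort shorthand, which orders the x categories by descending y value, in place of `"descending"`:

```rel
// query

def chart:data:values = small_data

def chart:mark:type = "bar"

def chart:encoding:x:field = "category"
def chart:encoding:x:type = "nominal"
// "-y" sorts the categories by their y value, largest first
def chart:encoding:x:sort = "-y"
def chart:encoding:y:field = "value"
def chart:encoding:y:type = "quantitative"

def output = vegalite:plot[chart]
```

With the three data points used here, the bars should appear in the order Beta, Gamma, Alpha. Using `"y"` instead of `"-y"` gives ascending order.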

You can check the Vega-Lite Documentation for more information on all the different charts as well as their parameters. In general, Rel follows the same hierarchy of fields and values as the Vega-Lite Documentation.

As already discussed, for certain types of charts and operations, Rel also provides some convenience relations. For example, instead of setting up the chart relation in a detailed manner as in the previous examples, you can plot a simple bar chart on the same data as follows:

// query

def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}

def output = vegalite:plot[
vegalite:bar[:category, :value, {:data, simple_data}]
]

In this case, by using vegalite:bar, some of the parameters of the chart, such as the x-axis having nominal data, are already pre-filled. Similarly, instead of providing the parameters in detail, you can specify the data for the x-axis and some parameters very easily as follows:

// query

def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}

def chart = vegalite_utils:data[simple_data]

def chart:mark:type = "bar"

def chart = vegalite_utils:x[{
(:field, "category");
(:title, "My cool x-axis");
(:sort, "descending");
(:type, "nominal");
(:axis, {
(:labelAngle, 270);
(:ticks, boolean_true);
(:grid, boolean_true);
(:titleColor, "blue");
})
}]

def chart = vegalite_utils:y[{
(:value);
(:type, "quantitative");
}]

def output = vegalite:plot[chart]

Note that there is an alternative way of specifying the :field parameter for the y-axis: you can give :value directly as the field instead of writing (:field, "value"). Rel automatically understands in this case that :value is the field you are using for the y-axis.

Also note that, unlike the array format used in the previous examples, the convenience relations (e.g., vegalite:bar) take the data in a (field, keys..., value) format. This is essentially the same format that is returned by relations such as load_csv, which makes the convenience relations useful for quickly loading and plotting certain types of data.

### Using JSON Strings

In addition to specifying the chart configuration in Rel, you can also directly provide it as a JSON string. This functionality is very useful when you want to develop a chart specification outside of Rel, for example, in the Vega-Lite Editor, and then use it to directly visualize your data.

Here is an example with a JSON specification for a simple bar chart:

// query

// assign data
def chart:data:values = small_data

// chart specification in JSON
def chart = parse_json["""{
"\$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {"field": "category", "type": "nominal", "axis": {"labelAngle": 270}},
"y": {"field": "value", "type": "quantitative"}
}
}"""]

// display
def output = vegalite:plot[chart]
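Because `chart` is an ordinary relation, a JSON specification can be combined with settings written in Rel, just as the data are assigned separately in the example above. The following sketch assumes `small_data` is available and adds a width and height in Rel on top of a JSON-defined bar chart. The pixel values are arbitrary illustrative choices:

```rel
// query

// assign data
def chart:data:values = small_data

// mark and axes come from a JSON string ...
def chart = parse_json["""{
  "mark": {"type": "bar", "tooltip": true},
  "encoding": {
    "x": {"field": "category", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}"""]

// ... while the chart size is added in Rel
def chart:width = 300
def chart:height = 150

// display
def output = vegalite:plot[chart]
```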

The following examples configure different Vega-Lite charts directly in Rel.

## Example Charts

### Simple Bar Chart

The first example creates a simple bar chart with real data. The chart uses the penguin data discussed in the previous section and plots the number of penguins in the dataset on each island:

// query

// assign data
def chart:data:values = penguin_data

def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true

def chart = vegalite_utils:x[{
(:field, "island");
(:title, "Island");
(:type, "ordinal");
(:axis, {
(:labelAngle, 45);
(:ticks, boolean_true);
(:grid, boolean_true);
})
}]

def chart = vegalite_utils:y[{
(:aggregate, "count");
(:type, "quantitative");
}]

// display
def output = vegalite:plot[chart]
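The same chart can aggregate a field instead of counting rows. As a sketch, assuming `penguin_data` from the CSV section, the y-axis below uses Vega-Lite's `"mean"` aggregate on `body_mass_g` to show the average body mass per island. The axis title is an illustrative choice:

```rel
// query

// assign data
def chart:data:values = penguin_data

def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true

def chart = vegalite_utils:x[{
    (:field, "island");
    (:title, "Island");
    (:type, "ordinal");
}]

// average a field per island instead of counting rows
def chart = vegalite_utils:y[{
    (:field, "body_mass_g");
    (:aggregate, "mean");
    (:title, "Mean body mass (g)");
    (:type, "quantitative");
}]

// display
def output = vegalite:plot[chart]
```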

### Stacked Bar Chart

The next example expands on the simple bar chart by creating a stacked version.

The stacked bar example plots the number of male and female penguins per species. More specifically, this example generates a bar chart where the horizontal axis is the species, and the vertical axis has stacked bars showing the number of male and female penguins within each species:

// query

// assign data
def chart:data:values = penguin_data

def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true

def chart = vegalite_utils:x[{
(:field, "species");
(:title, "Penguin Species");
(:type, "ordinal");
(:axis, {
(:labelAngle, 45);
(:ticks, boolean_true);
(:grid, boolean_true);
})
}]

def chart = vegalite_utils:y[{
(:aggregate, "count");
(:type, "quantitative");
}]

def chart = vegalite_utils:color[{
(:field, "sex");
(:type, "nominal");
(:scale, :domain, :[], {(1, "MALE"); (2, "FEMALE")});
(:title, "Penguin Sex");
}]

// display
def output = vegalite:plot[chart]
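To compare proportions rather than absolute counts, the stacked bars can be normalized. The sketch below assumes `penguin_data` from the CSV section and adds Vega-Lite's `"stack": "normalize"` setting to the y-axis, so that each bar spans the full height and shows the share of male and female penguins per species. The y-axis title is illustrative:

```rel
// query

// assign data
def chart:data:values = penguin_data

def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true

def chart = vegalite_utils:x[{
    (:field, "species");
    (:title, "Penguin Species");
    (:type, "ordinal");
}]

// normalize each stack so that it sums to 1
def chart = vegalite_utils:y[{
    (:aggregate, "count");
    (:stack, "normalize");
    (:title, "Share of penguins");
    (:type, "quantitative");
}]

def chart = vegalite_utils:color[{
    (:field, "sex");
    (:type, "nominal");
    (:title, "Penguin Sex");
}]

// display
def output = vegalite:plot[chart]
```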

### Scatterplot

The next example shows you how to build a scatterplot. This example plots the culmen depth versus the culmen length of each penguin. The culmen is the upper ridge of a penguin’s beak. This kind of plot can show potential correlation between these two values.

The horizontal axis of the scatterplot shows the culmen depth in millimeters, and the vertical axis shows the culmen length, also in millimeters. Each point in the plot corresponds to one instance in the data, i.e., one penguin.

Here is the code to generate this scatterplot:

// query

// assign data
def chart:data:values = penguin_data

def chart:mark = "point"

def chart = vegalite_utils:x[{
(:field, "culmen_depth_mm");
(:title, "Culmen depth (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]

def chart = vegalite_utils:y[{
(:field, "culmen_length_mm");
(:title, "Culmen length (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]

// display
def output = vegalite:plot[chart]

Note: Setting :zero to boolean_false under :scale on both axes means that the axes do not have to start at zero, which provides a better view of the data. For more details on the different options for both the scatterplot and the rest of the plots in this how-to guide, see the Vega-Lite documentation.

You can easily use the power of Rel and Vega-Lite to make this last example even more elaborate. For example, you can display which of the points in the scatterplot belong to male versus female penguins:

// query

// assign data
def chart:data:values = penguin_data

def chart:mark = "point"

def chart = vegalite_utils:x[{
(:field, "culmen_depth_mm");
(:title, "Culmen depth (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]

def chart = vegalite_utils:y[{
(:field, "culmen_length_mm");
(:title, "Culmen length (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]

def chart = vegalite_utils:color[{
(:field, "sex");
(:type, "nominal");
(:scale, :domain, :[], {(1, "MALE"); (2, "FEMALE")});
(:title, "Penguin Sex");
}]

// display
def output = vegalite:plot[chart]
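The color channel can be driven by any other field in the same way. As a sketch, assuming `penguin_data` from the CSV section, the following variant colors the points by species and reuses the `accent` color scheme that appears in the line chart example of this guide:

```rel
// query

// assign data
def chart:data:values = penguin_data

def chart:mark = "point"

def chart = vegalite_utils:x[{
    (:field, "culmen_depth_mm");
    (:title, "Culmen depth (mm)");
    (:type, "quantitative");
    (:scale, :zero, boolean_false);
}]

def chart = vegalite_utils:y[{
    (:field, "culmen_length_mm");
    (:title, "Culmen length (mm)");
    (:type, "quantitative");
    (:scale, :zero, boolean_false);
}]

// color by species instead of sex
def chart = vegalite_utils:color[{
    (:field, "species");
    (:type, "nominal");
    (:scale, :scheme, "accent");
    (:title, "Penguin Species");
}]

// display
def output = vegalite:plot[chart]
```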

### Line Chart

The next example explores how to create a line chart. This example creates artificial data using a mathematical function in Rel (natural_exp). The following code generates x values from -10.0 to 10.0 with a step of 0.1, computes the sigmoid function for each x, and plots the results using a line chart:

// query

def mydata = 1.0 / (1.0 + natural_exp[-x]) for x in range[-10.0, 10.0, 0.1]

def chart:data:values[:[], i] =
{(:x, a); (:sigmoid_x, b)} from a, b where sort[mydata](i, a, b)

def chart:mark = "line"

def chart = vegalite_utils:x[{
(:field, "x");
(:type, "quantitative");
}]

def chart = vegalite_utils:y[{
(:field, "sigmoid_x");
(:type, "quantitative");
}]

// display
def output = vegalite:plot[chart]

You can also plot multiple lines in the same graph, with the appropriate legend on the side and with the accent color scheme:

// query

def x_vals = range[0.0, 8*pi_float64, 0.1]
def n = count[x_vals]
def data:x = r, x : sort[x_vals](i, x) and (i = r or r = i+n) from i
def data:function = range[1, n, 1], "cos[x]"
def data:function = range[n+1, 2*n, 1], "sin[x]"
def data:value = cos[data[:x, i]] for i where i <= n
def data:value = sin[data[:x, i]] for i where i > n

def chart = vegalite_utils:data[data]
def chart:mark = "line"
def chart:width = 400
def chart:height = 200

def chart = vegalite_utils:x[{
(:field, "x");
(:type, "quantitative");
}]

def chart = vegalite_utils:y[{
(:field, "value");
(:type, "quantitative");
}]

def chart = vegalite_utils:color[{
(:field, "function");
(:type, "nominal");
(:scale, :scheme, "accent");
}]

// display
def output = vegalite:plot[chart]

### Overlaying Multiple Charts

The combination of Rel and Vega-Lite allows you to create interesting charts where you can show two different plots on the same chart. The next example generates weekly sales data over the period of one year (52 weeks). This example shows the number of sales in each week with a bar chart and overlays the running total of sales with a line on the chart. It also uses two separate y-axes since the scales of the two plots are different:

// query

def n_weeks = 52
def mydata:x = sort[range[1, n_weeks, 1]]
def mydata:sales[i] = sin[mydata:x[i] / n_weeks * pi_float64]^2
def mydata:cumulative[1] = mydata:sales[1]
def mydata:cumulative[i] = mydata:sales[i] + mydata:cumulative[i-1]

def chart = vegalite_utils:data[mydata]
def chart:width = 500
def chart:height = 200

def chart:resolve = { (:scale, :y, "independent"); }

def chart = vegalite_utils:x[{
(:field, "x");
(:title, "week of year");
(:type, "quantitative");
}]

def chart = vegalite_utils:y[{
(:type, "quantitative");
}]

def chart[:layer, :[], 1] = {
(:mark, {(:type, "bar"); (:opacity, 0.5)});
(:encoding, :y, {(:field, "sales"); (:title, "weekly sales")});
}

def chart[:layer, :[], 2] = {
(:mark, {(:type, "line"); (:color, "orange"); (:size, 3)});
(:encoding, :y, {(:field, "cumulative"); (:title, "cumulative sales")});
}

// display
def output = vegalite:plot[chart]

### Marginal Histograms

The next example shows how to create marginal histograms. These are histograms displayed at the sides, or margins, of a scatterplot’s axes to show the distribution of each measurement.

The following example uses the penguin data and plots the distribution of the penguins’ culmen depth and culmen length as a binned heatmap with marginal histograms:

// query

// generate data
def mydata:x = penguins_clean:culmen_depth_mm
def mydata:y = penguins_clean:culmen_length_mm

// set up graph
def chart = vegalite_utils:data[mydata]

def chart:spacing = 15
def chart:bounds = "flush"

def chart[:vconcat, :[], 1] = {
(:mark, "bar");
(:height, 60);
vegalite_utils:x[{(:x); (:bin, boolean_true); (:axis, missing);}];
vegalite_utils:y[{(:y); (:aggregate, "count"); (:title, "");}];
}

def chart[:vconcat, :[], 2] = {
(:spacing, 15);
(:bounds, "flush");
(:hconcat, :[], 1, {
(:mark, "rect");
vegalite_utils:x[{(:x); (:bin, boolean_true); (:title, "culmen_depth_mm")}];
vegalite_utils:y[{(:y); (:bin, boolean_true); (:title, "culmen_length_mm")}];
vegalite_utils:color[{(:aggregate, "count");}];
});
(:hconcat, :[], 2, {
(:mark, "bar");
(:width, 60);
vegalite_utils:x[{(:x); (:aggregate, "count"); (:title, "");}];
vegalite_utils:y[{(:y); (:bin, boolean_true); (:axis, missing);}];
});

}

def chart:config:view:stroke = "transparent"

// display
def output = vegalite:plot[chart]