Data Visualization: Vega-Lite
This how-to guide demonstrates how to create graphical representations of data with Rel and Vega-Lite.
You can also download this tutorial as a RAI notebook.
Goal
This how-to guide shows you how to create Vega-Lite charts to visualize your data in Rel using the RAI notebook.
Introduction
This how-to guide showcases how to use Vega-Lite from within Rel in order to visualize data. The code presented here can be easily adapted to different kinds of charts, for example, creating a sorted bar chart instead of a regular bar chart.
Creating a chart using Rel and Vega-Lite can be as simple as using the appropriate chart type, vegalite:bar in this case, and applying it to some data:
// prepare data
def in_data:year = {(1, 2018); (2, 2019); (3, 2020); (4, 2021)}
def in_data:sales = {(1, 100); (2, 120); (3, 65); (4, 180)}
// plot it
def output = vegalite:plot[
vegalite:bar[:year, :sales, {:data, in_data}]
]
The following sections discuss in more detail how to prepare data from different sources as well as how to configure and plot different charts.
Note that the code presented in this how-to guide produces graphical output only in the RelationalAI Notebook. Everywhere else, for example, when using a RelationalAI SDK, relations are returned in their standard, non-graphical form.
Preparing the Data
This section shows how to prepare your data so that you can easily plot them with Vega-Lite.
At a high level, your data need to be set up as a JSON array of the form (:[], position, attribute, value).
Here is a small example, where the data are inserted directly into a relation called small_data:
def small_data[:[], 1, :category] = "Alpha"
def small_data[:[], 1, :value] = 28
def small_data[:[], 2, :category] = "Beta"
def small_data[:[], 2, :value] = 55
def small_data[:[], 3, :category] = "Gamma"
def small_data[:[], 3, :value] = 43
def output = small_data
Relation: output
:[] | 1 | :category | "Alpha" |
:[] | 1 | :value | 28 |
:[] | 2 | :category | "Beta" |
:[] | 2 | :value | 55 |
:[] | 3 | :category | "Gamma" |
:[] | 3 | :value | 43 |
This how-to guide initially works with toy data that you can input directly. Later examples will leverage existing datasets that are larger.
The first example has three data points in the array. Each data point contains the attributes :category and :value.
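To make the array format concrete, here is a minimal Python sketch. Python is used only for illustration and is not part of the Rel workflow. The sketch groups (position, attribute, value) tuples like those in small_data into the JSON array of objects that Vega-Lite reads from data.values. The function name to_json_array is hypothetical, and the sketch does not describe how Rel performs this conversion internally.

```python
import json

# Hypothetical Python stand-in for the Rel relation small_data.
# Each tuple is (position, attribute, value); the leading :[] of the
# Rel tuples is represented by the fact that the result is a list.
small_data = {
    (1, "category", "Alpha"), (1, "value", 28),
    (2, "category", "Beta"), (2, "value", 55),
    (3, "category", "Gamma"), (3, "value", 43),
}


def to_json_array(tuples):
    """Group (position, attribute, value) tuples into a list of objects, ordered by position."""
    rows = {}
    for pos, attr, val in tuples:
        rows.setdefault(pos, {})[attr] = val
    return [rows[pos] for pos in sorted(rows)]


records = to_json_array(small_data)
print(json.dumps(records, sort_keys=True))
# [{"category": "Alpha", "value": 28}, {"category": "Beta", "value": 55}, {"category": "Gamma", "value": 43}]
```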
You can now use the data in this array format to create plots, as you will see in later sections. Instead of providing each data item and structuring the array by hand, you can use Rel to do the same thing with fewer lines of code:
def in_data = {("Alpha", 28); ("Beta", 55); ("Gamma", 43)}
def small_data[:[], i] = {(:category, a); (:value, b)}
from a, b where sort[in_data](i, a, b)
This is very useful when you already have your data in an existing relation, such as in_data in this case, and want to convert it to the array form for plotting with Vega-Lite.
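In Python terms, the sort-based definition above behaves roughly like enumerating the sorted tuples of in_data with 1-based positions. This sketch assumes that sort[in_data] numbers the tuples from 1 in ascending lexicographic order, which is consistent with the small_data output shown earlier. The Python names mirror the Rel relations for readability only.

```python
# Hypothetical stand-in for the Rel relation in_data: a set of (category, value) pairs.
in_data = {("Alpha", 28), ("Beta", 55), ("Gamma", 43)}

# Rough analogue of sort[in_data](i, a, b): pair each tuple, in ascending
# order, with its 1-based position i.
small_data = {}
for i, (a, b) in enumerate(sorted(in_data), start=1):
    small_data[i] = {"category": a, "value": b}

print(small_data)
# {1: {'category': 'Alpha', 'value': 28}, 2: {'category': 'Beta', 'value': 55}, 3: {'category': 'Gamma', 'value': 43}}
```

Because whole tuples are sorted, the positions here follow the alphabetical order of the categories.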
CSV Data
The next example imports an existing dataset in CSV format into Rel and then uses it for plotting: the penguin dataset, located in this public S3 bucket. This dataset contains data for a set of attributes (for example, species, flipper length, sex) for 152 penguins. For more details, see the Rel Machine Learning (Classification) how-to guide.
The following Rel code loads these data and converts them into the appropriate format for Vega-Lite:
// data location
def penguin_config:path = "s3://relationalai-documentation-public/ml-classification/penguin/penguins_size.csv"
// data schema
def penguin_config:schema:species = "string"
def penguin_config:schema:island = "string"
def penguin_config:schema:culmen_length_mm = "float"
def penguin_config:schema:culmen_depth_mm = "float"
def penguin_config:schema:flipper_length_mm = "float"
def penguin_config:schema:body_mass_g = "float"
def penguin_config:schema:sex = "string"
def penguins = lined_csv[load_csv[penguin_config]]
// clean the data: drop rows whose sex is "NA" or "." and rows with load errors
def row_with_error(row) =
penguins:sex(row, "NA") or
penguins:sex(row, ".") or
penguins:load_errors(row, _, _)
def penguins_clean(column, row, entry...) =
penguins(column, row, entry...) and not row_with_error(row)
// prepare data in array format
def penguin_data[:[], i, col] = penguins_clean[col, i]
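The cleaning step can be sketched in Python with the standard csv module. The inline CSV rows below are made up for illustration and do not come from the real penguins_size.csv. The ValueError branch is only a rough stand-in for Rel's load_errors. As in the Rel code, the sketch keeps each surviving row's original row number as its position, so positions may have gaps.

```python
import csv
import io

# Made-up rows with the same columns as penguins_size.csv (illustrative only).
RAW = """species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Torgersen,NA,NA,NA,NA,NA
Gentoo,Biscoe,44.5,15.7,217,4875,.
Chinstrap,Dream,46.5,17.9,192,3500,FEMALE
"""

# Columns declared as "float" in penguin_config:schema.
FLOAT_COLUMNS = ("culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "body_mass_g")


def load_clean(text):
    """Return {row_number: record} for rows that pass the cleaning rules."""
    clean = {}
    for row_number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        # Same rule as row_with_error for the sex column.
        if row["sex"] in ("NA", "."):
            continue
        try:
            for col in FLOAT_COLUMNS:
                row[col] = float(row[col])
        except ValueError:
            # Rough stand-in for rows that Rel reports in load_errors.
            continue
        clean[row_number] = row
    return clean


penguin_data = load_clean(RAW)
print(sorted(penguin_data))  # [1, 4]
```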
The examples that follow use both small_data and penguin_data.
Configuring the Plot
Assigning Data
Once you have prepared your data in this form, you can define a chart relation and provide the chart parameters that you would like to use.
You can start by providing the data for your chart, which need to be assigned to the :values field under the :data field:
// set up data to plot
def chart:data:values = small_data
def output = chart
Relation: output
:data | :values | :[] | 1 | :category | "Alpha" |
:data | :values | :[] | 1 | :value | 28 |
:data | :values | :[] | 2 | :category | "Beta" |
:data | :values | :[] | 2 | :value | 55 |
:data | :values | :[] | 3 | :category | "Gamma" |
:data | :values | :[] | 3 | :value | 43 |
Or, similarly for the penguin dataset:
def chart:data:values = penguin_data
For certain operations, Rel provides convenience relations that can be helpful when configuring Vega-Lite plots. An example of this is vegalite_utils:data, which can help create the appropriate data format for plotting:
def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}
def chart = vegalite_utils:data[simple_data]
def output = chart
Relation: output
:data | :values | :[] | 1 | :category | "Alpha" |
:data | :values | :[] | 1 | :value | 28 |
:data | :values | :[] | 2 | :category | "Beta" |
:data | :values | :[] | 2 | :value | 55 |
:data | :values | :[] | 3 | :category | "Gamma" |
:data | :values | :[] | 3 | :value | 43 |
This approach is useful when you have data specified as columns in a relation, such as simple_data in this case. Note that, as far as the data configuration is concerned, the resulting chart relation is identical to the one created earlier using small_data.
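Here is a minimal Python sketch of the column-to-array conversion that vegalite_utils:data performs on simple_data. The function name columns_to_chart is hypothetical, and the sketch does not claim to reproduce Rel's implementation. It only shows how the (field, key, value) layout maps to the data.values array.

```python
import json

# Hypothetical stand-in for simple_data in (field, key, value) form.
simple_data = {
    "category": {1: "Alpha", 2: "Beta", 3: "Gamma"},
    "value": {1: 28, 2: 55, 3: 43},
}


def columns_to_chart(columns):
    """Turn {field: {key: value}} into {"data": {"values": [...]}}, ordered by key."""
    keys = sorted({k for col in columns.values() for k in col})
    values = [
        {field: col[k] for field, col in columns.items() if k in col}
        for k in keys
    ]
    return {"data": {"values": values}}


chart = columns_to_chart(simple_data)
print(json.dumps(chart))
# {"data": {"values": [{"category": "Alpha", "value": 28}, {"category": "Beta", "value": 55}, {"category": "Gamma", "value": 43}]}}
```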
Styling the Graph
Next, you can specify the chart type that you would like to use. In Rel, you do this with the :mark and :type fields of the chart relation.
For example, if you want to use a bar chart:
def chart:mark:type = "bar"
This is equivalent to using vegalite:bar, as you did at the beginning of this how-to guide.
In a similar fashion, you can specify that you would like to enable tooltips:
def chart:mark:tooltip = boolean_true
Finally, you can provide a specification for the axes of your chart, using the :encoding, :x, :y, :type, and :field fields.
For example:
def chart:encoding:x:field = "category"
def chart:encoding:x:title = "My cool x-axis"
def chart:encoding:x:sort = "descending"
def chart:encoding:x:type = "nominal"
def chart:encoding:x:axis:labelAngle = 270
def chart:encoding:x:axis:titleColor = "blue"
def chart:encoding:y:field = "value"
def chart:encoding:y:type = "quantitative"
The code above specifies that the x-axis uses the category field from your data, that it is titled "My cool x-axis" and sorted in descending order, that it takes nominal (i.e., categorical) values, that its labels are rotated 270 degrees, and that its title is blue.
Similarly, the code specifies that the y-axis uses the value field from your data and that it takes quantitative (numerical) values.
You can again use a convenience relation to set up the axes. Here is the same x-axis configuration written with vegalite_utils:x:
def chart = vegalite_utils:x[{
(:field, "category");
(:title, "My cool x-axis");
(:sort, "descending");
(:type, "nominal");
(:axis, {
(:labelAngle, 270);
(:titleColor, "blue");
})
}]
Examples
Once you have your chart specification set up, you can plot it using vegalite:plot as follows:
def chart:data:values = small_data
def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true
def chart:encoding:x:field = "category"
def chart:encoding:x:title = "My cool x-axis"
def chart:encoding:x:sort = "descending"
def chart:encoding:x:type = "nominal"
def chart:encoding:x:axis:labelAngle = 270
def chart:encoding:x:axis:titleColor = "blue"
def chart:encoding:y:field = "value"
def chart:encoding:y:type = "quantitative"
def output = vegalite:plot[chart]
Again, note that the chart, as well as certain interactive functionality such as tooltips, is visible only in the RelationalAI Notebook environment.
You can check the Vega-Lite Documentation for more information on all the different charts as well as their parameters. In general, Rel follows the same hierarchy of fields and values as the Vega-Lite Documentation.
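Because Rel follows the Vega-Lite field hierarchy, each def chart:...:... definition can be read as a path into a nested JSON object. The following Python sketch builds the nested specification that the chart definitions above correspond to. The helper nest is hypothetical. The sketch also assumes that boolean_true maps to JSON true, and it leaves out data.values for brevity.

```python
import json

# Each entry mirrors one Rel definition, e.g.
#   def chart:encoding:x:field = "category"
# becomes (("encoding", "x", "field"), "category").
chart_paths = [
    (("mark", "type"), "bar"),
    (("mark", "tooltip"), True),  # assumed JSON equivalent of boolean_true
    (("encoding", "x", "field"), "category"),
    (("encoding", "x", "title"), "My cool x-axis"),
    (("encoding", "x", "sort"), "descending"),
    (("encoding", "x", "type"), "nominal"),
    (("encoding", "x", "axis", "labelAngle"), 270),
    (("encoding", "x", "axis", "titleColor"), "blue"),
    (("encoding", "y", "field"), "value"),
    (("encoding", "y", "type"), "quantitative"),
]


def nest(paths):
    """Build a nested dict from (path, value) pairs, creating levels as needed."""
    spec = {}
    for path, value in paths:
        node = spec
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return spec


spec = nest(chart_paths)
print(json.dumps(spec["encoding"]["x"]["axis"]))  # {"labelAngle": 270, "titleColor": "blue"}
```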
As already discussed, for certain types of charts and operations, Rel also provides some convenience relations.
For example, instead of setting up the chart relation in detail as in the previous examples, you can plot a simple bar chart on the same data as follows:
def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}
def output = vegalite:plot[
vegalite:bar[:category, :value, {:data, simple_data}]
]
In this case, by using vegalite:bar, some of the parameters of the chart, such as the x-axis having nominal data, are already pre-filled.
Similarly, instead of providing every parameter in detail, you can use vegalite_utils:data, vegalite_utils:x, and vegalite_utils:y to specify the data and the axes concisely:
def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}
def chart = vegalite_utils:data[simple_data]
def chart:mark:type = "bar"
def chart = vegalite_utils:x[{
(:field, "category");
(:title, "My cool x-axis");
(:sort, "descending");
(:type, "nominal");
(:axis, {
(:labelAngle, 270);
(:ticks, boolean_true);
(:grid, boolean_true);
(:titleColor, "blue");
})
}]
def chart = vegalite_utils:y[{
(:value);
(:type, "quantitative");
}]
def output = vegalite:plot[chart]
Note that the y-axis specification uses a shorthand for the :field parameter: it specifies :value on its own instead of (:field, "value").
In this case, Rel automatically understands that :value is the field you are using for the y-axis.
Also note that, unlike the array format used in the previous examples, the convenience relations (e.g., vegalite:bar and vegalite_utils:data) take the data in a (field, keys..., value) format.
This is essentially the same format that relations such as load_csv return, which makes convenience relations especially useful for quickly loading and plotting certain types of data.
Using JSON Strings
In addition to specifying the chart configuration in Rel, you can also provide it directly as a JSON string.
This functionality is very useful when you want to develop a chart specification outside of Rel, for example, in the Vega-Lite Editor, and then use it to directly visualize your data.
Here is an example with a JSON specification for a simple bar chart:
// assign data
def chart:data:values = small_data
//chart specification in JSON
def chart = parse_json["""{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {"field": "category", "type": "nominal", "axis": {"labelAngle": 270}},
"y": {"field": "value", "type": "quantitative"}
}
}"""]
// display
def output = vegalite:plot[chart]
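The example above works because both definitions contribute to the same chart relation: one supplies data.values and the other supplies the parsed specification. The same merge can be sketched in Python with json.loads. The field check at the end is an extra sanity test of this sketch and is not something Rel performs.

```python
import json

# The same specification as the Rel example, as a plain JSON string.
SPEC = """{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "mark": {"type": "bar", "tooltip": true},
  "encoding": {
    "x": {"field": "category", "type": "nominal", "axis": {"labelAngle": 270}},
    "y": {"field": "value", "type": "quantitative"}
  }
}"""

# small_data in array form.
records = [
    {"category": "Alpha", "value": 28},
    {"category": "Beta", "value": 55},
    {"category": "Gamma", "value": 43},
]

chart = json.loads(SPEC)
chart["data"] = {"values": records}  # like def chart:data:values = small_data

# Sanity check: every field used in the encoding is present in every record.
used_fields = {channel["field"] for channel in chart["encoding"].values()}
missing = sorted(f for f in used_fields if any(f not in r for r in records))
print(missing)  # []
```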
The following examples use Rel to configure several different types of Vega-Lite charts.
Example Charts
Simple Bar Chart
The first example creates a simple bar chart with real data, using the penguin data discussed in the previous section. It plots the number of penguins on each island in the dataset:
// assign data
def chart:data:values = penguin_data
def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true
def chart = vegalite_utils:x[{
(:field, "island");
(:title, "Island");
(:type, "ordinal");
(:axis, {
(:labelAngle, 45);
(:ticks, boolean_true);
(:grid, boolean_true);
})
}]
def chart = vegalite_utils:y[{
(:aggregate, "count");
(:type, "quantitative");
}]
// display
def output = vegalite:plot[chart]
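Setting aggregate to count on the y channel tells Vega-Lite to count the records for each x value. The following Python sketch computes the same kind of per-island counts with collections.Counter. It uses a few made-up records, not the real penguin data.

```python
from collections import Counter

# Made-up records in array form (not the real penguin data).
penguin_data = [
    {"species": "Adelie", "island": "Torgersen"},
    {"species": "Adelie", "island": "Dream"},
    {"species": "Gentoo", "island": "Biscoe"},
    {"species": "Chinstrap", "island": "Dream"},
]

# What "aggregate": "count" on y computes: the number of records per x value.
island_counts = Counter(r["island"] for r in penguin_data)
print(sorted(island_counts.items()))  # [('Biscoe', 1), ('Dream', 2), ('Torgersen', 1)]
```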
Stacked Bar Chart
The next example expands on the simple bar chart by creating a stacked version.
The stacked bar example plots the number of male and female penguins per species. More specifically, this example generates a bar chart where the horizontal axis is the species, and the vertical axis has stacked bars showing the number of male and female penguins within each species:
// assign data
def chart:data:values = penguin_data
def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true
def chart = vegalite_utils:x[{
(:field, "species");
(:title, "Penguin Species");
(:type, "ordinal");
(:axis, {
(:labelAngle, 45);
(:ticks, boolean_true);
(:grid, boolean_true);
})
}]
def chart = vegalite_utils:y[{
(:aggregate, "count");
(:type, "quantitative");
}]
def chart = vegalite_utils:color[{
(:field, "sex");
(:type, "nominal");
(:scale, :domain, :[], {(1, "MALE"); (2, "FEMALE")});
(:title, "Penguin Sex");
}]
// display
def output = vegalite:plot[chart]
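For the stacked version, the count is taken per (species, sex) pair, and each pair becomes one segment of its species bar. Here is a Python sketch of those counts, again on made-up records rather than the real penguin data. The segments are listed in the order of the color scale domain given above.

```python
from collections import Counter

# Made-up records in array form (not the real penguin data).
penguin_data = [
    {"species": "Adelie", "sex": "MALE"},
    {"species": "Adelie", "sex": "FEMALE"},
    {"species": "Adelie", "sex": "MALE"},
    {"species": "Gentoo", "sex": "FEMALE"},
]

# Count per (species, sex) pair; each pair becomes one stacked segment.
pair_counts = Counter((r["species"], r["sex"]) for r in penguin_data)

# Segment heights per species bar, listed in the color domain order (MALE, FEMALE).
stacks = {
    species: [pair_counts[(species, sex)] for sex in ("MALE", "FEMALE")]
    for species in sorted({s for s, _ in pair_counts})
}
print(stacks)  # {'Adelie': [2, 1], 'Gentoo': [0, 1]}
```

The total height of each bar is the sum of its segments, which matches the per-species count in an unstacked bar chart.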
Scatterplot
The next example shows you how to build a scatter plot. This example plots the culmen depth versus the culmen length of each penguin. The culmen is the upper ridge of a penguin’s beak. This kind of plot can show potential correlation between these two values.
The horizontal axis of the scatter plot shows the culmen depth in millimeters, and the vertical axis shows the culmen length, also in millimeters. Each point in the plot corresponds to one instance in the data, i.e., one penguin.
Here is the code to generate this scatterplot:
// assign data
def chart:data:values = penguin_data
def chart:mark = "point"
def chart = vegalite_utils:x[{
(:field, "culmen_depth_mm");
(:title, "Culmen depth (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]
def chart = vegalite_utils:y[{
(:field, "culmen_length_mm");
(:title, "Culmen length (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]
// display
def output = vegalite:plot[chart]
Note: The range of both axes has been adjusted through the scale parameter, by setting :zero to false so that the axes do not have to include zero. This provides a better view of the data.
For more details on the different options for both the scatterplot and the rest of the plots in this how-to guide, see the Vega-Lite documentation.
You can easily use the power of Rel and Vega-Lite to make this last example even more elaborate. For example, you can display which of the points in the scatterplot belong to male versus female penguins:
// assign data
def chart:data:values = penguin_data
def chart:mark = "point"
def chart = vegalite_utils:x[{
(:field, "culmen_depth_mm");
(:title, "Culmen depth (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]
def chart = vegalite_utils:y[{
(:field, "culmen_length_mm");
(:title, "Culmen length (mm)");
(:type, "quantitative");
(:scale, :zero, boolean_false);
}]
def chart = vegalite_utils:color[{
(:field, "sex");
(:type, "nominal");
(:scale, :domain, :[], {(1, "MALE"); (2, "FEMALE")});
(:title, "Penguin Sex");
}]
// display
def output = vegalite:plot[chart]
Line Chart
The next example explores how to create a line chart.
This example creates artificial data using a mathematical function in Rel (natural_exp).
The following code generates x values from -10.0 to 10.0 with a step of 0.1, computes the sigmoid function for each x, and plots the results using a line chart:
def mydata = 1.0 / (1.0 + natural_exp[-x]) for x in range[-10.0, 10.0, 0.1]
def chart:data:values[:[], i] =
{(:x, a); (:sigmoid_x, b)} from a, b where sort[mydata](i, a, b)
def chart:mark = "line"
def chart = vegalite_utils:x[{
(:field, "x");
(:type, "quantitative");
}]
def chart = vegalite_utils:y[{
(:field, "sigmoid_x");
(:type, "quantitative");
}]
// display
def output = vegalite:plot[chart]
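For reference, here is a Python sketch of the data that the Rel code above prepares: one record per x value with the fields x and sigmoid_x, ordered by x. The sketch assumes that range[-10.0, 10.0, 0.1] yields the 201 values -10.0, -9.9, ..., 10.0. The x values are built from an integer counter and rounded to avoid accumulated floating-point error.

```python
import math

# Assumption: range[-10.0, 10.0, 0.1] yields the 201 values -10.0, -9.9, ..., 10.0.
# Build x from an integer counter and round, to avoid accumulating float error.
xs = [round(-10.0 + k * 0.1, 10) for k in range(201)]


def sigmoid(x):
    """The logistic sigmoid, 1 / (1 + e^(-x)), as in the Rel definition of mydata."""
    return 1.0 / (1.0 + math.exp(-x))


# Array form: one record per x, ordered by x (positions are list indices + 1).
values = [{"x": x, "sigmoid_x": sigmoid(x)} for x in sorted(xs)]
print(values[100])  # {'x': 0.0, 'sigmoid_x': 0.5}
```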
You can also plot multiple lines in the same graph, with the appropriate legend on the side and with the accent color scheme:
def x_vals = range[0.0, 8*pi_float64, 0.1]
def n = count[x_vals]
def data:x = r, x : sort[x_vals](i, x) and (i = r or r = i+n) from i
def data:function = range[1, n, 1], "cos[x]"
def data:function = range[n+1, 2*n, 1], "sin[x]"
def data:value = cos[data[:x, i]] for i where i <= n
def data:value = sin[data[:x, i]] for i where i > n
def chart = vegalite_utils:data[data]
def chart:mark = "line"
def chart:width = 400
def chart:height = 200
def chart = vegalite_utils:x[{
(:field, "x");
(:type, "quantitative");
}]
def chart = vegalite_utils:y[{
(:field, "value");
(:type, "quantitative");
}]
def chart = vegalite_utils:color[{
(:field, "function");
(:type, "nominal");
(:scale, :scheme, "accent");
}]
// display
def output = vegalite:plot[chart]
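The Rel code above stores both functions in one long-format relation. Positions 1 to n hold the cosine values and positions n+1 to 2n hold the sine values. The function field tells the color encoding which line each point belongs to. Here is a Python sketch of the same layout. It assumes that range[0.0, 8*pi_float64, 0.1] yields 0.0, 0.1, ... up to the last step that does not exceed 8π.

```python
import math

# Assumption: range[0.0, 8*pi_float64, 0.1] yields 0.0, 0.1, ... up to the last
# step not exceeding 8*pi (25.1 here), i.e. n = 252 values.
n = math.floor(8 * math.pi / 0.1) + 1
x_vals = [round(k * 0.1, 10) for k in range(n)]

# Long format: positions 1..n hold cos[x], positions n+1..2n hold sin[x].
data = {}
for i, x in enumerate(x_vals, start=1):
    data[i] = {"x": x, "function": "cos[x]", "value": math.cos(x)}
    data[i + n] = {"x": x, "function": "sin[x]", "value": math.sin(x)}

print(n, len(data))  # 252 504
```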
Overlaying Multiple Charts
The combination of Rel and Vega-Lite allows you to create interesting charts where you can show two different plots on the same chart. The next example generates weekly sales data over the period of one year (52 weeks). This example shows the number of sales in each week with a bar chart and overlays the running total of sales with a line on the chart. It also uses two separate y-axes since the scales of the two plots are different:
def n_weeks = 52
def mydata:x = sort[range[1, n_weeks, 1]]
def mydata:sales[i] = sin[mydata:x[i] / n_weeks * pi_float64]^2
def mydata:cumulative[1] = mydata:sales[1]
def mydata:cumulative[i] = mydata:sales[i] + mydata:cumulative[i-1]
def chart = vegalite_utils:data[mydata]
def chart:width = 500
def chart:height = 200
def chart:resolve = { (:scale, :y, "independent"); }
def chart = vegalite_utils:x[{
(:field, "x");
(:title, "week of year");
(:type, "quantitative");
}]
def chart = vegalite_utils:y[{
(:type, "quantitative");
}]
def chart[:layer, :[], 1] = {
(:mark, {(:type, "bar"); (:opacity, 0.5)});
(:encoding, :y, {(:field, "sales"); (:title, "weekly sales")});
}
def chart[:layer, :[], 2] = {
(:mark, {(:type, "line"); (:color, "orange"); (:size, 3)});
(:encoding, :y, {(:field, "cumulative"); (:title, "cumulative sales")});
}
// display
def output = vegalite:plot[chart]
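Here is a Python sketch of the same weekly data and of a Vega-Lite layer specification that matches the chart relation above. It is written by hand for illustration and is not generated by Rel. It uses the same formula for weekly sales as the Rel code and a running sum (itertools.accumulate) for the cumulative total.

```python
import math
from itertools import accumulate

n_weeks = 52
weeks = list(range(1, n_weeks + 1))

# Same formulas as the Rel code: weekly sales and their running total.
sales = [math.sin(w / n_weeks * math.pi) ** 2 for w in weeks]
cumulative = list(accumulate(sales))

# Vega-Lite layer spec matching the Rel chart relation.
spec = {
    "data": {"values": [
        {"x": w, "sales": s, "cumulative": c}
        for w, s, c in zip(weeks, sales, cumulative)
    ]},
    "width": 500,
    "height": 200,
    "resolve": {"scale": {"y": "independent"}},
    "encoding": {
        "x": {"field": "x", "title": "week of year", "type": "quantitative"},
        "y": {"type": "quantitative"},
    },
    "layer": [
        {"mark": {"type": "bar", "opacity": 0.5},
         "encoding": {"y": {"field": "sales", "title": "weekly sales"}}},
        {"mark": {"type": "line", "color": "orange", "size": 3},
         "encoding": {"y": {"field": "cumulative", "title": "cumulative sales"}}},
    ],
}
print(round(cumulative[-1], 6))  # 26.0
```

The running total ends at 26, half the number of weeks, because values of sin² sampled evenly over one full period average to 1/2. This gives a quick way to check the generated data.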
Marginal Histograms
The next example shows how to create marginal histograms. These are histograms displayed at the sides, or margins, of a two-dimensional plot to show the distribution of each individual measurement.
The following example uses the penguin data to plot culmen_depth_mm against culmen_length_mm as a binned heatmap, with a marginal histogram for each measurement:
// generate data
def mydata:x = penguins_clean:culmen_depth_mm
def mydata:y = penguins_clean:culmen_length_mm
// set up graph
def chart = vegalite_utils:data[mydata]
def chart:spacing = 15
def chart:bounds = "flush"
def chart[:vconcat, :[], 1] = {
(:mark, "bar");
(:height, 60);
vegalite_utils:x[{(:x); (:bin, boolean_true); (:axis, missing);}];
vegalite_utils:y[{(:y); (:aggregate, "count"); (:title, "");}];
}
def chart[:vconcat, :[], 2] = {
(:spacing, 15);
(:bounds, "flush");
(:hconcat, :[], 1, {
(:mark, "rect");
vegalite_utils:x[{(:x); (:bin, boolean_true); (:title, "culmen_depth_mm")}];
vegalite_utils:y[{(:y); (:bin, boolean_true); (:title, "culmen_length_mm")}];
vegalite_utils:color[{(:aggregate, "count");}];
});
(:hconcat, :[], 2, {
(:mark, "bar");
(:width, 60);
vegalite_utils:x[{(:x); (:aggregate, "count"); (:title, "");}];
vegalite_utils:y[{(:y); (:bin, boolean_true); (:axis, missing);}];
});
}
def chart:config:view:stroke = "transparent"
// display
def output = vegalite:plot[chart]
Summary
In this guide, you have learned how to generate visualizations with Rel using either existing datasets or by creating data using mathematical functions. Rel, combined with Vega-Lite, provides a very powerful tool for visualizing and exploring data. More information on Rel can be found in the Rel Language Reference, and information on the different types of charts and their parameters for Vega-Lite can be found in the Vega-Lite documentation.
See Also
In order to visualize data, Rel currently makes use of Vega and Vega-Lite. This guide has focused on Vega-Lite for its simplicity and good coverage of a large variety of visualization needs. For more advanced visualization needs, see Vega.
Two resources that may be useful in addition to this guide are the CSV Import Guide and JSON Import and Export Guide. They contain examples and functionality useful for understanding how to appropriately load different kinds of data into the system, which can then be visualized through Vega-Lite.