Vega-Lite
This how-to guide demonstrates how to create graphical representations of data with Rel and Vega-Lite.
Goal
This how-to guide shows you how to create Vega-Lite charts to visualize your data in Rel using RAI worksheets.
Introduction
This how-to guide showcases how to use Vega-Lite from within Rel in order to visualize data. The code presented here can be easily adapted to different kinds of charts, for example, creating a sorted bar chart instead of a regular bar chart.
Creating a chart using Rel and Vega-Lite can be as simple as using the appropriate chart type, vegalite:bar in this case, and applying it to some data:
// read query
// Prepare the data.
def in_data:year = {(1, 2018); (2, 2019); (3, 2020); (4, 2021)}
def in_data:sales = {(1, 100); (2, 120); (3, 65); (4, 180)}
// Plot them.
def output = ::std::display::vegalite::plot[
    ::std::display::vegalite::bar[:year, :sales, {:data, in_data}]
]
The following sections discuss in more detail how to prepare data from different sources as well as how to configure and plot different charts.
Note that the code presented in this how-to guide produces graphical output for the RAI Console worksheets. Everywhere else, for example, the RelationalAI SDKs, relations are returned in the standard, nongraphical form.
Preparing the Data
Consider how to prepare data in order to easily plot them with Vega-Lite.
At a high level, your data need to be set up as a JSON array of the form: (:[], position, attribute, value).
Here is a small example, where data are directly inserted in a relation called small_data.
// read query
def small_data[:[], 1, :category] = "Alpha"
def small_data[:[], 1, :value] = 28
def small_data[:[], 2, :category] = "Beta"
def small_data[:[], 2, :value] = 55
def small_data[:[], 3, :category] = "Gamma"
def small_data[:[], 3, :value] = 43
def output = small_data
This how-to guide initially works with toy data that you can input directly. Later examples will leverage existing datasets that are larger.
This first example has three data points in the array.
Each data point contains the attributes :category and :value.
You can now use the data in this array format to create plots, as you will see in later sections. Instead of providing each data item and structuring the array by hand, you can use Rel to build the same array with fewer lines of code:
// model
def in_data = {("Alpha", 28); ("Beta", 55); ("Gamma", 43)}
def small_data[:[], i] = {(:category, a); (:value, b)}
    from a, b where sort[in_data](i, a, b)
This is very useful when you already have your data in an existing relation, such as in_data in this case, and you want to easily convert it to the array form for plotting with Vega-Lite.
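To make the array form concrete, here is a minimal Python sketch (outside Rel, for illustration only) of the same idea: the rows are sorted, and each one becomes an object in the list that Vega-Lite ultimately receives as JSON. The helper name to_vega_values is hypothetical and is not part of any RAI or Vega-Lite API.

```python
import json

def to_vega_values(rows, fields):
    """Sort the rows, then turn each one into a {field: value} record."""
    return [dict(zip(fields, row)) for row in sorted(rows)]

in_data = {("Alpha", 28), ("Beta", 55), ("Gamma", 43)}
small_data = to_vega_values(in_data, ["category", "value"])

# Positions 1..3 in the Rel relation correspond to list indices 0..2 here.
print(json.dumps(small_data, indent=2))
```

Sorting first gives every row a stable position, which is the role the position column plays in the Rel array.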
CSV Data
The next example imports an existing dataset in CSV format into Rel and then uses it for plotting: the penguin dataset, located in this public Azure bucket. This dataset contains data for a set of attributes (for example, species, flipper length, sex) for 344 penguins.
The following Rel code loads these data and converts them into the appropriate format for Vega-Lite:
// write query
// Specify the data location.
def config:path = "azure://raidocs.blob.core.windows.net/datasets/penguins/penguins_size.csv"
// Specify the data schema.
def config:schema:species = "string"
def config:schema:island = "string"
def config:schema:culmen_length_mm = "float"
def config:schema:culmen_depth_mm = "float"
def config:schema:flipper_length_mm = "float"
def config:schema:body_mass_g = "float"
def config:schema:sex = "string"
def penguin_data = load_csv[config]
// Clean the data to remove `NA` and `.`.
def row_with_error(row) =
    penguin_data:sex(row, "NA") or
    penguin_data:sex(row, ".") or
    penguin_data:load_errors(row, _, _)
def penguins_clean(column, row, entry...) =
    penguin_data(column, row, entry...) and not row_with_error(row)
// Insert data into the database.
def insert:penguin = penguins_clean
For most Vega-Lite charts, the penguin data need to be formatted as a JSON array.
// model
// Prepare data in array format.
def penguin_array(:[], index, column, value) = lined_csv[penguin](column, index, value)
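The reshaping performed here can be pictured with a short Python sketch. It assumes, for illustration, that the loaded relation behaves like a set of (column, row id, value) triples; the row ids and sample values below are made up, and pivot_to_records is a hypothetical helper rather than a Rel or Vega-Lite function.

```python
from collections import defaultdict

def pivot_to_records(triples):
    """Group (column, row_id, value) triples into one record per row."""
    by_row = defaultdict(dict)
    for column, row_id, value in triples:
        by_row[row_id][column] = value
    # Emit records in row-id order so that positions are stable.
    return [by_row[row_id] for row_id in sorted(by_row)]

# Made-up sample in the column-oriented shape.
penguin = [
    ("species", 2, "Adelie"), ("island", 2, "Torgersen"), ("sex", 2, "MALE"),
    ("species", 3, "Adelie"), ("island", 3, "Biscoe"), ("sex", 3, "FEMALE"),
]
penguin_array = pivot_to_records(penguin)
print(penguin_array)
```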
The examples that follow use small_data, penguin, and penguin_array.
Configuring the Plot
Assigning Data
Once you have prepared your data in this specific form, you can now define a chart relation and provide the chart parameters that you would like to use.
You can start by providing the data for your chart, which need to be assigned in the :values field under a :data field:
// read query
// Set up the data to plot.
def chart:data:values = small_data
def output = chart
Or, similarly for the penguin dataset:
def chart:data:values = penguin_array
For certain operations in plots using Vega-Lite, Rel provides convenience relations that can be helpful when configuring plots.
An example of this is vegalite_utils:data, which can help create the appropriate data format for plotting:
// read query
def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}
def chart = vegalite_utils:data[simple_data]
def output = chart
This approach is useful when you have data specified as columns in a relation, such as simple_data in this case.
Note that the chart relation is identical to the one created earlier using small_data as far as the data configuration is concerned.
Styling the Graph
Next, you can specify the chart type that you would like to use.
To do this in Rel, set the :type field under the :mark field of the chart relation.
For example, if you want to use a bar chart:
def chart:mark:type = "bar"
This is equivalent to using vegalite:bar, as you did at the beginning of this how-to guide.
In a similar fashion, you can specify that you would like to enable tooltips:
def chart:mark:tooltip = boolean_true
Finally, you can provide a specification for the axes of your chart, using the :encoding, :x, :y, :type, and :field fields.
For example:
def chart:encoding:x:field = "category"
def chart:encoding:x:title = "My cool x-axis"
def chart:encoding:x:sort = "descending"
def chart:encoding:x:type = "nominal"
def chart:encoding:x:axis:labelAngle = 270
def chart:encoding:x:axis:titleColor = "blue"
def chart:encoding:y:field = "value"
def chart:encoding:y:type = "quantitative"
The code above specifies that the x-axis uses the category field from your data, that it takes nominal (i.e., categorical) values sorted in descending order, that the labels are rotated 270 degrees, and that the axis has a blue title.
Similarly, the code specifies that the y-axis uses the value field from your data and that it takes numerical values.
You can again use a convenience relation to set up the x and y axes. Here is an example that produces the same configuration for the x-axis:
// read query
def chart = vegalite_utils:x[{
    (:field, "category");
    (:title, "My cool x-axis");
    (:sort, "descending");
    (:type, "nominal");
    (:axis, {
        (:labelAngle, 270);
        (:titleColor, "blue");
    })
}]
Examples
Once you have your chart specification set up, you can plot it using vegalite:plot as follows:
// read query
def chart:data:values = small_data
def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true
def chart:encoding:x:field = "category"
def chart:encoding:x:title = "My cool x-axis"
def chart:encoding:x:sort = "descending"
def chart:encoding:x:type = "nominal"
def chart:encoding:x:axis:labelAngle = 270
def chart:encoding:x:axis:titleColor = "blue"
def chart:encoding:y:field = "value"
def chart:encoding:y:type = "quantitative"
def output = ::std::display::vegalite::plot[chart]
Again, note that the chart and certain graphical functionality, such as tooltips, are visible only in the RAI Console worksheets environment.
You can check the Vega-Lite Documentation for more information on all the different charts as well as their parameters. In general, Rel follows the same hierarchy of fields and values as the Vega-Lite Documentation.
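Because the Rel relation mirrors the Vega-Lite field hierarchy, each definition such as chart:encoding:x:field = "category" can be read as a path into the JSON specification. The Python sketch below illustrates that reading; set_path is a hypothetical helper, and the resulting dictionary only approximates what the worksheet hands to Vega-Lite.

```python
import json

def set_path(spec, path, value):
    """Store value at the nested location named by path, creating levels as needed."""
    node = spec
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value

chart = {}
definitions = [
    (("mark", "type"), "bar"),
    (("mark", "tooltip"), True),
    (("encoding", "x", "field"), "category"),
    (("encoding", "x", "type"), "nominal"),
    (("encoding", "x", "axis", "labelAngle"), 270),
    (("encoding", "y", "field"), "value"),
    (("encoding", "y", "type"), "quantitative"),
]
for path, value in definitions:
    set_path(chart, path, value)

print(json.dumps(chart, indent=2))
```

Reading definitions this way makes it easier to translate an option found in the Vega-Lite documentation into the corresponding Rel definition.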
As already discussed, for certain types of charts and operations, Rel also provides some convenience relations.
For example, instead of setting up the chart relation in a detailed manner as in the previous examples, you can plot a simple bar chart on the same data as follows:
// read query
def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}
def output = ::std::display::vegalite::plot[
    ::std::display::vegalite::bar[:category, :value, {:data, simple_data}]
]
In this case, because you are using vegalite:bar, some of the parameters of the chart, such as the x-axis having nominal data, are already pre-filled.
Similarly, instead of providing the parameters in detail, you can specify the data for the x-axis and some parameters very easily as follows:
// read query
def simple_data:category = {(1, "Alpha"); (2, "Beta"); (3, "Gamma")}
def simple_data:value = {(1, 28); (2, 55); (3, 43)}
def chart = vegalite_utils:data[simple_data]
def chart:mark:type = "bar"
def chart = vegalite_utils:x[{
    (:field, "category");
    (:title, "My cool x-axis");
    (:sort, "descending");
    (:type, "nominal");
    (:axis, {
        (:labelAngle, 270);
        (:ticks, boolean_true);
        (:grid, boolean_true);
        (:titleColor, "blue");
    })
}]
def chart = vegalite_utils:y[{
    (:value);
    (:type, "quantitative");
}]
def output = ::std::display::vegalite::plot[chart]
Note the alternative way of specifying the field for the y-axis: you can write :value on its own instead of (:field, "value").
Rel automatically understands in this case that :value is the field you are using for the y-axis.
Also note that unlike the array format you used in the previous examples, the convenience relations (e.g., vegalite:bar) take the data in a (field, keys..., value) format.
This is essentially the same format that is returned from relations such as load_csv.
This makes convenience relations extremely useful for quick loading and plotting for certain types of data.
Using JSON Strings
In addition to specifying the chart configuration in Rel, you can also directly provide it as a JSON string.
This functionality is very useful when you want to develop a chart specification outside of Rel, for example, in the Vega-Lite Editor, and then use it to directly visualize your data.
Here is an example with a JSON specification for a simple bar chart:
// read query
// Assign the data.
def chart:data:values = small_data
// Build the chart specification in JSON.
def chart = parse_json["""{
    "mark": {"type": "bar", "tooltip": true},
    "encoding": {
        "x": {"field": "category", "type": "nominal", "axis": {"labelAngle": 270}},
        "y": {"field": "value", "type": "quantitative"}
    }
}"""]
// Display.
def output = ::std::display::vegalite::plot[chart]
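If the specification is produced by another tool, it can help to generate and check the JSON text before pasting it into Rel. The following Python sketch, which relies only on the standard json module, serializes the same specification and confirms that it parses back unchanged. It is one possible workflow, not a required step.

```python
import json

spec = {
    "mark": {"type": "bar", "tooltip": True},
    "encoding": {
        "x": {"field": "category", "type": "nominal", "axis": {"labelAngle": 270}},
        "y": {"field": "value", "type": "quantitative"},
    },
}

# Serialize to the text that would go between the triple quotes in Rel.
spec_json = json.dumps(spec, indent=2)

# A round trip guards against slips such as trailing commas or Python-style True.
assert json.loads(spec_json) == spec
print(spec_json)
```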
The examples that follow configure the different Vega-Lite charts in Rel.
Example Charts
Simple Bar Chart
The first example creates a simple bar chart with real data. The chart uses the penguin data discussed in the previous section and plots the number of penguins in the dataset on each island:
// read query
// Assign the data.
def chart:data:values = penguin_array
def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true
def chart = vegalite_utils:x[{
    (:field, "island");
    (:title, "Island");
    (:type, "ordinal");
    (:axis, {
        (:labelAngle, 45);
        (:ticks, boolean_true);
        (:grid, boolean_true);
    })
}]
def chart = vegalite_utils:y[{
    (:aggregate, "count");
    (:type, "quantitative");
}]
// Display.
def output = ::std::display::vegalite::plot[chart]
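The count aggregate asks Vega-Lite to tally the records that share each x value. As a rough illustration of that tally (not of Vega-Lite's internals), the Python sketch below counts made-up records per island.

```python
from collections import Counter

# Made-up records in the array form used for plotting.
records = [
    {"island": "Biscoe", "species": "Adelie"},
    {"island": "Dream", "species": "Chinstrap"},
    {"island": "Biscoe", "species": "Gentoo"},
    {"island": "Torgersen", "species": "Adelie"},
    {"island": "Dream", "species": "Adelie"},
    {"island": "Biscoe", "species": "Gentoo"},
]

# One bar per island; the bar height is the number of matching records.
bar_heights = Counter(record["island"] for record in records)
print(dict(bar_heights))
```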
Stacked Bar Chart
The next example expands on the simple bar chart by creating a stacked version.
The stacked bar example plots the number of male and female penguins per species. More specifically, this example generates a bar chart where the horizontal axis is the species, and the vertical axis has stacked bars showing the number of male and female penguins within each species:
// read query
// Assign the data.
def chart:data:values = penguin_array
def chart:mark:type = "bar"
def chart:mark:tooltip = boolean_true
def chart = vegalite_utils:x[{
    (:field, "species");
    (:title, "Penguin Species");
    (:type, "ordinal");
    (:axis, {
        (:labelAngle, 45);
        (:ticks, boolean_true);
        (:grid, boolean_true);
    })
}]
def chart = vegalite_utils:y[{
    (:aggregate, "count");
    (:type, "quantitative");
}]
def chart = vegalite_utils:color[{
    (:field, "sex");
    (:type, "nominal");
    (:scale, :domain, :[], {(1, "MALE"); (2, "FEMALE")});
    (:title, "Penguin Sex");
}]
// Display.
def output = ::std::display::vegalite::plot[chart]
Scatterplot
The next example shows you how to build a scatterplot. This example plots the culmen depth versus the culmen length of each penguin. The culmen is the upper ridge of a penguin’s beak. This kind of plot can show potential correlation between these two values.
The horizontal axis of the scatterplot shows the culmen depth in millimeters, and the vertical axis shows the culmen length, also in millimeters. Each point in the plot corresponds to one instance in the data, i.e., one penguin.
Here is the code to generate this scatterplot:
// read query
// Assign the data.
def chart:data:values = penguin_array
def chart:mark = "point"
def chart = vegalite_utils:x[{
    (:field, "culmen_depth_mm");
    (:title, "Culmen depth (mm)");
    (:type, "quantitative");
    (:scale, :zero, boolean_false);
}]
def chart = vegalite_utils:y[{
    (:field, "culmen_length_mm");
    (:title, "Culmen length (mm)");
    (:type, "quantitative");
    (:scale, :zero, boolean_false);
}]
// Display.
def output = ::std::display::vegalite::plot[chart]
Note: The range of both axes has been altered through the use of the scale parameter in order to provide a better view of the data.
For more details on the different options for both the scatterplot and the rest of the plots in this how-to guide, see the Vega-Lite documentation.
You can easily use the power of Rel and Vega-Lite to make this last example even more elaborate. For example, you can display which of the points in the scatterplot belong to male versus female penguins:
// read query
// Assign the data.
def chart:data:values = penguin_array
def chart:mark = "point"
def chart = vegalite_utils:x[{
    (:field, "culmen_depth_mm");
    (:title, "Culmen depth (mm)");
    (:type, "quantitative");
    (:scale, :zero, boolean_false);
}]
def chart = vegalite_utils:y[{
    (:field, "culmen_length_mm");
    (:title, "Culmen length (mm)");
    (:type, "quantitative");
    (:scale, :zero, boolean_false);
}]
def chart = vegalite_utils:color[{
    (:field, "sex");
    (:type, "nominal");
    (:scale, :domain, :[], {(1, "MALE"); (2, "FEMALE")});
    (:title, "Penguin Sex");
}]
// Display.
def output = ::std::display::vegalite::plot[chart]
Line Chart
The next example explores how to create a line chart.
This example creates artificial data using a mathematical function in Rel (natural_exp).
The following code generates x values from -10.0 to 10.0 with a step of 0.1, computes the sigmoid function for each x, and plots the results using a line chart:
// read query
def mydata = 1.0 / (1.0 + natural_exp[-x]) for x in range[-10.0, 10.0, 0.1]
def chart:data:values[:[], i] =
    {(:x, a); (:sigmoid_x, b)} from a, b where sort[mydata](i, a, b)
def chart:mark = "line"
def chart = vegalite_utils:x[{
    (:field, "x");
    (:type, "quantitative");
}]
def chart = vegalite_utils:y[{
    (:field, "sigmoid_x");
    (:type, "quantitative");
}]
// Display.
def output = ::std::display::vegalite::plot[chart]
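For readers who want to check the generated values independently, the following Python sketch builds a comparable dataset. It assumes both endpoints of the range are included, and it steps through integers rather than adding 0.1 repeatedly so that floating-point error does not accumulate; the record layout mirrors the :x and :sigmoid_x attributes above.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Step through integers and divide by 10 to avoid accumulating float error.
xs = [i / 10 for i in range(-100, 101)]
values = [{"x": x, "sigmoid_x": sigmoid(x)} for x in xs]

print(len(values), values[100])
```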
You can also plot multiple lines in the same graph, with the appropriate legend on the side and with the accent color scheme:
// read query
def x_vals = range[0.0, 8*pi_float64, 0.1]
def n = count[x_vals]
def data:x = r, x : sort[x_vals](i, x) and (i = r or r = i+n) from i
def data:function = range[1, n, 1], "cos[x]"
def data:function = range[n+1, 2*n, 1], "sin[x]"
def data:value = cos[data[:x, i]] for i where i <= n
def data:value = sin[data[:x, i]] for i where i > n
def chart = vegalite_utils:data[data]
def chart:mark = "line"
def chart:width = 400
def chart:height = 200
def chart = vegalite_utils:x[{
    (:field, "x");
    (:type, "quantitative");
}]
def chart = vegalite_utils:y[{
    (:field, "value");
    (:type, "quantitative");
}]
def chart = vegalite_utils:color[{
    (:field, "function");
    (:type, "nominal");
    (:scale, :scheme, "accent");
}]
// Display.
def output = ::std::display::vegalite::plot[chart]
Overlaying Multiple Charts
The combination of Rel and Vega-Lite allows you to create interesting charts where you can show two different plots on the same chart. The next example generates weekly sales data over the period of one year (52 weeks). This example shows the number of sales in each week with a bar chart and overlays the running total of sales with a line on the chart. It also uses two separate y-axes since the scales of the two plots are different:
// read query
def n_weeks = 52
def mydata:x = sort[range[1, n_weeks, 1]]
def mydata:sales[i] = sin[mydata:x[i] / n_weeks * pi_float64]^2
def mydata:cumulative[1] = mydata:sales[1]
def mydata:cumulative[i] = mydata:sales[i] + mydata:cumulative[i-1]
def chart = vegalite_utils:data[mydata]
def chart:width = 500
def chart:height = 200
def chart:resolve = { (:scale, :y, "independent"); }
def chart = vegalite_utils:x[{
    (:field, "x");
    (:title, "week of year");
    (:type, "quantitative");
}]
def chart = vegalite_utils:y[{
    (:type, "quantitative");
}]
def chart[:layer, :[], 1] = {
    (:mark, {(:type, "bar"); (:opacity, 0.5)});
    (:encoding, :y, {(:field, "sales"); (:title, "weekly sales")});
}
def chart[:layer, :[], 2] = {
    (:mark, {(:type, "line"); (:color, "orange"); (:size, 3)});
    (:encoding, :y, {(:field, "cumulative"); (:title, "cumulative sales")});
}
// Display.
def output = ::std::display::vegalite::plot[chart]
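The need for independent y-axes becomes clear when the two series are computed side by side. The Python sketch below reproduces the same formulas outside Rel as an illustration: weekly sales never exceed 1, while the running total reaches about 26 by week 52, so a shared axis would flatten the bars.

```python
import math
from itertools import accumulate

n_weeks = 52
weeks = list(range(1, n_weeks + 1))

# Weekly sales follow sin^2 and peak in the middle of the year.
sales = [math.sin(week / n_weeks * math.pi) ** 2 for week in weeks]

# Running total: each entry adds the current week to the previous total.
cumulative = list(accumulate(sales))

values = [
    {"x": w, "sales": s, "cumulative": c}
    for w, s, c in zip(weeks, sales, cumulative)
]
print(max(sales), cumulative[-1])
```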
Marginal Histograms
The next example shows how to create marginal histograms. These are histograms displayed at the sides, or margins, of a scatterplot’s axes to show the distribution of each measurement.
The following example uses the penguin data and plots the penguins’ culmen_depth_mm and culmen_length_mm values in a marginal histogram:
// read query
// Generate the data.
def mydata:x = penguin:culmen_depth_mm
def mydata:y = penguin:culmen_length_mm
// Set up the graph.
def chart = vegalite_utils:data[mydata]
def chart:spacing = 15
def chart:bounds = "flush"
def chart[:vconcat, :[], 1] = {
    (:mark, "bar");
    (:height, 60);
    vegalite_utils:x[{(:x); (:bin, boolean_true); (:axis, missing);}];
    vegalite_utils:y[{(:y); (:aggregate, "count"); (:title, "");}];
}
def chart[:vconcat, :[], 2] = {
    (:spacing, 15);
    (:bounds, "flush");
    (:hconcat, :[], 1, {
        (:mark, "rect");
        vegalite_utils:x[{(:x); (:bin, boolean_true); (:title, "culmen_depth_mm")}];
        vegalite_utils:y[{(:y); (:bin, boolean_true); (:title, "culmen_length_mm")}];
        vegalite_utils:color[{(:aggregate, "count");}];
    });
    (:hconcat, :[], 2, {
        (:mark, "bar");
        (:width, 60);
        vegalite_utils:x[{(:x); (:aggregate, "count"); (:title, "");}];
        vegalite_utils:y[{(:y); (:bin, boolean_true); (:axis, missing);}];
    });
}
def chart:config:view:stroke = "transparent"
// Display.
def output = ::std::display::vegalite::plot[chart]
Summary
In this guide, you have learned how to generate visualizations with Rel using either existing datasets or by creating data using mathematical functions. Rel, combined with Vega-Lite, provides a very powerful tool for visualizing and exploring data. More information on Rel can be found in the Rel Language Reference, and information on the different types of charts and their parameters for Vega-Lite can be found in the Vega-Lite documentation.
See Also
In order to visualize data, Rel currently makes use of Vega and Vega-Lite. This guide has focused on Vega-Lite for its simplicity and good coverage of a large variety of visualization needs. For more advanced visualization needs, see Vega.
Two resources that may be useful in addition to this guide are the CSV Import and JSON Import guides. They contain examples and functionality useful for understanding how to appropriately load different kinds of data into the system, which can then be visualized through Vega-Lite.