My First Rel Program

This tutorial is designed to give users their first introduction to RAI’s declarative language, Rel.

Goal

In this tutorial, you will learn how to write simple facts in Rel and use basic queries to explore your data. After completing this tutorial, you should have a basic understanding of how to express knowledge in Rel, import CSV files, and build simple data models.

Scenario

Imagine you are a developer of electric vehicle (EV) charging stations, and you want to find where in the United States to build your first five stations. To find a solution, you ask two questions about your data:

In which states are people likely to buy an EV?
Where can you position your stations in order to reach the most EV drivers?

Organizing Facts

To keep things simple, you begin by collecting information about just three states: California, Missouri, and Delaware. You choose these states because they present a good sample set: based on population, California is a large state, Missouri a medium state, and Delaware a small state. California is the largest EV producer in the US. Does that also mean it is the state where people are most likely to buy an EV?

From here, you can create a table to organize the relevant facts about these states: population and EV count. Each state has a population property (source: Wikipedia) and an EV registration count property (source: afdc.energy.gov).

The data look like this:

State	Population	EV Registration
California	39512223	425300
Delaware	973764	1950
Missouri	6137428	6740

Every row in the table has three columns:

State name
Population
EV registration

With this information, you can now start to derive the number of EV registrations per 1000 people:

\text{EV penetration} = 1000\ \frac{\text{EV registrations}}{\text{Population}}.
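As a worked example, here is the formula applied to each row of the table above. The values are rounded to two decimal places.

```latex
\begin{aligned}
\text{California:}\quad & 1000 \times \frac{425300}{39512223} \approx 10.76 \\
\text{Delaware:}\quad & 1000 \times \frac{1950}{973764} \approx 2.00 \\
\text{Missouri:}\quad & 1000 \times \frac{6740}{6137428} \approx 1.10
\end{aligned}
```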

Using Rel

Now you’re ready to begin using Rel, RAI’s declarative modeling and query language. Rel will let you experiment with your ideas about state population and EV registrations. Think of this as building a model and asking questions of the model.

In Rel, you express data in terms of stated facts. You can express the data gathered as follows:

// model
 
def population {
    ("California", 39512223);
    ("Delaware", 973764);
    ("Missouri", 6137428)
}
 
def registration {
    ("California", 425300);
    ("Delaware", 1950);
    ("Missouri", 6740)
}


Note that you will not see any output from your code just yet. You are loading your model (notice that you are executing a model transaction). Later, you will use a read query transaction to generate the results.

Each def statement above defines a relationship, or relation. The name of the relation follows the keyword def. The tuples that make up its contents are listed inside the curly braces, separated by semicolons. The first relation is named population and contains the name and population of each state. The second relation, registration, is defined similarly and holds the EV registration count for each state.
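It may help to see this idea outside Rel. A Rel relation behaves like a set of tuples. The following Python sketch is only an analogy, not how Rel stores data. It models population and registration as Python sets: listing the same tuple twice adds nothing, and the order of the tuples carries no meaning.

```python
# A Rel relation is a set of tuples; a Python set of tuples is a close analogy.
population = {
    ("California", 39512223),
    ("Delaware", 973764),
    ("Missouri", 6137428),
}

registration = {
    ("California", 425300),
    ("Delaware", 1950),
    ("Missouri", 6740),
}

# Stating a fact a second time does not create a second copy: sets ignore duplicates.
population_with_repeat = population | {("Delaware", 973764)}

print(len(population), len(population_with_repeat))  # 3 3
print(("Delaware", 973764) in population)  # True
```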

Now you can use Rel to derive new facts from this information. This is how you define EV penetration in Rel:

// model
 
def penetration(state, value) {
    exists(r, p :
        registration(state, r)
        and population(state, p)
        and value = 1000 * r / p
    )
}

You have used the power of Rel for the first time.

If you look at this code closely, you'll notice that each tuple in the relation penetration has two elements:

state: the state name in the first element, and
value: the penetration value in the second element.

The line that begins with exists introduces two local variables: the registration count r and the population p. It requires that values for both exist for the given state. If either piece of information is missing, no penetration value is derived for that state.

The penetration relation is generated by joining the registration and population relations with the and operator. Notice that the variable state appears in both relations, population and registration. That is, state joins, or connects, these relations. You will learn more about joins later.

For each state, Rel looks up the registration count r and the population p. The last condition, value = 1000 * r / p, then calculates the penetration value.
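To make the join concrete, here is a Python sketch that mirrors the penetration definition. For every pair of tuples that agree on the state name, it computes the value. A state missing from either input produces no result. This is an analogy for how the rule reads, not a description of how Rel evaluates it. Removing Missouri from population at the end simply demonstrates the "missing information" case.

```python
population = {
    ("California", 39512223),
    ("Delaware", 973764),
    ("Missouri", 6137428),
}
registration = {
    ("California", 425300),
    ("Delaware", 1950),
    ("Missouri", 6740),
}

def penetration(registration, population):
    """Join on the shared state name and derive (state, value) tuples."""
    return {
        (state, 1000 * r / p)
        for (state, r) in registration
        for (state_p, p) in population
        if state == state_p  # join condition: both tuples name the same state
    }

result = penetration(registration, population)
for state, value in sorted(result):
    print(f"{state}: {value:.2f}")
# California: 10.76
# Delaware: 2.00
# Missouri: 1.10

# Without a population fact for Missouri, no penetration tuple is derived for it.
partial = penetration(registration, {t for t in population if t[0] != "Missouri"})
print(sorted(state for state, _ in partial))  # ['California', 'Delaware']
```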

It’s time to look at the results. To do so, assign the desired output, penetration, to the relation output.

// read query
 
def output = penetration


When you are ready for Rel to calculate your results, you define the relation output in a read query. By convention, output is the relation that the system displays. This is how you run queries against the model you installed.

From this tabular information, it’s clear that California is not only the most populated state, but it is also the state with the highest EV penetration. So choosing California as the state to start building charging stations is a safe choice.

This was a simple example to show what you can do with Rel. Still, don't draw conclusions too hastily before analyzing all 50 US states.

You can only build five charging stations, and you want to maximize the number of EV drivers they reach. California is a very large state, with over 39 million people, so five charging stations probably can't reach all of its EV drivers. There may be a better arrangement if you also consider other states.

Importing Data

You can now pull in facts about all 50 states by importing data from a CSV file. To do so, use another set of Rel commands:

// write query
 
module config_50_states
    def path = "azure://raidocs.blob.core.windows.net/datasets/states/state_statistics.csv"
    def schema {
        (:state, "string");
        (:area, "int");
        (:population, "int");
        (:ev_registration_count, "int")
    }
end
 
def insert[:data] = lined_csv[load_csv[config_50_states]]

The first nine lines of code define the module config_50_states, which contains the import configuration. It specifies the file location (path) and its schema, ensuring that the CSV data are imported with the correct types. For more details, see CSV Import.

The load_csv relation loads the data. The lined_csv wrapper is a convenience helper that lets you refer to each entry in the CSV file by its line number. Finally, the def insert[:data] statement tells the system to store the imported data permanently in the base relation data. See Working With Base Relations for more details.

Now you can confirm that the data were imported successfully:

// read query
 
def output = ::std::display::table[data]

The ::std::display::table relation displays data as a table.

When it reads the specified CSV file, the Rel CSV import automatically does what you did by hand above. It takes each row of data and chops it up into single pieces (state, area, population, and EV registration count). It then tags each piece with an identifier for the row it came from. Every column is stored as a separate subrelation of data, and the row identifier connects the attributes that belong to the same row.
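The following Python sketch imitates that column-per-subrelation layout. It uses a small CSV string built from the three-state table above. The area column is left out because its values are not given in this tutorial. The hypothetical helper load_columns converts each field using a schema, roughly as config_50_states does, and stores every column as a set of (row, value) pairs. Numbering data rows from 1 is a simplification and may not match the identifiers that lined_csv assigns.

```python
import csv
import io

# Three rows from the table above; the area column is omitted here.
CSV_TEXT = """state,population,ev_registration_count
California,39512223,425300
Delaware,973764,1950
Missouri,6137428,6740
"""

# Column name -> Python type, playing the role of the schema in config_50_states.
SCHEMA = {"state": str, "population": int, "ev_registration_count": int}

def load_columns(text, schema):
    """Return {column: {(row_id, value), ...}}, one 'subrelation' per column."""
    data = {column: set() for column in schema}
    reader = csv.DictReader(io.StringIO(text))
    for row_id, record in enumerate(reader, start=1):
        for column, convert in schema.items():
            data[column].add((row_id, convert(record[column])))
    return data

data = load_columns(CSV_TEXT, SCHEMA)

# The shared row id reconnects the fields that came from the same CSV line.
row_of = {name: row for (row, name) in data["state"]}
evs_by_row = dict(data["ev_registration_count"])
print(evs_by_row[row_of["Delaware"]])  # 1950
```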

For instance, to query all the states that begin with the letter “A”, you do the following:

// read query
 
def output(name) = data:state(Any, name) and like_match("A\%", name)

The first condition, data:state(Any, name), says that you want all the US state names and don't care which row they are in. The second condition, like_match, requires the state name to start with “A”.
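The pattern "A%" reads like a SQL LIKE pattern, where % matches any run of characters. The following Python sketch assumes SQL LIKE semantics (% for any run of characters, _ for exactly one character) and reimplements matching with regular expressions. It is an illustration of the pattern language, not Rel's implementation, and the list of names is just sample input.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern: % = any run of characters, _ = one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)

def like_match(pattern, text):
    # fullmatch: the whole string must fit the pattern, not just a prefix of it.
    return like_to_regex(pattern).fullmatch(text) is not None

states = ["Alabama", "Alaska", "California", "Delaware", "Missouri"]
print([s for s in states if like_match("A%", s)])  # ['Alabama', 'Alaska']
```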

Discovering New Insights

Now that you understand how to write facts in Rel and import data from a source, the next step is to use Rel to run queries that help determine where to build the charging stations.

You now need to calculate the EV penetration value again for all 50 states:

// model
 
def penetration_all_states(value, name) {
    exists(row, p, r:
        data:state(row, name)
        and data:ev_registration_count(row, r)
        and data:population(row, p)
        and value = 1000 * r / p
    )
}
 
def highest_penetration_states = bottom[10, penetration_all_states]

This expression is very similar to the one you wrote above. It creates a relation penetration_all_states that holds the penetration in the value variable and the state name in the name variable. Note that value now comes first. Rel sorts tuples element by element, starting with the first position, so putting value first means the relation is sorted by penetration.

Because you imported CSV data, you need to specify the row ID and the column names you want to use. The three data:... conditions join the state name, registration count, and population columns on their shared row. Remember, you don't have to understand this in depth right now; you will learn more in later tutorials and guides.

The last condition calculates the penetration value again. The enclosing exists declares row, p, and r as local variables whose values must exist for a state to appear in the result.

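The definition of highest_penetration_states uses the Rel Standard Library's bottom utility. It sorts a relation in increasing order and keeps the last N tuples, which are the ones with the largest values. Each tuple in the result gains an extra element in the first position. This is why later queries read the state name from the third position.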
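The following Python sketch reproduces the behavior described above: sort in increasing order, keep the last N tuples, and prepend a position. The helper bottom_n is hypothetical. Numbering the kept tuples from 1 in increasing order is an assumption, and Rel's own numbering may differ. The input uses the (value, name) order of penetration_all_states with the three-state values computed earlier.

```python
def bottom_n(n, relation):
    """Sort tuples in increasing order and keep the last n (the largest).

    Each kept tuple gets a position index prepended; numbering from 1 in
    increasing order is an assumption, not necessarily Rel's convention.
    """
    kept = sorted(relation)[-n:] if n > 0 else []  # guard: [-0:] would keep everything
    return [(i, *t) for i, t in enumerate(kept, start=1)]

# (value, name) order, as in penetration_all_states, so sorting is by value first.
penetration = {
    (1000 * 425300 / 39512223, "California"),
    (1000 * 1950 / 973764, "Delaware"),
    (1000 * 6740 / 6137428, "Missouri"),
}

for position, value, name in bottom_n(2, penetration):
    print(position, round(value, 2), name)
# 1 2.0 Delaware
# 2 10.76 California
```

Because Python compares tuples element by element, placing the value first makes the sort order follow the value, which mirrors why penetration_all_states puts value before name.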

From that information, you can list the states with the highest rates of penetration:

// read query
 
def output = highest_penetration_states

You can see that the top state is still California, followed by Hawaii and Washington. This is the first step in your analysis.

You want the five charging stations to reach the maximum possible number of customers. To estimate this, consider EV density, the number of EV registrations per unit of land area:

// model
 
def density_all_states(value, name) {
    exists(row, a, r:
        data:state(row, name)
        and data:ev_registration_count(row, r)
        and data:area(row, a)
        and value = r / a
    )
}
 
def highest_density_state = bottom[10, density_all_states]

This code follows the same patterns as you have seen above.

Next, you can list the densest states:

// read query
 
def output = highest_density_state

Optimizing Your Investment

To optimize for both EV penetration and EV density, you can cross-check the states to find out which appear in both top-10 lists. To do this, you create a join. A join combines two or more relations through their shared variables. Here, you assign the result to a new relation.

To find the best states, you can join your two top-10 relations, keeping only the state names that appear in both.

// model
 
def best_states(state) {
    highest_density_state(_, _, state)
    and highest_penetration_states(_, _, state)
}

This definition uses _, the anonymous variable. It simply indicates that you are not interested in the values in the first and second positions of the relations highest_density_state and highest_penetration_states. These are the position added by bottom and the density or penetration value. You are only interested in the third position of each relation, which holds the name of the US state. This lets you define a new relation, best_states, which holds only the state names.
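In Python terms, each _ is a position you ignore. The shared variable state turns the join into an intersection on the third element. The sketch below uses toy tuples shaped like the three-element output of bottom. The names and values are placeholders, not results from the dataset.

```python
def best_states(highest_density, highest_penetration):
    """Keep the state names (third position) present in both relations."""
    density_names = {state for (_, _, state) in highest_density}
    penetration_names = {state for (_, _, state) in highest_penetration}
    return density_names & penetration_names

# Toy inputs: placeholder names and values, not real state statistics.
toy_density = {(1, 5.0, "State A"), (2, 7.5, "State B"), (3, 9.0, "State C")}
toy_penetration = {(1, 3.0, "State B"), (2, 4.0, "State C"), (3, 6.0, "State D")}

print(sorted(best_states(toy_density, toy_penetration)))  # ['State B', 'State C']
```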

Now you can list them:

// read query
 
def output = best_states

The best states for the five charging stations are California, Hawaii, and New Jersey. These states appear in both top-10 lists: they combine high EV density with high EV penetration.

As a final step, you can calculate how many customers driving EVs you can actually reach in each of these three states. Here, you assume that each charging station has an impact radius of 10 miles, meaning EV owners living within 10 miles of a charging station are considered potential customers.

Think of this as the maximum distance-to-owner for the charging stations.

// read query
 
def impact_area = 314.16  //  impact area = π * r^2, where r = 10 miles
 
def output(state, user_count) {
    exists(d:
        best_states(state)
        and density_all_states(d, state)
        and user_count = round[:ROUND_NEAREST, 5 * impact_area * d]
    )
 
}

First, define the value for the impact area. In the following lines, you join the best_states and density_all_states relations. Notice that the variable state is common to both relations for the join.

You have also defined a new variable, user_count, which is the expected number of users for the charging stations. It is returned in the second position of output. The calculation multiplies the EV density d by the impact area of one station and by 5, the number of stations. This assumes that all five stations are built in that state and that their impact areas do not overlap. With the relation round and the element :ROUND_NEAREST, you round the result to the nearest whole number.
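You can check the arithmetic behind user_count outside Rel. This Python sketch makes the same assumptions as the query: five stations in one state, each with a non-overlapping 10-mile circle. It also assumes density is measured in EVs per square mile. The density of 1.0 used below is a placeholder, not a value from the dataset. Python's built-in round sends exact .5 ties to the even neighbor, which this sketch does not reconcile with :ROUND_NEAREST.

```python
import math

STATIONS = 5
RADIUS_MILES = 10

def expected_users(density_per_sq_mile, stations=STATIONS, radius=RADIUS_MILES):
    """EV owners reachable by non-overlapping circular impact areas."""
    impact_area = math.pi * radius ** 2  # about 314.16 square miles
    return round(stations * impact_area * density_per_sq_mile)

print(expected_users(1.0))  # 1571
```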

The output relation returns each state name along with its expected user_count.

It’s clear that of the three best states, you reach the most customers in New Jersey. You were able to come to this conclusion by taking the facts you collected about individual states and using Rel to ask questions directly related to your business operations, generating new insights from your data. Now your car charging company can start helping the world use green energy in the most efficient way possible!

Summary

This tutorial has shown how to organize data using Rel, how to import from a data source, and how to run queries against the data. You have learned how to derive new insights by building a simple business model to help make the best investment decision for your example business.
