# My First Rel Program

This tutorial is designed to give users their first introduction to RAI's declarative language, Rel

## Goal#

In this tutorial, we will learn how to write simple facts in Rel and use basic queries to explore our data. By the end, you should have a basic understanding of how to express knowledge in Rel, import CSV files, and build simple data models.

## Scenario#

Let’s imagine we are a developer of electric car charging stations. We want to identify where in the United States to build our first five charging stations. It makes business sense to target states where people are most likely to buy an EV (electric vehicle). It also makes sense to try to discover how to reach the most EV drivers with the positioning of our five charging stations. If we can find the answers to these two questions, we will be able to get the best results from our investment in our infrastructure.

## Organizing Facts#

To keep things simple, let’s begin by collecting information about just three states: California, Missouri, and Delaware. We can conceptualize these states as large, medium, and small states respectively. Let’s apply some intuition before we start. We know that California is the most populated state, Delaware is tiny, and Missouri feels somewhere in the middle. We also know that California is the largest EV producer in the US. Will that also mean that it is the state where people are most likely to buy an EV? Let’s find out!

We can create a table to order our facts about these states. Each state has a population attribute (source: wikipedia) and an EV registration count attribute (source: afdc.energy.gov). Knowing these two facts, we can start to derive the number of EV registrations per 1000 people:

$$\text{EV penetration} = 1000\ \frac{\text{EV registrations}}{\text{Population}}.$$

First, we look at the data and write down what we know:

StatePopulationEV Registration
California39512223425300
Delaware9737641950
Missouri61374286740

Every row in our table has three columns:

• State name
• Population
• EV registration

## Using Rel#

Now let’s investigate Rel, which is RAI’s declarative modeling and query language. Rel will allow us to experiment with our ideas about state population and EV registrations. We think of this as building a model and asking questions of the model.

In Rel we express data in terms of stated facts. Using the data we gathered above, we write:

install
def population = {    ("California", 39512223);    ("Delaware", 973764);    ("Missouri", 6137428)}def registration = {    ("California", 425300);    ("Delaware", 1950);    ("Missouri", 6740)}

What have we done here? Each line above defines a relationship, or relation. On the left of the equals sign we have the name of the relation. On the right is its definition. The first relation is named population and contains the name and population of each state. The second relation, registration, is defined similarly and holds the EV registration count for each state.

Now we can use Rel to derive new facts from this information. In our case, this is how we define EV penetration in Rel:

install
def penetration(state, value) =        registration(state, r)    and population(state, p)    and value = 1000 * r / p    from r, p

Here, we have used the power of Rel for the first time. Let’s look at it closely to understand the code here.

We have defined the relation penetration, which specifies two values:

• state: the state name in the first parameter, and
• value: the penetration value in the second parameter.

The penetration relation is generated by joining the registration and population relations using the and clause. Notice that the variable state is common to both relations, population and registration. That is, state joins, or connects, these relations. We will learn more about joins later.

For each state, we look up the registration count r and the population p and in the fourth line we use this formula to calculate the penetration value.

The last line, starting with from, states that a registration count r and population p for each state needs to exist. If either piece of information is missing, we can’t calculate the penetration.

It’s time to look at the results. To do so, we assign the desired output, penetration, to the relation output.

query
def output = penetration

Relation: output

 "California" 10.7638 "Delaware" 2.00254 "Missouri" 1.09818

From this tabular information, we can see that California is not only the most populated state but also the state with the highest EV penetration. So choosing California as the state to start building our charging stations is a safe choice.

This was a simple example to show what we can do with Rel, but we shouldn’t be too hasty with conclusions without analyzing all 50 US states.

We can only build five charging stations and we want to maximize the number of EV drivers we can reach with our stations. California is a very large state, so we probably can’t reach all 39 million people there with our five charging stations; we may be able to discover a better arrangement if we also consider other states.

## Importing Data#

Let’s now pull in facts about all 50 states by importing data from a CSV file. To do so, we use another set of Rel commands:

install
def config[:path] = "s3://relationalai-documentation-public/tutorial/state_statistics.csv"def config[:schema, :state] = "string"def config[:schema, :area] = "int"def config[:schema, :population] = "int"def config[:schema, :ev_registration_count] = "int"def data = lined_csv[load_csv[config]]

In the first five lines of code, we have defined the file path and the CSV file schema. We have also specified our database schema by identifying a type for each column: text, integer, and so on. By doing so, we tell the Rel CSV import which parts of the CSV file we care about. Using the load_csv command, we then load the data and store the imported data in the relation data.

Reading the specified CSV file, the Rel CSV import automatically does what we did by hand above. It takes a row of data, chops it up into single pieces (state, area, population, and EV registrations) and gives an identifier to each piece. Every column is stored as a separate subrelation in data and the created row identifier connects the attributes belonging to the same row with each other.

It’s not important to understand these commands in detail yet; we’re just using them to import our CSV file. Later, we will learn more and begin to recognize common patterns.

Let’s confirm that we loaded the file successfully:

query
def output = table[data]

Relation: output

areaev_registration_countpopulationstate
15064528904903185"Alabama"
3113594287707278717"Arizona"
45203513303017804"Arkansas"
515577942530039512223"California"
7484290403565287"Connecticut"
819491950973764"Delaware"
9612360705749"District of Columbia"
10536255816021477737"Florida"
11575132353010617423"Georgia"
126423106701415872"Hawaii"
138264323001787065"Idaho"
14555192600012671821"Illinois"
153582669906732219"Indiana"
165585722603155070"Iowa"
178175931302913314"Kansas"
183948626504467673"Kentucky"
194320419504648794"Louisiana"
203084319201344212"Maine"
219707179706045680"Maryland"
227800210106892503"Massachusetts"
2356539106209986857"Michigan"
2479627103805639632"Minnesota"
25469237802976149"Mississippi"
266874267406137428"Missouri"
271455469401068778"Montana"
30895326901359711"New Hampshire"
317354304208882190"New Jersey"
3212129826202096829"New Mexico"
33471263259019453561"New York"
34486181619010488084"North Carolina"
3569001220762062"North Dakota"
36408611453011689100"Ohio"
376859534103956971"Oklahoma"
3895988228504217737"Oregon"
39447431753012801989"Pennsylvania"
40103415801059361"Rhode Island"
413006143905148714"South Carolina"
4275811410884659"South Dakota"
434123578106829174"Tennessee"
442612325219028995881"Texas"
4582170112303205958"Utah"
4692172230623989"Vermont"
4739490205108535519"Virginia"
4866456505207614893"Washington"
49240386001792147"West Virginia"
505415863105822434"Wisconsin"
5197093330578759"Wyoming"

Great, our CSV import was successful.

## Discovering New Insights#

Now that we understand how to write about our data and pull it in from our data source, the next step is to use Rel to run queries that will help us determine where in the United States we should build our first five car charging stations.

We now need to calculate the EV penetration value again for all 50 states:

install
def penetration_all_states(value, name) =        data(:state, id, name)    and data(:ev_registration_count, id, r)    and data(:population, id, p)    and value = 1000 * r / p    from id, p, rdef highest_penetration_states = bottom[10, penetration_all_states]

We write an expression that is very similar to what we’ve written above. We create a relation penetration_all_states that holds the penetration in the value variable and the state name in the name variable.

Because we have imported our CSV data, we need to specify the row ID and the column names we want to use. In the first three lines we join the columns with the state name, registration count, and population based on their row id. Remember, we don’t have to understand this in depth right now; we will learn more in later tutorials and guides.

We then calculate the penetration value again. In the last line we again specify that the values of id, p, and r must exist.

The definition of highest_penetration_states uses the Rel Standard Library’s bottom utility, which sorts a relation in increasing order and takes the last N values–the ones with the largest values.

From that information, we can list the states with the highest rates of penetration:

query
def output = highest_penetration_states

Relation: output

 1 10.7638 "California" 2 7.53599 "Hawaii" 3 6.63437 "Washington" 4 5.4176 "Oregon" 5 4.28393 "Colorado" 6 3.95262 "Arizona" 7 3.58423 "Nevada" 8 3.57378 "Vermont" 9 3.50285 "Utah" 10 3.42483 "New Jersey"

We can see that the top state is still California, followed by Hawaii and Washington. This is the first step in our analysis.

We want our five charging stations to reach the maximum possible number of customers. To calculate this, we want to consider the EV density per area:

install
def density_all_states(value, name) =        data(:state, id, name)    and data(:ev_registration_count, id, r)    and data(:area, id, a)    and value = r / a    from id, a, rdef highest_density_state = bottom[10, density_all_states]

We should be able to read this code as it follows the same patterns as we have already seen above.

Let’s list the densest states:

query
def output = highest_density_state

Relation: output

 1 38.6885 "District of Columbia" 2 4.13652 "New Jersey" 3 2.73015 "California" 4 2.69359 "Massachusetts" 5 1.867 "Connecticut" 6 1.85124 "Maryland" 7 1.66122 "Hawaii" 8 1.52805 "Rhode Island" 9 1.08457 "Florida" 10 1.00051 "Delaware"

## Optimizing Our Investment#

To optimize for both EV penetration and EV density, we can cross-check the states to find out which appear in both top-10 lists. To do this, we create a join. A join connects two or more relations and assigns the result to a new relation.

To find the information we’re looking for, we can join our two top-10 relations, keeping only the state names that appear in both.

install
def best_states(state) =        highest_density_state(_, _, state)    and highest_penetration_states(_, _, state)

Here we use the _ expression. This simply indicates that we are not interested in the values in the first and second positions of the relations highest_density_state and highest_penetration_states. We are only interested in the third position of each relation, which holds the name of the US state. This lets us define a new relation, best_states, which holds only the state names.

Let’s list them:

query
def output = best_states

Relation: output

 "California" "Hawaii" "New Jersey"

The best states for our five charging stations are California, Hawaii, and New Jersey. These states have the highest population density and the deepest penetration of EV ownership.

We can go one step further to help our car charging company. As a final step, let’s calculate how many customers driving EVs we can actually reach in each of these three states. To do this, we assume that each charging station has an impact radius of 10 miles, meaning EV owners living within 10 miles of a charging station are considered potential customers.

We can think of this as the maximum distance-to-owner for our charging stations.

query
def impact_area = 314.16  //  impact area = π * r^2, where r = 10 milesdef output(state, user_count) =        best_states(state)    and density_all_states(d, state)    and user_count = round[:ROUND_NEAREST, 5 * impact_area * d]    from d

Relation: output

 "California" 4289 "Hawaii" 2609 "New Jersey" 6498

First, we define the value for the impact area. In the following lines, we join the best_states and density_all_states relations. Notice that the variable state is common to both relations for the join.

We have also defined a new variable, user_count, which is the expected number of users for our charging stations and it will be returned in the second position in output. With the function round and the argument :ROUND_NEAREST we round the user_count to the nearest integer number.

The output relation returns the state name along with the expected user_count, as we can see above.

We can see that of the three best states, we reach the most customers in New Jersey. We were able to come to this conclusion by taking the facts we collected about individual states and using Rel to ask questions directly related to our business operations, generating new insights from our data. Now our car charging company can start helping the world use green energy in the most efficient way possible!

## Summary#

This tutorial has shown how to organize data using Rel, how to import from a data source, and how to run queries against the data. We have learned how to derive new insights by building a simple business model to help us make the best investment decision for our example business.