# My First Rel Program

This tutorial gives users a first introduction to Rel, RAI's declarative language.

## Goal

In this tutorial, you will learn how to write simple facts in Rel and use basic queries to explore your data. After completing this tutorial, you should have a basic understanding of how to express knowledge in Rel, import CSV files, and build simple data models.

## Scenario

Imagine you are a developer of electric car charging stations, and you want to decide where in the United States to build your first five stations. To find a solution, you ask two questions about your data:

1. In which states are people most likely to buy an electric vehicle (EV)?
2. Where can you position your stations in order to reach the most EV drivers?

## Organizing Facts

To keep things simple, you begin by collecting information about just three states: California, Missouri, and Delaware. You choose these states because they present a good sample set: based on population, California is a large state, Missouri a medium state, and Delaware a small state. California is the largest EV producer in the US – will that also mean that it is the state where people are most likely to buy an EV?

From here, you can create a table to organize the relevant facts about these states: population and EV count. Each state has a population attribute (source: Wikipedia) and an EV registration count attribute (source: afdc.energy.gov).

The data looks like this:

| State | Population | EV Registration |
| --- | --- | --- |
| California | 39512223 | 425300 |
| Delaware | 973764 | 1950 |
| Missouri | 6137428 | 6740 |

Every row in the table has three columns:

• State name
• Population
• EV registration

With this information, you can now start to derive the number of EV registrations per 1000 people:

$$\text{EV penetration} = 1000\ \frac{\text{EV registrations}}{\text{Population}}.$$
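As a quick cross-check of the formula, the arithmetic can be sketched in plain Python before moving to Rel. This is only an illustration: the function name `ev_penetration` and the dictionary layout are choices made for this sketch and are not part of Rel.

```python
# Population and EV registration counts copied from the table above.
population = {"California": 39512223, "Delaware": 973764, "Missouri": 6137428}
registration = {"California": 425300, "Delaware": 1950, "Missouri": 6740}


def ev_penetration(registrations: int, people: int) -> float:
    """Return the number of EV registrations per 1000 people."""
    return 1000 * registrations / people


# Apply the formula to every state in the table.
penetration_by_state = {
    state: ev_penetration(registration[state], population[state])
    for state in population
}

for state, value in penetration_by_state.items():
    print(f"{state}: {value:.4f}")
```

Working through the numbers by hand like this makes it easier to sanity-check the results that Rel produces later.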

## Using Rel

Now you’re ready to begin using Rel, RAI’s declarative modeling and query language. Rel will let you experiment with your ideas about state population and EV registrations. Think of this as building a model and asking questions of the model.

In Rel, you express data in terms of stated facts. You can express the data gathered as follows:

    def population = {
        ("California", 39512223);
        ("Delaware", 973764);
        ("Missouri", 6137428)
    }

    def registration = {
        ("California", 425300);
        ("Delaware", 1950);
        ("Missouri", 6740)
    }

Each def block above defines a relation (short for relationship). On the left of the equals sign is the name of the relation; on the right is its definition. The first relation, population, contains the name and population of each state. The second relation, registration, is defined similarly and holds the EV registration count for each state.

Now you can use Rel to derive new facts from this information. Here is how you define EV penetration in Rel:

    def penetration(state, value) =
        registration(state, r)
        and population(state, p)
        and value = 1000 * r / p
        from r, p

You have used the power of Rel for the first time.

If you look at this code closely, you’ll notice that the relation penetration specifies two values:

• state: the state name in the first element, and
• value: the penetration value in the second element.

The penetration relation is generated by joining the registration and population relations using the and clause. Notice that the variable state is common to both relations, population and registration. That is, state joins, or connects, these relations. You will learn more about joins later.

For each state, Rel looks up the registration count r and the population p and in the fourth line calculates the penetration value.

The last line, starting with from, states that a registration count r and a population p must exist for each state. If either piece of information is missing for a state, the penetration cannot be calculated and that state does not appear in the result.
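The effect of the join can be illustrated with a small Python sketch. It is an analogy rather than a description of how Rel evaluates the definition: dictionaries stand in for the two relations, and Missouri's registration count is deliberately left out (a hypothetical gap) to show what happens when one side of the join is missing.

```python
population = {"California": 39512223, "Delaware": 973764, "Missouri": 6137428}

# Hypothetical gap: Missouri has no registration entry in this sketch.
registration = {"California": 425300, "Delaware": 1950}

# Keep a state only when it appears in BOTH mappings, mirroring the join
# on the shared variable `state`.
shared_states = registration.keys() & population.keys()

penetration = {
    state: 1000 * registration[state] / population[state]
    for state in sorted(shared_states)
}

print(penetration)  # Missouri is absent because it has no registration count
```

No error is raised for the missing entry; the state simply produces no result, which matches the description of the from clause above.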

It’s time to look at the results. To do so, assign the desired output, penetration, to the relation output.

    def output = penetration

Relation: output

    "California"  10.7638
    "Delaware"    2.00254
    "Missouri"    1.09818

From this output, it’s clear that California is not only the most populous of the three states but also the one with the highest EV penetration. So California looks like a safe choice for building the first charging stations.

This was a simple example to show what you can do with Rel, but don’t be too hasty with conclusions without analyzing all 50 US states.

You can only build five charging stations and want to maximize the number of EV drivers they reach. California is a very large state, so you probably can’t reach all 39 million people there with five charging stations; there may be a better arrangement if you also consider other states.

## Importing Data

You can now pull in facts about all 50 states by importing data from a CSV file. To do so, use another set of Rel commands:

    module config_50_states
        def path = "s3://relationalai-documentation-public/tutorial/state_statistics.csv"
        def schema = {
            (:state, "string");
            (:area, "int");
            (:population, "int");
            (:ev_registration_count, "int")
        }
    end

    def insert[:data] = lined_csv[load_csv[config_50_states]]

The first nine lines of code define the module config_50_states that contains all the import configurations. It specifies the file location (path) and its schema, ensuring that the CSV data gets imported correctly into the database (for details see the CSV import guide). The load_csv command loads the data. The lined_csv wrapper is a convenience helper that lets you refer to each entry in the CSV file by its line number. The def insert[:data] expression tells the system to store the imported data permanently in the relation data (see the Updating data guide for further details).

Now you can confirm that the data was imported successfully:

    def output = table[data]

Relation: output

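| row | area | ev_registration_count | population | state |
| --- | --- | --- | --- | --- |
| 1 | 50645 | 2890 | 4903185 | "Alabama" |
| 3 | 113594 | 28770 | 7278717 | "Arizona" |
| 4 | 52035 | 1330 | 3017804 | "Arkansas" |
| 5 | 155779 | 425300 | 39512223 | "California" |
| 7 | 4842 | 9040 | 3565287 | "Connecticut" |
| 8 | 1949 | 1950 | 973764 | "Delaware" |
| 9 | 61 | 2360 | 705749 | "District of Columbia" |
| 10 | 53625 | 58160 | 21477737 | "Florida" |
| 11 | 57513 | 23530 | 10617423 | "Georgia" |
| 12 | 6423 | 10670 | 1415872 | "Hawaii" |
| 13 | 82643 | 2300 | 1787065 | "Idaho" |
| 14 | 55519 | 26000 | 12671821 | "Illinois" |
| 15 | 35826 | 6990 | 6732219 | "Indiana" |
| 16 | 55857 | 2260 | 3155070 | "Iowa" |
| 17 | 81759 | 3130 | 2913314 | "Kansas" |
| 18 | 39486 | 2650 | 4467673 | "Kentucky" |
| 19 | 43204 | 1950 | 4648794 | "Louisiana" |
| 20 | 30843 | 1920 | 1344212 | "Maine" |
| 21 | 9707 | 17970 | 6045680 | "Maryland" |
| 22 | 7800 | 21010 | 6892503 | "Massachusetts" |
| 23 | 56539 | 10620 | 9986857 | "Michigan" |
| 24 | 79627 | 10380 | 5639632 | "Minnesota" |
| 25 | 46923 | 780 | 2976149 | "Mississippi" |
| 26 | 68742 | 6740 | 6137428 | "Missouri" |
| 27 | 145546 | 940 | 1068778 | "Montana" |
| 30 | 8953 | 2690 | 1359711 | "New Hampshire" |
| 31 | 7354 | 30420 | 8882190 | "New Jersey" |
| 32 | 121298 | 2620 | 2096829 | "New Mexico" |
| 33 | 47126 | 32590 | 19453561 | "New York" |
| 34 | 48618 | 16190 | 10488084 | "North Carolina" |
| 35 | 69001 | 220 | 762062 | "North Dakota" |
| 36 | 40861 | 14530 | 11689100 | "Ohio" |
| 37 | 68595 | 3410 | 3956971 | "Oklahoma" |
| 38 | 95988 | 22850 | 4217737 | "Oregon" |
| 39 | 44743 | 17530 | 12801989 | "Pennsylvania" |
| 40 | 1034 | 1580 | 1059361 | "Rhode Island" |
| 41 | 30061 | 4390 | 5148714 | "South Carolina" |
| 42 | 75811 | 410 | 884659 | "South Dakota" |
| 43 | 41235 | 7810 | 6829174 | "Tennessee" |
| 44 | 261232 | 52190 | 28995881 | "Texas" |
| 45 | 82170 | 11230 | 3205958 | "Utah" |
| 46 | 9217 | 2230 | 623989 | "Vermont" |
| 47 | 39490 | 20510 | 8535519 | "Virginia" |
| 48 | 66456 | 50520 | 7614893 | "Washington" |
| 49 | 24038 | 600 | 1792147 | "West Virginia" |
| 50 | 54158 | 6310 | 5822434 | "Wisconsin" |
| 51 | 97093 | 330 | 578759 | "Wyoming" |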

With the relation table, data is displayed as a table.

Reading the specified CSV file, the Rel CSV import automatically does what you did by hand above. It takes a row of data, chops it up into single pieces (state, area, population, and EV registrations) and gives an identifier to each piece. Every column is stored as a separate subrelation in data and the created row identifier connects the attributes belonging to the same row with each other.

For instance, to query all the states that begin with the letter “A”, you do the following:

    def output(name) = data:state(Any, name) and like_match("A%", name)

Relation: output

The first condition, data:state(Any, name), asks for all the US state names regardless of which row they are in. The second condition, like_match("A%", name), requires that the state name start with “A”.
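One way to picture the column-wise storage and the prefix filter is the Python sketch below. It is an analogy under stated assumptions: a nested dictionary stands in for the data relation, `str.startswith` stands in for like_match with the pattern "A%", and only five rows copied from the imported table are used, so the result covers this excerpt rather than the full data set.

```python
# Five rows copied from the imported table:
# (row id, state, area, population, EV registrations)
rows = [
    (1, "Alabama", 50645, 4903185, 2890),
    (3, "Arizona", 113594, 7278717, 28770),
    (4, "Arkansas", 52035, 3017804, 1330),
    (5, "California", 155779, 39512223, 425300),
    (7, "Connecticut", 4842, 3565287, 9040),
]

# Store each column separately, keyed by row id, so that the row id is
# what ties together the values belonging to the same row.
columns = ("state", "area", "population", "ev_registration_count")
data = {column: {} for column in columns}
for row_id, *values in rows:
    for column, value in zip(columns, values):
        data[column][row_id] = value

# Reassemble one row from its pieces using the shared row id.
california_row = 5
print({column: data[column][california_row] for column in columns})

# Prefix filter: the row id is ignored, only the name matters.
names_starting_with_a = sorted(
    name for name in data["state"].values() if name.startswith("A")
)
print(names_starting_with_a)
```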

## Discovering New Insights

Now that you understand how to write about data and pull it in from your data source, the next step is to use Rel to run queries that will help determine where to build the charging stations.

You now need to calculate the EV penetration value again for all 50 states:

    def penetration_all_states(value, name) =
        data:state(row, name)
        and data:ev_registration_count(row, r)
        and data:population(row, p)
        and value = 1000 * r / p
        from row, p, r

    def highest_penetration_states = bottom[10, penetration_all_states]

This expression is very similar to the one you wrote above. It creates a relation penetration_all_states that holds the penetration in the value variable and the state name in the name variable.

Because you have imported CSV data, you need to specify the row ID and the column names you want to use. The first three conditions join the state name, registration count, and population columns on their shared row. You don’t have to understand this in depth right now; you will learn more in later tutorials and guides.

The fourth condition calculates the penetration value as before. In the last line you again specify that values for row, p, and r must exist.

The definition of highest_penetration_states uses the Rel Standard Library’s bottom utility, which sorts a relation in increasing order and takes the last N tuples, that is, the ones with the largest values.
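The behavior described for bottom can be approximated with a short Python sketch. The function `bottom_sketch` is a hypothetical stand-in written for this illustration, not the Standard Library implementation; it assumes that rank 1 goes to the largest tuple, as in the output shown in this tutorial, and it reuses the three-state penetration values from earlier.

```python
def bottom_sketch(n, tuples):
    """Return the n largest tuples, each prefixed with a rank (1 = largest)."""
    ordered = sorted(tuples)                      # increasing order
    last_n = ordered[max(len(ordered) - n, 0):]   # the n largest tuples
    return [(rank, *row) for rank, row in enumerate(reversed(last_n), start=1)]


# (value, name) tuples: the value comes first so that sorting is by value.
penetration = [
    (2.00254, "Delaware"),
    (10.7638, "California"),
    (1.09818, "Missouri"),
]

top_two = bottom_sketch(2, penetration)
print(top_two)
```

Putting the value before the name in each tuple is what makes the ordering follow the penetration value rather than the state name.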

From that information, you can list the states with the highest rates of penetration:

    def output = highest_penetration_states

Relation: output

    1   10.7638  "California"
    2   7.53599  "Hawaii"
    3   6.63437  "Washington"
    4   5.4176   "Oregon"
    5   4.28393  "Colorado"
    6   3.95262  "Arizona"
    7   3.58423  "Nevada"
    8   3.57378  "Vermont"
    9   3.50285  "Utah"
    10  3.42483  "New Jersey"

You can see that the top state is still California, followed by Hawaii and Washington. This is the first step in your analysis.

You want the five charging stations to reach the maximum possible number of customers. To estimate this, consider the EV density, that is, the number of EV registrations per unit of area:

    def density_all_states(value, name) =
        data(:state, row, name)
        and data(:ev_registration_count, row, r)
        and data(:area, row, a)
        and value = r / a
        from row, a, r

    def highest_density_state = bottom[10, density_all_states]

This code follows the same pattern as the penetration calculation above. The form data(:state, row, name) is another way of writing data:state(row, name).

Next, you can list the densest states:

    def output = highest_density_state

Relation: output

    1   38.6885  "District of Columbia"
    2   4.13652  "New Jersey"
    3   2.73015  "California"
    4   2.69359  "Massachusetts"
    5   1.867    "Connecticut"
    6   1.85124  "Maryland"
    7   1.66122  "Hawaii"
    8   1.52805  "Rhode Island"
    9   1.08457  "Florida"
    10  1.00051  "Delaware"

To optimize for both EV penetration and EV density, you can cross-check the states to find out which appear in both top-10 lists. To do this, you create a join. A join connects two or more relations and assigns the result to a new relation.

To find the best states, you can join your two top-10 relations, keeping only the state names that appear in both.

    def best_states(state) =
        highest_density_state(_, _, state)
        and highest_penetration_states(_, _, state)

This definition uses the _ expression. _ simply indicates that you are not interested in the values in the first and second positions of the relations highest_density_state and highest_penetration_states. You are only interested in the third position of each relation, which holds the name of the US state. This lets you define a new relation, best_states, which holds only the state names.

Now you can list them:

    def output = best_states

Relation: output

    "California"
    "Hawaii"
    "New Jersey"

The best candidate states for the five charging stations are California, Hawaii, and New Jersey, the only states that appear in both top-10 lists. They combine high EV density with high EV penetration.
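The cross-check can be reproduced outside Rel with a set intersection. The Python sketch below copies the state names from the two top-10 outputs above and ignores the rank and value columns, which is the role the _ placeholders play in best_states; the variable names are chosen for this illustration only.

```python
# State names copied from the two top-10 outputs in this tutorial.
highest_penetration = [
    "California", "Hawaii", "Washington", "Oregon", "Colorado",
    "Arizona", "Nevada", "Vermont", "Utah", "New Jersey",
]
highest_density = [
    "District of Columbia", "New Jersey", "California", "Massachusetts",
    "Connecticut", "Maryland", "Hawaii", "Rhode Island", "Florida", "Delaware",
]

# Keep only the names that appear in both lists.
best_states = sorted(set(highest_penetration) & set(highest_density))
print(best_states)
```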

As a final step, you can calculate how many customers driving EVs you can actually reach in each of these three states. Here, you assume that each charging station has an impact radius of 10 miles, meaning EV owners living within 10 miles of a charging station are considered potential customers.

Think of this as the maximum distance-to-owner for the charging stations.

    def impact_area = 314.16  //  impact area = π * r^2, where r = 10 miles

    def output(state, user_count) =
        best_states(state)
        and density_all_states(d, state)
        and user_count = round[:ROUND_NEAREST, 5 * impact_area * d]
        from d

Relation: output

    "California"  4289
    "Hawaii"      2609
    "New Jersey"  6498

The code first defines the value for the impact area. In the following lines, you join the best_states and density_all_states relations. Notice that the variable state is common to both relations and connects them in the join.

You have also defined a new variable, user_count, which is the expected number of users across the five charging stations; it is returned in the second position of output. The round relation with the :ROUND_NEAREST option rounds user_count to the nearest integer.

The output relation returns the state name along with the expected user_count, as you can see above.
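The user counts can be re-derived by hand from the imported table. The Python sketch below is only a cross-check of the arithmetic: it takes the area and EV registration counts for the three states from the table above, and it uses Python's built-in `round`, which is assumed to agree with :ROUND_NEAREST here because none of the values falls exactly halfway between two integers.

```python
IMPACT_AREA = 314.16  # pi * r^2 for r = 10 miles, as in the Rel code
STATION_COUNT = 5

# (area, EV registrations) copied from the imported table.
state_stats = {
    "California": (155779, 425300),
    "Hawaii": (6423, 10670),
    "New Jersey": (7354, 30420),
}

expected_users = {}
for state, (area, ev_count) in state_stats.items():
    density = ev_count / area  # EV registrations per unit of area
    expected_users[state] = round(STATION_COUNT * IMPACT_AREA * density)

print(expected_users)
```

The sketch treats EV owners as evenly spread across each state, which is the same simplification the Rel query makes by multiplying a statewide density by the covered area.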

It’s clear that of the three best states, you reach the most customers in New Jersey. You were able to come to this conclusion by taking the facts you collected about individual states and using Rel to ask questions directly related to your business operations, generating new insights from your data. Now your car charging company can start helping the world use green energy in the most efficient way possible!

## Summary

This tutorial has shown how to organize data using Rel, how to import from a data source, and how to run queries against the data. You have learned how to derive new insights by building a simple business model to help make the best investment decision for your example business.