# Define base facts

This guide shows how to define base facts from source data with `Model.define`. You will learn how to load Python data, DataFrames, CSV files, or Snowflake tables into the concepts and relationships you've declared in your model.
## Prerequisites

- PyRel is installed and importable in Python. See Set Up Your Environment for instructions.
- You have a model instance with a working configuration. See Create a Model Instance.
- You have declared concepts, relationships, and properties for the facts you want to load. See Declare Concepts and Declare Relationships and Properties.
- You have declared any data sources you want to load from. See Declare Data Sources.
## Understand the difference between base and derived facts

A fact is a statement about the world that your model captures, for example "Alice is 30 years old" or "Bob works at Acme". Facts are the building blocks of your model and the raw material for your queries.
There are two types of facts:
- Base fact: A fact loaded directly from a source record, like a row in a CSV file or a Snowflake table.
- Derived fact: A fact computed from other facts, for example filtering or grouping to produce new concept membership, relationships, or property values.
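As a rough analogy in plain Python (illustrative only, not PyRel code), base facts are the records you load, and derived facts are computed from them:

```python
# Base facts: loaded directly from source records.
ages = {"Alice": 30, "Bob": 17}

# Derived fact: computed from the base facts above -- here, membership
# in a derived "Adult" concept produced by filtering.
adults = {name for name, age in ages.items() if age >= 18}
print(adults)  # {'Alice'}
```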
Next, you’ll see patterns for defining base facts from different source formats.
## Define entities and their properties with Concept.new

Create entity expressions for your base facts with `Concept.new`.
Set property values by passing keyword arguments, then pass the expressions to `Model.define` to add them as base facts in the model.
At a high level, `Concept.new`:

- Enforces identity: It uses your concept's declared identity (`identify_by`) to compute entity keys. If the concept has no explicit identity, identity is inferred from the keyword values you provide. If identity is declared, missing identity values raise an error when you define the expression.
- Uses find-or-create behavior: If you create multiple `Concept.new` expressions with the same identity values, they refer to the same entity instead of creating duplicates.
- Returns an expression: Produces a `New` expression you can pass to `Model.define`. You can also use it inside other facts, for example as an argument to a relationship.
You can use Concept.new with Python literals or with table-like sources, such as Snowflake tables or DataFrames.
### Use Python literals

Pass literal values directly into `Concept.new` calls to create entity expressions for small datasets or quick prototypes:
```python
from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare some concepts
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"name": String})

# Declare a property
Person.name = m.Property(f"{Person} is named {String:name}")

# Define some base entities from literal values
m.define(
    Person.new(id=1, name="Alice"),
    Person.new(id=2, name="Bob"),
    Company.new(name="Acme"),
)

# Verify Person entities were created as expected
person_df = m.select(Person.id, Person.name).to_df()
print(person_df)

# Verify Company entities were created as expected
company_df = m.select(Company.name).to_df()
print(company_df)
```

In this example:

- `identify_by={"id": Integer}` makes `id` the stable entity key for `Person`.
- Each `Person.new(...)` and `Company.new(...)` call creates an entity expression for one source record.
- `m.define(...)` adds those entity expressions as base facts in the model.

Notes:

- `Concept.new` uses find-or-create semantics. If you repeat the same identity key, it maps to the same entity instead of creating a second one.
- `Concept.new` treats keyword arguments like property access. For example, `Person.new(name="Alice")` corresponds to `Person.name`. If a keyword refers to an undeclared property and `model.implicit_properties` is disabled, PyRel raises an error to help you catch typos and mismatches. `model.implicit_properties` is enabled by default.
### Use rows from tabular data

Choose this when your source data is tabular. You can stage rows in memory with `Model.data` or reference Snowflake rows with `Model.Table`.

Use the table below to choose a mapping option:
| What to use | When to use it |
|---|---|
| Map columns explicitly | You want precise control, column names do not match your `Concept.new` fields, or your table is wide and you only need one or two columns. |
| Use `.to_schema` to map all columns | Column names already match the fields you want to pass into `Concept.new`. |
| Use `.to_schema` with excluded columns | Your source has extra columns you don't need, column names don't match your `Concept.new` fields, or you need to map foreign key columns to entity instances instead. |
#### Option 1: Map columns explicitly

Load tabular data and reference columns directly in `Concept.new` calls:
**From a Snowflake table:**

```python
from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Reference an existing Snowflake table by path
t = m.Table(
    "DB.SCHEMA.PEOPLE",
    schema={
        "id": Integer,
        "name": String,
    },
)

# Define entities from source rows
m.define(Person.new(id=t.id, name=t.name))

# Verify entities were created as expected
p = Person.ref("p")
df_people = m.select(p.id, p.name).to_df()
print(df_people)  # Expect (1, "Alice") and (2, "Bob") in any order.
```

In this example:

- `m.Table(...)` creates a table-like reference to Snowflake rows.
- `t.id` and `t.name` are column references you can pass into `Concept.new`.
- `m.define(Person.new(...))` turns each input row into a `Person` entity.
- The final `select` query verifies which entities were created.
**From a pandas DataFrame:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Create a small in-memory dataset
df = pd.DataFrame([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])

# Stage the DataFrame as a temporary table-like source
t = m.data(df)

# Define entities from source rows
m.define(Person.new(id=t.id, name=t.name))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `m.data(df)` stages a temporary table-like source backed by the DataFrame.
- `t.id` and `t.name` reference the DataFrame columns.
- `m.define(Person.new(...))` turns each DataFrame row into a `Person` entity.
- The final `select` query verifies the loaded rows.
**From a CSV file:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Load CSV rows into a DataFrame (then stage it with Model.data)
df = pd.read_csv("people.csv", encoding="utf-8")
t = m.data(df)

# Define entities from source rows
m.define(Person.new(id=t.id, name=t.name))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `pd.read_csv(...)` loads rows into a DataFrame.
- `m.data(df)` stages that DataFrame so you can reference its columns.
- `m.define(Person.new(...))` defines one `Person` entity per CSV row.
- The final `select` query verifies the loaded entities.
Notes:

- `Concept.new` uses find-or-create semantics. If you repeat the same identity key, it maps to the same entity instead of creating a second one.
- `Concept.new` treats keyword arguments like property access. For example, `Person.new(name="Alice")` corresponds to `Person.name`. If a keyword refers to an undeclared property and `model.implicit_properties` is disabled, PyRel raises an error to help you catch typos and mismatches. `model.implicit_properties` is enabled by default.
- If your in-memory input is a list of tuples instead of a DataFrame, unnamed integer columns are exposed as `col0`, `col1`, and so on.
- When working with Snowflake tables, prefer `t[0]` when you want to avoid column name and case ambiguity entirely. Prefer `t["COL_NAME"]` when a column name isn't a valid Python identifier. Dot access (`t.col_name`) is a convenience for columns that are valid Python identifiers.
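For context on the generated `col0`, `col1` names: pandas itself labels such columns with bare integers, which are not valid Python identifiers, so a staged list-of-tuples source needs generated names. A quick check in plain pandas (not PyRel):

```python
import pandas as pd

# A list of tuples carries no column names, so pandas assigns the integer
# labels 0, 1, ... -- not valid Python identifiers.
df = pd.DataFrame([(1, "Alice"), (2, "Bob")])
print(list(df.columns))  # [0, 1]

# Generated names like col0, col1 make the columns addressable by attribute
# access (df.col0) as well as by subscript (df["col0"]).
df.columns = [f"col{i}" for i in df.columns]
print(list(df.columns))  # ['col0', 'col1']
```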
#### Option 2: Use .to_schema to map all columns

Create a `TableSchema` object by calling `Table.to_schema` and pass it to `Concept.new`:
**From a Snowflake table:**

```python
from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Reference an existing Snowflake table by path
t = m.Table(
    "DB.SCHEMA.PEOPLE",
    schema={
        "id": Integer,
        "name": String,
    },
)

# Map all columns into Concept.new by name
m.define(Person.new(t.to_schema()))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `t.to_schema()` builds a schema object containing all declared columns.
- `Person.new(t.to_schema())` maps columns into `Concept.new` by name.
- This option stays short when your column names already match the fields you want.
- The final `select` query verifies the loaded entities.
**From a pandas DataFrame:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Create a small in-memory dataset
df = pd.DataFrame([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])

# Stage the DataFrame as a temporary table-like source
t = m.data(df)

# Map all columns into Concept.new by name
m.define(Person.new(t.to_schema()))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `t.to_schema()` uses the DataFrame column names.
- `Person.new(t.to_schema())` maps those columns into `Concept.new` by name.
- This option is shortest when your DataFrame columns already match your model fields.
- The final `select` query verifies the loaded entities.
**From a CSV file:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Load CSV rows into a DataFrame (then stage it with Model.data)
df = pd.read_csv("people.csv", encoding="utf-8")
t = m.data(df)

# Map all columns into Concept.new by name
m.define(Person.new(t.to_schema()))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `t.to_schema()` uses the CSV column names (via the DataFrame).
- `Person.new(t.to_schema())` maps those columns into `Concept.new` by name.
- This option is the shortest when your CSV headers match your model fields.
- The final `select` query verifies the loaded entities.
Notes:

- `Concept.new` uses find-or-create semantics. If you repeat the same identity key, it maps to the same entity instead of creating a second one.
- `Concept.new` treats keyword arguments like property access. For example, `Person.new(name="Alice")` corresponds to `Person.name`. If a keyword refers to an undeclared property and `model.implicit_properties` is disabled, PyRel raises an error to help you catch typos and mismatches. `model.implicit_properties` is enabled by default.
- `to_schema()` maps by column name. If your column names don't match the `Concept.new` arguments you want, map the columns to arguments explicitly, or exclude the mismatched columns with `exclude=[...]` and map them by hand.
#### Option 3: Use .to_schema with excluded columns

Use the `exclude` argument of `Table.to_schema` to exclude extra columns from mapping into `Concept.new`:
**From Snowflake tables:**

```python
from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare concepts, properties, and a relationship
Company = m.Concept("Company", identify_by={"id": Integer})
Company.name = m.Property(f"{Company} is named {String:name}")

Employee = m.Concept("Employee", identify_by={"id": Integer})
Employee.name = m.Property(f"{Employee} is named {String:name}")
Employee.company = m.Relationship(f"{Employee} works at {Company:company}")

# Reference existing Snowflake tables by path
companies_table = m.Table(
    "DB.SCHEMA.COMPANIES",
    schema={
        "id": Integer,
        "name": String,
    },
)
employees_table = m.Table(
    "DB.SCHEMA.EMPLOYEES",
    schema={
        "id": Integer,
        "name": String,
        "company_id": Integer,  # Foreign key column
        "start_date": String,   # Extra column
    },
)

# Define Company entities from the company table
m.define(Company.new(companies_table.to_schema()))

# Define Employee entities from the employee table.
# Exclude extra columns and map the foreign key column to a Company entity.
m.define(
    Employee.new(
        employees_table.to_schema(exclude=["start_date", "company_id"]),
        company=Company.filter_by(id=employees_table.company_id),
    )
)

# Verify entities were created as expected
df = m.select(Employee.id, Employee.name, Employee.company.name).to_df()
print(df)
```

In this example:

- `companies_table.to_schema()` maps `companies_table` columns into `Company.new` by name.
- `employees_table.to_schema(exclude=[...])` excludes `start_date` and `company_id` from mapping into `Employee.new`.
- `Company.filter_by(id=employees_table.company_id)` converts the foreign key column into a `Company` reference.
**From pandas DataFrames:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare concepts, properties, and a relationship
Company = m.Concept("Company", identify_by={"id": Integer})
Company.name = m.Property(f"{Company} is named {String:name}")

Employee = m.Concept("Employee", identify_by={"id": Integer})
Employee.name = m.Property(f"{Employee} is named {String:name}")
Employee.company = m.Relationship(f"{Employee} works at {Company:company}")

# Create small in-memory datasets for companies and employees
df_companies = pd.DataFrame([
    {"id": 10, "name": "Acme"},
    {"id": 20, "name": "Contoso"},
])

# `company_id` is a foreign key column (map to a Company entity reference).
# `start_date` is an extra column (exclude before mapping into Employee.new).
df_employees = pd.DataFrame([
    {"id": 1, "name": "Alice", "company_id": 10, "start_date": "2026-01-01"},
    {"id": 2, "name": "Bob", "company_id": 20, "start_date": "2026-02-01"},
])

# Stage the DataFrames as temporary table-like sources
companies = m.data(df_companies)
employees = m.data(df_employees)

# Define Company entities from the company table
m.define(Company.new(companies.to_schema()))

# Define Employee entities from the employee table.
# Exclude extra columns and map the foreign key column to a Company entity.
m.define(
    Employee.new(
        employees.to_schema(exclude=["start_date", "company_id"]),
        company=Company.filter_by(id=employees.company_id),
    )
)

# Verify entities were created as expected
df = m.select(Employee.id, Employee.name, Employee.company.name).to_df()
print(df)
```

In this example:

- `companies.to_schema()` maps DataFrame columns into `Company.new` by name.
- `employees.to_schema(exclude=[...])` excludes `start_date` and `company_id` from mapping into `Employee.new`.
- `Company.filter_by(id=employees.company_id)` maps each employee row's foreign key to a `Company`.
**From CSV files:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare concepts, properties, and a relationship
Company = m.Concept("Company", identify_by={"id": Integer})
Company.name = m.Property(f"{Company} is named {String:name}")

Employee = m.Concept("Employee", identify_by={"id": Integer})
Employee.name = m.Property(f"{Employee} is named {String:name}")
Employee.company = m.Relationship(f"{Employee} works at {Company:company}")

# Load CSV rows into DataFrames (then stage them with Model.data)
companies_csv = pd.read_csv("companies.csv", encoding="utf-8")

# `company_id` is a foreign key column (map to a Company entity reference).
# `start_date` is an extra column (exclude before mapping into Employee.new).
employees_csv = pd.read_csv("employees.csv", encoding="utf-8")

companies = m.data(companies_csv)
employees = m.data(employees_csv)

# Define Company entities from the company table
m.define(Company.new(companies.to_schema()))

# Define Employee entities from the employee table.
# Exclude extra columns and map the foreign key column to a Company entity.
m.define(
    Employee.new(
        employees.to_schema(exclude=["start_date", "company_id"]),
        company=Company.filter_by(id=employees.company_id),
    )
)

# Verify entities were created as expected
df = m.select(Employee.id, Employee.name, Employee.company.name).to_df()
print(df)
```

In this example:

- `companies.to_schema()` maps CSV columns into `Company.new` by name.
- `employees.to_schema(exclude=[...])` excludes `start_date` and `company_id` from mapping into `Employee.new`.
- `Company.filter_by(id=employees.company_id)` maps the foreign key column to a `Company` entity.
Notes:

- `Concept.new` uses find-or-create semantics. If you repeat the same identity key, it maps to the same entity instead of creating a second one.
- `Concept.new` treats keyword arguments like property access. For example, `Person.new(name="Alice")` corresponds to `Person.name`. If a keyword refers to an undeclared property and `model.implicit_properties` is disabled, PyRel raises an error to help you catch typos and mismatches. `model.implicit_properties` is enabled by default.
- When you use `exclude=[...]` with `to_schema`, excluded column names are matched case-insensitively.
- `Concept.filter_by` is a general property filter and may match multiple entities. If you want to refer to a single entity by identity, prefer `Concept.to_identity`.
## Define relationship facts

Facts about relationships are defined by calling a `Relationship` with the appropriate arguments to produce a fact expression, then passing that expression to `Model.define`. You can define relationship facts from Python literals or from table-like sources, as described in the sections below.
### Define individual relationship facts explicitly

Call a `Relationship` with the appropriate arguments to produce a relationship fact expression, then pass that expression to `Model.define`:
```python
from relationalai.semantics import Model, String

m = Model("MyModel")

# Declare Person and Company concepts
Person = m.Concept("Person", identify_by={"name": String})
Company = m.Concept("Company", identify_by={"name": String})

# Declare a relationship
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Define base entities.
m.define(
    alice := Person.new(name="Alice"),
    bob := Person.new(name="Bob"),
    acme := Company.new(name="Acme"),
    contoso := Company.new(name="Contoso"),
)

# Define relationship facts
m.define(
    alice.employers(acme),
    bob.employers(acme),
    bob.employers(contoso),
)

# Verify relationship facts were created as expected
df = m.select(Person.name, Person.employers.name).to_df()
print(df)
```

In this example:

- `alice := Person.new(name="Alice")` uses the walrus operator (`:=`) to create a `New` expression for a `Person` entity and assigns it to `alice`.
- `alice.employers(acme)` builds an `Expression` representing the fact "Alice works at Acme". This works because the resulting `New` expression supports dot access to the `Person` concept's relationships.
- `m.define(...)` persists both the entity expressions and the relationship fact expressions as base facts in the model.
Notes:

- Calling a `Relationship` produces an `Expression` object that must be passed to `Model.define` to take effect.
- Relationship calls are positional. If you swap argument order or pass the wrong type, you can get wrong-looking results without an obvious error.
### Define multiple facts simultaneously from tabular data

Choose this when relationship facts live in a join table (one row per "edge") and you need to translate foreign key pairs into a binary relationship.

Match entities from the join table's foreign key columns with `Concept.filter_by`, then pass those references into relationship calls that you define with `Model.define`:
**From Snowflake tables:**

```python
from relationalai.semantics import Integer, Model

m = Model("MyModel")

# Declare a binary relationship.
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"id": Integer})
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Person and Company tables in Snowflake:
people = m.Table("DB.SCHEMA.PEOPLE")
companies = m.Table("DB.SCHEMA.COMPANIES")

# Define Person and Company entities from the tables.
m.define(
    Person.new(people.to_schema()),
    Company.new(companies.to_schema()),
)

# A join table where each row is an employment fact (a Person-Company pair).
employment = m.Table(
    "DB.SCHEMA.EMPLOYMENT",
    schema={
        "person_id": Integer,
        "company_id": Integer,
    },
)

# Match entities by the join table's foreign key columns.
person = Person.filter_by(id=employment.person_id)
company = Company.filter_by(id=employment.company_id)

# Define one relationship fact per input row.
m.define(person.employers(company))

# Verify relationship facts were created as expected.
df = m.select(Person.name, Person.employers.name).to_df()
print(df)
```

**From pandas DataFrames:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model

m = Model("MyModel")

# Declare a binary relationship.
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"id": Integer})
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Person and Company tables in memory:
people = m.data(pd.DataFrame([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]))
companies = m.data(pd.DataFrame([
    {"id": 10, "name": "Acme"},
    {"id": 20, "name": "Contoso"},
]))

# Define Person and Company entities from the tables.
m.define(
    Person.new(people.to_schema()),
    Company.new(companies.to_schema()),
)

# A join table where each row is an employment fact (a Person-Company pair).
df_employment = pd.DataFrame([
    {"person_id": 1, "company_id": 10},
    {"person_id": 2, "company_id": 10},
    {"person_id": 2, "company_id": 20},
])
employment = m.data(df_employment)

# Match entities by the join table's foreign key columns.
person = Person.filter_by(id=employment.person_id)
company = Company.filter_by(id=employment.company_id)

# Define one relationship fact per input row.
m.define(person.employers(company))

# Verify relationship facts were created as expected.
df = m.select(Person.name, Person.employers.name).to_df()
print(df)
```

**From CSV files:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model

m = Model("MyModel")

# Declare a binary relationship.
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"id": Integer})
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Person and Company tables from CSV:
df_people = pd.read_csv("people.csv", encoding="utf-8")
people = m.data(df_people)

df_companies = pd.read_csv("companies.csv", encoding="utf-8")
companies = m.data(df_companies)

# Define Person and Company entities from the tables.
m.define(
    Person.new(people.to_schema()),
    Company.new(companies.to_schema()),
)

# A join table where each row is an employment fact (a Person-Company pair).
df_employment = pd.read_csv("employment.csv", encoding="utf-8")
employment = m.data(df_employment)

# Match entities by the join table's foreign key columns.
person = Person.filter_by(id=employment.person_id)
company = Company.filter_by(id=employment.company_id)

# Define one relationship fact per input row.
m.define(person.employers(company))

# Verify relationship facts were created as expected.
df = m.select(Person.name, Person.employers.name).to_df()
print(df)
```

In these examples:

- `Person.new(...)` and `Company.new(...)` define entities from separate sources.
- `Person.filter_by(...)` and `Company.filter_by(...)` match those entities using the `employment` join table's foreign key columns.
- `m.define(person.employers(company))` defines one relationship fact per join table row.
- This works without a Python loop because `employment.person_id` and `employment.company_id` represent whole columns. The relationship call pairs values row by row, so you get one `Person -> Company` fact per row.

Notes:

- `Concept.filter_by` only matches existing entities. Unlike `Concept.new`, it does not create them. If a join-table row contains a `person_id` (or `company_id`) value that does not match any entity, that row produces no relationship fact.
- `Concept.filter_by` can match more than one entity. In the example above, it is safe because `id` is the identity field and must be unique. If you want to refer to exactly one entity by identity, use `Concept.to_identity`.
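The join-table translation behaves like a relational inner join: rows pair by foreign key, and dangling keys simply drop out. The same idea in plain pandas (illustrative only; the data here is made up, and PyRel performs this matching for you):

```python
import pandas as pd

# Entity tables: one row per Person / Company.
people = pd.DataFrame([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
companies = pd.DataFrame([{"id": 10, "name": "Acme"}, {"id": 20, "name": "Contoso"}])

# Join table: one row per employment edge. person_id=99 matches no Person,
# so -- like Concept.filter_by -- that row yields no fact.
employment = pd.DataFrame([
    {"person_id": 1, "company_id": 10},
    {"person_id": 2, "company_id": 20},
    {"person_id": 99, "company_id": 10},  # dangling foreign key
])

# Inner joins pair rows by key; the dangling row drops out.
facts = (
    employment
    .merge(people, left_on="person_id", right_on="id")
    .merge(companies, left_on="company_id", right_on="id",
           suffixes=("_person", "_company"))
)
print(facts[["name_person", "name_company"]])  # Alice-Acme and Bob-Contoso only
```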