# Define base facts

This guide shows how to define base facts from source data with `Model.define`. You will learn how to load Python data, DataFrames, CSV files, or Snowflake tables into the concepts and relationships you've declared in your model.
## Prerequisites

- PyRel is installed and importable in Python. See Set Up Your Environment for instructions.
- You have a model instance with a working configuration. See Create a Model Instance.
- You have declared concepts, relationships, and properties for the facts you want to load. See Declare Concepts and Declare Relationships and Properties.
- You have declared any data sources you want to load from. See Declare Data Sources.
## Understand the difference between base and derived facts

A fact is a statement about the world that your model captures, for example "Alice is 30 years old" or "Bob works at Acme". Facts are the building blocks of your model and the raw material for your queries.
There are two types of facts:
- Base fact: A fact loaded directly from a source record, like a row in a CSV file or a Snowflake table.
- Derived fact: A fact computed from other facts, for example filtering or grouping to produce new concept membership, relationships, or property values.
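As a rough analogy in plain Python (illustrative only, not PyRel code), base facts are the records you load, and derived facts are computed from them:

```python
# Base facts: loaded directly from source records.
ages = {"Alice": 30, "Bob": 17}

# Derived fact: computed from the base facts above -- here, membership
# in a derived "Adult" concept produced by filtering.
adults = {name for name, age in ages.items() if age >= 18}
print(adults)  # {'Alice'}
```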
Next, you’ll see patterns for defining base facts from different source formats.
## Define entities and their properties with Concept.new

Create entity expressions for your base facts with `Concept.new`.
Set property values by passing keyword arguments, then pass the expressions to `Model.define` to add them as base facts in the model.
At a high level, `Concept.new`:

- Enforces identity: It uses your concept's declared identity (`identify_by`) to compute entity keys. If the concept has no explicit identity, identity is inferred from the keyword values you provide. If identity is declared, missing identity values raise an error when you define the expression.
- Uses find-or-create behavior: If you create multiple `Concept.new` expressions with the same identity values, they refer to the same entity instead of creating duplicates.
- Returns an expression: Produces a `New` expression you can pass to `Model.define`. You can also use it inside other facts, for example as an argument to a relationship.
You can use Concept.new with Python literals or with table-like sources, such as Snowflake tables or DataFrames.
### Use Python literals

Pass literal values directly into `Concept.new` calls to create entity expressions for small datasets or quick prototypes:
```python
from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare some concepts
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"name": String})

# Declare a property
Person.name = m.Property(f"{Person} is named {String:name}")

# Define some base entities from literal values
m.define(
    Person.new(id=1, name="Alice"),
    Person.new(id=2, name="Bob"),
    Company.new(name="Acme"),
)

# Verify Person entities were created as expected
person_df = m.select(Person.id, Person.name).to_df()
print(person_df)

# Verify Company entities were created as expected
company_df = m.select(Company.name).to_df()
print(company_df)
```

In this example:

- `identify_by={"id": Integer}` makes `id` the stable entity key for `Person`.
- Each `Person.new(...)` and `Company.new(...)` call creates an entity expression for one source record.
- `m.define(...)` adds those entity expressions as base facts in the model.

Notes:

- `Concept.new` uses find-or-create semantics. If you repeat the same identity key, it maps to the same entity instead of creating a second one.
- `Concept.new` treats keyword arguments like property access. For example, `Person.new(name="Alice")` corresponds to `Person.name`. If a keyword refers to an undeclared property and `model.implicit_properties` is disabled, PyRel raises an error to help you catch typos and mismatches. `model.implicit_properties` is enabled by default.
### Use rows from tabular data

Choose this when your source data is tabular. You can stage rows in memory with `Model.data` or reference Snowflake rows with `Model.Table`.

Use the table below to choose a mapping option:
| What to use | When to use it |
|---|---|
| Map columns explicitly | You want precise control, column names do not match your `Concept.new` fields, or your table is wide and you only need one or two columns. |
| Use `.to_schema` to map all columns | Column names already match the fields you want to pass into `Concept.new`. |
| Use `.to_schema` with excluded columns | Your source has extra columns you don't need, column names don't match your `Concept.new` fields, or you need to map foreign key columns to entity instances instead. |
#### Option 1: Map columns explicitly

Load tabular data and reference columns directly in `Concept.new` calls:
**From a Snowflake table:**

```python
from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Reference an existing Snowflake table by path
t = m.Table(
    "DB.SCHEMA.PEOPLE",
    schema={
        "id": Integer,
        "name": String,
    },
)

# Define entities from source rows
m.define(Person.new(id=t.id, name=t.name))

# Verify entities were created as expected
p = Person.ref("p")
df_people = m.select(p.id, p.name).to_df()
print(df_people)  # Expect (1, "Alice") and (2, "Bob") in any order.
```

In this example:

- `m.Table(...)` creates a table-like reference to Snowflake rows.
- `t.id` and `t.name` are column references you can pass into `Concept.new`.
- `m.define(Person.new(...))` turns each input row into a `Person` entity.
- The final `select` query verifies which entities were created.
**From a pandas DataFrame:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Create a small in-memory dataset
df = pd.DataFrame([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])

# Stage the DataFrame as a temporary table-like source
t = m.data(df)

# Define entities from source rows
m.define(Person.new(id=t.id, name=t.name))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `m.data(df)` stages a temporary table-like source backed by the DataFrame.
- `t.id` and `t.name` reference the DataFrame columns.
- `m.define(Person.new(...))` turns each DataFrame row into a `Person` entity.
- The final `select` query verifies the loaded rows.
**From a CSV file:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Load CSV rows into a DataFrame (then stage it with Model.data)
df = pd.read_csv("people.csv", encoding="utf-8")
t = m.data(df)

# Define entities from source rows
m.define(Person.new(id=t.id, name=t.name))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `pd.read_csv(...)` loads rows into a DataFrame.
- `m.data(df)` stages that DataFrame so you can reference its columns.
- `m.define(Person.new(...))` defines one `Person` entity per CSV row.
- The final `select` query verifies the loaded entities.
Notes:

- `Concept.new` uses find-or-create semantics. If you repeat the same identity key, it maps to the same entity instead of creating a second one.
- `Concept.new` treats keyword arguments like property access. For example, `Person.new(name="Alice")` corresponds to `Person.name`. If a keyword refers to an undeclared property and `model.implicit_properties` is disabled, PyRel raises an error to help you catch typos and mismatches. `model.implicit_properties` is enabled by default.
- If your in-memory input is a list of tuples instead of a DataFrame, unnamed integer columns are exposed as `col0`, `col1`, and so on.
- When working with Snowflake tables, prefer `t[0]` when you want to avoid column name and case ambiguity entirely. Prefer `t["COL_NAME"]` when a column name isn't a valid Python identifier. Dot access (`t.col_name`) is a convenience for columns that are valid Python identifiers.
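For context on the generated `col0`, `col1` names: pandas itself labels such columns with bare integers, which are not valid Python identifiers, so a staged list-of-tuples source needs generated names. A quick check in plain pandas (not PyRel):

```python
import pandas as pd

# A list of tuples carries no column names, so pandas assigns the integer
# labels 0, 1, ... -- not valid Python identifiers.
df = pd.DataFrame([(1, "Alice"), (2, "Bob")])
print(list(df.columns))  # [0, 1]

# Generated names like col0, col1 make the columns addressable by attribute
# access (df.col0) as well as by subscript (df["col0"]).
df.columns = [f"col{i}" for i in df.columns]
print(list(df.columns))  # ['col0', 'col1']
```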
#### Option 2: Use .to_schema to map all columns

Create a `TableSchema` object by calling `Table.to_schema` and pass it to `Concept.new`:
**From a Snowflake table:**

```python
from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Reference an existing Snowflake table by path
t = m.Table(
    "DB.SCHEMA.PEOPLE",
    schema={
        "id": Integer,
        "name": String,
    },
)

# Map all columns into Concept.new by name
m.define(Person.new(t.to_schema()))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `t.to_schema()` builds a schema object containing all declared columns.
- `Person.new(t.to_schema())` maps columns into `Concept.new` by name.
- This option stays short when your column names already match the fields you want.
- The final `select` query verifies the loaded entities.
**From a pandas DataFrame:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Create a small in-memory dataset
df = pd.DataFrame([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])

# Stage the DataFrame as a temporary table-like source
t = m.data(df)

# Map all columns into Concept.new by name
m.define(Person.new(t.to_schema()))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `t.to_schema()` uses the DataFrame column names.
- `Person.new(t.to_schema())` maps those columns into `Concept.new` by name.
- This option is shortest when your DataFrame columns already match your model fields.
- The final `select` query verifies the loaded entities.
**From a CSV file:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Load CSV rows into a DataFrame (then stage it with Model.data)
df = pd.read_csv("people.csv", encoding="utf-8")
t = m.data(df)

# Map all columns into Concept.new by name
m.define(Person.new(t.to_schema()))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)
```

In this example:

- `t.to_schema()` uses the CSV column names (via the DataFrame).
- `Person.new(t.to_schema())` maps those columns into `Concept.new` by name.
- This option is the shortest when your CSV headers match your model fields.
- The final `select` query verifies the loaded entities.
Notes:

- `Concept.new` uses find-or-create semantics. If you repeat the same identity key, it maps to the same entity instead of creating a second one.
- `Concept.new` treats keyword arguments like property access. For example, `Person.new(name="Alice")` corresponds to `Person.name`. If a keyword refers to an undeclared property and `model.implicit_properties` is disabled, PyRel raises an error to help you catch typos and mismatches. `model.implicit_properties` is enabled by default.
- `to_schema()` maps by column name. If your column names don't match the `Concept.new` arguments you want, map the columns to arguments explicitly, or exclude the mismatched columns with `exclude=[...]` and map them by hand.
#### Option 3: Use .to_schema with excluded columns

Use the `exclude` argument of `Table.to_schema` to exclude extra columns from mapping into `Concept.new`:
**From Snowflake tables:**

```python
from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare concepts, properties, and a relationship
Company = m.Concept("Company", identify_by={"id": Integer})
Company.name = m.Property(f"{Company} is named {String:name}")

Employee = m.Concept("Employee", identify_by={"id": Integer})
Employee.name = m.Property(f"{Employee} is named {String:name}")
Employee.company = m.Relationship(f"{Employee} works at {Company:company}")

# Reference existing Snowflake tables by path
companies_table = m.Table(
    "DB.SCHEMA.COMPANIES",
    schema={
        "id": Integer,
        "name": String,
    },
)
employees_table = m.Table(
    "DB.SCHEMA.EMPLOYEES",
    schema={
        "id": Integer,
        "name": String,
        "company_id": Integer,  # Foreign key column
        "start_date": String,   # Extra column
    },
)

# Define Company entities from the company table
m.define(Company.new(companies_table.to_schema()))

# Define Employee entities from the employee table.
# Exclude extra columns and map the foreign key column to a Company entity.
m.define(
    Employee.new(
        employees_table.to_schema(exclude=["start_date", "company_id"]),
        company=Company.filter_by(id=employees_table.company_id),
    )
)

# Verify entities were created as expected
df = m.select(Employee.id, Employee.name, Employee.company.name).to_df()
print(df)
```

In this example:

- `companies_table.to_schema()` maps `companies_table` columns into `Company.new` by name.
- `employees_table.to_schema(exclude=[...])` excludes `start_date` and `company_id` from mapping into `Employee.new`.
- `Company.filter_by(id=employees_table.company_id)` converts the foreign key column into a `Company` reference.
**From pandas DataFrames:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare concepts, properties, and a relationship
Company = m.Concept("Company", identify_by={"id": Integer})
Company.name = m.Property(f"{Company} is named {String:name}")

Employee = m.Concept("Employee", identify_by={"id": Integer})
Employee.name = m.Property(f"{Employee} is named {String:name}")
Employee.company = m.Relationship(f"{Employee} works at {Company:company}")

# Create small in-memory datasets for companies and employees
df_companies = pd.DataFrame([
    {"id": 10, "name": "Acme"},
    {"id": 20, "name": "Contoso"},
])

# `company_id` is a foreign key column (map to a Company entity reference).
# `start_date` is an extra column (exclude before mapping into Employee.new).
df_employees = pd.DataFrame([
    {"id": 1, "name": "Alice", "company_id": 10, "start_date": "2026-01-01"},
    {"id": 2, "name": "Bob", "company_id": 20, "start_date": "2026-02-01"},
])

# Stage the DataFrames as temporary table-like sources
companies = m.data(df_companies)
employees = m.data(df_employees)

# Define Company entities from the company table
m.define(Company.new(companies.to_schema()))

# Define Employee entities from the employee table.
# Exclude extra columns and map the foreign key column to a Company entity.
m.define(
    Employee.new(
        employees.to_schema(exclude=["start_date", "company_id"]),
        company=Company.filter_by(id=employees.company_id),
    )
)

# Verify entities were created as expected
df = m.select(Employee.id, Employee.name, Employee.company.name).to_df()
print(df)
```

In this example:

- `companies.to_schema()` maps DataFrame columns into `Company.new` by name.
- `employees.to_schema(exclude=[...])` excludes `start_date` and `company_id` from mapping into `Employee.new`.
- `Company.filter_by(id=employees.company_id)` maps each employee row's foreign key to a `Company`.
**From CSV files:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare concepts, properties, and a relationship
Company = m.Concept("Company", identify_by={"id": Integer})
Company.name = m.Property(f"{Company} is named {String:name}")

Employee = m.Concept("Employee", identify_by={"id": Integer})
Employee.name = m.Property(f"{Employee} is named {String:name}")
Employee.company = m.Relationship(f"{Employee} works at {Company:company}")

# Load CSV rows into DataFrames (then stage them with Model.data)
companies_csv = pd.read_csv("companies.csv", encoding="utf-8")

# `company_id` is a foreign key column (map to a Company entity reference).
# `start_date` is an extra column (exclude before mapping into Employee.new).
employees_csv = pd.read_csv("employees.csv", encoding="utf-8")

companies = m.data(companies_csv)
employees = m.data(employees_csv)

# Define Company entities from the company table
m.define(Company.new(companies.to_schema()))

# Define Employee entities from the employee table.
# Exclude extra columns and map the foreign key column to a Company entity.
m.define(
    Employee.new(
        employees.to_schema(exclude=["start_date", "company_id"]),
        company=Company.filter_by(id=employees.company_id),
    )
)

# Verify entities were created as expected
df = m.select(Employee.id, Employee.name, Employee.company.name).to_df()
print(df)
```

In this example:

- `companies.to_schema()` maps CSV columns into `Company.new` by name.
- `employees.to_schema(exclude=[...])` excludes `start_date` and `company_id` from mapping into `Employee.new`.
- `Company.filter_by(id=employees.company_id)` maps the foreign key column to a `Company` entity.
Notes:

- `Concept.new` uses find-or-create semantics. If you repeat the same identity key, it maps to the same entity instead of creating a second one.
- `Concept.new` treats keyword arguments like property access. For example, `Person.new(name="Alice")` corresponds to `Person.name`. If a keyword refers to an undeclared property and `model.implicit_properties` is disabled, PyRel raises an error to help you catch typos and mismatches. `model.implicit_properties` is enabled by default.
- When you use `exclude=[...]` with `to_schema`, excluded column names are matched case-insensitively.
- `Concept.filter_by` is a general property filter and may match multiple entities. If you want to refer to a single entity by identity, prefer `Concept.to_identity`.
## Define relationship facts

Facts about relationships are defined by calling a `Relationship` with the appropriate arguments to produce a fact expression, then passing that expression to `Model.define`. You can define relationship facts from Python literals or from table-like sources, as described in the sections below.
### Define individual relationship facts explicitly

Call a `Relationship` with the appropriate arguments to produce a relationship fact expression, then pass that expression to `Model.define`:
```python
from relationalai.semantics import Model, String

m = Model("MyModel")

# Declare Person and Company concepts
Person = m.Concept("Person", identify_by={"name": String})
Company = m.Concept("Company", identify_by={"name": String})

# Declare a relationship
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Define base entities.
m.define(
    alice := Person.new(name="Alice"),
    bob := Person.new(name="Bob"),
    acme := Company.new(name="Acme"),
    contoso := Company.new(name="Contoso"),
)

# Define relationship facts
m.define(
    alice.employers(acme),
    bob.employers(acme),
    bob.employers(contoso),
)

# Verify relationship facts were created as expected
df = m.select(Person.name, Person.employers.name).to_df()
print(df)
```

In this example:

- `alice := Person.new(name="Alice")` uses the walrus operator (`:=`) to create a `New` expression for a `Person` entity and assigns it to `alice`.
- `alice.employers(acme)` builds an `Expression` representing the fact "Alice works at Acme". This works because the resulting `New` expression supports dot access to the `Person` concept's relationships.
- `m.define(...)` persists both the entity expressions and the relationship fact expressions as base facts in the model.
Notes:

- Calling a `Relationship` produces an `Expression` object that must be passed to `Model.define` to take effect.
- Relationship calls are positional. If you swap argument order or pass the wrong type, you can get wrong-looking results without an obvious error.
### Define multiple facts simultaneously from tabular data

Choose this when relationship facts live in a join table (one row per "edge") and you need to translate foreign key pairs into a binary relationship.

Match entities from the join table's foreign key columns with `Concept.filter_by`, then pass those references into relationship calls that you define with `Model.define`:
**From Snowflake tables:**

```python
from relationalai.semantics import Integer, Model

m = Model("MyModel")

# Declare a binary relationship.
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"id": Integer})
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Person and Company tables in Snowflake:
people = m.Table("DB.SCHEMA.PEOPLE")
companies = m.Table("DB.SCHEMA.COMPANIES")

# Define Person and Company entities from the tables.
m.define(
    Person.new(people.to_schema()),
    Company.new(companies.to_schema()),
)

# A join table where each row is an employment fact (a Person-Company pair).
employment = m.Table(
    "DB.SCHEMA.EMPLOYMENT",
    schema={
        "person_id": Integer,
        "company_id": Integer,
    },
)

# Match entities by the join table's foreign key columns.
person = Person.filter_by(id=employment.person_id)
company = Company.filter_by(id=employment.company_id)

# Define one relationship fact per input row.
m.define(person.employers(company))

# Verify relationship facts were created as expected.
df = m.select(Person.name, Person.employers.name).to_df()
print(df)
```

**From pandas DataFrames:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model

m = Model("MyModel")

# Declare a binary relationship.
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"id": Integer})
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Person and Company tables in memory:
people = m.data(pd.DataFrame([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]))
companies = m.data(pd.DataFrame([
    {"id": 10, "name": "Acme"},
    {"id": 20, "name": "Contoso"},
]))

# Define Person and Company entities from the tables.
m.define(
    Person.new(people.to_schema()),
    Company.new(companies.to_schema()),
)

# A join table where each row is an employment fact (a Person-Company pair).
df_employment = pd.DataFrame([
    {"person_id": 1, "company_id": 10},
    {"person_id": 2, "company_id": 10},
    {"person_id": 2, "company_id": 20},
])
employment = m.data(df_employment)

# Match entities by the join table's foreign key columns.
person = Person.filter_by(id=employment.person_id)
company = Company.filter_by(id=employment.company_id)

# Define one relationship fact per input row.
m.define(person.employers(company))

# Verify relationship facts were created as expected.
df = m.select(Person.name, Person.employers.name).to_df()
print(df)
```

**From CSV files:**

```python
import pandas as pd

from relationalai.semantics import Integer, Model

m = Model("MyModel")

# Declare a binary relationship.
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"id": Integer})
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Person and Company tables from CSV:
df_people = pd.read_csv("people.csv", encoding="utf-8")
people = m.data(df_people)

df_companies = pd.read_csv("companies.csv", encoding="utf-8")
companies = m.data(df_companies)

# Define Person and Company entities from the tables.
m.define(
    Person.new(people.to_schema()),
    Company.new(companies.to_schema()),
)

# A join table where each row is an employment fact (a Person-Company pair).
df_employment = pd.read_csv("employment.csv", encoding="utf-8")
employment = m.data(df_employment)

# Match entities by the join table's foreign key columns.
person = Person.filter_by(id=employment.person_id)
company = Company.filter_by(id=employment.company_id)

# Define one relationship fact per input row.
m.define(person.employers(company))

# Verify relationship facts were created as expected.
df = m.select(Person.name, Person.employers.name).to_df()
print(df)
```

In these examples:

- `Person.new(...)` and `Company.new(...)` define entities from separate sources.
- `Person.filter_by(...)` and `Company.filter_by(...)` match those entities using the `employment` join table's foreign key columns.
- `m.define(person.employers(company))` defines one relationship fact per join table row.
- This works without a Python loop because `employment.person_id` and `employment.company_id` represent whole columns. The relationship call pairs values row by row, so you get one `Person -> Company` fact per row.

Notes:

- `Concept.filter_by` only matches existing entities. Unlike `Concept.new`, it does not create them. If a join-table row contains a `person_id` (or `company_id`) value that does not match any entity, that row produces no relationship fact.
- `Concept.filter_by` can match more than one entity. In the example above, it is safe because `id` is the identity field and must be unique. If you want to refer to exactly one entity by identity, use `Concept.to_identity`.
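The join-table translation behaves like a relational inner join: rows pair by foreign key, and dangling keys simply drop out. The same idea in plain pandas (illustrative only; the data here is made up, and PyRel performs this matching for you):

```python
import pandas as pd

# Entity tables: one row per Person / Company.
people = pd.DataFrame([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
companies = pd.DataFrame([{"id": 10, "name": "Acme"}, {"id": 20, "name": "Contoso"}])

# Join table: one row per employment edge. person_id=99 matches no Person,
# so -- like Concept.filter_by -- that row yields no fact.
employment = pd.DataFrame([
    {"person_id": 1, "company_id": 10},
    {"person_id": 2, "company_id": 20},
    {"person_id": 99, "company_id": 10},  # dangling foreign key
])

# Inner joins pair rows by key; the dangling row drops out.
facts = (
    employment
    .merge(people, left_on="person_id", right_on="id")
    .merge(companies, left_on="company_id", right_on="id",
           suffixes=("_person", "_company"))
)
print(facts[["name_person", "name_company"]])  # Alice-Acme and Bob-Contoso only
```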