Define base facts

This guide shows how to define base facts from source data with Model.define. You will learn how to load Python data, DataFrames, CSV files, or Snowflake tables into the concepts and relationships you’ve declared in your model.

Understand the difference between base and derived facts

A fact is a statement about the world that your model captures, for example “Alice is 30 years old” or “Bob works at Acme”. Facts are the building blocks of your model and the raw material for your queries.

There are two types of facts:

  • Base fact: A fact loaded directly from a source record, like a row in a CSV file or a Snowflake table.
  • Derived fact: A fact computed from other facts, for example filtering or grouping to produce new concept membership, relationships, or property values.
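
To make the distinction concrete, here is a minimal plain-Python sketch. It does not use PyRel: the source_rows list, the triple layout, and the "Adult" rule are all hypothetical, chosen only to show that base facts come straight from records while derived facts are computed from facts that already exist.

```python
# Hypothetical source records, standing in for rows of a CSV file or table.
source_rows = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 17},
]

# Base facts: one (entity, property, value) triple per non-key source field.
base_facts = {
    (row["id"], prop, value)
    for row in source_rows
    for prop, value in row.items()
    if prop != "id"
}

# Derived facts: computed from base facts, here membership in an "Adult" concept.
derived_facts = {
    (entity, "is_a", "Adult")
    for (entity, prop, value) in base_facts
    if prop == "age" and value >= 18
}

print(sorted(base_facts, key=str))
print(derived_facts)
```

Changing a source record changes the base facts directly, while the derived facts follow only because they are recomputed from them.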

Next, you’ll see patterns for defining base facts from different source formats.

Define entities and their properties with Concept.new

Create entity expressions for your base facts with Concept.new. Set property values by passing keyword arguments. Then pass the expressions to Model.define to add them as base facts in the model.

At a high level, Concept.new:

  • Enforces identity: It uses your concept’s declared identity (identify_by) to compute entity keys. If the concept has no explicit identity, identity is inferred from the keyword values you provide. If identity is declared, missing identity values raise an error when you define the expression.
  • Uses find-or-create behavior: If you create multiple Concept.new expressions with the same identity values, they refer to the same entity instead of creating duplicates.
  • Returns an expression: Produces a New expression you can pass to Model.define. You can also use it inside other facts, for example as an argument to a relationship.
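
The identity and find-or-create behavior described above can be pictured with a small plain-Python analogy. This is not how PyRel implements Concept.new: ToyConcept, its entities dictionary, and the ValueError are assumptions made for illustration only.

```python
class ToyConcept:
    """Toy stand-in for a concept with a declared identity (not PyRel's class)."""

    def __init__(self, name, identify_by):
        self.name = name
        self.identify_by = tuple(identify_by)
        self.entities = {}  # identity key -> property values

    def new(self, **values):
        # A declared identity must be fully supplied.
        missing = [field for field in self.identify_by if field not in values]
        if missing:
            raise ValueError(f"{self.name}.new is missing identity values: {missing}")
        key = tuple(values[field] for field in self.identify_by)
        # Find-or-create: reuse the entity if this key is already known.
        entity = self.entities.setdefault(key, {})
        entity.update(values)
        return key


person = ToyConcept("Person", identify_by=["id"])
first = person.new(id=1, name="Alice")
second = person.new(id=1, name="Alice")  # same identity, so the same entity

print(first == second, len(person.entities))

try:
    person.new(name="Carol")  # no id supplied
except ValueError as err:
    print(err)
```

The key point is that the identity values, not the call site, decide which entity an expression refers to.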

You can use Concept.new with Python literals or with table-like sources, such as Snowflake tables or DataFrames.

Pass literal values directly into Concept.new calls to create entity expressions for small datasets or quick prototypes:

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare some concepts
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"name": String})

# Declare a property
Person.name = m.Property(f"{Person} is named {String:name}")

# Define some base entities from literal values
m.define(
    Person.new(id=1, name="Alice"),
    Person.new(id=2, name="Bob"),
    Company.new(name="Acme"),
)

# Verify Person entities were created as expected
person_df = m.select(Person.id, Person.name).to_df()
print(person_df)

# Verify Company entities were created as expected
company_df = m.select(Company.name).to_df()
print(company_df)

  • identify_by={"id": Integer} makes id the stable entity key for Person.
  • Each Person.new(...) and Company.new(...) call creates an entity expression for one source record.
  • m.define(...) adds those entity expressions as base facts in the model.
  • Concept.new uses find-or-create semantics. If you repeat the same identity key, it maps to the same entity instead of creating a second one.
  • Concept.new treats keyword arguments like property access. For example, Person.new(name="Alice") corresponds to Person.name. If a keyword refers to an undeclared property and model.implicit_properties is disabled, PyRel raises an error to help you catch typos and mismatches. model.implicit_properties is enabled by default.
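
The following plain-Python sketch illustrates why disabling implicit properties helps catch typos. It is only an analogy: check_keywords, its implicit_properties parameter, and the AttributeError are hypothetical and do not reflect the error type or configuration mechanism PyRel actually uses.

```python
def check_keywords(declared, keywords, implicit_properties=True):
    """Return keywords that are not declared properties, or raise if that is disallowed."""
    unknown = sorted(set(keywords) - set(declared))
    if unknown and not implicit_properties:
        raise AttributeError(f"undeclared properties: {unknown}")
    return unknown


declared = {"id", "name"}
typo_kwargs = {"id": 1, "nmae": "Alice"}  # "nmae" is a misspelling of "name"

# With implicit properties allowed, the typo silently becomes a new property.
print(check_keywords(declared, typo_kwargs))

# With implicit properties disallowed, the same call fails fast.
try:
    check_keywords(declared, typo_kwargs, implicit_properties=False)
except AttributeError as err:
    print(err)
```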

Use a table-like source when your source data is tabular. You can stage rows in memory with Model.data or reference Snowflake rows with Model.Table. Use the table below to choose a mapping option:

What to use | When to use it
--- | ---
Map columns explicitly | You want precise control, column names do not match your Concept.new fields, or your table is wide and you only need one or two columns.
Use .to_schema to map all columns | Column names already match the fields you want to pass into Concept.new.
Use .to_schema with excluded columns | Your source has extra columns you don’t need, column names don’t match your Concept.new fields, or you need to map foreign key columns to entity instances instead.

Option 1: Map columns explicitly

Load tabular data and reference columns directly in Concept.new calls:

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Reference an existing Snowflake table by path
t = m.Table(
    "DB.SCHEMA.PEOPLE",
    schema={
        "id": Integer,
        "name": String,
    },
)

# Define entities from source rows
m.define(Person.new(id=t.id, name=t.name))

# Verify entities were created as expected
p = Person.ref("p")
df_people = m.select(p.id, p.name).to_df()
print(df_people)  # Expect (1, "Alice") and (2, "Bob") in any order.

  • m.Table(...) creates a table-like reference to Snowflake rows.
  • t.id and t.name are column references you can pass into Concept.new.
  • m.define(Person.new(...)) turns each input row into a Person entity.
  • The final select query verifies which entities were created.
  • If your in-memory input is a list of tuples instead of a DataFrame, unnamed integer columns are exposed as col0, col1, and so on.
  • When working with Snowflake tables, prefer t[0] when you want to avoid column name and case ambiguity entirely. Prefer t["COL_NAME"] when a column name isn’t a valid Python identifier. Dot access (t.col_name) is a convenience for columns that are valid Python identifiers.
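
As a rough picture of the col0, col1 naming convention for tuple input, the plain-Python sketch below names positional fields and then pairs each column with a keyword argument. The name_columns helper is hypothetical and only mimics the naming scheme described above. It is not part of PyRel.

```python
def name_columns(rows):
    """Give positional tuple fields the names col0, col1, ... (illustrative helper)."""
    width = len(rows[0])
    return {f"col{i}": [row[i] for row in rows] for i in range(width)}


rows = [(1, "Alice"), (2, "Bob")]
columns = name_columns(rows)
print(columns)

# Explicit mapping: decide which column feeds which keyword argument.
person_kwargs = [
    {"id": person_id, "name": name}
    for person_id, name in zip(columns["col0"], columns["col1"])
]
print(person_kwargs)
```

Because the generated names carry no meaning, explicit mapping is the natural choice for tuple input: it is the step where col0 becomes id and col1 becomes name.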

Option 2: Use .to_schema to map all columns

Create a TableSchema object by calling Table.to_schema and pass it to Concept.new:

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare a concept and a property
Person = m.Concept("Person", identify_by={"id": Integer})
Person.name = m.Property(f"{Person} is named {String:name}")

# Reference an existing Snowflake table by path
t = m.Table(
    "DB.SCHEMA.PEOPLE",
    schema={
        "id": Integer,
        "name": String,
    },
)

# Map all columns into Concept.new by name
m.define(Person.new(t.to_schema()))

# Verify entities were created as expected
df = m.select(Person.id, Person.name).to_df()
print(df)

  • t.to_schema() builds a schema object containing all declared columns.
  • Person.new(t.to_schema()) maps columns into Concept.new by name.
  • This option stays short when your column names already match the fields you want.
  • The final select query verifies the loaded entities.
  • to_schema() maps by column name. If your column names don’t match the Concept.new arguments you want, map those columns to arguments explicitly, or exclude them with exclude=[...] and pass them as keyword arguments under the names you want.
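
Mapping by name can be thought of as turning each row into keyword arguments keyed by column name. The plain-Python sketch below is an analogy rather than the to_schema implementation: kwargs_by_name, its rename parameter, and the sample row are hypothetical.

```python
def kwargs_by_name(row, rename=None):
    """Turn one source row into keyword arguments, optionally renaming columns."""
    rename = rename or {}
    return {rename.get(column, column): value for column, value in row.items()}


# Hypothetical row whose column names do not match the desired fields.
row = {"person_id": 1, "full_name": "Alice"}

# Mapping purely by name keeps the source names, which is not what we want here.
by_name = kwargs_by_name(row)

# An explicit mapping puts each value under the intended argument name.
explicit = kwargs_by_name(row, rename={"person_id": "id", "full_name": "name"})

print(by_name)
print(explicit)
```

When the names already line up, the by-name form is all you need. When they do not, an explicit mapping is the part that carries the renaming.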

Option 3: Use .to_schema with excluded columns

Use the exclude argument of Table.to_schema to exclude extra columns from mapping into Concept.new:

from relationalai.semantics import Integer, Model, String

m = Model("MyModel")

# Declare concepts, properties, and a relationship
Company = m.Concept("Company", identify_by={"id": Integer})
Company.name = m.Property(f"{Company} is named {String:name}")
Employee = m.Concept("Employee", identify_by={"id": Integer})
Employee.name = m.Property(f"{Employee} is named {String:name}")
Employee.company = m.Relationship(f"{Employee} works at {Company:company}")

# Reference existing Snowflake tables by path
companies_table = m.Table(
    "DB.SCHEMA.COMPANIES",
    schema={
        "id": Integer,
        "name": String,
    },
)
employees_table = m.Table(
    "DB.SCHEMA.EMPLOYEES",
    schema={
        "id": Integer,
        "name": String,
        "company_id": Integer,  # Foreign key column
        "start_date": String,  # Extra column
    },
)

# Define Company entities from the company table
m.define(Company.new(companies_table.to_schema()))

# Define Employee entities from the employee table.
# Exclude extra columns and map the foreign key column to a Company entity.
m.define(
    Employee.new(
        employees_table.to_schema(exclude=["start_date", "company_id"]),
        company=Company.filter_by(id=employees_table.company_id),
    )
)

# Verify entities were created as expected
df = m.select(Employee.id, Employee.name, Employee.company.name).to_df()
print(df)

  • companies_table.to_schema() maps companies_table columns into Company.new by name.
  • employees_table.to_schema(exclude=[...]) excludes start_date and company_id from mapping into Employee.new.
  • Company.filter_by(id=employees_table.company_id) converts the foreign key column into a Company reference.
  • When you use exclude=[...] with to_schema, excluded column names are matched case-insensitively.
  • Concept.filter_by is a general property filter and may match multiple entities. If you want to refer to a single entity by identity, prefer Concept.to_identity.
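
The two ideas in this option, case-insensitive exclusion and turning a foreign key into an entity reference, can be sketched in plain Python. The helper schema_columns, the upper-case column names, and the companies lookup are hypothetical stand-ins. They do not show how PyRel implements to_schema or filter_by.

```python
def schema_columns(columns, exclude=()):
    """Keep every column whose name is not excluded, ignoring case."""
    excluded = {name.lower() for name in exclude}
    return [column for column in columns if column.lower() not in excluded]


# Hypothetical upper-case column names, as a warehouse might report them.
columns = ["ID", "NAME", "COMPANY_ID", "START_DATE"]
kept = schema_columns(columns, exclude=["start_date", "company_id"])
print(kept)

# Existing Company entities, keyed by identity.
companies = {10: {"id": 10, "name": "Acme"}}

employee_row = {"ID": 1, "NAME": "Alice", "COMPANY_ID": 10, "START_DATE": "2024-01-01"}

# Kept columns map by name. The foreign key is resolved to an entity instead.
employee = {column.lower(): employee_row[column] for column in kept}
employee["company"] = companies.get(employee_row["COMPANY_ID"])
print(employee)
```

The excluded foreign key column is still used, just not as a plain property value: it drives the lookup that supplies the company argument.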

To define a fact about a relationship, call a Relationship with the appropriate arguments to produce a fact expression, then pass that expression to Model.define.

You can define relationship facts from Python literals or from table-like sources, as described in the sections below.

Define individual relationship facts explicitly

Call a Relationship with the appropriate arguments to produce a relationship fact expression, then pass that expression to Model.define:

from relationalai.semantics import Model, String

m = Model("MyModel")

# Declare Person and Company concepts
Person = m.Concept("Person", identify_by={"name": String})
Company = m.Concept("Company", identify_by={"name": String})

# Declare a relationship
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Define base entities
m.define(
    alice := Person.new(name="Alice"),
    bob := Person.new(name="Bob"),
    acme := Company.new(name="Acme"),
    contoso := Company.new(name="Contoso"),
)

# Define relationship facts
m.define(
    alice.employers(acme),
    bob.employers(acme),
    bob.employers(contoso),
)

# Verify relationship facts were created as expected
df = m.select(Person.name, Person.employers.name).to_df()
print(df)

  • alice := Person.new(name="Alice") uses the walrus operator (:=) to create a New expression for a Person entity and assigns it to alice.
  • alice.employers(acme) builds an Expression representing the fact “Alice works at Acme”. This works because the New expression bound to alice supports dot access to the Person concept’s relationships.
  • m.define(...) persists both the entity expressions and the relationship fact expressions as base facts in the model.
  • Calling a Relationship produces an Expression object that must be passed to Model.define to take effect.
  • Relationship calls are positional. If you swap argument order or pass the wrong type, you can get wrong-looking results without an obvious error.
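
The risk with positional arguments is easy to reproduce in a plain-Python analogy. The works_at set and the employers function below are hypothetical and untyped, so they are only a loose model of a relationship call, but they show how a swapped call can succeed while recording the wrong fact.

```python
# Toy relationship store: each fact is a (person, company) pair, in that order.
works_at = set()


def employers(person, company):
    """Record a 'person works at company' fact; arguments are purely positional."""
    works_at.add((person, company))


employers("Bob", "Acme")     # correct order
employers("Contoso", "Bob")  # swapped by mistake, but nothing complains

# Bob appears to work only at Acme, and "Contoso" now looks like a person.
companies_of_bob = {company for person, company in works_at if person == "Bob"}
print(companies_of_bob)
print(sorted(works_at))
```

A quick verification query after defining facts, like the select calls in the examples above, is a cheap way to notice this kind of mistake.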

Define multiple facts simultaneously from tabular data

Use this pattern when relationship facts live in a join table (one row per “edge”) and you need to translate foreign key pairs into a binary relationship. Match entities from the key columns with Concept.filter_by, then pass those references into relationship calls.

The following example builds entity references from the join table’s foreign key columns and passes the resulting relationship call to Model.define:

from relationalai.semantics import Integer, Model

m = Model("MyModel")

# Declare concepts and a binary relationship
Person = m.Concept("Person", identify_by={"id": Integer})
Company = m.Concept("Company", identify_by={"id": Integer})
Person.employers = m.Relationship(f"{Person} works at {Company:employers}")

# Person and Company tables in Snowflake
people = m.Table("DB.SCHEMA.PEOPLE")
companies = m.Table("DB.SCHEMA.COMPANIES")

# Define Person and Company entities from the tables
m.define(
    Person.new(people.to_schema()),
    Company.new(companies.to_schema()),
)

# A join table where each row is an employment fact (a Person-Company pair)
employment = m.Table(
    "DB.SCHEMA.EMPLOYMENT",
    schema={
        "person_id": Integer,
        "company_id": Integer,
    },
)

# Match entities by the join table's foreign key columns
person = Person.filter_by(id=employment.person_id)
company = Company.filter_by(id=employment.company_id)

# Define one relationship fact per input row
m.define(person.employers(company))

# Verify relationship facts were created as expected
df = m.select(Person.name, Person.employers.name).to_df()
print(df)

  • Person.new(...) and Company.new(...) define entities from separate sources.
  • Person.filter_by(...) and Company.filter_by(...) match those entities using the employment join table’s foreign key columns.
  • m.define(person.employers(company)) defines one relationship fact per join table row.
  • This works without a Python loop because employment.person_id and employment.company_id represent whole columns. The relationship call pairs values row by row, so you get one Person -> Company fact per row.
  • Concept.filter_by only matches existing entities. Unlike Concept.new, it does not create them. If a join-table row contains a person_id (or company_id) value that does not match any entity, that row produces no relationship fact.
  • Concept.filter_by can match more than one entity. In the example above, it is safe because id is the identity field and must be unique. If you want to refer to exactly one entity by identity, use Concept.to_identity.
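
The row-by-row pairing and the match-only behavior described above can be sketched in plain Python. This is an analogy under stated assumptions, not PyRel's evaluation strategy: the people and companies dictionaries stand in for existing entities, and the employment dictionary stands in for the join table, with a deliberately unmatched person_id of 3.

```python
# Existing entities, keyed by identity (hypothetical data).
people = {1: "Alice", 2: "Bob"}
companies = {10: "Acme", 20: "Contoso"}

# Join table as whole columns. The last row refers to a person that does not exist.
employment = {
    "person_id": [1, 2, 2, 3],
    "company_id": [10, 10, 20, 10],
}

# Pair the columns row by row and keep only rows where both sides match an entity.
facts = [
    (people[person_id], companies[company_id])
    for person_id, company_id in zip(employment["person_id"], employment["company_id"])
    if person_id in people and company_id in companies
]

print(facts)
```

Four input rows yield three facts here because the unmatched row is dropped without an error, which is why comparing row counts against the source is a useful sanity check.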