Derive facts with logic

Writing logic in PyRel is how you derive new facts from existing ones in order to capture domain knowledge and build up a rich semantic model. This guide shows you how to write logic with the PyRel DSL, using the building blocks of fragments, expressions, variables, and chains.

Derived facts are built from a small set of DSL building blocks. Understanding these building blocks and how they work together is key to writing effective logic in PyRel.

There are four main constructs you will use to build logic:

  • Fragments represent a unit of logic that can be built up and then materialized to produce results. They are built by chaining calls to where, select, and define.
  • Variables are placeholders for entities and values that can be reused in multiple expressions.
  • Expressions are a kind of variable that represents the result of an operation, such as a comparison or a relationship traversal.
  • Chains represent paths through your model’s relationships and properties and can be used to traverse your model and extract values.

Methods like Model.where, Model.select, and Model.define return a Fragment. A fragment is a composable, lazy unit of logic. You build it up by chaining more calls to where, select, and define to add more conditions, outputs, and definitions. Each call returns a new fragment.

The three main fragment methods have different semantics:

  • Fragment.where adds filter conditions to the fragment. It does not specify outputs or definitions, only conditions.
  • Fragment.select specifies what you want to return when you materialize the fragment. It does not add conditions or definitions.
  • Fragment.define specifies what you want to define when you materialize the fragment. It does not add conditions or output values.

Building a fragment does not execute any logic or produce results until you materialize it. There are two main ways to materialize a fragment:

  • Fragment.to_df compiles and executes the fragment and returns the results as a DataFrame.
  • Fragment.into compiles and executes the fragment and writes the results into a Snowflake table.

For example, the following snippet builds a fragment that selects customer names for customers with pending orders, and then materializes it to a DataFrame:

from relationalai.semantics import Model, String
m = Model("MyModel")
Customer = m.Concept("Customer")
Order = m.Concept("Order")
class Status(m.Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"
    RETURNED = "returned"
# Declare the properties and relationships used by the query below.
Customer.name = m.Property(f"{Customer} has name {String}")
Customer.orders = m.Relationship(f"{Customer} places {Order}")
Order.status = m.Property(f"{Order} has status {Status}")
# Build a fragment that selects names of customers with pending orders.
q = m.where(Customer.orders.status == Status.PENDING).select(Customer.name)
# Materialize the fragment to a DataFrame.
print(q.to_df())
  • q is a Fragment instance that represents the logic of filtering customers with pending orders and selecting their names.
  • m.where() creates a new fragment with the specified conditions, namely that the customer’s orders have a status of “pending”.
  • .select() adds a selection to the fragment, specifying that we want to retrieve the names of the customers that meet the conditions.
  • q.to_df() materializes the fragment by compiling and executing the logic, and returns the results as a DataFrame.
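The lazy, chainable behavior described above can be sketched in plain Python. This is only a toy model and not PyRel's implementation: ToyFragment and its to_rows method are hypothetical names, and the rows are made-up dictionaries. It shows that each call returns a new object and that nothing is filtered until the fragment is materialized.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for a fragment: it only records steps.
@dataclass(frozen=True)
class ToyFragment:
    rows: tuple             # source data, one dict per entity
    conditions: tuple = ()  # predicates added by where()
    outputs: tuple = ()     # column names added by select()

    def where(self, *preds: Callable[[dict], bool]) -> "ToyFragment":
        # Return a NEW fragment; the receiver is left unchanged.
        return ToyFragment(self.rows, self.conditions + preds, self.outputs)

    def select(self, *cols: str) -> "ToyFragment":
        return ToyFragment(self.rows, self.conditions, self.outputs + cols)

    def to_rows(self) -> list:
        # Filtering and projection happen only here.
        return [
            tuple(row[c] for c in self.outputs)
            for row in self.rows
            if all(pred(row) for pred in self.conditions)
        ]


customers = (
    {"name": "Ada", "status": "pending"},
    {"name": "Bo", "status": "shipped"},
)
base = ToyFragment(customers)
pending = base.where(lambda r: r["status"] == "pending")
q = pending.select("name")

print(q.to_rows())      # [('Ada',)]
print(base.conditions)  # () because base was never modified
```

Because base and pending are unchanged by later calls, either one can be reused as the starting point for other queries.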

A Variable is a placeholder for an entity or value that can be reused in multiple expressions. You never have to explicitly create a Variable instance, since most DSL objects are variables, and most DSL methods return variables.

For example, the following fragment uses Concept objects as variables to represent entities that are members of that concept:

m.where(Customer.orders.status == Status.PENDING).select(Customer.name)
  • Customer is a variable that represents customer entities.
  • Customer.orders is a variable that represents orders placed by a customer.
  • Customer.orders.status is a variable that represents the status of a customer’s order.

An Expression is a kind of variable that represents the result of performing operations like comparisons or relationship traversals. Expressions can be used in Model.where to filter results, in Model.define to define new facts, and in Model.select to specify output values.

For instance, the following fragment uses expressions to filter customers with pending orders:

m.where(Customer.orders.status == Status.PENDING).select(Customer.name)
  • Customer.orders.status == Status.PENDING is a boolean expression that filters Customer entities based on the status of their orders.
  • Customer.orders.status is an expression that traverses the relationship from Customer to Order and accesses the status property.
  • Customer.orders and Customer.name are also expressions that represent the orders placed by a customer and the name of a customer, respectively.
  • Customer is not an expression; it is a variable that represents the concept of customers.

Write conditional definitions with Model.where and Model.define

Derived facts are often conditional: for example, “an order needs review if its promised ship date is before a cutoff”. Conditional definitions are a two-step process:

  1. Write a condition with Model.where

    Use Model.where or Fragment.where to express the condition that determines when the derived fact applies. This can include comparisons, relationship traversals, and any other expressions that evaluate to a boolean.

  2. Define the derived fact with Model.define

    Use Model.define or Fragment.define to specify the derived fact you want to define when the condition is met. This can be a new relationship, a new property value, or concept membership.

Model.where and Fragment.where can take any number of arguments that are boolean expressions.

This includes:

  • Comparison expressions

    These are expressions that compare values using operators like <, >, ==, etc:

    Order.promised_ship_date < cutoff
    Order.status != Status.SHIPPED
    Order.total_amount >= 100
    Order.total_amount <= 500

    For a full list of supported operators, see the Variable reference documentation.

  • Concept membership checks

    Call a Concept and pass an entity variable to check if that entity is a member of the concept:

    DelayedOrder(Order)
    NeedsReview(Order)
  • Relationship existence checks

    Call a relationship Chain and pass one entity variable for each field in the final relationship to check if that relationship path exists:

    Customer.orders(Order)
    Order.shipments.carrier(Carrier)
  • Logical expressions

    You can combine expressions with logical operators, like & for AND, | for OR, and Model.not_ for NOT:

    (Order.promised_ship_date < cutoff) & (Order.status == Status.PENDING)
    (Order.status == Status.CANCELLED) | (Order.status == Status.RETURNED)
    m.not_(DelayedOrder(Order))
    • Python’s and, or, and not cannot be used for logical expressions in the PyRel DSL because they cannot be overloaded to build DSL logic. Always use &, |, and Model.not_ instead.

    • The & operator is often unnecessary because Model.where implicitly ANDs multiple arguments. Use it to group conditions with |, or to combine pre-built filter fragments. See Combine where clauses with &.

    • The | operator has fallback semantics: if the left side matches, the right side is not applied. To include matches from both branches (true set union), use Model.union instead.
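To see why a DSL has to rely on & and | rather than and and or, here is a minimal pure-Python sketch. Cond is a hypothetical toy class, not a PyRel type. Python lets a class define what & and | return, but and, or, and not only ask the object for its truth value, so they cannot build a combined condition. The toy raises an error in that case; how a real DSL reacts is up to its implementation.

```python
class Cond:
    """Hypothetical toy condition; not a PyRel class."""

    def __init__(self, text: str):
        self.text = text

    def __and__(self, other: "Cond") -> "Cond":
        # `a & b` can return a new combined condition.
        return Cond(f"({self.text} AND {other.text})")

    def __or__(self, other: "Cond") -> "Cond":
        # `a | b` can return a new combined condition.
        return Cond(f"({self.text} OR {other.text})")

    def __bool__(self) -> bool:
        # Python calls this for `and`, `or`, and `not`.
        raise TypeError("use &, |, or a not_() helper instead")


pending = Cond("status == 'pending'")
late = Cond("promised_ship_date < cutoff")

combined = pending & late
print(combined.text)  # (status == 'pending' AND promised_ship_date < cutoff)

try:
    pending and late
except TypeError as err:
    print("and failed:", err)
```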

When you want the definition to read like a statement with conditions, such as “an order needs review if its promised ship date is before a cutoff”, put define before where:

from datetime import datetime
from relationalai.semantics import Integer, Model
CUTOFF = datetime(2026, 1, 1)
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"order_id": Integer})
NeedsReview = m.Concept("NeedsReview")
# Define some order entities.
m.define(
Order.new(order_id=1, promised_ship_date=datetime(2025, 12, 1), status="shipped"),
Order.new(order_id=2, promised_ship_date=datetime(2025, 12, 15), status="pending"),
Order.new(order_id=3, promised_ship_date=datetime(2026, 2, 1), status="pending"),
)
# Define the derived fact with the condition.
m.define(NeedsReview(Order)).where(Order.status == "pending", Order.promised_ship_date < CUTOFF)
# Materialize the flagged orders.
df = m.select(Order.order_id, Order.promised_ship_date).where(NeedsReview(Order)).to_df()
print(df)
  • m.define(NeedsReview(Order)) defines the derived fact as concept membership in NeedsReview.
  • .where(...) adds the conditions that the order status is “pending” and the promised ship date is before the cutoff.
  • The verification query selects orders that are members of NeedsReview to confirm the definition works as expected. It also illustrates how select can be put before where when you want the output to read like “select these columns where this condition holds”.

When you want the condition to read like a filter, such as “orders that are pending and before a cutoff”, put where before define:

from datetime import datetime
from relationalai.semantics import Integer, Model
CUTOFF = datetime(2026, 1, 1)
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"order_id": Integer})
NeedsReview = m.Concept("NeedsReview")
# Define some order entities.
m.define(
Order.new(order_id=1, promised_ship_date=datetime(2025, 12, 1), status="shipped"),
Order.new(order_id=2, promised_ship_date=datetime(2025, 12, 15), status="pending"),
Order.new(order_id=3, promised_ship_date=datetime(2026, 2, 1), status="pending"),
)
# Filter first, then define the derived fact.
m.where(Order.status == "pending", Order.promised_ship_date < CUTOFF).define(NeedsReview(Order))
# Materialize the flagged orders.
df = m.where(NeedsReview(Order)).select(Order.order_id, Order.promised_ship_date).to_df()
print(df)
  • m.where(...) expresses the condition as a filter.
  • .define(NeedsReview(Order)) defines the derived fact only for the matches.
  • The verification query selects orders that are members of NeedsReview to confirm the definition works as expected. It also illustrates how where can be put before select when you want the output to read like “where this condition holds, select these columns”.

When you want to modularize a query, build up your filter in reusable steps. This is especially useful when you have a base condition (like “pending orders”) that you want to reuse across multiple derived facts and verification queries.

This snippet uses Model.where to build a reusable filter fragment, then refines it with another where before calling define:

from datetime import datetime
from relationalai.semantics import Integer, Model
CUTOFF = datetime(2026, 1, 1)
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"order_id": Integer})
NeedsReview = m.Concept("NeedsReview")
# Define some order entities.
m.define(
Order.new(order_id=1, promised_ship_date=datetime(2025, 12, 1), status="shipped"),
Order.new(order_id=2, promised_ship_date=datetime(2025, 12, 15), status="pending"),
Order.new(order_id=3, promised_ship_date=datetime(2026, 2, 1), status="pending"),
)
# Build a reusable base filter.
where_pending = m.where(Order.status == "pending")
# Refine it in a second step, then define the derived fact.
where_pending_before_cutoff = where_pending.where(Order.promised_ship_date < CUTOFF)
where_pending_before_cutoff.define(NeedsReview(Order))
# Materialize the flagged orders.
df = m.where(NeedsReview(Order)).select(Order.order_id, Order.promised_ship_date).to_df()
print(df)
  • where_pending = m.where(...) builds a reusable filter fragment that filters for pending orders.
  • where_pending_before_cutoff = where_pending.where(...) refines that fragment by adding another condition to filter pending orders before the cutoff.
  • where_pending_before_cutoff.define(...) defines the derived fact for orders that match both conditions.
  • Each where creates a new fragment, so you can keep the base filter (where_pending) and build multiple derived facts from it by refining it with different conditions.
  • This is more modular and reusable than writing one big where with all conditions at once.
  • It also allows you to materialize intermediate fragments (for example, where_pending.select(...).to_df()) to verify that each step is working as expected before you build on top of it.

When you already have separate where fragments and you want to combine them into one ANDed filter, combine them with &:

from datetime import datetime
from relationalai.semantics import Integer, Model
CUTOFF = datetime(2026, 1, 1)
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"order_id": Integer})
NeedsReview = m.Concept("NeedsReview")
# Define some order entities.
m.define(
Order.new(order_id=1, promised_ship_date=datetime(2025, 12, 1), status="shipped"),
Order.new(order_id=2, promised_ship_date=datetime(2025, 12, 15), status="pending"),
Order.new(order_id=3, promised_ship_date=datetime(2026, 2, 1), status="pending"),
)
where_pending = m.where(Order.status == "pending")
where_before_cutoff = m.where(Order.promised_ship_date < CUTOFF)
# Combine the fragments with & to define the derived fact.
where_pending_before_cutoff = where_pending & where_before_cutoff
where_pending_before_cutoff.define(NeedsReview(Order))
# Materialize the flagged orders.
df = m.where(NeedsReview(Order)).select(Order.order_id, Order.promised_ship_date).to_df()
print(df)
  • where_pending and where_before_cutoff are filter-only fragments.
  • & combines those fragments into a single ANDed filter.
  • Use & rather than Python and because Python and cannot be overloaded to build DSL logic.

You can also combine fragments with | to express fallback logic, where you want to take the left branch if it matches, and if not, take the right branch:

where_shipped = m.where(Order.status == "shipped")
where_after_cutoff = m.where(Order.promised_ship_date > CUTOFF)
where_shipped_or_after_cutoff = where_shipped | where_after_cutoff
# Materialize the combined fragment.
df = where_shipped_or_after_cutoff.select(Order.order_id, Order.promised_ship_date).to_df()
print(df)
  • | should be used only when you explicitly want fallback behavior (take the left branch if it matches, otherwise the right). If you are defining mutually exclusive categories, multiple conditional definitions (one per category) are often the clearest case-split pattern.
  • For either/or logic, prefer Model.union when you want to combine matches from multiple branches.
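The difference between fallback and union can be illustrated with a small pure-Python sketch. It models only the semantics described above and says nothing about how PyRel evaluates | or Model.union internally. The delays data and both function names are hypothetical.

```python
from typing import Optional

# Hypothetical data: order id -> recorded delay in days (None if unknown).
delays: dict[int, Optional[int]] = {1: 14, 2: None}


def fallback(order_id: int) -> list[int]:
    # Left branch wins when it produces a value; otherwise use the right branch.
    left = delays[order_id]
    return [left] if left is not None else [0]


def union(order_id: int) -> list[int]:
    # Both branches contribute, so an order with a recorded delay yields two values.
    left = delays[order_id]
    return ([left] if left is not None else []) + [0]


print(fallback(1), fallback(2))  # [14] [0]
print(union(1), union(2))        # [14, 0] [0]
```

With fallback, order 1 keeps only its recorded delay. With union, it gets both the recorded delay and the default, which is rarely what you want for default values but is what you want when both branches are valid matches.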

Derived concept membership lets you name reusable categories of entities based on conditions over their properties and relationships. For example, you might want to define a DelayedOrder concept for orders that have a promised ship date before their actual ship date:

from relationalai.semantics import DateTime, Integer, Model
from relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
# Model schema
Shipment = m.Concept("Shipment", identify_by={"id": Integer})
Shipment.shipped_at = m.Property(f"{Shipment} shipped at {DateTime}")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.promised_ship_date = m.Property(f"{Order} promised ship date is {DateTime}")
Order.shipments = m.Relationship(f"{Order} has shipment {Shipment}")
DelayedOrder = m.Concept("DelayedOrder", extends=[Order])
# Define some base Order and Shipment entities
m.define(
o := Order.new(id=1, promised_ship_date=datetime(2025, 12, 1)),
s := Shipment.new(id=1, shipped_at=datetime(2025, 12, 15)),
o.shipments(s),
)
# Define derived concept membership for delayed orders
m.where(
Order.shipments.shipped_at > Order.promised_ship_date,
).define(
DelayedOrder(Order)
)
# Verify: materialize delayed orders.
df = m.select(DelayedOrder.id).to_df()
print(df)
  • DelayedOrder extends Order, so members are still orders.
  • m.where(...) binds Shipment and Order and filters for late shipments.
  • .define(DelayedOrder(Order)) defines membership for the matching orders.
  • The verification query ends with Fragment.to_df to execute and materialize results.
  • This pattern assumes you already defined base facts for Order.shipments, Shipment.shipped_at, and Order.promised_ship_date. If those facts are missing, the membership query returns no rows.

When you want to compute a property value based on other values and conditions, use a conditional definition with Model.define to assign the property value for matches of the condition.

For example, you might want to compute a delay_in_days property for orders that have shipped late:

from relationalai.semantics import DateTime, Integer, Model
from relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
# Model schema
Shipment = m.Concept("Shipment", identify_by={"id": Integer})
Shipment.shipped_at = m.Property(f"{Shipment} shipped at {DateTime}")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.promised_ship_date = m.Property(f"{Order} promised ship date is {DateTime}")
Order.delay_in_days = m.Property(f"{Order} is delayed {Integer} days")
Order.shipments = m.Relationship(f"{Order} has shipment {Shipment}")
# Define some base Order and Shipment entities
m.define(
o := Order.new(id=1, promised_ship_date=datetime(2025, 12, 1)),
s := Shipment.new(id=1, shipped_at=datetime(2025, 12, 1)),
o.shipments(s),
)
# Define derived property values for delayed orders.
m.where(
Order.shipments.shipped_at > Order.promised_ship_date,
).define(
Order.delay_in_days == datetime.diff("days", Order.shipments.shipped_at, Order.promised_ship_date)
)
# Verify: materialize orders and their delay in days.
df = m.select(
Order.id.alias("order_id"),
(Order.delay_in_days | 0).alias("delay_in_days"),
).to_df()
print(df)
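As a sanity check on the values you should expect, the same derivation can be worked out in plain Python. This sketch uses the standard library datetime rather than the PyRel one, and the two sample orders are hypothetical: one shipped late and one shipped on time.

```python
from datetime import datetime

# Hypothetical sample data: (order_id, promised_ship_date, shipped_at).
orders = [
    (1, datetime(2025, 12, 1), datetime(2025, 12, 15)),  # shipped late
    (2, datetime(2025, 12, 1), datetime(2025, 12, 1)),   # shipped on time
]

rows = []
for order_id, promised, shipped in orders:
    # Mirror the where(...) condition: only late orders get a derived delay.
    delay = (shipped - promised).days if shipped > promised else None
    # Mirror the fallback in the select: show 0 when no delay was derived.
    rows.append((order_id, delay if delay is not None else 0))

print(rows)  # [(1, 14), (2, 0)]
```

An order that is not late has no delay_in_days fact at all, which is why the verification query falls back to 0 for it.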

You can compute derived relationships to capture connections between entities that are not explicitly stated in your base facts, but can be inferred from them. For example, you might want to define a Customer concept and derive a Customer.delayed_orders relationship to link customers to their delayed orders:

from relationalai.semantics import DateTime, Integer, Model
from relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
# Model schema
Shipment = m.Concept("Shipment", identify_by={"id": Integer})
Shipment.shipped_at = m.Property(f"{Shipment} shipped at {DateTime}")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.promised_ship_date = m.Property(f"{Order} promised ship date is {DateTime}")
Customer = m.Concept("Customer", identify_by={"id": Integer})
Customer.orders = m.Relationship(f"{Customer} places {Order}")
Order.shipments = m.Relationship(f"{Order} has shipment {Shipment}")
# Derived relationship
Customer.delayed_orders = m.Relationship(f"{Customer} has delayed order {Order}")
# Define base facts
m.define(
c := Customer.new(id=1),
o := Order.new(id=1, promised_ship_date=datetime(2025, 12, 1)),
s := Shipment.new(id=1, shipped_at=datetime(2025, 12, 15)),
c.orders(o),
o.shipments(s),
)
# Define the derived relationship for delayed orders.
m.where(
    Customer.orders(Order),
    Order.promised_ship_date < Order.shipments.shipped_at,
).define(
    Customer.delayed_orders(Order)
)
# Verify: materialize customers and their delayed orders.
df = m.select(
Customer.id.alias("customer_id"),
Customer.delayed_orders.id.alias("delayed_order_id")
).to_df()
print(df)
  • The schema declares the base facts you need for the derivation: Customer.orders links customers to orders, and Order.shipments.shipped_at gives you the shipped timestamp to compare against Order.promised_ship_date.
  • Customer.delayed_orders is a derived relationship. Its direction is explicit: it links a Customer (left side) to an Order (right side).
  • The m.define(...) block creates one customer (c), one order (o), and one shipment (s), and connects them with c.orders(o) and o.shipments(s).
  • m.where(Customer.orders(Order), Order.promised_ship_date < Order.shipments.shipped_at) matches customer–order pairs where the order shipped after its promised ship date.
  • .define(Customer.delayed_orders(Order)) materializes those matches as new relationship facts.
  • The verification query selects Customer.id and Customer.delayed_orders.id and ends with Fragment.to_df to execute and show the derived links.
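Conceptually, the derivation is a join followed by a filter. The following plain-Python sketch reproduces that logic over hypothetical in-memory data; it is an illustration of the idea, not of how PyRel executes the query.

```python
from datetime import datetime

# Hypothetical base facts as plain tuples and dicts.
customer_orders = [(1, 1), (1, 2)]  # (customer_id, order_id)
promised = {1: datetime(2025, 12, 1), 2: datetime(2025, 12, 1)}
shipped = {1: [datetime(2025, 12, 15)], 2: [datetime(2025, 11, 30)]}

delayed_orders = {
    (customer_id, order_id)
    for customer_id, order_id in customer_orders
    # An order is delayed if any of its shipments left after the promised date.
    if any(ts > promised[order_id] for ts in shipped.get(order_id, []))
}

print(sorted(delayed_orders))  # [(1, 1)]
```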

Match multiple entities of the same type with Concept.ref

When you need to match multiple entities of the same concept in the same query, use Concept.ref to create distinct variable bindings for each match:

from relationalai.semantics import Model, String
m = Model("MyModel")
# Model schema
Order = m.Concept("Order", identify_by={"id": String})
# Define some base facts.
m.define(
Order.new(id="o1"),
Order.new(id="o2"),
Order.new(id="o3"),
)
# Create two distinct variables that range over Order entities.
o1, o2 = Order.ref(), Order.ref()
# Get pairs of order ids.
df = m.where(
    o1.id < o2.id,
).select(
    o1.id.alias("order_1_id"),
    o2.id.alias("order_2_id"),
).to_df()
print(df)
  • Order.ref() creates a new variable binding for the Order concept.
  • Each call to Order.ref() creates a distinct variable, so o1 and o2 can match different orders in the same query.
  • The o1.id < o2.id condition keeps each pair of distinct orders exactly once. Without it, the query would return all pairs of order ids, including pairs where o1 and o2 are the same order.

Strictly speaking, only one Order.ref() is needed, because Order itself can serve as the other variable. For example, you could write the same query with Order and a single Order.ref():

o1, o2 = Order, Order.ref()
df = m.where(
    o1.id < o2.id,
).select(
    o1.id.alias("order_1_id"),
    o2.id.alias("order_2_id"),
).to_df()
print(df)
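The effect of using two independent variables over the same concept can be sketched in plain Python. The list of ids below mirrors the example data, and the code illustrates only the pairing logic, not PyRel itself.

```python
from itertools import product

order_ids = ["o1", "o2", "o3"]

# Two independent "variables" ranging over the same set, like two refs.
all_pairs = list(product(order_ids, repeat=2))

# The o1.id < o2.id condition drops self-pairs and mirrored duplicates.
pairs = [(a, b) for a, b in all_pairs if a < b]

print(len(all_pairs))  # 9
print(pairs)           # [('o1', 'o2'), ('o1', 'o3'), ('o2', 'o3')]
```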

Match multiple related values with Chain.ref

When you need to match multiple values from the same multi-valued relationship path in one query, use Chain.ref to create distinct path bindings. This is a good fit for pairwise logic over related entities, like “two different shipments for the same order”:

from relationalai.semantics import Integer, Model
m = Model("MyModel")
# Model schema
Order = m.Concept("Order", identify_by={"id": Integer})
Shipment = m.Concept("Shipment", identify_by={"id": Integer})
Order.shipments = m.Relationship(f"{Order} has shipment {Shipment:shipment}")
# Define some base facts.
m.define(
o := Order.new(id=1),
s1 := Shipment.new(id=1),
s2 := Shipment.new(id=2),
s3 := Shipment.new(id=3),
o.shipments(s1),
o.shipments(s2),
o.shipments(s3),
)
# Create two distinct bindings for the same relationship path.
s1, s2 = Order.shipments.ref(), Order.shipments.ref()
# Get pairs of shipments for each order.
df = m.where(
    s1.id < s2.id,
).select(
    Order.id.alias("order_id"),
    s1.id.alias("shipment_1_id"),
    s2.id.alias("shipment_2_id"),
).to_df()
print(df)
  • Order.shipments is a Chain that represents the path from Order to Shipment through the shipments relationship.
  • Order.shipments.ref() creates a new variable binding for that path, allowing you to match multiple shipments for the same order in the same query.
  • Each call to Order.shipments.ref() creates a distinct path variable, so s1 and s2 can match different shipments for the same order in the same query.
  • The condition s1.id < s2.id ensures that each pair of shipments appears only once, and that s1 and s2 are never the same shipment.
  • Chain.ref makes the path matches distinct when the same path can match multiple values. If you need multiple orders (a self-join on Order), you still need Order.ref.
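The rows you should expect from a pairwise query like this one can be worked out in plain Python. The dictionary below is hypothetical data that mirrors the example and adds a second order with a single shipment to show that it produces no pairs.

```python
from itertools import combinations

# Hypothetical data mirroring the example: order id -> shipment ids.
shipments_by_order = {1: [1, 2, 3], 2: [4]}

rows = [
    (order_id, s1, s2)
    for order_id, shipment_ids in shipments_by_order.items()
    # combinations() on sorted ids yields each pair once with s1 < s2.
    for s1, s2 in combinations(sorted(shipment_ids), 2)
]

print(rows)  # [(1, 1, 2), (1, 1, 3), (1, 2, 3)]
```

Pairs are formed only within a single order, which matches the role of the shared Order variable in the query: both path bindings start from the same order.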