Query a model
Build queries from a semantic model and get results back in Python. Use this when you want to explore a model, validate new facts while iterating, or extract a small result set for downstream code. Building a query is lazy, so compilation and runtime errors usually surface when you first materialize results.
- PyRel is installed and importable in Python. See Set Up Your Environment for instructions.
- You have a
Modelinstance and a declared semantic schema. See Create a Model Instance, Declare Concepts, and Declare Relationships and Properties. - You have defined the facts you want to query. See Define base facts and Define Requirements.
Understand fragments and materialization
Section titled “Understand fragments and materialization”A query in PyRel is a Fragment: a composable description of what you want to filter and return.
Most query-building methods return a new fragment instead of running work immediately. That makes it cheap to build up a query in small steps, reuse pieces, and debug the shape of your query before you execute it.
Execution happens when you materialize a fragment.
The two most common materialization paths are Fragment.to_df (to get a pandas DataFrame) and Fragment.into + Fragment.exec (to export into a Snowflake table).
Here’s a minimal “compose, then materialize” pattern using Model.select:
from relationalai.semantics import Integer, Model
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})
q = m.select(Order.id)
# Nothing runs until you materialize.# This is where compilation, execution, and requirement checking happen.df = q.to_df()print(df.head())q = m.select(Order.id)builds a fragment. It describes the result shape (oneOrder.idcolumn) but does not execute.q.to_df()is the materialization step. This is where the fragment is compiled and executed, and where requirements (if any) are checked.- Because the snippet declares schema but does not define any facts, the DataFrame is valid but typically empty. That keeps the example focused on the “lazy until materialized” behavior.
Select values with select
Section titled “Select values with select”Selecting values defines what columns you want back when you materialize a fragment.
Choose the smallest explicit selection that matches your question so your DataFrame schema is predictable.
If you start broad, you can always refine later by calling select again on an existing fragment.
Select properties
Section titled “Select properties”You can select properties to return readable scalar columns, like “order id” and “promised ship date”. This is similar to selecting columns from a SQL table.
Use Model.select with properties:
from relationalai.semantics import DateTime, Integer, Modelfrom relationalai.semantics.std.datetime import datetime
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.promised_ship_date = m.Property(f"{Order} promised ship date is {DateTime}")
m.define( Order.new(id=1, promised_ship_date=datetime(2025, 12, 10)), Order.new(id=2, promised_ship_date=datetime(2025, 12, 12)),)
q = m.select( Order.id, Order.promised_ship_date,)
df = q.to_df()print(df.head())m.define(Order.new(...))createsOrderentities withidandpromised_ship_dateproperties.m.select(Order.id, Order.promised_ship_date)selects those properties as columns in the output.- The resulting DataFrame has two columns:
id(integers) andpromised_ship_date(datetimes). There is one row per order.
Select fields of relationships
Section titled “Select fields of relationships”Choose this when you need multiple fields of a relationship (including non-output fields) as separate output columns.
Relationship output fields are the default value.
To reference a non-output field, index the relationship by name to create a FieldRef.
from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})Shipment = m.Concept("Shipment", identify_by={"id": Integer})Carrier = m.Concept("Carrier", identify_by={"id": Integer})
Shipment.shipped_for = m.Relationship( f"{Shipment} via {Carrier:carrier} for {Order:order}")
m.define( o1 := Order.new(id=1), o2 := Order.new(id=2), s1 := Shipment.new(id=1), s2 := Shipment.new(id=2), c1 := Carrier.new(id=1), c2 := Carrier.new(id=2), s1.shipped_for(c1, o1), # Shipment 1 via Carrier 1 for Order 1 s2.shipped_for(c2, o2), # Shipment 2 via Carrier 2 for Order 2)
# Select fields by typeq = m.select( Shipment.id, Shipment.shipped_for[Carrier], Shipment.shipped_for[Order],)
# Alternatively, select fields by nameq = m.select( Shipment.id, Shipment.shipped_for["carrier"].id, Shipment.shipped_for["order"].id,)
# Select the relationship chain without indexing to get the output fieldq = m.select( Shipment.id, Shipment.shipped_for.id, # This is the same as Shipment.shipped_for[Carrier].id)
df = q.to_df()print(df.head())Shipment.shipped_for = m.Relationship(...)declares a relationship with three fields: a Shipment field, a Carrier field (named “carrier”), and an Order field (named “order”).Shipment.shipped_for[Carrier].idselects the Carrier field by type and gets theidproperty for entities in that field.Shipment.shipped_for["carrier"].idis equivalent.- The output DataFrame has three columns: a shipment id, a carrier id, and an order id.
Each row corresponds to a
(Shipment, Carrier, Order)triple that matches the relationship.
Alias selected values
Section titled “Alias selected values”Alias selected values when you want stable, readable column names in your output. This is especially helpful when you export results or when you want your DataFrame schema to stay stable as your model evolves.
Use alias(...) on selected values:
from relationalai.semantics import Integer, Model, String
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.status = m.Property(f"{Order} status is {String}")
m.define( Order.new(id=1, status="shipped"), Order.new(id=2, status="delayed"),)
q = m.select( Order.id.alias("order_id"), Order.status.alias("order_status"),)
df = q.to_df()print(df.head())Order.id.alias("order_id")sets the DataFrame column name for the selectedidproperty.Order.status.alias("order_status")does the same for the selectedstatusproperty.- Aliasing affects output column names. It does not change your model schema or property names.
(Advanced) Select concepts
Section titled “(Advanced) Select concepts”You can select a concept to get one row per entity. Although this isn’t very common in practice, it can be helpful for debugging.
Use Model.select with a concept handle:
from relationalai.semantics import Integer, Model
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order)df = q.to_df()print(df.head())m.define(Order.new(...))creates a small set ofOrderentities so the output has rows.m.select(Order)selects entities that are members of theOrderconcept.- The resulting DataFrame has one column (named
Order) and one row per order entity. The values in theOrdercolumn are internal entity references.
- Entity references are internal values that point to specific entities in the model. They are not human-readable, but they are stable and can be helpful for debugging.
Filter results with where
Section titled “Filter results with where”Filtering keeps only the rows that match the conditions you care about.
Start a filter fragment with Model.where, then select the columns you want and materialize.
from relationalai.semantics import DateTime, Integer, Modelfrom relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
Shipment = m.Concept("Shipment", identify_by={"id": Integer})Shipment.shipped_at = m.Property(f"{Shipment} shipped at {DateTime}")
m.define( Shipment.new(id=1, shipped_at=datetime(2025, 12, 15)), Shipment.new(id=2, shipped_at=datetime(2025, 12, 9)),)
q = ( m.where( Shipment.shipped_at > datetime(2025, 12, 10), ).select( Shipment.id.alias("shipment_id"), Shipment.shipped_at, ))
df = q.to_df()print(df.head())m.define(...)adds two shipments with different ship dates, so the filter has something to include and exclude.m.where(Shipment.shipped_at > ...)keeps only shipments shipped after the cutoff date. With the sample facts, only shipment 1 matches.select(...)chooses which values become DataFrame columns.alias("shipment_id")makes the column name stable and easy to read.to_df()runs the query and returns results as a pandas DataFrame.
- You can write the same query with
selectbeforewhereorwherebeforeselect. Choose the order that reads best for your use case.
Rank and select top results with rank, asc, desc, and top
Section titled “Rank and select top results with rank, asc, desc, and top”Ranking can help you find the “most important” matches when your query returns a large number of rows.
Ordering helpers like asc and desc define how ranking works.
Use aggregates.rank with aggregates.desc when you want to rank output by descending value:
from relationalai.semantics import Integer, Modelfrom relationalai.semantics.std import aggregates
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.delay_in_days = m.Property(f"{Order} delay in days is {Integer}")
m.define( Order.new(id=1, delay_in_days=5), Order.new(id=2, delay_in_days=2), Order.new(id=3, delay_in_days=7),)
q = m.select( Order.id.alias("order_id"), Order.delay_in_days, aggregates.rank(aggregates.desc(Order.delay_in_days)).alias("delay_rank"),)
df = q.to_df()print(df.head())aggregates.desc(Order.delay_in_days)defines the ordering for ranking. Larger delays are ranked first.
- You can use
aggregates.ascto rank in ascending order instead. In that case, smaller delays would be ranked first. aggregates.rank(...)produces a numeric rank column. It does not sort the output DataFrame.- Because the rank is a column, you can filter on it (for example
where(delay_rank <= 10)) or sort the DataFrame after materialization.
Use aggregates.top when you want a top-
from relationalai.semantics import Integer, Modelfrom relationalai.semantics.std import aggregates
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.delay_in_days = m.Property(f"{Order} delay in days is {Integer}")
m.define( Order.new(id=1, delay_in_days=5), Order.new(id=2, delay_in_days=2), Order.new(id=3, delay_in_days=7),)
q = ( m.select(Order.id.alias("order_id"), Order.delay_in_days) .where(aggregates.top(10, Order.delay_in_days)))
df = q.to_df()print(df.head())aggregates.top(10, Order.delay_in_days)is a predicate you use inwhere(...). It keeps only the top-matches by the chosen value. - In this snippet, there are fewer than 10 orders, so all rows match.
Lower the
10to a smaller number to see the filter remove rows.
- Top-
selection is descending by default (largest values are kept). - You can use
aggregates.bottomto keep the smallest values instead.
Deduplicate rows with distinct
Section titled “Deduplicate rows with distinct”Use distinct to remove duplicate rows or to count unique values.
This comes up most often when relationship multiplicity produces duplicate rows in your output.
For distinct rows, you must wrap the entire selected row.
Use distinct as a SELECT DISTINCT-style wrapper:
from relationalai.semantics import DateTime, Integer, Model, String, distinctfrom relationalai.semantics.std import aggregatesfrom relationalai.semantics.std.datetime import datetime
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.status = m.Property(f"{Order} status is {String}")Order.created_at = m.Property(f"{Order} created at is {DateTime}")
m.define( Order.new(id=1, status="shipped", created_at=datetime(2025, 12, 1)), Order.new(id=2, status="delayed", created_at=datetime(2025, 12, 1)), Order.new(id=3, status="shipped", created_at=datetime(2025, 12, 1)),)
q = m.select(distinct(Order.status))print(q.to_df())
q_unique = m.select( aggregates.count(distinct(Order.status)).alias("unique_statuses"))print(q_unique.to_df())select(distinct(Order.status))returns unique scalar values ofOrder.status. This is theSELECT DISTINCTpattern.count(distinct(Order.status))counts how many unique status values exist. With the sample facts, the result is 2 (“shipped” and “delayed”).- Use
distinct(...)to control whether you are deduplicating rows or just the input to an aggregate.
- If you select more than one column, you must wrap the entire row in
distinct(...)to get unique rows. For example,select(distinct(Order.status, Order.created_at))deduplicates the(status, created_at)row pairs.
Aggregate results with count and per
Section titled “Aggregate results with count and per”Aggregates help you summarize results into small tables you can sanity-check quickly.
Use them to answer questions like “how many orders per status” while iterating on a model.
Grouping is explicit, and .per(...) defines the grouping dimensions.
Use aggregates.count with .per(...) to compute counts per group:
from relationalai.semantics import Integer, Model, Stringfrom relationalai.semantics.std import aggregates
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.status = m.Property(f"{Order} status is {String}")
m.define( Order.new(id=1, status="shipped"), Order.new(id=2, status="delayed"), Order.new(id=3, status="shipped"),)
q = m.select( Order.status, aggregates.count(Order).per(Order.status).alias("order_count"),)
df = q.to_df()print(df)aggregates.count(Order)counts how manyOrderentities match the current fragment..per(Order.status)groups the count by status. This is the equivalent of “group by status” in SQL-style aggregation.- The resulting DataFrame has one row per status and a corresponding count.
If you need a “unique count per group”, wrap the counted value with
distinct(...)first.
Materialize results with to_df
Section titled “Materialize results with to_df”Materialization turns a fragment into concrete results you can inspect and use in Python. Use it after your fragment selects the right columns and filters the right rows. If materialization fails, the fix is often configuration, credentials, or failing requirements rather than the query shape.
Materialize a fragment with Fragment.to_df:
from relationalai.semantics import Integer, Model
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order.id)
df = q.to_df() # Materialization happens here.print(df)m.define(Order.new(...))lazily creates some data to query.m.select(Order.id)builds a fragment that describes the query shape (one column with order ids) but does not run any work.q.to_df()executes the fragment and returns a pandas DataFrame. This is the point where compilation and execution happen, and where requirements (if any) are checked.
Export results to Snowflake tables with into and exec
Section titled “Export results to Snowflake tables with into and exec”Exporting writes a fragment’s results into a Snowflake table for downstream use. Use this when you want a stable table snapshot that other tools can query.
Use Model.Table to reference a destination table, then export with Fragment.into and Fragment.exec:
from relationalai.semantics import Integer, Model
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order.id.alias("order_id"))
out = m.Table("DB.SCHEMA.ORDER_EXPORT")
export = q.into(out)
# Configure the destination, then execute.# exec() returns a DataFrame for immediate verification.df = export.exec()print(df.shape)
# Calling exec() again on the same fragment is a no-op.print(export.exec())out = m.Table("DB.SCHEMA.ORDER_EXPORT")creates a destination table reference.DBandSCHEMAare placeholders for your Snowflake database and schema.q.into(out)attaches the export destination to the fragment. This configures where the results should be written.export.exec()runs the export once, writes rows into the destination table, and returns a DataFrame for immediate verification.- A second
exec()call is a no-op and may returnNone. This often surprises users when they re-run the same export fragment.
- You need write access to the destination schema.
into(..., update=True)changes the export behavior to update the destination table instead of replace it.