
Query a model

Build queries from a semantic model and get results back in Python. Use this when you want to explore a model, validate new facts while iterating, or extract a small result set for downstream code. Building a query is lazy, so compilation and runtime errors usually surface when you first materialize results.

A query in PyRel is a Fragment: a composable description of what you want to filter and return.

Most query-building methods return a new fragment instead of running work immediately. That makes it cheap to build up a query in small steps, reuse pieces, and debug the shape of your query before you execute it.

Execution happens when you materialize a fragment. The two most common materialization paths are Fragment.to_df (to get a pandas DataFrame) and Fragment.into + Fragment.exec (to export into a Snowflake table).

Here’s a minimal “compose, then materialize” pattern using Model.select:

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
q = m.select(Order.id)
# Nothing runs until you materialize.
# This is where compilation, execution, and requirement checking happen.
df = q.to_df()
print(df.head())
  • q = m.select(Order.id) builds a fragment. It describes the result shape (one Order.id column) but does not execute.
  • q.to_df() is the materialization step. This is where the fragment is compiled and executed, and where requirements (if any) are checked.
  • Because the snippet declares schema but does not define any facts, the DataFrame is valid but typically empty. That keeps the example focused on the “lazy until materialized” behavior.

Selecting values defines what columns you want back when you materialize a fragment. Choose the smallest explicit selection that matches your question so your DataFrame schema is predictable. If you start broad, you can always refine later by calling select again on an existing fragment.

You can select properties to return readable scalar columns, like “order id” and “promised ship date”. This is similar to selecting columns from a SQL table.

Use Model.select with properties:

from relationalai.semantics import DateTime, Integer, Model
from relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.promised_ship_date = m.Property(f"{Order} promised ship date is {DateTime}")
m.define(
    Order.new(id=1, promised_ship_date=datetime(2025, 12, 10)),
    Order.new(id=2, promised_ship_date=datetime(2025, 12, 12)),
)
q = m.select(
    Order.id,
    Order.promised_ship_date,
)
df = q.to_df()
print(df.head())
  • m.define(Order.new(...)) creates Order entities with id and promised_ship_date properties.
  • m.select(Order.id, Order.promised_ship_date) selects those properties as columns in the output.
  • The resulting DataFrame has two columns: id (integers) and promised_ship_date (datetimes). There is one row per order.

Select relationship fields when you need multiple fields of a relationship (including non-output fields) as separate output columns.

By default, referencing a relationship resolves to its output field. To reference a non-output field, index the relationship by type or by name to create a FieldRef.

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Shipment = m.Concept("Shipment", identify_by={"id": Integer})
Carrier = m.Concept("Carrier", identify_by={"id": Integer})
Shipment.shipped_for = m.Relationship(
    f"{Shipment} via {Carrier:carrier} for {Order:order}"
)
m.define(
    o1 := Order.new(id=1),
    o2 := Order.new(id=2),
    s1 := Shipment.new(id=1),
    s2 := Shipment.new(id=2),
    c1 := Carrier.new(id=1),
    c2 := Carrier.new(id=2),
    s1.shipped_for(c1, o1),  # Shipment 1 via Carrier 1 for Order 1
    s2.shipped_for(c2, o2),  # Shipment 2 via Carrier 2 for Order 2
)
# Select fields by type
q = m.select(
    Shipment.id,
    Shipment.shipped_for[Carrier],
    Shipment.shipped_for[Order],
)
# Alternatively, select fields by name
q = m.select(
    Shipment.id,
    Shipment.shipped_for["carrier"].id,
    Shipment.shipped_for["order"].id,
)
# Select the relationship chain without indexing to get the output field
q = m.select(
    Shipment.id,
    Shipment.shipped_for.id,  # This is the same as Shipment.shipped_for[Carrier].id
)
df = q.to_df()
print(df.head())
  • Shipment.shipped_for = m.Relationship(...) declares a relationship with three fields: a Shipment field, a Carrier field (named “carrier”), and an Order field (named “order”).
  • Shipment.shipped_for[Carrier].id selects the Carrier field by type and gets the id property for entities in that field. Shipment.shipped_for["carrier"].id is equivalent.
  • The output DataFrame has three columns: a shipment id, a carrier id, and an order id. Each row corresponds to a (Shipment, Carrier, Order) triple that matches the relationship.

Alias selected values when you want stable, readable column names in your output. This is especially helpful when you export results or when you want your DataFrame schema to stay stable as your model evolves.

Use alias(...) on selected values:

from relationalai.semantics import Integer, Model, String
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.status = m.Property(f"{Order} status is {String}")
m.define(
    Order.new(id=1, status="shipped"),
    Order.new(id=2, status="delayed"),
)
q = m.select(
    Order.id.alias("order_id"),
    Order.status.alias("order_status"),
)
df = q.to_df()
print(df.head())
  • Order.id.alias("order_id") sets the DataFrame column name for the selected id property.
  • Order.status.alias("order_status") does the same for the selected status property.
  • Aliasing affects output column names. It does not change your model schema or property names.
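If a DataFrame has already been materialized without aliases, pandas rename does the same renaming after the fact. A minimal sketch on sample data shaped like the example output (alias(...) simply avoids this extra step by naming columns up front):

```python
import pandas as pd

# Rows mirroring the un-aliased example output.
df = pd.DataFrame({"id": [1, 2], "status": ["shipped", "delayed"]})

# Post-hoc renaming of output columns with pandas.
renamed = df.rename(columns={"id": "order_id", "status": "order_status"})
print(list(renamed.columns))  # ['order_id', 'order_status']
```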

You can select a concept to get one row per entity. Although this isn’t very common in practice, it can be helpful for debugging.

Use Model.select with a concept handle:

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order)
df = q.to_df()
print(df.head())
  • m.define(Order.new(...)) creates a small set of Order entities so the output has rows.
  • m.select(Order) selects entities that are members of the Order concept.
  • The resulting DataFrame has one column (named Order) and one row per order entity. The values in the Order column are internal entity references.
  • Entity references are internal values that point to specific entities in the model. They are not human-readable, but they are stable and can be helpful for debugging.

Filtering keeps only the rows that match the conditions you care about. Start a filter fragment with Model.where, then select the columns you want and materialize.

from relationalai.semantics import DateTime, Integer, Model
from relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
Shipment = m.Concept("Shipment", identify_by={"id": Integer})
Shipment.shipped_at = m.Property(f"{Shipment} shipped at {DateTime}")
m.define(
    Shipment.new(id=1, shipped_at=datetime(2025, 12, 15)),
    Shipment.new(id=2, shipped_at=datetime(2025, 12, 9)),
)
q = (
    m.where(
        Shipment.shipped_at > datetime(2025, 12, 10),
    ).select(
        Shipment.id.alias("shipment_id"),
        Shipment.shipped_at,
    )
)
df = q.to_df()
print(df.head())
  • m.define(...) adds two shipments with different ship dates, so the filter has something to include and exclude.
  • m.where(Shipment.shipped_at > ...) keeps only shipments shipped after the cutoff date. With the sample facts, only shipment 1 matches.
  • select(...) chooses which values become DataFrame columns. alias("shipment_id") makes the column name stable and easy to read.
  • to_df() runs the query and returns results as a pandas DataFrame.
  • You can write the same query with select before where or where before select. Choose the order that reads best for your use case.
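The filter itself runs inside the model, but you can sanity-check the expected result with plain pandas on data mirroring the sample facts. A sketch, assuming the column names produced by the aliases above:

```python
import pandas as pd

# Sample rows mirroring the two shipments defined above.
df = pd.DataFrame({
    "shipment_id": [1, 2],
    "shipped_at": [pd.Timestamp(2025, 12, 15), pd.Timestamp(2025, 12, 9)],
})

# Keep only shipments after the cutoff, like m.where(Shipment.shipped_at > ...).
cutoff = pd.Timestamp(2025, 12, 10)
filtered = df[df["shipped_at"] > cutoff]
print(filtered)  # only shipment 1 remains
```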

Rank and select top results with rank, asc, desc, and top


Ranking can help you find the “most important” matches when your query returns a large number of rows. Ordering helpers like asc and desc define how ranking works.

Use aggregates.rank with aggregates.desc when you want to rank output by descending value:

from relationalai.semantics import Integer, Model
from relationalai.semantics.std import aggregates
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.delay_in_days = m.Property(f"{Order} delay in days is {Integer}")
m.define(
    Order.new(id=1, delay_in_days=5),
    Order.new(id=2, delay_in_days=2),
    Order.new(id=3, delay_in_days=7),
)
q = m.select(
    Order.id.alias("order_id"),
    Order.delay_in_days,
    aggregates.rank(aggregates.desc(Order.delay_in_days)).alias("delay_rank"),
)
df = q.to_df()
print(df.head())
  • aggregates.desc(Order.delay_in_days) defines the ordering for ranking. Larger delays are ranked first.
  • You can use aggregates.asc to rank in ascending order instead. In that case, smaller delays would be ranked first.
  • aggregates.rank(...) produces a numeric rank column. It does not sort the output DataFrame.
  • Because the rank is a column, you can filter on it (for example where(delay_rank <= 10)) or sort the DataFrame after materialization.
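Because the rank is just another column, a common follow-up is to sort the materialized DataFrame by it in pandas. A sketch on rows mirroring the example output (rank 1 is the largest delay):

```python
import pandas as pd

# Rows shaped like the example output above.
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "delay_in_days": [5, 2, 7],
    "delay_rank": [2, 3, 1],
})

# Sort by rank so the most-delayed order comes first.
ordered = df.sort_values("delay_rank").reset_index(drop=True)
print(ordered)
```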

Use aggregates.top when you want a top-N filter:

from relationalai.semantics import Integer, Model
from relationalai.semantics.std import aggregates
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.delay_in_days = m.Property(f"{Order} delay in days is {Integer}")
m.define(
    Order.new(id=1, delay_in_days=5),
    Order.new(id=2, delay_in_days=2),
    Order.new(id=3, delay_in_days=7),
)
q = (
    m.select(Order.id.alias("order_id"), Order.delay_in_days)
    .where(aggregates.top(10, Order.delay_in_days))
)
df = q.to_df()
print(df.head())
  • aggregates.top(10, Order.delay_in_days) is a predicate you use in where(...). It keeps only the top-N matches by the chosen value, where N is the first argument.
  • In this snippet, there are fewer than 10 orders, so all rows match. Lower the 10 to a smaller number to see the filter remove rows.
  • Top-N selection is descending by default (largest values are kept).
  • You can use aggregates.bottom to keep the smallest values instead.
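As a point of comparison, the analogous "keep the largest values" step on an already-materialized DataFrame is pandas nlargest. A sketch on data matching the sample facts:

```python
import pandas as pd

# Rows mirroring the three orders defined above.
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "delay_in_days": [5, 2, 7],
})

# Keep the 2 rows with the largest delays (descending, like a top-N filter).
top2 = df.nlargest(2, "delay_in_days")
print(top2)
```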

Use distinct to remove duplicate rows or to count unique values. This comes up most often when relationship multiplicity produces duplicate rows in your output. For distinct rows, you must wrap the entire selected row.

Use distinct as a SELECT DISTINCT-style wrapper:

from relationalai.semantics import DateTime, Integer, Model, String, distinct
from relationalai.semantics.std import aggregates
from relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.status = m.Property(f"{Order} status is {String}")
Order.created_at = m.Property(f"{Order} created at is {DateTime}")
m.define(
    Order.new(id=1, status="shipped", created_at=datetime(2025, 12, 1)),
    Order.new(id=2, status="delayed", created_at=datetime(2025, 12, 1)),
    Order.new(id=3, status="shipped", created_at=datetime(2025, 12, 1)),
)
q = m.select(distinct(Order.status))
print(q.to_df())
q_unique = m.select(
    aggregates.count(distinct(Order.status)).alias("unique_statuses")
)
print(q_unique.to_df())
  • select(distinct(Order.status)) returns unique scalar values of Order.status. This is the SELECT DISTINCT pattern.
  • count(distinct(Order.status)) counts how many unique status values exist. With the sample facts, the result is 2 (“shipped” and “delayed”).
  • Use distinct(...) to control whether you are deduplicating rows or just the input to an aggregate.
  • If you select more than one column, you must wrap the entire row in distinct(...) to get unique rows. For example, select(distinct(Order.status, Order.created_at)) deduplicates the (status, created_at) row pairs.
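The pandas equivalents, drop_duplicates for unique rows and nunique for unique counts, are handy for cross-checking a distinct query after materialization. A sketch using the sample statuses:

```python
import pandas as pd

# Statuses mirroring the three orders defined above.
df = pd.DataFrame({"status": ["shipped", "delayed", "shipped"]})

# Unique scalar values, like selecting distinct(Order.status).
unique_statuses = df["status"].drop_duplicates().tolist()

# Unique count, like count(distinct(Order.status)).
n_unique = df["status"].nunique()
print(unique_statuses, n_unique)  # ['shipped', 'delayed'] 2
```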

Aggregates help you summarize results into small tables you can sanity-check quickly. Use them to answer questions like “how many orders per status” while iterating on a model. Grouping is explicit, and .per(...) defines the grouping dimensions.

Use aggregates.count with .per(...) to compute counts per group:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.status = m.Property(f"{Order} status is {String}")
m.define(
    Order.new(id=1, status="shipped"),
    Order.new(id=2, status="delayed"),
    Order.new(id=3, status="shipped"),
)
q = m.select(
    Order.status,
    aggregates.count(Order).per(Order.status).alias("order_count"),
)
df = q.to_df()
print(df)
  • aggregates.count(Order) counts how many Order entities match the current fragment.
  • .per(Order.status) groups the count by status. This is the equivalent of “group by status” in SQL-style aggregation.
  • The resulting DataFrame has one row per status and a corresponding count. If you need a “unique count per group”, wrap the counted value with distinct(...) first.
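You can cross-check the per-group counts with a plain pandas groupby once the rows are materialized. A sketch on data mirroring the sample facts:

```python
import pandas as pd

# Rows mirroring the three orders defined above.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "status": ["shipped", "delayed", "shipped"],
})

# Equivalent of counting orders per status ("group by status" in SQL terms).
counts = df.groupby("status")["id"].count().to_dict()
print(counts)  # {'delayed': 1, 'shipped': 2}
```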

Materialization turns a fragment into concrete results you can inspect and use in Python. Use it after your fragment selects the right columns and filters the right rows. If materialization fails, the fix is often configuration, credentials, or failing requirements rather than the query shape.

Materialize a fragment with Fragment.to_df:

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order.id)
df = q.to_df() # Materialization happens here.
print(df)
  • m.define(Order.new(...)) lazily creates some data to query.
  • m.select(Order.id) builds a fragment that describes the query shape (one column with order ids) but does not run any work.
  • q.to_df() executes the fragment and returns a pandas DataFrame. This is the point where compilation and execution happen, and where requirements (if any) are checked.

Export results to Snowflake tables with into and exec


Exporting writes a fragment’s results into a Snowflake table for downstream use. Use this when you want a stable table snapshot that other tools can query.

Use Model.Table to reference a destination table, then export with Fragment.into and Fragment.exec:

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order.id.alias("order_id"))
out = m.Table("DB.SCHEMA.ORDER_EXPORT")
export = q.into(out)
# Configure the destination, then execute.
# exec() returns a DataFrame for immediate verification.
df = export.exec()
print(df.shape)
# Calling exec() again on the same fragment is a no-op.
print(export.exec())
  • out = m.Table("DB.SCHEMA.ORDER_EXPORT") creates a destination table reference. DB and SCHEMA are placeholders for your Snowflake database and schema.
  • q.into(out) attaches the export destination to the fragment. This configures where the results should be written.
  • export.exec() runs the export once, writes rows into the destination table, and returns a DataFrame for immediate verification.
  • A second exec() call is a no-op and may return None. This often surprises users when they re-run the same export fragment.
  • You need write access to the destination schema.
  • into(..., update=True) changes the export behavior to update the destination table instead of replacing it.