Aggregate and group data

Use std.aggregates to compute aggregate values over sets of matches in your model. This guide covers how to use different aggregate functions, control grouping with .per(), scope which matches contribute with aggregate-local .where(), and handle duplicates with distinct().

Understand aggregates and explicit grouping

An aggregate is a value computed over a set of matches. In PyRel, aggregates do not “look at your selected columns” and then automatically group. You always control grouping explicitly.

Use this mental model:

  1. Define what can contribute. An aggregate counts, sums, or averages whatever matches its pattern. If a fact does not match the aggregate-local .where(), it does not contribute.
  2. Define what it is per. .per() tells PyRel which keys define a group. PyRel computes one aggregate value per distinct combination of those keys.

Output shape is the easiest way to sanity-check whether you grouped correctly:

  • No .per() → one value for the full matching set (a global aggregate).
  • .per(key) → one value per key.
  • .per(key1, key2, ...) → one value per key tuple.
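As a rough analogy only (plain Python dicts, not the PyRel API), the three output shapes look like this:

```python
# Plain-Python analogy of aggregate output shapes -- not the PyRel API.
tickets = [
    {"id": 101, "status": "open"},
    {"id": 102, "status": "open"},
    {"id": 103, "status": "closed"},
]

# No .per(): one global value for the full matching set.
global_count = len(tickets)

# .per(status): one value per distinct status key.
per_status = {}
for t in tickets:
    per_status[t["status"]] = per_status.get(t["status"], 0) + 1

print(global_count)  # 3
print(per_status)    # {'open': 2, 'closed': 1}
```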

Here is an example of a global count and a per-group count:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
m.define(
    Ticket.new(id=101, status="open"),
    Ticket.new(id=102, status="open"),
    Ticket.new(id=103, status="closed"),
)
global_ticket_count = aggregates.count(Ticket)
ticket_count_per_status = aggregates.count(Ticket).per(Ticket.status)
print(m.select(global_ticket_count.alias("global_tickets")).to_df())
print(m.select(Ticket.status, ticket_count_per_status.alias("tickets")).to_df())
  • aggregates.count(Ticket) counts matches of Ticket.
  • Without .per(), the result is a single global value.
  • With .per(Ticket.status), the result has one value per matching status.
  • When you group by a value key (like a status string), you usually select that same key next to the aggregate to validate the result.

Choose an aggregate function

Choose an aggregate based on the question you are answering and the value you are summarizing. If your match pattern can produce duplicates, use distinct() to control whether you are aggregating a multiset or a set.

Use this table to choose the right aggregate for your definition:

What to use | When to use it
----------- | --------------
count()     | Use when you need “how many matches exist” for a pattern.
sum()       | Use when you need a total of a numeric value across matches.
avg()       | Use when you need the mean of a numeric value across matches.
min()       | Use when you need the smallest or earliest comparable value across matches.
max()       | Use when you need the largest or latest comparable value across matches.

Use aggregates.count() when you need a global count or a per-group count of matching facts:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Account = m.Concept("Account", identify_by={"id": Integer})
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
Ticket.account = m.Relationship(f"{Ticket} belongs to {Account:account}")
m.define(
    a1 := Account.new(id=1),
    a2 := Account.new(id=2),
    t1 := Ticket.new(id=101, status="open"),
    t2 := Ticket.new(id=102, status="open"),
    t3 := Ticket.new(id=103, status="closed"),
    t1.account(a1),
    t2.account(a1),
    t3.account(a2),
)
ticket_count_per_account = (
    aggregates.count(Ticket)
    .per(Account)
    .where(Ticket.account == Account)
)
ticket_count_per_account_status = (
    aggregates.count(Ticket)
    .per(Account, Ticket.status)
    .where(Ticket.account == Account)
)
# Number of tickets per account
print(
    m.select(
        Account.id,
        ticket_count_per_account.alias("tickets")
    ).to_df()
)
# Number of tickets per account of each status
print(
    m.select(
        Account.id,
        Ticket.status,
        ticket_count_per_account_status.alias("tickets")
    ).to_df()
)
  • count(Ticket) counts matching Ticket facts.
  • .per(Account) produces one count per account.
  • .per(Account, Ticket.status) produces one count per (account, status) pair.
  • Ticket.account == Account associates each ticket with an account group.
  • count() counts matches, not “unique entities”. If your match pattern can produce duplicates (for example, through a join), use distinct().
  • Per-group counts only include rows that have a value for the grouping key. If a ticket is missing Ticket.status, it contributes to count(Ticket) but not to count(Ticket).per(..., Ticket.status).
  • Aggregates omit groups with zero matches. If an account has zero tickets, it won’t appear in the per-account output. If you expect a group to show up with 0, set a default value using the fallback operator.
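To see why zero-match groups disappear, here is a plain-Python sketch (ordinary dicts, not the PyRel API or its fallback operator):

```python
# Plain-Python analogy -- not the PyRel API.
accounts = [1, 2, 3]
ticket_accounts = [1, 1, 2]  # account 3 has no tickets

# A grouped count only creates entries for groups that actually match.
counts = {}
for a in ticket_accounts:
    counts[a] = counts.get(a, 0) + 1
print(counts)  # {1: 2, 2: 1} -- account 3 is absent

# Supplying a default makes every group appear, the way a fallback value does.
with_default = {a: counts.get(a, 0) for a in accounts}
print(with_default)  # {1: 2, 2: 1, 3: 0}
```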

Use aggregates.sum() when each match contributes a numeric value you want to total:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Account = m.Concept("Account", identify_by={"id": Integer})
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
Ticket.resolution_minutes = m.Property(f"{Ticket} resolution minutes is {Integer:minutes}")
Ticket.account = m.Relationship(f"{Ticket} belongs to {Account:account}")
m.define(
    a1 := Account.new(id=1),
    a2 := Account.new(id=2),
    t1 := Ticket.new(id=101, status="closed", resolution_minutes=120),
    t2 := Ticket.new(id=102, status="closed", resolution_minutes=45),
    t3 := Ticket.new(id=103, status="open", resolution_minutes=10),
    t1.account(a1),
    t2.account(a1),
    t3.account(a2),
)
total_resolution_minutes = (
    aggregates.sum(Ticket.resolution_minutes)
    .per(Account)
    .where(Ticket.account == Account, Ticket.status == "closed")
)
print(m.select(Account.id, total_resolution_minutes.alias("total_resolution_minutes")).to_df())
  • The summed value is Ticket.resolution_minutes.
  • Only closed tickets contribute because the aggregate-local .where(...) scopes the contributing set.
  • total_resolution_minutes produces one value per account because of .per(Account).
  • Missing resolution_minutes values do not contribute to the sum.
  • sum() only adds values that exist. If Ticket.resolution_minutes is missing for a matching ticket, that ticket contributes no value.
  • Grouped sums omit groups with zero matches. If an account has no matching closed tickets, it won’t appear in the output.
  • If your match pattern can produce duplicates (for example, through a join), your sum can be larger than expected. Consider distinct() only when treating duplicates as the same contribution is correct for your metric.

Use aggregates.avg() when you need a per-group mean of a numeric value:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Queue = m.Concept("Queue", identify_by={"name": String})
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
Ticket.first_response_minutes = m.Property(f"{Ticket} first response minutes is {Integer:minutes}")
Ticket.queue = m.Relationship(f"{Ticket} assigned to {Queue:queue}")
m.define(
    q1 := Queue.new(name="triage"),
    q2 := Queue.new(name="billing"),
    t1 := Ticket.new(id=101, status="open", first_response_minutes=12),
    t2 := Ticket.new(id=102, status="open", first_response_minutes=55),
    t3 := Ticket.new(id=103, status="closed", first_response_minutes=5),
    t1.queue(q1),
    t2.queue(q1),
    t3.queue(q2),
)
avg_first_response = (
    aggregates.avg(Ticket.first_response_minutes)
    .per(Queue)
    .where(Ticket.queue == Queue, Ticket.status == "open")
)
print(m.select(Queue.name, avg_first_response.alias("avg_first_response_minutes")).to_df())
  • The averaged value is Ticket.first_response_minutes.
  • Only open tickets contribute because the aggregate-local .where(...) scopes the contributing set.
  • avg_first_response produces one value per queue because of .per(Queue).
  • When debugging an average, validate the corresponding count() and sum() for the same grouping keys.
  • avg() averages only values that exist. If Ticket.first_response_minutes is missing for a matching ticket, that ticket contributes no value.
  • Grouped averages omit groups with zero matches. If a queue has no matching open tickets, it won’t appear in the output. If you expect a group to show up with 0 (or another default), set a default value using the fallback operator.
  • If your match pattern can produce duplicates (for example, through a join), your average can be skewed. Consider distinct() only when treating duplicates as the same contribution is correct for your metric.

Use aggregates.min() to find the smallest comparable value per group:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Account = m.Concept("Account", identify_by={"id": Integer})
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.first_response_minutes = m.Property(f"{Ticket} first response minutes is {Integer:minutes}")
Ticket.account = m.Relationship(f"{Ticket} belongs to {Account:account}")
m.define(
    a1 := Account.new(id=1),
    a2 := Account.new(id=2),
    t1 := Ticket.new(id=101, first_response_minutes=25),
    t2 := Ticket.new(id=102, first_response_minutes=7),
    t3 := Ticket.new(id=103, first_response_minutes=18),
    t1.account(a1),
    t2.account(a1),
    t3.account(a2),
)
min_first_response = (
    aggregates.min(Ticket.first_response_minutes)
    .per(Account)
    .where(Ticket.account == Account)
)
print(m.select(Account.id, min_first_response.alias("min_first_response_minutes")).to_df())
  • The minimized value is Ticket.first_response_minutes.
  • min_first_response produces one value per account because of .per(Account).
  • Ticket.account == Account associates each ticket with an account group.
  • Groups with no matching tickets do not appear.
  • min() only considers values that exist. If Ticket.first_response_minutes is missing for a matching ticket, that ticket contributes no value.
  • Grouped minimums omit groups with zero matches. If an account has no tickets, it won’t appear in the output. If you expect a group to show up with a default value, set a default value using the fallback operator.
  • min() returns the minimum value, not “the row that produced it”. If you need the ticket ID for the minimum, select contributing rows (ticket + value) and validate which one is smallest.

Use aggregates.max() to find the largest comparable value per group:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Queue = m.Concept("Queue", identify_by={"name": String})
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
Ticket.resolution_minutes = m.Property(f"{Ticket} resolution minutes is {Integer:minutes}")
Ticket.queue = m.Relationship(f"{Ticket} assigned to {Queue:queue}")
m.define(
    q1 := Queue.new(name="triage"),
    q2 := Queue.new(name="billing"),
    t1 := Ticket.new(id=101, status="closed", resolution_minutes=240),
    t2 := Ticket.new(id=102, status="closed", resolution_minutes=45),
    t3 := Ticket.new(id=103, status="open", resolution_minutes=10),
    t1.queue(q1),
    t2.queue(q1),
    t3.queue(q2),
)
max_resolution = (
    aggregates.max(Ticket.resolution_minutes)
    .per(Queue)
    .where(Ticket.queue == Queue, Ticket.status == "closed")
)
print(m.select(Queue.name, max_resolution.alias("max_resolution_minutes")).to_df())
  • The maximized value is Ticket.resolution_minutes.
  • Only closed tickets contribute because the aggregate-local .where() scopes the contributing set.
  • max_resolution produces one value per queue because of .per(Queue).
  • Groups with no matching closed tickets do not appear.
  • max() only considers values that exist. If Ticket.resolution_minutes is missing for a matching ticket, that ticket contributes no value.
  • Grouped maxima omit groups with zero matches. If you expect a group to show up with a default value, set a default value using the fallback operator.
  • max() returns the maximum value, not “the row that produced it”. If you need the ticket ID for the maximum, select contributing rows (ticket + value) and validate which one is largest.

Group results with per()

Use .per() to define the grouping keys for an aggregate. You can group by one key or multiple keys.

As a rule of thumb:

  • Use one key when your question is “per thing” (for example “per account”).
  • Use multiple keys when your question is “per thing and per breakdown” (for example “per account, per queue”).

Adding keys usually makes the result more granular and increases the number of rows.
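A plain-Python sketch of the granularity difference (dicts keyed by tuples, not the PyRel API):

```python
# Plain-Python analogy -- not the PyRel API.
tickets = [
    {"account": 1, "queue": "triage"},
    {"account": 1, "queue": "billing"},
    {"account": 2, "queue": "triage"},
]

# One grouping key: one count per account.
per_account = {}
for t in tickets:
    per_account[t["account"]] = per_account.get(t["account"], 0) + 1

# Two grouping keys: one count per (account, queue) pair -- more rows.
per_account_queue = {}
for t in tickets:
    key = (t["account"], t["queue"])
    per_account_queue[key] = per_account_queue.get(key, 0) + 1

print(len(per_account))        # 2 groups
print(len(per_account_queue))  # 3 groups
```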

This example groups by both account and queue:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Account = m.Concept("Account", identify_by={"id": Integer})
Queue = m.Concept("Queue", identify_by={"name": String})
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
Ticket.account = m.Relationship(f"{Ticket} belongs to {Account:account}")
Ticket.queue = m.Relationship(f"{Ticket} assigned to {Queue:queue}")
m.define(
    a1 := Account.new(id=1),
    a2 := Account.new(id=2),
    q1 := Queue.new(name="triage"),
    q2 := Queue.new(name="billing"),
    t1 := Ticket.new(id=101, status="open"),
    t2 := Ticket.new(id=102, status="open"),
    t3 := Ticket.new(id=103, status="open"),
    t1.account(a1),
    t2.account(a1),
    t3.account(a2),
    t1.queue(q1),
    t2.queue(q2),
    t3.queue(q1),
)
open_tickets_per_account_queue = (
    aggregates.count(Ticket)
    .per(Account, Queue)
    .where(Ticket.account == Account, Ticket.queue == Queue, Ticket.status == "open")
)
df = m.select(
    Account.id,
    Queue.name,
    open_tickets_per_account_queue.alias("open_tickets"),
).to_df()
print(df)
  • .per(Account, Queue) produces one value per (account, queue) pair.
  • The aggregate-local .where(...) associates each ticket with its account group and queue group.
  • Select the grouping keys next to the aggregate value to validate output shape.
  • If your .per() keys do not match your intended grouping level, you can get “too many rows”.

You can also use the module-level aggregates.per() function to define a reusable grouped aggregate that you can apply to different measures, and then specialize it with an aggregate function and a .where():

open_tickets_per_account_queue = (
    aggregates.per(Account, Queue)
    .count(Ticket)
    .where(Ticket.account == Account, Ticket.queue == Queue, Ticket.status == "open")
)
  • .per(...) controls output shape. If you forget .per(...), you’ll compute a global aggregate.
  • Grouped aggregates only include matches that bind all the grouping keys. If a ticket is missing an account or queue, it won’t contribute to a per-(account, queue) result. See Handle missing data when groups or rows seem to “disappear”.
  • Prefer stable grouping keys. Grouping by concepts (like Account and Queue) or canonical identifiers (like Account.id) is usually easier to validate than grouping by free-form strings.

Scope aggregate inputs with aggregate-local .where()

Aggregate-local .where() controls which matches contribute to an aggregate. This is different from filtering the rows you select with Fragment.where().

Use this rule of thumb:

  • Put predicates in the aggregate-local .where() when they define the contributing matches. This includes join edges (like Ticket.has_comment(Comment)) and any “only count these” filters.
  • Put predicates in the fragment .where() when they define which groups you want to display. This does not change how the aggregate is computed, only which keys show up in your output.
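The distinction can be sketched in plain Python (not the PyRel API): filtering contributions changes the computed numbers, while filtering the display only hides rows:

```python
# Plain-Python analogy -- not the PyRel API.
tickets = [
    {"account": 1, "status": "open"},
    {"account": 1, "status": "closed"},
    {"account": 2, "status": "open"},
]

# Aggregate-local .where(): only open tickets contribute to each count.
contributing = {}
for t in tickets:
    if t["status"] == "open":
        contributing[t["account"]] = contributing.get(t["account"], 0) + 1
print(contributing)  # {1: 1, 2: 1}

# Fragment .where(): count everything, then hide groups from the output.
all_counts = {}
for t in tickets:
    all_counts[t["account"]] = all_counts.get(t["account"], 0) + 1
displayed = {a: c for a, c in all_counts.items() if a == 1}
print(displayed)  # {1: 2} -- the count itself is unchanged
```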

This example computes comment counts per ticket. The aggregate-local .where() makes the relationship predicate part of the aggregate’s contributing set:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Comment = m.Concept("Comment", identify_by={"id": Integer})
Ticket.has_comment = m.Relationship(f"{Ticket} has comment {Comment:comment}")
m.define(
    t1 := Ticket.new(id=1),
    t2 := Ticket.new(id=2),
    c1 := Comment.new(id=11),
    c2 := Comment.new(id=12),
    c3 := Comment.new(id=13),
    t1.has_comment(c1),
    t1.has_comment(c2),
    t2.has_comment(c3),
)
comment_count = aggregates.count(Comment).per(Ticket).where(Ticket.has_comment(Comment))
print(m.select(Ticket.id, comment_count.alias("comment_count")).to_df())

For example, this filters the output to one ticket (but keeps the same per-ticket aggregate definition):

q = m.select(Ticket.id, comment_count.alias("comment_count")).where(Ticket.id == 1)
print(q.to_df())
  • Ticket.has_comment(Comment) is part of the aggregate’s match pattern. Only those comment matches contribute to the count.
  • When you scope contributions with aggregate-local .where(), the output shape stays stable. You can select the grouping key and the aggregate side-by-side to validate it.
  • Aggregate-local .where() is the easiest way to keep output shape stable while changing which matches contribute. When results look wrong, validate the contributing rows first (for example Ticket.id + Comment.id).
  • Groups with zero matches do not appear in grouped aggregates. If you expect a ticket with 0 comments to show up, set a default value using the fallback operator.
  • You can also pre-filter by defining a derived concept (for example “TicketWithComment”) and aggregating over that. Aggregate-local .where() is often the most direct approach when the predicate is “inside” the aggregate.

Deduplicate aggregate inputs with distinct()

Use distinct() inside an aggregate when your match pattern can produce duplicates. This is common when a join expands one entity into multiple matching rows.

The key idea is that distinct() changes the contributing multiset into a set:

  • count(User, Comment) counts how many distinct (user, comment) pairs match. A user who comments twice contributes two pairs.
  • count(distinct(User)) counts unique users contributing to the aggregate.

This example counts comment-author matches per support ticket and compares that to a distinct-user count:

from relationalai.semantics import Integer, Model, String, distinct
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
User = m.Concept("User", identify_by={"id": Integer})
User.email = m.Property(f"{User} has email {String:email}")
Comment = m.Concept("Comment", identify_by={"id": Integer})
Ticket.has_comment = m.Relationship(f"{Ticket} has comment {Comment:comment}")
Comment.author = m.Relationship(f"{Comment} authored by {User:user}")
m.define(
    t1 := Ticket.new(id=1),
    t2 := Ticket.new(id=2),
    u1 := User.new(id=101, email="ava@example.com"),
    u2 := User.new(id=102, email="ben@example.com"),
    c1 := Comment.new(id=11),
    c2 := Comment.new(id=12),
    c3 := Comment.new(id=13),
    c4 := Comment.new(id=14),
    t1.has_comment(c1),
    t1.has_comment(c2),
    t2.has_comment(c3),
    t2.has_comment(c4),
    c1.author(u1),
    c2.author(u1),
    c3.author(u1),
    c4.author(u2),
)
comment_author_matches = (
    aggregates.count(User, Comment)
    .per(Ticket)
    .where(Ticket.has_comment(Comment), Comment.author == User)
)
unique_commenters = (
    aggregates.count(distinct(User))
    .per(Ticket)
    .where(Ticket.has_comment(Comment), Comment.author == User)
)
df = m.select(
    Ticket.id,
    comment_author_matches.alias("comment_author_matches"),
    unique_commenters.alias("unique_commenters"),
).to_df()
print(df)
  • count(User, Comment) counts comment-author matches. A user who comments twice contributes two matches (because the comment differs).
  • count(distinct(User)) counts unique users contributing to the aggregate.
  • When the numbers do not match your expectation, select the raw contributing rows first. That is usually faster than changing the aggregate.
  • distinct() placement matters. Wrap the expression that you want to treat as unique (for example distinct(User) or distinct(Ticket.resolution_minutes)).
  • distinct() can also change the meaning of .sum() and .avg() by deduplicating the value expression. Use it only when you want to treat duplicates as the same contribution.
  • distinct() inside an aggregate deduplicates the aggregate’s inputs. distinct() in a select() deduplicates output rows. See Deduplicate rows with distinct().
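The multiset-versus-set idea reduces to a plain-Python sketch (tuples and sets, not the PyRel API):

```python
# (user, comment) matches for one ticket; user 101 commented twice.
matches = [(101, 11), (101, 12), (102, 13)]

# count(User, Comment): every distinct pair contributes, like len() on the list.
pair_count = len(matches)

# count(distinct(User)): deduplicate on the user before counting.
unique_users = len({user for user, _ in matches})

print(pair_count)    # 3
print(unique_users)  # 2
```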

Rank results with rank()

Use aggregates.rank() when you want an ordinal rank value computed from one or more ordering keys.

Here’s the mental model:

  • rank(...) produces a rank column. It does not sort your output.
  • Use aggregates.asc() and aggregates.desc() to make ordering explicit.
  • Add a stable secondary key (like an ID) when ties matter.
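A plain-Python sketch of ranking with a tie-breaker (composite sort keys, not the PyRel API):

```python
# Plain-Python analogy -- not the PyRel API.
accounts = [
    {"id": 1, "open_tickets": 2},
    {"id": 3, "open_tickets": 2},  # ties with account 1 on the primary key
    {"id": 2, "open_tickets": 0},
]

# desc(open_tickets), asc(id): sort by (-count, id) so ties resolve by ID.
ordered = sorted(accounts, key=lambda a: (-a["open_tickets"], a["id"]))
ranks = {a["id"]: i + 1 for i, a in enumerate(ordered)}
print(ranks)  # {1: 1, 3: 2, 2: 3}
```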

This example ranks accounts by their derived open ticket count:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Account = m.Concept("Account", identify_by={"id": Integer})
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
Ticket.account = m.Relationship(f"{Ticket} belongs to {Account:account}")
Account.open_ticket_count = m.Property(f"{Account} open ticket count is {Integer:count}")
m.define(
    a1 := Account.new(id=1),
    a2 := Account.new(id=2),
    a3 := Account.new(id=3),
    t1 := Ticket.new(id=101, status="open"),
    t2 := Ticket.new(id=102, status="open"),
    t3 := Ticket.new(id=103, status="closed"),
    t4 := Ticket.new(id=104, status="open"),
    t1.account(a1),
    t2.account(a1),
    t3.account(a2),
    t4.account(a3),
)
open_ticket_count = (
    aggregates.count(Ticket)
    .per(Account)
    .where(Ticket.account == Account, Ticket.status == "open")
)
m.define(Account.open_ticket_count(open_ticket_count))
open_ticket_rank = aggregates.rank(
    aggregates.desc(Account.open_ticket_count),
    aggregates.asc(Account.id),
)
df = m.select(
    Account.id,
    Account.open_ticket_count,
    open_ticket_rank.alias("rank_by_open_tickets"),
).to_df()
print(df)
  • Account.open_ticket_count materializes the aggregate metric as a reusable property.
  • rank(desc(...), asc(Account.id)) adds a deterministic secondary ordering key when counts tie.
  • rank() produces a rank column. It does not sort the output DataFrame.
  • If you need a “top N” list, use top() or bottom() as predicates in where(). rank() by itself does not filter results.
  • You can rank within a group by calling .per() on the rank expression. For example, rank(desc(Product.revenue)).per(Store) ranks products per store.
  • When a rank looks wrong, validate the ordering keys first. Select the key, the metric you’re ranking by, and the rank() column side-by-side.

Pick top and bottom results

Use aggregates.top() and aggregates.bottom() as predicates in .where() when you want to keep only the highest or lowest groups.

Here’s how to think about it:

  • top() keeps the largest values. bottom() keeps the smallest values.
  • The first ordering argument is your primary sort key. Any additional arguments act as tie-breakers.
  • Add .per() when you need “top N per group”.
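A plain-Python sketch of top-N with a tie-breaker (not the PyRel API):

```python
# Plain-Python analogy -- not the PyRel API.
open_ticket_counts = {1: 2, 2: 1, 3: 1}  # accounts 2 and 3 tie at the boundary

# top(2, count, id): order by (-count, id) and keep the first two accounts.
kept = sorted(open_ticket_counts, key=lambda a: (-open_ticket_counts[a], a))[:2]
print(kept)  # [1, 2] -- the ID tie-breaker deterministically keeps account 2
```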

This example keeps only the 2 accounts with the highest open ticket counts:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Account = m.Concept("Account", identify_by={"id": Integer})
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
Ticket.account = m.Relationship(f"{Ticket} belongs to {Account:account}")
Account.open_ticket_count = m.Property(f"{Account} open ticket count is {Integer:count}")
m.define(
    a1 := Account.new(id=1),
    a2 := Account.new(id=2),
    a3 := Account.new(id=3),
    t1 := Ticket.new(id=101, status="open"),
    t2 := Ticket.new(id=102, status="open"),
    t3 := Ticket.new(id=103, status="open"),
    t4 := Ticket.new(id=104, status="open"),
    t1.account(a1),
    t2.account(a1),
    t3.account(a2),
    t4.account(a3),
)
open_ticket_count = (
    aggregates.count(Ticket)
    .per(Account)
    .where(Ticket.account == Account, Ticket.status == "open")
)
m.define(Account.open_ticket_count(open_ticket_count))
q = (
    m.select(Account.id, Account.open_ticket_count)
    .where(aggregates.top(2, Account.open_ticket_count, Account.id))
)
print(q.to_df())
  • top(2, Account.open_ticket_count, Account.id) keeps the highest counts.
  • The extra Account.id argument acts as a stable tie-breaker.
  • Use aggregates.bottom() to keep the smallest values instead.
  • top() and bottom() filter rows. They do not sort your output. If you want results ordered in the DataFrame, sort after materializing with .to_df().
  • Ties are the most common source of surprises. If multiple rows share the same boundary value, add a stable tie-breaker key (like an ID) so selection is deterministic.
  • To select extremes within each group, call .per() on the top() / bottom() expression. For example, top(3, Product.revenue).per(Store) keeps the top 3 products per store.

Build derived classifications from aggregates

Use this pattern when you want a named, reusable category based on an aggregate metric. This is common for definitions like “high value customer” or “at risk account” where you want to classify entities based on the results of an aggregate.

Here’s how it works:

  1. Compute a per-group metric

    Start with one metric, one grouping key, and one condition for contributions.

  2. Assign the metric to a property (optional)

    A property makes the metric reusable in ranking, top/bottom filters, and other definitions.

  3. Define the classification and validate it

    Select the key, the metric, and the classification side-by-side so you can spot mistakes quickly.

This example defines HighVolumeAccount for accounts with more than 2 open tickets:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Account = m.Concept("Account", identify_by={"id": Integer})
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
Ticket.account = m.Relationship(f"{Ticket} belongs to {Account:account}")
Account.open_ticket_count = m.Property(f"{Account} open ticket count is {Integer:count}")
HighVolumeAccount = m.Concept("HighVolumeAccount", extends=[Account])
m.define(
    a1 := Account.new(id=1),
    a2 := Account.new(id=2),
    t1 := Ticket.new(id=101, status="open"),
    t2 := Ticket.new(id=102, status="open"),
    t3 := Ticket.new(id=103, status="open"),
    t4 := Ticket.new(id=104, status="closed"),
    t1.account(a1),
    t2.account(a1),
    t3.account(a1),
    t4.account(a2),
)
# Compute the per-account open ticket count.
open_ticket_count = (
    aggregates.count(Ticket)
    .per(Account)
    .where(Ticket.account == Account, Ticket.status == "open")
)
m.define(Account.open_ticket_count(open_ticket_count))
# Define HighVolumeAccount as accounts with more than 2 open tickets.
m.define(HighVolumeAccount(Account)).where(Account.open_ticket_count > 2)
print(m.select(Account.id, Account.open_ticket_count).to_df())
print(m.select(HighVolumeAccount.id).to_df())
  • The metric is grouped with .per(Account), so it applies at the account level.
  • HighVolumeAccount(Account) defines concept membership for accounts that cross the threshold.
  • Selecting Account.id, open_ticket_count, and the derived concept makes the logic easy to validate.

Troubleshoot aggregate results

Most surprising aggregate outputs come from one of a few root causes:

  • Missing .per() (accidental global aggregation)
  • Duplicate-generating joins (more matches than you expected)
  • Scoping the wrong .where() (filtering contributions vs filtering displayed groups)

Before you rewrite your definition logic, validate the shape and inputs first.

Use this checklist when results are too large, too small, or unstable:

  • Confirm the output shape first. Select your grouping keys next to the aggregate value. A missing .per() produces a global result, and extra .per() keys can produce “too many rows”.
  • Inspect the contributing rows. When a count or sum looks wrong, select the raw match pattern first. This is the fastest way to see duplicates, missing links, or unexpected join expansion.
  • Separate contribution scoping from display filtering. Use aggregate-local .where() to define what contributes. Use fragment .where() to choose which groups you show.
  • Account for missing values and zero-match groups. Missing values do not contribute to sum() / avg() / min() / max(). Grouped aggregates omit groups with zero matches. If you need explicit zeros, set a default value using the fallback operator.
  • Make ordering deterministic when ties matter. For rank(), top(), and bottom(), add a stable secondary key (like an ID) when boundary ties matter.

Use this table to map common symptoms to likely causes and fixes:

Symptom | Likely cause | Fix
------- | ------------ | ---
You get a single row when you expected one row per group | Missing .per() | Add .per() and select the group key(s) next to the aggregate value. See Group results with per().
You get “too many rows” | The grouping keys are too granular | Remove unintended .per() keys and validate the grouping level by selecting only the key(s) you intend to group by.
Counts or sums are larger than expected | Duplicate-generating joins expanded the contributing matches | Inspect the raw match pattern and decide whether you need to deduplicate inputs with distinct(). See Deduplicate aggregate inputs with distinct().
Groups disappear after you add a grouping key | The grouping key is missing for some entities | Validate that the grouping key chain exists and handle missing values intentionally. See Handle missing data.
Counts are smaller than expected | Predicates are scoping away contributions | Re-check whether predicates belong in aggregate-local .where() (contributions) or fragment .where() (display). See Scope aggregate inputs with aggregate-local .where().
top() / bottom() picks surprising rows around the boundary | Ties and ambiguous ordering | Add a stable tie-breaker ordering key so selection is deterministic. See Pick top and bottom results.
Your rank column looks right but the DataFrame isn’t ordered | rank() does not sort output | Sort after materializing with .to_df() if you need a sorted display.