Handle missing data
Missing values in PyRel usually don’t raise errors, but they can change what your logic matches and what your queries return. This guide gives you practical patterns for handling missing values and missing relationships with presence checks, defaults, and debug-first validation so your logic stays predictable as your schema evolves.
- PyRel is installed. See Set Up Your Environment for instructions.
- You are comfortable deriving facts with Model.define() and filtering with Model.where(). See Derive facts with logic.
Understand how missing values behave
Missing values show up in a few repeatable ways. Use this section to quickly identify which scenario you’re in.
The most common scenarios are:
- Missing property value: An entity exists, but a property fact is absent.
- Missing chain link: A link in a relationship chain is absent, so expressions that depend on it have no value.
- Missing value in a filter: A filter condition depends on missing data, so it does not match those entities. This is especially easy to miss in aggregates, where groups with zero matches may not appear.
Missing property value
If a property fact is absent, the property expression has no value for that entity.
In query results, this shows up as a NULL value in that column:
from relationalai.semantics import Integer, Model, String

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")

m.define(
    Ticket.new(id=1, status="open"),
    Ticket.new(id=2),
)

df = m.select(Ticket.id, Ticket.status).to_df()
print(df)

- Ticket id=2 exists but has no status fact, so the status column is missing for that row.
- Missing property values show up as NULL (or empty) cells in query results.
Missing chain link
If a link in a relationship chain is missing, any chained value that depends on it is missing too.
For example, if Ticket.assigned_to is missing, then Ticket.assigned_to.email also has no value:
from relationalai.semantics import Integer, Model, String

m = Model("SupportModel")

Agent = m.Concept("Agent", identify_by={"id": Integer})
Agent.email = m.Property(f"{Agent} has email {String:email}")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.assigned_to = m.Relationship(f"{Ticket} assigned to {Agent:agent}")

m.define(
    agent := Agent.new(id=1, email="ava@example.com"),
    Ticket.new(id=101, assigned_to=agent),
    Ticket.new(id=102),
)

df = m.select(
    Ticket.id,
    Ticket.assigned_to.email.alias("assignee_email"),
).to_df()
print(df)

- For Ticket id=102, the chained value Ticket.assigned_to.email has no value.
- When a chain link is missing, the whole chained expression is missing.
Missing value in a filter
Missing values are especially tricky in filter conditions, where they can cause entities to silently not match. For instance, aggregates can hide missing data by omitting groups with no matches.
In this example, aggregates.count() filters out groups with zero matches instead of returning 0:
from relationalai.semantics import Integer, Model
from relationalai.semantics.std import aggregates

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Comment = m.Concept("Comment", identify_by={"id": Integer})
Ticket.has_comment = m.Relationship(f"{Ticket} has comment {Comment:comment}")

m.define(
    t1 := Ticket.new(id=1),
    t2 := Ticket.new(id=2),
    c1 := Comment.new(id=1),
    t1.has_comment(c1),
)

comment_count = aggregates.count(Comment).per(Ticket).where(Ticket.has_comment(Comment))

q = m.select(Ticket.id, comment_count.alias("comment_count"))
df = q.to_df()
print(df)

- Ticket id=2 has no comments, but it does not appear with 0 in the final DataFrame.
- The count is computed only for tickets that have at least one matching relationship row.
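To see why the ticket disappears rather than showing 0, it can help to model the aggregate in plain Python. The sketch below is not PyRel and makes no claim about how PyRel evaluates queries. The names tickets, has_comment, and counts_with_zero are hypothetical stand-ins for the data above. It counts over relationship rows only, so a ticket with no rows never forms a group, and then shows one way to make the zero explicit by starting from every ticket.

```python
from collections import Counter

# Hypothetical stand-ins for the Ticket entities and has_comment rows above.
tickets = [1, 2]
has_comment = [(1, 1)]  # (ticket_id, comment_id) pairs

# Grouping over relationship rows: only tickets with at least one row get a group.
counts = Counter(ticket_id for ticket_id, _ in has_comment)
print(dict(counts))  # ticket 2 has no entry at all

# Making the zero explicit: start from every ticket and default to 0.
counts_with_zero = {ticket_id: counts.get(ticket_id, 0) for ticket_id in tickets}
print(counts_with_zero)
```

The difference between the two dictionaries is the difference between "no group" and "a group whose count is 0".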
Find entities with missing relationships or properties
Use Model.not_() to find tickets with missing status:
from relationalai.semantics import Integer, Model, String

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")

MissingStatusTicket = m.Concept("MissingStatusTicket", extends=[Ticket])

m.define(
    Ticket.new(id=201, status="open"),
    Ticket.new(id=202),
    Ticket.new(id=203, status="closed"),
)

m.define(MissingStatusTicket(Ticket)).where(m.not_(Ticket.status))

print(m.select(MissingStatusTicket.id).to_df())

- m.not_(Ticket.status) matches tickets where the status property has no value.
- In this data, Ticket id=202 is missing status, so it appears in MissingStatusTicket.

m.not_(a, b) negates the full conjunction, which is not the same as m.not_(a) & m.not_(b). Group your conditions with parentheses when you mix m.not_() with | or &.
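The difference between negating a conjunction and conjoining two negations can be checked with a small truth table. The sketch below uses plain Python booleans as a stand-in for "condition a matches" and "condition b matches". It is an analogy for the grouping rule, not PyRel code, and the variable names are illustrative.

```python
from itertools import product

rows = []
for a, b in product([True, False], repeat=2):
    not_conjunction = not (a and b)      # analogous to m.not_(a, b)
    both_negated = (not a) and (not b)   # analogous to m.not_(a) & m.not_(b)
    rows.append((a, b, not_conjunction, both_negated))
    print(f"a={a!s:<5} b={b!s:<5} not(a and b)={not_conjunction!s:<5} "
          f"(not a) and (not b)={both_negated}")

# The two forms disagree whenever exactly one of a, b holds.
differs = [(a, b) for a, b, x, y in rows if x != y]
print(differs)
```

In the analogy, an entity that satisfies only one of the two conditions is matched by the negated conjunction but not by the pair of separate negations.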
Use defaults with the fallback operator (|)
Use | to provide a default value when a chain or property value is missing:
from relationalai.semantics import Integer, Model, String

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.priority = m.Property(f"{Ticket} has priority {String:priority}")

m.define(
    Ticket.new(id=301, priority="p0"),
    Ticket.new(id=302, priority="p2"),
    Ticket.new(id=303),
)

priority = Ticket.priority | "unknown"

df = m.select(
    Ticket.id,
    Ticket.priority.alias("priority_raw"),
    priority.alias("priority"),
).to_df()
print(df)

- Ticket.priority | "unknown" returns the priority when it exists and the default string otherwise.
- Defaults can make Model.select() output easier to scan while you iterate.
- Defaults can hide missing data. If you need to track missing as its own state, build a presence flag before you apply defaults.
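The ordering matters: once the default is applied, a real "unknown" and a missing value look the same. As a rough illustration in plain Python (not PyRel), the sketch below records presence before it applies the default. The rows list is a hypothetical stand-in for the query output above, with None modeling a missing priority.

```python
# Hypothetical rows standing in for the query output; None models a missing priority.
rows = [
    {"id": 301, "priority": "p0"},
    {"id": 302, "priority": "p2"},
    {"id": 303, "priority": None},
]

report = []
for row in rows:
    has_priority = row["priority"] is not None                 # record presence first
    priority = row["priority"] if has_priority else "unknown"  # then apply the default
    report.append({"id": row["id"], "has_priority": has_priority, "priority": priority})

for line in report:
    print(line)
```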
Combine matches from multiple branches
Sometimes a fact can be missing in one place but present in another. For example, a ticket might be labeled by automation or by a triage agent.
Use Model.union() when you want to include matches from all branches.
This helps when each branch is optional and you want to keep every present value (instead of picking a single “best” value).
If neither branch matches, the unioned expression has no value.
from relationalai.semantics import Integer, Model, String

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.automation_label = m.Property(f"{Ticket} labeled by automation {String:label}")
Ticket.triage_label = m.Property(f"{Ticket} labeled in triage {String:label}")

m.define(
    Ticket.new(id=401, automation_label="overdue"),
    Ticket.new(
        id=402,
        automation_label="vip",
        triage_label="needs_escalation",
    ),
    Ticket.new(id=403),
)

# Combine automation and triage labels
labels = m.union(Ticket.triage_label, Ticket.automation_label)
# Use fallback if no label is present
labels = labels | "no label"

df = m.select(Ticket.id, labels.alias("label")).to_df()
print(df)

- Ticket id=401 is labeled by automation, so it appears in the union.
- Ticket id=402 is labeled by both automation and triage, so it appears twice in the union.
- Ticket id=403 has no labels, so the union is empty and the fallback applies.
Choose between fallback (|) and Model.union()
You often need either/or logic when data can come from different sources or follow different paths. Choosing the wrong operator can either hide data (fallback) or over-combine data (union).
Use this table to choose the right operator for your intent:
| What to use | When to use it |
|---|---|
| a \| b | Use when you want the first available value in a priority order, and you are OK treating the later branches as fallbacks. |
| Model.union(a, b) | Use when you want to combine matches from multiple branches into one output (set union), even when multiple branches match. |
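The two rows of the table can be contrasted with a small plain-Python model. This is only a sketch of the intent described above, not PyRel's implementation: the helper functions fallback and union are hypothetical, and each branch is modeled as a set of values for one ticket, with an empty set meaning "no value".

```python
def fallback(*branches):
    """Return the first non-empty branch, modeling the intent of a | b."""
    for branch in branches:
        if branch:
            return set(branch)
    return set()

def union(*branches):
    """Return every value from every branch, modeling the intent of Model.union(a, b)."""
    return set().union(*branches)

# Hypothetical label sets per ticket: (triage labels, automation labels).
labels = {
    401: (set(), {"overdue"}),
    402: ({"needs_escalation"}, {"vip"}),
    403: (set(), set()),
}

for ticket_id, (triage, automation) in labels.items():
    print(ticket_id, sorted(fallback(triage, automation)), sorted(union(triage, automation)))
```

For ticket 402 the fallback model keeps only the triage label, while the union model keeps both, which is the "hide data" versus "over-combine data" trade-off in miniature.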
Use explicit presence flags
Define presence flags as unary properties so you can filter and group without losing track of missingness:
from relationalai.semantics import Integer, Model, String

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")

Ticket.has_status = m.Property(f"{Ticket} has status")

m.define(
    Ticket.new(id=401, status="open"),
    Ticket.new(id=402),
    Ticket.new(id=403, status="closed"),
)

m.define(Ticket.has_status).where(Ticket.status)

df = m.select(
    Ticket.id,
    Ticket.status,
    Ticket.has_status.as_bool().alias("has_status"),
).to_df()
print(df)

- Ticket.has_status is a reusable presence flag you can reference in multiple definitions.
- In Model.select(), Ticket.has_status.as_bool() turns the unary property into a readable boolean column.
- Selecting both Ticket.status and has_status makes it clear which rows are missing.
Define derived concepts for missing data
Define a derived concept when you want a named, reusable set of entities you can select, join, and build on. This pattern is useful for categories where the missing case has real-world meaning, like a support ticket that needs triage.
Use this recipe:
1. Define a presence flag: Start by defining a presence flag for the property or relationship you want to check. This gives you a reusable expression to check for missingness in your definitions.
2. Define the missing-case concept: Then define a derived concept for the missing case using m.not_(presence_flag) as the condition. This gives you a clear, reusable classification for entities that are missing the data.
3. Optionally, define a present-case concept: If you need a separate category for entities that have the data and meet a condition, define another derived concept for that case. Keep the present-case logic separate from the missing-case logic.
In the following example, a Ticket.has_response presence flag is defined first, then a NeedsResponse concept is defined for tickets that are missing a first response:
from relationalai.semantics import DateTime, Integer, Model
from relationalai.semantics.std.datetime import datetime

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.first_response_at = m.Property(f"{Ticket} first responded at {DateTime:first_response_at}")

NeedsResponse = m.Concept("NeedsResponse", extends=[Ticket])

m.define(
    Ticket.new(id=501, first_response_at=datetime(2026, 2, 1, 10, tz="UTC")),
    Ticket.new(id=502),
)

# Define presence flag for first response
Ticket.has_response = m.Property(f"{Ticket} has first response")
m.define(Ticket.has_response).where(Ticket.first_response_at)

# Define NeedsResponse for tickets missing a first response
m.define(NeedsResponse(Ticket)).where(m.not_(Ticket.has_response))

print(m.select(NeedsResponse.id).to_df())

Avoid common pitfalls
Missing-data bugs usually look like “nothing matches” or “fewer rows than expected”. Before you rewrite your definition logic, validate the inputs and make missingness visible.
Use this checklist when results are empty or unexpectedly small:
- Select the raw chain first. If a Model.where() filter returns no rows, run a quick Model.select() that shows the chained value you are filtering on. If the chain is missing, the filter condition is missing too, so it will not match.
- Keep missingness detectable. Defaults like Ticket.priority | "unknown" can make output easier to read, but they also hide which rows were missing. Define a presence flag (for example, Ticket.has_priority) before you apply a default.
- Group negation on purpose. m.not_(a, b) negates the full conjunction. That is not the same as m.not_(a) & m.not_(b). Add parentheses when you mix m.not_() with | and &.
- Choose fallback vs union intentionally. a | b means “first available value”. Model.union(a, b) means “combine matches from both branches”.
- Validate incrementally. Select intermediate expressions (raw value, presence flag, defaulted value, final boolean condition) with .to_df() before you finalize the definition.
Use this table to map common symptoms to likely causes and fixes:
| Symptom | Likely cause | Fix |
|---|---|---|
| A Model.where() filter matches zero rows | The chained value you filter on is missing for the entities you expect | Select the raw chain (Model.select(...)) and confirm it exists before you add the filter. |
| A definition “drops” entities after you add a comparison | Missing inputs are treated as missing, not false, so the comparison does not match | Add a presence flag first, and branch your logic into “missing” vs “present and meets condition”. |
| After you add a default with \|, you can no longer tell what was missing | The default replaces missing values in outputs and downstream logic | Keep a separate presence flag column/property and select it alongside the defaulted value. |
| A negated condition matches too many or too few entities | Negation is grouped differently than you intended | Re-check m.not_(a, b) vs separate negations, and add parentheses when combining with & and \|. |
| Model.union() returns more rows than you expected | Multiple branches match and you are combining them (set union) | If you want priority order, use fallback (\|) instead of Model.union(). |