
Handle missing data

Missing values in PyRel usually don’t raise errors, but they can change what your logic matches and what your queries return. This guide gives you practical patterns for handling missing values and missing relationships with presence checks, defaults, and debug-first validation so your logic stays predictable as your schema evolves.

Missing values show up in a few repeatable ways. Use this section to quickly identify which scenario you’re in.

The most common scenarios are:

  • Missing property value: An entity exists, but a property fact is absent.
  • Missing chain link: A link in a relationship chain is absent, so expressions that depend on it have no value.
  • Missing value in a filter: A filter condition depends on missing data, so it does not match those entities. This is especially easy to miss in aggregates, where groups with zero matches may not appear.

If a property fact is absent, the property expression has no value for that entity. In query results, this shows up as a NULL value in that column:

from relationalai.semantics import Integer, Model, String
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
m.define(
    Ticket.new(id=1, status="open"),
    Ticket.new(id=2),
)
df = m.select(Ticket.id, Ticket.status).to_df()
print(df)
  • Ticket id=2 exists but has no status fact, so its status cell is empty (NULL) in the result.
  • Missing property values show up as NULL (or empty) cells in query results.
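To make the NULL behavior concrete without a running model, here is a plain-Python analogy. It is not PyRel code and not how PyRel stores facts. Each property is treated as a dict of facts keyed by entity id, and a "select" looks facts up with `dict.get`, so an absent fact becomes `None`. The names `ticket_ids`, `status_facts`, and `select_rows` are hypothetical.

```python
# Plain-Python analogy of the first example; this is not PyRel's implementation.
ticket_ids = [1, 2]
status_facts = {1: "open"}  # Ticket 2 exists but has no status fact.

def select_rows(ids, facts):
    """Return (id, value) rows; a missing fact becomes None, like a NULL cell."""
    return [(i, facts.get(i)) for i in ids]

rows = select_rows(ticket_ids, status_facts)
print(rows)  # [(1, 'open'), (2, None)]
```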

If a link in a relationship chain is missing, any chained value that depends on it is missing too. For example, if Ticket.assigned_to is missing, then Ticket.assigned_to.email also has no value:

from relationalai.semantics import Integer, Model, String
m = Model("SupportModel")
Agent = m.Concept("Agent", identify_by={"id": Integer})
Agent.email = m.Property(f"{Agent} has email {String:email}")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.assigned_to = m.Relationship(f"{Ticket} assigned to {Agent:agent}")
m.define(
    agent := Agent.new(id=1, email="ava@example.com"),
    Ticket.new(id=101, assigned_to=agent),
    Ticket.new(id=102),
)
df = m.select(
    Ticket.id,
    Ticket.assigned_to.email.alias("assignee_email"),
).to_df()
print(df)
  • For Ticket id=102, the chained value Ticket.assigned_to.email has no value because the assigned_to link is missing.
  • When a chain link is missing, the whole chained expression is missing.
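The following plain-Python analogy (not PyRel code) shows why the whole chain goes missing. Each hop is a dict lookup, and the email is only reached if the ticket-to-agent link exists. The dict names and the `chained_email` helper are hypothetical stand-ins for Ticket.assigned_to and Agent.email.

```python
# Plain-Python analogy of a relationship chain; not PyRel's implementation.
assigned_to = {101: 1}                 # Ticket 102 has no assigned_to link.
agent_email = {1: "ava@example.com"}

def chained_email(ticket_id):
    """Follow ticket -> agent -> email; a missing link makes the whole chain None."""
    agent_id = assigned_to.get(ticket_id)
    if agent_id is None:
        return None  # No link, so there is nothing to follow.
    return agent_email.get(agent_id)

rows = [(t, chained_email(t)) for t in (101, 102)]
print(rows)  # [(101, 'ava@example.com'), (102, None)]
```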

Missing values are especially tricky in filter conditions, where they can cause entities to silently drop out of the results. Aggregates can also hide missing data by omitting groups with no matches.

In this example, aggregates.count() filters out groups with zero matches instead of returning 0:

from relationalai.semantics import Integer, Model
from relationalai.semantics.std import aggregates
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Comment = m.Concept("Comment", identify_by={"id": Integer})
Ticket.has_comment = m.Relationship(f"{Ticket} has comment {Comment:comment}")
m.define(
    t1 := Ticket.new(id=1),
    t2 := Ticket.new(id=2),
    c1 := Comment.new(id=1),
    t1.has_comment(c1),
)
comment_count = aggregates.count(Comment).per(Ticket).where(Ticket.has_comment(Comment))
q = m.select(Ticket.id, comment_count.alias("comment_count"))
df = q.to_df()
print(df)
  • Ticket id=2 has no comments, yet it does not appear with a count of 0 in the final DataFrame.
  • The count is computed only for tickets that have at least one matching relationship row.
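The same effect appears in a plain-Python analogy (not PyRel code). Counting relationship rows per ticket only creates keys for tickets that occur in at least one row. Starting from the full set of tickets and defaulting absent groups to 0 is one way to make the empty groups visible again. The names `tickets`, `has_comment`, and `counts_with_zero` are hypothetical.

```python
from collections import Counter

# Plain-Python analogy of count(Comment).per(Ticket); not PyRel's implementation.
tickets = [1, 2]
has_comment = [(1, 1)]  # (ticket_id, comment_id) pairs; ticket 2 has none.

# Counting relationship rows only produces groups that have at least one row.
counts = Counter(ticket for ticket, _ in has_comment)
print(dict(counts))  # {1: 1}

# Starting from every ticket and defaulting absent groups to 0 recovers the zeros.
counts_with_zero = {t: counts.get(t, 0) for t in tickets}
print(counts_with_zero)  # {1: 1, 2: 0}
```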

Find entities with missing relationships or properties

Use Model.not_() to find tickets with missing status:

from relationalai.semantics import Integer, Model, String
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
MissingStatusTicket = m.Concept("MissingStatusTicket", extends=[Ticket])
m.define(
    Ticket.new(id=201, status="open"),
    Ticket.new(id=202),
    Ticket.new(id=203, status="closed"),
)
m.define(MissingStatusTicket(Ticket)).where(m.not_(Ticket.status))
print(m.select(MissingStatusTicket.id).to_df())
  • m.not_(Ticket.status) matches tickets where the status property has no value.
  • In this data, Ticket id=202 is missing status, so it appears in MissingStatusTicket.
  • m.not_(a, b) negates the full conjunction, which is not the same as m.not_(a) & m.not_(b). Group your conditions with parentheses when you mix m.not_() with | or &.
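Set missing data aside for a moment, and the grouping difference is ordinary De Morgan logic. This plain-Python truth table is not PyRel code. Its booleans stand in for "this condition matches the entity", and the helper names are hypothetical. It shows the two forms disagree whenever exactly one condition holds.

```python
from itertools import product

def not_conjunction(a, b):
    """Like m.not_(a, b): true unless both conditions hold."""
    return not (a and b)

def both_negated(a, b):
    """Like m.not_(a) & m.not_(b): true only when neither condition holds."""
    return (not a) and (not b)

# Print the full truth table: a, b, not (a and b), (not a) and (not b).
for a, b in product([True, False], repeat=2):
    print(a, b, not_conjunction(a, b), both_negated(a, b))

# Collect the input combinations where the two forms give different answers.
differs = [
    (a, b)
    for a, b in product([True, False], repeat=2)
    if not_conjunction(a, b) != both_negated(a, b)
]
print(differs)  # [(True, False), (False, True)]
```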

Use defaults with the fallback operator (|)

Use | to provide a default value when a chain or property value is missing:

from relationalai.semantics import Integer, Model, String
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.priority = m.Property(f"{Ticket} has priority {String:priority}")
m.define(
    Ticket.new(id=301, priority="p0"),
    Ticket.new(id=302, priority="p2"),
    Ticket.new(id=303),
)
priority = Ticket.priority | "unknown"
df = m.select(
    Ticket.id,
    Ticket.priority.alias("priority_raw"),
    priority.alias("priority"),
).to_df()
print(df)
  • Ticket.priority | "unknown" returns the priority when it exists and the default string otherwise.
  • Defaults can make Model.select() output easier to scan while you iterate.
  • Defaults can hide missing data. If you need to track missing as its own state, build a presence flag before you apply defaults.
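Here is a plain-Python analogy (not PyRel code) of "presence flag first, default second". The hypothetical `fallback` helper returns the first value that is not `None`, which mirrors the "first available value" reading of `|`. The flag is computed from the raw value before the default replaces it, so ticket 303 stays identifiable as missing.

```python
# Plain-Python analogy of Ticket.priority | "unknown"; not PyRel's implementation.
priority_facts = {301: "p0", 302: "p2"}  # Ticket 303 has no priority.
ticket_ids = [301, 302, 303]

def fallback(*values):
    """Return the first value that is not None, mimicking a | b | ..."""
    for value in values:
        if value is not None:
            return value
    return None

rows = []
for t in ticket_ids:
    raw = priority_facts.get(t)
    rows.append({
        "id": t,
        "has_priority": raw is not None,        # Presence flag, taken before the default.
        "priority": fallback(raw, "unknown"),   # Defaulted value for display.
    })
print(rows)
```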

Sometimes a fact can be missing in one place but present in another. For example, a ticket might be labeled by automation or by a triage agent.

Use Model.union() when you want to include matches from all branches. This helps when each branch is optional and you want to keep every present value (instead of picking a single “best” value). If neither branch matches, the unioned expression has no value.

from relationalai.semantics import Integer, Model, String
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.automation_label = m.Property(f"{Ticket} labeled by automation {String:label}")
Ticket.triage_label = m.Property(f"{Ticket} labeled in triage {String:label}")
m.define(
    Ticket.new(id=401, automation_label="overdue"),
    Ticket.new(
        id=402,
        automation_label="vip",
        triage_label="needs_escalation",
    ),
    Ticket.new(id=403),
)
# Combine automation and triage labels
labels = m.union(Ticket.triage_label, Ticket.automation_label)
# Use fallback if no label is present
labels = labels | "no label"
df = m.select(Ticket.id, labels.alias("label")).to_df()
print(df)
  • Ticket id=401 is labeled by automation, so it appears in the union.
  • Ticket id=402 is labeled by both automation and triage, so the union contains both labels and the ticket appears twice, once per label.
  • Ticket id=403 has no labels, so the union is empty and the fallback applies.

Choose between fallback (|) and Model.union()

You often need either/or logic when data can come from different sources or follow different paths. Choosing the wrong operator can either hide data (fallback) or over-combine data (union).

Use these rules to choose the right operator for your intent:

  • a | b: Use when you want the first available value in a priority order, and you are OK treating the later branches as fallbacks.
  • Model.union(a, b): Use when you want to combine matches from multiple branches into one output (set union), even when multiple branches match.
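To compare the two operators on the same data, here is a plain-Python analogy (not PyRel code) of the label example. It treats `triage | automation` as "take triage if present, else automation" and `m.union(...)` as "collect every present value into a set". The dict and helper names are hypothetical. Ticket 402 is where the two diverge.

```python
# Plain-Python analogy of the label example; not PyRel's implementation.
triage = {402: "needs_escalation"}
automation = {401: "overdue", 402: "vip"}
ticket_ids = [401, 402, 403]

def first_available(t):
    """Like triage | automation: one value in priority order, or None."""
    return triage.get(t, automation.get(t))

def union_labels(t):
    """Like m.union(triage, automation): every present value, as a set."""
    return {source[t] for source in (triage, automation) if t in source}

for t in ticket_ids:
    print(t, first_available(t), sorted(union_labels(t)))
```

With this data, ticket 402 gets one label from the fallback but two from the union. Ticket 403 gets nothing from either, which is where a final default such as `| "no label"` would apply.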

Define presence flags as unary properties so you can filter and group without losing track of missingness:

from relationalai.semantics import Integer, Model, String
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.status = m.Property(f"{Ticket} status is {String:status}")
Ticket.has_status = m.Property(f"{Ticket} has status")
m.define(
    Ticket.new(id=401, status="open"),
    Ticket.new(id=402),
    Ticket.new(id=403, status="closed"),
)
m.define(Ticket.has_status).where(Ticket.status)
df = m.select(
    Ticket.id,
    Ticket.status,
    Ticket.has_status.as_bool().alias("has_status"),
).to_df()
print(df)
  • Ticket.has_status is a reusable presence flag you can reference in multiple definitions.
  • In Model.select(), Ticket.has_status.as_bool() turns the unary property into a readable boolean column.
  • Selecting both Ticket.status and has_status makes it clear which rows are missing a status.
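To show what "filter and group without losing track of missingness" buys you, here is a plain-Python analogy (not PyRel code). A unary presence flag behaves like a set of the entities for which it holds. Filtering on its absence and grouping by it both keep the missing case visible. The names `has_status`, `missing`, and `by_flag` are hypothetical.

```python
from collections import Counter

# Plain-Python analogy of Ticket.has_status; not PyRel's implementation.
status_facts = {401: "open", 403: "closed"}  # Ticket 402 has no status.
ticket_ids = [401, 402, 403]

# The presence flag is the set of entities it holds for, like a unary property.
has_status = {t for t in ticket_ids if t in status_facts}

# Filter: the missing case stays visible as its own list.
missing = [t for t in ticket_ids if t not in has_status]

# Group: count tickets by whether the flag holds.
by_flag = Counter(t in has_status for t in ticket_ids)
print(missing, dict(by_flag))  # [402] {True: 2, False: 1}
```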

Define a derived concept when you want a named, reusable set of entities you can select, join, and build on. This pattern is useful for categories where the missing case has real-world meaning, like a support ticket that needs triage.

Use this recipe:

  1. Define a presence flag

    Start by defining a presence flag for the property or relationship you want to check. This gives you a reusable expression to check for missingness in your definitions.

  2. Define the missing-case concept

    Then define a derived concept for the missing case using m.not_(presence_flag) as the condition. This gives you a clear, reusable classification for entities that are missing the data.

  3. Optionally, define a present-case concept

    If you need a separate category for entities that have the data and meet a condition, define another derived concept for that case. Keep the present-case logic separate from the missing-case logic.

In the following example, a Ticket.has_response presence flag is defined first, then a NeedsResponse concept is defined for tickets that are missing a first response:

from relationalai.semantics import DateTime, Integer, Model
from relationalai.semantics.std.datetime import datetime
m = Model("SupportModel")
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.first_response_at = m.Property(f"{Ticket} first responded at {DateTime:first_response_at}")
NeedsResponse = m.Concept("NeedsResponse", extends=[Ticket])
m.define(
    Ticket.new(id=501, first_response_at=datetime(2026, 2, 1, 10, tz="UTC")),
    Ticket.new(id=502),
)
# Define presence flag for first response
Ticket.has_response = m.Property(f"{Ticket} has first response")
m.define(Ticket.has_response).where(Ticket.first_response_at)
# Define NeedsResponse for tickets missing a first response
m.define(NeedsResponse(Ticket)).where(m.not_(Ticket.has_response))
print(m.select(NeedsResponse.id).to_df())
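The example above covers steps 1 and 2 of the recipe. This plain-Python analogy (not PyRel code) walks through all three steps, including the optional present-case category. The cutoff time and the `responded_before_cutoff` category are hypothetical, invented only to illustrate a "has the data and meets a condition" case. Because the present case is built only from tickets that pass the presence flag, it cannot overlap with the missing case.

```python
from datetime import datetime, timezone

# Plain-Python analogy of the three-step recipe; not PyRel's implementation.
first_response_at = {501: datetime(2026, 2, 1, 10, tzinfo=timezone.utc)}
ticket_ids = [501, 502]

# Hypothetical present-case condition: first response before this cutoff.
CUTOFF = datetime(2026, 2, 1, 12, tzinfo=timezone.utc)

# Step 1: presence flag.
has_response = {t for t in ticket_ids if t in first_response_at}
# Step 2: missing-case category (like NeedsResponse).
needs_response = {t for t in ticket_ids if t not in has_response}
# Step 3: present-case category, defined only over tickets that have the data.
responded_before_cutoff = {t for t in has_response if first_response_at[t] < CUTOFF}

print(sorted(needs_response), sorted(responded_before_cutoff))  # [502] [501]
```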

Missing-data bugs usually look like “nothing matches” or “fewer rows than expected”. Before you rewrite your definition logic, validate the inputs and make missingness visible.

Use this checklist when results are empty or unexpectedly small:

  • Select the raw chain first. If a Model.where() filter returns no rows, run a quick Model.select() that shows the chained value you are filtering on. If the chain is missing, the filter condition is missing too, so it will not match.
  • Keep missingness detectable. Defaults like Ticket.priority | "unknown" can make output easier to read, but they also hide which rows were missing. Define a presence flag (for example, Ticket.has_priority) before you apply a default.
  • Group negation on purpose. m.not_(a, b) negates the full conjunction. That is not the same as m.not_(a) & m.not_(b). Add parentheses when you mix m.not_() with | and &.
  • Choose fallback vs union intentionally. a | b means “first available value”. Model.union(a, b) means “combine matches from both branches”.
  • Validate incrementally. Select intermediate expressions (raw value, presence flag, defaulted value, final boolean condition) with .to_df() before you finalize the definition.
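The "validate incrementally" step can be sketched as a plain-Python analogy (not PyRel code) that lays out the intermediate columns side by side. These are the raw value, the presence flag, the defaulted value, and the final condition. Representing a comparison on a missing value as `None` (no match) rather than `False` is an assumption of this analogy, chosen to mirror "missing, not false". The helper `matches_p0` is hypothetical. Ticket 303 ends up in neither the "is p0" nor the "is not p0" result.

```python
# Plain-Python analogy of validating intermediate expressions; not PyRel.
priority_facts = {301: "p0", 302: "p2"}  # Ticket 303 has no priority.
ticket_ids = [301, 302, 303]

def matches_p0(raw):
    """A comparison on a missing value yields None (no match), not False."""
    return None if raw is None else raw == "p0"

rows = []
for t in ticket_ids:
    raw = priority_facts.get(t)
    rows.append({
        "id": t,
        "raw": raw,                                           # Raw value.
        "has_priority": raw is not None,                      # Presence flag.
        "defaulted": raw if raw is not None else "unknown",   # Defaulted value.
        "is_p0": matches_p0(raw),                             # Final condition.
    })
for row in rows:
    print(row)

# Ticket 303 is in neither result: its comparison is missing, not False.
p0 = [r["id"] for r in rows if r["is_p0"] is True]
not_p0 = [r["id"] for r in rows if r["is_p0"] is False]
print(p0, not_p0)  # [301] [302]
```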

Use this list to map common symptoms to likely causes and fixes:

  • A Model.where() filter matches zero rows. Likely cause: the chained value you filter on is missing for the entities you expect. Fix: select the raw chain (Model.select(...)) and confirm it exists before you add the filter.
  • A definition "drops" entities after you add a comparison. Likely cause: missing inputs are treated as missing, not false, so the comparison does not match. Fix: add a presence flag first, and branch your logic into "missing" vs "present and meets condition".
  • After you add a default with |, you can no longer tell what was missing. Likely cause: the default replaces missing values in outputs and downstream logic. Fix: keep a separate presence flag column or property and select it alongside the defaulted value.
  • A negated condition matches too many or too few entities. Likely cause: negation is grouped differently than you intended. Fix: re-check m.not_(a, b) vs separate negations, and add parentheses when combining with & and |.
  • Model.union() returns more rows than you expected. Likely cause: multiple branches match and you are combining them (set union). Fix: if you want priority order, use fallback (|) instead of Model.union().