Skip to content

Design a model

A good semantic model starts with a small, explicit design. Use this guide to scope a small model you can implement and validate without rework.

  • You have a target business use case with at least one decision or analysis outcome.
  • You know the rough shape of your source data.

Start by defining a clear scope before you pick concepts, relationships, or properties. Scope is the set of questions your model must answer and the workflow it supports. A tight scope keeps the first version small enough to implement and validate.

For example, if your goal is to identify delayed orders and explain the likely cause, include orders, shipments, and the timestamps needed to detect delay and categorize causes. Explicitly exclude related work that is not required for those questions, like inventory planning, returns/refunds, customer support tickets, and carrier contract optimization.

Start with 1–3 questions like:

  • Which orders are delayed beyond their promised ship date?
  • Which delayed orders are missing a shipment record?
  • Which delayed orders were handed off late to the carrier?

Make scope concrete with a simple goal/scope table:

GoalIn scopeOut of scope
Identify orders delayed beyond their promised ship date.Orders and the timestamps needed to compare promised vs actual ship timing.Shipments, carrier handoff timing, returns/refunds.
Explain delays caused by missing shipment records.Orders, shipments, and the link between them.Carrier handoff timing, tracking scan events, inventory planning.
Explain delays caused by late handoff to the carrier.Orders, shipments, and the timestamps needed to detect late handoff.Tracking scan events, delivery status, contract optimization.

Step 2: Determine concepts and relationships

Section titled “Step 2: Determine concepts and relationships”

Once you have a clear scope, you can determine the core concepts and relationships you need to model:

  • Concepts are the key entities in your domain (for example, Order, Shipment, Customer).
  • Relationships are the domain verbs that connect concepts (for example, Order placed_by Customer, Order has_shipment Shipment).

Start by listing the concepts you need. For the running example, that includes:

  • Customer: Places orders.
  • Order: The unit you measure delay on.
  • Shipment: The event (or record) that indicates an order was shipped.
  • Product: The item being sold.
  • OrderItem: The line-level link between an order and the products it contains.

Then sketch the relationships between them:

CustomerOrderOrderItemProductShipment placed_byhas_itemfor_productshipped_for

Identifiers determine when two records represent the same entity in your model. Choose them before you implement concepts or start joining sources. If an identifier is unstable, you’ll create duplicate entities and brittle joins.

Use a concept-to-identifier table to force a decision for every concept:

ConceptIdentifier strategyNotes
Customercustomer_idUse the customer ID from your system of record. Avoid emails—they can change.
Orderorder_idUse an order ID that stays the same across edits and reimports. Avoid row numbers.
Productsku (or product_id)Use SKU if it’s stable and globally unique. Otherwise use a product master ID.
Shipmentshipment_idCarrier IDs can collide. If you have multiple carriers, use (carrier, shipment_id).
OrderItem(order_id, line_number)Line items often have no ID. (order_id, line_number) is usually enough.

Properties are the attributes you use to answer your in-scope questions and implement your first rules. Choose them after you have picked concepts, relationships, and identifiers. Start with the minimum set you need for initial queries.

Capturing every source column increases complexity with little value. If a field is only used to connect two concepts, model it as a relationship.

A simple must-have vs later table helps keep scope small:

ConceptMust-haveWhy it matters (ties to questions)Later (out of scope)
Customercustomer_idNeeded to group and debug delayed orders by customer.Customer lifetime value, segmentation.
Orderorder_id, placed_at, promised_ship_date, customer_idpromised_ship_date is required to detect delayed orders. customer_id is required to relate orders to customers. You usually model that link as Order placed_by Customer.Discounts, coupons, marketing attribution.
OrderItemorder_id, line_number, skuNeeded to relate an order to the products it contains. This supports explanations like “delays are concentrated in a specific SKU”.Unit price, tax, promotions, per-item fulfillment events.
ProductskuNeeded to identify and group items by product. Keep product attributes out of scope until they answer a specific question.Full catalog attributes, merchandising metadata.
Shipmentshipment_id, order_id, shipped_at, carrier, tracking_numberNeeded to detect missing shipments and late handoff to the carrier. order_id is required to relate a shipment back to an order.Full scan events, delivery signature.

If your must-have list exceeds ~10–15 items, it is likely over-scoped. Pick the single question with highest value and trim.

Source mapping connects your model design to real tables, files, or in-memory objects. Do this before you start writing Model.define workflows. Keep it minimal: map only what you need to answer your first questions and support your identifier strategy.

Follow these steps to create a source mapping:

  1. List the elements you will implement

    Write down the concepts and relationships your first model version needs. Keep the list aligned to your in-scope questions.

  2. Choose a concrete source and key fields for each element

    For each element, name the table/file/object you will pull from. Record the identifier fields and join keys you will rely on.

  3. Record the minimum transformations and assumptions

    Note the few rules you must apply to make the data usable. For example: normalization, type casts, deduping, and late-arriving data.

  4. Sanity-check one happy-path example

    Pick one known-good row per concept and confirm it produces exactly one entity with the identifier strategy you chose.

A mapping matrix makes the plan concrete. Here are some example rows in a mapping matrix:

Model elementSource (type)Source field → model mappingNotes
Customercustomers (table)customers.customer_id column maps to Customer identifier
Orderorders (table)orders.order_id column maps to Order identifier
orders.prom_date maps to Order.promised_ship_date
orders.prom_date` is a string, so it must be converted to a date
Order placed_by Customerorders (table)orders.customer_id column maps to “Order placed_by Customer” relationshipSome orders do not have a customer_id so we need to handle NULLs
OrderItemorder_items (table)order_items.(order_id, line_number) columns map to OrderItem identifier
order_items.sku column maps to “OrderItem for_product Product” relationships
Shipment shipped_for Ordershipments (table)shipments.order_id column maps to “Shipment shipped_for Order” relationshipSome orders have multiple shipments because they were shipped in parts as products became available

Once every in-scope concept and relationship has at least one mapped source, you are ready to start implementation. Start by creating a Model instance in Python, declaring concepts, relationship and properties, and data sources, then move on to defining base facts and derived logic.

Use the following worksheet template to help you work through the design steps in a structured way and capture your decisions in one place:

# Model design worksheet
## Scope
Write 1–3 questions your model must answer:
- One sentence question
- One sentence question
- One sentence question
| Goal | In scope | Out of scope |
| --- | --- | --- |
| One sentence outcome | Workflow + elements included | Explicit non-goals |
## Concepts and relationships
- Concepts: `ConceptA`, `ConceptB`, `ConceptC`
- Relationships (domain verbs): `ConceptA` *verb* `ConceptB`, `ConceptB` *verb* `ConceptC`
Draft a relationship diagram (keep it small).
## Identifiers
| Concept | Identifier strategy | Stable or temporary? | Notes |
| --- | --- | --- | --- |
| `Concept` | `field` (or `field1 + field2`) | `stable` or `temporary` | Why this key works, or replacement plan |
## Properties
| Concept | Must-have | Later (out of scope) | Notes |
| --- | --- | --- | --- |
| `Concept` | `prop1`, `prop2` | `prop3`, `prop4` | Tie must-haves to key question(s) |
## Source mapping
| Semantic element | Source | Key fields | Notes |
| --- | --- | --- | --- |
| `Concept or relationship` | `table/file/object` | `join keys / id fields` | Transforms, gaps, and workarounds |