Design a model
A good semantic model starts with a small, explicit design. Use this guide to scope a small model you can implement and validate without rework.
- You have a target business use case with at least one decision or analysis outcome.
- You know the rough shape of your source data.
Step 1: Define the model’s scope
Section titled “Step 1: Define the model’s scope”Start by defining a clear scope before you pick concepts, relationships, or properties. Scope is the set of questions your model must answer and the workflow it supports. A tight scope keeps the first version small enough to implement and validate.
For example, if your goal is to identify delayed orders and explain the likely cause, include orders, shipments, and the timestamps needed to detect delay and categorize causes. Explicitly exclude related work that is not required for those questions, like inventory planning, returns/refunds, customer support tickets, and carrier contract optimization.
Start with 1–3 questions like:
- Which orders are delayed beyond their promised ship date?
- Which delayed orders are missing a shipment record?
- Which delayed orders were handed off late to the carrier?
Make scope concrete with a simple goal/scope table:
| Goal | In scope | Out of scope |
|---|---|---|
| Identify orders delayed beyond their promised ship date. | Orders and the timestamps needed to compare promised vs actual ship timing. | Shipments, carrier handoff timing, returns/refunds. |
| Explain delays caused by missing shipment records. | Orders, shipments, and the link between them. | Carrier handoff timing, tracking scan events, inventory planning. |
| Explain delays caused by late handoff to the carrier. | Orders, shipments, and the timestamps needed to detect late handoff. | Tracking scan events, delivery status, contract optimization. |
Step 2: Determine concepts and relationships
Section titled “Step 2: Determine concepts and relationships”Once you have a clear scope, you can determine the core concepts and relationships you need to model:
- Concepts are the key entities in your domain (for example,
Order,Shipment,Customer). - Relationships are the domain verbs that connect concepts (for example,
Orderplaced_byCustomer,Orderhas_shipmentShipment).
Start by listing the concepts you need. For the running example, that includes:
Customer: Places orders.Order: The unit you measure delay on.Shipment: The event (or record) that indicates an order was shipped.Product: The item being sold.OrderItem: The line-level link between an order and the products it contains.
Then sketch the relationships between them:
Step 3: Choose concept identifiers
Section titled “Step 3: Choose concept identifiers”Identifiers determine when two records represent the same entity in your model. Choose them before you implement concepts or start joining sources. If an identifier is unstable, you’ll create duplicate entities and brittle joins.
Use a concept-to-identifier table to force a decision for every concept:
| Concept | Identifier strategy | Notes |
|---|---|---|
Customer | customer_id | Use the customer ID from your system of record. Avoid emails—they can change. |
Order | order_id | Use an order ID that stays the same across edits and reimports. Avoid row numbers. |
Product | sku (or product_id) | Use SKU if it’s stable and globally unique. Otherwise use a product master ID. |
Shipment | shipment_id | Carrier IDs can collide. If you have multiple carriers, use (carrier, shipment_id). |
OrderItem | (order_id, line_number) | Line items often have no ID. (order_id, line_number) is usually enough. |
Step 4: Choose properties
Section titled “Step 4: Choose properties”Properties are the attributes you use to answer your in-scope questions and implement your first rules. Choose them after you have picked concepts, relationships, and identifiers. Start with the minimum set you need for initial queries.
Capturing every source column increases complexity with little value. If a field is only used to connect two concepts, model it as a relationship.
A simple must-have vs later table helps keep scope small:
| Concept | Must-have | Why it matters (ties to questions) | Later (out of scope) |
|---|---|---|---|
Customer | customer_id | Needed to group and debug delayed orders by customer. | Customer lifetime value, segmentation. |
Order | order_id, placed_at, promised_ship_date, customer_id | promised_ship_date is required to detect delayed orders. customer_id is required to relate orders to customers. You usually model that link as Order placed_by Customer. | Discounts, coupons, marketing attribution. |
OrderItem | order_id, line_number, sku | Needed to relate an order to the products it contains. This supports explanations like “delays are concentrated in a specific SKU”. | Unit price, tax, promotions, per-item fulfillment events. |
Product | sku | Needed to identify and group items by product. Keep product attributes out of scope until they answer a specific question. | Full catalog attributes, merchandising metadata. |
Shipment | shipment_id, order_id, shipped_at, carrier, tracking_number | Needed to detect missing shipments and late handoff to the carrier. order_id is required to relate a shipment back to an order. | Full scan events, delivery signature. |
If your must-have list exceeds ~10–15 items, it is likely over-scoped. Pick the single question with highest value and trim.
Step 5: Map the model to source data
Section titled “Step 5: Map the model to source data”Source mapping connects your model design to real tables, files, or in-memory objects.
Do this before you start writing Model.define workflows.
Keep it minimal: map only what you need to answer your first questions and support your identifier strategy.
Follow these steps to create a source mapping:
-
List the elements you will implement
Write down the concepts and relationships your first model version needs. Keep the list aligned to your in-scope questions.
-
Choose a concrete source and key fields for each element
For each element, name the table/file/object you will pull from. Record the identifier fields and join keys you will rely on.
-
Record the minimum transformations and assumptions
Note the few rules you must apply to make the data usable. For example: normalization, type casts, deduping, and late-arriving data.
-
Sanity-check one happy-path example
Pick one known-good row per concept and confirm it produces exactly one entity with the identifier strategy you chose.
A mapping matrix makes the plan concrete. Here are some example rows in a mapping matrix:
| Model element | Source (type) | Source field → model mapping | Notes |
|---|---|---|---|
Customer | customers (table) | customers.customer_id column maps to Customer identifier | |
Order | orders (table) | orders.order_id column maps to Order identifierorders.prom_date maps to Order.promised_ship_date | orders.prom_date` is a string, so it must be converted to a date |
Order placed_by Customer | orders (table) | orders.customer_id column maps to “Order placed_by Customer” relationship | Some orders do not have a customer_id so we need to handle NULLs |
OrderItem | order_items (table) | order_items.(order_id, line_number) columns map to OrderItem identifierorder_items.sku column maps to “OrderItem for_product Product” relationships | |
Shipment shipped_for Order | shipments (table) | shipments.order_id column maps to “Shipment shipped_for Order” relationship | Some orders have multiple shipments because they were shipped in parts as products became available |
Once every in-scope concept and relationship has at least one mapped source, you are ready to start implementation.
Start by creating a Model instance in Python, declaring concepts, relationship and properties, and data sources, then move on to defining base facts and derived logic.
Model design worksheet
Section titled “Model design worksheet”Use the following worksheet template to help you work through the design steps in a structured way and capture your decisions in one place:
# Model design worksheet
## Scope
Write 1–3 questions your model must answer:
- One sentence question- One sentence question- One sentence question
| Goal | In scope | Out of scope || --- | --- | --- || One sentence outcome | Workflow + elements included | Explicit non-goals |
## Concepts and relationships
- Concepts: `ConceptA`, `ConceptB`, `ConceptC`- Relationships (domain verbs): `ConceptA` *verb* `ConceptB`, `ConceptB` *verb* `ConceptC`
Draft a relationship diagram (keep it small).
## Identifiers
| Concept | Identifier strategy | Stable or temporary? | Notes || --- | --- | --- | --- || `Concept` | `field` (or `field1 + field2`) | `stable` or `temporary` | Why this key works, or replacement plan |
## Properties
| Concept | Must-have | Later (out of scope) | Notes || --- | --- | --- | --- || `Concept` | `prop1`, `prop2` | `prop3`, `prop4` | Tie must-haves to key question(s) |
## Source mapping
| Semantic element | Source | Key fields | Notes || --- | --- | --- | --- || `Concept or relationship` | `table/file/object` | `join keys / id fields` | Transforms, gaps, and workarounds |