Declare data sources
Data sources are the inputs to your semantic model. They include Snowflake tables, CSV files, in-memory DataFrames, and Python literals. This guide shows you how to declare those sources in your model. See Define Base Facts for the next step: turning source data into entities and relationships.
- PyRel is installed and importable in Python. See Set Up Your Environment for instructions.
- You have a `Model` instance and at least one declared concept and relationship. See Create a Model Instance, Declare Concepts, and Declare Relationships and Properties.
Choose the right source type
Your source type determines how you reference fields (columns) in semantic declarations and queries. Most projects start with in-memory sources during prototyping and move to Snowflake tables when the data is shared and stable. Most inputs fall into two buckets:
- Table-like sources. These are row-and-column datasets, including Snowflake tables/views and in-memory tabular data like DataFrames, as well as CSV data you load into pandas. Use `Model.Table` for Snowflake. Use `Model.data` for DataFrames and small inline datasets.
- Model constants. These are fixed values you write in Python. Use Python literals for one-off constants. Use `Model.Enum` for a small, fixed set of named constants that are queryable in the model.
This table helps you choose the right source type for your project:
| I have a… | I can use… |
|---|---|
| Snowflake table or view | `Model.Table` to reference it by path. See Use a Snowflake table with `Model.Table`. |
| pandas DataFrame in Python | `Model.data` to treat it as a temporary table in queries and definitions. See Use a DataFrame with `Model.data`. |
| CSV file on disk | `Model.data` for local iteration; load it into Snowflake and switch to `Model.Table` when it's shared or large. See Use CSV data with `Model.data`. |
| A few explicit rows for examples or tests | `Model.data` to keep the example self-contained. See Use inline Python data with `Model.data`. |
| A small set of model-specific named constants | `Model.Enum` for named values you can store and query; use Python literals for one-off constants. See Create model constants with `Model.Enum`. |
Use a Snowflake table with Model.Table
`Model.Table` gives you a table-like object backed by a Snowflake table or view in the semantics DSL.
Choose this when your data already lives in Snowflake and you want a stable, shareable source mapping.
Discover schema lazily
Choose this when you want to start querying a Snowflake table without writing a column list up front. PyRel fetches the column names and types the first time you access columns in a query or definition. This makes quick exploration faster, but the first column access can fail if the table path, permissions, or schema are not what you expect.
1. Declare the table

   Create a `Model` and declare a table reference with `Model.Table`:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")
   t = m.Table("DB.SCHEMA.CUSTOMERS")
   ```

2. Verify by selecting all columns

   Select all columns with `Model.select`. The first `*t` access triggers lazy schema discovery:

   ```python
   m.select(*t).to_df()
   ```
- The value returned by `Model.Table` is a `Table` object. It behaves like a table with columns you can reference in queries and definitions.
- Prefer bracket access (`t["COL"]`) when a column name isn't a valid Python identifier or contains spaces.
- When you omit `schema=`, column metadata is resolved lazily the first time you access columns. If the table path, permissions, or schema are not what you expect, you can get an error later, when you run a query that references the table.
- Turning table rows into entities and relationships is a separate step.
Declare schema explicitly
Choose this when the table schema is stable and you want PyRel to know the column names and types immediately.
Providing `schema={...}` to `Model.Table` makes the columns available right away and helps you catch missing or misnamed columns earlier.
The tradeoff is that you must keep the schema mapping in sync if the underlying Snowflake table changes.
1. Declare the table with a schema

   Pass `schema={...}` to `Model.Table`:

   ```python
   from relationalai.semantics import Integer, Model, String

   m = Model("MyModel")
   t = m.Table(
       "DB.SCHEMA.CUSTOMERS",
       schema={
           "CUSTOMER_ID": Integer,
           "NAME": String,
       },
   )
   ```

2. Verify by selecting all columns

   Verify access by selecting all columns with `Model.select`:

   ```python
   m.select(*t).to_df()
   ```
- The value returned by `Model.Table` is a `Table` object. It behaves like a table with columns you can reference in queries and definitions.
- Prefer bracket access (`t["COL"]`) when a column name isn't a valid Python identifier or contains spaces.
- When you provide `schema={...}`, column metadata is resolved immediately and column names become available right away. If the table path, permissions, or schema are not what you expect, you can get an error as soon as you create the table object.
- Turning table rows into entities and relationships is a separate step.
Use CSV data with Model.data
CSV files are a convenient starting point for local iteration.
PyRel treats CSV data as in-memory tabular data after you load it in Python and pass it to `Model.data`.
Choose the variant that matches how you prefer to parse CSVs.
Load a CSV with pandas.read_csv
Choose this when you already use pandas for cleanup and type normalization.
This variant reads a CSV file into a DataFrame and then wraps it with `Model.data`.
1. Create a sample CSV file

   Create a file named `sample.csv` in your working directory (or anywhere you can reference by path) with the following contents:

   ```
   customer_id,name
   1,Alice
   2,Bob
   ```

2. Load the CSV file with `Model.data`

   Read the file with `pandas.read_csv`, then call `Model.data`:

   ```python
   from pathlib import Path

   import pandas as pd

   from relationalai.semantics import Model

   m = Model("MyModel")

   csv_path = Path("sample.csv")
   # If you created the file elsewhere, update the path.
   # Example: csv_path = Path("/absolute/path/to/sample.csv")

   df = pd.read_csv(csv_path, encoding="utf-8")
   d = m.data(df)
   ```

3. Verify by selecting all columns

   Select all columns with `Model.select` and expand the columns with `*d`:

   ```python
   m.select(*d).to_df()
   ```
- `pandas.read_csv` infers dtypes. If a column should stay a string, pass `dtype=` to `pandas.read_csv` or normalize types before you call `Model.data`. For example, treat an ID with leading zeros as a string.
- If you see unexpected column names, fix them in pandas before you reference them in definitions. For example, trim leading and trailing whitespace.
- Mapping CSV-backed columns into entities and relationships is a separate step.
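The leading-zeros caveat above can be sketched in plain pandas, without any PyRel calls (the column names here are invented for illustration):

```python
import io

import pandas as pd

csv_text = "customer_id,name\n001,Alice\n002,Bob\n"

# Without dtype=, pandas infers customer_id as an integer and drops the zeros.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype=, the values stay strings and the leading zeros survive.
preserved = pd.read_csv(io.StringIO(csv_text), dtype={"customer_id": str})

print(inferred["customer_id"].tolist())   # [1, 2]
print(preserved["customer_id"].tolist())  # ['001', '002']
```

The same `dtype=` mapping accepts one entry per column, so you can pin only the columns whose inferred types would be wrong.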
Load a CSV with csv.DictReader
Choose this when you want to avoid pandas and keep dependencies minimal.
This variant parses CSV text into a list of dictionaries and passes it to `Model.data`.
1. Create a sample CSV file

   Create a file named `sample.csv` in your working directory (or anywhere you can reference by path) with the following contents:

   ```
   customer_id,name
   1,Alice
   2,Bob
   ```

2. Load the CSV file with `Model.data`

   Parse the file with `csv.DictReader`, then call `Model.data`:

   ```python
   import csv
   from pathlib import Path

   from relationalai.semantics import Model

   m = Model("MyModel")

   csv_path = Path("sample.csv")
   # If you created the file elsewhere, update the path.
   # Example: csv_path = Path("/absolute/path/to/sample.csv")

   with csv_path.open("r", encoding="utf-8", newline="") as f:
       rows = list(csv.DictReader(f))

   d = m.data(rows)
   ```

3. Verify by selecting all columns

   Select all columns with `Model.select` and expand the columns with `*d`:

   ```python
   m.select(*d).to_df()
   ```
- `csv.DictReader` returns strings for all values. If you need numeric types, convert values in Python before you call `Model.data`.
- Always open the file with `newline=""` (as shown) so the `csv` module handles newlines consistently across platforms.
- `Model.data` returns a `Data` object that behaves like a table with columns you can reference in queries and definitions.
- Mapping CSV-backed columns into entities and relationships is a separate step.
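The type-conversion note above can be sketched with the standard library alone; the converted `rows` list is what you would then hand to `Model.data` (column names invented for illustration):

```python
import csv
import io

csv_text = "customer_id,name\n1,Alice\n2,Bob\n"

# DictReader yields every value as a string.
raw_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Convert numeric fields explicitly before passing the rows to Model.data.
rows = [{**row, "customer_id": int(row["customer_id"])} for row in raw_rows]

print(raw_rows[0])  # {'customer_id': '1', 'name': 'Alice'}
print(rows[0])      # {'customer_id': 1, 'name': 'Alice'}
```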
Use a DataFrame with Model.data
A DataFrame source lets you reuse transformed in-memory tabular data as an input to model definitions. Choose this when you already have a pandas DataFrame from preprocessing, feature engineering, or notebook exploration.
1. Wrap a DataFrame with `Model.data`

   Start from a DataFrame with stable column names, then call `Model.data`:

   ```python
   import pandas as pd

   from relationalai.semantics import Model

   m = Model("MyModel")

   df = pd.DataFrame([
       {"customer_id": 1, "name": "Alice"},
       {"customer_id": 2, "name": "Bob"},
   ])
   d = m.data(df)
   ```

2. Verify by selecting columns

   Select a couple of columns with `Model.select` to confirm the mapping is what you expect:

   ```python
   m.select(d.customer_id, d.name).to_df()
   ```
- `Model.data` returns a `Data` object that behaves like a table with columns you can reference in queries and definitions.
- You can use either dot access (`d.name`) or bracket access (`d["name"]`) to reference columns. Prefer bracket access when a column name isn't a valid Python identifier. You can also use `m.select(*d)` to select all columns without referencing them by name.
- If results look surprising, check `df.dtypes` and normalize critical columns before you call `Model.data`.
- Mapping DataFrame-backed columns into entities and relationships is a separate step.
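As a minimal sketch of the `df.dtypes` check above, in plain pandas (the string-typed IDs and stray whitespace are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame([
    {"customer_id": "1", "name": " Alice "},
    {"customer_id": "2", "name": "Bob"},
])

# Inspect inferred dtypes before wrapping the frame with Model.data.
# Both columns show as `object` here because every value is a string.
print(df.dtypes)

# Normalize: numeric IDs as integers, names stripped of stray whitespace.
df["customer_id"] = df["customer_id"].astype("int64")
df["name"] = df["name"].str.strip()

print(df["name"].tolist())  # ['Alice', 'Bob']
```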
Use inline Python data with Model.data
Inline data is the fastest way to seed small, explicit rows for examples and tests. Choose this when you want the smallest possible repro without relying on external files or a database. Keep inline datasets small and schema-like so they don’t drift from your production sources.
Provide rows as a list of dictionaries
Choose this variant when you want column names to come directly from your Python keys.
1. Create a `Data` source from rows

   Call `Model.data` with a list of dictionaries:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")
   d = m.data([
       {"name": "Alice", "age": 10},
       {"name": "Bob", "age": 30},
   ])
   ```

2. Preview the columns

   Query the columns with `Model.select`:

   ```python
   m.select(d.name, d.age).to_df()
   ```
- `Model.data` returns a `Data` object that behaves like a table with columns you can reference in queries and definitions.
- You can use either dot access (`d.name`) or bracket access (`d["name"]`) to reference columns. Prefer bracket access when a column name isn't a valid Python identifier. You can also use `m.select(*d)` to select all columns without referencing them by name.
- If you have exactly one active model, you can also use the top-level `data` helper as a convenience wrapper around `Model.data`.
- Mapping inline data columns into entities and relationships is a separate step.
Provide rows as a list of tuples
Choose this variant when your data is naturally row-oriented and you want to provide the column names explicitly.
1. Create a `Data` source and set column names

   Pass `columns=[...]` so your column names are stable and readable in later declarations:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")
   d = m.data(
       [(0, 72.5), (1, 71.9)],
       columns=["minute", "temperature"],
   )
   ```

2. Preview the columns

   Preview the columns with `Model.select`:

   ```python
   m.select(d.minute, d.temperature).to_df()
   ```
- `Model.data` returns a `Data` object that behaves like a table with columns you can reference in queries and definitions. You can use either dot access (`d.minute`) or bracket access (`d["minute"]`) to reference columns.
- If you omit `columns` for tuple rows, you can access columns by 0-based integer index, such as `d[0]` and `d[1]`. They are also exposed with the default names `col0`, `col1`, `col2`, and so on, so you can write `d.col0` or `d["col0"]` if you prefer.
- Mapping inline data columns into entities and relationships is a separate step.
Create model constants with Model.Enum
`Model.Enum` creates a small, fixed set of named constants inside your model.
Choose this when you want values that behave like model entities (so you can store them, join on them, and query them) rather than one-off Python literals.
Enum members are defined lazily the first time you reference them in a query or definition.
1. Declare an enum type

   Define an enum by subclassing `Model.Enum`:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")

   class Status(m.Enum):
       ACTIVE = "ACTIVE"
       INACTIVE = "INACTIVE"
   ```

2. Verify by selecting an enum member

   Reference an enum member in a query with `Model.select`:

   ```python
   m.select(Status.ACTIVE).to_df()
   ```
- If you only need a one-off constant, prefer a Python literal.
- You can use enum members in queries and definitions just like other concepts and relationships. They are stored in the model and can be joined on, returned in results, and used in logic.