Declare data sources

Data sources are the inputs to your semantic model. They include Snowflake tables, CSV files, in-memory DataFrames, and Python literals. This guide shows you how to declare those sources in your model. See Define Base Facts for the next step: turning source data into entities and relationships.

Your source type determines how you reference fields (columns) in semantic declarations and queries. Most projects start with in-memory sources during prototyping and move to Snowflake tables once the data is shared and stable. Inputs fall into two buckets:

  • Table-like sources. These are row-and-column datasets. This includes Snowflake tables/views and in-memory tabular data like DataFrames. It also includes CSV data you load into pandas. Use Model.Table for Snowflake. Use Model.data for DataFrames and small inline datasets.
  • Model constants. These are fixed values you write in Python. Use Python literals for one-off constants. Use Model.Enum for a small, fixed set of named constants that are queryable in the model.

This table helps you choose the right source type for your project:

| I have a… | I can use… |
| --- | --- |
| Snowflake table or view | Model.Table to reference it by path. See Use a Snowflake table with Model.Table. |
| pandas DataFrame in Python | Model.data to treat it as a temporary table in queries and definitions. See Use a DataFrame with Model.data. |
| CSV file on disk | Model.data for local iteration; load it into Snowflake and switch to Model.Table when it’s shared or large. See Use CSV data with Model.data. |
| A few explicit rows for examples or tests | Model.data to keep the example self-contained. See Use inline Python data with Model.data. |
| A small set of model-specific named constants | Model.Enum for named values you can store and query; use Python literals for one-off constants. See Create model constants with Model.Enum. |

Use a Snowflake table with Model.Table

Model.Table gives you a table-like object backed by a Snowflake table or view in the semantics DSL. Choose this when your data already lives in Snowflake and you want a stable, shareable source mapping.

Choose this when you want to start querying a Snowflake table without writing a column list up front. PyRel fetches the column names and types the first time you access columns in a query or definition. This makes quick exploration faster, but the first column access can fail later if the table path, permissions, or schema are not what you expect.

  1. Declare the table

    Create a Model and declare a table reference with Model.Table:

    from relationalai.semantics import Model
    m = Model("MyModel")
    t = m.Table("DB.SCHEMA.CUSTOMERS")
  2. Verify by selecting all columns

    Select all columns with Model.select. The first *t access triggers lazy schema discovery:

    m.select(*t).to_df()
  • The value returned by Model.Table is a Table object. It behaves like a table with columns you can reference in queries and definitions.
  • Prefer bracket access (t["COL"]) when a column name isn’t a valid Python identifier or contains spaces.
  • When you omit schema=, column metadata is resolved lazily the first time you access columns. If the table path, permissions, or schema are not what you expect, you can get an error later when you run a query that references the table.
  • Turning table rows into entities and relationships is a separate step.

Choose this when the table schema is stable and you want PyRel to know the column names and types immediately. Providing schema={...} to Model.Table makes the columns available right away and helps you catch missing or misnamed columns earlier. The tradeoff is that you need to keep the schema mapping in sync if the underlying Snowflake table changes.

  1. Declare the table with a schema

    Pass schema={...} to Model.Table:

    from relationalai.semantics import Integer, Model, String
    m = Model("MyModel")
    t = m.Table(
        "DB.SCHEMA.CUSTOMERS",
        schema={
            "CUSTOMER_ID": Integer,
            "NAME": String,
        },
    )
  2. Verify by selecting all columns

    Verify access by selecting all columns with Model.select:

    m.select(*t).to_df()
  • The value returned by Model.Table is a Table object. It behaves like a table with columns you can reference in queries and definitions.
  • Prefer bracket access (t["COL"]) when a column name isn’t a valid Python identifier or contains spaces.
  • When you provide schema={...}, column metadata is resolved immediately and column names become available right away. If the table path, permissions, or schema are not what you expect, you can get an error as soon as you create the table object.
  • Turning table rows into entities and relationships is a separate step.

Use CSV data with Model.data

CSV files are a convenient starting point for local iteration. PyRel treats CSV data as in-memory tabular data after you load it in Python and pass it to Model.data. Choose the variant that matches how you prefer to parse CSVs.

Choose this when you already use pandas for cleanup and type normalization. This variant reads a CSV file into a DataFrame and then wraps it with Model.data.

  1. Create a sample CSV file

    Create a file named sample.csv in your working directory (or anywhere you can reference by path) with the following contents:

    customer_id,name
    1,Alice
    2,Bob
  2. Load the CSV file with Model.data

    Read the file with pandas.read_csv, then call Model.data:

    from pathlib import Path
    import pandas as pd
    from relationalai.semantics import Model
    m = Model("MyModel")
    csv_path = Path("sample.csv")
    # If you created the file elsewhere, update the path.
    # Example: csv_path = Path("/absolute/path/to/sample.csv")
    df = pd.read_csv(csv_path, encoding="utf-8")
    d = m.data(df)
  3. Verify by selecting all columns

    Select all columns with Model.select and expand the columns with *d:

    m.select(*d).to_df()
  • pandas.read_csv infers dtypes. If a column should stay a string, pass dtype= to pandas.read_csv or normalize types before you call Model.data. For example, treat an ID with leading zeros as a string.
  • If you see unexpected column names, fix them in pandas before you reference them in definitions. For example, trim leading and trailing whitespace.
  • Mapping CSV-backed columns into entities and relationships is a separate step.
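
The dtype and column-name notes above can be sketched in plain pandas, independent of PyRel. The CSV text below is illustrative; it stands in for a file like sample.csv with an ID column that has leading zeros and a header with stray whitespace:

```python
import io

import pandas as pd

# Illustrative CSV text: IDs with leading zeros, a trailing space in the header.
csv_text = "customer_id,name \n007,Alice\n042,Bob\n"

# Without dtype=, pandas infers customer_id as an integer and drops the zeros.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype=, the IDs stay strings and the leading zeros survive.
preserved = pd.read_csv(io.StringIO(csv_text), dtype={"customer_id": str})

# Trim stray whitespace from column names before wrapping with Model.data.
preserved.columns = preserved.columns.str.strip()
```

After this cleanup, passing preserved to Model.data gives you the column names and types you actually intended.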

Choose this when you want to avoid pandas and keep dependencies minimal. This variant parses CSV text into a list of dictionaries and passes it to Model.data.

  1. Create a sample CSV file

    Create a file named sample.csv in your working directory (or anywhere you can reference by path) with the following contents:

    customer_id,name
    1,Alice
    2,Bob
  2. Load the CSV file with Model.data

    Parse the file with csv.DictReader, then call Model.data:

    import csv
    from pathlib import Path
    from relationalai.semantics import Model
    m = Model("MyModel")
    csv_path = Path("sample.csv")
    # If you created the file elsewhere, update the path.
    # Example: csv_path = Path("/absolute/path/to/sample.csv")
    with csv_path.open("r", encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f))
    d = m.data(rows)
  3. Verify by selecting all columns

    Select all columns with Model.select and expand the columns with *d:

    m.select(*d).to_df()
  • csv.DictReader returns strings for all values. If you need numeric types, convert values in Python before you call Model.data.
  • Always open the file with newline="" (as shown) so the csv module handles newlines consistently across platforms.
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions.
  • Mapping CSV-backed columns into entities and relationships is a separate step.
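
Because csv.DictReader yields every value as a string, a small conversion pass before Model.data keeps numeric columns numeric. A minimal stdlib-only sketch, using inline CSV text in place of the file:

```python
import csv
import io

# Illustrative CSV text standing in for sample.csv.
csv_text = "customer_id,name\n1,Alice\n2,Bob\n"

# DictReader yields {"customer_id": "1", "name": "Alice"}, ...: all strings.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Convert numeric fields explicitly before passing the rows to Model.data.
for row in rows:
    row["customer_id"] = int(row["customer_id"])
```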

Use a DataFrame with Model.data

A DataFrame source lets you reuse transformed in-memory tabular data as an input to model definitions. Choose this when you already have a pandas DataFrame from preprocessing, feature engineering, or notebook exploration.

  1. Wrap a DataFrame with Model.data

    Start from a DataFrame with stable column names, then call Model.data:

    import pandas as pd
    from relationalai.semantics import Model
    m = Model("MyModel")
    df = pd.DataFrame(
        [
            {"customer_id": 1, "name": "Alice"},
            {"customer_id": 2, "name": "Bob"},
        ]
    )
    d = m.data(df)
  2. Verify by selecting columns

    Select a couple of columns with Model.select to confirm the mapping is what you expect:

    m.select(d.customer_id, d.name).to_df()
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions.
  • You can use either dot access (d.name) or bracket access (d["name"]) to reference columns. Prefer bracket access when a column name isn’t a valid Python identifier. You can also use m.select(*d) to select all columns without referencing them by name.
  • If results look surprising, check df.dtypes and normalize critical columns before you call Model.data.
  • Mapping DataFrame-backed columns into entities and relationships is a separate step.
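
The df.dtypes check above can be sketched in plain pandas. This example assumes a DataFrame whose numeric IDs arrived as strings (a common outcome of JSON or CSV ingestion); the column names are illustrative:

```python
import pandas as pd

# A DataFrame where a numeric ID arrived as strings.
df = pd.DataFrame(
    [
        {"customer_id": "1", "name": "Alice"},
        {"customer_id": "2", "name": "Bob"},
    ]
)

# df.dtypes reveals the problem: customer_id has dtype object, not an integer dtype.
before = df.dtypes["customer_id"]

# Normalize the critical column before wrapping the DataFrame with Model.data.
df["customer_id"] = df["customer_id"].astype(int)
```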

Use inline Python data with Model.data

Inline data is the fastest way to seed small, explicit rows for examples and tests. Choose this when you want the smallest possible repro without relying on external files or a database. Keep inline datasets small and schema-like so they don’t drift from your production sources.

Choose this variant when you want column names to come directly from your Python keys.

  1. Create a Data source from rows

    Call Model.data with a list of dictionaries:

    from relationalai.semantics import Model
    m = Model("MyModel")
    d = m.data(
        [
            {"name": "Alice", "age": 10},
            {"name": "Bob", "age": 30},
        ]
    )
  2. Preview the columns

    Query the columns with Model.select:

    m.select(d.name, d.age).to_df()
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions.
  • You can use either dot access (d.name) or bracket access (d["name"]) to reference columns. Prefer bracket access when a column name isn’t a valid Python identifier. You can also use m.select(*d) to select all columns without referencing them by name.
  • If you have exactly one active model, you can also use the top-level data helper as a convenience wrapper around Model.data.
  • Mapping inline data columns into entities and relationships is a separate step.

Choose this variant when your data is naturally row-oriented and you want to provide the column names explicitly.

  1. Create a Data source and set column names

    Pass columns=[...] so your column names are stable and readable in later declarations:

    from relationalai.semantics import Model
    m = Model("MyModel")
    d = m.data(
        [(0, 72.5), (1, 71.9)],
        columns=["minute", "temperature"],
    )
  2. Preview the columns

    Preview the columns with Model.select:

    m.select(d.minute, d.temperature).to_df()
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions. You can use either dot access (d.minute) or bracket access (d["minute"]) to reference columns.
  • If you omit columns for tuple rows, you can access columns by 0-based integer index, such as d[0] and d[1]. They are also exposed with the default names col0, col1, col2, … so you can write d.col0 or d["col0"] if you prefer.
  • Mapping inline data columns into entities and relationships is a separate step.
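
The positional fallback for unnamed tuple rows has a close analogue outside PyRel. As a plain-pandas sketch (illustrative only, not the PyRel API), tuple rows with explicit names are addressable by name, while unnamed ones fall back to positional labels:

```python
import pandas as pd

rows = [(0, 72.5), (1, 71.9)]

# With explicit names, later code can refer to columns by name.
named = pd.DataFrame(rows, columns=["minute", "temperature"])

# Without names, pandas falls back to positional integer labels 0, 1, ...
unnamed = pd.DataFrame(rows)
```

Explicit names make later declarations readable, which is why passing columns=[...] is the recommended form above.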

Create model constants with Model.Enum

Model.Enum creates a small, fixed set of named constants inside your model. Choose this when you want values that behave like model entities (so you can store them, join on them, and query them) rather than one-off Python literals. Enum members are defined lazily the first time you reference them in a query or definition.

  1. Declare an enum type

    Define an enum by subclassing Model.Enum:

    from relationalai.semantics import Model
    m = Model("MyModel")
    class Status(m.Enum):
        ACTIVE = "ACTIVE"
        INACTIVE = "INACTIVE"
  2. Verify by selecting an enum member

    Reference an enum member in a query with Model.select:

    m.select(Status.ACTIVE).to_df()
  • If you only need a one-off constant, prefer a Python literal.
  • You can use enum members in queries and definitions just like other concepts and relationships. They are stored in the model and can be joined on, returned in results, and used in logic.