Predictive reasoning

Use predictive reasoning for questions that require a model trained on historical data to make predictions about future or previously unseen events. This overview explains the workflow, helps you decide when predictive reasoning fits, and points you to the right next guide. If you need to control which predictive reasoner PyRel uses, see Configure predictive reasoner settings.

What predictive reasoning is

Predictive reasoning is a workflow for solving prediction problems: questions where the answer is not present in the existing data and must instead be predicted from historical examples.

PyRel’s predictive reasoner trains a Graph Neural Network (GNN) directly on your relational data. Concepts become node types, and edges built from the relationships between them form the graph’s topology. Column values become node features. You don’t need to flatten your schema into a single wide table or hand-engineer features — the graph preserves the structure that the GNN learns from.

In PyRel, you typically:

Declare the concepts that participate in the prediction (for example, Customer, Product, Transaction) and populate them from Snowflake tables.
Define train, validation, and test splits as select fragments or relationships over those concepts.
Build a Graph by adding edges that reflect the foreign-key relationships between concepts.
Configure which properties are features and how to preprocess them with a PropertyTransformer.
Train a GNN with .fit(), then generate predictions with .predictions(domain=...).
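
The split step can be pictured in plain Python, independent of PyRel. The sketch below assumes a time-based split, which is one common choice when the goal is to predict future events. The row layout, the cutoff dates, and the `assign_split` helper are hypothetical and are not part of the PyRel API.

```python
from datetime import date

# Hypothetical labeled rows: (customer_id, observation_date, churned_label).
rows = [
    ("c1", date(2023, 1, 10), 0),
    ("c2", date(2023, 2, 5), 1),
    ("c3", date(2023, 3, 20), 0),
    ("c4", date(2023, 4, 2), 1),
    ("c5", date(2023, 5, 15), 0),
    ("c6", date(2023, 6, 1), 1),
]

# Illustrative cutoffs (assumptions, not PyRel defaults).
VAL_START = date(2023, 4, 1)
TEST_START = date(2023, 6, 1)


def assign_split(when):
    """Return the split a row falls into, based only on its date."""
    if when < VAL_START:
        return "train"
    if when < TEST_START:
        return "val"
    return "test"


splits = {"train": [], "val": [], "test": []}
for customer_id, when, label in rows:
    splits[assign_split(when)].append((customer_id, label))

for name, members in splits.items():
    print(name, members)
```

Splitting by time keeps later outcomes out of the training set, so the validation and test splits measure how well the model predicts events it has not seen.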

The reasoner supports two broad task families:

Task family	What it predicts	Typical examples
Node prediction (binary, multiclass, or multilabel classification, regression)	A categorical label or a numeric value for each entity	Customer churn, smoker status, student interest categories, weekly sales forecasting, review score
Link prediction (including recurring links)	Whether two entities are, or will be, connected	Next-purchase prediction, post-to-post references, repeat interactions

You can read more about the supported task types on the Define a learning task page.
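
To make the difference between the two task families concrete, the following plain-Python sketch derives both kinds of labels from the same purchase history. The data, the cutoff week, and the variable names are hypothetical; the sketch only illustrates the shape of the labels and does not use PyRel.

```python
# Hypothetical purchase history: (customer_id, product_id, week).
history = [
    ("c1", "p1", 1), ("c1", "p2", 2), ("c1", "p1", 5),
    ("c2", "p1", 1), ("c2", "p3", 2),
    ("c3", "p2", 3), ("c3", "p2", 6),
]
CUTOFF_WEEK = 4  # weeks up to 4 are the past; later weeks are the outcome

customers = sorted({c for c, _, _ in history})
past = {(c, p) for c, p, w in history if w <= CUTOFF_WEEK}
future = {(c, p) for c, p, w in history if w > CUTOFF_WEEK}

# Node prediction: one label per customer.
# 1 means the customer made no purchase after the cutoff (churned).
active_later = {c for c, _ in future}
churn_label = {c: int(c not in active_later) for c in customers}

# Link prediction: one label per (customer, product) pair seen in the past.
# 1 means the customer bought that product again after the cutoff.
repurchase_label = {pair: int(pair in future) for pair in sorted(past)}

print(churn_label)
print(repurchase_label)
```

In the node task the unit being labeled is a single entity, while in the link task it is a pair of entities.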

When to use predictive reasoning

Use predictive reasoning when the answer depends on signal spread across multiple related tables, cannot be captured by a fixed rule, join, or formula, and must be inferred rather than directly retrieved from the existing data.

Common scenarios include:

Customer or subscriber churn. Predict whether a customer will stop engaging based on past transactions, product characteristics, and demographics — for example, churn predicted from a history of purchases, or from review activity.
Demand and sales forecasting. Predict a numeric outcome per entity from historical events plus entity attributes — for example, weekly sales per store-department, weekly sales per product, or ad click-through rate.
Next-action and repeat-interaction prediction. Predict which items an entity will interact with next, framed as link prediction between two node types — for example, predicting which products a customer will re-purchase, or which ads a user will click.
Categorizing entities by their neighborhood. Assign one or more labels to an entity based on the entities it is related to — for example, inferring a student’s subject interests from the classes they take, as a multiclass or multilabel classification task.

Other types of reasoning may be a better fit if:

Your answer can be derived exactly from existing facts, joins, or aggregations. Use rules-based reasoning.
You need to make a choice subject to constraints, possibly optimizing an objective. Use prescriptive reasoning.
Your question is about graph structure itself (reachability, centrality, or communities) rather than a learned prediction. Use graph reasoning.

What are GNNs?

A Graph Neural Network (GNN) is a type of neural network that learns from data organized as a graph — a set of nodes (entities) connected by edges (relationships between them). Unlike a tabular model that sees each row in isolation, a GNN can combine information across many connected rows at once.

The RelationalAI predictive reasoner turns your relational database into a graph automatically:

Each row of a table becomes a node. A table with 100 rows contributes 100 nodes. All rows from the same table share a node type.
Foreign-key relationships become edges. When one table references another — for example, a transaction whose customer_id matches a customer’s customer_id — the GNN sees that as an edge between the two corresponding nodes.
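
The two mapping rules above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how the predictive reasoner is implemented; the table contents and the `(node type, primary key)` node identifiers are assumptions made for the example.

```python
# Hypothetical tables, each a list of row dicts.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
]
transactions = [
    {"transaction_id": 10, "customer_id": 1, "amount": 20.0},
    {"transaction_id": 11, "customer_id": 1, "amount": 5.5},
    {"transaction_id": 12, "customer_id": 2, "amount": 42.0},
]

# Each row becomes a node keyed by (node type, primary key).
# The remaining columns become that node's features.
nodes = {}
for row in customers:
    nodes[("Customer", row["customer_id"])] = {"age": row["age"]}
for row in transactions:
    nodes[("Transaction", row["transaction_id"])] = {"amount": row["amount"]}

# Each foreign-key match becomes an edge between the two corresponding nodes.
edges = [
    (("Transaction", t["transaction_id"]), ("Customer", t["customer_id"]))
    for t in transactions
    if ("Customer", t["customer_id"]) in nodes
]

print(len(nodes), "nodes,", len(edges), "edges")
```

Two customer rows and three transaction rows give five nodes of two node types, and the three foreign-key references give three edges.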

Once the graph is built, the GNN learns by combining each node’s attributes with those of its neighbors — a process called message passing — round after round. At every round, each node gathers information from the nodes it’s directly connected to and folds that into its own representation. After several rounds, every node has effectively pulled in signal from increasingly distant neighbors across the graph.
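
A toy version of message passing shows how signal travels one hop per round. The sketch below uses a single number per node and a fixed 50/50 average of a node's own value and its neighbors' mean. Both are simplifying assumptions: a real GNN works on feature vectors and learns how to combine them. The graph and node names are hypothetical.

```python
# Undirected toy graph as an adjacency list: a - b - c.
neighbors = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
}
# One scalar feature per node; only "a" starts with any signal.
state = {"a": 1.0, "b": 0.0, "c": 0.0}


def message_passing_round(state, neighbors):
    """Blend each node's value with the mean of its direct neighbors' values."""
    new_state = {}
    for node, value in state.items():
        nbr_values = [state[n] for n in neighbors[node]]
        nbr_mean = sum(nbr_values) / len(nbr_values)
        new_state[node] = 0.5 * value + 0.5 * nbr_mean
    return new_state


after_one = message_passing_round(state, neighbors)
after_two = message_passing_round(after_one, neighbors)

print(after_one)  # {'a': 0.5, 'b': 0.25, 'c': 0.0}
print(after_two)  # {'a': 0.375, 'b': 0.25, 'c': 0.125}
```

After one round, node "c" has still seen nothing from "a" because they are two hops apart. After the second round, the signal from "a" has reached "c" through "b".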

Figure: (a) A simple schema with primary–foreign key links. (b) The corresponding GNN graph, with the source node shown in black. (c) The message-passing process for that node: it aggregates information from its neighbors, who in turn gather information from their neighbors, and so on, propagating signal outward through the graph.

This is how the GNN exploits the relational structure of your data: a customer’s likelihood of churning, for example, depends not just on their own attributes but also on the products they’ve bought, the transactions linking them, and the patterns of other customers connected through those products. Tabular models that see each row in isolation can’t pick up that kind of cross-row signal — GNNs are built to.

The rules that govern how each node combines neighbor information are learned during training by comparing the model’s predictions against known historical outcomes and adjusting until predictions are as accurate as possible.
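
The idea of adjusting the combination rule until predictions match known outcomes can be illustrated with a deliberately tiny model. The sketch below learns two weights, one for a node's own feature and one for the mean of its neighbors' features, using plain gradient descent on squared error. The data, the learning rate, and the two-weight model are assumptions chosen for illustration; an actual GNN has many more parameters and is trained by the reasoner itself.

```python
# Each example: (own feature, mean of neighbor features, known outcome).
# Outcomes were generated as 0.3 * own + 0.7 * neighbor_mean, an assumption
# made here so that the "right" weights are known in advance.
examples = [
    (1.0, 0.0, 0.3),
    (0.0, 1.0, 0.7),
    (1.0, 1.0, 1.0),
    (2.0, 1.0, 1.3),
]


def mean_squared_error(w_self, w_nbr):
    """Average squared gap between predictions and known outcomes."""
    errors = [(w_self * x + w_nbr * m - y) ** 2 for x, m, y in examples]
    return sum(errors) / len(errors)


w_self, w_nbr = 0.0, 0.0  # start with no knowledge
learning_rate = 0.1
initial_loss = mean_squared_error(w_self, w_nbr)

for _ in range(2000):
    # Gradient of the mean squared error with respect to each weight.
    residuals = [(w_self * x + w_nbr * m - y, x, m) for x, m, y in examples]
    grad_self = sum(2 * r * x for r, x, _ in residuals) / len(examples)
    grad_nbr = sum(2 * r * m for r, _, m in residuals) / len(examples)
    # Nudge each weight in the direction that reduces the error.
    w_self -= learning_rate * grad_self
    w_nbr -= learning_rate * grad_nbr

final_loss = mean_squared_error(w_self, w_nbr)
print(round(w_self, 3), round(w_nbr, 3))
```

The weights start at zero and move toward the values that best explain the known outcomes, which is the same compare-and-adjust loop described above at a much smaller scale.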

Where to go next

Pick the next guide based on which step you are on:

Solve a classification problem: A complete walkthrough on a synthetic academic dataset: load data from Snowflake, build a graph, define task splits, train a GNN, and inspect predictions.

Construct a graph: Build the graph the GNN learns over, including multiple edge types between two concepts and self-referential edges.

Define a learning task: Define train, validation, and test splits that tell the GNN what to predict.

Encode features: Use PropertyTransformer to choose which properties will be fed as features to the GNN and how to preprocess them.

Configure and train a GNN: Set training parameters such as epochs, batch size, learning rate, and the evaluation metric, then run training.

Monitor training: Track training progress and inspect experiment results in Snowflake.

Make predictions: Generate and query predictions attached to your concepts.

Use a pretrained model: Load a previously trained model and generate predictions with it.

Understand GNN workflows: The valid call sequences for the fit and load workflows, and the combinations that raise errors.