Fraud Detection

Experience level: Advanced
Reasoning types: Graph, Rules-based, Predictive, Prescriptive
Industry: Financial Services
Tags: GNN, Fraud, Predict-then-Optimize, Classification, MILP, Multi-Reasoner

What this template is for

Fraud and risk teams face four interconnected problems: discovering suspicious structure in the transaction graph, classifying accounts by their behavior, scoring transactions as they arrive, and deciding which alerts to investigate given finite human capacity. Traditionally these live in separate tools, so the signal from one rarely informs the next, and the investigator queue is set by a naive sort rather than by the value at stake.

This template shows all four working together on one semantic model in RelationalAI: graph structure feeds behavioral rules, both feed a learned per-transaction fraud score, and that score drives an investigator-budget allocation that captures the most expected loss the audit team’s hours can reach.

It chains RelationalAI’s graph, rules, predictive, and prescriptive reasoners into a single predict-then-optimize pipeline over a shared ontology — from account PageRank through a graph neural network (GNN) classifier to a mixed-integer linear programming (MILP) audit allocation.

Who this is for

Data scientists building end-to-end ML-to-optimization pipelines on transaction graphs
Fraud analysts combining heuristic flags with learned signals to prioritize audits
ML engineers exploring GNN-based prediction on relational/graph data
Operations researchers interested in predict-then-optimize patterns

Assumes familiarity with Python, basic ML concepts (binary classification, ROC AUC), and mixed-integer programming.

What you’ll build

Graph: PageRank on an Account-Account funds-flow graph, exposing account centrality as a GNN feature
Rules: derived activity_count property per account, fed to the GNN as an integer feature alongside the raw transaction fields
Predictive: a GNN binary classifier on the Account-Transaction graph, predicting isFraud per transaction
Bridge: a layer combining GNN probabilities with a rule-based heuristic flag into a per-transaction alert_score
Prescriptive: a knapsack-style investigator-budget MILP that maximizes expected loss averted (alert_score × transaction_amount) subject to a fixed-hours audit budget (audit cost scales with transaction size) plus a per-receiver cap
The same five-stage pipeline running against either a bundled CSV subset (local demo) or a full Snowflake dataset (reference path)

What’s included

Runners:
- fraud_detection_local.py — primary, runnable out of the box. Runs all five stages (Graph / Rules / Predictive / Bridge / Prescriptive) end-to-end on the bundled demo CSVs.
- fraud_detection.py — reference pattern for adapting the pipeline to your own Snowflake data. Same five stages, GPU-trained.
- fraud_detection_rules.ipynb — original rule-based identity-graph notebook, kept as a complementary intro.
Runbook: runbook.md — a paste-testable walkthrough that reproduces the template step by step with the RAI skills; as important a reference as the script itself.
Model: Account, Transaction, plus two graphs (Account-Account for PageRank; Transaction-to-Account for the GNN), derived account properties, and the alert-score bridge
Sample data: a small class-balanced transactions subset sampled from a public mobile-money dataset (CC BY-SA 4.0) — see Sample data below for details and attribution
Outputs: class-balance profile, GNN ROC-AUC, top-K alert queue, optimal audit schedule, MILP-vs-naive uplift

Prerequisites

Access

To run the local demo (fraud_detection_local.py) you need any Snowflake account with the RAI Native App. No external data, no GPU. The bundled CSVs under data/paysim_mini/ ship with the template; the GNN trains on CPU in a few minutes.

The predictive reasoner needs a writable Snowflake schema where it can create experiments and models. The script defaults to FRAUD_DETECTION.EXPERIMENTS (configurable via exp_database / exp_schema in the script). One-time setup, run as ACCOUNTADMIN or any role with privileges to run the commands below:

```
-- Use a database you own (FRAUD_DETECTION shown; pick anything writable)
CREATE DATABASE IF NOT EXISTS FRAUD_DETECTION;
CREATE SCHEMA IF NOT EXISTS FRAUD_DETECTION.EXPERIMENTS;

GRANT USAGE ON DATABASE FRAUD_DETECTION TO APPLICATION RELATIONALAI;
GRANT USAGE ON SCHEMA FRAUD_DETECTION.EXPERIMENTS TO APPLICATION RELATIONALAI;
GRANT CREATE EXPERIMENT ON SCHEMA FRAUD_DETECTION.EXPERIMENTS TO APPLICATION RELATIONALAI;
GRANT CREATE MODEL ON SCHEMA FRAUD_DETECTION.EXPERIMENTS TO APPLICATION RELATIONALAI;
```

To adapt to your own Snowflake pipeline (fraud_detection.py as reference) you’ll additionally need:

A dataset in Snowflake with an accounts table plus a transactions table that references accounts as sender and receiver, and pre-built train / val / test split tables. The as-shipped fraud_detection.py targets a full PaySim mobile-money dataset loaded at FRAUD_DB.PAYSIM.{ACCOUNTS, TRANSACTIONS, TRAIN, VAL, TEST} as a worked example; see Sample data for the source.
A GPU-enabled RAI engine for GNN training at dataset scale (PaySim is ~6M rows).

Tools

Python >= 3.10
RelationalAI Python SDK (relationalai[gnn] == 1.8)
For the rule-based notebook only: jupyter

Quickstart

Download ZIP:
```
curl -O https://docs.relational.ai/templates/zips/v1/fraud-detection.zip
unzip fraud-detection.zip
cd fraud-detection
```
You can also download the template ZIP using the “Download ZIP” button at the top of this page.

Create venv:

```
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
```

Install:
```
python -m pip install .
```
Configure:
```
rai init
```
After rai init generates the config file, add the following to your raiconfig.yaml:
```
data:
    ensure_change_tracking: true
```
Run the local demo on the bundled subset (CPU, a few minutes):
```
python fraud_detection_local.py
```

Adapting to your own Snowflake data

fraud_detection.py is the reference for wiring this pattern against a real Snowflake dataset (accounts + transactions + train/val/test task tables):

Point the table references at your data:

```
DATABASE = "YOUR_DB"
SCHEMA = "YOUR_SCHEMA"   # schema with ACCOUNTS, TRANSACTIONS, TRAIN, VAL, TEST
```

Adjust the PropertyTransformer to match your columns — drop your PKs/FKs explicitly, annotate categoricals and continuous fields, and set time_col on your timestamp column.
If your task tables use different column names, update the Relationship templates (and any TrainTable.<column> accesses) to match.
Run against a GPU-enabled RAI engine:
```
python fraud_detection.py
```

Your TRANSACTIONS table must carry an audit_cost column (hours per audit) — the MILP knapsack constraint reads it directly. Materialize it via a SQL CASE expression so the cost model lives in Snowflake, not Python:

```
CREATE OR REPLACE TABLE FRAUD_DB.PAYSIM.TRANSACTIONS AS
  SELECT *, CASE WHEN amount > 1000000 THEN 5.0 ELSE 1.0 END AS audit_cost
  FROM FRAUD_DB.PAYSIM.RAW_TRANSACTIONS;
```

Build the train/val/test tables from the main transaction table by step cutoff:

```
CREATE OR REPLACE TABLE FRAUD_DB.PAYSIM.TRAIN AS
  SELECT transaction_id, step_ts, is_fraud FROM FRAUD_DB.PAYSIM.TRANSACTIONS
  WHERE step <= 520;
CREATE OR REPLACE TABLE FRAUD_DB.PAYSIM.VAL AS
  SELECT transaction_id, step_ts, is_fraud FROM FRAUD_DB.PAYSIM.TRANSACTIONS
  WHERE step BETWEEN 521 AND 631;
CREATE OR REPLACE TABLE FRAUD_DB.PAYSIM.TEST AS
  SELECT transaction_id, step_ts FROM FRAUD_DB.PAYSIM.TRANSACTIONS
  WHERE step > 631;
```
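To sanity-check the cutoffs before running the SQL, the same split logic can be mirrored in plain Python. This is a minimal sketch that assumes integer step values and hypothetical dict rows carrying the column names used above; it is not part of the template's scripts. As in the TEST table, the test split drops is_fraud.

```python
# Hypothetical helper mirroring the SQL step cutoffs above (assumes integer steps).
TRAIN_MAX_STEP = 520  # train: step <= 520
VAL_MAX_STEP = 631    # val: 521..631; test: step > 631

LABELED_COLS = ("transaction_id", "step_ts", "is_fraud")
UNLABELED_COLS = ("transaction_id", "step_ts")  # test split omits the label


def split_by_step(rows):
    """Return (train, val, test) lists of dicts projected like the SQL tables."""
    train, val, test = [], [], []
    for row in rows:
        if row["step"] <= TRAIN_MAX_STEP:
            train.append({c: row[c] for c in LABELED_COLS})
        elif row["step"] <= VAL_MAX_STEP:
            val.append({c: row[c] for c in LABELED_COLS})
        else:
            test.append({c: row[c] for c in UNLABELED_COLS})
    return train, val, test


# Usage with three made-up rows, one per split.
rows = [
    {"transaction_id": 1, "step": 10, "step_ts": "t10", "is_fraud": 0},
    {"transaction_id": 2, "step": 600, "step_ts": "t600", "is_fraud": 1},
    {"transaction_id": 3, "step": 700, "step_ts": "t700", "is_fraud": 1},
]
train, val, test = split_by_step(rows)
```

Splitting on step rather than at random keeps every validation and test transaction later than the training data, which matches how the GNN's temporal task relationships are keyed.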

Expected output (local run, abbreviated)

Real numbers from a verified end-to-end run on the bundled subset (CPU, no external data, no GPU). Exact scores shift a little with numerical noise between CPU and GPU runs, but the structure and magnitude are consistent.

```
Stage 5: Prescriptive -- investigator-budget allocation
  MILP Status: OPTIMAL
  MILP (cost-aware + per-receiver cap) -> $111,854,667 captured
  Naive top-by-alert-score (same 80 hours)  -> $67,947,657 captured
  MILP uplift over naive sort: $+43,907,010
```

The MILP captures materially more expected loss than a naive sort-by-alert-score under the same 80-hour budget, because it trades off per-audit cost (audit hours scale with transaction size) against catch value and respects the per-receiver cap. GNN training on GPU is not bit-for-bit reproducible even with a fixed seed, so the exact captured and uplift dollars shift from run to run (one separate run gave a $60.2M naive baseline); the large MILP-over-naive uplift is the stable result. The runbook.md walkthrough covers fraud_detection.py, which runs the same chain against the full PaySim schema on Snowflake at much larger scale.

Template structure

```
.
├── README.md                       # this file
├── pyproject.toml                  # dependencies
├── fraud_detection_local.py        # primary: 5-stage pipeline on bundled CSVs (CPU)
├── fraud_detection.py              # reference pattern: same pipeline in Snowflake (GPU)
├── fraud_detection_rules.ipynb     # rule-based identity-graph intro (no ML)
└── data/
    └── paysim_mini/
        ├── transactions.csv        # ~16K sampled transactions (class-balanced)
        ├── accounts.csv            # derived unique accounts from nameOrig ∪ nameDest
        ├── train.csv               # 70% temporal split with is_fraud label
        ├── val.csv                 # 15%
        ├── test.csv                # 15%, no label
        ├── sample.py               # one-time sampler from a local PaySim dump
        └── LICENSE.txt             # CC BY-SA 4.0 + PaySim attribution
```

Start here: run python fraud_detection_local.py for the full five-stage pipeline end to end (CPU, no external setup), or follow runbook.md to reproduce it step by step with the RAI skills. Use fraud_detection.py (requires GPU) as the adaptation reference when you wire this pattern into your own Snowflake data, and explore fraud_detection_rules.ipynb for a rule-based-only take on identity graphs.

Sample data

The bundled mini dataset is sampled from the PaySim synthetic mobile-money transactions dataset by Edgar Lopez-Rojas, released under CC BY-SA 4.0.

16K transactions sampled with class balance inflated from PaySim’s native 0.13% fraud up to 50% so the GNN has enough positive signal to learn from on CPU. Real-world fraud-detection runs should preserve native imbalance and use class weighting.
Fraud is confined to CASH_OUT and TRANSFER transaction types — this is a documented PaySim quirk. The GNN’s job is to distinguish fraudulent CASH_OUT/TRANSFER from normal CASH_OUT/TRANSFER via graph context, not to rediscover the type filter.
See data/paysim_mini/LICENSE.txt for full attribution and citation.

Model overview

Two concepts carry the pipeline: Account and Transaction. Each stage enriches them with new properties the next stage reads, so the model grows accretively across the run.

Key entities: Account (a participant in the transaction network) and Transaction (one transfer between two accounts).
Primary identifiers: Account.account_id (string, e.g. customer prefix C or merchant prefix M); Transaction.transaction_id (integer).
Important invariants: is_flagged_fraud and the GNN’s is_fraud label are 0/1; alert_score is a [0, 1] blend of the flag and the GNN probability; audit-cost hours and transaction amounts are non-negative; the MILP’s audit decision is binary.
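If you export per-transaction results for downstream checks, these invariants can be asserted record by record. The sketch below works over plain dicts; the field names (including the hypothetical audited key for the MILP decision) are illustrative assumptions, not the template's schema.

```python
def invariant_violations(txn: dict) -> list[str]:
    """Return human-readable violations of the documented invariants (empty if none)."""
    problems = []
    if txn["is_flagged_fraud"] not in (0, 1):
        problems.append("is_flagged_fraud must be 0/1")
    label = txn.get("is_fraud")  # absent on the unlabeled test split
    if label is not None and label not in (0, 1):
        problems.append("is_fraud must be 0/1")
    if not 0.0 <= txn["alert_score"] <= 1.0:
        problems.append("alert_score must lie in [0, 1]")
    if txn["amount"] < 0 or txn["audit_cost"] < 0:
        problems.append("amount and audit_cost must be non-negative")
    if txn["audited"] not in (0, 1):  # hypothetical field for the MILP decision
        problems.append("audit decision must be binary")
    return problems


ok = {"is_flagged_fraud": 0, "is_fraud": 1, "alert_score": 0.7,
      "amount": 500_000.0, "audit_cost": 1.0, "audited": 1}
bad = dict(ok, alert_score=1.3, audited=0.5)
# invariant_violations(ok) is empty; invariant_violations(bad) lists two problems.
```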

For the full concept and property definitions, see fraud_detection_local.py; runbook.md builds them step by step with the RAI skills (covering the Snowflake-scale fraud_detection.py path).

How it works

The five stages thread through the shared ontology, each stage’s output becoming a property the next stage reads.

```
Accounts + Transactions (Snowflake tables or bundled CSVs)
  → Stage 1 -- Graph:       PageRank on Account-Account funds-flow graph
  → Stage 2 -- Rules:       Account.activity_count (per-sender derivation)
  → Stage 3 -- Predictive:  GNN binary classification (Transaction.predictions.probs)
  → Stage 4 -- Bridge:      alert_score blends GNN prob with is_flagged_fraud
  → Stage 5 -- Prescriptive: knapsack MILP (hours budget + per-receiver cap)
```

Build the graphs. Account and Transaction are populated from CSVs (local) or Snowflake (full), then two graphs are constructed for different reasoners: a directed Transaction-to-Account bipartite graph that the GNN consumes, and a directed Account-Account funds-flow graph the graph reasoner runs PageRank over.
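How the two edge sets derive from the same transaction rows can be sketched in plain Python. This illustration assumes hypothetical (transaction_id, sender_id, receiver_id, amount) rows and is not the RAI graph API the scripts use to declare the graphs. Weighting funds-flow edges by summed amount is also an assumption; an unweighted graph would keep only the pairs.

```python
from collections import defaultdict

# Hypothetical rows: (transaction_id, sender_id, receiver_id, amount).
transactions = [
    (1, "C1", "C2", 500.0),
    (2, "C2", "M1", 200.0),
    (3, "C1", "C2", 300.0),
]

# Transaction-to-Account edges for the GNN: each transaction links to its sender and receiver.
txn_to_account = [
    (tid, role, account)
    for tid, src, dst, _ in transactions
    for role, account in (("sender", src), ("receiver", dst))
]

# Account-Account funds-flow edges for PageRank: repeated sender->receiver pairs are merged,
# here with the total amount moved as an (assumed) edge weight.
funds_flow = defaultdict(float)
for _, src, dst, amount in transactions:
    funds_flow[(src, dst)] += amount
# funds_flow now maps ("C1", "C2") to 800.0 and ("C2", "M1") to 200.0.
```

The GNN graph keeps one node per transaction, so per-transaction features survive; the funds-flow graph collapses transactions into account-to-account edges, which is the shape PageRank expects.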

Stage 1 — Graph reasoner: account PageRank. PageRank runs on the funds-flow graph, and each account’s score is bound to an explicit Account.pagerank property so it surfaces as a GNN feature column.
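For intuition about what Account.pagerank captures, here is a generic power-iteration PageRank in plain Python. It is a textbook sketch (damping 0.85 and uniform redistribution of dangling-node mass are assumed defaults), not the graph reasoner's implementation, whose parameters may differ.

```python
def pagerank(edges, damping=0.85, max_iters=100, tol=1e-10):
    """Unweighted PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in set(edges):  # de-duplicate parallel edges
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iters):
        # Nodes with no out-links (e.g. pure receivers) leak mass; redistribute it evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new[w] += damping * rank[v] / len(out_links[v])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


scores = pagerank([("C1", "C2"), ("C2", "M1"), ("C1", "C2")])
# M1 receives funds from C2, which receives from C1, so M1 ranks highest and C1 lowest.
```

Accounts that sit downstream of many funds flows accumulate rank, which is why centrality can help the GNN separate collection or mule accounts from ordinary ones.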

Stage 2 — Rules reasoner: account activity. A derived property aggregates each sending account’s transaction count into Account.activity_count. Both pagerank (continuous) and activity_count (integer) go into the PropertyTransformer so the GNN sees them as features alongside the raw transaction fields.
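The activity_count derivation is a per-sender count. In the template it is a derived property on the model; a plain-Python equivalent, assuming hypothetical (transaction_id, sender_id, receiver_id) rows, looks like this:

```python
from collections import Counter

# Hypothetical rows: (transaction_id, sender_id, receiver_id).
transactions = [(1, "C1", "C2"), (2, "C2", "M1"), (3, "C1", "M1"), (4, "C1", "C3")]

# Count transactions per sending account, like Account.activity_count.
activity_count = Counter(sender for _, sender, _ in transactions)
# C1 sent 3 and C2 sent 1; accounts that never send (M1, C3) read as 0 from a Counter.
```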

Stage 3 — Predictive: GNN binary classifier. Task relationships encode the is_fraud label on the train/validation splits and omit it on test. The GNN trains on the Transaction-to-Account graph with the enriched features and emits a per-transaction fraud probability. Both scripts use temporal task relationships keyed on the transaction timestamp.
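The run reports a GNN ROC-AUC. As a reminder of what that number measures, here is a plain-Python sketch of the rank definition: the probability that a randomly chosen fraudulent transaction scores higher than a randomly chosen legitimate one. It is quadratic in the number of labels and not necessarily how the template computes the metric.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75: 3 of 4 pairs ordered correctly
```

Because AUC only depends on ranking, it is insensitive to the 50% class balance of the bundled subset, but it says nothing about whether the probabilities are calibrated, which matters once they feed alert_score.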

Stage 4 — Bridge: blend GNN probability with heuristic flag. The dataset carries an is_flagged_fraud heuristic. A convex mix (weighted by ALPHA_FLAG) combines it with the GNN probability into a per-transaction alert_score in [0, 1].
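Read together with the ALPHA_FLAG description under Tune parameters (weight on the flag versus the GNN probability), the blend reduces to a one-line formula. This is a minimal sketch; the exact expression in the scripts may differ, and the ALPHA_FLAG value here is made up.

```python
ALPHA_FLAG = 0.3  # hypothetical value; the scripts set their own


def alert_score(gnn_prob: float, is_flagged_fraud: int, alpha: float = ALPHA_FLAG) -> float:
    """Convex mix: alpha on the 0/1 heuristic flag, (1 - alpha) on the GNN probability."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Both inputs are in [0, 1] and the weights sum to 1, so the result is too.
    return alpha * is_flagged_fraud + (1.0 - alpha) * gnn_prob


flagged = alert_score(0.6, 1)    # 0.3 * 1 + 0.7 * 0.6, about 0.72
unflagged = alert_score(0.6, 0)  # 0.7 * 0.6, about 0.42
```

Setting alpha to 0 reduces the score to the GNN probability; setting it to 1 reproduces the heuristic flag alone.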

Stage 5 — Prescriptive: knapsack MILP investigator-budget allocation. An auditor's time is the scarce resource: the total investigation budget is fixed in hours, and the time to audit a transaction grows with its size. The MILP maximizes expected loss averted (alert_score × amount) subject to that budget, plus a per-receiver cap that keeps the queue from flooding any one account. Because both cost and value scale with transaction size, ranking by alert_score alone is suboptimal in general: under the example cost curve (5 hours above $1M, 1 hour otherwise), one high-score large transfer consumes the same 5 hours that could audit five 500K transfers at 1 hour each. The MILP weighs that tradeoff explicitly, and the output prints both the MILP objective and a naive sort-and-take baseline so the difference is visible.
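The tradeoff can be reproduced at toy scale. This sketch enumerates every subset to find the exact optimum (a stand-in for the MILP solver that is only practical for a handful of transactions) and compares it with a naive sort-and-take baseline. The cost-curve constants reuse the parameter names listed under Tune parameters with the values from the SQL CASE example; the transactions, the 5-hour budget, the cap of 2, and the choice to make the naive baseline respect the cap are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

LARGE_AMOUNT_THRESHOLD = 1_000_000  # values mirror the SQL CASE example (assumption)
SMALL_AUDIT_COST_HOURS = 1.0
LARGE_AUDIT_COST_HOURS = 5.0


def audit_cost(amount):
    return LARGE_AUDIT_COST_HOURS if amount > LARGE_AMOUNT_THRESHOLD else SMALL_AUDIT_COST_HOURS


def feasible(chosen, budget, cap):
    """Within the hours budget and at most `cap` audits per receiver."""
    hours = sum(audit_cost(t["amount"]) for t in chosen)
    per_receiver = Counter(t["receiver"] for t in chosen)
    return hours <= budget and all(c <= cap for c in per_receiver.values())


def value(chosen):
    """Expected loss averted: sum of alert_score * amount."""
    return sum(t["alert_score"] * t["amount"] for t in chosen)


def exact_best(txns, budget, cap):
    """Brute force over all subsets; a MILP solver reaches the same optimum at scale."""
    best, best_val = (), 0.0
    for k in range(len(txns) + 1):
        for chosen in combinations(txns, k):
            if feasible(chosen, budget, cap) and value(chosen) > best_val:
                best, best_val = chosen, value(chosen)
    return best, best_val


def naive_sort_and_take(txns, budget, cap):
    """Take transactions in descending alert_score order while they still fit."""
    chosen = []
    for t in sorted(txns, key=lambda t: t["alert_score"], reverse=True):
        if feasible(chosen + [t], budget, cap):
            chosen.append(t)
    return chosen, value(chosen)


txns = [  # made-up transactions
    {"id": 1, "receiver": "R1", "amount": 2_000_000, "alert_score": 0.90},
    {"id": 2, "receiver": "R2", "amount": 600_000, "alert_score": 0.80},
    {"id": 3, "receiver": "R3", "amount": 550_000, "alert_score": 0.75},
    {"id": 4, "receiver": "R2", "amount": 700_000, "alert_score": 0.70},
    {"id": 5, "receiver": "R4", "amount": 900_000, "alert_score": 0.60},
    {"id": 6, "receiver": "R5", "amount": 500_000, "alert_score": 0.50},
]
best, best_val = exact_best(txns, budget=5.0, cap=2)
naive, naive_val = naive_sort_and_take(txns, budget=5.0, cap=2)
# Exact: ids 2-6, about 2,172,500 captured. Naive: id 1 alone, about 1,800,000,
# because the top-scored transfer spends the whole 5-hour budget.
```

The naive sort commits the entire budget to the single highest score, while the exact solution spends the same hours on five cheaper audits that together avert more expected loss.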

See fraud_detection_local.py (and fraud_detection.py for the Snowflake path) for the implementation, and runbook.md for the skill-driven reproduction.

Customize this template

Focus on the first changes most users will make.

Use your own data

Replace the bundled CSVs (or Snowflake tables) with your own accounts and transactions. Keep string account primary keys (like Account.account_id) and a stable transaction primary key.
The PropertyTransformer is the main place to localize: drop your primary and foreign keys, and list your categorical versus continuous fields.

Tune parameters

ALPHA_FLAG (0..1) — weight on the rule-based flag versus the GNN probability.
AUDIT_BUDGET_HOURS / PER_ACCOUNT_CAP — investigator budget knobs. Raise the budget to audit more transactions; tighten the cap to spread audits across more receivers.
LARGE_AMOUNT_THRESHOLD / SMALL_AUDIT_COST_HOURS / LARGE_AUDIT_COST_HOURS — the audit-cost curve. Make the jump steeper to reward the MILP’s knapsack-style tradeoffs more aggressively.
GNN hyperparameters (n_epochs, lr, train_batch_size, …) — see the rai-predictive-training skill for tuning guidance.

Extend the model

Swap PageRank for other centrality measures (betweenness, eigenvector) or add community labels (Louvain / Infomap) as a categorical GNN feature.
Author additional rules (e.g. balance-change anomalies, velocity spikes) and feed them into both the GNN features and the alert_score blend.
Fold a rule-based flag directly into the MILP as a hard constraint (e.g. always audit transactions with is_flagged_fraud = 1) rather than using it only as an alert-score contributor.

Scale up / productionize

Use fraud_detection.py as the reference for running against a full Snowflake dataset on a GPU-enabled RAI engine (see Adapting to your own Snowflake data under Quickstart).
Pin relationalai in pyproject.toml and set a fixed GNN seed to limit run-to-run variation (GPU training is not bit-for-bit reproducible even with a seed); schedule the pipeline to refresh the audit queue as new transactions land.

Troubleshooting

GNN training fails or is very slow

For the full-scale fraud_detection.py path, a GPU-enabled engine is required — PaySim’s 6M rows are too large for CPU.
For the local path, the bundled 16K-row subset fits comfortably on CPU (~2-5 min).
Check that the task-table columns referenced in your Relationship templates (transaction_id, step_ts, is_fraud) actually exist in your CSVs or task tables.

Predictions are all near 0 or all near 1

Re-check class balance on the train split (printed before training). If it’s extremely imbalanced, either raise the positive sample rate or add class weighting.
Inspect the PropertyTransformer with VERBOSE_DATASET = True — misconfigured feature types dilute signal.
Try more epochs; classification may need 10-20 epochs even on balanced data.
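A quick way to quantify imbalance, and a common starting point for class weighting, is the inverse-frequency ratio of negatives to positives. This is a plain-Python sketch; how such a weight is passed to the GNN trainer depends on the predictive reasoner's options and is not shown (see the rai-predictive-training skill).

```python
def class_balance(labels):
    """Return (positive rate, inverse-frequency positive weight n_neg / n_pos)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes in the training split")
    return n_pos / (n_pos + n_neg), n_neg / n_pos


# Roughly PaySim's native 0.13% fraud rate: 13 positives in 10,000 rows.
native_rate, native_weight = class_balance([1] * 13 + [0] * 9_987)  # weight about 768
# The bundled subset is balanced 50/50, so the weight is 1.
balanced_rate, balanced_weight = class_balance([1, 0] * 5)
```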

MILP infeasible or degenerate

Infeasible: AUDIT_BUDGET_HOURS is tighter than the cheapest feasible audit, or the per-receiver cap is already saturated. Widen the budget or the per-receiver cap.
Degenerate (selects 0 transactions): no transactions have an alert_score. Confirm Transaction.predictions was populated (test split present + GNN fit succeeded).

Spinner floods the log when running in CI / non-TTY

Keep STREAM_LOGS = False (the default) at the top of the script. The GNN continues training server-side; only the client-side log stream is suppressed.

Learn more

Core concepts

Multi-reasoner workflows — chaining reasoners and enriching a shared ontology, as this predict-then-optimize pipeline does.
PyRel v1 modeling — concepts, properties, and derived rules.

Reasoner reference

Graph reasoner — building graphs from an ontology and running PageRank and centrality.
Predictive reasoner (GNN) — graph neural network classification, the PropertyTransformer, and temporal task relationships.
Prescriptive reasoner — the Problem API, decision variables, constraints, and objectives for the knapsack MILP.

Deeper dives

GNN training and tuning — hyperparameters, evaluation metrics, and predictions; see also the rai-predictive-training skill.

Support

File issues at the RelationalAI templates repository.