Money-Laundering Motif Detection
Detect three classes of layering motif on the same transaction ledger, each demonstrating a different CSP technique that rules / paths / graph reasoning alone cannot enforce: per-vertex aggregate equality (butterfly), pairwise distinctness over a chosen subset (smurf army), and cardinality on a chosen subset (KYC-mix burst).
What this template is for
Banking compliance and financial-crime teams hunt for “layering” — the obfuscation phase of money laundering, where a launderer routes funds across a coordinated cohort of intermediary accounts so no single transaction triggers a FinCEN currency-transaction-report (CTR) filing. The template ships three motifs — each a different layering shape, each anchoring a distinct CSP technique class — that share an ontology and run as three independent scripts:
| Motif | CSP technique class | Topology | Runner |
|---|---|---|---|
| Butterfly with per-hub flow conservation | Per-vertex equality of two aggregates over a decision-selected edge subset | Scatter-gather (one source -> K hubs -> one destination), per-hub conservation in dollar amount | motif_butterfly.py |
| Smurf army with pairwise distinct beneficial owners | Pairwise constraints over decision-selected vertices + sum-equals-target | Fan-in (K sources -> one destination) with pairwise-distinct bo_id, sum-to-target, tight time window | motif_smurf_army.py |
| KYC-mix burst with cardinality on a chosen subset | Cardinality constraint over a decision-selected vertex subset | Fan-in with KYC-tier floor (>= 4 retail) within a tight time window | motif_kyc_burst.py |
The structural shape of a layering scheme — fan-out, fan-in, scatter-gather — is what graph-pattern matchers handle well, and it’s exactly what launderers hide behind by adding decoy edges and lookalike clusters. The signal that separates a real motif from its lookalikes is a constraint that has to hold jointly across whichever subset of accounts and edges the detector picks — arithmetic across the chosen edges, pairwise distinctness across the chosen accounts, or a count distribution over the chosen subset. None of those are expressible in a graph pattern or a path query: those see one edge or one walk at a time, never “this set of K elements, taken together, satisfies the joint condition.” Each motif is grounded in a documented AML typology — the eight-pattern taxonomy from IBM AMLworld (Altman et al., NeurIPS 2023) and the structuring / smurf-army typology from FATF and FinCEN (31 CFR § 1010.311).
The same patterns apply outside banking. Per-vertex aggregate equality detects collusion rings in marketplace fraud and recurring-billing abuse with fee splitting. Pairwise distinctness over a chosen subset detects ration-card sharing in welfare fraud and identity-cohort scams in subscription services. Cardinality distribution over a chosen subset detects compliance-mix violations in regulatory audits and KYC-cohort coordination in any structuring-style scheme.
Who this is for
- Bank financial-crime / AML compliance teams investigating layering patterns
- Fintech risk engineers building structuring-detection alert pipelines
- Bank IT teams building investigative tools that surface candidate cases for human review
- Operations researchers learning multi-motif subgraph detection where each motif demonstrates a distinct CSP technique class
What you’ll build
A shared ontology in model_setup.py and three independent CSP runners, each adding its own decision properties and constraints to the same Account / Transaction model. Every runner solves a separate Problem and prints the analyst-facing motif tables.
Shared ontology (in model_setup.py):
- `Account` concept with `bo_id` (beneficial-owner cluster), `kyc_tier`, and `jurisdiction` properties
- `Transaction` concept with `src`, `dst`, `amount_dollars`, and `ts_minutes` properties
Each motif owns its own decisions and constraints. The bullets below summarise the load-bearing CSP integrity constraint per motif (the part that rules / paths / graph reasoning cannot enforce); the “How it works” section walks each one with code.
- Butterfly (`motif_butterfly.py`):
  - Decisions: `Transaction.is_motif`, `Account.is_source` / `is_hub` / `is_dest`
  - Per-account flow conservation in count (source out-degree K, hub in-degree and out-degree 1, dest in-degree K)
  - Global motif-edge count (= 2K). This closes a gap in the per-account ICs above for accounts with one-sided traffic (no outgoing or no incoming rows). For those accounts the per-account IC is never instantiated, so their role variables would otherwise be unconstrained.
  - Layer constraints (no source-to-dest direct motif edge, no hub-to-hub motif edge). The count ICs alone admit non-butterfly shapes such as `S->D, S->H1, S->H2, H1->H3, H2->D, H3->D` (still 2K edges, still 1-in / 1-out per hub) on customer ledgers that contain under-threshold direct or hub-to-hub edges. The layer ICs forbid those shapes.
  - Per-hub flow conservation in dollar amount: each chosen hub's incoming dollars equal its outgoing dollars within `CONSERVATION_TOLERANCE_DOLLARS`. Written in big-M form (active when `is_hub == 1`, vacuous otherwise) with the M coefficient computed from the data.
  - Same-beneficial-owner pairwise filter over chosen hub pairs (chosen hubs must share a `bo_id`)
  - Sub-threshold amount filter
- Smurf army (`motif_smurf_army.py`):
  - Decisions: `Account.is_smurf`, `Transaction.is_smurf_tx`
  - Exactly N smurfs sending to a single target merchant
  - Pairwise distinct beneficial owners across chosen smurfs
  - Sum-equals-target on chosen smurf-tx amounts within tolerance
  - All chosen smurf-tx within `SMURF_WINDOW_MINUTES` of each other
  - Sub-threshold amount filter
- KYC-mix burst (`motif_kyc_burst.py`):
  - Decisions: `Account.is_burst`, `Transaction.is_burst_tx`
  - Exactly N accounts in a coordinated burst to a single target merchant
  - At least `RETAIL_FLOOR` retail-tier accounts among those chosen, which is a cardinality constraint over the selected subset
  - All chosen burst-tx within `BURST_WINDOW_MINUTES` of each other
  - Sub-threshold amount filter
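A small pure-Python sketch shows why the layer constraints are needed. It is not the template's RelationalAI code. It checks a chosen edge set against the count ICs and the layer ICs as they are described in the list above. The role labels, the `S` / `H1` / `D` account names, and the function names are hypothetical. The sketch mirrors only the listed constraints, not every detail of `motif_butterfly.py`.

```python
from collections import Counter


def count_ics_hold(edges, roles, k):
    """Per-account degree ICs plus the global 2K motif-edge count.

    edges: set of (src, dst) motif edges the detector picked.
    roles: dict mapping account -> "source" | "hub" | "dest".
    """
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    for acct, role in roles.items():
        if role == "source" and out_deg[acct] != k:
            return False
        if role == "hub" and (in_deg[acct], out_deg[acct]) != (1, 1):
            return False
        if role == "dest" and in_deg[acct] != k:
            return False
    return len(edges) == 2 * k


def layer_ics_hold(edges, roles):
    """No direct source->dest motif edge and no hub->hub motif edge."""
    for src, dst in edges:
        if roles.get(src) == "source" and roles.get(dst) == "dest":
            return False
        if roles.get(src) == "hub" and roles.get(dst) == "hub":
            return False
    return True


ROLES = {"S": "source", "H1": "hub", "H2": "hub", "H3": "hub", "D": "dest"}
BUTTERFLY = {("S", "H1"), ("S", "H2"), ("S", "H3"), ("H1", "D"), ("H2", "D"), ("H3", "D")}
LOOKALIKE = {("S", "D"), ("S", "H1"), ("S", "H2"), ("H1", "H3"), ("H2", "D"), ("H3", "D")}

for name, edges in [("butterfly", BUTTERFLY), ("lookalike", LOOKALIKE)]:
    print(name, count_ics_hold(edges, ROLES, k=3), layer_ics_hold(edges, ROLES))
```

The lookalike passes every count IC (6 = 2K edges, 1-in / 1-out per hub). Only the layer check rejects it, and that is the gap the layer constraints close.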
Multi-solution enumeration via `problem.solve(..., solution_limit=...)` is available on every runner for batch analyst-triage workflows. On the bundled data, the butterfly and KYC-burst runners exhaust the search (status OPTIMAL). The smurf runner stops at SOLUTION_LIMIT. Besides the planted pairwise-distinct-BO cohort, the data admits cohorts that swap one of the dup-BO decoys (47 or 48) in for a planted smurf, and together these exceed `MAX_SMURF_MOTIFS=5`. Raise the limit to enumerate further.
What’s included
- `model_setup.py`: shared ontology (`Account`, `Transaction`) + CSV load. Every motif runner imports `create_model()` and adds its own decisions and constraints on top.
- `motif_butterfly.py`: per-hub flow conservation over a decision-selected edge subset.
- `motif_smurf_army.py`: pairwise distinct beneficial owners + sum-target + tight time window.
- `motif_kyc_burst.py`: cardinality on the chosen subset + tight time window.
- `data/accounts.csv`: 90 accounts across 69 beneficial-owner clusters. These include two butterfly-cluster hubs (`bo_id=100`, `bo_id=200`), a five-account smurf cohort with pairwise-distinct BOs (`bo_id=1701..1705`), and a five-account KYC-mix burst cohort (4 retail + 1 business across 4 jurisdictions). They also include dup-BO / out-of-window / over-amount decoys for each motif and 30 unrelated noise accounts.
- `data/transactions.csv`: 138 directed transactions. These are 12 butterfly motif edges, 5 smurf motif edges, 5 KYC-burst motif edges, named decoys / cross-cluster traffic, and 60 noise transactions deterministically generated via `data/generate.py`.
- `data/generate.py`: the deterministic generator that produced the bundled CSVs. Re-run it only if you change the planted-motif design or want to regenerate at a different size. The template ships its outputs, so users don't need to run it.
- `pyproject.toml`: Python package configuration.
Prerequisites
Access
- A Snowflake account that has the RAI Native App installed.
- A Snowflake user with permissions to access the RAI Native App.
Tools
- Python >= 3.10
Quickstart
- Download the ZIP:

  ```bash
  curl -O https://docs.relational.ai/templates/zips/v1/money_laundering_motif_detection.zip
  unzip money_laundering_motif_detection.zip
  cd money_laundering_motif_detection
  ```

- Create a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  python -m pip install --upgrade pip
  ```

- Install:

  ```bash
  python -m pip install .
  ```

- Configure:

  ```bash
  rai init
  ```

- Run any of the three motif detectors. They share the bundled data and run independently:

  ```bash
  python motif_butterfly.py
  python motif_smurf_army.py
  python motif_kyc_burst.py
  ```
Expected output. The bundled dataset plants exactly the motifs each detector targets, plus dup-BO / out-of-window / over-amount decoys for each. Solver build strings and exact wall times will vary, but the structure of the output is stable.

- `motif_butterfly.py` finds the two planted scatter-gathers (cluster 100 -> WireRecipientCorp, cluster 200 -> OffshoreLLC) and exhausts the search:

  ```text
  Solve result:
  • status: OPTIMAL
  • num_points: 2
  Result: 2 butterfly motif(s) found and search exhausted.
  Butterfly: chosen motif transactions (one row per motif edge per solution):
      solution  tx_id  src_account_id         src_name  dst_account_id     dst_name  amount  ts_min
  0          0      1               1  SourceShellCorp               2     HubAccA1    9000       5
  ...
  11         1     12               9         HubAccB3              10  OffshoreLLC    7780      44
  ```

- `motif_smurf_army.py` hits `MAX_SMURF_MOTIFS=5` because the bundled data admits multiple sum-target cohorts. These are the planted five-distinct-BO cohort plus cohorts that swap one of the dup-BO decoys (47 or 48, both `bo_id=1706`) in for one of the planted smurfs. `distinct_bo_ic` is the IC that prevents the {47, 48, + three planted} cohort, which sums in range, from being returned. The first cohort the solver enumerates is typically the planted one:

  ```text
  Solve result:
  • status: SOLUTION_LIMIT
  • num_points: 5
  Result: 5 smurf-army cohort(s) found; hit MAX_SMURF_MOTIFS=5.
  More may exist -- raise the limit to enumerate further.
  Smurf army: chosen smurf accounts per solution (with distinct BOs and jurisdictions):
     solution  smurf_id       smurf_name  bo_id kyc_tier jurisdiction
  0         0        42      Adrian Park   1701   retail           US
  1         0        43      Mira Volkov   1702   retail           UK
  2         0        44  Dimitri Solanki   1703  private       Cayman
  3         0        45      Sofia Reyes   1704   retail    Singapore
  4         0        46    Tomas Hellman   1705   retail      Germany
  ...
  ```

- `motif_kyc_burst.py` finds two valid 5-account cohorts that satisfy the retail floor (4 retail, all within 60 min). These are the planted cohort with Brookline Industries plus an alternative-business swap (Newland Capital Inc):

  ```text
  Solve result:
  • status: OPTIMAL
  • num_points: 2
  KYC-burst: chosen burst accounts per solution (with KYC tier and jurisdiction):
     solution  burst_id            burst_name  kyc_tier jurisdiction  bo_id
  0         0        52           Trent Hayes    retail           US   2201
  1         0        53      Samira Choudhury    retail           UK   2202
  2         0        54           Diego Ramos    retail       Cayman   2203
  3         0        55            Nikhil Tan    retail    Singapore   2204
  4         0        56  Brookline Industries  business           US   2205
  5         1        52           Trent Hayes    retail           US   2201
  6         1        53      Samira Choudhury    retail           UK   2202
  7         1        54           Diego Ramos    retail       Cayman   2203
  8         1        55            Nikhil Tan    retail    Singapore   2204
  9         1        57   Newland Capital Inc  business      Germany   2206
  ```
Template structure
```text
.
├── README.md
├── pyproject.toml
├── model_setup.py         # shared ontology + data load
├── motif_butterfly.py     # per-hub flow conservation
├── motif_smurf_army.py    # pairwise distinct BOs + sum-target
├── motif_kyc_burst.py     # cardinality on chosen subset
└── data/
    ├── accounts.csv
    ├── transactions.csv
    └── generate.py        # deterministic generator (one-time, not runtime)
```

How it works
The three motifs share an Account / Transaction ontology defined in model_setup.py. Each runner adds its own decision-valued properties, builds its own Problem, and constrains it with a different class of CSP technique. The “What you’ll build” section above lists every constraint per motif; below is the load-bearing CSP IC for each, with the part that rules / paths / graph reasoning cannot express called out explicitly.
Butterfly: per-vertex aggregate equality over a decision-selected edge subset
For every account the solver picks as a hub, the dollar amount it receives via motif edges must equal what it forwards, within CONSERVATION_TOLERANCE_DOLLARS. The constraint is arithmetic over a decision-selected subset of edges — it cannot be evaluated until the solver has chosen which transactions are in the motif and which accounts are hubs. Written in big-M form so the constraint is active when is_hub == 1 and vacuous when is_hub == 0, with the M coefficient computed from the data:
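```python
# Excerpt from motif_butterfly.py: model, Account, Transaction, sum and the
# CONSERVATION_* constants are defined earlier in the runner / model_setup.py.
T_out = Transaction.ref()  # second handle: the account's outgoing edges
conservation_pos_ic = model.where(
    Transaction.dst == Account,  # edges into the account
    T_out.src == Account,        # edges out of the same account
).require(
    # in - out <= tolerance when is_hub == 1; relaxed by BIG_M when is_hub == 0
    sum(Transaction.amount_dollars * Transaction.is_motif).per(Transaction.dst)
    - sum(T_out.amount_dollars * T_out.is_motif).per(T_out.src)
    + CONSERVATION_BIG_M * Account.is_hub
    <= CONSERVATION_TOLERANCE_DOLLARS + CONSERVATION_BIG_M
)
```

This is the in-minus-out side. Equality within tolerance also needs the mirrored out-minus-in bound. A path enumeration sees one walk at a time and never the joint condition. A rules-only encoding can sum per vertex but cannot bind that sum to "the chosen subset of edges, where the chosen accounts are hubs."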
We use big-M rather than a half-reified `implies(Account.is_hub == 1, ...)` for this constraint. Half-reification introduces a free Boolean auxiliary per non-hub account that MiniZinc treats as part of the search space. MiniZinc then returns thousands of trivially distinct solutions for the same role/motif assignment. Big-M with a data-computed bound introduces no auxiliary, so enumeration stays clean and the search exhausts once the data's actual motifs have been found.
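The big-M reasoning can be checked in plain Python. The sketch below assumes one sound way to compute M from the data: the largest total dollar volume any account sends or receives across the ledger. This choice is illustrative, and the template's own formula for `CONSERVATION_BIG_M` may differ. The sketch evaluates the `conservation_pos_ic` inequality and its assumed out-minus-in mirror for hub and non-hub accounts. Account ids and amounts are hypothetical.

```python
from collections import defaultdict


def big_m_bound(txs):
    """Largest total inflow or outflow of any account (assumed choice of M).

    With positive amounts, motif edges are a subset of all edges, so |in - out|
    over motif edges never exceeds this bound. The relaxed row is therefore
    vacuous for non-hubs.
    """
    inflow, outflow = defaultdict(int), defaultdict(int)
    for src, dst, amount in txs:
        outflow[src] += amount
        inflow[dst] += amount
    return max([*inflow.values(), *outflow.values()])


def conservation_holds(in_dollars, out_dollars, is_hub, big_m, tol):
    # Positive side, mirroring conservation_pos_ic: in - out + M*is_hub <= tol + M
    pos = in_dollars - out_dollars + big_m * is_hub <= tol + big_m
    # Mirrored side (assumed): out - in + M*is_hub <= tol + M
    neg = out_dollars - in_dollars + big_m * is_hub <= tol + big_m
    return pos and neg


# Hypothetical ledger: source 1 -> hubs 2 and 4 -> destination 3.
TXS = [(1, 2, 9000), (2, 3, 8990), (1, 4, 9500), (4, 3, 9400)]
M = big_m_bound(TXS)  # 18500, account 1's total outflow
TOL = 50

print(conservation_holds(9000, 8990, is_hub=1, big_m=M, tol=TOL))  # hub 2, off by 10
print(conservation_holds(9500, 9400, is_hub=1, big_m=M, tol=TOL))  # hub 4, off by 100
print(conservation_holds(9500, 9400, is_hub=0, big_m=M, tol=TOL))  # account 4 as a non-hub
```

The first and third calls print `True` and the second prints `False`. Setting `is_hub=0` switches the row off without any auxiliary variable.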
Smurf army: pairwise constraints over a decision-selected vertex subset
Among the N chosen smurfs, no two may share a beneficial owner. The constraint is over the selected subset, not an edge filter. An account may sit in the same `bo_id` cluster as another account, but the two cannot be smurfs in the same cohort. A second `Account.ref()` handle alongside the bare `Account` lets the constraint range over pairs of accounts. `Account.id < S2.id` visits each unordered pair once and excludes self-pairs:
```python
# Excerpt from motif_smurf_army.py
S2 = Account.ref()  # second account handle for the pairwise constraint
distinct_bo_ic = model.where(
    Account.id < S2.id,         # each unordered pair once
    Account.bo_id == S2.bo_id,  # pairs sharing a beneficial owner
).require(
    Account.is_smurf + S2.is_smurf <= 1  # at most one of them is a chosen smurf
)
```

Pre-filtered pair tables can't bind to "the N-subset the solver itself picks." This IC is combined with sum-equals-target on the chosen smurf-tx amounts and a tight pairwise time window. The motif's CSP shape is therefore several joint conditions on one chosen subset: the sum must hit a known launder target while pairwise distinctness holds on the same chosen set.
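The joint conditions on one chosen subset can be illustrated by brute force. The sketch below swaps in exhaustive enumeration with `itertools.combinations` for the MiniZinc search, so it only shows the semantics and scales combinatorially. Account ids 42-48 and their `bo_id` values follow the bundled-data description. Every amount, every timestamp, the out-of-window decoy 49, and the target / tolerance / window values are hypothetical.

```python
from itertools import combinations

# Hypothetical deposits to one target merchant:
# (account_id, bo_id, amount_dollars, ts_minutes)
CANDIDATES = [
    (42, 1701, 9000, 10), (43, 1702, 9200, 15), (44, 1703, 8800, 20),
    (45, 1704, 9100, 25), (46, 1705, 8900, 30),
    (47, 1706, 9000, 35), (48, 1706, 9100, 40),  # dup-BO decoys
    (49, 1707, 9000, 200),                       # out-of-window decoy
]


def valid_cohort(cohort, target, tol, window):
    bos = [bo for _, bo, _, _ in cohort]
    if len(set(bos)) != len(bos):  # pairwise distinct beneficial owners
        return False
    total = sum(amount for _, _, amount, _ in cohort)
    if abs(total - target) > tol:  # sum-equals-target within tolerance
        return False
    ts = [t for _, _, _, t in cohort]
    return max(ts) - min(ts) <= window  # every pair within the window


def enumerate_cohorts(candidates, n, target, tol, window):
    return [
        tuple(sorted(row[0] for row in cohort))
        for cohort in combinations(candidates, n)
        if valid_cohort(cohort, target, tol, window)
    ]


cohorts = enumerate_cohorts(CANDIDATES, n=5, target=45_000, tol=500, window=60)
print(cohorts[0])  # (42, 43, 44, 45, 46): the planted cohort
```

Dropping the BO check from `valid_cohort` would also admit {42, 44, 45, 47, 48}, which hits the target exactly. That is the kind of cohort `distinct_bo_ic` exists to exclude. Single-decoy swaps such as (43, 44, 45, 46, 47) remain valid, which mirrors why the smurf runner reaches its solution limit.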
KYC-mix burst: cardinality over a decision-selected vertex subset
Among the N chosen burst accounts, at least RETAIL_FLOOR must be retail-tier. Rules can label each account’s tier; only CSP enforces the count over the selected subset:
```python
# Excerpt from motif_kyc_burst.py
retail_floor_ic = model.require(
    # number of chosen burst accounts whose KYC tier is retail
    sum(Account.is_burst).where(Account.kyc_tier == "retail") >= RETAIL_FLOOR
)
```

The launder-grade burst signature is "N accounts with a retail-tier floor transacting together in a tight window." A graph-pattern matcher can find the time-window cluster. It has no way to express "and at least `RETAIL_FLOOR` of the chosen N are retail-tier."
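Here is a brute-force sketch of the cardinality condition, again with exhaustive enumeration swapped in for the MiniZinc search. The ids and tiers for accounts 52-57 follow the KYC-burst expected output. The timestamps, the out-of-window retail decoy 58, and the parameter values (N = 5, floor 4, 60-minute window) are hypothetical stand-ins for the runner's constants.

```python
from itertools import combinations

# (account_id, kyc_tier, ts_minutes) for deposits to one target merchant.
CANDIDATES = [
    (52, "retail", 0), (53, "retail", 5), (54, "retail", 10), (55, "retail", 15),
    (56, "business", 20), (57, "business", 25),
    (58, "retail", 300),  # hypothetical retail decoy outside the window
]


def burst_cohorts(candidates, n, retail_floor, window):
    found = []
    for cohort in combinations(candidates, n):
        # Cardinality over the chosen subset, not a per-account label.
        retail = sum(1 for _, tier, _ in cohort if tier == "retail")
        ts = [t for _, _, t in cohort]
        if retail >= retail_floor and max(ts) - min(ts) <= window:
            found.append(tuple(acc for acc, _, _ in cohort))
    return found


print(burst_cohorts(CANDIDATES, n=5, retail_floor=4, window=60))
# [(52, 53, 54, 55, 56), (52, 53, 54, 55, 57)]
```

Every in-window combination with only three retail accounts is rejected by the count alone. Account 58 is the extra retail candidate, and it is excluded by the window. The result has the same two-cohort shape as the runner's expected output.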
Solver call and enumeration
Every runner calls problem.solve("minizinc", time_limit_sec=60, solution_limit=...). The variable subconcept returned by solve_for(...) exposes a .values(solution_index, value) relationship that indexes per-solution outputs; filtering on value == 1 surfaces the rows the solver picked. The populated property reflects only the first solution, so for multi-solution output the inspect blocks always go through .values(...).
These are pure satisfaction problems with no objective. So `status: OPTIMAL` in the `solve_info().display()` output means the search was exhausted: every feasible motif was enumerated without reaching `solution_limit`. It is not an optimisation verdict. `SOLUTION_LIMIT` means the limit was hit before enumeration was exhausted. Raise `MAX_*_MOTIFS` to surface more.
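The `.values(solution_index, value)` pattern amounts to grouping by solution and keeping the rows whose decision value is 1. The pure-Python sketch below assumes the per-solution values have already been fetched as flat `(solution_index, entity_id, value)` tuples. That tuple shape and the helper name are hypothetical, not the RelationalAI API.

```python
from collections import defaultdict


def picked_by_solution(value_rows):
    """Map solution_index -> sorted ids whose decision value is 1 in that solution."""
    picked = defaultdict(list)
    for solution_index, entity_id, value in value_rows:
        if value == 1:  # the solver chose this entity in this solution
            picked[solution_index].append(entity_id)
    return {sol: sorted(ids) for sol, ids in sorted(picked.items())}


# Hypothetical Transaction.is_motif values for two solutions.
rows = [(0, 1, 1), (0, 2, 1), (0, 7, 0), (1, 1, 0), (1, 7, 1), (1, 8, 1)]
print(picked_by_solution(rows))  # {0: [1, 2], 1: [7, 8]}
```

Reading only the populated property would correspond to keeping just solution 0. That is why multi-solution output goes through `.values(...)`.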
Customize this template
- Use your own data by replacing the two CSV files with your accounts and transactions; every runner picks them up automatically. The shared ontology in `model_setup.py` reads columns by name (`id`, `name`, `bo_id`, `kyc_tier`, `jurisdiction`, `tx_id`, `src_id`, `dst_id`, `amount_dollars`, `ts_minutes`). To drop a column you don't have, delete the corresponding `Account.<name> = model.Property(...)` line and any constraint that references it.
- Test against real public data. The bundled CSVs are synthetic and tuned to make the three motifs visible. For production-style validation, point the runners at a slice of IBM AMLworld HI-Small / LI-Small (CDLA-Sharing-1.0; the dataset paper is Altman et al., NeurIPS 2023). AMLworld natively labels the eight-pattern taxonomy, including the scatter-gather pattern the butterfly motif encodes. A slice of roughly 500-2000 transactions should solve in seconds. KYC tiers and jurisdictions are not in AMLworld natively. The smurf-army and KYC-burst runners would need either synthetic augmentation columns or a different vertex category mapped from AMLworld's bank or `Payment Format` columns.
- Raise the solution limit on a real ledger. Each runner's `MAX_*_MOTIFS` is sized for the demo. On a production ledger with thousands of accounts, you may want 100+ so the analyst inbox surfaces the full population of candidates. The `time_limit_sec` argument to `problem.solve` is your safety net.
- Point the smurf and burst runners at a different target merchant by editing `SMURF_TARGET_DESTINATION_ID` in `motif_smurf_army.py` and `BURST_TARGET_DESTINATION_ID` in `motif_kyc_burst.py`. On a real ledger, you would set these from your watchlist of known-launder destinations.
- Tune the thresholds by editing the per-runner constants. The FinCEN CTR threshold (`AMOUNT_THRESHOLD_DOLLARS = 10_000`) lives in `model_setup.py`. Per-hub conservation tolerance, smurf target dollars, smurf window, burst window, and retail floor are at the top of each motif's runner.
- Add a temporal-ordering IC to the butterfly, so hubs must receive before they forward. Declare two `Transaction.ref()` handles and a pairwise time constraint on hub-incoming vs. hub-outgoing motif edges. This is useful when your scheme runs on a known cadence.
- Sweep the smurf target by parameterizing `SMURF_TARGET_DOLLARS` and re-running. Alternatively, replace the equality with a minimum cumulative deposit constraint (`>= SMURF_TARGET_DOLLARS - tol`) if your watchlist tracks "at least $X laundered through this merchant" rather than a known exact total.
- Add a jurisdiction-diversity floor to the KYC-mix burst:
  - Declare an auxiliary `Account.is_burst_in_<j>` per jurisdiction `j`, reified as `is_burst * (jurisdiction == j)`.
  - Add a 0/1 indicator per jurisdiction, bounded above by the sum of `is_burst_in_<j>`, so it can be 1 only if at least one chosen account sits in that jurisdiction.
  - Require the sum of those indicators to be `>= 3`.

  The bundled data already plants four distinct jurisdictions in the burst cohort; this would make that distribution a hard constraint.
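The indicator encoding for the jurisdiction-diversity floor can be sanity-checked in plain Python before you write it as ICs. The sketch below compares a direct distinct-jurisdiction count with the auxiliary-indicator encoding described in the last customization step. Each jurisdiction indicator takes the largest value its upper bound allows. The account-to-jurisdiction map follows the KYC-burst expected output, and the helper names are hypothetical.

```python
from itertools import combinations

# Jurisdictions of accounts 52-57 as listed in the KYC-burst expected output.
JURISDICTION = {52: "US", 53: "UK", 54: "Cayman", 55: "Singapore", 56: "US", 57: "Germany"}


def distinct_jurisdictions(chosen):
    return len({JURISDICTION[a] for a in chosen})


def indicator_encoding_ok(chosen, floor):
    is_burst = {a: int(a in chosen) for a in JURISDICTION}
    total = 0
    for j in set(JURISDICTION.values()):
        # Sum over accounts of is_burst_in_<j> = is_burst * (jurisdiction == j)
        burst_in_j = sum(is_burst[a] * (JURISDICTION[a] == j) for a in JURISDICTION)
        # 0/1 indicator bounded above by that sum; take its largest feasible value
        total += min(1, burst_in_j)
    return total >= floor


planted = (52, 53, 54, 55, 56)
print(distinct_jurisdictions(planted), indicator_encoding_ok(planted, floor=3))  # 4 True

# The encoding agrees with the direct count on every 5-account cohort.
assert all(
    indicator_encoding_ok(c, 3) == (distinct_jurisdictions(c) >= 3)
    for c in combinations(JURISDICTION, 5)
)
```

In the real model the solver chooses the indicator values, so the floor is feasible exactly when at least three indicators can be set to 1. That is what taking the maximum allowed value checks here.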
References
- Altman et al., Realistic Synthetic Financial Transactions for Anti-Money Laundering Models, NeurIPS 2023. The IBM AMLworld paper. Defines the canonical eight-pattern AML topology taxonomy (fan-in, fan-out, scatter-gather, gather-scatter, cycle, bipartite, stack, random) and explicitly notes “total in equals total out” for the conservation patterns — the property the butterfly motif here encodes. Public dataset releases at IBM/AML-Data under CDLA-Sharing-1.0.
- Starnini et al., Smurf-Based Anti-Money Laundering in Time-Evolving Transaction Networks, ECML PKDD 2021. Time-window plus flow-balance signal in real-world transaction graphs (>180M transactions, >31M bank accounts).
- Pareja et al., The Shape of Money Laundering: Subgraph Representation, arXiv:2404.19109. Subgraph-classification benchmark on Elliptic2; multi-leg / multi-account topologies.
- FATF (Financial Action Task Force) Typologies Reports — canonical typology source for placement / layering / integration stages, including the structuring and smurf-army patterns the motif runners encode.
- FinCEN Currency Transaction Report (CTR) regulation, 31 CFR § 1010.311 — the $10K reporting threshold whose evasion drives the structuring typology.
- Tookitaki, Smurfing and Structuring: AML Detection and Reporting. Practitioner guidance on the KYC-tier-diverse smurf-army profile that the KYC-mix burst motif encodes.
Troubleshooting
Solver returns INFEASIBLE
- Check data quality first when using your own CSVs. Five silent failure modes can produce INFEASIBLE or spurious motifs without any error message:
  - (a) Dangling foreign keys: `src_id` or `dst_id` values not present in `accounts.csv` are silently dropped from `Transaction.src` / `Transaction.dst`, narrowing the edge graph.
  - (b) Duplicate primary keys: a duplicate `id` in `accounts.csv` or `tx_id` in `transactions.csv` collapses rows on the identify-by join, discarding all but one row's property values.
  - (c) Self-loop transactions (`src_id == dst_id`): the same account satisfies both source and destination roles, which can trivially clear the flow-conservation ICs with no actual flow.
  - (d) Non-positive amounts: zero or negative `amount_dollars` passes the `< AMOUNT_THRESHOLD_DOLLARS` filter, admitting economically meaningless motif edges.
  - (e) Negative timestamps: negative `ts_minutes` values can distort the window IC arithmetic.

  A quick pandas sanity check before running:

  ```python
  import pandas as pd

  acc = pd.read_csv("data/accounts.csv")
  tx = pd.read_csv("data/transactions.csv")

  assert acc["id"].is_unique                   # (b) unique account ids
  assert tx["tx_id"].is_unique                 # (b) unique transaction ids
  assert tx["src_id"].isin(acc["id"]).all()    # (a) no dangling sources
  assert tx["dst_id"].isin(acc["id"]).all()    # (a) no dangling destinations
  assert (tx["src_id"] != tx["dst_id"]).all()  # (c) no self-loops
  assert (tx["amount_dollars"] > 0).all()      # (d) positive amounts
  assert (tx["ts_minutes"] >= 0).all()         # (e) non-negative timestamps
  ```
- The data may not contain a structurally valid instance of the motif you ran.
  - Butterfly: you need at least one account that fans out to K distinct accounts (each edge under threshold) which then converge on a single destination. Each hub's incoming amount must match its outgoing amount within `CONSERVATION_TOLERANCE_DOLLARS`.
  - Smurf army: you need N source accounts with pairwise-distinct BOs whose deposits to a single target sum to `SMURF_TARGET_DOLLARS` within tolerance and all fall within `SMURF_WINDOW_MINUTES`.
  - KYC-mix burst: you need N accounts sending to a single target whose retail-tier count satisfies the floor constraint and whose transactions all fall within `BURST_WINDOW_MINUTES`.
- Confirm that the target-merchant ids (`SMURF_TARGET_DESTINATION_ID`, `BURST_TARGET_DESTINATION_ID`) match an account that actually receives the candidate edges in your data.
- Relax constraints one at a time (raise tolerance / window, lower K, raise the threshold) to confirm whether the data or a specific constraint is the bottleneck.
- Beneficial-ownership data inconsistencies: for the butterfly, if every account has a unique `bo_id`, no pair of accounts can be hubs together. For the smurf army the opposite holds: with too many accounts sharing a `bo_id`, you may not get N pairwise-distinct candidates.
- KYC-mix burst data inconsistencies: if fewer than `RETAIL_FLOOR` (4 by default) candidate accounts sending to the target are retail-tier, the retail floor is unreachable.
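When the butterfly runner returns INFEASIBLE on your own ledger, a quick structural pre-check can show whether any candidate scatter-gather exists at all. The sketch below is a hypothetical diagnostic, not part of the template, and it assumes positive amounts. It looks for a source with at least K under-threshold edges into hubs whose under-threshold forward edge to a common destination matches the incoming amount within tolerance. It ignores the beneficial-owner and exact-degree constraints. If it finds nothing, the data likely lacks a butterfly. If it finds something, the remaining constraints decide.

```python
from collections import defaultdict


def butterfly_precheck(txs, k, threshold, tol):
    """Return (source, dest, hubs) triples with at least k conserving hubs.

    txs: iterable of (src, dst, amount_dollars).
    """
    out_edges = defaultdict(list)
    for src, dst, amount in txs:
        if src != dst and 0 < amount < threshold:  # sub-threshold, no self-loops
            out_edges[src].append((dst, amount))

    results = []
    for source in sorted(out_edges):
        hubs_by_dest = defaultdict(set)
        for hub, amount_in in out_edges[source]:
            for dest, amount_out in out_edges.get(hub, []):
                # Hub forwards onward (not back to the source) with matching dollars.
                if dest != source and abs(amount_in - amount_out) <= tol:
                    hubs_by_dest[dest].add(hub)
        for dest in sorted(hubs_by_dest):
            if len(hubs_by_dest[dest]) >= k:
                results.append((source, dest, sorted(hubs_by_dest[dest])))
    return results


# Hypothetical ledger: account 1 fans out to 2, 3, 4, 6, 7, which forward to 5.
TXS = [
    (1, 2, 9000), (2, 5, 9000),
    (1, 3, 8500), (3, 5, 8450),
    (1, 4, 9200), (4, 5, 9200),
    (1, 7, 9000), (7, 5, 7000),    # hub 7 does not conserve
    (1, 6, 12000), (6, 5, 12000),  # over the CTR threshold
]
print(butterfly_precheck(TXS, k=3, threshold=10_000, tol=100))  # [(1, 5, [2, 3, 4])]
```

Rerunning the sketch with a looser `tol` or a higher `threshold` shows which relaxation admits extra hubs. This mirrors the one-constraint-at-a-time advice above.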
How many motifs / cohorts will each runner return?
- Each runner returns up to `MAX_*_MOTIFS` (10/5/5 by default for butterfly/smurf/burst) or however many feasible motifs exist in the data, whichever is smaller. `solve_info().num_points` reports the actual count after the solve. On the bundled data, the butterfly runner exhausts at 2 motifs and the KYC-burst runner at 2 cohorts. The smurf runner hits the limit (5) because cohorts that swap a planted smurf for a dup-BO decoy still sum in range. `distinct_bo_ic` is the constraint that rules out the cohort containing both dup-BO decoys, and seeing it bind is the point of running the smurf demo.
- Solution ordering is not guaranteed across runs or solver versions. The set of motifs is stable, but solution index 0 may swap with solution index 1 between runs. Treat the `solution` column as a label, not a ranking.
- To get a ranked answer (e.g., surface the largest scheme first), switch to optimisation. `problem.maximize(sum(Transaction.is_motif * Transaction.amount_dollars))` returns the motif with the largest total laundered amount under `solution_limit=1`. For a top-K ranked list, run an iterative exclusion-cut loop (re-solve after forbidding each previous motif's edge set) or sort the enumerated motifs in post-processing.
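Both ranking options can be sketched in plain Python: post-processing the enumerated motifs into a ranking, and the cut used by the iterative alternative. The rows below are hypothetical `(solution, tx_id, amount)` tuples shaped like the butterfly output table. The cut shown is the standard "no-good" inequality, which keeps the sum of `is_motif` over a previous motif's edges at most its edge count minus one. It is named here as one common choice, not as the template's own code.

```python
from collections import defaultdict


def rank_motifs(rows):
    """Sort solutions by total motif dollars, largest first."""
    totals = defaultdict(int)
    for solution, _tx_id, amount in rows:
        totals[solution] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def violates_no_good_cut(previous_edges, is_motif):
    """True if the assignment re-selects every edge of a previous motif.

    Cut: sum(is_motif[e] for e in previous_edges) <= len(previous_edges) - 1.
    With fixed-size motifs (2K edges), this forbids exactly the previous edge set.
    """
    return sum(is_motif.get(e, 0) for e in previous_edges) > len(previous_edges) - 1


rows = [(0, 1, 9000), (0, 2, 9100), (1, 7, 7780), (1, 8, 7800)]
print(rank_motifs(rows))  # [(0, 18100), (1, 15580)]
```

Post-processing is cheap but can only rank what enumeration already found. The cut loop costs one re-solve per motif, but it visits motifs in the order the objective prefers.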
Import error for relationalai or model_setup
- Confirm your virtual environment is active: `which python` should point to `.venv`.
- Reinstall dependencies: `python -m pip install .`.
- The motif runners import from `model_setup` as a sibling module. Run them from inside the template directory so that directory is on Python's path (for example `python motif_butterfly.py` or `python -m motif_butterfly`).
Authentication or configuration errors
- Run `rai init` to create or update your RelationalAI/Snowflake configuration.
- If you have multiple profiles, set `export RAI_PROFILE=<your_profile>`.
MiniZinc solver not available
- This template uses the MiniZinc constraint solver. Ensure the RAI Native App version supports MiniZinc.
- HiGHS is not appropriate here — this is a discrete satisfaction model with categorical decisions and structural propagation, not LP/MILP.