This feature is currently in Preview.

Encode features

A Graph Neural Network learns from numeric tensors, but the tables that feed it usually carry a mix of column types: identifiers, free-form text, categorical codes, continuous measurements, dates, and so on. Before training, every column has to be turned into the right kind of representation; free text, for example, becomes language-model embeddings. On top of that, not every column is useful as a feature: primary keys, foreign keys, and target columns should be excluded, and in temporal tasks one column needs to be marked as the timestamp used for ordering events. For more on time columns, see the time_col section below.

The PropertyTransformer class is how you configure all of this. You group concept fields by their semantic type, list the fields you want to drop, and (for temporal models) declare which column carries time. Any field of any Concept that is not explicitly dropped or annotated with a semantic type will be used as a feature for this Concept in the GNN model. Its semantic type will be inferred to the best of our ability.

The PropertyTransformer is then passed to the GNN constructor via the property_transformer argument. If you don’t pass one, every field of every Concept will be used as a feature.

Each constructor argument corresponds to a semantic type, and the pipeline encodes each column according to the type it’s assigned.

| Argument | Semantic type | When to use it |
| --- | --- | --- |
| text | Free-form text | Names, descriptions, comments: anything embedded by a language model. |
| category | Discrete labels | Enum codes, status strings, boolean flags, or numeric IDs that represent categories rather than magnitudes. |
| datetime | Date / timestamp | Dates and timestamps that carry temporal signal. |
| continuous | Continuous numeric | Real-valued measurements like price, age, rating, latitude. |
| integer | Integer numeric | Integer columns that should be treated as numbers, not categories (for example, counts and quantities). |
| drop | Excluded | Columns the GNN should ignore: primary keys, foreign keys, target columns, and any other fields that aren’t features. |

There is also a special argument:

  • time_col — marks the column the model uses for temporal ordering. Each concept may have at most one time_col, and a time_col field cannot be dropped.

We infer the semantic types of all fields you don’t mention to the best of our ability.
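If you would rather annotate columns explicitly than rely on inference, a quick pass over sample values can suggest which argument a column belongs under. The helper below is a hypothetical sketch in plain Python: suggest_argument and its max_distinct_ratio threshold are assumptions made for illustration, and this is not the inference logic the library uses, which this page does not describe.

```python
from datetime import date


def suggest_argument(values, max_distinct_ratio=0.5):
    """Suggest a PropertyTransformer argument name for a column's sample values.

    Hypothetical heuristic for illustration only; returns None when undecided.
    """
    sample = [v for v in values if v is not None]
    if not sample:
        return None
    # bool is a subclass of int in Python, so test for flags first.
    if all(isinstance(v, bool) for v in sample):
        return "category"
    # datetime.datetime is a subclass of date, so this covers timestamps too.
    if all(isinstance(v, date) for v in sample):
        return "datetime"
    # Integers default to "integer"; numeric codes that are really labels
    # still need to be moved to "category" by hand.
    if all(isinstance(v, int) for v in sample):
        return "integer"
    if all(isinstance(v, (int, float)) for v in sample):
        return "continuous"
    if all(isinstance(v, str) for v in sample):
        # Heavily repeated strings look like labels; mostly unique ones like free text.
        distinct_ratio = len(set(sample)) / len(sample)
        return "category" if distinct_ratio <= max_distinct_ratio else "text"
    return None


status_hint = suggest_argument(["ACTIVE", "ACTIVE", "PRE-CREATE", "ACTIVE"])
name_hint = suggest_argument(["Strap top", "Slim jeans", "Wool coat"])
price_hint = suggest_argument([19.99, 5, 7.5])
```

Treat the result as a starting point: the distinction between category and integer in particular depends on what the numbers mean, not on their type.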

Identifier and foreign-key columns are usually the first thing to drop: letting the model see them as features would let it memorize specific rows instead of learning generalizable patterns. Target columns (the labels you’re predicting) also belong under drop.

from relationalai.semantics.reasoners.predictive import PropertyTransformer

pt = PropertyTransformer(
    drop=[
        Customer.customer_id,
        Product.product_id,
        Transaction.customer_id,
        Transaction.product_id,
    ],
)

You can also drop every field of a concept by passing the concept itself instead of a specific field. This is useful when a concept’s role in the graph is purely structural — for example, a join concept whose only purpose is to connect two other concepts via edges, with no per-row attributes worth featurizing.

pt = PropertyTransformer(
    category=[Product.product_code],
    drop=[Transaction],  # drop every field of Transaction
)

When a concept is passed to drop, individual fields of that concept can still be opted back in by listing them under another argument.

The same shortcut works for any argument, not just drop. Pass a concept to set the default semantic type for all of its fields, then list individual fields under another argument to override.

pt = PropertyTransformer(
    category=[Customer],           # all Customer fields default to category
    continuous=[Customer.age],     # ...except age, which is continuous
    drop=[Customer.customer_id],   # ...and customer_id, which is dropped
)

This is useful when most of a concept’s columns share the same type and only a few need different handling.
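The precedence rule described above (a concept sets the default for its fields, a field listed on its own overrides that default, and anything unmentioned is left to inference) can be modeled in a few lines of plain Python. This is only a sketch of the documented behavior, not the library's implementation: resolve_field_types is a hypothetical helper, and strings such as "Customer" and "Customer.age" stand in for real concept and field objects. What happens when the same field is listed under two arguments is not covered on this page and is not modeled here.

```python
def resolve_field_types(concept_fields, annotations):
    """Return the effective argument for every "Concept.field" name.

    concept_fields: {"Customer": ["customer_id", "age", ...]}
    annotations:    {"category": ["Customer"], "continuous": ["Customer.age"], ...}
    """
    # Start with every field left to inference.
    resolved = {
        f"{concept}.{field}": "inferred"
        for concept, fields in concept_fields.items()
        for field in fields
    }
    # Pass 1: a bare concept name sets the default for all of its fields.
    for arg, targets in annotations.items():
        for target in targets:
            if target in concept_fields:
                for field in concept_fields[target]:
                    resolved[f"{target}.{field}"] = arg
    # Pass 2: an individual field overrides its concept's default.
    for arg, targets in annotations.items():
        for target in targets:
            if target not in concept_fields:
                if target not in resolved:
                    raise KeyError(f"unknown field: {target}")
                resolved[target] = arg
    return resolved


customer_types = resolve_field_types(
    {"Customer": ["customer_id", "age", "active", "club_member_status"]},
    {
        "category": ["Customer"],
        "continuous": ["Customer.age"],
        "drop": ["Customer.customer_id"],
    },
)
```

With these inputs, customer_types maps Customer.age to "continuous", Customer.customer_id to "drop", and the two remaining fields to "category". The same two-pass logic covers dropping a whole concept and opting individual fields back in.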

Mark a column as the time axis with time_col

When your task is temporal — for example, “predict whether a customer will churn after this date” or “forecast next-week sales” — the GNN samples neighbors based on time. Tell it which column carries the timestamp by passing the field to time_col:

pt = PropertyTransformer(
    datetime=[Transaction.t_dat],
    time_col=[Transaction.t_dat],
)

A column listed under time_col is typically also annotated as datetime so the encoder treats its values as time. The two roles are independent: datetime controls how the column is featurized, while time_col controls how it orders neighbors during sampling. Marking a field as datetime does not implicitly make it the time axis — you have to opt in explicitly.
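To make the sampling role of time_col concrete, here is a minimal sketch of time-aware neighbor selection. It assumes the common temporal-GNN convention that a prediction made at a given seed time may only see neighbors whose timestamp is at or before that time; this page does not spell out the sampler's exact rule, so treat both the convention and the eligible_neighbors helper as assumptions. The transaction rows are made-up data.

```python
from datetime import date


def eligible_neighbors(neighbors, seed_time, time_key="t_dat"):
    """Keep neighbors whose timestamp is not later than seed_time, newest first."""
    visible = [n for n in neighbors if n[time_key] <= seed_time]
    return sorted(visible, key=lambda n: n[time_key], reverse=True)


# Hypothetical transactions attached to one customer.
transactions = [
    {"id": 1, "t_dat": date(2020, 1, 5)},
    {"id": 2, "t_dat": date(2020, 3, 1)},
    {"id": 3, "t_dat": date(2020, 2, 10)},
]

visible_ids = [t["id"] for t in eligible_neighbors(transactions, date(2020, 2, 15))]
```

Here visible_ids is [3, 1]: the March transaction is excluded because it happens after the seed time, which is how a time column keeps future information out of a prediction.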

A concept can carry at most one time_col, while it can have any number of datetime fields. If a concept has multiple timestamp columns, mark only the one that best represents the chronological order of events as the time_col; the rest can stay as plain datetime features.
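The two constraints on time_col (at most one per concept, and never a dropped field) are easy to check before building the transformer. The function below is a hypothetical pre-flight check written in plain Python, with "Concept.field" strings standing in for real field objects; it is not part of the library, and created_at is an invented field name used only to trigger a violation. It checks explicitly dropped fields only and does not model dropping a whole concept.

```python
from collections import Counter


def check_time_cols(time_col, drop):
    """Return a list of human-readable violations of the time_col rules."""
    problems = []
    # Rule 1: each concept may have at most one time_col.
    per_concept = Counter(name.split(".")[0] for name in time_col)
    for concept, count in sorted(per_concept.items()):
        if count > 1:
            problems.append(
                f"{concept} has {count} time_col fields; at most one is allowed"
            )
    # Rule 2: a time_col field cannot also be dropped.
    for name in time_col:
        if name in drop:
            problems.append(f"{name} is both time_col and dropped")
    return problems


ok = check_time_cols(["Transaction.t_dat"], ["Customer.customer_id"])
bad = check_time_cols(
    ["Transaction.t_dat", "Transaction.created_at"],
    ["Transaction.t_dat"],
)
```

The first call returns an empty list; the second reports both violations, one per rule.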

Put it together: a temporal classification example

A customer-churn task uses three concepts — Customer, Product, and Transaction — with a mix of categorical, continuous, text, and datetime columns. Drop the keys, group the remaining columns by type, and mark the transaction date as the time axis:

from relationalai.semantics.reasoners.predictive import PropertyTransformer

pt = PropertyTransformer(
    drop=[
        Customer.customer_id, Product.product_id,
        Transaction.customer_id, Transaction.product_id,
    ],
    category=[
        Customer.active, Customer.club_member_status,
        Customer.fashion_news_frequency, Product.product_code,
    ],
    continuous=[Customer.age, Transaction.price],
    text=[Product.prod_name],
    datetime=[Transaction.t_dat],
    time_col=[Transaction.t_dat],
)