
relationalai.std.strings.levenshtein()

levenshtein(string1: str|Producer, string2: str|Producer) -> Expression

Calculates the Levenshtein distance between two strings, which measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into the other. If string1 or string2 is a Producer, then levenshtein() filters out non-string values. Must be called in a rule or query context.
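To make the definition concrete, here is a minimal plain-Python sketch of the distance itself, using the standard two-row dynamic-programming formulation. The function name levenshtein_distance is hypothetical and this is not how relationalai computes the value internally; it only illustrates what number levenshtein() is expected to produce for a pair of strings.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character edits that turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("Alice", "Alicia"))  # 2
print(levenshtein_distance("Alice", "Bob"))     # 5
```

"Alice" becomes "Alicia" with one substitution (e to i) and one insertion (a), so the distance is 2. The comparison is case-sensitive in this sketch.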

Name | Type | Description
--- | --- | ---
string1 | Producer or Python str object | The first string.
string2 | Producer or Python str object | The second string.

Returns an Expression object.

Use levenshtein() to calculate the distance between pairs of strings:

import relationalai as rai
from relationalai.std import aggregates, strings

# =====
# SETUP
# =====

model = rai.Model("MyModel")
Person = model.Type("Person")

with model.rule():
    Person.add(id=1).set(name="Alice")
    Person.add(id=2).set(name="Alicia")
    Person.add(id=3).set(name="Bob")
    Person.add(id=4).set(name=-1)  # Non-string name

# =======
# EXAMPLE
# =======

# Set a multi-valued most_similar_to property on each person to other people
# whose names have the smallest Levenshtein distance from their own.
with model.rule():
    person, other = Person(), Person()
    person != other
    # Calculate the Levenshtein distance between the names of each pair of people.
    dist = strings.levenshtein(person.name, other.name)
    # Filter to others with smallest distance per person.
    aggregates.bottom(1, dist, per=[person])
    # Set the most_similar_to property to the other people with the smallest distance.
    person.most_similar_to.extend([other])

# Since levenshtein() filters out non-string values, the most_similar_to property
# is not set for the person with id=4.
with model.query() as select:
    person = Person()
    response = select(
        person.id,
        person.name,
        person.most_similar_to.id,
        person.most_similar_to.name
    )

print(response.results)
#    id    name  id2   name2
# 0   1   Alice    2  Alicia
# 1   2  Alicia    1   Alice
# 2   3     Bob    1   Alice
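The results above can be cross-checked without a model. The following plain-Python sketch mirrors the rule's logic under stated assumptions: edit_distance and most_similar are hypothetical helper names, non-string names are dropped up front to imitate the filtering that levenshtein() applies to Producer arguments, and ties for the smallest distance are all kept. That tie handling is a simplification chosen for this sketch, not a statement about how aggregates.bottom() ranks ties.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def edit_distance(a: str, b: str) -> int:
    """Recursive Levenshtein distance with memoization (fine for short names)."""
    if not a or not b:
        # One side is empty: insert or delete every remaining character.
        return len(a) + len(b)
    if a[0] == b[0]:
        return edit_distance(a[1:], b[1:])
    return 1 + min(
        edit_distance(a[1:], b),      # delete a[0]
        edit_distance(a, b[1:]),      # insert b[0]
        edit_distance(a[1:], b[1:]),  # substitute a[0] with b[0]
    )


def most_similar(names: dict) -> dict:
    """Map each id to the ids of the other people with the closest name."""
    # Imitate the filtering of non-string values (assumption of this sketch).
    strings_only = {pid: name for pid, name in names.items() if isinstance(name, str)}
    result = {}
    for pid, name in strings_only.items():
        distances = {
            other_id: edit_distance(name, other_name)
            for other_id, other_name in strings_only.items()
            if other_id != pid
        }
        if not distances:
            continue
        smallest = min(distances.values())
        result[pid] = sorted(o for o, d in distances.items() if d == smallest)
    return result


names = {1: "Alice", 2: "Alicia", 3: "Bob", 4: -1}
print(most_similar(names))  # {1: [2], 2: [1], 3: [1]}
```

The person with id=4 has no entry, matching the query output, because a non-string name never takes part in a distance calculation.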