relationalai.std.strings.levenshtein()

levenshtein(string1: str|Producer, string2: str|Producer) -> Expression

Calculates the Levenshtein distance between two strings, that is, the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into the other. If string1 or string2 is a Producer, then levenshtein() filters out non-string values: no distance is produced for a pair in which either value is not a string. Must be called in a rule or query context.
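
To make the metric concrete, here is a minimal plain-Python sketch of the classic dynamic-programming (Wagner-Fischer) computation of Levenshtein distance. It only illustrates what the number means. It is not how relationalai evaluates levenshtein(), and the helper name `levenshtein_distance` is hypothetical. Returning None for non-string inputs loosely mimics the filtering behavior described above.

```python
def levenshtein_distance(a, b):
    """Return the Levenshtein distance between strings a and b.

    Hypothetical helper for illustration only. Returns None when either
    argument is not a str, loosely mirroring how levenshtein() drops
    non-string values instead of producing a result.
    """
    if not isinstance(a, str) or not isinstance(b, str):
        return None
    # prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
    # Row 0: turning "" into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # curr[0] is the cost of deleting the first i characters of a.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein_distance("Alice", "Alicia"))  # 2
print(levenshtein_distance("Alice", "Bob"))     # 5
print(levenshtein_distance("Bob", -1))          # None
```

"Alice" becomes "Alicia" by substituting "e" with "i" and inserting "a", so the distance is 2.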

Parameters

Name	Type	Description
`string1`	`Producer` or Python `str` object	The first string.
`string2`	`Producer` or Python `str` object	The second string.

Returns

An Expression object representing the Levenshtein distance between string1 and string2.

Example

Use levenshtein() to calculate the distance between pairs of strings:

import relationalai as rai
from relationalai.std import aggregates, strings


# =====
# SETUP
# =====

model = rai.Model("MyModel")
Person = model.Type("Person")

with model.rule():
    Person.add(id=1).set(name="Alice")
    Person.add(id=2).set(name="Alicia")
    Person.add(id=3).set(name="Bob")
    Person.add(id=4).set(name=-1)  # Non-string name


# =======
# EXAMPLE
# =======

# Set a multi-valued most_similar_to property on each person containing the other
# people whose names have the smallest Levenshtein distance to their own name.
with model.rule():
    person, other = Person(), Person()
    person != other
    # Calculate the Levenshtein distance between the names of each pair of people.
    dist = strings.levenshtein(person.name, other.name)
    # Keep only the others with the smallest distance for each person.
    aggregates.bottom(1, dist, per=[person])
    # Set the most_similar_to property to the other people with the smallest distance.
    person.most_similar_to.extend([other])

# Since levenshtein() filters out non-string values, no distance is computed for
# pairs involving the person with id=4, so that person has no most_similar_to value
# and does not appear in the results.
with model.query() as select:
    person = Person()
    response = select(
        person.id,
        person.name,
        person.most_similar_to.id,
        person.most_similar_to.name
    )

print(response.results)
#    id    name  id2   name2
# 0   1   Alice    2  Alicia
# 1   2  Alicia    1   Alice
# 2   3     Bob    1   Alice
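
The plain-Python sketch below imitates what the rule computes, so the printed results can be checked by hand. It does not use relationalai at all: the `people` dictionary is a hypothetical stand-in for the Person entities, `levenshtein_distance` and `most_similar_to` are hypothetical helpers, and it assumes (as the rule's comment suggests) that every other person tied at the minimum distance is kept.

```python
def levenshtein_distance(a, b):
    """Hypothetical helper: DP edit distance, or None if either value is not a str."""
    if not isinstance(a, str) or not isinstance(b, str):
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Hypothetical in-memory copy of the Person data from the example (id -> name).
people = {1: "Alice", 2: "Alicia", 3: "Bob", 4: -1}


def most_similar_to(people):
    """Map each person id to the sorted ids of the others at the smallest distance."""
    result = {}
    for pid, name in people.items():
        # Distances to every other person; pairs with a non-string name are dropped.
        dists = {}
        for oid, other_name in people.items():
            if oid == pid:
                continue  # mirrors `person != other`
            d = levenshtein_distance(name, other_name)
            if d is not None:
                dists[oid] = d
        if not dists:
            continue  # nothing is set for this person, like id=4 in the rule
        best = min(dists.values())
        # Keep every other person tied at the minimum distance.
        result[pid] = sorted(oid for oid, d in dists.items() if d == best)
    return result


for pid, others in most_similar_to(people).items():
    for oid in others:
        print(pid, people[pid], oid, people[oid])
```

Running it prints one line per (person, most similar person) pair, matching the three rows of the query output: 1 Alice / 2 Alicia, 2 Alicia / 1 Alice, and 3 Bob / 1 Alice.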