Given two strings, the Levenshtein distance is the minimum number of single-character operations (insertions, deletions, or substitutions) needed to convert one string into the other.
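One standard way to compute this value is the dynamic-programming recurrence usually attributed to Wagner and Fischer. It is shown here only as a reference sketch; the optimized implementation in this post may organize the work differently. The symbols $s$, $t$, $m = |s|$, $n = |t|$ and $D_{i,j}$ (the distance between the first $i$ characters of $s$ and the first $j$ characters of $t$) are introduced here for illustration, and $[\,s_i \neq t_j\,]$ is 1 when the characters differ and 0 when they match:

$$
D_{i,0} = i, \qquad D_{0,j} = j, \qquad
D_{i,j} = \min \begin{cases}
D_{i-1,j} + 1 & \text{(delete } s_i\text{)} \\
D_{i,j-1} + 1 & \text{(insert } t_j\text{)} \\
D_{i-1,j-1} + [\,s_i \neq t_j\,] & \text{(substitute or match)}
\end{cases}
\qquad \operatorname{lev}(s, t) = D_{m,n}
$$

For example, “Jack” becomes “Jake” with two substitutions (c to k, then k to e), and no single edit is enough, so their distance is 2.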
To give a few examples, here are the Levenshtein distances between all pairs of the strings “Jack”, “Jake”, “Moe”, “Jo”, “Joe”, “Stew”, and “Stewart”:
| First string | Second string | Levenshtein distance |
|---|---|---|
The Levenshtein distance between two identical strings is always 0, which is why I have omitted all the rows where the first string is the same as the second. The distance is also symmetric (commutative in its arguments): Levenshtein(s, t) always equals Levenshtein(t, s).
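If you want to reproduce distances like these yourself, here is a minimal sketch assuming C# 7 or later. It uses the textbook two-row dynamic-programming form of Levenshtein, not the optimized implementation from this post, and the names `PairwiseDistances` and `Levenshtein` are hypothetical. Because a string's distance to itself is 0 and the distance is symmetric, it visits each unordered pair only once.

```csharp
using System;

// Sketch only: textbook two-row Levenshtein (not this post's optimized version),
// used to print the distance for every unordered pair of the example strings.
public static class PairwiseDistances
{
    // Hypothetical helper: classic Levenshtein distance keeping only two DP rows.
    public static int Levenshtein(string s, string t)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (t == null) throw new ArgumentNullException(nameof(t));

        // previous[j]: distance between the first i-1 chars of s and the first j chars of t.
        // current[j]:  the same for the first i chars of s (the row being filled).
        var previous = new int[t.Length + 1];
        var current = new int[t.Length + 1];

        // Turning the empty prefix of s into the first j chars of t takes j insertions.
        for (int j = 0; j <= t.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            // Turning the first i chars of s into the empty string takes i deletions.
            current[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1,      // delete s[i-1]
                             current[j - 1] + 1),  // insert t[j-1]
                    previous[j - 1] + cost);       // substitute, or keep a match
            }
            // Swap buffers so 'previous' always holds the last completed row.
            (previous, current) = (current, previous);
        }
        return previous[t.Length];
    }

    public static void Main()
    {
        string[] words = { "Jack", "Jake", "Moe", "Jo", "Joe", "Stew", "Stewart" };

        // j > i skips identical pairs (distance 0) and mirrored pairs (same distance).
        for (int i = 0; i < words.Length; i++)
            for (int j = i + 1; j < words.Length; j++)
                Console.WriteLine($"{words[i],-8} {words[j],-8} {Levenshtein(words[i], words[j])}");
    }
}
```

Keeping two rows instead of the full matrix cuts memory from O(m·n) to O(n) while producing the same distance.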
Levenshtein is a rather simple algorithm – but it’s a useful one. It allows us to easily compare anything that can be expressed as a string.
There’s a rather good implementation of the Levenshtein algorithm on the Wikipedia page, but I’ve made a few changes of my own. Why? Well, even though the changes I’ve made are minor, my implementation is faster. In fact, depending on the hardware, my implementation is 5% to 20% faster.
To give an example, I ran both implementations (the one on Wikipedia and mine) on a modern laptop with a 4th generation Intel Core i7 processor, calling each one 1,073,741,824 (2^30) times, which is a bit more than a billion calls. The Wikipedia implementation took 00:09:10.1372578 (about 9 minutes and 10 seconds) and mine took 00:07:32.4470743 (about 7 minutes and 32 seconds). Mine therefore needed 17.76% less time, which works out to about 21.6% more calls per second, or roughly 421 extra calls per millisecond.
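To double-check figures like these, the sketch below recomputes them from the two reported run times. It only does arithmetic on the published numbers and does not rerun either implementation; the `BenchmarkMath` class and its method names are hypothetical.

```csharp
using System;
using System.Globalization;

// Sketch: recomputes the benchmark figures from the two reported run times.
// Names are hypothetical; nothing here reruns the Levenshtein implementations.
public static class BenchmarkMath
{
    // 2^30 = 1,073,741,824 calls per implementation, as stated in the text.
    public const long Calls = 1L << 30;

    // Share of the slower run's time that the faster run saves, in percent.
    public static double TimeSavedPercent(TimeSpan slow, TimeSpan fast) =>
        (slow.TotalMilliseconds - fast.TotalMilliseconds) / slow.TotalMilliseconds * 100.0;

    // Average number of calls completed per millisecond over the whole run.
    public static double CallsPerMs(long calls, TimeSpan elapsed) =>
        calls / elapsed.TotalMilliseconds;

    public static void Main()
    {
        // Invariant culture so '.' is always read as the fractional-seconds separator.
        TimeSpan wikipedia = TimeSpan.Parse("00:09:10.1372578", CultureInfo.InvariantCulture);
        TimeSpan optimized = TimeSpan.Parse("00:07:32.4470743", CultureInfo.InvariantCulture);

        double before = CallsPerMs(Calls, wikipedia);
        double after = CallsPerMs(Calls, optimized);

        Console.WriteLine($"Time saved:       {TimeSavedPercent(wikipedia, optimized):F2}%");
        Console.WriteLine($"Throughput ratio: {after / before:F3}");
        Console.WriteLine($"Calls per ms:     {before:F1} -> {after:F1} (+{after - before:F1})");
    }
}
```

The two percentages answer different questions: `TimeSavedPercent` measures time saved relative to the slower run, while `after / before` is the speedup in throughput, so the second is always the larger of the two.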
So, without further ado, here’s the Levenshtein code: