Levenshtein distance


Definition: (1) The smallest number of insertions, deletions, and substitutions required to change one string or tree into another. (2) A Θ(m × n) algorithm to compute the distance between strings, where m and n are the lengths of the strings.

Also known as edit distance.

Generalization (I am a kind of ...)
string matching with errors.

Aggregate child (... is a part of or used in me.)
edit operation.

See also double metaphone, soundex, Jaro-Winkler, Hamming distance.

Note: (1) From Algorithms and Theory of Computation Handbook, page 14-35, Copyright © 1999 by CRC Press LLC. Appearing in the Dictionary of Computer Science, Engineering and Technology, Copyright © 2000 CRC Press LLC.

Author: CRC-A


Michael Gilleland's Levenshtein distance (Java, C++, Visual Basic), includes a great explanation and links to code in Perl, C, JavaScript, Python, and many more languages. Many implementations (Ada, C++, Lisp, Io, Java, OCaml, Octave, PHP, Python, Ruby, Visual Basic).

More information

Wikipedia entry which has links to implementations. March 2003 pictures of Levenshtein at a reception.

Vladimir I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Doklady Akademii Nauk SSSR, 163(4):845-848, 1965 (Russian). English translation in Soviet Physics Doklady, 10(8):707-710, 1966.
(Doklady is Russian for "Report". Sometimes transliterated in English as Doclady or Dokladi.)

Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul Black.

Entry modified 22 June 2015.
HTML page formatted Fri Feb 23 10:06:08 2018.

Cite this as:
Algorithms and Theory of Computation Handbook, CRC Press LLC, 1999, "Levenshtein distance", in Dictionary of Algorithms and Data Structures [online], Vreda Pieterse and Paul E. Black, eds. 22 June 2015. (accessed TODAY) Available from: