NIST

string matching with errors

(algorithm)

Definition: Searching for approximate (e.g., up to a predefined number of symbol mismatches, insertions, and deletions) occurrences of a pattern string in a text string. Preprocessing, e.g., building an index, may or may not be allowed.

Also known as approximate string matching.

Specialization (... is a kind of me.)
Ratcliff/Obershelp pattern recognition, phonetic coding, Jaro-Winkler, Levenshtein distance, string matching with mismatches.

See also string matching, inverted index.

Note: From Algorithms and Theory of Computation Handbook, page 13-18, Copyright © 1999 by CRC Press LLC. Appearing in the Dictionary of Computer Science, Engineering and Technology, Copyright © 2000 CRC Press LLC.

For large collections that are searched often, it is far faster, though more complicated, to start with an inverted index or a suffix tree.

Author: CRC-A

Implementation

See implementations at string matching. Steven Skiena's Stony Brook Algorithm Repository (C and Pascal)

More information

Qi Xiao Yang, Sung Sam Yuan, Lu Chun, Li Zhao, and Sun Peng, Faster Algorithm of String Comparison 2001.


Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul Black.

Entry modified 26 July 2021.
HTML page formatted Mon Jul 26 11:58:06 2021.

Cite this as:
Algorithms and Theory of Computation Handbook, CRC Press LLC, 1999, "string matching with errors", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed. 26 July 2021. (accessed TODAY) Available from: https://www.nist.gov/dads/HTML/stringMatchwError.html