Definition: Searching for approximate (e.g., up to a predefined number of symbol mismatches, insertions, and deletions) occurrences of a pattern string in a text string. Preprocessing, e.g., building an index, may or may not be allowed.
Also known as approximate string matching.
Specialization (... is a kind of me.)
Ratcliff/Obershelp pattern recognition, phonetic coding, Jaro-Winkler, Levenshtein distance, string matching with mismatches.
See also string matching, inverted index.
Note: From Algorithms and Theory of Computation Handbook, page 13-18, Copyright © 1999 by CRC Press LLC. Appearing in the Dictionary of Computer Science, Engineering and Technology, Copyright © 2000 CRC Press LLC.
For large collections that are searched often, it is far faster, though more complicated, to start with an inverted index or a suffix tree.
Qi Xiao Yang, Sung Sam Yuan, Lu Chun, Li Zhao, and Sun Peng, Faster Algorithm of String Comparison 2001.
If you have suggestions, corrections, or comments, please get in touch with Paul E. Black.
Entry modified 28 February 2011.
HTML page formatted Tue Dec 6 16:16:33 2011.
Cite this as:
Algorithms and Theory of Computation Handbook, CRC Press LLC, 1999, "string matching with errors", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 28 February 2011. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/stringMatchwError.html