NIST

Aho-Corasick

(algorithm)

Definition: A multiple string matching algorithm that constructs a finite state machine from a pattern (list of keywords), then uses the machine to locate all occurrences of the keywords in a body of text.

Generalization (I am a kind of ...)
string matching.

See also Commentz-Walter works from the back, like Boyer-Moore.

Note: Construction of the machine takes time proportional to the sum of the lengths of the keywords. The machine is used to process the text string in a single pass. The number of transitions made by the machine is independent of the number of keywords.

Author: VO

Implementation

Bruce Watson's SPARE Parts, a string pattern recognition toolkit (C++).

More information

Alfred V. Aho and Margaret J. Corasick, Efficient string matching: an aid to bibliographic search, CACM, 18(6):333-340, June 1975.


Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul Black.

Entry modified 15 July 2019.
HTML page formatted Mon Jul 15 12:55:43 2019.

Cite this as:
Vadim Okun, "Aho-Corasick", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed. 15 July 2019. (accessed TODAY) Available from: https://www.nist.gov/dads/HTML/ahoCorasick.html