order-preserving Huffman coding

(algorithm)

Definition: A variable-length character coding based on the frequency of each character. The algorithm is similar to Huffman coding, but the trees are kept in the same order as the characters. Two adjacent trees with the least combined frequency are joined as subtrees of a new root. As with Huffman coding, that new tree is assigned the sum of the subtrees' frequencies. Repeat until all characters are in one tree.

Aggregate child (... is a part of or used in me.)
full binary tree, priority queue, greedy algorithm.

See also Huffman coding.

Note: This may not produce a true Huffman code. Although encoded messages may be up to twice as long as true Huffman codes, order-preserving Huffman codes are "quite effective for most data." [Graef06, page 5]

There are algorithms to produce optimum order-preserving prefix codes, but they are more complicated.

Author: PEB

Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul Black.

Entry modified 5 December 2006.
HTML page formatted Wed Oct 30 12:15:31 2024.

Cite this as:
Paul E. Black, "order-preserving Huffman coding", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed. 5 December 2006. (accessed TODAY) Available from: https://www.nist.gov/dads/HTML/orderPreservingHuffmanCoding.html