Huffman coding


Definition: A minimal variable-length character coding based on the frequency of each character. First, each character becomes a one-node binary tree, with the character as the only node. The character's frequency is the tree's frequency. Two trees with the least frequencies are joined as the subtrees of a new root that is assigned the sum of their frequencies. Repeat until all characters are in one tree. One code bit represents each level. Thus more frequent characters are near the root and are coded with few bits, and rare characters are far from the root and are coded with many bits.

Also known as static Huffman coding.

Generalization (I am a kind of ...)
greedy algorithm.

Specialization (... is a kind of me.)
adaptive Huffman coding, k-ary Huffman coding.

Aggregate child (... is a part of or used in me.)
coding tree, full binary tree, priority queue.

See also order-preserving Huffman coding, arithmetic coding, optimal merge, Shannon-Fano coding.

Note: The worst case for Huffman coding (or, equivalently, the longest Huffman coding for a set of characters) is when the distribution of frequencies follows the Fibonacci numbers. (This and other relations came from Alex Vinokur "Fibonacci numbers, Lucas numbers and Huffman codes".)

Joining trees by frequency is the same as merging sequences by length in optimal merge. See the example there. Since a node with only one child is not optimal, any Huffman coding corresponds to a full binary tree.

Huffman coding is one of many lossless compression algorithms. This algorithm produces a prefix code.

Shannon-Fano is a minimal prefix code. Huffman is optimal for character coding (one character-one code word) and simple to program. Arithmetic coding is even more compact, since it can allocate fractional bits, but is more complicated and patents cover some uses.

Author: PEB


build tree in array (C) -- also computes entropy and efficiency.

More information

A survey on data compression, John Morris' explanation, example, and analysis. Wikipedia's encyclopedic article on Huffman coding.

David A. Huffman, A Method for The Construction of Minimum Redundancy Codes, Proceedings of IRE, 40(9):1098-1101, September 1952.

Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul Black.

Entry modified 11 January 2021.
HTML page formatted Tue Jan 19 08:57:16 2021.

Cite this as:
Paul E. Black, "Huffman coding", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed. 11 January 2021. (accessed TODAY) Available from: