 # Huffman coding

(algorithm)

Definition: A minimal variable-length character coding based on the frequency of each character. First, each character becomes a one-node binary tree, with the character as the only node. The character's frequency is the tree's frequency. Two trees with the least frequencies are joined as the subtrees of a new root that is assigned the sum of their frequencies. Repeat until all characters are in one tree. One code bit represents each level. Thus more frequent characters are near the root and are coded with few bits, and rare characters are far from the root and are coded with many bits.

Also known as static Huffman coding.

Generalization (I am a kind of ...)
greedy algorithm.

Specialization (... is a kind of me.)
adaptive Huffman coding, k-ary Huffman coding.

Aggregate child (... is a part of or used in me.)
coding tree, full binary tree, priority queue.

Note: The worst case for Huffman coding (or, equivalently, the longest Huffman coding for a set of characters) is when the distribution of frequencies follows the Fibonacci numbers. For this and other relations see Alex Vinokur's note on Fibonacci numbers, Lucas numbers and Huffman codes.

Joining trees by frequency is the same as merging sequences by length in optimal merge. See the example there. Since a node with only one child is not optimal, any Huffman coding corresponds to a full binary tree.

Huffman coding is one of many lossless compression algorithms. This algorithm produces a prefix code.

Shannon-Fano is a minimal prefix code. Huffman is optimal for character coding (one character-one code word) and simple to program. Arithmetic coding is even more compact, since it can allocate fractional bits, but is more complicated and patents cover some uses.

Author: PEB

## Implementation

build tree in array (C) -- also computes entropy and efficiency.

A survey on data compression, John Morris' explanation, example, and analysis. Wikipedia's encyclopedic article on Huffman coding.

David A. Huffman, A Method for The Construction of Minimum Redundancy Codes, Proceedings of IRE, 40(9):1098-1101, September 1952.