Definition: Using a fixed dictionary, encode words in text with strings having many repeated characters, typically an asterisk or "star" (*).
See also Huffman coding, Burrows-Wheeler transform.
Note: This transformation does not compress the text, but prepares it for compression. With many runs of stars, the encoded text may be much more compressible than the original text. The sender and the receiver must communicate the dictionary somehow.
The code string for a word is the same length as the word. Kruse and Mukherjee constructed a dictionary "from a large set of reference texts" and assigned codes solely by frequency. Mark Nelson built the dictionary from the input text and assigned codes based on the word.
Holger Kruse and Amar Mukherjee, Data compression using text encryption, Proc. Data Compression Conference, Snowbird, Utah, USA, pp 447-, March 1997.
Holger Kruse and Amar Mukherjee Preprocessing text to improve compression ratios, Proc. Data Compression Conference, Snowbird, Utah, USA, pp 556-464, April 1998.
If you have suggestions, corrections, or comments, please get in touch with Paul E. Black.
Entry modified 14 August 2008.
HTML page formatted Tue Dec 6 16:16:32 2011.
Cite this as:
Paul E. Black, "star encoding", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 14 August 2008. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/starEncoding.html