The Huffman encoding algorithm is an optimal compression algorithm when only
the frequencies of individual letters are used to compress the data. (There are
better algorithms that can exploit more structure in the file than just letter
frequencies.) The idea behind the algorithm is that if some letters occur more
frequently than others, it makes sense to use fewer bits to encode those
letters than to encode the less frequent letters.
For instance, take the following phrase: "ADA ATE APPLE". It contains 4 As, 1 D, 1 T, 2 Es, 2 Ps, 1 L, and 2 spaces, for 13 characters in all. There are a few reasonable ways of encoding this phrase using letter frequencies. First, notice that only a handful of distinct characters show up here. It would be wasteful to use chars, at eight bits apiece, to encode each character of the string. In fact, since there are only seven distinct characters, we could get away with using three bits for each character! If we decided to simply take that route, that would require using 13 * 3 bits, for a total of 39 bits.
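To make the comparison concrete, here is a minimal Python sketch of Huffman coding (my own illustration, not code from this post). It builds the prefix code for "ADA ATE APPLE" by repeatedly merging the two lowest-frequency subtrees, then compares the encoded length against the fixed three-bit scheme above:

```python
# Minimal Huffman-coding sketch: build a prefix code from letter
# frequencies and compare it to a fixed 3-bit-per-character encoding.
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each character in text to its Huffman bit string."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreaker, {char: code-so-far}).
    # The tiebreaker keeps comparisons away from the dicts.
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, lo = heapq.heappop(heap)  # lightest subtree
        f2, _, hi = heapq.heappop(heap)  # second lightest
        # Prepend 0 to every code in one subtree, 1 in the other.
        merged = {ch: "0" + code for ch, code in lo.items()}
        merged.update({ch: "1" + code for ch, code in hi.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

phrase = "ADA ATE APPLE"
codes = huffman_codes(phrase)
encoded = "".join(codes[ch] for ch in phrase)
print(len(phrase) * 3)   # fixed 3-bit code: 39 bits
print(len(encoded))      # Huffman code: 35 bits
```

The frequent A ends up with a shorter code than the one-off letters D, T, and L, which is exactly where the savings over the fixed-width scheme come from: 35 bits instead of 39.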
There's a nice implementation of this (and a bunch of other algorithms you mention) at http://www.learninglover.com/examples.php?id=19