LZ77 and LZ78

LZ = Lempel Ziv

Replaces repeated occurrences of data with references.

Huffman Coding

The algorithm is based on the frequency of occurrence of the data item(byte). The most frequent data items will represented and encoded with a lower number of bits.

The main idea of the algorithm is create a binary tree, called Huffman tree, based on the bytes frequency on the data, where the leafs are the bytes symbols, and the path from the root to a leaf determines the new representation of that leaf byte.

Using the text ABCBAACD as example and applying those steps, we have the following tree:

So the new representation of the bytes on the text are:

  • A: 0
  • B: 10
  • C: 111
  • D: 110

Conclusion

The original representation has 8 bytes(64 bits) and the new representation have only 9 bits, that is 86% smaller than the original.

https://medium.com/stantmob/data-compression-with-huffman-coding-ad7bcb07c5d5