Data Compression

LZ77 and LZ78

LZ = Lempel Ziv

Replaces repeated occurrences of data with references.

Huffman Coding

The algorithm is based on the frequency of occurrence of the data item(byte). The most frequent data items will represented and encoded with a lower number of bits.

The main idea of the algorithm is create a binary tree, called Huffman tree, based on the bytes frequency on the data, where the leafs are the bytes symbols, and the path from the root to a leaf determines the new representation of that leaf byte.

Using the text ABCBAACD as example and applying those steps, we have the following tree:

So the new representation of the bytes on the text are:

  • A: 0
  • B: 10
  • C: 111
  • D: 110


The original representation has 8 bytes(64 bits) and the new representation have only 9 bits, that is 86% smaller than the original.



Love podcasts or audiobooks? Learn on the go with our new app.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store