# Data Compression

# LZ77 and LZ78

LZ = Lempel Ziv

Replaces repeated occurrences of data with references.

# Huffman Coding

The algorithm is based on the frequency of occurrence of the data item(byte). The most frequent data items will represented and encoded with a lower number of bits.

The main idea of the algorithm is create a binary tree, called Huffman tree, based on the bytes frequency on the data, where the leafs are the bytes symbols, and the path from the root to a leaf determines the new representation of that leaf byte.

Using the text **ABCBAACD **as** **example and applying those steps, we have the following tree:

So the new representation of the bytes on the text are:

- A: 0
- B: 10
- C: 111
- D: 110

## Conclusion

The original representation has 8 bytes(64 bits) and the new representation have only 9 bits, that is **86%** **smaller** than the original.

https://medium.com/stantmob/data-compression-with-huffman-coding-ad7bcb07c5d5