# The Final Compression Project – Compressão e Codificação de Dados

Štěpán Pešout
15/01/2022

## Brief description

This project works with images and combines lossy and lossless compression techniques.

First, the image is represented as a three-dimensional integer array, from which three two-dimensional arrays, one per color layer, are extracted.

Then, two coefficients and an intercept are obtained using linear regression, so that the color values of the blue layer can be computed using only the red and green color layers.

In the red and green layers, similar values are then replaced by their median. While iterating over the arrays, the frequencies of occurrence of each color shade in both layers are counted.

The 256 color shades, whose frequencies of occurrence are now known, serve as the "alphabet" for the Huffman tree construction. The tree yields a dictionary, with which the original data is translated into binary code, then into an array of bytes, then into characters, and finally stored in a file. The information needed for decompression (color frequencies, regression coefficients, etc.) is stored as well.

This work also includes a decompression algorithm that allows the image to be reconstructed. Because lossy compression techniques are used, the original image cannot be restored exactly.

## Function to create an image from the three separate layers

• red – red color layer
• green – green color layer
• blue – blue color layer
• show determines whether the image is displayed or only returned
• function returns the image as a PIL.Image

## Regression functions

### Function to compute regression coefficients

• color1 (independent) is usually the red layer
• color2 (independent) is usually the green layer
• dependent is usually the blue layer
• function returns two coefficients and an intercept, which are later used to compute the dependent layer
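
The fit described above can be sketched as an ordinary least-squares problem. This is only an illustration under assumptions: the names `det3` and `fit_regression` are hypothetical, the layers are taken as flat sequences, and the normal equations are solved with Cramer's rule so that only the standard library is needed.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_regression(color1, color2, dependent):
    """Least-squares fit of dependent ~ a*color1 + b*color2 + c.

    All arguments are flat sequences of equal length (assumed layout).
    Returns the tuple (a, b, c).
    """
    n = len(dependent)
    s1, s2, sy = sum(color1), sum(color2), sum(dependent)
    s11 = sum(x * x for x in color1)
    s22 = sum(x * x for x in color2)
    s12 = sum(x * y for x, y in zip(color1, color2))
    s1y = sum(x * y for x, y in zip(color1, dependent))
    s2y = sum(x * y for x, y in zip(color2, dependent))

    # Normal equations: matrix * [a, b, c] = rhs
    matrix = [[s11, s12, s1],
              [s12, s22, s2],
              [s1, s2, n]]
    rhs = [s1y, s2y, sy]

    d = det3(matrix)
    if d == 0:
        raise ValueError("the fit is not unique for these layers")

    # Cramer's rule: replace one column at a time by the right-hand side
    solution = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return tuple(solution)


red = [0, 10, 20, 30, 5]
green = [5, 0, 15, 10, 40]
blue = [2 * r + g + 3 for r, g in zip(red, green)]
a, b, c = fit_regression(red, green, blue)
```

On real image data the blue layer is not an exact linear function of the other two, so the returned values only approximate it; this is one source of the loss mentioned in the brief description.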

### Function to compute color values of a layer

• color1 – first color layer (usually the red one)
• color2 – second color layer (usually the green one)
• color1_cf is the coefficient for multiplying color1 values
• color2_cf is the coefficient for multiplying color2 values
• intercept is the value to add after the multiplication

• function returns computed values for the third layer (usually blue)
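
A minimal sketch of this computation, assuming the layers are two-dimensional lists of integers. The function name `compute_layer`, the rounding, and the clamping to the range 0 to 255 are assumptions made for illustration.

```python
def compute_layer(color1, color2, color1_cf, color2_cf, intercept):
    """Compute the third layer from two layers and the regression values."""
    result = []
    for row1, row2 in zip(color1, color2):
        new_row = []
        for v1, v2 in zip(row1, row2):
            value = round(color1_cf * v1 + color2_cf * v2 + intercept)
            # Assumption: keep the result a valid 8-bit color value
            new_row.append(min(255, max(0, value)))
        result.append(new_row)
    return result


red = [[0, 10], [200, 255]]
green = [[5, 0], [100, 255]]
blue = compute_layer(red, green, 0.5, 0.25, 10)
```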

## Function to replace similar values by their median

• layer is the color layer
• max_error is the maximal difference between layer values for them to be considered the same
• fill_number is the value to be inserted between two medians
• function returns the modified layer and the frequencies of the color values
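
The parameter list leaves the exact scheme open. One possible reading is sketched here: a run of neighbouring values whose spread stays within `max_error` is written as its median followed by `fill_number` for the rest of the run. The function name `replace_similar_by_median`, the flat-list input, and the handling of a median equal to `fill_number` are assumptions, not the project's confirmed behaviour.

```python
from statistics import median


def replace_similar_by_median(values, max_error, fill_number=0):
    """Replace runs of similar values by [median, fill, fill, ...].

    values is a flat list of 8-bit color values (assumed layout).
    Returns the modified list and a 256-entry frequency list.
    """
    output = []
    frequencies = [0] * 256
    i = 0
    while i < len(values):
        lo = hi = values[i]
        j = i + 1
        # Extend the run while its spread stays within max_error
        while j < len(values):
            new_lo = min(lo, values[j])
            new_hi = max(hi, values[j])
            if new_hi - new_lo > max_error:
                break
            lo, hi = new_lo, new_hi
            j += 1
        run = values[i:j]
        med = int(median(run))
        # Assumption: a median must not collide with the fill marker
        if med == fill_number:
            med = fill_number + 1
        output.append(med)
        output.extend([fill_number] * (len(run) - 1))
        frequencies[med] += 1
        frequencies[fill_number] += len(run) - 1
        i = j
    return output, frequencies


modified, freqs = replace_similar_by_median([10, 11, 12, 50, 51, 200], 2)
```

Under this reading the fill value becomes by far the most frequent symbol in smooth image regions, so it would receive a very short Huffman code.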

## Functions to handle Huffman tree nodes

### Function to create the node

• every node is represented by a list
• left and right are the child nodes
• number is the alphabet symbol
• frequency is the frequency of the number (symbol)

### Function to create a list of nodes

• this function creates and returns a list with nodes (it converts the numbers and frequencies into the tree nodes)
• child nodes are not present yet
• frequencies is the list of frequencies of numbers

### Function to get the two tree nodes with the lowest frequency (the lowest probability of occurrence)

• nodes is the node array

## Function to build the Huffman tree

• this function creates the correct hierarchy (parent – children) of nodes according to the frequency
• frequencies is the list of frequencies of numbers
• function returns the root node of the tree
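
The node helpers and the tree construction described above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, a node is the list `[left, right, number, frequency]`, and symbols with frequency zero are skipped, which the text does not confirm.

```python
def create_node(left, right, number, frequency):
    """A node is a plain list: [left, right, number, frequency]."""
    return [left, right, number, frequency]


def create_node_list(frequencies):
    """Turn a frequency list into leaf nodes (no children yet)."""
    return [create_node(None, None, number, frequency)
            for number, frequency in enumerate(frequencies)
            if frequency > 0]  # assumption: unused symbols are dropped


def pop_two_least_frequent(nodes):
    """Remove and return the two nodes with the lowest frequency."""
    nodes.sort(key=lambda node: node[3])
    return nodes.pop(0), nodes.pop(0)


def build_huffman_tree(frequencies):
    """Repeatedly merge the two rarest nodes until one root remains."""
    nodes = create_node_list(frequencies)
    if not nodes:
        raise ValueError("at least one symbol must occur")
    while len(nodes) > 1:
        first, second = pop_two_least_frequent(nodes)
        parent = create_node(first, second, None, first[3] + second[3])
        nodes.append(parent)
    return nodes[0]


# Symbols 0, 2 and 3 occur 5, 2 and 1 times; symbol 1 never occurs
root = build_huffman_tree([5, 0, 2, 1])
```

Sorting the list on every step is simple but quadratic in the worst case; with an alphabet of only 256 symbols this hardly matters, and a heap would be the usual alternative for larger alphabets.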

## Function to read the Huffman tree and create a dictionary from it

• the function uses recursion to read the tree
• node is the root node
• code_prefix is the code retrieved so far (used when the function calls itself)
• nodes is a list of already retrieved nodes to build the dictionary
• function returns the dictionary or its fragment
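
A simplified sketch of the recursive traversal, assuming nodes are lists of the form `[left, right, number, frequency]`. The name `build_dictionary` is hypothetical, and unlike the function described above it returns each fragment directly instead of collecting nodes in a separate list.

```python
def build_dictionary(node, code_prefix=""):
    """Map every leaf symbol to its bit string ("0" = left, "1" = right)."""
    left, right, number, _frequency = node
    if left is None and right is None:
        # A tree with a single leaf still needs a one-bit code
        return {number: code_prefix or "0"}
    dictionary = {}
    dictionary.update(build_dictionary(left, code_prefix + "0"))
    dictionary.update(build_dictionary(right, code_prefix + "1"))
    return dictionary


# Hand-built example tree for the symbols 0, 2 and 3
leaf3 = [None, None, 3, 1]
leaf2 = [None, None, 2, 2]
leaf0 = [None, None, 0, 5]
tree = [[leaf3, leaf2, None, 3], leaf0, None, 8]
codes = build_dictionary(tree)
```

The most frequent symbol ends up closest to the root and therefore gets the shortest code.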

## Function to get the binary from the color layer

• the function translates the color values to binary according to the dictionary from the Huffman tree
• after that, the array is joined into a string, split into chunks of 8 bits, each chunk is converted to a decimal integer, and the result is returned
• layer is the color layer
• frequencies is the list of frequencies
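
The translation and packing steps can be sketched as follows. To keep the example self-contained it takes a ready-made dictionary instead of the frequencies; the name `layer_to_bytes` and the zero padding of the last chunk are assumptions.

```python
def layer_to_bytes(layer, dictionary):
    """Encode a 2D layer with the dictionary and pack the bits into bytes."""
    bits = "".join(dictionary[value] for row in layer for value in row)
    # Assumption: pad with zeros so the length is a multiple of 8
    bits += "0" * (-len(bits) % 8)
    return [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]


dictionary = {0: "1", 2: "01", 3: "00"}
packed = layer_to_bytes([[0, 0, 2], [3, 0, 0]], dictionary)
```

Because of the padding, the decoder needs to know how many symbols to read, for example from the stored width and height.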

## Function to save binary to a file

• name is the file name
• binary is the array of the values to save
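
A possible sketch of the saving step. Note that it swaps in a different technique: instead of converting the values to characters, it writes them as raw bytes in binary mode. The names `save_binary` and `load_binary` and the file name are hypothetical.

```python
import os
import tempfile


def save_binary(name, binary):
    """Write a list of integers in the range 0..255 as raw bytes."""
    with open(name, "wb") as file:
        file.write(bytes(binary))


def load_binary(name):
    """Read the file back into a list of integers."""
    with open(name, "rb") as file:
        return list(file.read())


# Round trip in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "layer.bin")
    save_binary(path, [211, 160, 0, 255])
    restored = load_binary(path)
```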

## Function to save a recipe to a file

• recipe contains the information needed to decompress the file
• frequencies is the list of frequencies
• width is the width of the original image
• height is the height of the original image
• red_regr_coefficient is a number for multiplying values of the red layer to get the blue one
• green_regr_coefficient is a number for multiplying values of the green layer to get the blue one
• regr_intercept is a number to add after the multiplication to get the blue layer values
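
The on-disk format of the recipe is not specified here. As one illustration, the values could be stored as a flat JSON list; the layout `[width, height, red coefficient, green coefficient, intercept, frequencies...]`, the `name` parameter, and both function names are assumptions.

```python
import json
import os
import tempfile


def save_recipe(name, frequencies, width, height,
                red_regr_coefficient, green_regr_coefficient, regr_intercept):
    """Store everything the decompressor needs as one flat JSON list."""
    recipe = [width, height, red_regr_coefficient,
              green_regr_coefficient, regr_intercept] + list(frequencies)
    with open(name, "w", encoding="utf-8") as file:
        json.dump(recipe, file)


def load_recipe(name):
    """Read the flat recipe list back from the file."""
    with open(name, encoding="utf-8") as file:
        return json.load(file)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "recipe.json")
    save_recipe(path, [5, 0, 2, 1], 4, 2, 0.5, 0.25, 10.0)
    recipe = load_recipe(path)
```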

## Function to load the compressed file and the recipe

• name is the file name
• function returns binary and the recipe array

## Function to translate binary according to the dictionary

• binary is the array of bits
• dictionary is the dictionary used for the translation
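
Decoding can be sketched by inverting the dictionary and reading the bits one at a time; since Huffman codes are prefix-free, the first match is always the right one. The names `bytes_to_bits` and `translate_binary` and the extra `count` parameter (how many symbols to read before the padding starts) are assumptions.

```python
def bytes_to_bits(values):
    """Expand integers 0..255 into one string of 8-bit groups."""
    return "".join(format(value, "08b") for value in values)


def translate_binary(bits, dictionary, count):
    """Decode a bit string back into count symbols."""
    reverse = {code: number for number, code in dictionary.items()}
    values = []
    current = ""
    for bit in bits:
        current += bit
        if current in reverse:
            values.append(reverse[current])
            current = ""
            if len(values) == count:
                break  # the remaining bits are padding
    return values


dictionary = {0: "1", 2: "01", 3: "00"}
decoded = translate_binary(bytes_to_bits([211]), dictionary, 6)
```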

## Function to replace zeros by the previous value

• layer is the color layer
• recipe is the recipe array for decompression
• function returns the modified layer
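
A minimal sketch of this last step, assuming the decoded layer arrives as a flat list and zero is the fill value. The name `fill_from_previous` is hypothetical, and the `width` parameter stands in for the value that would be read from the recipe.

```python
def fill_from_previous(values, width, fill_number=0):
    """Replace fill values by the preceding value and reshape into rows."""
    restored = []
    previous = None
    for value in values:
        if value == fill_number and previous is not None:
            value = previous
        restored.append(value)
        previous = value
    # Split the flat list into rows of the original image width
    return [restored[i:i + width] for i in range(0, len(restored), width)]


layer = fill_from_previous([11, 0, 0, 50, 0, 200], width=3)
```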