# A FSharp implementation of Huffman compression algorithm

Implementating Huffman algorithm is a good practice to progress in algorithms and is a good opportunity to train on working methods such as TDD.

This article is based on the gist I wrote to train me.

Huffman compression method can be decomposed in 3 steps:

- computing a dictionary with frequency of occurrences
- building a tree representing frequencies and associated path
- storage of tree and path of each occurrence bit per bit

In next examples, we will work on the string “aaabbffppppeee kk aa”

## Computing frequencies

At first, we can imagine a dictionary of char * int containing each char frequency. It don’t like to work with chars when I implement binary data traitment algorithms. So, I created an empty function getting (byte * int) list

Then I wrote a test I executed each time I could compile:

A simple function to work with bytes list could be:

I did want to try compression on real files, so I couldn’t load entire content in a byte list. Next function is computing frequencies from a seekable stream.

## Building the tree

My tree model is simply:

The cost is is representing total occurencies count contained by child branches or leaf. The swith method helps to browse sub branches of a tree node.

When the model is written, we can write tests:

After tests writting, we can implement a function populating the tree from frequencies:

Calculated paths are:

occurrency | path option | binary | frequency |
---|---|---|---|

a | Some({[True;False]}) | 10 | 5 |

p | Some({[False; True]}) | 01 | 4 |

e | Some({[True; True; False]}) | 110 | 3 |

b | Some({[False; False; False]}) | 000 | 2 |

f | Some({[False; False; True]}) | 001 | 2 |

k | Some({[True; True; True; True]}) | 1111 | 2 |

’ ‘ | Some({[True; True; True; False]}) | 1110 | 2 |

This table is demonstrating that implementation is correct. Occurrences whose frequencies are higher have the shortest paths.

A diagram summarizing this table could be like the following:

## Test online to compute a Huffman tree

Try huffman implementation compiled with fable.

occurrency | binary | frequency |
---|---|---|

a | 10 | 5 |

#### Compression summary

## Storage

The occurrency ‘a’ will be coded in 2 bits. We can not write less than 8 bits in a stream. (with the WriteByte method) So I wrote a tiny BitWritter:

The buffer is a simple byte. The write method increases the len and shift bits of buffer. When len is equal to 8 bits, we write the byte in the stream.

To read bit per bit in a stream, I use:

Try this implementation on real files using gist