# Optimal Compression for Binary Data

## The Problem

My current project uses 5mb of disk space to save a 2D map. There could be hundreds of these maps. I tried compressing the files, but couldn't get them any smaller than 2mb.

This was an improvement, but still unacceptable. A better approach was needed. This article describes how I managed to compress my 5mb of map data into just **36kb**.

## Understanding Compression

The objective is not to naively throw compression algorithms at the file, but instead to give the file more **compression potential**. A file has better compression potential when it contains larger blocks of identical byte sequences and when there are smaller differences between non-identical bytes.

This can be done in two simple steps.

## The Implementation - Part 1

The first step, very simply, is to organise like-blocks together.

The following code will create files that compress very badly, because each integer written is going to vary wildly:

For i = 0 to n WriteInt(myThing[i].iType) WriteInt(myThing[i].iColor) WriteInt(myThing[i].iCost) Next

In contrast, the following code will write blocks of data that contain lots of consecutive identical values. The resulting file will be much more compressible.

For i = 0 to n WriteInt(myThing[i].iType) Next For i = 0 to n WriteInt(myThing[i].iColor) Next For i = 0 to n WriteInt(myThing[i].iCost) Next

## The Implementation - Part 2

The next step is to write a **pre-compression filter**. This can very simply be done using a **difference filter** or subtraction filter.

This works by writing only the difference between values as they are written, instead of the values themselves.

The following is all it takes:

For i = 0 to n If i = 0 then WriteInt(myThing[i].iType) Else WriteInt(myThing[i].iType - myThing[i-1].iType) Endif Next

As a bonus tip, if you can make sure that you only have positive integer differences, perhaps by adding a large value such as (1 ShiftLeft 30) to each integer written, this becomes much more effective:

Const OFFSET = 1073741824 // (1 Shl 30) For i = 0 to n If i = 0 then WriteInt(myThing[i].iType + OFFSET) Else WriteInt(myThing[i].iType - myThing[i-1].iType + OFFSET) Endif Next

## Further Work

If you are saving data that has two dimensions, e.g. a map or an image, you can improve the compression potential further by using filters such as Average or Paeth that work on more than one dimension. You can even choose the best method to use on a line-by-line basis. This is how PNG images work.

If your data contains real/floating-point numbers, consider reducing their resolution. These numbers are really difficult to compress, so reducing the resolution will help a great deal. Here's an example, including an offset of 1.0, assuming Real values between 0.0 and 1.0:

Const FACTOR = 3600.0 // Reduce to compress, // increase for accuracy. Function CompressReal(r: Real) : Integer Begin result = ToInt((r + 1.0) * FACTOR) End Function DecompressReal(r: Integer) : Real Begin result = (ToReal(r) / FACTOR) - 1.0 If result < -1.0 then result = -1.0 If result > 1.0 then result = 1.0 End

The "compressed" real numbers are then saved as integers:

For i = 0 to n If i = 0 then WriteInt(CompressReal(myThing[i].rFoobar)) Else WriteInt(CompressReal(myThing[i].rFoobar - myThing[i-1].rFoobar))) Endif Next

# Stay Subscribed

Follow me on twitter @golightlyb, subscribe to the RSS feed or get updates by e-mail.

You can also contact me directly - I make an effort to reply to every e-mail.