-=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- (c) WidthPadding Industries 1987 0|702|0 -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=-
SoCoder -> Article Home -> Advanced Techniques


 
cthug
Created : 08 August 2009
Edited : 09 August 2009
System : Cross Platform
Language : Monkey

RLE (Run Length Encoding) compression

A simple way of compressing data

I have been writing a simple archiving library to protect my game data (sound, image etc), using a custom format based on TAR archives. Where each file with is compressed then the whole archive is encrypted, compressed using RLE, Run Length Encoding a very simple form of lossless compression. I will now explain the code to Compress and Deflate using my RLE.

The code uses SDL types for specific integer sizes like Uint32 so include SDL header.


Now the compression code


The reason for the variable RSRCBuffer is for memory conservation. In the archive loading function in the lib, the file loading, compression and encryption functions all use this buffer. The only down side of this is that you can only have one archive open at any one time, but this works fine for me .

What this code does is count how many bytes are the same in a row and place the number, then the character in the compressed buffer. Like QQQFFFFZZZ would produce in hexadecimal: 03 51 04 46 03 5A. That compresses from 10 bytes to 6 bytes. This is good for images and other repetitive sources, but not so good for text, this can actually double the filesize if there is no patterns.

Because the count is stored as only 1 byte, it has a max of 255, so every 255, it stores that and starts again. So if you had 300 bytes of 'Q' it would be FF 51 2D 51. This also limits the file compression ratio, it can be anywhere between 1 : 128 or 2 : 1, depending repetitiveness of the source.

|edit|
Now if the compressed version is larger than the original, the original will be returned with a zero at the front. Having a at the beginning is impossible unless specifically placed there so, the defalte function can pick it up easily, just return the value of the file.
|edit|


And the decompression code:


This reverses what RSRC_Compress does, it puts in[n+1] into the buffer in[n] number of times.

|edit|
Changed source to use unsigned bytes,
I just did a test with 900.1KB BMP, compressed to 31KB
|edit|

 

Comments


Sunday, 09 August 2009, 09:13
Mog
I hated RLE so much, it was used in my Infantry project to store media files, and we had such a fun time decoding it. So by explaining the horridness of it, Good tutorial!
Sunday, 09 August 2009, 09:44
Afr0
Why not use zlib and encrypt a zlib compressed buffer instead?
Monday, 10 August 2009, 03:54
cthug
RLE is faster than zlib, but the compression ratio of zlib is far higher in comparison to time taken. I kinda wish I did just use zlib, I thought it would take a half hour and it took an hour longer, and there still a slight bug that I found last night, with signed/unsigned values (loss of 4 bytes in the middle of the code but only with BMP files?). BUT I think that is with my file loading code.