-=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- (c) WidthPadding Industries 1987 0|691|0 -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=-
SoCoder -> Article Home -> Advanced Techniques


 
HoboBen
Created : 21 January 2012
Edited : 27 January 2012
Language : Monkey

Obstacks for low memory and high performance



This article applies to C. In garbage collected languages, this is probably what NOT to do

When implementing a low-level data structure, e.g. linked list, hashmaps, it's common that individual elements are less than the smallest size malloc can allocate, (16 bytes on a 32 bit platform, 24 on a 64 bit platform).

For example a singly linked list of one pointer & one integer would use only 8 bytes per node, representing 100% wastage.

To a lesser extent, even medium-sized elements can also suffer some wasted space due to malloc's memory overhead for bookkeeping.

Additionally, multiple calls to malloc for small objects is extremely inefficient because system calls are slow.

Often it is better to allocate an entire chunk of memory in one go, and also free it in one chunk. This is only easy when the memory can be treated as a stack (i.e. LIFO/FILO). When the stack is exhausted, a new chunk of memory is added.

So not only does this reduce memory requirements, it reduces memory fragmentation, increases locality of reference, and decreases the average time to "allocate" a new structure.

malloc internally can only request blocks of a certain size from the OS (usually about 4kb), so in cases where you are treating large chunks of memory as a stack, growing the stack should be no more expensive than the worst case under malloc.

The GNU C Library provides such functionality with Obstacks. If you want to be more portable then you may like to write your own (it wouldn't be very complicated -- if you don't care about growing the stack you can do this in three lines of code).

For the purpose of demonstration, Obstacks will do fine.

As an example, here's a benchmark of a linked list implementation that uses malloc to allocate 12-bytes nodes in the first instance, and sources from obstack_alloc in the second instance.

The benchmark shows how long it took to apply merge sort to a lists of different sizes.





Obstacks can complete some of the tests in about half the time. That's a big win!

This is due to better memory locality.

I'm surprised Obstacks are barely mentioned online. Probably because they rely on the GNU C Library.

This was just a quick write up; I might do a more thorough job later.

When implementing your own, care must be taken to respect memory alignment.

 

Comments


Saturday, 21 January 2012, 00:44
HoboBen
Important note, as this only works when memory can be treated as a stack (i.e. you can't free in the middle of it, only at the end), it is sometimes unsuitable and you have to fall back to malloc, which to be fair is incredibly clever about how to reuse memory and attempts to reduce fragmentation.

In the linked list example, the linked list is backed by an inactive list, which *is* treated like a stack. This has an important limitation in that the list cannot shrink unless you empty the entire list first (or "defrag" it).

If you're only growing, mostly only growing, or using a set of near-constant size, obstacks are a big advantage.

However if you are growing and shrinking a set, then this means you will always be holding the maximum size of the set inactive in memory. This is still probably a good trade off.