Memory Fragmentation

-=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- (c) WidthPadding Industries 1987 0\|519\|0 -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=-
Socoder -> On Topic -> Memory Fragmentation
Thu, 01 Jul 2010, 20:59
HoboBen	I've only very recently become aware of how much a problem memory fragmentation can be in your code. I don't care about wasting a few MB of RAM, but I do care about the performance costs. One tip is, "if the nodes of a linked list are allocated consecutively in memory, this improves locality of reference and enhances data cache performance during traversal of the list. If the memory pool's free space is fragmented, new nodes will be spread throughout memory, increasing the number of cache misses." But I'm mostly concerned about the way I'm using strings. Obviously the implementations vary by language, but in C you would define a fixed-size buffer. I used to despise it, but now I realise how beneficial it is in reducing memory fragmentation. In languages where strings are of arbitrary length (such as blitz, cobra, PHP, whatever!), how are strings handled? If my strings are updated so that they change length often, how does this affect memory fragmentation? Any ideas how to mitigate this? Pad strings with null chrs? Or do you think it would even be worth rolling out some functions for fixed-size strings using banks? Any other ideas on reducing memory fragmentation? I know some C programmers even go as far as to use alternative "malloc" implementations! -=-=- blog \| work \| code \| more code
Fri, 02 Jul 2010, 04:09
Jayenkai	I don't think I've had any particular "Fragmentation" issues, or at least, never anything noticable. I'm lazy in Blitz. I let Blitz do everything itself. It's a quick language, and that's exactly how I use it. But when doing things on the DS, you better bet I'm watching every little bit of memory!! The DS is really complex for memory, especially if you're going to cram a 170,000 word wordlist in there at the same time!! I usually make sure I've got decent sized chunks. My Strings are usually defined as 1024 chrs, Arrays go well over (round up to 10, round doubly if 2x arrays) and I also tend to have a batch of "temporary" variables predefined. Even most of my functions will reuse the temporary variables, rather than constantly trying to make fresh local ones. I take no chances. I account for all the space, and I leave it exactly where it is. I've found myself trying to accomplish the same approach on the iPod, but it's tricker than I was hoping. Stupid garbage collection keeps randomly flinging bits and pieces away! Gah!! -=-=- ''Load, Next List!''
Sat, 03 Jul 2010, 11:23
JL235	Most languages handle strings as fixed size character arrays, the only difference with C is that this is exposed to you. The functional languages handle them as a list of characters (which in turn is often built out of tuples), but this is typically because they do not support fixed size arrays. Strings should be immutable, their contents should never change. Altering a string should produce an entirely new string, so don't go down the path of updating a strings contents with new values. Old strings are simply free'd when no longer needed. The buffer you define for holding the characters should never change shape, and should also never need to be padded with extra space. My advice would be to not worry about memory fragmentation as malloc and free will handle this for you. You could however research different implementations of malloc/free as different implementations work efficiently under different situations. You could also look into some existing GC libraries as they probably handle memory directly. But if you must go down this route then this is how I would implement it, and it presumes your not offering a garbage collector (although it into one would be trivial). First you'd malloc a big chunk of memory and breaking this up into smaller chunks. As you assign memory you work left to right across the memory assigning sections. When you run out of memory you then defragment the memory, and if your still short you re-alloc your memory into a bigger chunk of memory and continue assigning. You would also keep a list of the memory locations you've assigned, and if they are freed by the user you would move them into a second 'freed memory list'. For defragmentation you'd first stop the VM. Then you would move across your memory from right to left block by block and copy all the used memory across onto the freed memory. Compacting it down to one end. As you do this you also store a map from the old memory locations to their new ones. Next you need to update all the pointers in use in he users app, so you search from the stack to all pointers/references in use. When you find one, you use the map to update it's value with the new memory location. You also need to follow references and search within them for references that need to be updated. If two or more blocks reference each other (a cyclic route) then you will continuously be following each back and forth forever. To prevent this you need to keep a record of which blocks have been searched, so you only search them once. The difference with a GC is that you would not have freed memory. Instead you search the stack before compacting in order to find all the blocks which are no longer referenced. Those blocks are your freed blocks. The JVM used to work how I described it above (I believe it's a mark-and-sweep), but now it uses a tri-color garbage collector with multiple-generations. The generations refers to having seperate sections of memory that relate to your prediction of how likely they are to be freed. i.e. constants are in one set of memory, objects that last a long time are in a second and objects that live for very little time are in a third. This avoids going over the memory of objects that probably won't be garbage collected and allows you to only GC small sections of memory at a time (rather then all of it everytime). All new objects are placed into the first generation and if they survive a garbage collection (or survive multiple times) then they are copied into a second generation. If compacting the first generation doesn't provide you with enough memory; only then do you compact the second generation, and then the third and so on. Objects that survive GC'ing the second generation are moved up into the third, and so on. Constant-like data (like a class definition) are typically store in a 'grandfather' generation which is always the last generation, and in practice should never need GC'ing. But a class might no longer be needed by the VM (this is possible in Java) and so there is no reason why this can't be GC'd. But again, GC'ing the grandfather generation should typically fail to find any memory that can be freed. The tri-color refers to a technique of marking objects as black, grey and white to state if it wont, might be or will be garbage collected. It allows incremental garbage collection (which mark-and-sweep doesn't) and this is known as a low pause GC (these are good for real-time systems). It also allows large sections of memory to be marked as being safe or not-safe to free based on the objects around it. You can find more on Wikipedia.
Sat, 03 Jul 2010, 11:53
HoboBen	Thanks very much for taking the time to write that out, DD! -=-=- blog \| work \| code \| more code