Outline of Garbage Collection
For a very detailed discussion of garbage collection, I highly recommend this survey paper: Paul Wilson, Uniprocessor Garbage Collection Techniques, International Workshop on Memory Management, St. Malo, France, September 1992.
Introduction
General Compiler Runtime Support
Type checking.
Bounds checking.
Memory checking.
The Problem of Memory Management.
Keeping track of allocated memory.
Usually built into managed language runtimes: Lisp, Java, the CLR.
Also possible as a library: C and C++.
Manual Approaches to Garbage Collection:
Do nothing!
Arena memory management.
Manual reference counting.
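To make the arena idea concrete, here is a minimal sketch. The `Arena` class and its method names are made up for illustration, and the "addresses" are just offsets into a byte buffer. The point is that allocation is a pointer bump and everything is released in one step.

```python
class Arena:
    """Toy arena: bump-allocate from one buffer, free everything at once."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0  # offset of the next free byte

    def alloc(self, nbytes):
        if self.top + nbytes > len(self.buf):
            raise MemoryError("arena exhausted")
        offset = self.top
        self.top += nbytes
        return offset  # an "address" inside the arena

    def reset(self):
        # Release every allocation in one step; there is no per-object free.
        self.top = 0


arena = Arena(64)
a = arena.alloc(16)
b = arena.alloc(8)
used_before = arena.top
arena.reset()
```

The trade-off is that nothing can be freed individually: the arena suits work with a clear end point, such as handling one request.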
Automatic Garbage Collection
Preliminaries
Memory layout: code - static - stack - heap
What? compiler tracking / conservative
When? stop-the-world / incremental
How? refcount / mark-sweep / copying / generational
Reference Counting
Increment the count when a reference is created, decrement when one is destroyed; free the object when the count reaches zero.
When do references come and go?
Small but repetitive cost.
Optimization: Don't count references from the stack; only examine the stack at cleanup time.
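A rough sketch of the bookkeeping, assuming a toy heap where objects are just names. `RcHeap`, `incref`, `decref`, and `link` are hypothetical names, not any real runtime's API.

```python
class RcHeap:
    """Toy reference-counted heap; objects are identified by name."""

    def __init__(self):
        self.count = {}     # object name -> reference count
        self.children = {}  # object name -> names it points to
        self.freed = []     # order in which objects were released

    def new(self, name):
        self.count[name] = 0
        self.children[name] = []

    def incref(self, name):
        self.count[name] += 1

    def decref(self, name):
        self.count[name] -= 1
        if self.count[name] == 0:
            # Last reference is gone: drop the references this object
            # holds, then release the object itself.
            for child in self.children.pop(name):
                self.decref(child)
            del self.count[name]
            self.freed.append(name)

    def link(self, src, dst):
        # Storing a pointer to dst inside src creates a new reference.
        self.children[src].append(dst)
        self.incref(dst)


heap = RcHeap()
heap.new("a")
heap.new("b")
heap.incref("a")     # a root (say, a stack variable) refers to a
heap.link("a", "b")  # a -> b
heap.decref("a")     # the root goes away
```

Dropping the last reference to `a` cascades: `b` is released first, then `a`. Every pointer store pays for a count update, which is the small but repetitive cost.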
Mark and Sweep
Start with set of base pointers.
Chase them to find live data.
When all done, iterate and remove dead data.
Fragmentation and poor locality.
Optimization: mark-compact collection.
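The two phases can be sketched over a toy heap represented as a dictionary from object name to the names it points to. This representation and the function name `mark_sweep` are assumptions for illustration only.

```python
def mark_sweep(heap, roots):
    """heap: dict name -> list of names it points to.
    Removes unreachable objects from heap and returns their names."""
    marked = set()
    stack = list(roots)
    while stack:                  # mark: chase pointers from the roots
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(heap[obj])
    dead = [obj for obj in heap if obj not in marked]
    for obj in dead:              # sweep: remove everything unmarked
        del heap[obj]
    return dead


heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
dead = mark_sweep(heap, roots=["a"])
```

Here `c` and `d` point at each other but are unreachable from the root, so both are swept. Survivors stay where they were, which is where the fragmentation comes from.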
Copying
Keep two areas in memory: fromspace and tospace.
Allocate as normal in fromspace.
Copy live items from fromspace to tospace.
Throw away the fromspace.
Swap fromspace and tospace.
Optimization: Special-case very large objects.
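A sketch of the copy step, in the style of a Cheney scan. The model is an assumption for illustration: each space is a list, an object is a list of indices into its space, and `copy_collect` is a made-up name. The `forward` table plays the role of forwarding pointers.

```python
def copy_collect(fromspace, roots):
    """fromspace: list of objects, each a list of indices into fromspace.
    Returns (tospace, new_roots) with every index rewritten for tospace."""
    tospace = []
    forward = {}  # old index -> new index (forwarding pointers)

    def copy(old):
        if old not in forward:
            forward[old] = len(tospace)
            # Copy the object; its fields still hold old indices for now.
            tospace.append(list(fromspace[old]))
        return forward[old]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(tospace):
        # Fix up one copied object, copying whatever it points to.
        tospace[scan] = [copy(field) for field in tospace[scan]]
        scan += 1
    return tospace, new_roots


# Objects 0 and 3 point at each other; 1 and 2 are garbage.
fromspace = [[3], [0], [], [0]]
tospace, new_roots = copy_collect(fromspace, roots=[3])
```

Only live objects are touched, and they end up packed together at the start of tospace. The old fromspace can then be discarded wholesale.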
Generational
Short-lived vs. long-lived data.
Move survivors to an "older" generation.
Collect the older generation less frequently.
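A rough sketch of two generations, under some loud simplifications: objects are names, survivors are promoted after a single minor collection, and every old object is treated as a root during a minor collection instead of tracking old-to-young pointers precisely. `GenHeap`, `minor`, and `major` are hypothetical names.

```python
class GenHeap:
    """Toy two-generation heap; each generation maps name -> referenced names."""

    def __init__(self):
        self.young = {}
        self.old = {}

    def alloc(self, name, refs=()):
        self.young[name] = list(refs)  # new objects always start young

    def _reachable(self, roots):
        graph = {**self.old, **self.young}
        seen, stack = set(), list(roots)
        while stack:
            obj = stack.pop()
            if obj in graph and obj not in seen:
                seen.add(obj)
                stack.extend(graph[obj])
        return seen

    def minor(self, roots):
        # Frequent, cheap collection of the young generation only.
        # Assumption: all old objects count as roots (crude but safe).
        live = self._reachable(list(roots) + list(self.old))
        for name, refs in self.young.items():
            if name in live:
                self.old[name] = refs  # survivor: promote to old
        self.young = {}

    def major(self, roots):
        # Infrequent collection of both generations from the real roots.
        live = self._reachable(roots)
        self.old = {n: r for n, r in self.old.items() if n in live}
        self.young = {n: r for n, r in self.young.items() if n in live}


g = GenHeap()
g.alloc("config")
g.alloc("temp")
g.minor(roots=["config"])
after_minor = sorted(g.old)
g.major(roots=[])
after_major = sorted(g.old)
```

The short-lived `temp` is reclaimed by the first minor collection, while `config` is promoted and is only reclaimed once a major collection finds it unreachable.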
Garbage Collection in the Real World
Garbage collection in Java:
Pointers are explicitly identified in the JVM.
Heap size limits are fixed at startup (e.g. -Xms and -Xmx).
Optionally call System.gc() to request a collection; it is a hint, not a guarantee.
finalize() method used to clean up external state.
Garbage collection in C:
Interpose on malloc/free/realloc/etc.
Conservative identification.
Any integer in memory could be a pointer.
Which methods are possible with conservative?
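A minimal sketch of conservative identification, assuming the collector knows the start address and size of every block it handed out. The function name, addresses, and values are invented for illustration.

```python
def conservative_roots(words, blocks):
    """words: integers found on the stack or in registers.
    blocks: dict start_address -> size for every allocated block.
    Returns start addresses of blocks that some word might point into."""
    hit = set()
    for w in words:
        for start, size in blocks.items():
            if start <= w < start + size:
                hit.add(start)  # w looks like a pointer into this block
    return hit


blocks = {0x1000: 64, 0x2000: 32, 0x3000: 16}
stack_words = [7, 0x1010, 0x3000, 123456]
roots = conservative_roots(stack_words, blocks)
```

The block at `0x2000` has no apparent referent and could be reclaimed, but an unrelated integer that happened to equal `0x2000` would have kept it alive. Because the collector cannot tell such an integer from a pointer, it cannot safely rewrite the word, which is worth keeping in mind for the question above about which methods remain possible.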
Garbage collection in the OS:
File system check after a crash (fsck).
Which data blocks are in use: mark and sweep.
Which inodes are in use: reference counting.
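The analogy can be sketched with a toy file system model. This is not how any real fsck is implemented; the data layout and the name `check` are assumptions. Directory entries are recounted to get each inode's reference count, then blocks owned by referenced inodes are marked and the rest are swept.

```python
from collections import Counter


def check(inodes, directories):
    """inodes: dict ino -> {"nlink": stored count, "blocks": [block numbers]}
    directories: list of (name, ino) directory entries.
    Returns (recounted link counts, set of blocks in use)."""
    observed = Counter(ino for _name, ino in directories)
    # Reference counting: recount how many entries name each inode.
    links = {ino: observed.get(ino, 0) for ino in inodes}
    in_use = set()
    for ino, node in inodes.items():
        if links[ino] > 0:
            in_use.update(node["blocks"])  # mark blocks of live inodes
    return links, in_use


inodes = {
    1: {"nlink": 2, "blocks": [10, 11]},
    2: {"nlink": 1, "blocks": [12]},  # stale count: no entry names it
}
directories = [("a", 1), ("b", 1)]
links, in_use = check(inodes, directories)
all_blocks = set(range(10, 14))
free = all_blocks - in_use  # sweep: everything unmarked is free
```

Inode 2 claims one link but no directory entry refers to it, so the recount gives zero and its block ends up in the free set.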