Date: Mon Dec 30 15:16:23 2003
Subject: RE: Comparing Garbage Collectors
From: steve
Volume-ID: 1031230.03

Hi,

I can't resist adding a few pennyworth to this very interesting GC
discussion.  I also add a description of "speculative" GC which I think
is an attractive alternative.

Julian writes:
>The disadvantage of setting popminmemlim to a large size is that it causes
>the application to have a large memory footprint. Customers would not be
>happy with an application that on startup immediately consumed a
>significant proportion of their physical memory.

But this all depends on a lot of practical factors.  What do we mean
by "large" or "significant proportion"?  And if the application is
central to the business it typically doesn't matter. 

For example, the evil Java IDE that I am using is, right now, chewing
up 260MB real memory and 541MB virtual memory.  Once upon a time, that
would have been unfeasible.  But for my needs, it is an entirely practical
requirement on my desktop machine (900MB RAM, i.e. nothing special).

So what kind of memory demands does Clementine make?  I presume that's
what Julian is basing his comments on.


> Ideally an application should not even
>draw attention to the fact that it requires tuning (e.g., long pauses)
>and should exhibit sensible behaviour out of the box.

There is a lot of difference between tuning and long pauses, so this
needs clarification.  In my opinion, the real strength of the Poplog
GC is that it is self-tuning.  This makes it very practical.


>> As explained below, the cost of incremental GC would have been very
>> high. The use of sys_lock_heap makes possible something a bit like
>> generational GC, though it is not automated.
>
>We did think about using sys_lock_heap at the time but decided against
>it.

sys_lock_heap, as it stands, is too weak because it does not allow
incremental locking and unlocking.  In addition, insufficient statistics
are accumulated to allow self-tuning.
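
To make the point concrete, here is a toy sketch of what I mean by
incremental locking (in Python for compactness; the API and all the names
are entirely hypothetical - nothing like the real Poplog interface).
Lock points form a stack, so closing the most recently opened document
exposes only its cells to the next GC:

    # Toy model of incremental heap locking.  Hypothetical API - the
    # real sys_lock_heap is all-or-nothing, which is exactly the weakness.
    class Heap:
        def __init__(self):
            self.cells = []         # allocated cells, oldest first
            self.lock_points = [0]  # stack of indices; cells below the
                                    # top entry are locked

        def alloc(self, value):
            self.cells.append(value)
            return len(self.cells) - 1

        def lock(self):
            # Exempt everything allocated so far from future scans.
            self.lock_points.append(len(self.cells))

        def unlock(self):
            # Pop one lock point, re-exposing the cells allocated since it.
            if len(self.lock_points) > 1:
                self.lock_points.pop()

        def gc_scan_range(self):
            # A GC need only scan the cells above the newest lock point.
            return range(self.lock_points[-1], len(self.cells))

    h = Heap()
    h.alloc("document A"); h.lock()    # open A, GC, lock
    h.alloc("document B"); h.lock()    # open B, GC, lock
    h.unlock()                         # close B: only B's cells need scanning
    assert list(h.gc_scan_range()) == [1]

With an all-or-nothing unlock, closing B would force a scan of the whole
heap.  And the lock/unlock operations are the obvious place to accumulate
the statistics needed for self-tuning.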


> Our application has several types of document-like objects that can
>be opened and closed arbitrarily. We could open an object, GC, then lock
>the heap without much problem (the GC should be short since it would
>only include memory from the previous lock point). However, on closing
>we would need to unlock the heap and GC in order to free whatever object
>had been closed before locking the heap again. Since the GC would be
>scanning the entire heap, it could be slow.

There are some interesting gaps here.  It isn't stated that the
documents are large - but presumably that is the problem.  What one
needs to know is whether the documents are a complex graph of
teeny interlinked records or whether they contain lots of non-full data.
And when a document is closed, is there any particular reason
why it is difficult to release the non-memory resources?

I hope Julian will fill in the details since Clementine obviously
raises some interesting performance issues.



>> John Gibson worked out a detailed scheme for incremental garbage
>> collection at some time in the mid 1980s.
>
>I think a generational GC (not necessarily an incremental GC) would be
>a useful addition since it reflects the general behaviour that the longer
>an object has existed, the less likely it is to become garbage.

The challenge is to make it self-tuning.  Having used many systems with
"advanced" generational garbage collectors, I'm quite skeptical.  My
suspicion is that the implementors place too much reliance on the
performance of a single implementation tuned for a single system. 


>The heap is split into segments, one of which is the "nursery" where
>all new objects are allocated from. The nursery is GC'd when full -- any
>surviving objects are moved to the next segment leaving the nursery
>empty. When the next segment becomes full, that is GC'd and objects are
>copied to the next (possibly final) segment. Eventually the final segment
>would need to be GC'd just like the heap would be today. The benefit is
>that scanning the relatively small nursery segment is likely to catch
>most of the garbage, meaning that the ratio of objects reclaimed
>per CPU cycle is higher than for a complete scan of the heap.

But the gotcha is the need for a write barrier (or less commonly a read
barrier) to incrementally accumulate the old-to-young pointers (the
"remembered set") so that the ephemeral scan can be made fast.  Also,
it is in practice tricky to get the segment release algorithm right,
and this leads to manual tuning.
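
Here is a toy sketch of the quoted nursery scheme plus the barrier, in
Python (again entirely illustrative - a real collector works on raw
store, not on tagged objects like these).  The point to notice is that
minor_gc never touches the old generation except where the barrier has
flagged it:

    class Obj:
        def __init__(self):
            self.gen = 0       # 0 = nursery, 1 = old generation
            self.fields = {}

    nursery, old_gen, remembered, roots = [], [], set(), []

    def alloc():
        o = Obj()
        nursery.append(o)
        return o

    def write_ref(obj, name, value):
        # The write barrier: every pointer store goes through here, and
        # old-to-young stores are recorded in the remembered set.
        if obj.gen == 1 and isinstance(value, Obj) and value.gen == 0:
            remembered.add(obj)
        obj.fields[name] = value

    def minor_gc():
        # Nursery survivors are reachable from the roots or from a
        # remembered old object; the rest of the nursery is garbage.
        survivors = set()
        stack = [o for o in roots if o.gen == 0]
        stack += [v for o in remembered for v in o.fields.values()
                  if isinstance(v, Obj) and v.gen == 0]
        while stack:
            o = stack.pop()
            if o in survivors or o.gen != 0:
                continue
            survivors.add(o)
            stack.extend(v for v in o.fields.values() if isinstance(v, Obj))
        for o in survivors:    # promote survivors to the next segment
            o.gen = 1
            old_gen.append(o)
        nursery.clear()        # the nursery is now empty...
        remembered.clear()     # ...so no old-to-young pointers remain

    a = alloc(); roots.append(a)
    minor_gc()                 # a survives and is promoted
    b = alloc()
    write_ref(a, "child", b)   # old-to-young store: the barrier fires
    minor_gc()                 # b survives only via the remembered set

The barrier is the hidden cost: every pointer store in the system pays
for it, which is part of why I say nothing comes for free.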

A very good book on the subject is Jones and Lins' "Garbage Collection:
Algorithms for Automatic Dynamic Memory Management".  My main criticism
is that it does not address the crucial topic of how good language
design allows the programmer (in collaboration with the compiler
writer) to avoid generating garbage in the first place and to make the
scans go fast.  This is, in my opinion, the key to understanding Pop-11's
good performance.

The way Pop-11's design leads to practical performance is a fascinating
topic - but one I won't pursue unless prodded with a pointy stick.  Or
a beer.  That's because I want to persuade someone, somewhere to have a
go at implementing speculative GC.

The basic concept is simple: when an application pauses to accept input
there is an opportunity to perform a "speculative" garbage collection
(SGC).  By "speculative", I mean that it must be possible to instantly
abandon the garbage collection without any noticeable cost.  This is a
pretty obvious idea but I've never seen it done - although I bet it has
been.

It is easy to come up with a simple-minded implementation.  Start by
duplicating the heap (which can be done speculatively), then garbage
collect that duplicate in-place (which can be done speculatively), and
if you succeed flip heaps.  It is also easy to come up with further
variants with lots of interesting properties.  Maybe it would make a
good topic for an MSc thesis (hint, hint)?
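
For the flavour of it, here is a toy in Python (nothing Poplog-specific;
I cheat slightly by copying live objects into a scratch heap rather than
literally duplicating and collecting in place, but the essential property
is the same - the live heap is untouched until the final flip, so
abandonment is free):

    # Toy abandonable collector.  'heap' maps address -> (value, refs).
    def speculative_gc(heap, roots):
        scratch = {}
        stack = list(roots)
        while stack:
            addr = stack.pop()
            if addr in scratch:
                continue
            value, refs = heap[addr]
            scratch[addr] = (value, refs)
            stack.extend(refs)
            yield None         # resumption point: the caller may abandon here
        yield scratch          # finished: the caller can now flip heaps

    heap = {0: ("root", [1]), 1: ("live", []), 2: ("garbage", [])}
    gc = speculative_gc(heap, roots=[0])
    for result in gc:
        if result is not None:
            heap = result      # the flip: address 2 has simply gone
    # Abandoning at any point is just dropping the generator - the
    # original heap was never touched.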

What are the likely problems?  Well, nothing comes for free - so it
is important not to launch a speculative GC unless it is likely to
reclaim a decent amount of garbage.  This means that an SGC must be
deferred until the heap has grown sufficiently.  It also means that
an SGC should be deferred if the machine is under load from other
competing processes (of equal or lower priority).  So there are some
tuning issues under the hood.  Can it be made self-tuning, I wonder?
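
A first stab at the launch policy might look like this (the thresholds
are pure guesswork on my part - finding the right values automatically
is exactly the self-tuning question):

    import os

    def should_start_sgc(heap_size, size_at_last_gc,
                         growth_factor=1.5, max_load=1.0):
        # Defer until the heap has grown enough for the collection to be
        # likely to reclaim a decent amount of garbage...
        if heap_size < growth_factor * size_at_last_gc:
            return False
        # ...and defer while competing processes keep the machine busy.
        try:
            one_minute_load, _, _ = os.getloadavg()
        except (AttributeError, OSError):  # unavailable on some platforms
            return True
        return one_minute_load < max_load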

But what I like about the idea of SGC is that it is such a direct
way to address the issue of unpredictable pauses in both interactive
applications and daemons.  In each case, we want to reduce the
likelihood of a pause when an input arrives.  So by exploiting the
(relatively) idle period prior to the arrival of an input we get the
effect we want.
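
The driver for an interactive application is then just a polling loop
around the collector (a Unix-flavoured Python sketch; I assume input
arrives on stdin, and gc_steps is any iterator that does a small slice
of GC work per step, such as the generator above):

    import select, sys

    def idle_loop(gc_steps):
        for _ in gc_steps:
            ready, _, _ = select.select([sys.stdin], [], [], 0)
            if ready:
                return False   # input has arrived: abandon the collection
        return True            # collection finished before any input came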

It is fair to ask how this would impact on a multi-threaded architecture.
In my opinion, threads are a proven bad idea and should be replaced
by cooperating but isolated virtual machines - and this incidentally
permits independent garbage collections.  However, for the benefit
of the rest of the computing community who have yet to adopt this
enlightened view, I have to admit that background threads diminish the
value of SGC.

Since Poplog has both a copy GC and in-place GC, it actually has most
of the machinery to implement a very straightforward SGC already!

-- 
Steve