Date: Mon Oct 14 11:18:41 1993
Subject: Automatic Storage Management
From: Steve Knight
Volume-ID: 931015.07

There seems to be a lot of interest in the general area of automatic
store management, implemented in POP-11 using garbage collection,
so I thought it might be worth trying to put an R&D perspective
on it.  So here is the first of a couple of short notes on the
value of automatic store management.

> Actually, no.  I am interested in finding out just what advantages
> GC actually might give me (who have so far had no problems keeping
> track of such things myself) for the performance penalty that is
> usually paid for such systems.  I have not said "GC is worthless",
> I have asked "what is its worth?".  So far the only answer I have
> ever gotten to this question is "you don't have to keep track of
> your dynamically allocated objects".  I have never found this to
> be problematic -- thus never seen any actual ADVANTAGE to be had
> from using GC.  I'm just trying to see if someone else can give
> me a good reason why GC would be better (and worth the performance
> hit) than doing it myself.

Essentially, the claim is that where store management is a difficult 
technical issue, automatic storage management eliminates a rich flux
of potent defects.  In my previous work, my company conducted a
detailed survey of software defects.  Defects were classified into
grades of seriousness -- basically those that could be fixed on the
spot and those that cost real money to fix.  The overwhelming cost
was concentrated in the "very serious" category (i.e. they were
frequent as well as expensive).  Breaking out this category showed
that a single cause accounted for over 70% of all the very serious
defects: failures of store management.  And this accords precisely
with what I have observed in practice.

Therefore the question Robert Hartman poses should be rephrased as: if 
you credit these results, why isn't the benefit of automatic storage 
management immediately visible to all programmers?  There are
essentially three reasons:

 *  There are (significant) tasks which have very limited
    store management requirements.  These programs can simply
    rely on the operating system to reclaim all resources when
    the process exits.

 *  The prevalent use of GC-challenged languages leads to
    a number of undesirable effects.  (1) Poor programming
    techniques, such as allocation of fixed arrays, become
    institutionalised to the extent that they appear in 
    teaching texts.  (2) Programmers painstakingly learn a
    collection of store management techniques and are then
    reluctant to concede that they are fallible in applying
    them -- even when the consequences are extremely serious.
    Sadly I have seen this all too often, even amongst expert
    programmers.  

 *  The benefits of automatic storage management techniques are
    most visible in complex projects.  It is no coincidence that
    GC-capable languages are *also* famous for their extensive
    reusable libraries.  More on this point later.


> Yes.  As I said, the POTENTIAL advantage of GC seems to lie in
> situations where it is difficult or impossible to determine when
> you are done with an object.

Robert neatly summarises the action of a GC.  However, the
question is: when is it difficult (or simply highly inconvenient)
to determine when you are done with an object?  The most obvious
answer is when you are writing code for a library.  You cannot 
sit down with all your future users (obviously) to discuss store
management with them.  This unfortunately means that you must
export store management primitives for every type implemented
using dynamic store OR holding references to dynamic store.
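
As a minimal sketch of where this leads (all the names here are
invented for illustration), a C library with an opaque record type
ends up exporting a create/destroy pair and obliging every caller
to pair the two correctly, on every path:

    /* db.h -- hypothetical library interface (names invented) */
    typedef struct db_record db_record;    /* opaque to the user */

    db_record *db_record_create(void);     /* caller now owns the record */
    void db_record_destroy(db_record *r);  /* ...and must remember this  */

    /* Every user must call db_record_destroy exactly once per
       db_record_create -- including on error paths. */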

But the problem doesn't end here.  The knock-on effect is that
if (for performance reasons) you need to implement some of your
primitives using store-sharing, you must now consider the possibility
that the library-user will have very many references into your
store.  This means that all references must become dynamically
allocated objects that have their own store management primitives
(this enables a basic reference counting policy).
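
A hedged sketch, again with invented names, of what those
dynamically allocated references amount to in C -- every handle
into shared store becomes a counted object in its own right:

    #include <stdlib.h>

    /* Hypothetical reference-counted handle into shared store. */
    typedef struct {
        char *data;       /* the shared underlying store      */
        int   refcount;   /* number of live references to it  */
    } shared_buf;

    shared_buf *shared_retain(shared_buf *b)
    {
        b->refcount++;
        return b;
    }

    void shared_release(shared_buf *b)
    {
        if (--b->refcount == 0) {  /* last reference has gone, */
            free(b->data);         /* so reclaim the store     */
            free(b);
        }
    }

Every user must now call shared_retain and shared_release in
matched pairs -- which is exactly the discipline a GC removes.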

But now the problem is one of managing hierarchically structured
objects.  For example, when a database is closed, what happens
if a database record is accessed?  The typical, and incorrect,
answer is that an error results.  This is institutionalised 
poor practice.  A good example is provided by X windows.  When
a parent window is destroyed, so are all of its children -- why?
Given that it is possible to reparent a window, why shouldn't
one destroy a parent window, create a new window and reparent
the child window to it?  (The purpose of parentage is to provide
a model of hierarchical visibility and it has been conflated with
store management.)

In short, writing and using libraries becomes a job for software 
experts -- and all because of the lack of automatic store management.
It is no surprise that POP-11 has such an enormous and easy-to-use
library, whereas standard C doesn't even provide an adequate 
implementation of a list!


> I just have not run across any such situations, and asked for
> some examples that might illustrate to me the realization of
> the potential to be found in GC.

It is necessary to define the *kind* of answers that one might
hope to see in response to this question.  Common sense suggests
that it is not possible to provide an answer that *requires* GC.
After all, what a machine can do, a person is likely to be able to
do better -- given enough time.  

No, the answer lies in seeing an example that has definite 
advantages because of the GC capability.  Let's compare string
concatenation in C (strcat) with that in POP-11 (<>) or Lisp.
The question to be tackled is where to put the result of
the string concatenation.  If one lacks automatic storage
management then there are two alternatives (both sketched in C below):

(1) Allocate the memory for the result and demand that the
    user frees the space.  Problem: this may cause severe
    store fragmentation. 

(2) Put the result in the same store each time (i.e. reuse a specific
    store area).  This store area can be correctly resized if
    too small.  Provide the user with a primitive for taking
    a separate copy.  Problem: it is difficult to relinquish the
    underlying store without causing serious performance problems.
    Problem: the interface is more complicated.
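
To make the two alternatives concrete, here is a rough C sketch of
each (the function names are invented and error handling is omitted):

    #include <stdlib.h>
    #include <string.h>

    /* (1) Allocate a fresh result; the CALLER must free() it. */
    char *concat_alloc(const char *a, const char *b)
    {
        char *r = malloc(strlen(a) + strlen(b) + 1);
        strcpy(r, a);
        strcat(r, b);
        return r;               /* ownership passes to the caller */
    }

    /* (2) Reuse one store area, resized when too small; the caller
       must take a copy before the next call overwrites it. */
    char *concat_static(const char *a, const char *b)
    {
        static char *buf = NULL;
        static size_t size = 0;
        size_t need = strlen(a) + strlen(b) + 1;
        if (need > size) {
            buf = realloc(buf, need);
            size = need;
        }
        strcpy(buf, a);
        strcat(buf, b);
        return buf;             /* valid only until the next call */
    }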

Of course, ANSI C finds an institutionalised non-solution -- namely
surgically appending to the end of one of the arguments.  This
non-solution suffers from the lethal problem that the supplied
(first) argument may be too small.  [Amusingly, the Pascal I have
on the Mac goes one better -- it simply truncates the offending
string if it is longer than 256 characters!]
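
For instance, here is the classic way strcat goes wrong (a
deliberately broken fragment, shown for illustration only):

    #include <string.h>

    void demo(void)
    {
        char greeting[8] = "hello";
        /* strcat blindly appends to its first argument; nothing
           checks that greeting has room, so this writes past the
           end of the array, corrupting neighbouring store. */
        strcat(greeting, ", world");  /* needs 14 bytes; only 8 exist */
    }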

In a GC-capable language, we simply allocate the store and that
is the end of the matter.  The common complaint is that this
impacts performance -- a point which is taken up below.

The key point being illustrated here is that the lack of automatic
store management rears its ugly head in the design of even exceedingly 
trivial tasks such as string operations.  In GC-capable languages there
is simply no issue beyond, perhaps, performance.


> For example, in the application I am working on right now, the
> student is on numerous lists (general roster, list of students 
> at a single site, list of students asking a question...)  But
> I >always< know when to free the memory, as he is either registered
> or not registered.  When he signs out, he is removed from all lists
> and the memory freed.  No problem.  What >advantage< would GC give
> me that is worth the performance hit it usually gives?  (You can 
> use other situations as illustration if the one I offered presents
> no real advantage for GC.)

There are several technical points that garbage collection specifically
addresses (the first two are illustrated in C after this list):

 *  Elimination of dangling pointers [illegal uses of free()]

 *  Elimination of space leaks.  A closely coupled issue is optimal
    dynamic store balancing.

 *  Elimination of store fragmentation.

 *  Amortising the cost of store deallocation.
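
A minimal C illustration of the first two failure modes -- a
dangling pointer and a space leak (both fragments are deliberately
buggy):

    #include <stdlib.h>
    #include <string.h>

    void dangling(void)
    {
        char *p = malloc(16);
        char *q = p;           /* a second reference to the same store */
        free(p);
        strcpy(q, "oops");     /* dangling pointer: the store is gone  */
    }

    void leak(int n)
    {
        while (n-- > 0) {
            char *p = malloc(1024);
            /* ... use p, forget to free it ... */
        }                      /* space leak: nothing is ever freed    */
    }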

In the above problem, the life cycle of a student-record is described
so precisely that automatic store management gives no design advantage.
However, it could provide two types of performance enhancement --
store compaction and a reduction in program time associated with
store management.

In this case, the (sketchy) problem outline suggests that defragmentation
would be the more important benefit.  However, it is the latter claim 
that would generate more scepticism.  Garbage collectors (almost)
always compact store, eliminating the pernicious problem of
fragmentation.  In this example, if we assume (and why not?) that
students arrive and leave in arbitrary order, eventually the store
will become severely fragmented.  (This is a slight oversimplification.)
This might mean the program will needlessly fail when performing some
space-hungry computation (e.g. timetabling).  So that is one obvious
possible benefit.

From the outline, the cost of store deallocation sounds like a
non-issue.  However, for the sake of argument, we might suppose
that this is a simulation of all the schools in the U.S.A. to
demonstrate some decisive demographic catastrophe that's coming
down the pike.  This simulation needs to be run across a matrix
of 20 demographic factors, each of which is subdivided into 4
quadrants -- OK, so now student registration/deregistration is
a key performance issue.  (And so is store fragmentation, in spades!)

Well, the *last* thing we want to do is free up store immediately.
The "free()" procedure has to do a lot of work squaring away all
the itty-bits and mega-lumps of store we give it.  It is a much
smarter idea to do nothing at all and, at some suitable point,
invoke a garbage collector -- paying the overheads just once.
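
Manual programmers sometimes fake the same amortisation with an
arena (or "pool") allocator.  A rough sketch, with invented names,
of trading per-object free() calls for one bulk release:

    #include <stdlib.h>

    /* Carve small allocations out of one big block, then throw the
       whole block away in a single operation instead of squaring
       away each itty-bit individually. */
    typedef struct {
        char  *base;
        size_t used, size;
    } arena;

    arena arena_new(size_t size)
    {
        arena a;
        a.base = malloc(size);  /* failure check omitted for brevity */
        a.used = 0;
        a.size = size;
        return a;
    }

    void *arena_alloc(arena *a, size_t n)
    {
        void *p;
        if (a->used + n > a->size)
            return NULL;        /* out of arena space */
        p = a->base + a->used;
        a->used += n;           /* no per-object bookkeeping at all */
        return p;
    }

    void arena_reset(arena *a)  /* pay the deallocation cost once */
    {
        a->used = 0;
    }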

A typical stop-and-copy GC has the property that the time taken
to garbage collect is proportional to the store that *survives*
the GC.  Thus all the bits of store that we "forgot" about bring
a GC sooner rather than later but don't extend the GC time at all.
And now we play the simple trick of trading (heap) store for
performance -- because the larger the heap the less frequent GC
will be.
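
A back-of-envelope illustration (all the numbers are invented):
with a stop-and-copy collector each GC costs time proportional to
the live data only, while the free portion of the heap determines
how often one is needed.

    #include <stdio.h>

    int main(void)
    {
        double live = 1.0;      /* MB surviving each GC      */
        double total = 1000.0;  /* MB allocated over the run */
        double heaps[] = { 4.0, 16.0, 64.0 };
        int i;

        for (i = 0; i < 3; i++) {
            double reclaimed = heaps[i] - live;  /* space freed per GC */
            printf("heap %4.0f MB: %6.1f GCs, each scanning %.0f MB\n",
                   heaps[i], total / reclaimed, live);
        }
        return 0;
    }

Growing the heap from 4 MB to 64 MB cuts the number of collections
by a factor of twenty or so, while each collection costs the same.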

[Again, I have oversimplified the story.  If this were all there was
to it, then GC performance would degrade as the program got bigger.
And in many systems this is true -- leading to an unexpected lack of
scalability.  In POP-11, this is mitigated by the following techniques:
using non-full store, heap locking, system locking, and persistent store 
management via garbage-collectable file pointers.]

In practice, GC does seem to produce flexible and efficient management 
of store at the expense of smooth execution.  The irregular pauses
introduced by garbage collections are the only significant technical
problem with using GC for all store management.  For that reason,
the "universal" compromise is to augment GC with occasional, explicit store
deallocation.  [I call it the "universal" compromise because it is
what all programmers want (after they've swapped views enough) and
no systems supply!]


In summary, GC technology offers real benefits for many programming
tasks.  Some tasks would be tedious in the extreme without it, perhaps
impossible in practice.  The best compromise would be a system that
provided both explicit and automatic storage management tools.  This
would mean that programmers would not have to retrain when tackling
different problem areas -- saving a great deal of wasted effort.
Furthermore, programmers would have the freedom to adopt the store
management strategy appropriate to the task at hand: automatic
when robustness is important; explicit when store management is
simple and smooth execution is at a premium.  [Garbage collectors
can be very small and live in shared libraries.  So the space overhead
should not be considered an issue here.] 

Steve

P.S.  A final, teasing technical point.  Computers, contrary to common 
belief, are very poor at integer arithmetic -- getting the answer
completely wrong for trivial computations such as 2^31.  And, in
the survey of serious defects, the remaining 30% were almost wholly 
of the undetected integer over/underflow kind.
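
To see the failure concretely (the behaviour shown is typical of a
32-bit two's-complement machine; strictly speaking, signed overflow
in C is undefined):

    #include <stdio.h>

    int main(void)
    {
        int x = 1;
        int i;
        for (i = 0; i < 31; i++)
            x = x * 2;     /* the "obvious" code for 2^31 */
        printf("%d\n", x); /* prints -2147483648, not 2147483648 */
        return 0;
    }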

But why are computers so bad at arithmetic?  I am sure the programmer
wrote the obvious code (e.g. x = x + 1).  The answer is, I suggest, that
today's prevalent languages don't have automatic storage management.
Getting integer arithmetic to work without these moronic limits
requires dynamic allocation of (big) integers.  
Now, let's suppose you had to keep track of every single number
in your program (because what goes for integers goes double for
floats), i.e. deallocate each one when appropriate.  Imagine instead of
writing
    int i;  for ( i = k1; i < k2; i++ ) ...
you had to write
    int i, j;
    for ( i = k1; i < k2; j = i, i = j + 1, free( j ) ) ...
Not a nice prospect!

In fact, the current state of acceptance of incorrect integer 
arithmetic on computers is the biggest and most blatant 
institutionalised idiocy of the lot.  And it arises because
of the lack of automatic store management.