Date:Mon Jan 21 15:56:05 1998 
Subject:Garbage Collection and O-O 
From: Robin Popplestone  
Volume-ID:980121.01 

Some correspondence re: gc:

> I did have a question though:  Unless objects/data is contiguous in
> memory and die with the scope, (function or statement block) how do you
> do garbage collection without tags?

First let's discuss what we mean by "tags". There are really two distinct
usages, but both serve the purpose of making data-objects self-describing.
In C, if you have a pointer to data, that data is not self-describing
(unless the user has adopted some conventions of his own). So, in C, any
pointer can be cast to be of type "void *", and can be cast back to say
"fred *". Garbage collection, as it is usually understood, requires that
the garbage collector be able to recognise different kinds of data, so that
it can locate and manipulate the pointers contained within that data. So,
conventional garbage collection is impossible in unconstrained C. However,
object orientation inherently introduces much of the necessary apparatus.

So the two meanings of "tag" are:

(1) In most implementations of LISP and its descendants, an object
is characterised by one machine-word (32 bits in most cases). It is this
one-word bit-pattern that is passed as a function-argument, returned as a
function-result etc.  Usually this 32-bit pattern has been divided into
tag-bits (commonly 8 in LISP implementations, 2 in Poplog) and the rest.
The rest is either a pointer to an object or may represent an integer or
possibly a float. The tag bits determine what the rest means. The Poplog
convention is this:
                                      TAG
        |------30 bit address --------|00|
        |------30 bit integer --------|01|
        |------30 bit float  ---------|10|

which has  the  advantage  that  the  "00"  combination  makes  the  32-bit
bit-pattern  actually  -be-  a  machine   address,  without  the  cost   of
de-tagging. A tagging-scheme like this has the advantage in LISP, which  is
basically not statically typed, that the sort of integers you want for most
purposes can be represented as 32-bit items.
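
As a rough illustration of a type (1) scheme, here is a minimal C sketch of
two low-order tag bits using the bit assignments in the table above. All the
names (word, box_int, unbox_ptr and so on) are invented for this example and
are not Poplog's; the sketch also assumes pointers are at least 4-byte
aligned, so that the bottom two bits of an address are already zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the tag values come from the table above. */
enum { TAG_BITS = 2, TAG_MASK = 3, TAG_PTR = 0, TAG_INT = 1, TAG_FLOAT = 2 };

typedef uintptr_t word;   /* one machine word (32 bits on the machines discussed) */

/* The low two bits say how to read the rest of the word. */
static unsigned tag_of(word w) { return (unsigned)(w & TAG_MASK); }

/* A pointer needs no tagging work: tag 00 means the word IS the address. */
static word box_ptr(void *p) {
    word w = (word)p;
    assert((w & TAG_MASK) == TAG_PTR);   /* assumes 4-byte alignment */
    return w;
}
static void *unbox_ptr(word w) { return (void *)w; }

/* An integer is shifted up two places and given tag 01, so the two
   high-order bits of n are lost (30 usable bits in a 32-bit word). */
static word box_int(intptr_t n) { return ((word)n << TAG_BITS) | TAG_INT; }

/* Clear the tag, then divide by 4; the division is exact, so the sign
   is preserved without relying on right-shifting a negative value. */
static intptr_t unbox_int(word w) {
    return (intptr_t)(w & ~(word)TAG_MASK) / 4;
}
```

The point to notice is that unbox_ptr does no arithmetic at all, whereas
integers pay a shift on the way in and a divide (or shift) on the way out.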

(2) However a  data structure  can also  have "tags"  that allow  it to  be
identified as belonging to a particular type, either absolutely, or among a
set of possible types. For example, to represent a 2-dimensional point, you
might have a structure in store like:

    TAG double double

This is of course an extra cost  incurred over C. However, what people  are
pointing out  is  that  that  extra  cost  is  -already-  paid  for  in  an
object-oriented environment, since  an object  has to have  a pointer  to a
method table associated with it. [To some extent it's also paid if you have
compiled the C to support debugging] So you would have:

    method*  double  double
      |
      ------> print_method.....

Therefore you  can  add  once-per-class garbage-collect  method(s)  to  the
method-dispatch  table  and  get   garbage-collection.  For  the   Multipop
implementation of POP-2 we used a mark-method and a relocate-method (though
the term "method" wasn't used back then in 1968).

Note that the data -in- the record do  not need to be tagged if their  type
is known to the compiler. The two doubles above are stored EXACTLY as  they
would be in C. The garbage-collect method for the class knows that they are
NOT pointers, so doesn't try to follow them.
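
To make the type (2) idea concrete, here is a minimal C sketch of a
per-class mark method reached through the header word of each object. It is
only an illustration, not the Multipop code: the names (class_info, gc_mark,
segment and so on) are invented, and keeping a "marked" flag in the header
is an assumption made for brevity (a collector could equally keep mark bits
elsewhere).

```c
#include <assert.h>
#include <stddef.h>

struct object;

/* Hypothetical per-class table: ordinary methods would sit alongside
   the garbage-collect method. */
struct class_info {
    const char *name;
    void (*mark)(struct object *self);  /* marks the pointers inside self */
};

/* Header carried by every heap object: the "method*" word of the
   diagram, plus a mark flag (an assumption of this sketch). */
struct object {
    const struct class_info *cls;
    int marked;
};

/* Generic entry point: mark the object, then let its class trace it. */
void gc_mark(struct object *o) {
    if (o == NULL || o->marked) return;
    assert(o->cls != NULL && o->cls->mark != NULL);
    o->marked = 1;
    o->cls->mark(o);
}

/* A point holds two raw doubles, stored exactly as in C. */
struct point { struct object hdr; double x, y; };
static void point_mark(struct object *self) {
    (void)self;              /* no pointers inside, nothing to follow */
}
const struct class_info point_class = { "point", point_mark };

/* A segment holds two pointers, and its class knows where they are. */
struct segment { struct object hdr; struct point *a, *b; };
static void segment_mark(struct object *self) {
    struct segment *s = (struct segment *)self;
    gc_mark((struct object *)s->a);   /* header is the first member */
    gc_mark((struct object *)s->b);
}
const struct class_info segment_class = { "segment", segment_mark };
```

Calling gc_mark on a segment marks the segment and both of its end points,
while the doubles inside each point are never inspected.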

Now it's a matter  of taste whether  you use type (1)  tagging or type  (2)
tagging. For object-orientation  you can also  have a version  of type  (1)
tagging in which an object whose type  is not fully known can be passed  as
an argument in the form of two pointers,  one to the object and one to  its
method-table.
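
The two-pointer form can be sketched in C as follows. This is a
hypothetical illustration with invented names (methods, any, bare_point);
the single "sum" method merely stands in for whatever methods, including
garbage-collect ones, the table would really hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method table. */
struct methods {
    double (*sum)(const void *self);
};

/* The two-pointer argument: the object, and separately its method table. */
struct any {
    const void *obj;
    const struct methods *m;
};

/* The stored object carries no tag or header word at all. */
struct bare_point { double x, y; };

static double bare_point_sum(const void *self) {
    const struct bare_point *p = self;
    return p->x + p->y;
}
static const struct methods bare_point_methods = { bare_point_sum };

/* Code that knows the static type builds the pair when it passes the
   object to code that does not. */
static struct any any_point(const struct bare_point *p) {
    struct any a = { p, &bare_point_methods };
    return a;
}

/* Code that does not know the type dispatches through the table. */
static double any_sum(struct any a) {
    assert(a.m != NULL && a.m->sum != NULL);
    return a.m->sum(a.obj);
}
```

Here the type information travels with the reference rather than with the
data, so the point in store is exactly two doubles.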

For statically-typed languages the idea  of garbage-collect methods can  be
applied to Procedure Activation Records (whether on a stack or heap). Eliot
Moss has done this for the Modula-3 compiler. This gets rid of most tagging
during parameter-passing unless  you're writing  parametrically-polymorphic
code.

Java has, sensibly in my view, ruled out traditional LISP-style tagging by
requiring that "int" be a full 32-bit thingy. The cost is that if you want
an integer -object- you have to use Integer, which will be passed as a pointer
to a box containing an integer.

Robin.