Some correspondence re: gc:
> I did have a question though: Unless objects/data is contiguous in
> memory and die with the scope, (function or statement block) how do you
> do garbage collection without tags?
First let's discuss what we mean by "tags". There are really two distinct
usages, but both serve the purpose of making data-objects self-describing.
In C, if you have a pointer to data, that data is not self-describing
(unless the user has adopted some conventions of his own). So, in C, any
pointer can be cast to be of type "void *", and can be cast back to say
"fred *". Garbage collection, as it is usually understood, requires that
the garbage collector be able to recognise different kinds of data, being
able to locate and manipulate pointers contained within that data. So,
conventional garbage collection is impossible in unconstrained C. However,
object orientation inherently introduces much of the necessary apparatus.
So the two meanings of "tag" are:
(1) In most implementations of LISP and its descendants, an object
is characterised by one machine-word (32 bits in most cases). It is this
one-word bit-pattern that is passed as a function-argument, returned as a
function-result etc. Usually this 32-bit pattern has been divided into
tag-bits (commonly 8 in LISP implementations, 2 in Poplog) and the rest.
The rest is either a pointer to an object or may represent an integer or
possibly a float. The tag bits determine what the rest means. The Poplog
convention is this:
                              TAG
|------ 30-bit address -------|00|
|------ 30-bit integer -------|01|
|------ 30-bit float ---------|10|
which has the advantage that the "00" combination makes the 32-bit
bit-pattern actually -be- a machine address, without the cost of
de-tagging. A tagging-scheme like this has the advantage in LISP, which is
basically not statically typed, that the sort of integers you want for most
purposes can be represented as 32-bit items.
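
To make the type (1) scheme concrete, here is a minimal sketch in C of the two-bit layout shown above. The names (word, make_int, int_value and so on) are my own invention for illustration, not Poplog's, and I use uintptr_t rather than a fixed 32-bit word so that the sketch also runs on a 64-bit machine.

```c
#include <assert.h>
#include <stdint.h>

/* One machine word holding either an address or a small integer.
   Hypothetical names; the tag values follow the diagram above. */
typedef uintptr_t word;

enum { TAG_MASK = 3, TAG_PTR = 0, TAG_INT = 1, TAG_FLOAT = 2 };

static int is_ptr(word w) { return (w & TAG_MASK) == TAG_PTR; }
static int is_int(word w) { return (w & TAG_MASK) == TAG_INT; }

/* Tag an integer: shift it up two bits and put 01 in the low bits. */
static word make_int(intptr_t n)
{
    return ((word)n << 2) | TAG_INT;
}

/* Untag an integer: clear the tag bits, then divide by four.
   Division is used instead of >> so negative values stay portable. */
static intptr_t int_value(word w)
{
    assert(is_int(w));
    return (intptr_t)(w & ~(word)TAG_MASK) / 4;
}

/* A pointer to an object aligned on at least four bytes already has
   00 in its low bits, so tagging and untagging cost nothing. */
static word make_ptr(void *p)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);
    return (word)(uintptr_t)p;
}

static void *ptr_value(word w)
{
    assert(is_ptr(w));
    return (void *)w;
}
```

A garbage collector scanning a word of this kind only needs is_ptr to decide whether to follow it. The price is that integers lose two bits of range.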
(2) However a data structure can also have "tags" that allow it to be
identified as belonging to a particular type, either absolutely, or among a
set of possible types. For example, to represent a 2-dimensional point, you
might have a structure in store like:
TAG double double
This is of course an extra cost incurred over C. However, what people are
pointing out is that that extra cost is -already- paid for in an
object-oriented environment, since an object has to have a pointer to a
method table associated with it. [To some extent it's also paid if you have
compiled the C to support debugging.] So you would have:
method* double double
|
------> print_method.....
Therefore you can add once-per-class garbage-collect method(s) to the
method-dispatch table and get garbage-collection. For the Multipop
implementation of POP-2 we used a mark-method and a relocate-method (though
the term "method" wasn't used back then in 1968).
Note that the data -in- the record do not need to be tagged if their type
is known to the compiler. The two doubles above are stored EXACTLY as they
would be in C. The garbage-collect method for the class knows that they are
NOT pointers, so doesn't try to follow them.
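
As a rough sketch of type (2) tagging in C, assuming a simple mark phase: each object begins with a pointer to a once-per-class table, and that table carries a mark method. All the names here (class_info, gc_mark, the mark bit in the header) are hypothetical; this is not how Multipop was actually coded.

```c
#include <assert.h>
#include <stddef.h>

struct object;

/* Once-per-class table. `mark` is the garbage-collect method. */
struct class_info {
    const char *name;
    void (*mark)(struct object *self);
};

/* Every heap object starts with a pointer to its class table.
   The mark bit in the header is an assumption of this sketch. */
struct object {
    const struct class_info *cls;
    int marked;
};

/* Mark one object, then let its class decide what to follow. */
static void gc_mark(struct object *o)
{
    if (o == NULL || o->marked)
        return;
    assert(o->cls != NULL && o->cls->mark != NULL);
    o->marked = 1;
    o->cls->mark(o);
}

/* A 2-D point: the doubles are stored untagged, exactly as in C. */
struct point { struct object hdr; double x, y; };

static void point_mark(struct object *self)
{
    (void)self;  /* no pointers inside a point, so nothing to follow */
}
static const struct class_info point_class = { "point", point_mark };

/* A pair: its class knows that both fields ARE pointers. */
struct pair { struct object hdr; struct object *first, *second; };

static void pair_mark(struct object *self)
{
    struct pair *p = (struct pair *)self;
    gc_mark(p->first);
    gc_mark(p->second);
}
static const struct class_info pair_class = { "pair", pair_mark };
```

Calling gc_mark on a pair marks the pair and both objects it refers to, while the doubles inside each point are never inspected.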
Now it's a matter of taste whether you use type (1) tagging or type (2)
tagging. For object-orientation you can also have a version of type (1)
tagging in which an object whose type is not fully known can be passed as
an argument in the form of two pointers, one to the object and one to its
method-table.
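
The two-pointer form might look something like this in C. Again the names (any_ref, methods, print_any) are made up for the sketch. The point to notice is that the object itself carries no tag or header at all; the method-table pointer travels alongside it only where the static type is not known.

```c
#include <assert.h>
#include <stdio.h>

/* Method table, held once per class (hypothetical layout). */
struct methods {
    int (*print)(const void *obj, char *buf, size_t n);
};

/* The object is laid out exactly as it would be in plain C. */
struct point { double x, y; };

static int point_print(const void *obj, char *buf, size_t n)
{
    const struct point *p = obj;
    return snprintf(buf, n, "(%g, %g)", p->x, p->y);
}
static const struct methods point_methods = { point_print };

/* A reference to an object of not-fully-known type:
   one pointer to the object, one to its method table. */
struct any_ref {
    void *obj;
    const struct methods *mt;
};

/* Dispatch through the method table that came with the reference. */
static int print_any(struct any_ref r, char *buf, size_t n)
{
    assert(r.mt != NULL && r.mt->print != NULL);
    return r.mt->print(r.obj, buf, n);
}
```

A garbage-collect method could sit in the same table beside print, so the collector would be handed the pair of pointers just as any other polymorphic code is.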
For statically-typed languages the idea of garbage-collect methods can be
applied to Procedure Activation Records (whether on a stack or heap). Eliot
Moss has done this for the Modula-3 compiler. This gets rid of most tagging
during parameter-passing unless you're writing parametrically-polymorphic
code.
Java has, sensibly in my view, ruled out traditional LISP-style tagging by
requiring that "int" be a full 32-bit thingy. The cost is that if you want
an integer -object- you have to use Integer, which will be passed as a pointer
to a box containing an integer.
Robin.