I have sent the following message to be posted on comp.compilers. Possibly it
may be censored.... Robin.
dave@yost.com (Dave Yost) writes:
>[Is it time to make compilation as easy as a subroutine call?]
David Keppel responds:
> Various languages have offered related facilities for years. Without
> going in to a lot of detail (or being very precise :-), LISP
> [McCarthy78] data and code use the same structure, so programs can
> generate code on the fly; LC^2 [Mitchell70] allowed programs to
> generate strings and then call routines to convert them to executable
> code;
> SNOBOL-4 [GPP71] allowed programs to execute strings; there is no
> command interpreter in Smalltalk-80 [GR83], instead all input
> expressions are compiled and then executed. Most extensible systems
> either use some kind of dynamic compilation or translate extension code
> in to a form that could be compiled but is instead interpreted [NG87].
I am not sure of the history of actual compilation in LISP. However, our 1968
implementation of POP-2 (Package for On-line Programming) supported the
incremental generation of actual machine code blocks (admittedly a limited set
of instructions). These blocks were, and had to be, garbage-collectable,
because that was required by the design of the Multipop timesharing system.
The subroutine call was -POPVAL- (shades of EVAL, of course...); it took a
linear (and tail-lazy) list of tokens, which could be derived by applying
INCHARITEM, the tokeniser, to a character repeater derived from a string.
Burstall, R.M. and Popplestone, R.J. [1968], The POP-2 Reference Manual,
Machine Intelligence 2, pp. 205-46, eds. Dale, E. and Michie, D., Oliver and
Boyd, Edinburgh, Scotland.
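For flavour, the POPVAL idea, compilation available as an ordinary subroutine
call over a source form, can be sketched in modern Python. This is a loose
analogue only: the name popval is borrowed, and Python's built-in compile/exec
stand in for the POP-2 machinery.

```python
# A loose modern analogue of POP-2's POPVAL, sketched in Python:
# compilation is just a subroutine call that turns source into
# executable code, and successive calls extend one environment
# incrementally. The name popval is borrowed for flavour only.

def popval(source, env):
    """Compile a source string and run it in the given environment."""
    code = compile(source, "<popval>", "exec")  # the compilation step
    exec(code, env)                             # the execution step
    return env

env = {}
popval("x = 2 + 3", env)
popval("y = x * 10", env)  # later increments see earlier definitions
print(env["y"])  # 50
```

POPVAL consumed a token list rather than a string, but the shape is the same:
a compiler callable from running code.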
Subsequent work at Edinburgh and Sussex Universities on incremental compilation
includes:
** The development (principally by David Warren) of the first Prolog compiler.
This addressed various issues of compiling pattern-matching and backtracking.
The ways in which procedure activation is taken apart and recombined by the
Prolog people are very interesting, and not without wider relevance.
** The development of the Poplog system by Sussex University.
This supports code-generation via a suite of procedure calls for operating on
a stack-machine, which is then transformed into moderately efficient machine
code for the target machine (efficient at least for integer arithmetic and
pointer manipulation). It has been used to implement Common Lisp, Prolog and
SML, as well as POP-11, which is the current development along the POP-2
lines.
On the subject of efficiency, it is not obvious that non-garbage-collected
languages will eventually win out even there. With modern
architectures, the additional freedom that a garbage collector has to relocate
objects can improve cache locality in a way that more than compensates for
the increasingly residual costs of fully-automatic storage management.
Simon Peyton Jones's STG machine, conceived of as an approach to compiling
non-strict languages, is impressive in the way it surmounts the traditional
"boxing" problems of garbage-collected systems.
>I think it would be great if there were shared libraries that
>offered incremental compilation services. You would call a
>service function and pass it a code object, which includes slots
>for:
> * Source code in one or more representations, e.g.
> * some visual representation
My Pantechnicon system is aimed at this.
> * source C
> * preprocessed C
> * assembler
The only trouble I have with unconstrained C in this kind of environment is
that it is particularly difficult to make it both safe and efficient.
> * Abstract representation (AR) (a database) which holds
> * information needed for generating machine code
> * correspondences between source code and AR fragments
> * correspondences between AR fragments and offsets into
> machine code for execution visualization (a.k.a. debugging)
> and linking
> * warnings and error messages tied to their sites in the AR
> and thus to their sites in source code
> * Executable object code (for one or more target CPU architectures)
The Poplog abstract representation for generating machine code is thus the
Poplog Virtual Machine (PVM), which consists of the suite of code-planting
procedures. E.g. the sequence sysPUSH(2); sysPOP("x"); generates code for
pushing the integer -2- on the stack and then popping it to the variable -x-.
This will in fact be optimised to a move in machine code. Other people have
used, e.g., SCHEME as an intermediate language.
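As a toy illustration (in Python, not Pop-11), the code-planting style can be
mimicked by calls that each append an abstract stack-machine instruction. The
method names mirror the text above, but the class and its semantics are
simplified assumptions, not the real Poplog VM API.

```python
# A simplified sketch of "code-planting": each sysXXX call plants one
# abstract stack-machine instruction. A real system would lower the
# planted code to machine code rather than interpret it, as this toy does.

class Planter:
    def __init__(self):
        self.code = []            # the planted instructions

    def sysPUSH(self, const):     # plant: push a value on the stack
        self.code.append(("PUSH", const))

    def sysPOP(self, name):       # plant: pop top of stack into a variable
        self.code.append(("POP", name))

    def execute(self, env):       # stand-in for code generation + run
        stack = []
        for op, arg in self.code:
            if op == "PUSH":
                stack.append(arg)
            else:                 # "POP"
                env[arg] = stack.pop()
        return env

p = Planter()
p.sysPUSH(2)
p.sysPOP("x")
print(p.execute({}))  # {'x': 2}
```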
Tools for performing the transformation into the PVM were initially absent
in POPLOG, except insofar as POP-11 provided a tolerable environment for
programming them. But the coming release has a yacc-style parser-generator
incorporated.
Typically, however, one needs an abstract form that is intermediate between
source text and the PVM, as a basis for type-checking/inference, strictness
and update analysis, etc. Independently of Sussex, I have also developed a
parser-generator that supports some of the above attributes, and allows one
to define the intermediate form painlessly. Error messages
for syntactic errors are actually quite helpful - the edit cursor is
placed at the error, and a token that could have given rise to a legal
parse is printed out. However, support for integration with the POPLOG
source-reference debugger is not currently provided.
>Some services might include:
> * translate source from language syntax xyz into AR.
> This service, traditionally called the "front end" would be
> left for the implementor of the source language to provide,
> though translators for ISO C would proabably be widely available.
My parser-generator uses productions such as:
<while_stmt> -> while <expr> do <statement> 'while(^t1,^t2)';
to accomplish this. Here the '...' sequence, by default, creates abstract
syntax using the "mk_term" function to build a tree. If desired, a
conventional semantic function can be called instead.
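What the 'while(^t1,^t2)' action denotes can be sketched in Python. The name
mk_term follows the text; the node representation (tagged tuples) is my
assumption for illustration.

```python
# A minimal sketch of the default semantic action: build an
# abstract-syntax term from the sub-results of the production.

def mk_term(op, *args):
    """Build an abstract-syntax node as a simple tagged tuple."""
    return (op,) + args

# What the production would build for:  while x > 0 do x := x - 1
t1 = mk_term(">", "x", 0)                       # the <expr> result
t2 = mk_term(":=", "x", mk_term("-", "x", 1))   # the <statement> result
node = mk_term("while", t1, t2)                 # the 'while(^t1,^t2)' action
print(node[0])  # while
```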
> * translate AR into machine code for one or more target CPU architectures
This is of course what the POPLOG VM does. But only for the machine on
which it is running. Incremental cross-compilation is not supported, except
by running POPLOG on the target machine.
> * link two or more code objects together for common execution
Poplog supports limited incremental linking of external code. Ordinary code
objects are "linked" via the name spaces of the various languages.
Much work has been done on being able to maintain consistency between Poplog
representation of data-objects and that of external languages.
> * disassemble AR into assembly source code
The parser generator ought to be able to do this (or tell you it can't be
done unambiguously ... you might, for example, generate the same AR for both
for and while loops). Mine can't do this, but I have thought about the problems.
With a little annotation about indentation conventions it should be possible
automatically to generate some equivalent source. This assumes that the
AR building functions are, in functional language terms, -constructors-.
For example:
<while_stmt> -> while <expr> do <statement> 'while(^t1,^t2)';
could be used to generate printing code in, say SML:
fun print_AR(while(t1,t2)) =
    (print("while "); print_AR(t1); print(" do "); print_AR(t2));
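The same idea can be sketched in Python, assuming AR nodes are tagged tuples
(an assumption for illustration): when the AR builders are constructors,
structural dispatch on the node tag regenerates source.

```python
# Unparsing sketch: when AR builders are constructors (tagged tuples
# here), source text can be regenerated by dispatching on the node tag.

def print_AR(node):
    if isinstance(node, tuple) and node[0] == "while":
        _, test, body = node
        return "while " + print_AR(test) + " do " + print_AR(body)
    return str(node)  # atoms unparse as themselves

print(print_AR(("while", "flag", "step()")))  # while flag do step()
```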
> * disassemble machine code into AR
Some Prolog implementations have been known to do this...
> * control execution of the machine code (set breakpoints, step, etc.)
>A traditional batch compiler could be implemented as a shell
>that sends source text to some services from the shared library,
>then dumps out asm and/or object files.
>Compilation could become more ubiquitous and more interactive.
>Programming environments supporting incremental compilation could
>flourish--they could concentrate on making great tools for
>humans, starting at a much higher level.
Well, yes... haven't the Smalltalk and Lisp communities been doing precisely
this? On the Poplog front, the latest development is HIP,
Hypermedia In Poplog, which uses Pop-11 as the scripting language.
>C is being used as a portable assembler today by compilers for
>such languages as Eiffel, Modula-3, C++, Scheme, etc. If we had
>an extensible, standardized object definition such as what I
>propose, with services made available in shared libraries,
>implementers of new languages could break free of the batch
>nature of existing C tools, and they could easily implement
>dynamic programming environments on top of existing compilation
>and debugging tools.
Primarily, I see the objection as being that C programs under development
have a finite lifetime before they break. So a language should be at least
dynamically type-safe (e.g. LISP or POP-11) and preferably statically
type-safe too.
For an ASCII-based formalism, a -restricted- C would be quite attractive.
Personally, I like the general flavour of C syntax. But a language should be
designed for type-safety, which C is not.
Werner Assmann writes:
> 1. You has to use always the same environment for updating your source text.
> This seems to be a simple thing but restricts your freedom. It's more
> difficult to reach consistency. Note: there are a lot of people in love
> with 'vi'.
What is needed here is a protocol whereby an editor communicates -incremental
changes- to a compiler, which then performs an -incremental parse-.
> 2. The implementation of such systems is not so simple as people thinks.
> Especially in the case of any errors it's sometimes a little difficult
> to avoid inconsistent states.
How difficult depends on the language. With "flat" and dynamically
typed languages like LISP, POP-2 and Prolog, problems were minimal.
With a statically typed language, a change in a type declaration can propagate
to every piece of code, making incremental recompilation impossible for
certain changes. One solution to consistency problems is to write
functionally, with memoisation providing the incremental properties.
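The memoisation point can be sketched in Python. This is a deliberately crude
model: the cache key and the stand-in "compiled" value are assumptions, not any
real compiler's scheme.

```python
# Sketch of the functional approach: treat compilation of a unit as a
# pure function of its source and memoise it, so a rebuild re-does work
# only for units whose text actually changed.

compile_cache = {}
compilations = 0

def compile_unit(source):
    global compilations
    if source not in compile_cache:
        compilations += 1
        compile_cache[source] = ("compiled", source)  # stand-in for codegen
    return compile_cache[source]

for unit in ["fun a", "fun b", "fun c"]:
    compile_unit(unit)
compile_unit("fun b")      # unchanged unit: cache hit, no recompilation
print(compilations)  # 3
```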
The SML compilers strike a compromise which is, in my opinion, not a happy one.
Consider the sequence:
fun fred x = 34;
fun joe x = fred x + 45;
fun fred x = 45;
Now in POP-11 or LISP, the redefinition of -fred- would be reflected in the
meaning of -joe-. But not in SML, which takes the no-doubt-rigorous view that
the meaning of -fred- in -joe- is what it was when -joe- was compiled.
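Python behaves like POP-11 and LISP here, resolving global names at call time
rather than at definition time, so the contrast can be demonstrated directly
(illustrative only; the SML behaviour has no direct Python equivalent):

```python
# Late binding of globals: joe looks up fred each time it is called,
# so redefining fred changes the meaning of joe, as in POP-11 or LISP.

def fred(x):
    return 34

def joe(x):
    return fred(x) + 45   # 'fred' resolved when joe runs, not when defined

r1 = joe(0)               # 34 + 45 = 79

def fred(x):              # redefinition
    return 45

r2 = joe(0)               # 45 + 45 = 90: joe sees the new fred
print(r1, r2)  # 79 90
```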
> 3. The most interesting thing in incremental compilers is the high
> compilation speed to avoid waiting time (== lost time). Some people say:
> the current hardware is so very fast that all such tricks are not necessary.
> I'm not convinced by this argument because I'm dayly waiting a lot of
> minutes for the results of compilation processes.
Well, I don't wait minutes except when developing X-windows applications and
others that involve linking lots of C into POPLOG.
But of course run-time performance is not always adequate with Poplog's
incremental compiler, which means I do spend some time hacking C. But it is
usually only a small core of an application.
Robin Popplestone.