Despite my languishing at home with a cold, Robin's provocative posting
awakens the slumbering urge to contradict ....
Like Robin, I can't resist replying to this claim:
> > Typical automated
> > measures of "program complexity" applied to FORTH code yield very low
> > indices even for programs I would regard as complex.
The natural conclusion to reach is that the measures are wrong. FORTH
is a jolly interesting programming language but not especially
conducive to good programming style. My own study of program complexity
metrics leads me to believe that they are, to be frank, virtually
useless. Only inside a tightly managed quality-control environment
could you hope to make good use of them. [I'm happy to discuss at
length the (very serious) shortcomings of program metrics. But I'll
spare the bandwidth, for the moment.]
> Moreover with a language that supports in-lining [like C++ (not my favourite
> by the way) or Haskell say] a big win is possible, since a small function may
> inline in less space than its call takes, and is of course faster. This is
> directly opposed to threaded-code, where by definition the small function
> -must- be called, since it may be redefined. Typically threaded code is useful
> for -debugging- but it is best converted into an unthreaded form for use in a
> module like a mailer where it will be normally used unchanged.
I'd like to point out the -plant_in_line- facility, available in the next
release of the PLUG archive. This enables you to transparently change
procedure calls into in-line code for simple functions. The next release
should be available early next week. Lots of new goodies.
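For those without the archive, the transformation is easy to picture
by hand. A sketch only (the procedure names are my own invention):

define double( x ); x + x enddefine;

;;; out-of-line: each use costs a procedure call
define sum_of_doubles( a, b );
    double( a ) + double( b )
enddefine;

;;; what in-lining produces: the calls vanish
define sum_of_doubles_inlined( a, b );
    ( a + a ) + ( b + b )
enddefine;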
> For example, the POP-11:
>
> define flat(T);
> define flat1(T);
> if is_node(T) then flat1(left(T)); flat1(right(T));
> else T
> endif
> enddefine;
> [% flat1(T) %]
> enddefine;
>
> works by -stacking- the leaves of the tree in the inner -flat1- function.
> Rod Burstall's [% .... %] construction converts the stacked values to
> a list on the heap. This function is O(n), whereas an implementation
> using -append- would be O(n^2). And any O(n) implementation in a
> non-open-stack language is conceptually complicated by having to pass
> an argument in which the result can be accumulated.
There's an obvious quibble to be made here. Here's an O(n) version
written without appeal to the open stack or accumulating arguments.
It works by using a (more) global variable.
vars G = conspair( undef, nil );

define flat( T ) -> R;
    lvars p = G;

    define flat1( T );
        if T.is_node then
            flat1( node_left( T ) );
            flat1( node_right( T ) )
        else
            conspair( T, nil ) ->> back( p ) -> p
        endif
    enddefine;

    flat1( T );
    back( G ) -> R;
    nil -> back( G );
enddefine;
Mind you, a mess like the above proves nothing except that you've eaten
too many Shreddies for breakfast.
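Still, for the record, it does seem to work. With a throwaway tree
representation (the recordclass and names are mine, purely for testing):

recordclass node node_left node_right;
define is_node( x ); isnode( x ) enddefine;

flat( consnode( consnode( 1, 2 ), consnode( 3, 4 ) ) ) =>
** [1 2 3 4]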
> > FORTH is incrementally compiled. The compilation mechanism is part of the
> > language and its components accessible to the programmer. This latter
> > feature distinguishes FORTH from all other languages I know. (Maybe Lisp
> > does it too? Don't know enough about it to say.) As a result, the user can
> > try something out as soon as it is defined. For example (a trivial one, to
> > be sure!) suppose we define a new FORTH subroutine (everything in FORTH is
> > a subroutine, called a "word", just as everything in C is a function):
>
> > : *+ * + ; ( a b c -- b*c+a)
>
> POP-11 is rather more verbose. The direct equivalent:
>
> define *+ (/*a,b,c*/); * + /* b*c + a */ enddefine;
This is, fair enough, the direct equivalent -- although the comments
rather clutter the clean lines of the original. The relationship
is rather more obvious if one writes
define *+; * + enddefine;
But, for the proud possessors of the PLUG Source Code Archive, the
direct equivalent probably looks like this
vars *+ = <<| * + |>> ;
[And for those lacklustre people who haven't got the PLUG SCA, it
would be possible to write
vars *+ = nonop * <> nonop + ;
It just lacks the pizzazz of the SCA version.
]
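Whichever version you choose, a quick check at the command line
agrees with the FORTH stack comment ( a b c -- b*c+a ):

*+( 2, 3, 4 ) =>
** 14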
> > The colon ***is*** the compiler. It is a FORTH word that makes a new
> > dictionary entry with the name *+ .
>
> There are some disadvantages to this kind of approach. In fact Rod Burstall
> urged that we should use an abstract syntax for POP-2, rather than having
> the whole compiler driven by "immediate action" tokens.
And the omission of an abstract syntax must be ranked as the most serious
weakness of POP-11 as she currently exists. Intriguingly, it is not
a very difficult technical problem to remedy.
> Now here we come to the main issue. POP-1 -was- a small language. POP-2 larger
> and POP-11 has grown to be big by some people's standards, though smaller
> than Common Lisp systems. Now various languages have made much bigger market
> penetration because of being -at- the right time -in- the right place and
> -with- the right facilities. Some facilities at some times represent overkill,
> but they may be seen as essential later. The primary facility that comes to
> mind is garbage collection, which -is essential-. Other facilities that take
> a lot of space in Poplog are the Common Lisp equivalent numbers (fractions,
> complex numbers ...) and the C-based widget sets.
I shall take the liberty of repeating my views on this interesting topic.
POP-11 is not an especially big programming language. However, it has
evolved into a disorganised whole, and this needs redressing. For
example, how do you find out if an item is in a list?
member( item, list )
But what if it was a string?
locchar( item, 1, string ) /* watch out for type check! */
But what about a vector?
for i in_vector vector do ... endfor
And an array? And a closure? And in the top N elements of the stack?
And an integer vector? And a set? And a queue? And the stack of a
process? And the fields of an arbitrary record?
The way to make the language smaller is to make it bigger(!). We need
to impose a sensible hierarchy of types and then systematically implement
the operations to reflect that hierarchy. Indeed, it was for this reason
that I implemented ObjectClass, in the hope that the provision of an
object-oriented system closely aligned with the POP-11 type system
would initiate this rationalisation.
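To see why, consider the hand-rolled dispatcher one is forced to write
today. This is only a sketch, -is_in- is my own name, and it covers
just three of the cases above:

define is_in( item, coll );
    lvars i;
    if coll.islist then
        member( item, coll )
    elseif coll.isstring and item.isinteger then
        ;;; locchar returns an index or false; coerce to a boolean
        locchar( item, 1, coll ) and true
    elseif coll.isvector then
        for i from 1 to datalength( coll ) do
            if coll( i ) = item then return( true ) endif
        endfor;
        false
    else
        mishap( 'is_in: unsupported collection', [ ^coll ] )
    endif
enddefine;

With a sensible hierarchy this collapses into one generic operation
with a method per data type, which is exactly what ObjectClass makes
possible.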
> However both FORTH and C were conceived of as languages for -small- machines
> and their role in the market place may be eroded as people realise that
> machines are no longer small, but software packages are -big-, and need to be
> written in languages that provide better means of constraining programmers.
>
> Systems like POPLOG which provide -semantic- support intended for a variety of
> languages would seem to have a distinct advantage over both FORTH and C.
Hear, hear! I imagine that shared libraries will have an important role
here. If operating systems provided shared library support for some of
the most commonly used symbolic language primitives (e.g. tagged
arbitrary precision arithmetic) then the overhead of running several
symbolic processing applications would be brought to a sensible level.
The large size of programming environments such as POPLOG can be seen
as mainly due to their duplication of low-level computing facilities
that the underlying system provides in an inappropriate form. For
example, full arithmetic is provided because of the inadequacy of
machine-level arithmetic. Equally, garbage collection is provided
because of the inadequacy of the standard store-management routines
on offer.
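The arithmetic point is easily illustrated. POP-11 integers overflow
smoothly into bignums, where machine arithmetic would silently wrap:

define factorial( n );
    if n == 0 then 1 else n * factorial( n - 1 ) endif
enddefine;

factorial( 30 ) =>
** 265252859812191058636308480000000

No C programmer gets that for free.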
[Incidentally, I concur with Robin's off-the-cuff remark that
conservative garbage collectors are inadequate. I've closely
followed the reports on many conservative garbage collectors
and pathological cases are just too frequent to be acceptable.
Diagnostic libraries, such as Sentinel, are clearly the best
current option for engineers labouring their lives away in
skull-numbing low-level languages such as C and C++. (Wot? No
secure automatic storage management? No thanx! This is the 1990s,
you know.)]
> The next release is coming out with a YACC style parser generator.
What? First I've heard of it. Sounds very interesting ... we want
to get our grabbers on it NOW!
> > The above and related advantages are the chief reasons why FORTH users
> > would rather fight than switch.
>
> One knows the feeling...
This "fight vs switch" mentality is very damaging. We should simply
use the best available technology for the job. Sometimes that will be
a low-level language such as C++ or FORTH (or NEON or any of its
OO kin) but mostly it will be something high-level, with higher-order
procedures, proper lexical binding, proper arithmetic models,
a decent interactive development environment, full automatic, secure,
optimum storage management, powerful graphical development tools,
efficient compilation as well as efficient run-time, and the ability
to transparently move between programming paradigms when appropriate.
We don't fight. We merely say "find the alternative because we're
willing and able to switch to anything half as good."
> Well... I managed to get the -tak- benchmark to run 6x -faster- than C in
> POP-11 for a particular set of big arguments by writing:
>
> memo_n(tak,3,1) -> tak;
>
> That is "replace tak by its memoised version - by the way it takes 3 arguments
> and returns one result". Now this is perhaps cheating. But is it?
Yes, it is cheating. It is perfectly true that "memo_n" cannot be written
in C (well, it sort of can, but only very sort of). But the purpose of
"tak" as a benchmark has always been to test raw procedure calling speed.
[Naturally, on a system such as POPLOG it equally tests the speed of full
arithmetic addition, too.] If a C programmer wanted to make "tak" go
fast they might write a special-purpose memoisation procedure. Sure,
less convenient, but it would work. So, yes, it is cheating, although
interesting.
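For the curious, the essence of "memo_n" is only a few lines. Here is
a one-argument analogue, sketched using newproperty (the name memo_1
is mine):

define memo_1( p );
    lvars p;
    lvars table = newproperty( [], 64, undef, true );
    procedure( x );
        lvars r = table( x );
        if r == undef then
            ;;; not seen before: compute, cache, and return
            p( x ) ->> table( x ) -> r
        endif;
        r
    endprocedure
enddefine;

After memo_1( fib ) -> fib; the recursive calls go through the
variable and hit the cache, just as memo_n(tak,3,1) -> tak does.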
> Since it
> changes the complexity class of the function, I can beat C by any factor I
> like!
If you make the assumption of finite storage (which is eminently correct)
I believe that the complexity class remains unchanged! It is only changed
for very small numbers (i.e. all the ones we care about).
> Actually, the raw speed for systemy kinds of things seems to be about half
> the speed of C if strictly comparable code is used.
I use a rule of thumb of a 2-3 times slowdown on transliterated
code. This is horribly boosted if one uses lots of double-precision
stuff (roll on 64-bit architectures!). However, when the code is
rewritten using normal idioms I use a rule of thumb of between 0.5
and 1.0. In other words, I usually expect to write faster code than
my C-programming equivalent for the same problem. I attribute the
performance difference to several factors. [1] Working in
a high-level language leaves me more time to manage strategic
performance issues. [2] Garbage collection amortises the overheads
of storage management. [3] Heavy use of hash tables, memoisation
and other performance-management techniques, which are trivial in
POP-11 and a pain in the rear-end in low-level languages.
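On [3], for instance, a serviceable hash table is a one-liner:

vars seen = newproperty( [], 100, false, true );
true -> seen( "cat" );
seen( "cat" ) =>
** <true>

Try doing that in three lines of C.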
Steve