Date:Mon Feb 27 10:59:10 1994 
Subject:a heretical suggestion regarding Pop's lexical rules 
From:"A.Sloman" 
Volume-ID:940227.01 

While working on my Pop-11 primer and looking at the messy explanation
of the lexical rules for words, I had a thought, which perhaps others
have had before.

I began to wonder whether it would be best to abandon the distinction
between sign characters and alphanumeric characters, and instead, like
Lisp, allow any space-delimited, or separator-delimited, mixture of
characters to be accepted by the lexical analyser. Similarly, words
could be allowed to start with numerals, except for the special cases
currently used in which a single letter marks an exponent, e.g. 3.5e5,
3.5s5 (single precision float), 3.5d5 (double precision float).

We could rule out leading numerals followed by "_", so as to avoid
ambiguity with ratios and complex numbers, e.g. "10_/3", "2_+:3",
"1.5_+:6.9".

Then, for example, x+y would be a single identifier, not three, whereas
x+y,x-y would be three identifiers, since "," would remain a separator,
and 1cat 2cat 3cat would be acceptable identifiers.
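
As a rough illustration of the proposed rule, here is a minimal sketch in
Python (not Pop-11, and not taken from any existing implementation). The
separator set and the name `tokenize` are assumptions: the proposal only
says that "," would remain a separator. Strings and comments are ignored.

```python
import re

# Assumed separator set, for illustration only.
SEPARATORS = ",;()[]{}"

# An item is either one separator character, or a maximal run of
# characters that are neither whitespace nor separators.
_ITEM = re.compile(
    "[" + re.escape(SEPARATORS) + "]"
    + "|[^\\s" + re.escape(SEPARATORS) + "]+"
)

def tokenize(text):
    """Split text into items under the proposed Lisp-like rule."""
    return _ITEM.findall(text)

print(tokenize("x+y"))             # one item
print(tokenize("x+y,x-y"))         # three items: x+y , x-y
print(tokenize("1cat 2cat 3cat"))  # three identifiers
```

Under this rule the whole lexical description of a word reduces to
"anything between whitespace and separators".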

This would have several advantages:

(a) the rules for forming identifiers would be much easier for novices
to learn. Lots of Pop-11 learners waste time over apparently arbitrary
rules that disallow "3cat" and "#$cat" but allow "cat3" and "#_cat".
Simplifying the lexical rules might win more converts.

(b) we would no longer need to use an alphabeticiser, at least with sign
characters, and "_" could lose its special role in permitting
identifiers to include mixtures of alphanumeric and sign characters,
and simply act like any other sign character. (It would retain a special role
in numbers, for the reason indicated above.)

(c) the space of useful identifiers would be significantly extended, and
several could immediately be reserved for system use, e.g. everything
starting "##" and followed immediately by a letter could be reserved for
system identifiers, and everything starting "$-" would have to be
reserved for section pathnames. It would also be possible to have a
convention regarding global variables apart from procedure names, e.g.
that they should start with some special sign character to make them
easier to identify when reading code.

(d) identifiers with section pathnames would in most contexts work
exactly like other identifiers, e.g. $-foo$-baz$-grum could occur in a
list and would be parsed as one item.

(e) the word-left and word-right rules for VED could be considerably
simplified.
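
Point (d) means that a pathname such as $-foo$-baz$-grum arrives as one
word, so any code that needs the individual section names has to look
inside the word. A minimal Python sketch of that step follows; the function
name `split_section_path` is hypothetical, and this is an illustration
rather than how Poplog resolves sections.

```python
def split_section_path(word):
    """Return the components of a "$-" pathname, or None if the word
    is not a section pathname (hypothetical helper, for illustration)."""
    if not word.startswith("$-"):
        return None
    # Drop the leading "$-" and split on the remaining "$-" markers.
    return word[2:].split("$-")

print(split_section_path("$-foo$-baz$-grum"))
print(split_section_path("grum"))
```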

Disadvantages:

(1) the compiler and things like valof would have to dig into words to
see if they start with "$-" instead of being given "$-" as a separate
item to test using "==". This will complicate and slow down compilation
where sections are used. However, some other aspects of lexical analysis
would presumably be slightly faster and certainly simpler.

(2) The lexical analyser would have a harder job deciding whether it was
dealing with a number or a word. 123ee5 would be a word and 123e5 a
number.

(3) many expressions using infix operators will become more verbose, as
they will need more spaces, as in Lisp. (However some of us prefer to
use spaces in those contexts in any case. I often (not always) add
spaces to code obtained from others, e.g. after commas and around "+",
"-", "*", "=", and "/", and possibly even "->", for the sake of clarity.
Steve Knight goes even further and puts spaces after "(" and before ")".)

(4) it is TOTALLY backward non-compatible, though I think it would not
be hard to use the existing lexical analyser to write a program to
transform all code to the new standard, simply by inserting spaces
between all identifiers that start or end with letters, sign
characters or numerals.
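
Disadvantage (2) can be made concrete with a small Python sketch of the
number/word decision. The pattern below is an assumption that covers only
integers and decimals with an optional single-letter exponent (e, s or d).
Ratios and complex numbers are left out, and allowing a sign inside the
exponent is also an assumption.

```python
import re

# Digits, optional fraction, optional single-letter exponent.
_NUMBER = re.compile(r"\d+(\.\d+)?([esd][+-]?\d+)?")

def classify(item):
    """Classify one space-delimited item as "number" or "word"."""
    return "number" if _NUMBER.fullmatch(item) else "word"

for item in ["123e5", "123ee5", "3.5s5", "3.5d5", "3cat"]:
    print(item, classify(item))
```

The analyser has to read the whole item before it can decide, since 123e5
and 123ee5 share their first four characters.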

The lack of compatibility is clearly a killer. However the fact that
automatic conversion of old code would be so easy makes it worth
considering. Of course, new code would not necessarily run in the older
Pop-11 systems.
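
The conversion idea can be sketched in Python as follows. The pattern
`_OLD_ITEM` is only a crude stand-in for the existing lexical analyser: it
is an assumption that recognises numbers, letter-led alphanumeric words,
single separators and runs of sign characters, and it does not handle
strings, comments, ratios or the "_" joining rule. A real converter would
use the existing analyser itself.

```python
import re

# Crude approximation of the current itemiser (assumption, see above).
_OLD_ITEM = re.compile(
    r"\d+(?:\.\d+)?(?:[esd][+-]?\d+)?"   # number with optional exponent
    r"|[A-Za-z_][A-Za-z0-9_]*"           # alphanumeric word
    r"|[,;()\[\]{}]"                     # single separator
    r"|[^\sA-Za-z0-9_,;()\[\]{}]+"       # run of sign characters
)

def respace_line(line):
    """Rewrite one line of old-style code with a space between items.
    Leading indentation is not preserved in this sketch."""
    return " ".join(_OLD_ITEM.findall(line))

print(respace_line("x+y,x-y"))
print(respace_line("foo(x->y)"))
```

Once every pair of adjacent items is separated by a space, the converted
text would be read as the same sequence of items under the new rules.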

Because of total non-compatibility I previously opposed the suggestion
to change Poplog Pop-11 define ... enddefine syntax so as to make input
and output locals default to lvars, because that could seriously affect
the semantics of programs in a manner that would be hard to detect
automatically. However, if we allowed the new identifiers, then we could
probably (though not easily) produce a file-conversion utility to
convert code so that all those identifiers that were used non-locally
(and not declared as file-local lexicals) had something like "#" or "=="
pre-pended. Then errors involving those identifiers would be easier to
track down, e.g. with the help of "grep".

However, I would still prefer a new syntax to be used for such a change
e.g. "def" ... "enddef", or "defproc" ... "enddefproc", and for
anonymous procedures with the new default we could return to
"lambda" ... "endlambda".

It is possible that the original lexical rules for Pop2 had some
justification other than a desire for compactness in infix expressions
and assignments, e.g. x+y, x=y, x->y, x::y. If so, it has escaped me.

I would gladly abandon that compactness for the sake of simpler
lexical rules allowing a larger set of usable identifiers.

Aaron