I'd like to say that Adrian's post was excellent and shows some real
thought (and many thanks to Ian Rogers for forwarding it). The Pop9x effort
has always had "filling in the gaps" as a goal and the generic procedures
are definitely a candidate for rework.
Here are some comments on Adrian's post:
> If the generic procedures are going to work on all objects then I think
> Harry's suggestion that datalist(3) = [] is more correct.
Firstly, it is not practical to imagine the "generic procedures" as working
on all objects in a uniform way. The underlying issue is that the Pop type
system is currently viewed as a "flat" collection of types. However, it
is more useful to view it as a hierarchy of classes, grouped according to
which generic procedures have a common meaning.
For example, -datalist- properly falls into the category of "operators
on sequences". Since numbers aren't sequences (well, integers might be
considered as bit vectors, but let's skip that) it doesn't make much
sense for datalist to operate on them.
We now have to decide what the most convenient treatment of non-sequences
is to be. One argument is that they should be treated as empty sequences,
another that they should be treated as sequences with one element, and the
last is that they should be distinguished from sequences.
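To make the choice concrete, here is how -datalist- might behave on a
number under each treatment (the outputs are invented for illustration,
not current behaviour):

    datalist( 99 ) =>           ;;; 1. treated as an empty sequence
    ** []
    datalist( 99 ) =>           ;;; 2. treated as a one-element sequence
    ** [99]
    datalist( 99 ) =>           ;;; 3. distinguished from sequences
    ;;; MISHAP - NON-SEQUENCE GIVEN TO SEQUENCE OPERATION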
> Also you would have to write more specific code when
> you bottom out. Compare:
>
> define rec_appdata(s, p);
> p(s);
> if is_structure(s) then
> applist(datalist(s), p);
> endif;
> enddefine;
>
> with
>
> define rec_appdata(s, p);
> p(s);
> applist(datalist(s), p);
> enddefine;
Clearly, Adrian is adopting the view that non-sequence values should be treated
as empty sequences. His argument is that this gives the most elegant and
convenient treatment. The counter-argument would be that it gives rise to
procedures that have unexpected behaviour on some arguments. This latter
view suggests that -rec_data- should be written as
    define rec_data( s, p ); lvars s, p;
        p( s );
        if s.issequence then        ;;; -issequence- is made up.
            applist( datalist( s ), p );
        endif;
    enddefine;
> As a first guess I would say the generic procedures should operate as
> follows on a simple data object X.
>
> datalength(X) = 0
> length(X) = 0
Here Adrian is explicitly stating that non-sequential objects (simple
values) should be treated as empty sequences.
Correcting the arity of the following examples:
> appdata(X,P) = erasenum(%2%)
> mapdata(X,P) = erase
> ncmapdata(X,P) = erase
Elegantly, Adrian treats "mapdata" and "ncmapdata" as preserving all
non-component information. Since non-sequential objects have zero components,
they must be fully preserved. A very neat solution.
> copy(X) = identfn
> copydata(X) = identfn
Copy and copydata clearly don't belong in the same class as sequences.
They belong to a different class, namely the non-uniquely allocating class.
Items are either allocated uniquely (simple values, words) or non-uniquely
(everything else).
Here Adrian is arguing that -copy- and -copydata- should simply return
uniquely allocated values. This, in my view, is definitely incorrect. It
destroys the guarantee that:
copy( X ) /== copy( X )
and the minor convenience gained is of much less benefit. Instead,
I propose the introduction of
-isunique-
that determines whether or not an item 'inherits' from the uniquely allocating
class. I also propose variants of -copy- and -copydata-
    define try_copy( x ); lvars x;
        if x.isunique then
            x
        else
            copy( x )
        endif
    enddefine;
    define rec_copy( x, copyfn ); lvars x, copyfn;
        if x.issequence then
            mapdata( x, copyfn )
        else
            x
        endif
    enddefine;
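For example, here is the behaviour I would intend, assuming -isunique-
held for words and simple values (a sketch, since -isunique- does not
yet exist):

    vars v = {1 2};
    try_copy( "cat" ) == "cat" =>
    ** <true>                   ;;; words are returned unchanged
    try_copy( v ) == v =>
    ** <false>                  ;;; vectors are copied afresh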
Obviously -rec_copy- can be specialised using -copy- or -try_copy- as
appropriate.
> explode(X) = erase
> datalist(X) = [] ([% explode(STRUCT) %])
Continues the idea that non-sequences should be treated as empty sequences.
> fill(X) = identfn
This is rather more dubious. -fill- applies to objects both in the mutable
and sequence classes. Are non-sequences treated as mutable or immutable?
Adrian elects for the more permissive interpretation -- items are mutable
unless otherwise stated. (And this is probably the right choice.)
-fill- is currently defined to treat all records, including pairs, as
sequences of fields. This reveals an awkward compromise at the heart of
the generic procedures system. Some generic procedures, such as -fill-
and -appdata-, are concerned only with the implementation level. Others,
such as -length- and -=-, are concerned with the conceptual or class level.
I believe there should be a *clear* naming convention distinguishing these
two levels.
For example:
    appdata( [1 2], identfn ) =>
    ** 1 [2]
is not consistent with
    explode( [1 2] ) =>
    ** 1 2
In this case, -explode- is acknowledging the conceptual level and -appdata-
is breaking it. I call routines like -appdata- "abstraction breakers"
because they ignore the carefully designed abstractions of the language.
> allbutfirst(X) = mishap
> allbutlast(X) = mishap
> last(X) = mishap
This is perfectly sensible behaviour consistent with non-sequences being
treated as empty sequences. These procedures also mishap when supplied
with the empty list or the empty vector.
> I don't think we can really change the behaviour since there are almost
> certainly user programs out there that rely on these procedures to do
> their type checking for them.
I regard this kind of thinking as too conservative. Poplog has changed
radically since I first started using it back in 1985. Several of my
colleagues have large areas of their Poplog setup break at every
upgrade, as user-definable procedures disappear into the system every
five minutes. We are used to coping with significant change that has
negative impact.
So why do we cluck like frightened hens at the thought of relatively
minor, positive changes? There are many ways to institute these changes
and the essence is to have a change plan. One route is to introduce a
new compile_mode that flags which version of the language the program is
written to. This can dynamically localise system procedures during
compilation. Another route is to have a systematic name change (which I
believe is necessary for other reasons, too.) Another route is to
introduce an object-oriented system that directly supports these ideas.
Obviously, we aren't discussing plans that destroy programs which aren't
being actively maintained. Those investments must be protected -- even more
than they currently are. But we can certainly change within these
restrictions.
> We have two concepts of simple structures in Pop-11:
>
> a) The -issimple-/-iscompound- divide depending on whether we
> access the object via a pointer.
>
> b) The class_field_spec(KEY) == false divide which depends on
> whether the structure has "contents."
These distinctions can be analysed using a few concepts. The issimple/
iscompound divide is best discussed in terms of unique-allocation and
(im)mutability. This is because there are data structures that are
uniquely-allocated or immutable that aren't simple but still share many
of the same problems.
The "class_field_spec( KEY ) == false" divide is the mechanism by which opaque
(or abstract) data types are constructed. Currently, only system types
can be truly opaque, and this is a serious weakness in the idea. Opaque
types are closed to inspection except through their provided
interfaces.
> The generic data-structure procedures basically operate on the (b)
> distinction. Procedures make this more complex since:
>
> o class_field_spec(procedure_key) == false yet they have fields
> we can access (-pdnargs-, -pdprops-, -pdpart- etc.)
As outlined above, this is because procedures are opaque types whose
implementation is not open for inspection. However, merely because a
class is opaque, it does not follow that it has no interface -- quite the
opposite in fact.
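For example, the procedure class is opaque, yet it still exports an
interface of accessors:

    identfn.pdnargs =>
    ** 1
    identfn.pdprops =>
    ** identfn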
> o Complex data structures are built from procedures. All the
> following are all "procedures" yet give radically different
> behaviour when passed to -datalist-:
>
> identfn.datalist=>
> ;;; MISHAP - BAD ARGUMENT FOR EXPLODE
>
> identfn(%1%).datalist=>
> ** [1]
>
> (identfn <> identfn).datalist=>
> ** [<procedure identfn> <procedure identfn>]
>
> newarray([1 2]).datalist=>
> ** [undef undef]
>
> newassoc([[a b]]).datalist=>
> ** [[a b]]
Here Adrian points out a definite weakness in existing Pop-11, namely the
failure to have distinguishable procedure types. It is not unreasonable
for closures, properties, and arrays to be in the procedure class (although
I always feel unhappy about arrays being represented as procedures) but
we cannot expect such disparate entities to behave the same way to
generic procedures.
This disrupts the model that keys correspond to classes. In the case of
procedures, class is a finer-grained distinction than key, which is
counter-intuitive. I believe this requires procedures to have more than
one key.
> Ideal worlds solutions:
>
> o Composites, closures, arrays, and properties all have different
> keys. -pdprops- etc work like normal record field access procedures.
> Radical change to Pop-11... cannot do it.
>
> o Composites, closures, arrays, and properties are subclasses of a
> procedure class in an OO Pop environment. Again... a radical change.
We have discussed the concept of procedure classes many times for Pop-11.
Adrian discusses one possible view on procedure classes here. The essence
of correctly implementing procedure classes is that -datakey- returns
different keys for different procedure types. This is clearly a trivial
change to implement:
    define datakey( x ); lvars x;
        if x.issimple then
            ... look up tag bit in table ...
        else
            lvars k = ... grab 2nd long word ...;
            if k == procedure_key then
                ... find virtual key ...
            else
                k
            endif
        endif
    enddefine;
This would slow down -datakey- fractionally, it is true. However, it does
show that implementing procedure keys is utterly trivial -- in contrast to
Adrian's fears.
No, what holds back the implementation of multiple procedure keys is a
failure to perceive added value. The added value is in making Pop-11 a
neater system. The initial cost is efficiency.
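Under such a scheme, Adrian's examples would report distinct keys. The
key names below are purely invented for illustration:

    identfn.datakey =>
    ** <key procedure>
    (identfn <> identfn).datakey =>
    ** <key composite_procedure>
    newarray([1 2]).datakey =>
    ** <key array>
    newassoc([[a b]]).datakey =>
    ** <key property>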
> In addition to -is_structure- we probably want -is_simple_procedure- as
> well, defined as
>
> define is_simple_procedure(p); lvars p;
> isprocedure(p) and not(
> isclosure(p) or ispcomposite(p)
> or isarray(p) or isproperty(p)
> );
> enddefine;
>
> -is_structure- can then be defined as:
>
> define is_structure(s); lvars s;
> class_field_spec(s.datakey)
> or isprocedure(s) and not(is_simple_procedure(s));
> enddefine;
>
> How is Pop9x handling this sort of confusion :-) ?
My belief is that Pop9x should remove this confusion. It has always been
an unhappy decision to implement arrays as procedures and I don't see why
that decision should be perpetuated. (Note that AlphaPOP, bless its
cotton socks, reversed that decision.) Robin Popplestone argues that
the ability to treat arrays as appliable objects is valuable -- and I agree --
but that has nothing to do with implementing them as procedures.
Another argument occasionally wheeled out in favour of making arrays
procedures is that it improves performance in array lookup. This is an
extremely weak argument indeed, as any APL implementor will tell you. The
cost of implementing arrays as procedures, in terms of space, is very
considerable.
For example, a 5 x 5 array in Pop-11 takes 83 long words. 27 long words
are required by the arrayvector and 12 by the boundslist. This essential
information accounts for only 39 of the 83 long words -- less than half
of the space consumed!
The lookup performance improvement only matters when one actually
performs point-by-point array lookup. However, APL implementations don't
compile specialised lookup functions, because the majority of their
processing time is spent in special-purpose array operations, e.g. adding
two arrays or scaling an array by a scalar. It is these array-processing
functions that need to be lightning-fast, not point-by-point lookup.
Well, that's enough on arrays as procedures. My view can be summarised as
suggesting that the lookup function should be optional rather than mandatory.
The current implementation of arrays makes small arrays prohibitively
expensive. (Questions for those interested in efficiency: what's the space
cost of an array of dimensionality 0? What's the space cost of an array of
dimensionality 1 and zero length? Are these figures reasonable? Why is
there such a difference -- you'll be amused when you figure it out!)
Implementing closures and properties as procedures is very reasonable
indeed, though, since directly applying them is the most natural operation.
However, I would envisage -datakey- returning distinct keys for them. This
might indeed upset a few old programs -- which would have to be compiled
with the
compile_mode:era +jurassic;
option enabled. But common-sense suggests that programs that test
procedurality by inspecting keys are rather fewer in number than programs
that break when another user-redefinable procedure gets absorbed into the
system!
Steve