Elan <rebol@techscribe.com> writes:
> Are poplog prolog "databases" sufficiently fast to manage a small
> database of several thousand records? Or do I need to employ postgresql
> (which is really overkill in this situation)? Is there a better,
> built-in way for managing data under poplog?
I can't comment in detail on the efficiency of the current Prolog
implementation. However, I did hear of at least one customer of
ISL gaining a speedup of some orders of magnitude when they switched
from using the Prolog database to using properties in Pop-11.
There are several information files about Pop-11 properties, readable
in Ved:
HELP PROPERTIES
HELP NEWPROPERTY
HELP NEWMAPPING
HELP NEWANYPROPERTY
REF PROPS
The HELP files are in
$usepop/pop/help/
the REF files in
$usepop/pop/ref
Some of these are in a format that is hard to read without
Ved because of the use of "fancy" fonts. There are "stripped"
(plain text) versions of all the online documentation at
http://www.cs.bham.ac.uk/research/poplog/doc/
Properties use hash tables. There are two main sorts of properties
in Pop-11:
1. Those created by newproperty require exact identity of keys. They
hash the address of a datastructure to get the actual key used.
2. Those created by newanyproperty and newmapping (a simplified
interface to newanyproperty) hash on the contents of the data
structure.
So two lists with identical contents will be associated with the
same value in case 2 but not in case 1. Newproperty produces
properties that work faster but are less flexible.
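A minimal sketch of the difference, assuming the argument conventions
described in HELP NEWPROPERTY and HELP NEWMAPPING. The names idprop,
eqprop and key, the table size 64 and the "perm" flag are choices made
only for this illustration:

```pop11
;;; case 1: keys are compared by identity (hashing on the address)
vars procedure idprop = newproperty([], 64, false, "perm");
;;; case 2: keys are compared by contents
vars procedure eqprop = newmapping([], 64, false, true);

vars key = [the next number is 1];
"found" -> idprop(key);
"found" -> eqprop(key);

idprop(key) =>                      ;;; the very same list: the stored value
idprop([the next number is 1]) =>   ;;; a new list, equal contents: the default, false
eqprop([the next number is 1]) =>   ;;; equal contents are enough: the stored value
```

Each evaluation of a list expression builds a new list, which is why
the second lookup in idprop misses even though the contents match.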
Depending on the form of the data and the way in which you need to
be able to access items, you may find that some sort of tree of
type 1 properties (those created by newproperty) gives you very
fast access.
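One shape such a tree might take, as a sketch only: a top-level
property keyed by a word, whose values are themselves properties keyed
by an integer. Words and small integers can be compared by identity,
so newproperty is sufficient at both levels. The names rec_index,
add_record and get_record, and the two-level layout, are hypothetical.

```pop11
;;; top level: word -> sub-property; second level: integer -> record
vars procedure rec_index = newproperty([], 64, false, "perm");

define add_record(type, n, record);
    lvars type, n, record, subprop;
    rec_index(type) -> subprop;
    unless subprop then
        ;;; first record of this type: create its sub-property
        newproperty([], 1024, false, "perm") ->> subprop -> rec_index(type);
    endunless;
    record -> subprop(n);
enddefine;

define get_record(type, n) -> record;
    lvars type, n, record, subprop;
    rec_index(type) -> subprop;
    ;;; false if either level has no entry
    if subprop then subprop(n) else false endif -> record;
enddefine;

add_record("number", 9999, [the next number is 9999]);
get_record("number", 9999) =>
```

Whether this beats a single newmapping table depends on how the keys
can be decomposed; it only helps when each level of the key is
something with a unique identity, such as a word or a small integer.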
However, for many purposes searching down lists can be fast enough.
Create a list of 10000 items thus:
vars x;
vars database
= [%for x from 1 to 10000 do [the next number is ^x] endfor%];
Now search through it 10 times for a non-existent item, using
the "=" test, which compares contents (like EQUAL in Lisp), and time
it:
sysdaytime() =>
repeat 10 times
member([the next number is 99999], database) =>
endrepeat;
sysdaytime() =>
On a 450 MHz SPARC running Solaris 7 this printed out immediately:
** Sat Oct 21 09:51:30 BST 2000
** <false>
** <false>
** <false>
** <false>
** <false>
** <false>
** <false>
** <false>
** <false>
** <false>
** Sat Oct 21 09:51:31 BST 2000
Redoing this to measure CPU time instead, without printing the false results:
timediff() ->;
repeat 10 times
member([the next number is 99999], database) ->
endrepeat;
timediff() =>
** 0.16
(i.e. 0.16 secs)
It would be around the same speed on a 450 MHz Pentium/Celeron
with Linux Poplog. I find that non-floating point performance on SPARC
and Intel corresponds approximately to CPU clock speed.
Compare storing the lists in an expandable property using newmapping:
true -> popgctrace;
4000000 -> popmemlim;
6 -> pop_hash_lim;
vars procedure pdatabase = newmapping([], 10000, false, true);
;;; Now insert the 10000 items, associating them with true:
vars x;
for x from 1 to 10000 do
true -> pdatabase([the next number is ^x])
endfor;
(This is much slower if you don't increase the default value of
pop_hash_lim. See HELP SYSHASH.)
Access is now very much faster than searching down a list: we can
search for the same item (actually in the property) 10000 times in a
fraction of a second. (Not if you leave pop_hash_lim with its default
value of 3, however!)
timediff() ->;
repeat 10000 times
pdatabase([the next number is 9999]) ->
endrepeat;
timediff() =>
If you want to use pattern variables, e.g. with the Pop-11 pattern
matcher, then the above method won't work. Instead you can create a list
of items and use constructs like foreach to search over the list with
patterns containing variables to be bound to values.
vars type;
timediff() ->;
repeat 10 times
foreach [the next ?type is 9999] in database do type=> endforeach;
endrepeat;
timediff() =>
This prints out:
** number
** number
** number
** number
** number
** number
** number
** number
** number
** number
** 0.25
Inside a procedure definition I would declare the pattern variables with
lvars, and use the pattern prefix "!" that allows lexical variables to be
used as pattern variables.
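As a sketch of what that might look like, here is the same search
wrapped in a procedure. The procedure name show_types and its argument
name items are invented for this illustration, and it assumes the "!"
prefix is accepted by foreach as it is by matches:

```pop11
define show_types(items);
    ;;; type is lexical, so the pattern needs the "!" prefix
    lvars items, type;
    foreach ![the next ?type is 9999] in items do
        type =>
    endforeach;
enddefine;

show_types(database);
```

Using lvars keeps the pattern variable local to the procedure, so it
cannot clash with a global variable of the same name elsewhere.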
It would also be possible to use the new pattern syntax that works with
"=" defined in HELP EQUAL. It works on vectors as well as lists.
;;; make a list of 10000 vectors
vars x;
vars vdatabase
= [%for x from 1 to 10000 do {the next number is ^x} endfor%];
timediff() ->;
lvars type;
;;; create pattern once, rather than each time round the loop
lvars pattern = {the next =?type is 9999};
repeat 10 times
lvars item;
for item in vdatabase do
if item = pattern then type => endif
endfor;
endrepeat;
timediff() =>
** number
** number
** number
** number
** number
** number
** number
** number
** number
** number
** 0.19
Vectors take up less space than lists.
I hope that provides some useful information. Old users of Pop-11
may not be aware of the generalisations made around 1996 by John Gibson
enabling "=" to be used as a pattern matcher, with a new data
type, matchvars, and new pattern syntax using =?, =??, etc.,
described in HELP EQUAL.
HELP EQUAL also describes a more powerful matcher called "equals", which
is guaranteed to find all possible matches where segment variables are
involved. The simpler matchers, "matches" and "=", both fail on this:
vars x,y;
[[1 2][2 1]] matches [[??x ??y][??y ??x]] =>
** <false>
;;; note new syntax for segment variables "=??"
vars x,y;
[[1 2][2 1]] = [[=??x =??y][=??y =??x]] =>
** <false>
But equals gets it right:
vars x,y;
[[1 2][2 1]] equals [[=??x =??y][=??y =??x]], x, y =>
Apologies for long reply.
Aaron
===
Aaron Sloman, ( http://www.cs.bham.ac.uk/~axs/ )
School of Computer Science, The University of Birmingham, B15 2TT, UK
EMAIL A.Sloman AT cs.bham.ac.uk (ReadATas@please !)
PAPERS: http://www.cs.bham.ac.uk/research/cogaff/
FREE TOOLS: http://www.cs.bham.ac.uk/research/poplog/freepoplog.html