Date:Mon Apr 5 07:30:53 1994 
Subject:Re: Saving Data in Poplog. 
From:jonm (Jon Meyer) 
Volume-ID:940405.01 

Robin - your saving mechanism is elegant. One comment I have is that it is
highly desirable to separate the byte representation required to save 
a structure from the file used to hold that representation. In other words,
don't emulate the datafile approach.

For example, I have implemented a similar package to yours, though it operates 
at a much lower level and uses a compact and fast binary file format (I
threw away human-readability in favour of size/speed). 

The jewel is that the underlying procedures used to encode/decode data, 
    sys_write_data(file, item, flags) -> nbytes
and
    sys_read_data(file, flags) -> ok

allow you to specify character consumers/repeaters as the file argument.

This means that I can write:

;;; converts Pop structure to a string (using identfn as the character consumer
;;; will leave all the required bytes on the stack)

define encode_to_string(item); lvars item;
	consstring( sys_write_data(identfn, item, 2:0) )
enddefine;

;;; converts an encoded string back into its Pop structure (using stringin
;;; to generate a character repeater to pass to sys_read_data)

define decode_from_string(string); lvars string;
	unless sys_read_data( stringin(string), 2:0 ) then
		mishap(string, 1, 'NOT A PROPERLY ENCODED STRING');
	endunless;
	/* retrieved item is left on stack */;
enddefine;

Now I have a procedure which coerces a Pop structure into a string, and
one which coerces it back to a Pop structure.
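
A round trip through the two procedures above might be checked like this.
It is only a sketch: it assumes sys_write_data and sys_read_data behave
as described, and the variable names are purely illustrative.

   vars original = [a {test} 1.2];
   vars encoded  = encode_to_string(original);
   vars decoded  = decode_from_string(encoded);

   ;;; the encoded form should be an ordinary string
   isstring(encoded) =>

   ;;; the decoded item should be structurally equal to the original,
   ;;; though it will be a fresh copy rather than the same object
   decoded = original =>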

I've written a wrapper on this which uses the GNU gdbm package (a 
package for creating very large string-based hash tables) to create 
potentially huge persistent property tables holding Pop datastructures.
I simply do:

   dbm_property('~/test', false, false) -> prop;

   [a {test} 1.2] -> prop([a key]);

   prop([a key]) =>
   ** [a {test} 1.2]

and I can come back at any time and '~/test' will contain the [a key]
entry.

This forms the basis of the object oriented database I use in HiPWorks. 
To save the contents of an object, I have methods which leave all
the important data on the stack. I do something roughly like:

  {% dataword(object), {% dest_persist_data(object) %} %} -> dbm_prop(key);

to write an object to disc, and

  dbm_prop(key) -> data;
  new_object_of_type(data(1)) -> object;
  explode(data(2)) -> dest_persist_data(object);

to read it back in. dest_persist_data is a method which is redefined by
each class of object to leave the right things on the stack to save
the object, or to take the right things off the stack to restore it. 
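
The save/restore pairing can be illustrated without any object system.
The sketch below substitutes a plain procedure with an updater for the
method, and uses a hypothetical two-field record class (point, px and py
are made-up names, not part of HiPWorks):

   ;;; hypothetical record class, used only for illustration
   defclass point { px, py };

   ;;; leaves the data to be saved on the stack
   define dest_point_data(p); lvars p;
       px(p), py(p)
   enddefine;

   ;;; takes the saved data back off the stack and stores it in p
   define updaterof dest_point_data(x, y, p); lvars x, y, p;
       x -> px(p);
       y -> py(p);
   enddefine;

   vars p = conspoint(3, 4);
   vars saved = {% dataword(p), {% dest_point_data(p) %} %};

   vars q = conspoint(0, 0);
   explode(saved(2)) -> dest_point_data(q);
   ;;; q should now hold the same field values as p

The vector in saved is what would be assigned to dbm_prop(key) in the
real thing; here it simply stays in a variable.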

Another advantage of the sys_read_data/sys_write_data approach is that it
makes it simple to pass Pop datastructures between two running Poplog
processes, e.g. using a pipe, without having to create an intermediary file.

A final advantage of the sys_read_data/sys_write_data approach is that
you can use it to create equivalents of discin and discout which work at
the Pop item level rather than the character level. I can thus do:

  discdataout('/tmp/test') -> consume;

  consume('Hello World');
  consume([a b c]);
  consume(termin);

  discdatain('/tmp/test') -> repeater;

  repeater() =>
  ** 'Hello World'
  repeater() =>
  ** [a b c]

etc.
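
For readers wondering how such a pair could be layered on the two
primitives, here is a minimal sketch. It is a guess at the shape, not the
actual implementation: the sketch_ names are hypothetical, and it assumes
that sys_read_data returns false, leaving nothing else on the stack, when
the input is exhausted.

   ;;; item consumer: writes each item to file, closes it on termin
   define sketch_discdataout(file) -> consumer;
       lvars file, consumer, chars = discout(file);
       procedure(item); lvars item;
           if item == termin then
               chars(termin)
           else
               ;;; discard the byte count returned by sys_write_data
               erase(sys_write_data(chars, item, 2:0))
           endif
       endprocedure -> consumer;
   enddefine;

   ;;; item repeater: returns successive items, then termin
   define sketch_discdatain(file) -> repeater;
       lvars file, repeater, chars = discin(file);
       procedure();
           ;;; on success the decoded item is already on the stack
           unless sys_read_data(chars, 2:0) then
               termin
           endunless
       endprocedure -> repeater;
   enddefine;

One limitation of this sketch is that it cannot tell a genuine end of
file from a corrupt entry, since both are reported as termin.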

As a toy, I once wrote something which does:

;;; itemise the contents of file, using discdataout to write items to file.t
define create_itemised_file(file); lvars file;
	lvars repeater = incharitem(discin(file));
	lvars consumer = discdataout(file <> '.t');
	lvars filter   = repeater <> dup <> consumer;
	until filter() == termin do enduntil
enddefine;

;;; compile a pre-itemised file
define compile_itemised_file(file);
	lvars file;
	discdatain(file).pdtolist.compile;
enddefine;

Then doing

   create_itemised_file('test.p');

and comparing 

   compile('test.p');
with
   compile_itemised_file('test.p.t');

I found that not only were the itemised versions smaller than the
originals (about 0.7 times the size), they also compiled quicker (in about
0.7 times the time) - an all-round winner.

Jon.