Date:Mon Jun 4 11:11:12 2001 
Subject:Re: ? text-processing examples 
From:Aaron Sloman See text for reply address 
Volume-ID:1010604.03 

[To reply replace "Aaron.Sloman.XX" with "A.Sloman"]

cglur@onwe.co.za writes:

> Date: 2 Jun 2001 17:35:20 GMT
>
> I'd like to find some tutorials/examples which would guide me to
> appropriate techniques to do some text-processing under poplog.
>
> The text IO would be via files.

There are facilities for reading, writing and appending to files in
different modes, and a fair number of string-manipulating mechanisms
in Pop-11, mostly described in REF STRINGS. File I/O is described in
REF SYSIO, sys_file_match (described in REF SYSUTIL) supports
pattern-based exploration of directories, and a regular expression
matcher is described in REF REGEXP. There are also some extra string
facilities posted about a year ago by Steve Leach, now available in
    http://www.cs.bham.ac.uk/research/poplog/string_ops/
    http://www.cs.bham.ac.uk/research/poplog/string_ops.tar.gz
        6937 bytes
(probably also at www.poplog.org, with additional goodies.)

If you want to do manipulations not at the character level but at the
level of words and numbers, then incharitem is extremely useful. Give it
a character repeater (such as one produced by discin) and you'll get
back an item repeater, i.e.
    vars procedure(char_rep, item_rep);
        ;;; declare them as procedure identifiers for efficiency.

    discin('myfile.txt') -> char_rep;

    incharitem(char_rep) -> item_rep;

Then char_rep is a procedure which, each time it is called, returns the
next character from the file myfile.txt, and item_rep is a procedure
which, each time it is called, repeatedly invokes char_rep until it has
enough characters to return a number (integer, ratio, decimal, or
complex number), a word, or a string.
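
As a rough sketch of how an item repeater is typically driven, the
procedure below counts the items in a file. The name count_items is
made up for this illustration, and the sketch assumes that the item
repeater returns termin once the file is exhausted, as the character
repeater does.

```pop11
;;; Hypothetical helper: count the text items in a file.
define count_items(filename);
    lvars
        item_rep = incharitem(discin(filename)),
        item,
        count = 0;

    repeat
        item_rep() -> item;
        ;;; assumed: the item repeater yields termin at end of file
        quitif(item == termin);
        count + 1 -> count;
    endrepeat;
    count;      ;;; left on the stack as the result
enddefine;
```

Then
    count_items('myfile.txt') =>

should print the number of words, numbers and strings found.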

If you'd prefer to manipulate the file as a list of text items do

    vars file_text = pdtolist(item_rep);

See the information on dynamic (lazily evaluated) lists in REF LISTS
or HELP PDTOLIST, or see Chapter 6 of the Pop-11 primer.
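
A minimal sketch of working through such a dynamic list with the
ordinary list procedures null, hd and tl, which expand the list on
demand. The name words_in_file is invented for this example; it
collects only the words, discarding numbers and strings.

```pop11
;;; Hypothetical helper: return a list of the words in a file.
define words_in_file(filename);
    lvars items = pdtolist(incharitem(discin(filename)));

    ;;; [% ... %] collects whatever the loop leaves on the stack
    [% until null(items) do
            if isword(hd(items)) then hd(items) endif;
            tl(items) -> items;
       enduntil %]
enddefine;
```

Because the list is generated lazily, only as much of the file is read
as the loop actually asks for.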

It's going to be hard to know what to point you at unless you can
be a bit more specific about some of the things you might want to
do.

A colleague was having trouble with a file containing 8-bit (graphic)
characters, so I showed him how the following procedure could transform
those characters into something else (or omit them), while leaving
other characters unchanged.

	define transform_file(inputfile, outputfile);
		lvars
			produce = discin(inputfile),
			consume = discout(outputfile),
			char;

		repeat
			produce() -> char;
			quitif (char == termin);
			if char < 128 then consume(char)
			else
				;;; whatever you want to go out, e.g. nothing,
				;;; or some translation
			endif;
		endrepeat;
		consume(termin); 	;;; to flush output buffer
	enddefine;

Then
	transform_file('testin', 'testout');

will read the file called 'testin' and write the transformed
version to 'testout'. If it is a multi-megabyte file or you need to do
it often, some simple optimisations are possible to speed that up.
(See HELP EFFICIENCY)
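
One possible way of filling in the "else" branch, offered only as a
sketch, is to pass the translation in as an extra argument. The name
transform_file_with and the convention that the translation procedure
returns either a replacement character or false (meaning "omit") are
assumptions made for this illustration.

```pop11
;;; Hypothetical variant: translate maps an 8-bit character code
;;; to a replacement character, or to false to drop it.
define transform_file_with(inputfile, outputfile, translate);
    lvars
        produce = discin(inputfile),
        consume = discout(outputfile),
        char;

    repeat
        produce() -> char;
        quitif(char == termin);
        if char >= 128 then
            translate(char) -> char;
        endif;
        ;;; false means the character is to be omitted
        if char then consume(char) endif;
    endrepeat;
    consume(termin);    ;;; to flush output buffer
enddefine;

;;; replace every 8-bit character with a question mark
transform_file_with('testin', 'testout',
    procedure(c); `?` endprocedure);

;;; or simply omit them
transform_file_with('testin', 'testout',
    procedure(c); false endprocedure);
```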


A program by Riccardo Poli that reads in a file of text, builds a
table of transition probabilities, then uses it to produce a sort of
parody of the original is described here:
    http://www.cs.bham.ac.uk/research/poplog/help/summarise
        4302 bytes
with the pop-11 code in here
    http://www.cs.bham.ac.uk/research/poplog/lib/summarise.p
        6735 bytes

REF DISCAPPEND describes another utility:

    discappend(<filename or device>) -> <character_consumer>;

The character consumer is a procedure that can be repeatedly applied to
characters (8 bit integers), which will be appended to the original
file (flush and close it by applying the consumer to termin).
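
For example, a small sketch that appends one line of text to an
existing file, assuming discappend behaves as just described. The
name append_line is made up for this illustration.

```pop11
;;; Hypothetical helper: append string plus a newline to a file.
define append_line(filename, string);
    lvars
        consume = discappend(filename),
        i;

    for i from 1 to datalength(string) do
        consume(subscrs(i, string));
    endfor;
    consume(`\n`);
    consume(termin);    ;;; flush and close
enddefine;

append_line('logfile', 'another entry');
```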

The code for discappend can be inspected in VED
    ENTER showlib discappend

(I have just noticed that HELP DISCAPPEND is out of date.)


> I'm guessing that ved was written in pop-11 ?

Yes. The basic sources for Ved are in
    $usepop/pop/ved/src/

and autoloadable and other extensions in

    $usepop/pop/lib/ved/

though Xved uses a lot of X facilities, as you can see in

    $usepop/pop/x/ved/

In the files
    http://www.cs.bham.ac.uk/research/poplog/auto/mimencode.p
        3647 bytes
    http://www.cs.bham.ac.uk/research/poplog/auto/mimedecode.p
        3456 bytes

you'll find code to read in a text file and write out a mimencoded
version, and code to read in a mimencoded file and write out a
mimedecoded version. Both files have documentation near the top
of the file. They are used by the following files, which read and
write portions of a Ved file with decoding or encoding:

    http://www.cs.bham.ac.uk/research/poplog/auto/ved_writemime.p
        861 bytes
    http://www.cs.bham.ac.uk/research/poplog/auto/ved_readmime.p
        873 bytes

And the latter is used in a ved utility to prepare mime attachments
for posting, in
    http://www.cs.bham.ac.uk/research/poplog/auto/ved_attach.p
        3949 bytes

as described in
    http://www.cs.bham.ac.uk/research/poplog/help/ved_attach
        17202 bytes

All that is packaged in
    http://www.cs.bham.ac.uk/research/poplog/attach.tar.gz
        10981 bytes

There are also libraries related to parsing, which might be useful
for some applications.

I hope that helps and is not too overwhelming.

Aaron
====
Aaron Sloman, ( http://www.cs.bham.ac.uk/~axs/ )
School of Computer Science, The University of Birmingham, B15 2TT, UK
EMAIL A.Sloman AT cs.bham.ac.uk   (ReadATas@please !)
PAPERS: http://www.cs.bham.ac.uk/research/cogaff/
FREE TOOLS: http://www.cs.bham.ac.uk/research/poplog/freepoplog.html