Date: Wed, 10 Mar 2004 21:43:32 +0000 (UTC)
Subject: Re: Optimisation of program using ratios
From: A . Sloman

Jonathan,

> I've been implementing a Naive Bayes model for text classification in POP11.
> While I'm rather impressed with the ratio datatype (I don't recall finding
> perfect precision in the other languages I've studied), I've gotten myself
> into a problem with speed.

The indefinite precision ratios and big integers of pop11 are also
available in some Lisp systems, some Prologs, Standard ML, and probably
also some other functional languages.

> My model iterates through the words in a piece of text, looking up the
> probability of each word for a given class, and multiplying together all the
> probabilities it looks up.  Since these are all very small probabilities, by
> the time it's finished the final probability is very very small - in fact if
> I print it, it fills an 80 x 30 screen!  From this I presume that each
> time it is used in a calculation, the memory manipulation going on must be
> enormous.  Is this the case?

If you are using ratios rather than decimals, then every new ratio
occupies a data-structure allocated from the heap, whose size can indeed
grow as precision increases (though pop11 always reduces ratios to their
lowest terms). See REF NUMBERS.
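
To give a feel for how fast an exact ratio can grow, here is a minimal
sketch in Python rather than pop11, using the standard fractions.Fraction
type as a stand-in for pop11 ratios. The word probabilities are made up
for illustration, and the function name product_sizes is hypothetical.

```python
from fractions import Fraction

def product_sizes(probs):
    """Multiply exact ratios together and record, after each step,
    how many decimal digits the denominator of the running product has."""
    product = Fraction(1)
    sizes = []
    for p in probs:
        product *= p  # Fraction keeps the result reduced to lowest terms
        sizes.append(len(str(product.denominator)))
    return product, sizes

# Hypothetical word probabilities: 1/1001, 1/1002, ..., 1/1200
probs = [Fraction(1, 1000 + i) for i in range(1, 201)]
product, sizes = product_sizes(probs)

# Digits in the denominator after the first and after the last multiplication
print(sizes[0], sizes[-1])
```

Each multiplication has to build a new, larger number, so both the time
per step and the amount of garbage created go up as the text gets longer.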

Your program may be getting slow because the individual ratios are huge
or because you are producing them very frequently causing lots of
garbage collections, or both.

If the problem is only garbage collections then you can try increasing
the value of popmemlim to reduce the frequency of garbage collections.

E.g. if you think your machine typically has 400 Mbytes of memory unused
(= 100 Mwords on a 32 bit machine, or 50 Mwords on a 64 bit machine such
as a Dec Alpha), then you could try increasing popmemlim to half the
number of words, e.g. 50000000 on a 32 bit machine. (The default is much
lower, which is fine for many learners.)
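
The arithmetic behind that suggestion can be sketched as follows. This is
Python, not pop11, and the helper name suggested_popmemlim is hypothetical;
it only works out a number of words, assuming 1 Mbyte = 1000000 bytes and
a word size of 4 bytes (32 bit) or 8 bytes (64 bit).

```python
def suggested_popmemlim(free_bytes, word_bytes):
    """Return half of the free memory, expressed in machine words."""
    free_words = free_bytes // word_bytes
    return free_words // 2

free = 400_000_000  # 400 Mbytes assumed to be unused

print(suggested_popmemlim(free, 4))  # 32 bit machine: 50000000
print(suggested_popmemlim(free, 8))  # 64 bit machine: 25000000
```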

If you increase it too much you may end up causing paging and swapping
during garbage collections. This can be reduced by switching to the
non-copying garbage collector (make pop_gc_copy false). See
HELP EFFICIENCY.

I don't know what your project is, but it may be that using ratios
rather than decimals gives you overkill in precision.
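
As a rough illustration of that trade-off, the following sketch (Python
again, with made-up probabilities and a hypothetical function name)
multiplies the same values once as exact ratios and once as floating
point decimals, then measures how far the decimal result is from the
exact one.

```python
from fractions import Fraction

def compare_products(probs):
    """Return (exact product, float product, relative error of the float)."""
    exact = Fraction(1)
    approx = 1.0
    for p in probs:
        exact *= p           # exact ratio arithmetic
        approx *= float(p)   # fixed-size floating point arithmetic
    # Fraction(approx) converts the float exactly, so the error is exact too
    rel_error = abs(Fraction(approx) - exact) / exact
    return exact, approx, rel_error

# Hypothetical word probabilities: 1/1001, ..., 1/1020
probs = [Fraction(1, 1000 + i) for i in range(1, 21)]
exact, approx, rel_error = compare_products(probs)
print(approx, float(rel_error))
```

The decimal result occupies a fixed amount of space however many factors
are multiplied in. With many more factors a float product can underflow
to zero, so this only illustrates the precision side of the choice.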

I think there are quite a lot of experts in these things at Sussex.

Aaron