Date:Thu, 11 Mar 2004 00:46:00 +0000 (UTC) 
Subject:Re: Optimisation of program using ratios 
From:steve 

Hi Jonathon,

>I've been implementing a Naive Bayes model for text classification in POP11.
>While I'm rather impressed with the ratio datatype (I don't recall finding
>perfect precision in the other languages I've studied), I've gotten myself
>into a problem with speed.

I recall arguing many years ago, when Pop-11 first got rational numbers,
that "/" on two integers should produce a rational rather than a float.
But seeing how many times this has caused confusion, I often wonder whether
I argued the wrong way.

The basic problem is that most programming languages have defective
(silently rounding) arithmetic operators and, unfortunately, programmers
come to Pop-11 with this baggage.  Part of that baggage is the expectation
that "/" is a constant-time operator, even if an unpredictably inaccurate one.

For better or worse, Pop-11 confounds that expectation by being
accurate but, in some situations, expensive.


>My model iterates through the words in a piece of text, looking up the
>probability of each word for a given class, and multiplying together all the
>probabilities it looks up.  Since these are all very small probabilities, by
>the time it's finished the final probability is very very small - in fact if
>I print it, it fills an 80 x 30 screen!  From this I presume that each
>time it is used in a calculation, the memory manipulation going on must be
>enormous.  Is this the case?

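From this description - yes.  Multiplying two ratios gives a result whose
numerator and denominator have about as many digits as those of the two
factors combined (less whatever cancels), so the product, and the cost of
each further multiplication, grows with every word.  All you need to do to
"fix" the problem is to use floating-point numbers.  So if you are writing
your probabilities like this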
	1/2
you "should" write them as
	1.0s0/2
See REF ITEMISE/Floating Point for the differences between 0.0s0 and
0.0d0 (s = single precision, d = double precision).
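To see the effect directly, here is a minimal sketch, assuming a made-up
word probability of 1/7 and a twenty-word text (neither comes from the
original program).  It multiplies the same factors once as exact ratios
and once as single-precision decimals.  The exact product's denominator
ends up as 7**20, a 17-digit integer that gains another factor of 7 with
every extra word, while the decimal stays a fixed-size float.

```pop11
;;; Sketch only: 1/7 and 20 are arbitrary stand-ins for a word
;;; probability and a text length.
vars exact, approx, i;
1 -> exact;
1 -> approx;
for i from 1 to 20 do
    exact * (1/7) -> exact;          ;;; exact ratio arithmetic
    approx * (1.0s0/7) -> approx;    ;;; single-precision decimal
endfor;

isratio(exact) =>        ;;; true: still an exact ratio
denominator(exact) =>    ;;; 7**20, one more factor of 7 per word
isdecimal(approx) =>     ;;; true: a fixed-size float
approx =>
```

Printing exact itself for a realistic number of words is what fills the
screen in the original question.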

When you multiply them together you will imperceptibly lose precision but
make enormous gains in efficiency.  And by setting popdprecision to
<false> or <true> you can further tweak the speed/precision tradeoff.
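To put a number on "imperceptibly", a sketch along the following lines
computes the relative error of the decimal product against the exact one,
first with single-precision decimals and popdprecision <false>, then with
double-precision decimals and popdprecision <true>.  The procedure name
relerr and the stand-in values 1/7 and 20 are illustrative assumptions, and
the exact figures printed will depend on your Poplog build.

```pop11
;;; Sketch only: relerr is a hypothetical helper, and 1/7 and 20
;;; are arbitrary stand-ins for a word probability and a text length.
define relerr(one) -> err;
    lvars one, err, i, exact, approx;
    1 -> exact;
    1 -> approx;
    for i from 1 to 20 do
        exact * (1/7) -> exact;       ;;; exact ratio product
        approx * (one/7) -> approx;   ;;; decimal product
    endfor;
    ;;; relative error of the decimal product
    abs((approx - exact) / exact) -> err;
enddefine;

false -> popdprecision;
relerr(1.0s0) =>    ;;; single precision: small
true -> popdprecision;
relerr(1.0d0) =>    ;;; double precision: typically much smaller
false -> popdprecision;    ;;; set it back to <false>
```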

-- 
Steve