Hi Jonathon,
>I've been implementing a Naive Bayes model for text classification in POP11.
>While I'm rather impressed with the ratio datatype (I don't recall finding
>perfect precision in the other languages I've studied), I've gotten myself
>into a problem with speed.
I recall arguing many years ago, when Pop-11 first got rational numbers,
that "/" on two integers should produce a rational rather than a float.
But seeing how many times this has caused confusion I often wonder whether
I argued the wrong way.
The basic problem is that most programming languages have defective
arithmetic operators and, unfortunately, programmers come to Pop-11 with
this baggage. And part of that baggage is the expectation that "/" is a
constant time operator, albeit an unpredictably inaccurate one.
For better or worse, Pop-11 confounds that expectation by being
accurate but, in some situations, expensive.
>My model iterates through the words in a piece of text, looking up the
>probability of each word for a given class, and multiplying together all the
>probabilities it looks up. Since these are all very small probabilities, by
>the time it's finished the final probability is very very small - in fact if
>I print it, it fills an 80 x 30 screen! From this I presume that each
>time it is used in a calculation, the memory manipulation going on must be
>enormous. Is this the case?
From this description - yes. All you need to do to "fix" the problem is
to use floating point numbers. So if you are writing your probabilities
like this
1/2
you "should" write them as
1.0s0/2
See REF ITEMISE/Floating Point for the differences between 0.0s0 and
0.0d0. ( s = single precision, d = double precision )
When you multiply them together you will imperceptibly lose precision but
make enormous gains in efficiency. And by setting popdprecision to
<false> or <true> you can further tweak the speed/precision tradeoff.
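To make the difference concrete, here is a rough sketch rather than anything taken from your program. The probability 1/1009 and the count of ten words are made up purely for illustration, and the variable names are hypothetical. It runs the same repeated multiplication once with ratios and once with decimals:

```pop11
;;; Sketch only: 1/1009 and the loop count of 10 are invented values.
vars i, exact, approx;
1 -> exact;        ;;; integer start; becomes an exact ratio in the loop
1.0d0 -> approx;   ;;; decimal start; stays a decimal throughout

for i from 1 to 10 do
    exact * (1/1009) -> exact;          ;;; ratio: denominator grows every step
    approx * (1.0d0/1009) -> approx;    ;;; decimal: fixed size, rounded every step
endfor;

isratio(exact) =>   ;;; tests whether the exact product is still a ratio
exact =>            ;;; prints the full numerator/denominator
approx =>           ;;; prints the decimal approximation
```

The ratio version has to carry an ever longer denominator through every multiplication, which is where the time and memory go. The decimal version does a fixed amount of work per word.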
--
Steve