Hi,
I've been implementing a Naive Bayes model for text classification in POP11.
While I'm rather impressed with the ratio datatype (I don't recall finding
perfect precision in the other languages I've studied), I've run into a
problem with speed.
My model iterates through the words in a piece of text, looking up the
probability of each word for a given class, and multiplying together all the
probabilities it looks up. Since these are all very small probabilities, by
the time it's finished the final probability is very, very small - in fact if
I print it, it fills an 80 x 30 screen! From this I presume that each
time it is used in a calculation, the memory manipulation going on must be
enormous. Is this the case?
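The following sketch illustrates the effect. It is written in Python rather than POP11, and it uses Python's fractions.Fraction as a stand-in for POP11's ratio type, which is an assumption. Every word gets the same hypothetical probability 1/9973, a prime chosen only so that nothing cancels. The exact product's denominator grows by about 13 bits per word, so each multiplication works on ever larger integers. For contrast, the sketch also sums floating-point logarithms, a common alternative that is not part of the original model and keeps the score a fixed-size number.

```python
from fractions import Fraction
import math

# Hypothetical per-word probability (assumption, for illustration only).
# 9973 is prime, so the product never simplifies.
WORD_PROB = Fraction(1, 9973)


def exact_product(n_words):
    """Multiply n_words exact rationals, as a ratio-based model would."""
    p = Fraction(1)
    for _ in range(n_words):
        p *= WORD_PROB  # numerator/denominator grow with every step
    return p


def log_score(n_words):
    """Sum natural-log probabilities in floating point instead."""
    log_p = math.log(float(WORD_PROB))
    return sum(log_p for _ in range(n_words))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        p = exact_product(n)
        # Size of the exact denominator in bits vs. a single float score.
        print(n, p.denominator.bit_length(), log_score(n))
```

Because the logarithm is monotonic, comparing per-class log scores selects the same class as comparing the exact products, apart from floating-point rounding.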
I'd be grateful if anybody could offer their comments and suggestions -
perhaps another datatype is more suitable?
many thanks
Jonathon Read