On Thu, 18 Dec 2003 15:07:46 GMT, spam@softluck.plus.com (Jonathan L
Cunningham) wrote:
>>> >[AS]
>>> >You can check this by looking at the output of dic_distrib(); which
>>> >prints out bucket sizes.
>>>[JLC]
>>> Groan! Why oh why doesn't it leave them on the stack, so that
>>> [% dic_distrib() %]
>>> would give a list that could be analysed?
>>This will do it:
>
>Thanks ... saves me doing it. I may have a quick play with the
>numbers now ...
Which I've just done. This is in Windows Poplog with a relatively
empty dictionary (about 2130 entries -- which still suggests
the dictionary ought to be bigger).
I got these figures (slightly reformatted by hand):
     size  actual    predicted
** [  0    0.149414  0.12492 ]
** [  1    0.257813  0.259844]
** [  2    0.260742  0.270248]
** [  3    0.162109  0.187379]
** [  4    0.086914  0.097441]
** [  5    0.047852  0.040537]
** [  6    0.021484  0.014053]
** [  7    0.011719  0.004176]
** [  8    0.000977  0.001086]
** [  9    0.0       0.000251]
** [ 10    0.0       0.000052]
** [ 11    0.000977  0.00001 ]
Where size is the bucket size, "actual" is the observed fraction of
buckets of that size, and "predicted" is the fraction that should be
of that size (assuming a Poisson distribution, which I think is what
it should be).
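
For anyone wanting to reproduce the predicted column, here is a minimal
sketch in Python rather than Pop-11. It assumes the Poisson mean is the
number of words divided by the number of buckets, and it treats "about
2130" as exactly 2130; the function name poisson_fractions is made up
for this illustration. Under those assumptions the first few values
agree with the table to the precision shown.

```python
import math


def poisson_fractions(n_words, n_buckets, max_size):
    """Expected fraction of buckets holding k words, for k = 0..max_size.

    Assumes words land in buckets independently and uniformly, so bucket
    occupancy is approximately Poisson with mean n_words / n_buckets.
    """
    lam = n_words / n_buckets
    return [
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(max_size + 1)
    ]


# 2130 is the approximate dictionary size quoted in the post (assumed exact here).
predicted = poisson_fractions(2130, 1024, 11)

for size, fraction in enumerate(predicted):
    print(f"{size:2d} {fraction:.6f}")
```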
It actually looks quite good: the main problem is simply that there
are too few buckets (i.e. there are more than twice as many words as
buckets). There are *slightly* more big buckets than we'd expect, as
well as slightly more empty buckets (predicted 128, actual 153; to get
these counts, multiply the fractions by 1024, the total number of
buckets).
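
The empty-bucket counts can be checked with a short Python sketch. It
again assumes exactly 2130 words and a Poisson mean of words per
bucket; the constant names are only for illustration.

```python
import math

N_BUCKETS = 1024   # total number of buckets, from the post
N_WORDS = 2130     # "about 2130 entries", assumed exact for this sketch

ACTUAL_EMPTY_FRACTION = 0.149414  # size-0 row, "actual" column

# Words per bucket; anything above 2 means more than twice as many words as buckets.
load = N_WORDS / N_BUCKETS

# Under a Poisson model the fraction of empty buckets is exp(-mean).
predicted_empty = round(math.exp(-load) * N_BUCKETS)
actual_empty = round(ACTUAL_EMPTY_FRACTION * N_BUCKETS)

print(f"words per bucket: {load:.2f}")
print(f"empty buckets: predicted {predicted_empty}, actual {actual_empty}")
```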
Jonathan
--
Use jlc at address, not spam.