Date:Mon May 23 10:26:36 1995 
Subject:Re: fastest processor for running Poplog? 
From:A . Sloman 
Volume-ID:950524.02 

Anthony and Jonathan, thanks for your comments.

Anthony wrote:

> Maybe we should agree a set of Poplog benchmarks and put them on an
> ftp server so people can apply them to lots of different machines.

I have now put my tests in the directory

	ftp://ftp.cs.bham.ac.uk/pub/dist/poplog/benchmarks

The README file explains what the other files are. The benchtar.Z
file is a compressed tar file containing all the other files, so you
can get them in one go.

The tests include some shell scripts for running the prolog tests
N times in parallel (N = 1 2 4 8 12 16) which is useful for seeing how
performance degrades on a machine under heavy multi-user load.
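
The parallel-load idea can be sketched roughly as follows. This is a
hypothetical Python equivalent, not the shell scripts in the benchmarks
directory: the name run_parallel and the stand-in command are assumptions
made for illustration, and the real Prolog test command would have to be
substituted for it.

```python
import subprocess
import sys
import time


def run_parallel(cmd, n):
    """Start n copies of cmd at once and wait for all of them.

    Returns (elapsed wall-clock seconds, list of exit codes).
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL) for _ in range(n)]
    codes = [p.wait() for p in procs]
    return time.perf_counter() - start, codes


if __name__ == "__main__":
    # Stand-in workload (an assumption): replace with the real benchmark command.
    cmd = [sys.executable, "-c", "sum(range(200000))"]
    for n in (1, 2, 4, 8, 12, 16):
        elapsed, codes = run_parallel(cmd, n)
        print(f"N={n:2d}  elapsed={elapsed:.2f}s  failures={sum(c != 0 for c in codes)}")
```

If elapsed time grows faster than N, the machine is degrading under load
rather than merely sharing the processor evenly.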


> The question is what shall we call them: POPmarks?

My tests are very simple. The floating point test is particularly
stupid, though it gives a very rough indication of relative performance
of machines.

> PS Aaron, I assume by SS/672 you mean a SparcServer 4/670 with two
> Cypress 102 CPUs as used in the SparcStation 2.

Yes, except that they are a bit faster than those in the SparcStation 2.

Steve Knight has now run my tests on his workstation, an HP 735 running
HP-UX. It comes out faster than all the other machines. He has added
his numbers to my table as column J:

                    A      B      C      D      E      F      G      H      I      J

                    DEC           HP     Sun*   Sun    Sun    Sun    Sun    Sun    HP
                    3100   MIPS   HP-UX  SS2    SS/    SS10   SS10   SS10   Ross   HP-UX
                           2000   M68040        672    /30    /41    /52    HS66   735
                                                                     Sol2.3 MHz
----------------------------------------------------------------------------------------
Prolog test KLips   56.1   68.9   107.8  117.0  134.8  203.3  238    295    335    387
(Simple reverse)
----------------------------------------------------------------------------------------
Pop-11 Factorial(1000)
three times (secs)
    recursive       0.78   0.80   0.55   0.81   0.75   0.55   0.46   0.42   0.38   0.11
    iterative       0.77   0.80   0.57   0.78   0.71   0.51   0.45   0.40   0.38   0.11
----------------------------------------------------------------------------------------
Floating point
    single (secs)   0.88   0.67   0.71   0.38   0.38   0.21   0.19   0.17   0.13   0.13
    double (secs)   1.05   0.85   0.60   0.43   0.41   0.25   0.22   0.18   0.16   0.17
----------------------------------------------------------------------------------------
* Replacing the SS2 processor with a Weitek chip gave
    177 KLips, Factorial: 0.5, 0.45 secs, Float: 0.23, 0.3 secs,
    i.e. not quite SS10/30.

He may be able to try out the tests on bigger HP machines later.

Jonathan wrote:

> Out of curiosity, I tried factorial(1000) in Franz
> Allegro CL/PC 1.0[1] on the Pentium 90 machine I am
> using at the moment. Computing it 3 times takes 0.22
> secs plus 0.17 secs garbage collection time (= 0.39
> secs) although if I run the loop 30 times it takes
> 1.92 plus 1.54 = 3.46, averaging a bit faster than the
> Sparc running at 66MHz (I assume Prof. Sloman's times
> included GC).

Yes. However, in Poplog with popmemlim set to 500000, GC times are
very much lower. E.g. on a SparcServer with dual 66 MHz Hypersparcs,
with several other people logged in:

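;;; Recursive factorial, used here as the benchmark workload
define procedure fac(n);
    lvars n;
    if n == 0 then 1
    else fac(n - 1)*n
    endif
enddefine;
;;; Set both the maximum and the minimum heap limit to the same value
500000 ->> popmemlim -> popminmemlim;
;;; Print a trace line for every garbage collection
true -> popgctrace;
;;; Force a garbage collection, then reset the timer
sysgarbage(),timediff() ->;
repeat 30 times
    fac(1000) ->;
endrepeat;
;;; Print the CPU time used since the previous call of timediff
timediff() =>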

Gives:
;;; GC-user(C) 0.12  MEM: l 69632 + u 117756 + f 326659 + s 1 = 514048
;;; GC-auto(C) 0.15  MEM: l 69632 + u 117958 + f 328505 + s 1 = 516096
;;; GC-auto(C) 0.17  MEM: l 69632 + u 117820 + f 328643 + s 1 = 516096
;;; GC-auto(C) 0.15  MEM: l 69632 + u 118017 + f 328446 + s 1 = 516096
;;; GC-auto(C) 0.15  MEM: l 69632 + u 117901 + f 328562 + s 1 = 516096
;;; GC-auto(C) 0.12  MEM: l 69632 + u 118065 + f 328398 + s 1 = 516096
;;; GC-auto(C) 0.15  MEM: l 69632 + u 117961 + f 328502 + s 1 = 516096
;;; GC-auto(C) 0.17  MEM: l 69632 + u 118113 + f 328350 + s 1 = 516096
;;; GC-auto(C) 0.13  MEM: l 69632 + u 118016 + f 328447 + s 1 = 516096
;;; GC-auto(C) 0.15  MEM: l 69632 + u 118160 + f 328303 + s 1 = 516096
;;; GC-auto(C) 0.15  MEM: l 69632 + u 118070 + f 330441 + s 1 = 518144
;;; GC-auto(C) 0.14  MEM: l 69632 + u 118210 + f 330301 + s 1 = 518144
** 5.85

I.e. GC time is 1.75 secs, and 1.75/5.85 = 30% of total time.

If I increase the memory allocation by a factor of 4:
2000000 ->> popmemlim -> popminmemlim;

I get (on the same machine) only 3 garbage collections:
;;; GC-user(C) 0.15  MEM: l 69632 + u 119803 + f 1811460 + s 1 = 2000896
;;; GC-auto(C) 0.13  MEM: l 69632 + u 119086 + f 1814225 + s 1 = 2002944
;;; GC-auto(C) 0.15  MEM: l 69632 + u 119136 + f 1814175 + s 1 = 2002944
** 5.1

I.e. GC time is 0.43 secs, and 0.43/5.1 = 8.4% of total time.

This shows how you have to control heap allocation when running
benchmarks.
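
The GC share quoted above can be rechecked mechanically from the trace
output. The following is a minimal Python sketch, not part of the benchmark
files: the names gc_seconds and gc_share are assumptions, and the pattern
assumes trace lines of exactly the form shown in this message. Like the
figures above, it counts the initial GC-user line along with the automatic
collections.

```python
import re

# Time field of a popgctrace line, e.g. ";;; GC-auto(C) 0.15  MEM: l 69632 + ..."
GC_LINE = re.compile(r";;; GC-(?:user|auto)\(\w+\)\s+(\d+(?:\.\d+)?)")


def gc_seconds(trace):
    """Sum the GC times reported in a block of trace output."""
    return sum(float(m.group(1)) for m in GC_LINE.finditer(trace))


def gc_share(trace, total_seconds):
    """Fraction of the total run time that was spent in the garbage collector."""
    return gc_seconds(trace) / total_seconds


# Trace from the run with popmemlim set to 2000000, copied from above.
LARGE_HEAP_TRACE = """\
;;; GC-user(C) 0.15  MEM: l 69632 + u 119803 + f 1811460 + s 1 = 2000896
;;; GC-auto(C) 0.13  MEM: l 69632 + u 119086 + f 1814225 + s 1 = 2002944
;;; GC-auto(C) 0.15  MEM: l 69632 + u 119136 + f 1814175 + s 1 = 2002944
"""

# Prints: GC: 0.43 secs, 8.4% of total
print(f"GC: {gc_seconds(LARGE_HEAP_TRACE):.2f} secs, "
      f"{100 * gc_share(LARGE_HEAP_TRACE, 5.1):.1f}% of total")
```

Feeding the twelve trace lines from the 500000 run and a total of 5.85
through the same functions gives the comparison between the two heap sizes.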

> I don't think the compiler technology is dramatically
> better, so these figures reflect processor speed.

I have reason to believe, from some tests Harry Barrow did a few years
ago, that the American Lisp systems (certainly Lucid Lisp) do a lot more
compile-time optimisation than Poplog does. (It's easier in Lisp,
because there's a parse tree, and also the open stack in Poplog means
that you can't do such clever things with registers.) But maybe Allegro
is not so heavily optimised. In Lucid, compilation took very much
longer than in Poplog Pop-11 on the same machine. My impression is
that the Poplog garbage collector is one of the fastest there is,
but for user code other compilers do much better optimisation.

(E.g. I don't know whether Poplog uses the new integer operations on the
latest Sparc systems. The original SPARC did not include integer
multiply and divide instructions. Can anyone at Sussex or ISL comment?)

Here's a more realistic comparison of processor speed, from a table
prepared by John DiMarco at Toronto, available via

    ftp://ftp.cdf.toronto.edu/pub/spectable

I've merely selected a subset of the values. Pentium is at the end.

System            CPU        ClkMHz  Cache      SPECint SPECfp  Info  Source
Name              (NUMx)Type ext/in  Ext+I/D    92      92      Date  Obtained
================= ========== ======= ========== ======= ======= ===== ==========

-- -- HP
HP 425t           68040      25      4/4          12.3    10.3  Jun93 DECinfo
HP 425e           68040      25      4/4          12.2     9.3  Jun93 DECinfo
HP 730            PA1.1      66      128/256      47.8    75.4  May92 c.s.sun.hw
HP 7[35]5         PA7100     99      256/256     109.1   167.9  Jan94 HP
HP 7[35]5/125     PA7150     125     256/256     136     201    Apr94 HP
HP 750            PA1.1      66      256/256      48.1    75.0  Oct92 c.arch
-- -- SGI (fastest machines only here)
SGI PowerChl,Onyx R8000      75      4M+16/16    108.7   310.6  Jun94 c.arch
SGI Indigo2       R4400      100/200 1M+16/16    119     131    Nov94 SGI PdcTbl
SGI PowerIndigo2  R8000      75      2M+16/16    107     265    Nov94 SGI PdcTbl
SGI IndySC        R4600      44/133  512+16/16   113.5    73.7  Feb95 SGI anno
SGI IndySC        R4400      44/175  1M+16/16    122.6   115.5  Feb95 SGI anno
-- -- Sun Sparc (inc Hypersparc)
Sun SS/IPC        FJMB86902  25      64           13.8    11.1  Nov92 Sunflash
Sun SS/IPX        FJMB86903  40      64           21.8    21.5  Nov92 Sunflash
Sun SS2           RT601      40      64           21.8    22.8  Oct92 c.arch
Sun SS2/PowerUp   WeitekPwUP 40/80   16/8         32.2    31.1  Jun93 c.s.sun.an
Sun SS10/20       SuprSP     33      20/16        39.8    46.6  Nov92 Sunflash
Sun SS10/30       SuprSP     36      20/16        45.2    54.0  Apr93 Cockcroft
Sun SS10/40       SuprSP     40      20/16        50.2    60.2  Apr93 Sunflash
Sun SS10/41       SuprSP     40/40.3 1M+20/16     53.2    67.8  Apr93 Cockcroft
Sun SS10/51       SuprSP     40/50   1M+20/16     65.2    83.0  Apr93 Sunflash
Sun Classic,LX    MicroSP    50      4/2          26.4    21.0  Nov92 Sunflash
Sun Voyager       MicroSP2   60      16/8         43.2    36.2  Mar94 Sun
Sun SS4/70        MicroSP2   70      16/8         59.6    46.8  Jan95 Sunflash
Sun SS4/85        MicroSP2   85      16/8         65.3    53.1  May95 SunIntro
Sun SS5/70        MicroSP2   70      16/8         57.0    47.3  Mar94 Sunflash
Sun SS5/85        MicroSP2   85      16/8         65.3    53.1  May95 SunIntro
Sun SS5/110       MicroSP2   110     16/8         78.6    65.3  May95 SunIntro
Sun SS20/50       SuprSP     50      20/16        76.9    80.1  May95 SunIntro
Sun SS20/51       SuprSP     40/50   1M+20/16     81.8    89.0  May95 SunIntro
Sun SS20/61       SuprSP     50/60   1M+20/16     98.2   107.2  May95 SunIntro
Sun SS20/71       SuprSP2    50/75   1M+20/16    125.8   121.2  Jan95 SunIntro
Sun SS20/612      2xSuprSP   50/60   1M+20/16      ?     127.1  Sep94 SPEC newsl
Sun SS20/HS11     HyperSP    50/100  256+8/0     104.5   127.6  Nov94 SunIntro
Sun SS20/HS21     HyperSP    50/125  256+8/0     131.2   153.0  May95 SunIntro
-- -- Ross hypersparc
RT 100S-66        HyperSP    40/66   256+8/0      67      87    Aug94 Ross
RT 100S-72        HyperSP    40/72   256+8/0      75      96    Aug94 Ross
RT 200S-66        HyperSP    50/66   256+8/0      72      94    Aug94 Ross
RT 200S-72        HyperSP    50/72   256+8/0      80     105    Aug94 Ross
RT 200S-90        HyperSP    50/90   256+8/0     101     120    Apr95 Ross
RT 200S-110       HyperSP    50/110  256+8/0     122     142    Apr95 Ross
-- -- Intel pentium
Intel Xpress      Pentium    60/90   512+8/8     106.5    81.4  Mar95 www.intel
Intel Xpress      Pentium    60/90   1M+8/8      110.1    84.4  Mar95 www.intel

> It
> really is a pity that, as recent postings reveal, there
> is no pop11 available at the moment for Windows users.

I understand that NT Poplog should be usable with Windows, in 32-bit
compatibility mode. But I don't know whether anyone has tried it.

Aaron