lucb AT telus.net writes:
> Date: Wed, 1 Jan 2003 07:50:26 +0000 (UTC)
>
> ... the problem (and unfortunately lost most
> transcripts). But the following reliably gives me a core dump:
>
> : newarray([-5 0 10 15])->b;
> : b(4)=>
> ;;; MISHAP - INVALID ARRAY SUBSCRIPT
> ;;; INVOLVING: 4
> Segmentation fault
>.....
> Pentium III running Mandrake 9.0 Linux workstation.
> Poplog: linux.poplog.V15.53 without motif.
>
> I assume this is not something that happens on other combinations of
> hardware and OS.
I can confirm that this also happens with RedHat 7.3 and RedHat 8.0 on a PC
with either a Pentium 4 or an AMD Athlon, and also with PC poplog running on
Windows.
On RedHat 8.0:
Sussex Poplog (Version 15.53 Mon Aug 21 17:36:46 BST 2000)
Copyright (c) 1982-1999 University of Sussex. All rights reserved.
Setpop
: 1 -> popsyscall; ;;; ensure full error messages
:
: vars b = newarray([-1 5 -1 5]);
: b(6) =>
;;; MISHAP - INVALID ARRAY SUBSCRIPT
;;; INVOLVING: 6
;;; DOING : sys_exception_final sys_exception_handler
;;; Segmentation fault (core dumped)
I checked using Poplog version 15 running on a PC with Windows 2000
and also got an access violation error. It opened a new window in which
the error printing went on forever, until I killed the process.
However the problem does not occur on either sparc+solaris or on an
alpha running digital Unix. In those cases you get a mishap message and
poplog continues running, as expected.
On linux + PC the segmentation fault arises with b(6,6) and
b(6, 4) and also b("cat"), so it is not a stack problem arising out of a
missing argument, but has something to do with what happens when the
index is discovered to be out of bounds or of the wrong type.
Because it is common to both windows poplog and linux PC poplog I assume
the problem is in the assembler file that defines the array checking
code.
The error handler which prints the above message is invoked by this
procedure
    define Array$-Sub_error(item);
defined in $popsrc/errors.p
That procedure is in turn invoked by the low level machine code procedure
array_sub_error, defined in the following assembler file, which seems to be
the only file involved here that is specific to pc+linux (or pc+windows):
    $popsrc/aarith.s
That file defines the machine code procedure _array_sub, which computes
the offset into the array vector. It gets the array indexes off the
stack one at a time, testing them to ensure that they are integers and
in range.
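
To make the description above concrete, here is a rough model, in Python, of what a subscript routine of this kind does. It is only a sketch and not a translation of the code in aarith.s: the names array_offset and InvalidArraySubscript are invented for illustration, the storage order (first subscript varying fastest) is an assumption, and the indexes are passed as a list rather than taken off the stack one at a time.

```python
# Illustrative model only, NOT the code in $popsrc/aarith.s.
# array_offset and InvalidArraySubscript are hypothetical names, and the
# storage order (first subscript varies fastest) is an assumption.


class InvalidArraySubscript(Exception):
    """Stands in for the 'INVALID ARRAY SUBSCRIPT' mishap."""


def array_offset(bounds, indexes):
    """Return the offset into the flat array vector for the given indexes.

    bounds is a flat list [lo1, hi1, lo2, hi2, ...], in the same form as
    the argument given to newarray in the transcripts.
    """
    if len(bounds) % 2 != 0:
        raise ValueError("bounds must contain lo/hi pairs")
    if len(indexes) != len(bounds) // 2:
        # Simplification: the real routine pops indexes from the stack,
        # so it has no explicit argument count to compare against.
        raise ValueError("wrong number of indexes")

    offset = 0
    stride = 1
    for dim, index in enumerate(indexes):
        lo, hi = bounds[2 * dim], bounds[2 * dim + 1]
        # Type check first: the index must be a (non-boolean) integer.
        if isinstance(index, bool) or not isinstance(index, int):
            raise InvalidArraySubscript(index)
        # Range check against this dimension's bounds.
        if index < lo or index > hi:
            raise InvalidArraySubscript(index)
        offset += (index - lo) * stride
        stride *= hi - lo + 1
    return offset
```

In this model, array_offset([-1, 5, -1, 5], [6, 4]) and array_offset([-1, 5, -1, 5], ["cat", 0]) both raise InvalidArraySubscript carrying the offending item, which is the point at which the real routine hands over to array_sub_error.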
It looks as if the error test works: _array_sub then calls the routine
array_sub_error, which succeeds in calling Sub_error, which invokes
pop11's generic error handler, which starts printing out the error
message and fails halfway through.
The corresponding $popsrc/aarith.s for windows poplog was apparently
generated from the linux version then edited by hand, according to a
comment inserted by Robert Duncan.
I suspect something is wrong with the machine instructions in both files
and I wonder if someone familiar with the PC architecture can tell what
is wrong either from the linux version or the windows version. They are
accessible here if you don't have local versions:
http://www.cs.bham.ac.uk/research/poplog/src/master/S.pcwnt/src/aarith.s
    (PC + windows)
http://www.cs.bham.ac.uk/research/poplog/src/master/S.pcunix/src/aarith.s
    (PC + linux)
For comparison here are two versions that work OK:
http://www.cs.bham.ac.uk/research/poplog/src/master/S.sun4r5/src/aarith.s
    (Sparc + solaris)
http://www.cs.bham.ac.uk/research/poplog/src/master/S.axposf/src/aarith.s
    (Alpha + unix)
(If anyone has a version of poplog running under solaris on a PC, I
expect it will have the same problem.)
I decided to see whether the problem was caused by something happening
after the low level array subscript check had finished.
Printing out the calling stack (the DOING list) is done by
    sys_pr_message(count, message, idstring, severity);
defined in $popsrc/errors.p
It is invoked by sys_raise_exception via the user-definable
pop_exception_handler, which defaults to sys_exception_handler.
So I tried redefining pop_exception_handler to simply print out a
message, or do nothing. But poplog still crashed.
The same happens if I define pop_pr_exception to simply print out a
message or do nothing:
    define pop_pr_exception(count, message, idstring, severity);
        message =>
    enddefine;
Sometimes I find that it prints the message and I can then run pop11 a
bit more before it crashes. Sometimes pop11 goes into a loop and has to
be killed using kill -9.
It looks to me as if something gets corrupted before the error handler
starts printing the message, i.e. before pop_pr_exception is
invoked, but the corruption allows the process to continue for a while
before it manifests itself.
How it manifests itself seems to vary.
That is not uncommon with heap corruption, which can go undetected until
either a garbage collection occurs or something else goes wrong.
In this case I don't know if the problem is heap corruption, or
corruption of the pop11 control stack, or something else.
I suspect the bug is in $popsrc/aarith.s for PC+linux and also in
the version for PC+windows.
Sorting this out will require help from someone who is familiar with the
intel machine instruction set.
NOTE: the code in the .s files is not pure assembler for the system.
The $popsrc/*.s files include directives for the poplog assembler, which
generates the actual assembler files.
See
    http://www.cs.bham.ac.uk/research/poplog/sysdoc/
Aaron
====
Aaron Sloman, ( http://www.cs.bham.ac.uk/~axs/ )
School of Computer Science, The University of Birmingham, B15 2TT, UK
EMAIL A.Sloman AT cs.bham.ac.uk (ReadATas@please !)
PAPERS: http://www.cs.bham.ac.uk/research/cogaff/ (And free book on Philosophy of AI)
FREE TOOLS: http://www.cs.bham.ac.uk/research/poplog/freepoplog.html