The two methods described above deal with the problem of building a vision system to recognize familiar objects within a scene. This is only one of the tasks that the visual sub-system of our MTG would have to deal with. An equally important task is generating an internal representation of the shapes and positions of visible objects, including unspecified objects that may be obstacles in the path of the MTG, thereby enabling the MTG to plan and execute a path to the Underground entrance. For instance, how could the MTG discover the structure of an unfamiliar building for which a stored model could only be loosely specified (e.g., "buildings have a floor and walls which usually meet at right angles")?
One valuable source of information is binocular vision, in which a pair of images of the same scene is taken from slightly differing viewpoints. If the projection of a point in space can be identified in each image, by, say, matching edge elements derived from a fragment of texture on a visible surface, then the actual 3-D location of that point can be determined by simple triangulation: its depth follows from the separation of the two viewpoints and the difference (the disparity) between the point's positions in the two images. Because this technique needs no stored model of the objects concerned, it works for unfamiliar objects and clearly provides a useful starting point from which to map out the surrounding world.
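The triangulation step can be sketched for the simplest arrangement, a rectified pair: two identical cameras with parallel optical axes separated by a horizontal baseline, so that a matched point lies on the same image row in both views. This is a minimal illustration under those assumptions, not a description of any particular system; the names `StereoRig` and `triangulate` are hypothetical, and the matching of edge elements is taken as already done. Depth follows from Z = fB/d, where f is the focal length in pixels, B the baseline and d the disparity x_left - x_right.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StereoRig:
    """Hypothetical rectified stereo pair: parallel optical axes, equal focal lengths."""
    focal_length_px: float  # focal length expressed in pixels
    baseline_m: float       # distance between the two camera centres, in metres


def triangulate(rig: StereoRig, x_left: float, x_right: float, y: float) -> tuple[float, float, float]:
    """Return the (X, Y, Z) position of a matched point in the left camera's frame.

    x_left, x_right and y are image coordinates in pixels, measured from each
    image's principal point; in a rectified pair the match lies on the same row y.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # Zero or negative disparity cannot come from a point in front of both cameras.
        raise ValueError("point must have positive disparity")
    z = rig.focal_length_px * rig.baseline_m / disparity  # Z = f * B / d
    x = x_left * z / rig.focal_length_px                  # back-project through the left camera
    y_world = y * z / rig.focal_length_px
    return x, y_world, z


rig = StereoRig(focal_length_px=800.0, baseline_m=0.25)
print(triangulate(rig, 40.0, 20.0, -16.0))  # disparity of 20 px gives a depth of 10 m
```

Because depth is inversely proportional to disparity, distant points, whose disparities are small, are located less accurately for a given error in matching.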
Another valuable source of information about the surrounding world is the shading across visible surfaces caused by variations in surface orientation. Under certain conditions (for example, when the direction of illumination and the reflective properties of the surface are known), measurements of these variations can be turned into precise descriptions of surface orientation (see Horn, 1975, for more details).
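To show why shading carries information about orientation, the sketch below assumes a matte (Lambertian) surface of known, uniform albedo lit by a single distant source of known direction; these are illustrative assumptions rather than the full set of conditions treated by Horn (1975). Using gradient-space notation, a surface z = f(x, y) with slopes p = dz/dx and q = dz/dy has a normal proportional to (-p, -q, 1), and its predicted brightness is the albedo times the cosine of the angle between that normal and the light direction. The function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / length for c in v)


def lambertian_intensity(normal, light_dir, albedo=1.0):
    """Brightness of a matte surface: albedo * cos(angle between normal and light)."""
    n = normalize(normal)
    s = normalize(light_dir)
    cos_incidence = sum(a * b for a, b in zip(n, s))
    return albedo * max(0.0, cos_incidence)  # surfaces facing away from the light are dark


def reflectance_map(p, q, light_dir, albedo=1.0):
    """Predicted brightness for a surface patch with gradient (p, q)."""
    # Assumed convention: the normal of z = f(x, y) is proportional to (-p, -q, 1).
    return lambertian_intensity((-p, -q, 1.0), light_dir, albedo)


overhead = (0.0, 0.0, 1.0)
print(reflectance_map(0.0, 0.0, overhead))  # patch facing the light: brightest
print(reflectance_map(1.0, 0.0, overhead))  # tilted 45 degrees about the y-axis
print(reflectance_map(0.0, 1.0, overhead))  # tilted 45 degrees about the x-axis
```

The last two patches have different orientations but the same brightness under overhead lighting: a single intensity measurement confines the orientation to a family of possibilities rather than fixing it, which is why recovering shape from shading needs further constraints, such as the assumption that the surface is smooth.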
A time sequence of images depicting one or more objects moving with respect to the camera provides yet another means of discovering something about the disposition of object surfaces. The situation here is similar to that for binocular vision, since objects are viewed from more than one direction. It is therefore hardly surprising that methods developed for binocular vision can often be adapted to deal with sequences of images, and vice versa.
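The analogy with binocular vision can be made concrete for a special case, assumed here purely for illustration: a static scene viewed by a camera that translates sideways (in the +x direction) by a known distance between two frames, without rotating. The two frames then behave like a rectified stereo pair whose baseline is the camera's displacement, so each tracked point's image displacement plays the role of disparity. The function name is hypothetical, and tracking the points from frame to frame is taken as already solved.

```python
import math


def depths_from_lateral_motion(x_frame1, x_frame2, focal_length_px, camera_shift_m):
    """Estimate the depth of tracked points from two frames of a sideways-moving camera.

    x_frame1[i] and x_frame2[i] are the horizontal image positions (pixels, measured
    from the principal point) of point i in the first and second frame. Assumes pure
    translation of camera_shift_m metres along +x, no rotation and a static scene.
    """
    if len(x_frame1) != len(x_frame2):
        raise ValueError("each tracked point needs a position in both frames")
    depths = []
    for x1, x2 in zip(x_frame1, x_frame2):
        image_motion = x1 - x2  # plays the role of stereo disparity, in pixels
        if image_motion < 0:
            raise ValueError("image motion is inconsistent with the assumed camera shift")
        if image_motion == 0:
            depths.append(math.inf)  # no apparent motion: point effectively at infinity
        else:
            depths.append(focal_length_px * camera_shift_m / image_motion)  # Z = f * b / d
    return depths


print(depths_from_lateral_motion([120.0, 30.0, 7.0], [100.0, 20.0, 7.0], 400.0, 0.5))
```

Points that do not move between frames are reported as infinitely distant. General camera motion, including rotation, requires estimating the motion as well as the depths, which is a harder problem than this stereo-like special case.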
More information on binocular vision, shading analysis, and the use of image sequences can be found in Marr (1982), Frisby (1979), and Bruce (1985).