Recovery of 3D Structure

The goal of object perception is to treat different 2D views of a 3D object similarly. One intuitive way to do this is to recover the 3D structure of the object from the 2D image. If the visual system can construct an accurate 3D model of the to-be-recognized object, the problem of object constancy is automatically solved, because such a model would constitute a viewpoint-invariant representation. The same model would be constructed for a given object regardless of viewing conditions, so that all retinal images of that object would result in an identical representation.

David Marr, whose theoretical work has driven much of the research in object perception, felt that the derivation of a precise, viewpoint-invariant 3D model is a fundamental step in the achievement of object constancy, and most vision scientists have agreed with this ideal. But for most objects, it is not feasible for the visual system to construct a complete 3D model from any one view of the object. For example, the structure of the back of a house is unknowable when the house is viewed from the front. Similarly, it is impossible to know whether or not the SOGI has a fourth "arm" from the view in Fig. 1A (Fig. 1I indicates that the fourth arm does in fact exist).

A more realistic goal is to represent the 3D structure of the portion of the object that can be seen from a given viewpoint. In such a system, small shifts in viewpoint (such as that from Figs. 1A to 1H) would be likely to lead to the same structural description and thus would cause no trouble for the visual system. Large viewpoint shifts may result in a completely different structural description, and when these shifts occur, object constancy is expected to be violated (i.e., object perception should be error-prone and/or slow under such conditions).
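The dependence of a partial structural description on viewpoint can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not drawn from any particular theory: the object is a cube, its "structural description" is simply the set of faces that can be seen, and the names `CUBE_FACES`, `view_direction`, and `visible_description` are hypothetical.

```python
import math

# Hypothetical toy object: a cube described by its six outward face normals.
CUBE_FACES = {
    "front": (0, 0, 1), "back": (0, 0, -1),
    "left": (-1, 0, 0), "right": (1, 0, 0),
    "top": (0, 1, 0), "bottom": (0, -1, 0),
}

def view_direction(azimuth_deg, elevation_deg=20.0):
    """Unit vector pointing from the object toward the viewer."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az), math.sin(el), math.cos(el) * math.cos(az))

def visible_description(azimuth_deg, faces=CUBE_FACES, threshold=0.1):
    """Set of faces that can be seen from the given viewpoint.

    A face counts as visible when its normal points toward the viewer.
    The threshold (an arbitrary assumption) ignores faces seen almost edge-on.
    """
    v = view_direction(azimuth_deg)
    return frozenset(
        name for name, normal in faces.items()
        if sum(a * b for a, b in zip(normal, v)) > threshold
    )

small_shift_same = visible_description(30) == visible_description(40)
large_shift_same = visible_description(30) == visible_description(150)
```

Under these assumptions, a 10-degree shift in azimuth leaves the description unchanged (`small_shift_same` is true), whereas a 120-degree shift replaces the front face with the back face and so yields a different description (`large_shift_same` is false).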

Despite the elegance of the structure-recovery approach, it is important to understand that limited object constancy is theoretically attainable without referring to a 3D structural description. A representation of the 2D image of an object (referred to as a view-based representation) can serve as the basis for object perception as long as perceived views can be accurately matched to encoded views. This condition is crucial: the success of view-based models depends on the ability to specify processes that select the proper encoded representation when presented with a novel view of an object. Thus, whereas the focus of structural description theorists has been on how to construct an informationally rich representation of an object, the focus of view-based theorists has been on specifying computational procedures that achieve object constancy using less sophisticated representations but more complex matching processes.
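The matching step on which view-based accounts depend can be sketched in a few lines. The sketch is a deliberately simplified stand-in, not a model proposed in the literature: each stored view is a flattened list of projected 2D point coordinates, the points are assumed to be in known correspondence, and a novel view is assigned to the stored view at the smallest Euclidean distance (plain nearest-neighbor matching). The objects `bracket` and `zigzag` and all function names are hypothetical.

```python
import math

def project(points, azimuth_deg):
    """Orthographic 2D image of 3D points after rotation about the vertical axis."""
    a = math.radians(azimuth_deg)
    image = []
    for x, y, z in points:
        image.extend((x * math.cos(a) + z * math.sin(a), y))
    return image

def distance(view_a, view_b):
    """Euclidean distance between two flattened 2D views."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(view_a, view_b)))

# Hypothetical objects, each a short chain of 3D points.
OBJECTS = {
    "bracket": [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)],
    "zigzag": [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 2, 0)],
}

def encode_views(objects, azimuths=(0, 90)):
    """Store a small number of 2D views per object (the 'encoded views')."""
    return [
        (name, az, project(points, az))
        for name, points in objects.items()
        for az in azimuths
    ]

def match(novel_view, stored_views):
    """Return (object name, stored azimuth, distance) of the closest stored view."""
    name, az, view = min(stored_views, key=lambda s: distance(novel_view, s[2]))
    return name, az, distance(novel_view, view)

stored = encode_views(OBJECTS)
near = match(project(OBJECTS["bracket"], 10), stored)  # small shift from a stored view
far = match(project(OBJECTS["bracket"], 40), stored)   # larger shift from a stored view
```

In this toy setting both novel views are matched to the stored 0-degree view of the correct object, but the distance to that view is larger for the 40-degree view than for the 10-degree view. The representation itself is simple; the work is done by the matching process, and its reliability falls as the novel view departs from the encoded ones.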

