Vision and Perception

Sight, although not essential, appears to be an important component of human intelligence. AI vision systems have many potential applications, from automated security camera systems to the automatic detection of pathological features in medical images. Because the optic tract is both accessible and relatively well characterized functionally, mammalian visual systems offer some of the best opportunities for basing AI systems on neuroanatomical data rather than on cognitive-level abstractions. Figure 2 is one example of an attempt to build an AI vision system with reference to neuroanatomy.

1. Pattern Recognition

Pattern recognition usually refers to attempts to analyze two-dimensional images and to recognize (i.e., label) prespecified subareas of interest within them. Mathematics and statistics feature strongly in this subarea, providing algorithms for noise reduction, smoothing, and segmentation. Processing images at the pixel level leads naturally to the classic bottom-up approach to pattern recognition. In this strategy, the idea is to associate areas of similar texture or intensity with features of interest, and to associate discontinuities with boundaries that might be developed to circumscribe those features. Once the image has been segmented, the segments may be grouped into recognizable objects, or individual segments may themselves turn out to be the expected objects.
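The bottom-up strategy described above can be sketched as an intensity threshold followed by the grouping of connected above-threshold pixels into segments. The grid, the threshold value, and the choice of 4-connectivity below are illustrative assumptions, not a prescribed method:

```python
from collections import deque

def segment(image, threshold):
    """Label 4-connected regions of above-threshold pixels (bottom-up segmentation).

    Returns a grid of segment labels (0 = background) and the segment count.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:  # flood-fill the connected region
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] >= threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

# Two bright blobs separated by a dark background:
img = [
    [9, 9, 0, 0, 0],
    [9, 9, 0, 0, 8],
    [0, 0, 0, 8, 8],
]
labels, count = segment(img, threshold=5)
# count == 2
```

Real systems would precede this step with noise reduction and smoothing, and would use richer cues (texture, gradients) than raw intensity.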

This final process of labeling image segments may be classically programmed, or it may be learned from a set of examples through an inductive technology. The latter approach has proved valuable when it is difficult to specify exactly what is to be labeled (e.g., a tumor in a radiograph) but many examples are available with which to train a system.
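A minimal sketch of such inductive labeling, assuming hypothetical (area, mean-intensity) features for each segment and a nearest-centroid learner standing in for whatever inductive technology is actually used:

```python
def train_centroids(examples):
    """Learn one centroid per class label from labeled feature vectors."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def classify(centroids, features):
    """Assign the label whose centroid is nearest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: dist2(centroids[label]))

# Hypothetical (area, mean-intensity) features for training segments:
examples = [([40, 0.9], "tumor"), ([35, 0.8], "tumor"),
            ([5, 0.2], "artifact"), ([8, 0.3], "artifact")]
centroids = train_centroids(examples)
# classify(centroids, [30, 0.7]) -> "tumor"
```

The point is only the shape of the workflow: labels are never specified by explicit rules; they are induced from the examples.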

It has been argued, however, that an intelligent agent generally sees what it expects to see; this is the top-down approach to vision. In this case, an image is scanned with an expectation of what the system wants to find in the image. The image need only be processed where and to whatever depth necessary to confirm or reject the expectation. In the bottom-up approach, it is anticipated that the features will emerge from the image, and in the top-down approach it is hoped that knowing what one is looking for will facilitate quick and accurate recognition or rejection. Of course, it is possible to combine bottom-up and top-down processing.
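The top-down strategy might be sketched as a search for an expected template, processing the image only as far as is necessary to confirm or reject the expectation. The exact-match tolerance and the tiny images here are illustrative assumptions:

```python
def confirm_expectation(image, template, tolerance=0):
    """Top-down search: scan only until the expected pattern is confirmed.

    Returns the (row, col) where the expectation is confirmed, or None
    if it is rejected everywhere. Scanning stops at the first match, so
    the rest of the image is never processed.
    """
    th, tw = len(template), len(template[0])
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            mismatch = sum(
                abs(image[r + i][c + j] - template[i][j])
                for i in range(th) for j in range(tw))
            if mismatch <= tolerance:
                return (r, c)  # expectation confirmed
    return None  # expectation rejected

image = [[0, 0, 0],
         [0, 9, 9],
         [0, 9, 9]]
corner = [[9, 9],
          [9, 9]]
hit = confirm_expectation(image, corner)
# hit == (1, 1)
```

A combined bottom-up/top-down system might use bottom-up segmentation to propose candidate regions and a top-down check like this to confirm or reject them.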

2. Image Understanding

One point of distinction in AI vision systems is that of finding and labeling objects in an image (from a stored list of possibilities) as opposed to the development of an understanding of an image. The latter, somewhat ill-defined goal seems to be what an AI vision system must achieve, and it probably involves some object labeling as a constituent process. Image understanding is the placement of an image in a global context. A pattern recognition system, for example, might find and label a tree, sky, clouds, and grass in an image, whereas an image understanding system might declare that the image is a countryside scene in some temperate climatic zone, perhaps land used for grazing cows or sheep (an implication of the short-cropped grass).

Sophisticated image understanding has not been achieved, except in some very restricted subdomains (e.g., automatic monitoring of the configuration of in-use and free gates at a specific airport terminal). The inductive technology of neural computing has, for example, been used to train systems to assess the crowdedness of specific underground station platforms.

In the latter case, a neural net system can be trained to generate answers along only one dimension: it can assess crowdedness but no other characteristic of the people present, such as their category (e.g., business traveler or tourist). Other neural nets might be trained to assess other desired features, but this implies that the important features must be decided in advance of system construction and that a general understanding will emerge only from a set of trained networks. This approach to image understanding implies that a classically programmed harness system will be needed to collect and combine the outputs of the individual neural networks.
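Such a harness might be sketched as follows. Both "networks" here are hypothetical stand-in functions, each producing a single scalar assessment; the harness is the classically programmed collector that combines their outputs:

```python
def crowdedness_net(image):
    # Stand-in for a net trained to estimate crowdedness (0..1)
    # from pixel intensities in the range 0..9 (hypothetical).
    flat = [p for row in image for p in row]
    return sum(flat) / (9 * len(flat))

def peak_density_net(image):
    # Stand-in for a second net trained on a different single feature.
    return max(max(row) for row in image) / 9

def harness(image, networks):
    """Run each single-feature network and combine outputs into one report."""
    return {name: net(image) for name, net in networks.items()}

platform = [[9, 9, 0],
            [9, 0, 0],
            [0, 0, 0]]
report = harness(platform, {"crowdedness": crowdedness_net,
                            "peak_density": peak_density_net})
```

Each network answers one prespecified question; any "general understanding" lives only in how the harness interprets the combined report.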

Image understanding, however it is achieved, is expected to require substantial knowledge of the world within which the image is to be interpreted. In the previous example of a rural scene, the only basis for drawing the implication about cows and sheep must be knowledge of why grass is cropped in the countryside (as opposed to a lawn, for which the grazing implication is likely to be wrong) and which grazing animals are likely to be responsible in a temperate climate.
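The role of world knowledge can be sketched as a toy rule base that forward-chains from recognized labels to scene-level interpretations. The labels and rules below are illustrative assumptions, not a real ontology:

```python
# Each rule maps a set of established facts (object labels or earlier
# conclusions) to a higher-level interpretation. Labels such as
# "grass:short-cropped" are hypothetical.
RULES = [
    ({"tree", "grass", "sky"}, "countryside scene"),
    ({"countryside scene", "grass:short-cropped"},
     "likely grazing land (cows or sheep)"),
]

def interpret(facts):
    """Forward-chain over the rules until no new conclusion can be added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

scene = interpret({"tree", "grass", "sky", "clouds", "grass:short-cropped"})
# "likely grazing land (cows or sheep)" in scene
```

Note that the grazing conclusion depends on the "countryside scene" conclusion; with a "lawn" label and different rules, the same short-cropped grass would support no such implication.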

Thus, intelligent image understanding will require image processing, pattern recognition, and a knowledge base together with an appropriate control strategy, and all must be smoothly integrated. Like so much else in AI, image understanding beyond very limited and heavily constrained situations is currently beyond the state of the art.
