Neural networks are computational structures composed of many simple but highly interconnected processing units, or nodes. Each interconnection, or link, has an associated weight, a number that controls the strength of the link; for example, with the weight acting as a simple multiplier, positive weights larger than 1.0 will increase the activity value passed from one node to the next. There are many varieties of neural network in use, and most are trained in order to generate the desired system. Training consists of adapting a set of link weights (initially set to random numbers) so that the network reproduces a training set of input-output pairs, that is, a set of examples of the behavior that we wish to simulate. The training procedure is an automatic algorithmic process. The training cycle typically consists of inputting a training pattern, executing the network, calculating the error between what the network computes and the correct result for this particular input, and adjusting the link weights to reduce this error. It is not usually possible (and not desirable) for a given network to reproduce the training set exactly. Training is thus repeated until a measure of the error between network performance and training set is acceptably small. The training algorithms usually guarantee that this error will not increase with further training, but there is no guarantee that it will always be reducible to the desired value.
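The training cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is reduced to a single linear unit, the weight adjustment is the simple delta rule, and the function name `train`, its parameters, and the four-example training set are all hypothetical. Unlike most real training sets, this toy set can be reproduced almost exactly.

```python
import random


def train(training_set, n_inputs, rate=0.1, tolerance=1e-6,
          max_epochs=10_000, seed=0):
    """Train one linear unit with the delta rule (illustrative only)."""
    rng = random.Random(seed)
    # Link weights start out as small random numbers.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    error = float("inf")
    for _ in range(max_epochs):
        squared_error = 0.0
        for inputs, target in training_set:
            # Execute the network: a weighted sum of the inputs.
            output = sum(w * x for w, x in zip(weights, inputs))
            # Error between the computed and the correct result.
            diff = target - output
            squared_error += diff * diff
            # Adjust each link weight to reduce this error.
            weights = [w + rate * diff * x for w, x in zip(weights, inputs)]
        # Mean squared error over one pass through the training set.
        error = squared_error / len(training_set)
        if error <= tolerance:
            break  # acceptably small: stop repeating the cycle
    return weights, error


# Hypothetical training set generated by the rule: target = 2*x1 - x2.
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
            ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
weights, error = train(examples, n_inputs=2)
print(weights, error)
```

The `max_epochs` limit reflects the point made above: the error is not guaranteed to reach the desired value, so the loop needs a second way to stop.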
This process of AI system development (random initialization and repetitive training) is clearly different from that of classical programming, and it does not always succeed. At best, training only promises a local minimum error, which means that training can become stuck, with no further improvement from successive repetitions of the training cycle, while the network remains a poor approximation to the training data.
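A toy example may make the idea of a local minimum concrete. The sketch below assumes a network with a single weight `w` and an invented error curve with two dips; neither the curve nor the function names come from any real system. Plain gradient descent simply follows the slope downhill, so the starting value (standing in for the random initialization) decides which dip it ends in.

```python
def error(w):
    """Invented error curve: shallow dip near w = 1.13, deeper dip near w = -1.30."""
    return w**4 - 3 * w**2 + w


def slope(w):
    """Derivative of the error curve with respect to the weight."""
    return 4 * w**3 - 6 * w + 1


def descend(w, rate=0.01, steps=1000):
    """Repeatedly adjust the weight a small step against the slope."""
    for _ in range(steps):
        w -= rate * slope(w)
    return w


stuck = descend(2.0)    # starts on the right: settles in the shallow dip
better = descend(-2.0)  # starts on the left: settles in the deeper dip
print(stuck, error(stuck))
print(better, error(better))
```

Once the run that started at 2.0 has settled, further steps bring no improvement, even though a lower error exists elsewhere on the curve.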
Figure 1 presents a neural network known as a multilayer perceptron. It is a system designed to predict risk of the bone degradation disease osteoporosis. It can be used to prioritize patients for screening in order to improve early detection of the disease. The inputs are 15 risk factors (measured from a given patient) and the single output is a measure of how likely the patient is to develop osteoporosis, with 1.0 the highest likelihood and 0.0 the lowest. Approximately 500 patients both with and without osteoporosis were used to train the network, and it proved to be approximately 72% correct on a set of 200 previously unseen cases.
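The shape of such a network, 15 inputs feeding a hidden layer that feeds one output between 0.0 and 1.0, can be sketched as follows. Only the input and output counts come from the description above. The hidden-layer size of 4, the sigmoid activation, the bias terms, and the random (untrained) weights are assumptions made for illustration, so the number printed is not a real risk estimate.

```python
import math
import random


def sigmoid(z):
    """Squash any real number into the range 0.0 to 1.0."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias, and applies the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def make_network(n_inputs=15, n_hidden=4, seed=0):
    """Build untrained weights; n_hidden=4 is an arbitrary illustrative choice."""
    rng = random.Random(seed)
    hidden_w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    hidden_b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    output_w = [[rng.uniform(-1, 1) for _ in range(n_hidden)]]
    output_b = [rng.uniform(-1, 1)]
    return hidden_w, hidden_b, output_w, output_b


def predict_risk(network, risk_factors):
    """Forward pass: every input reaches every hidden node, then one output node."""
    hidden_w, hidden_b, output_w, output_b = network
    hidden = layer(risk_factors, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)[0]


network = make_network()
patient = [0.5] * 15  # hypothetical risk factors, scaled to 0..1
print(predict_risk(network, patient))
```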
The network structure and mechanisms illustrated in Fig. 1 bear little more than a tenuous analogical relationship to actual brain structures. Other uses of neural computing have accepted more realistic constraints from neuroanatomy. Figure 2 presents a simplified version of a system that was designed to recognize shape from shading (i.e., to reproduce the human ability to perceive a three-dimensional shape from a two-dimensional pattern of light and dark). The neural net system used was built from hexagonal arrays of formal neurons that were configured to mimic the various known receptor types in the optic tract (e.g., circular receptive fields).
Although AI systems constructed with neural computing methods are necessarily approximations, they offer the advantage of not requiring a prior problem specification. For many AI subareas the details of how the input information is transformed into an "intelligent" output are incompletely understood. In fact, the essence of the problem is to develop such an understanding. A data-driven technology, such as neural computing, that enables the construction of a system from a set of examples is thus particularly useful.
Neural computing is a powerful technology but it tends to suffer from lack of transparency: Trained neural networks are resistant to easy interpretation at the cognitive level. In the trained neural network illustrated in Fig. 1, for example, all the individual risk factors are totally and equally (except for the individual link weightings) intermingled in the computation at the hidden layer (all input nodes have a connection to every hidden-layer node). Therefore, how is each risk factor used in the computation of disease risk? It is impossible to determine from the neural net system; its computational infrastructure cannot be "read off" as it can be for a classical program. However, different styles of explication are emerging: It is possible, for example, to associate a confidence with each network prediction based on an analysis of the particular input values with respect to the set of values used for training.
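One very crude way to relate an input to the values used for training is to check how many of its features fall inside the range seen in the training set. The sketch below is an assumption about what such an analysis might look like, not the method used for the system in Fig. 1; the function names and the two-feature data are hypothetical.

```python
def feature_ranges(training_inputs):
    """Record the smallest and largest value seen for each input feature."""
    columns = zip(*training_inputs)
    return [(min(column), max(column)) for column in columns]


def coverage_confidence(ranges, inputs):
    """Fraction of features that lie within the range covered by training."""
    inside = sum(1 for (low, high), x in zip(ranges, inputs)
                 if low <= x <= high)
    return inside / len(ranges)


# Hypothetical training inputs with two features per patient.
ranges = feature_ranges([[50, 1.0], [70, 3.0]])
print(coverage_confidence(ranges, [60, 2.0]))  # both features inside the ranges
print(coverage_confidence(ranges, [90, 2.0]))  # first feature outside its range
```

A low score suggests that the network is being asked about a case unlike any it was trained on, so its prediction deserves less trust.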
A further consequence of the total intermingling of the input features, together with the fact that the internal structure of a neural network is nothing but units with different combinations of weighted links in and out, is that no conceptually meaningful representation can be isolated within the computational system. In the osteoporosis system, for example, the input feature "age" is assessed in comparison to some threshold in order to add its contribution to the final outcome. However, no comparison operations or threshold values are evident within the trained network. It is usual to say that such computational mechanisms use distributed representations. In contrast to neural networks, there are other inductive, or data-driven, technologies that operate directly at the cognitive level.