Crossmodal Perception

Encyclopedia of the Human Brain, Volume 3. Copyright 2002, Elsevier Science (USA). All rights reserved.

One of the driving forces in evolution has been the creation of multiple means by which animals can sample their environments. This has involved the development of numerous sensory systems that are responsive to very different forms of physical energy, because a primary imperative of survival is knowledge of environmental circumstances, information that is as important for the location and capture of prey as it is for the avoidance of danger. Extant animals maintain a rich variety of such systems, thereby increasing the probability of their survival and the circumstances in which they can flourish. Being able to detect many of the same events with different sensors allows some to substitute for others when the circumstances require it. For example, touch and hearing can substitute for vision in the darkness. The combination of inputs from sensors that can function simultaneously also provides an animal with multiple "views" of a single event. As a result, the inherent ambiguities that may be present when evaluating an event along one sensory dimension (e.g., how it looks) can be obviated by the additional information provided along another sensory dimension (e.g., how it sounds).

Our sensory systems function so well in these regards that we have come to have great faith in the accuracy with which they reflect the properties of the physical world. Consequently, we are sometimes amused and sometimes distressed to learn that our sensory judgments are not absolute and can vary considerably from one situation to another. Our reactions to this knowledge, however, appear to depend on context: for most people, it is far less surprising when the intensity of a sound is misjudged because of background noise than when a sound is mislocated because of the confounding influence of a visual stimulus. Perhaps this is because we are more familiar with the relativity of perceptual judgments within a given sensory modality than across different sensory modalities. We readily accept that context influences visual judgments in the visual arts (painting, photography, and film), and we are not at all disturbed by the use of perspective and shading to alter our judgments of dimensionality, relative distance, or even the stature of a vertically challenged actor. However, finding that a sensory judgment can be substantially altered by seemingly irrelevant cues from another sensory modality is somehow more surprising and less intuitive. Nevertheless, cross-modal influences on perception are both common and potent. The presence of a neutral auditory cue can make a dim light appear substantially brighter; vibration of the muscles on one side of the neck can make a stationary light in a darkened room appear to be displaced or even to move toward the contralateral side; rotation of the body can make a horizontal line appear oblique; and changing gravitational forces can seriously disrupt our localization of visual cues.

Of course, cross-modal influences are not restricted to altering visual judgments; judgments in all sensory modalities are subject to such influences. The sensitivity of the vestibular system to visual cues is commonly evident in modern wide-screen cinematography, in which films are shot from the perspective of a pilot flying much too low and much too fast over steep mountain peaks and into deep canyons, or from the front seat of a roller coaster. Inevitably, the viewer experiences the same vestibular and gastrointestinal sensations that make clear-air turbulence and high seas so amusing.

The general impression that the different senses function in entirely separate realms, and are thus unlikely to interact, may also be partly a consequence of the unique subjective impressions associated with each of them. These impressions, called "qualia," are modality specific. For example, the perception of hue or color is specific to the visual system. Although we may speak of the smell or taste of green, this is really a consequence of associations developed through experience, because there is simply no nonvisual equivalent to the experience of "green." This becomes quite obvious when trying to describe color to a congenitally blind person. The same problem would arise in trying to describe music to someone who is congenitally deaf, because pitch is unique to the auditory system. Similarly, the sensations of tickle and itch are peculiar to somatosensation, and there is no nongustatory equivalent to "salty," nor a nonolfactory equivalent to "burnt almond." The unique nature of these qualia has been discussed for much of the history of neuroscience and led Johannes Müller to propose a theory of "specific nerve energies" in 1826. According to Müller, each subjective impression is attributable to the activation of modality-specific nerve fibers and their target neurons in the brain. This concept has been supported by studies in which electrical stimulation of modality-specific cortical regions in patients undergoing brain surgery (a procedure that helps surgeons avoid sensory areas) produced the appropriate modality-specific sensations.

These observations might lead one to expect that the brain must avoid cross talk between the modalities in order to maintain their unique qualia. This is not the case. Not only do the senses regularly affect one another, as the cross-modal influences on perception noted previously show, but when cross-modal stimuli are slightly discordant, this cross talk can produce any of a host of unexpected and intriguing illusions. One of the best known is the ventriloquism effect, and few people have not delighted in the illusion that a wooden doll is actually speaking. The name of the effect comes from the world of entertainment, but the phenomenon properly refers to a broad class of events in which a visual cue produces the apparent translocation of an auditory cue. Although the illusion need not involve speech, the impact of television and movies has made speech translocation its most common form. Each voice in a movie or television program seems to come from the appropriate character regardless of his or her location on the screen, yet in reality all sounds come from the same fixed sources: speakers at the sides of the television cabinet or the movie screen. This underscores the fact that, despite the apparent skills of some entertainers, the ventriloquist's trick reflects less his ability to "throw his voice" than the brain's tendency to rely on the visual system to localize external events. Thus, the lip movements and gestures associated with speech produce the compelling illusion that the corresponding sound must be coming from the source of those movements, even if that source is a dummy.

Among speech professionals, the so-called McGurk effect reigns supreme. In this illusion, lip movements alter the meaning, rather than the location, of the auditory signal. Its popularity stems from how dramatically it demonstrates the importance of integrating visual and auditory cues in speech perception, an integration that matters, for example, when trying to understand a spoken message in a noisy room. The integration of visual and auditory signals has also recently been shown to play a significant role in communication among nonhuman primates. In the original demonstration of the McGurk effect, the demonstrator's lips formed one syllable, for example, "ga," while the sound paired with the lip movements (prerecorded and played through a speaker in synchrony with them) was "ba." The listener perceived "da," a percept that corresponds to neither of the original cues but is instead the product of their synthesis. The McGurk effect is also readily induced with strings of spoken sounds and visible lip movements that are meaningless individually but together yield an intelligible sentence (Fig. 1). It even works with real words whose combination produces the perception of an entirely different word (e.g., hearing "bows" while seeing "goes" results in the percept "doze").

Figure 1 An example of the McGurk effect. Here, a nonsense phrase delivered from the speaker ("my bab pop me poo brive") combined with the visual cues of a person whose lips are forming a different nonsense phrase ("my gag kok me koo grive") results in the meaningful percept "My dad taught me to drive" (adapted and reprinted with permission from Stork and Massaro, American Scientist 86, 236-244, copyright 1998).

There are many cross-modal illusions; some seem like mere curiosities, whereas others must be fully appreciated because they can confound judgments in life-threatening circumstances. A recently described auditory-tactile illusion that is particularly compelling, but unlikely to cause significant problems in daily life, is the so-called "parchment-skin" illusion. In this case, a subject wears headphones, and a microphone whose output is routed to the headset is placed near one of his hands. The subject then rubs his fingers together, and the sound of the rubbing is amplified and played through the headphones. As a result, the subject perceives his skin as rough and dry, and the perceived roughness changes as the frequency composition of the sound is varied. Visual-vestibular illusions, by contrast, are of particular importance to aircraft pilots, who must correctly judge how quickly the nose of the aircraft is rising despite the potent, and often confusing, influence of vestibular cues produced by the strong gravitational forces of a high-speed takeoff. Errors in judgment in this circumstance can have severe consequences for both pilot and plane.

The few cross-modal illusions described here should be sufficient to emphasize that changing the normal relationships among cross-modal stimuli can substantially disrupt perception. Sensory systems have evolved to function in concert, and both the neural integration of their inputs and the perceptual products of this integration reflect the normal mode of brain function. Direct behavioral consequences of integrating cross-modal signals include an increased probability of detecting an external event and a substantially faster response to it. Often, these gains in detection and reaction speed exceed the predictions of statistical models based on responses to each of the individual modality-specific stimuli alone.
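Two standard examples of such statistical models can be sketched in a few lines of Python. This is a minimal sketch, not the author's analysis. It assumes (an assumption, not stated in the text) that the auditory and visual channels operate independently. Under that assumption, probability summation predicts bimodal detection as P = 1 - (1 - P_A)(1 - P_V). For reaction times, Miller's race-model inequality, F_AV(t) <= F_A(t) + F_V(t), gives the corresponding upper bound on the cumulative distribution of bimodal responses. The function names are illustrative, and the numbers in the usage lines are made-up values, not data from any study.

```python
from bisect import bisect_right


def probability_summation(p_a: float, p_v: float) -> float:
    """Predicted bimodal detection probability if the auditory and visual
    channels are independent and either one alone suffices for detection."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_v)


def ecdf(samples: list[float], t: float) -> float:
    """Empirical cumulative distribution: fraction of samples <= t."""
    ordered = sorted(samples)
    return bisect_right(ordered, t) / len(ordered)


def race_model_violations(rt_a: list[float], rt_v: list[float],
                          rt_av: list[float], times: list[float]) -> list[float]:
    """Probe times at which the bimodal reaction-time distribution exceeds
    Miller's race-model bound F_A(t) + F_V(t), capped at 1."""
    violations = []
    for t in times:
        bound = min(1.0, ecdf(rt_a, t) + ecdf(rt_v, t))
        if ecdf(rt_av, t) > bound:
            violations.append(t)
    return violations


if __name__ == "__main__":
    # Illustrative detection rates for each stimulus presented alone.
    print(round(probability_summation(0.4, 0.5), 2))  # 0.7
    # Made-up reaction times in ms, for illustration only.
    rt_auditory = [300, 320, 340, 360]
    rt_visual = [310, 330, 350, 370]
    rt_bimodal = [240, 250, 260, 280]
    print(race_model_violations(rt_auditory, rt_visual, rt_bimodal,
                                [250, 300, 400]))  # [250, 300]
```

If observed bimodal detection is higher than `probability_summation` predicts, or if `race_model_violations` returns a nonempty list, performance exceeds what independent processing of each modality-specific stimulus alone could explain.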

One of the key features for predicting whether cross-modal stimuli will produce enhanced neural, perceptual, and/or behavioral responses is whether they derive from the same event, because such cues are generally coincident in both space and time. The relative timing of the stimuli is critical: if cross-modal stimuli are sufficiently disparate in time, the brain does not integrate them but treats them as separate events. However, from the perspective of neural function, the temporal window during which this information can be integrated is very long, lasting hundreds of milliseconds. This makes it possible for the nervous system to integrate, for example, visual, auditory, and somatosensory cues derived from the same event even though these inputs arrive at a multisensory neuron at very different times. The differences in arrival time stem from the nature of the stimuli and the nature of the nervous system. Somatosensory cues do not travel through space; they are delivered to the observer's skin and have a reasonably short conduction time to the brain. Sound is also rapidly transmitted from the ear to the brain, but it travels very slowly through the air and takes an appreciable time to reach the ear. Although for all practical biological purposes light transmission is instantaneous, the visual signal requires a good deal of processing time in the retina before it can be sent to the brain.
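The arithmetic behind these arrival-time differences can be illustrated with a short Python sketch. The speed of sound (about 343 m/s in dry air at 20 °C) and the speed of light are physical constants; the neural latencies passed in are hypothetical parameters chosen only for illustration, not values from the text. The sketch computes how much later the auditory signal arrives than the visual one at a given distance, and the distance at which slow sound travel exactly offsets slower retinal processing.

```python
SPEED_OF_SOUND_M_PER_S = 343.0         # dry air at about 20 °C
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def arrival_ms(distance_m: float, speed_m_per_s: float,
               neural_latency_ms: float) -> float:
    """Time from the event until its signal reaches the brain:
    travel time through space plus (hypothetical) neural processing latency."""
    travel_ms = distance_m / speed_m_per_s * 1000.0
    return travel_ms + neural_latency_ms


def audio_visual_lag_ms(distance_m: float, auditory_latency_ms: float,
                        visual_latency_ms: float) -> float:
    """Positive when the auditory signal arrives after the visual one."""
    audio = arrival_ms(distance_m, SPEED_OF_SOUND_M_PER_S, auditory_latency_ms)
    visual = arrival_ms(distance_m, SPEED_OF_LIGHT_M_PER_S, visual_latency_ms)
    return audio - visual


def break_even_distance_m(auditory_latency_ms: float,
                          visual_latency_ms: float) -> float:
    """Distance at which slower sound travel offsets slower visual processing,
    treating light's travel time as negligible."""
    return (visual_latency_ms - auditory_latency_ms) / 1000.0 * SPEED_OF_SOUND_M_PER_S


if __name__ == "__main__":
    # Hypothetical latencies: 10 ms auditory, 40 ms visual (illustrative only).
    print(round(audio_visual_lag_ms(1.0, 10.0, 40.0), 1))   # -27.1 (visual later)
    print(round(audio_visual_lag_ms(30.0, 10.0, 40.0), 1))  # 57.5 (auditory later)
    print(round(break_even_distance_m(10.0, 40.0), 2))      # 10.29
```

With these hypothetical latencies, the lags at both distances are tens of milliseconds, far shorter than an integration window lasting hundreds of milliseconds, so cues from the same event could still be integrated.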

The spatial relationships among cross-modal stimuli are also critical, not only for determining whether a multisensory interaction will take place in the brain but also for determining the kind of interaction that will occur. Spatially coincident stimuli generally result in enhanced brain signals and enhanced perception and behavior; spatially disparate stimuli either produce no interaction and are treated as separate events or they inhibit one another and degrade brain signals as well as their perceptual and behavioral products. It is when these spatial and temporal relationships are neither so close that they produce "normal" products nor so discrepant that they are treated as separate events that aberrant neural, perceptual, and behavioral consequences result. The illusions discussed previously are good examples of some of these "anomalous" products.

The integration of cross-modal inputs (multisensory integration) is also responsible for providing unity to our perceptions. This point was obvious to Aristotle, who discussed it in De Anima. Aristotle was one of the first to grapple with the concept of multisensory integration, noting that although each of our five senses provides us with different information, the resulting perception is of a single world. He concluded that there must be a mechanism by which the disparate information obtained from the different sense organs is brought together into a unified whole, a view that anticipated modern discussions of what has become known as the "binding" problem. Stated from a modern neurological perspective, the problem might be posed as follows: How can we perceive objects as unitary entities when their individual features are processed separately by different populations of neurons in different regions of the nervous system? The question is as germane within a single modality as across modalities, because the individual components of a modality-specific stimulus (e.g., its motion, direction of movement, and color) are handled by different populations of neurons in different regions of the brain, just as different modalities engage different populations of neurons in different regions (e.g., their separate modality-specific pathways). Although there is no widely accepted theory of how the brain solves this "problem," a number of reasonable possibilities are currently being considered.
