Abstract
Basic features of objects and events in the environment, such as timing and spatial location, are encoded by multiple sensory modalities. This redundancy in sensory coding allows one sense to be recalibrated by other senses if there is a conflict between the sensory maps (Radeau and Bertelson, 1974; Zwiers et al., 2003; Navarra et al., 2009). In contrast to motor or sensorimotor adaptation, which can be relatively rapid, cross-sensory recalibration (the change in an isolated sensory representation after exposure to conflicting cross-modal information) has been reported only after extensive exposure to sensory discrepancy (e.g., hundreds or thousands of trials, or prolonged durations). Therefore, sensory recalibration has traditionally been associated with compensation for permanent changes, such as those that occur during development or after traumatic injury or stroke. However, the dynamics of sensory recalibration are unknown, and it is unclear whether prolonged inconsistency is required to trigger recalibration or whether such mechanisms are continuously engaged in self-maintenance. We show that in humans, recalibration of perceived auditory space by vision can occur after a single exposure to discrepant auditory–visual stimuli lasting only a few milliseconds. These findings suggest an impressive degree of plasticity in a basic perceptual map, induced by a cross-modal error signal. It therefore appears that modification of sensory maps does not necessarily require the accumulation of a substantial amount of evidence of error to be triggered, and that it is continuously operational. This scheme of sensory recalibration has many advantages: it requires only a small working memory capacity, and it allows rapid adaptation to transient changes in the environment as well as in the body.
Introduction
Sensory systems have evolved to detect different environmental properties. While the transduction process is unique to each system, there are redundancies among the systems along several perceptual dimensions. For example, the spatial position of an object can be determined by sight, sound, touch (for proximal stimuli), and smell. In addition to increasing the reliability of perceptual estimates (Ernst and Banks, 2002; Alais and Burr, 2004; Stein and Stanford, 2008), an important functional benefit of redundant sensory representations is the ability to perform self-maintenance (King, 2009; Recanzone, 2009), which allows sensory representations to be calibrated even in the absence of external feedback. This form of sensory recalibration has been shown in both the temporal (Fujisaki et al., 2004; Vroomen et al., 2004; Harrar and Harris, 2008; Navarra et al., 2009) and spatial domains (Canon, 1970; Radeau and Bertelson, 1974).
We investigated auditory spatial recalibration by vision. It is well established that after repeated exposure to simultaneous but spatially discrepant auditory and visual stimuli, the perceived location of a solitary auditory stimulus is shifted in the direction of the previously experienced visual stimuli (Canon, 1970; Radeau and Bertelson, 1974; Recanzone, 1998; Lewald, 2002), a phenomenon often referred to as the “ventriloquist aftereffect.” Generally, the exposure to discrepant auditory–visual stimuli lasts several minutes and includes hundreds or thousands of presentations (Recanzone, 1998; Lewald, 2002). Even in studies that used relatively short exposure blocks (Frissen et al., 2005; Bertelson et al., 2006; Kopco et al., 2009), the stimulus discrepancy was kept at a fixed value within and across blocks, and aftereffects were measured as an aggregate over several exposure and testing blocks. The extant data on sensory recalibration after extensive exposure are consistent with two competing hypotheses about the underlying mechanism: (1) a mechanism that becomes engaged from the onset of exposure to the discrepancy; and (2) a mechanism that requires the accumulation of evidence of error, in the form of repeated and consistent discrepancy, before it becomes engaged. To date, the lack of evidence for fast recalibration has implicitly favored the latter hypothesis.
Materials and Methods
All participants had normal hearing and normal or corrected-to-normal vision, and provided written informed consent using procedures approved by the University of California, Los Angeles Institutional Review Board.
Observers (n = 146; 100 females; age range, 18–35 years) sat 52 cm from a screen with their chins resting on a chinrest. The screen was a black, acoustically transparent cloth. Behind the screen were five free-field speakers positioned 7° below fixation and along the azimuth at −13°, −6.5°, 0°, 6.5°, and 13° (negative values are to the left of fixation, positive values to the right). The speaker locations were unknown to the participants. Stimulus conditions included five unisensory auditory conditions, five unisensory visual conditions, and all 25 combinations of the five visual locations and five auditory locations. Fifteen trials of each of the 35 stimulus conditions (525 trials in total) were presented in a pseudorandom order. Because the visual and auditory locations each spanned −13° to 13°, auditory–visual spatial discrepancy varied from 0 to ±26°.
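The design arithmetic above (5 + 5 + 25 = 35 conditions, 15 repetitions each, and nine possible AV discrepancies from −26° to +26° in 6.5° steps) can be checked with a short sketch. The function names are hypothetical, and the plain shuffle in `build_session` is an assumption: the text says only that the order was pseudorandom, without specifying the procedure.

```python
import random
from itertools import product

# Speaker azimuths in degrees, from the Methods (negative = left of fixation).
AZIMUTHS = [-13.0, -6.5, 0.0, 6.5, 13.0]
REPS_PER_CONDITION = 15


def build_conditions(azimuths):
    """All stimulus conditions as (kind, visual_az, auditory_az); None marks an absent modality."""
    conditions = [("A", None, a) for a in azimuths]
    conditions += [("V", v, None) for v in azimuths]
    conditions += [("AV", v, a) for v, a in product(azimuths, azimuths)]
    return conditions


def build_session(conditions, reps, seed=None):
    """Hypothetical session builder: every condition `reps` times, then shuffled.

    Assumption: a plain uniform shuffle stands in for the unspecified pseudorandom order.
    """
    trials = [c for c in conditions for _ in range(reps)]
    random.Random(seed).shuffle(trials)
    return trials


conditions = build_conditions(AZIMUTHS)
session = build_session(conditions, REPS_PER_CONDITION, seed=1)

# AV discrepancy is V - A (negative = visual stimulus to the left of the sound).
discrepancies = sorted({v - a for kind, v, a in conditions if kind == "AV"})
print(len(conditions), len(session))  # 35 525
print(discrepancies)  # [-26.0, -19.5, -13.0, -6.5, 0.0, 6.5, 13.0, 19.5, 26.0]
```

The nine discrepancy levels printed here are the nine abscissa values for which per-bin tests are reported in the Results.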
Auditory stimuli were ramped white-noise bursts lasting 35 ms. Visual stimuli were projected onto the screen in front of the observers by a ceiling-mounted projector set to a resolution of 1280 × 1024 pixels. The visual stimulus was a white-noise disk with a Gaussian envelope of 1.5° full width at half-maximum, positioned so that the center of the disk was at the center of one of the five (invisible) speakers, and was presented for 35 ms. On auditory–visual trials, the two stimuli were presented simultaneously. Each trial started with the presentation of a fixation cross approximately in the center of the screen, straight in front of the observer's head. After 750–1100 ms, the stimuli were presented for 35 ms, and 450 ms after stimulus offset the fixation cross was removed. Immediately after the offset of the fixation cross, a downward block-arrow pointer appeared on the screen just above the elevation at which the stimuli had been presented, at a random horizontal location; the starting position along the azimuth was randomized to minimize response bias. The cursor was controlled by a trackball mouse, in the horizontal direction only. Participants were instructed to fixate on the fixation cross and, after it was removed, to “move the cursor as quickly and accurately as possible to the exact location of the stimulus and click the mouse.” This procedure captured continuous responses with a resolution of 0.1°/pixel. On unisensory auditory (A) trials, subjects reported the location of the sound; on unisensory visual (V) trials, they reported the location of the visual stimulus; and on bisensory (AV) trials, they reported both the location of the visual stimulus and the location of the auditory stimulus. The order of these two responses was fixed throughout the session for each subject and was counterbalanced across subjects.
A blue “S” or green “L” was placed inside the cursor to remind subjects to respond to the sound or light, respectively. Feedback was not provided.
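The per-trial timeline and response prompts described above can be summarized in a small sketch. All names are hypothetical, and drawing the 750–1100 ms fixation period from a uniform distribution is an assumption; the text gives only the range.

```python
import random
from dataclasses import dataclass

STIM_DURATION_MS = 35.0   # auditory and/or visual stimulus duration (from the Methods)
POST_STIM_MS = 450.0      # stimulus offset to fixation-cross offset (from the Methods)


@dataclass
class TrialTimeline:
    fixation_on: float   # ms from trial start
    stimulus_on: float   # ms
    stimulus_off: float  # ms
    fixation_off: float  # ms; the response cursor appears at this moment


def make_timeline(rng: random.Random) -> TrialTimeline:
    """Event times for one trial.

    Assumption: the 750-1100 ms fixation period is drawn uniformly at random.
    """
    stim_on = rng.uniform(750.0, 1100.0)
    stim_off = stim_on + STIM_DURATION_MS
    return TrialTimeline(0.0, stim_on, stim_off, stim_off + POST_STIM_MS)


def response_prompts(kind: str, sound_first: bool) -> list:
    """Cursor labels to show, in order: 'S' asks for the sound, 'L' for the light.

    `sound_first` is fixed per subject and counterbalanced across subjects.
    """
    if kind == "A":
        return ["S"]
    if kind == "V":
        return ["L"]
    return ["S", "L"] if sound_first else ["L", "S"]


timeline = make_timeline(random.Random(0))
print(timeline)
print(response_prompts("AV", sound_first=False))  # ['L', 'S']
```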
To familiarize participants with the task, the experiment was preceded by a practice block of 10 randomly interleaved trials in which only an auditory stimulus was presented at a variable location, and subjects were asked to report the location of the auditory stimulus. Feedback was not provided in the practice block either. The raw data from these spatial localization judgments have been used in another study (Wozny et al., 2010).
Results
Unisensory auditory responses had an average SD of 6.0° (5.8, 6.5, 6.5, 6.0, and 5.2° for the five locations from left to right). Unisensory visual responses had an average SD of 2.5° (3.3, 2.2, 2.0, 2.2, and 3.0° for the five positions from left to right). We examined observers' localization responses in A trials as a function of the type of stimuli presented on the immediately preceding trial. Since the trials were pseudorandomly ordered, each unisensory auditory trial could be preceded by a unisensory visual trial (Fig. 1a, cyan arrow), a unisensory auditory trial (Fig. 1a, green arrow), or an auditory–visual trial (Fig. 1a, magenta arrow). We define the dependent variable in our analyses as the shift in perceived auditory location (Fig. 1b–d, ordinate). For each A trial, this is calculated as the response on that trial minus the observer's average response across all 15 A trials with that sound position (the given trial included in the average). Rightward and leftward shifts are represented by positive and negative values, respectively. Thus, if the distribution of responses to a given sound position is sampled randomly, on average this shift in perceived auditory location would be zero. Therefore, any average shift that significantly deviates from zero would indicate a dependence on the independent variable (Fig. 1b–d, abscissa). Each data point in Figure 1 shows the shift in the perceived auditory location pooled across all five auditory positions. Results remain qualitatively the same if the shift is calculated relative to the veridical auditory position rather than the average perceived position (data not shown). 
Including the current response in the average results in a small underestimation of the actual shift in the percept: with 15 A trials per sound position, the measured shift equals 14/15 of the deviation of the current response from the mean of the other 14 responses. We nevertheless adopt this measure rather than one based on the veridical position, because some subjects show response distributions that are not centered on the veridical position, which makes the latter measure less informative.
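The dependent variable can be written in a few lines. This is a minimal sketch, assuming that one observer's A trials are given as hypothetical `(sound_az, response_az)` pairs in degrees. It also illustrates the shrinkage caused by including the current trial in the mean: with n responses per position, the shift equals (n − 1)/n times the deviation from the mean of the other n − 1 responses, i.e., 14/15 when n = 15.

```python
from collections import defaultdict
from statistics import mean


def auditory_shifts(a_trials):
    """Shift in perceived auditory location for each A trial of one observer.

    a_trials: list of (sound_az, response_az) pairs in degrees.
    Each shift is the trial's response minus the mean response to the same
    sound position, with the trial itself included in that mean.
    Positive values are rightward shifts, negative values leftward.
    """
    by_position = defaultdict(list)
    for position, response in a_trials:
        by_position[position].append(response)
    position_mean = {pos: mean(resps) for pos, resps in by_position.items()}
    return [response - position_mean[position] for position, response in a_trials]


# Toy check of the (n - 1)/n factor: one deviant response among 15 at 0 deg.
example = [(0.0, 2.0)] + [(0.0, 0.0)] * 14
shifts = auditory_shifts(example)
print(shifts[0])  # about 1.867, i.e. 14/15 of the 2.0 deg deviation
```

Because each position's shifts sum to zero by construction, randomly sampled trials average to a shift of zero, which is why a nonzero mean shift conditioned on the preceding trial is informative.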
Figure 1. Shift in perceived auditory location as a function of specific exposures in the preceding trial. a, Schematic diagram showing random interleaving of AV, A, and V trials. b, The shift in perceived auditory location (mean ± SEM across observers) as a function of the location of the visual stimulus in the preceding V trial. c, The shift in perceived auditory location (mean ± SEM across observers) as a function of the location of the auditory stimulus in the preceding A trial. d, The shift in perceived auditory location (mean ± SEM across observers) as a function of auditory–visual spatial discrepancy in the preceding AV trial. Stars denote data points that are significantly different from zero (corrected for multiple comparisons using Bonferroni–Holm correction).
First, we examined whether auditory localization in A trials is systematically affected by the position of a unisensory stimulus presented in the previous trial. Figure 1b shows data from A trials that are immediately preceded by a V trial; the abscissa shows the location of the preceding visual stimulus. There was a significant effect of previous visual location on auditory shift (Kruskal–Wallis one-way ANOVA by ranks test, df = 669, p < 0.001). The cyan star indicates that the shift is significantly different from zero [two-sided Wilcoxon signed rank test with family-wise α = 0.05; significant after Bonferroni–Holm correction (Holm, 1979) for the five tests: n = 134, 133, 136, 136, and 131, respectively]. Figure 1c shows data from A trials that are immediately preceded by an A trial; the abscissa shows the location of the preceding auditory stimulus. There was no significant effect of previous auditory location on auditory shift (Kruskal–Wallis one-way ANOVA by ranks test, df = 651, p > 0.05; n = 126, 126, 133, 132, and 135 for the five locations). The absence of an effect of previous sound position on the current auditory response suggests that the observed visual bias may not be a simple response bias; however, the response bias could be stronger following visual trials, owing to higher confidence in the response or the higher salience of the stimuli on those trials.
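The per-bin tests above use Holm's step-down procedure to keep the family-wise error rate at α = 0.05. A minimal pure-Python sketch of that procedure follows; it is an illustration of the textbook method, not the authors' analysis code. The per-bin p-values could come from, for example, `scipy.stats.wilcoxon` applied to each bin's shifts, and `statsmodels.stats.multitest.multipletests(..., method="holm")` offers an equivalent correction.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down correction; returns True where the null is rejected.

    Sort p-values in ascending order and compare the k-th smallest
    (k = 0, 1, ...) with alpha / (m - k). Stop at the first comparison that
    fails: that hypothesis and all hypotheses with larger p-values are retained.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break
    return reject


# Five hypothetical per-bin p-values (one per stimulus location).
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005, 0.2]))
# [True, False, False, True, False]
```

Holm's procedure is uniformly more powerful than a plain Bonferroni correction (which would compare every p-value with α/m) while controlling the same family-wise error rate.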
Next, we investigated whether auditory localization in A trials is systematically affected by the auditory–visual discrepancy experienced in the previous AV trial. Figure 1d shows data from A trials that are immediately preceded by an AV trial. The abscissa shows the AV spatial discrepancy of the preceding AV trial (V − A; negative indicates V to the left of A). There was a significant effect of previous AV discrepancy on auditory shift (Kruskal–Wallis one-way ANOVA by ranks test, df = 1276, p < 0.001). The magenta stars indicate that the mean shift is significantly different from zero (two-sided Wilcoxon signed rank test, Bonferroni–Holm correction for multiple tests with family-wise α = 0.05; n = 121, 146, 146, 146, 146, 146, 145, 146, and 135 for the 9 tests). Additional analyses suggest that the recalibration effect does not transfer across hemifields (data not shown); however, it is unclear whether this reflects a midline boundary effect or a general decline of the effect across space.
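The omnibus tests above are Kruskal–Wallis tests across bins (previous stimulus location or previous AV discrepancy). A minimal pure-Python sketch of the H statistic follows, written from the standard definition with average ranks for ties and the usual tie correction; it is an illustration, not the authors' code. In practice, `scipy.stats.kruskal` returns the same statistic together with a p-value.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of non-empty samples.

    All observations are ranked jointly (1-based, ties get average ranks);
    H = 12 / (N (N + 1)) * sum(R_g**2 / n_g) - 3 (N + 1), divided by the tie
    correction 1 - sum(t**3 - t) / (N**3 - N). Under the null hypothesis,
    H is approximately chi-square distributed with (number of groups - 1) df.
    """
    pooled = sorted((x, g) for g, sample in enumerate(groups) for x in sample)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1  # extend the run of tied values
        avg_rank = (i + j) / 2 + 1
        for _, g in pooled[i:j + 1]:
            rank_sums[g] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(sample) for r, sample in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Toy shifts (deg) in two bins; without ties, H = 27/7 here.
print(kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```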
The magnitude of the auditory shift increased with the auditory–visual discrepancy in the preceding trial (Fig. 1d). Because, in many of the A trials used in the analysis of Figure 1d, a shift toward the absolute location of the previous visual stimulus and a shift in the direction of the AV discrepancy coincide, we investigated which factor underlies the observed shift. First, it should be noted that the observed increase in shift with increasing AV discrepancy, especially for large discrepancies (±19.5° and ±26°), cannot be explained by visual bias. We scrutinized this issue further by examining the subset of trials in which a shift in the direction of the visual discrepancy and a shift toward the visual stimulus would be in opposite directions. For example, in the scenario depicted in Figure 2a, right, if the change in auditory localization is due to a bias toward the location of the previously experienced visual stimulus, then the perceived auditory location should be shifted to the left (brown arrow); however, if the change is due to a shift of the auditory spatial map in the direction of the visual discrepancy, then the perceived auditory location should be shifted to the right (green arrow). The data obtained from these kinds of trials (Fig. 2a) are shown in Figure 2b. The shift in perceived auditory location is in the direction of the visual discrepancy. This confirms that the observed change in auditory localization is driven by the preceding auditory–visual discrepancy (or “error”), indicating visual recalibration of auditory space rather than a bias toward the absolute location of the visual stimulus. Altogether, these results indicate that cross-modal sensory recalibration can occur after a single presentation of stimuli lasting only a few milliseconds, and in the absence of feedback or reinforcement.
Thus, it appears that the nervous system is continuously engaged in cross-modal sensory calibration, even on the basis of a single exposure.
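The trial selection used to separate the two accounts (spelled out in the Figure 2 legend) reduces to a simple rule on azimuths. The sketch below is a hypothetical helper, not the authors' code; positions are in degrees with negative values to the left, and the example values (V at +6.5°, A at 0°, then a sound at +13°) are chosen for illustration only.

```python
def _sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_opposite_prediction_trial(prev_v, prev_a, a_now):
    """True if an A trial belongs to the subset analyzed in Figure 2b.

    prev_v, prev_a: visual and auditory azimuths on the preceding AV trial.
    a_now: sound azimuth on the current A trial.
    """
    discrepancy = prev_v - prev_a
    if discrepancy > 0:
        return a_now >= prev_v  # sound at or right of the previous visual stimulus
    if discrepancy < 0:
        return a_now <= prev_v  # sound at or left of the previous visual stimulus
    return False  # zero discrepancy: excluded from the contrast


def predicted_shift_signs(prev_v, prev_a, a_now):
    """(recalibration, bias-toward-visual-location) predicted directions.

    -1 = leftward, 0 = no shift, +1 = rightward.
    """
    return _sign(prev_v - prev_a), _sign(prev_v - a_now)


print(is_opposite_prediction_trial(6.5, 0.0, 13.0))  # True
print(predicted_shift_signs(6.5, 0.0, 13.0))         # (1, -1)
```

When the sound is exactly at the previous visual location, the bias account predicts no shift while recalibration still predicts one, which is why the "at or to the right/left" cases are included.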
Figure 2. Distinguishing between two potential underlying factors for the shift in perceived auditory location. a, Two examples of an A trial in which the predictions of the two underlying mechanisms are in opposite directions. If the driving force behind the auditory spatial shift is a bias toward the absolute location of the previously experienced visual stimulus, then the shift in the A trial should be in the direction represented by the brown arrow. If the driving force is the discrepancy between the visual and auditory signals, then auditory space should be recalibrated by a shift of the auditory spatial map in the direction represented by the green arrow. b, Change in the perceived auditory location (mean ± SEM across observers) as a function of the auditory–visual discrepancy in the preceding AV trial, shown only for the subset of trials in which the two hypotheses make opposite predictions: trials where the AV discrepancy is positive and the unimodal auditory stimulus is at or to the right of the visual stimulus (as depicted in a, right), and trials where the AV discrepancy is negative and the unimodal auditory stimulus is at or to the left of the visual stimulus (as depicted in a, left). The numbers of data points, from left to right, were 50, 119, 139, 146, 146, 128, 113, and 48. For these trials, the recalibration mechanism predicts a straight line with a positive slope (i.e., within the green shaded region), whereas the bias-toward-visual-location hypothesis predicts data points falling within the brown shaded regions. The observed auditory shifts are well fitted by a straight line with a positive slope (p < 0.001), consistent with a recalibration mechanism and inconsistent with the absolute visual bias mechanism. recalib., Recalibration.
Next, we asked whether recalibration occurs following any multisensory experience, or whether it is selective to cases in which the sensations are estimated to stem from the same source and yet are in spatial conflict with each other. To address this question, we compared two subsets of unisensory auditory trials: those following the perception of unity in the AV trial, and those following the perception of independent sources in the AV trial. Because we did not ask observers to report their perception of unity or independence, we identified these trials indirectly from the observers' spatial judgments. Perception of distinct spatial locations for the auditory and visual stimuli suggests perception of independent sources. In contrast, if a common cause is perceived for both stimuli, the two signals are fused and the same location is perceived for both. We therefore calculated the shift in perceived auditory location separately for two groups of auditory trials: one in which the auditory and visual responses on the preceding AV trial were within 0.5° of each other, and one in which they differed by more than 6°. These groups represent postunity and postindependence A trials, respectively. We excluded the remaining A trials because small differences in reported locations may reflect motor errors; however, results remain qualitatively the same using other cutoff values. As can be seen in Figure 3, a larger shift in the auditory map occurs on postunity trials, suggesting that the perception of unity influences the degree of subsequent recalibration.
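The indirect unity classification can be sketched as follows. The 0.5° and 6° cutoffs come from the text; treating "within 0.5°" as inclusive and "more than 6°" as exclusive is an assumption, and the function name is hypothetical.

```python
def classify_preceding_av(auditory_resp_deg, visual_resp_deg,
                          unity_max=0.5, independence_min=6.0):
    """Label the A trial that follows an AV trial, based on that AV trial's two responses.

    Returns "postunity" if the reported A and V locations were within
    `unity_max` degrees (assumed inclusive), "postindependence" if they
    differed by more than `independence_min` degrees, and None otherwise
    (excluded, since small gaps may reflect motor error).
    """
    gap = abs(auditory_resp_deg - visual_resp_deg)
    if gap <= unity_max:
        return "postunity"
    if gap > independence_min:
        return "postindependence"
    return None


print(classify_preceding_av(3.0, 3.4))   # postunity
print(classify_preceding_av(-2.0, 7.0))  # postindependence
print(classify_preceding_av(0.0, 3.0))   # None
```

Exposing the cutoffs as keyword arguments makes it easy to rerun the split with other values, as the text reports doing.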
Figure 3. The influence of perceived unity of the auditory and visual stimuli on subsequent recalibration of the auditory map. The shift in perceived auditory location (mean ± SEM across observers) is plotted as a function of AV spatial discrepancy in the preceding AV trial (similar to Fig. 1d), separately for two types of AV trials. Blue circles show trials in which the auditory and visual reported locations on the preceding AV trial differed by more than 6°. Green squares show trials in which the auditory and visual responses on the preceding AV trial were within 0.5° of each other.
Finally, we investigated the dynamics of recalibration by examining how the auditory shift decays and accumulates over trials. Figure 4a shows the same analysis as Figure 1d, except that instead of the immediately preceding AV trial, it uses the 2-, 3-, 4-, and 5-back AV trials, where the N-back AV trial is the Nth AV trial before the A trial under consideration (i.e., with N − 1 intervening AV trials). As expected, the recalibration effect degrades the further back we look in the history of AV trials, as intervening AV discrepancies counteract the effect. We then explored the dynamics of the recalibration process by examining the percentage shift as a function of the number of repeated exposures to the same direction of discrepancy. Because trial order was random, there was a reasonable number of sequences of up to six consecutive AV trials with discrepancies in the same direction. As can be seen in Figure 4b, the shift in the auditory spatial map increases with the number of repetitions of discrepancies in the same direction.
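The trial-history bookkeeping behind the N-back and same-direction analyses can be sketched as below. Trials are hypothetical dicts with `kind`, `v`, and `a` keys (azimuths in degrees). Two points are assumptions not stated in the text: unisensory trials are skipped when counting back through AV trials (consistent with the N-back definition above), and for the run count they neither extend nor break a run. The normalization follows the Figure 4b legend: the shift expressed as a percentage of the mean discrepancy over the run.

```python
def n_back_av(trials, index, n):
    """Return the n-th AV trial before trials[index] (n = 1 is the most recent).

    Unisensory trials are skipped; returns None if fewer than n AV trials precede it.
    """
    seen = 0
    for k in range(index - 1, -1, -1):
        if trials[k]["kind"] == "AV":
            seen += 1
            if seen == n:
                return trials[k]
    return None


def same_direction_run(trials, index):
    """Length and mean discrepancy (V - A) of the run of preceding AV trials
    whose discrepancies share the nonzero sign of the most recent AV trial.

    Assumption: unisensory trials neither extend nor break the run.
    """
    run, direction = [], 0
    for k in range(index - 1, -1, -1):
        trial = trials[k]
        if trial["kind"] != "AV":
            continue
        d = trial["v"] - trial["a"]
        sign = (d > 0) - (d < 0)
        if sign == 0 or (direction and sign != direction):
            break
        direction = sign
        run.append(d)
    return len(run), (sum(run) / len(run) if run else 0.0)


def percent_shift(shift_deg, mean_discrepancy_deg):
    """Shift as a percentage of the mean discrepancy in the preceding run."""
    return 100.0 * shift_deg / mean_discrepancy_deg


history = [
    {"kind": "AV", "v": -6.5, "a": 0.0},
    {"kind": "AV", "v": 6.5, "a": 0.0},
    {"kind": "A", "v": None, "a": 6.5},
    {"kind": "AV", "v": 13.0, "a": 0.0},
    {"kind": "V", "v": 0.0, "a": None},
    {"kind": "AV", "v": 0.0, "a": -6.5},
    {"kind": "A", "v": None, "a": 0.0},  # the A trial under consideration
]
length, mean_d = same_direction_run(history, 6)
print(length, round(mean_d, 3))  # 3 8.667
```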
Figure 4. Dynamics of recalibration across trials. a, The shift in perceived auditory location (mean ± SEM across observers) is plotted as a function of AV spatial discrepancy in the N-back AV trial designated at the top of each panel (see text for details). Figure 1d corresponds to the 1-back AV trial. Asterisks denote statistically significant data points (p < 0.05 with Bonferroni–Holm correction for multiple comparisons). b, The abscissa shows the number of consecutive AV trials with a discrepancy in the same direction (rightward or leftward) preceding an A trial. The ordinate shows the shift in perceived auditory location (regardless of position) in the A trial as a percentage of the average discrepancy in the preceding N consistent AV trials. Data are pooled across observers. Shaded regions show ± SEM. The numbers of data points for the six conditions are 5214, 2128, 812, 316, 152, and 56. The shift in the auditory spatial map accumulates as the number of exposures to the same kind of discrepancy increases.
Discussion
We explored the influence of brief exposure to auditory and visual stimuli on subsequent auditory localization. First, we found that auditory responses are biased toward the absolute position of the preceding visual stimulus (Fig. 1b). This bias could reflect a response bias (i.e., at the decision level) or a perceptual bias (i.e., a prior expectation of an event at a given position). The absence of a similar effect induced by a preceding auditory stimulus (Fig. 1c) makes a general response-bias explanation less likely, although it cannot be ruled out. Further research is required to elucidate the level of processing at which this bias occurs. Second, and more surprisingly, we found that a brief auditory–visual discrepancy influences the subsequent perception of auditory space. The results shown in Figure 2 clearly indicate that the observed shift in the auditory map following AV trials (Fig. 1d) is driven by the difference between the auditory and visual positions (AV discrepancy, or error), rather than by the absolute location of the visual stimulus. Therefore, even though there is a visual biasing effect, auditory recalibration after bimodal stimulation occurs as a function of AV discrepancy. Remarkably, this recalibration occurs after a single presentation lasting only 35 ms.
These findings blur the distinction between perception and learning. First, there does not appear to be a need for extended and substantial evidence of error for the mechanisms of sensory recalibration to be engaged: even a single presentation at the smallest discrepancy tested (6.5°) can shift the subsequent auditory percept. Second, feedback is not required for this fast recalibration to occur. The visual signal appears to serve as a teaching signal for the auditory modality, resulting in self-supervised learning of sensory representations. We also examined whether sounds provided a teaching signal for visual recalibration; however, we found no evidence of a shift in the visual spatial map in our data. Vision may have provided the teaching signal for auditory recalibration because the visual estimates were much more reliable (much lower variance in responses) (Burge et al., 2010). However, it is also possible that for each perceptual dimension (e.g., space), one modality (e.g., vision) enjoys the hardwired status of teacher regardless of the quality of its signal at any given time. Future studies could distinguish between these two scenarios by, for example, pairing a strong (reliable) auditory signal with a weak (unreliable) visual signal.
The fast recalibration in the present study resembles that previously reported in motor adaptation studies, in which movements are recalibrated within a few trials of an artificially induced error (McLaughlin, 1967; Redding and Wallace, 2003; Thoroughman et al., 2007). However, an important difference between motor adaptation and the current example of sensory adaptation is that in the former, direct feedback can be used as a teaching signal. Moreover, the motor system is expected to be highly malleable and dynamic (for instance, because of muscle fatigue), and thus subject to adjustment. In contrast, spatial and temporal maps are considered foundational to all perceptual processes and are expected to be fairly stable. Previous studies of sensory recalibration have reinforced this notion (Hofman et al., 1998; Zwiers et al., 2003); therefore, sensory recalibration has traditionally been regarded as a process for adjusting to long-term changes, such as those occurring during development or after permanent damage from injury or stroke.
Even in studies that use hundreds or thousands of repeated exposures to auditory–visual spatial discrepancy, the shift in the spatial map has often been limited to a fraction (e.g., 50%) of the experienced discrepancy. Our finding of a significant and sizable (5%) shift in the spatial map following a single exposure raises the question of why a complete (i.e., 100%) and faster shift is not observed in those studies. This could be due to a hardwired cap on the degree of change in these basic maps, or alternatively, to a slowing of the learning rate. As can be seen in Figure 4b, the shift in the auditory spatial map increases with the number of repetitions of discrepancies in the same direction, but the rate of accumulation appears to slow down, suggesting a decrease in learning rate. It is also conceivable that exposure to randomly varying discrepancies across trials in this experiment induced the nervous system to enter a “plastic state,” resulting in a higher rate of recalibration than usual. Future studies should further explore the dynamics of recalibration using repetitions of the exact same discrepancy, with larger sample sizes and more repetitions, in both randomly interleaved and blocked designs, to provide a clearer description of the trajectory.
The findings of the present study suggest that the auditory spatial map is shifted by a few degrees after only milliseconds of exposure to conflicting sensory input. This is a remarkable degree of plasticity, and it raises the question of why such profound plasticity does not render auditory spatial representations volatile and unstable. We suspect that the answer lies in the dependence of the recalibration effect on the perceived unity of the multisensory inputs. The results shown in Figure 3 suggest that strong recalibration occurs only when the conflicting auditory and visual signals are perceived to have stemmed from the same source. In nature, sensory signals are corrupted by noise, so small discrepancies are likely to occur even when the sensations arise from the same object; larger discrepancies, however, are unlikely to arise from random noise and usually occur when there is a systematic error in one of the systems that requires correction. Therefore, the dependence of recalibration on both the perceived unity of the signals and the degree of discrepancy suggests that a sizable shift in the auditory map does not occur all the time, and is selective to situations in which the discrepancy likely reflects a systematic error in one of the modalities. Such a selective recalibration scheme would not pose a threat to the stability of the perceptual system.
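The claim that large discrepancies rarely arise from noise alone can be made concrete with a back-of-the-envelope calculation. Purely for illustration, the sketch assumes that on a single trial the auditory and visual location estimates are independent, unbiased Gaussians centered on a common source, with SDs equal to the average response SDs reported in the Results (6.0° auditory, 2.5° visual). Response SDs also contain motor and decision noise, so they are only rough proxies for sensory noise. Under these assumptions, the A–V difference has an SD of √(6.0² + 2.5²) = 6.5°, so a noise-only discrepancy exceeds 6.5° on roughly 32% of trials, 13° on about 4.6%, 19.5° on about 0.27%, and 26° on well under 0.01%.

```python
import math

# Assumption: average response SDs from the Results used as proxies for sensory noise.
SD_AUDITORY_DEG = 6.0
SD_VISUAL_DEG = 2.5


def prob_noise_discrepancy_exceeds(threshold_deg,
                                   sd_a=SD_AUDITORY_DEG, sd_v=SD_VISUAL_DEG):
    """P(|V_hat - A_hat| > threshold) when both estimates are independent,
    unbiased Gaussians centered on the same true location."""
    sd_diff = math.hypot(sd_a, sd_v)  # SD of a difference of independent Gaussians
    # Two-sided Gaussian tail: P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(threshold_deg / (sd_diff * math.sqrt(2)))


for d in (6.5, 13.0, 19.5, 26.0):
    print(f"{d:5.1f} deg: {prob_noise_discrepancy_exceeds(d):.4f}")
```

Under this simplified model, the largest discrepancies tested would almost never be produced by noise between signals from a single source, which is in line with the argument that such discrepancies signal a systematic error worth correcting.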
On the other hand, the ability to quickly recalibrate has functional advantages. One desirable aspect of this level of plasticity is that in this scheme adaptation does not require a large capacity for—and does not impose a high demand on—working memory. It can also be speculated that this degree of fast recalibration allows adaptation to transient changes in the environment and in the body. For example, as we walk from one room to another room, or from indoors to outdoors, or vice versa, there are sometimes significant changes in the reverberation properties of the space surrounding us. Other examples of rapid changes occur when the ear canal becomes obstructed by, for example, water, wax, infection, or sinus pressure, or when sound pressure changes due to long hair intermittently and differentially covering the ears. Such changes can rapidly cause a small discrepancy between the auditory and visual spatial signals, and the ability to quickly bring the auditory map in register with the visual map can provide consistency and accuracy in auditory representations. Future studies will need to examine whether a similar scheme of continuous calibration operates in other domains and other sensory modalities. Understanding the dynamic mechanisms of recalibration will be valuable in understanding self-organization and unsupervised learning in biological systems, and could impact methods of neurorehabilitation, sensory prosthetic development, and algorithms for self-monitoring within autonomous agents.
Footnotes
D.W. was supported by a University of California, Los Angeles (UCLA) graduate division fellowship and a National Institutes of Health (NIH)-sponsored UCLA Neuroimaging Training Fellowship. L.S. was supported by UCLA Faculty Grants Program, Faculty Career Development award, and NIH Center for NeuroRehabilitation Training grant. We thank Norbert Kopco, Dean Buonomano, and Shinsuke Shimojo for comments on an earlier draft of the manuscript; Aaron Seitz for helpful discussions and suggestions; and Stefan Schaal for helpful discussions and comments on the manuscript.
Correspondence should be addressed to Ladan Shams, Psychology Department, University of California, Los Angeles, 1285 Franz Hall, Box 951563, Los Angeles, CA 90095-1563. ladan{at}psych.ucla.edu