The brain's ability to bind incoming auditory and visual stimuli depends critically on the temporal structure of this information. Specifically, there exists a temporal window of audiovisual integration within which stimuli are highly likely to be bound together and perceived as part of the same environmental event. Several studies have described the temporal bounds of this window, but few have investigated its malleability. Here, the plasticity in the size of this temporal window was investigated using a perceptual learning paradigm in which participants were given feedback during a two-alternative forced choice (2-AFC) audiovisual simultaneity judgment task. Training resulted in a marked (i.e., ∼40%) narrowing in the size of the window. To rule out the possibility that this narrowing was the result of changes in cognitive biases, a second experiment using a two-interval forced choice (2-IFC) paradigm was undertaken during which participants were instructed to identify a simultaneously presented audiovisual pair presented within one of two intervals. The 2-IFC paradigm resulted in a narrowing that was similar in both degree and dynamics to that using the 2-AFC approach. Together, these results illustrate that different methods of multisensory perceptual training can result in substantial alterations in the circuits underlying the perception of audiovisual simultaneity. These findings suggest a high degree of flexibility in multisensory temporal processing and have important implications for interventional strategies that may be used to ameliorate clinical conditions (e.g., autism, dyslexia) in which multisensory temporal function may be impaired.
The proper integration of information from the different sensory modalities is central to our ability to perceive the world in an accurate and meaningful way. One of the most formidable tasks the brain faces in this process comes in determining whether stimuli from different modalities were generated by a single external source or come from different sources. It is not surprising, then, that one of the key cues in this likelihood determination is spatial location (Meredith and Stein, 1986; Wallace et al., 1992; Teder-Sälejärvi et al., 2005), since stimuli that are spatially proximate are likely to be associated with a common event, and stimuli that are spatially disparate are unlikely to be of common origin. Similarly, the temporal structure of a multisensory stimulus pair provides important probabilistic information as to the sources of sensory information. However, given the differing propagation times for environmental energies in each of the sensory systems, the temporal relationship of a stimulus pair derived from the same event must be flexibly specified. Consequently, the concept of a multisensory temporal binding window emerges as a useful construct. Within this window, the combination of information from two modalities results in significant changes in neural, behavioral and perceptual responses (Dixon and Spitz, 1980; McGrath and Summerfield, 1985; Meredith et al., 1987; Lewkowicz, 1996; Colonius and Diederich, 2004). Multisensory temporal processes have been best examined in the audiovisual domain and have capitalized on tools such as simultaneity judgment tasks to define the important time scales for audiovisual binding.
Although developmental studies have highlighted that significant changes take place in multisensory temporal processing as maturation progresses (Lewkowicz, 1996; Wallace et al., 1997; Wallace and Stein, 1997; Lewkowicz and Ghazanfar, 2006; Lewkowicz et al., 2008), few studies have looked at the possible malleability of these processes in the adult. Those that have examined the window's flexibility focused on changes in point measures such as the point of subjective simultaneity (PSS) and have shown that repeated exposure to asynchronous multisensory combinations biases judgments in the direction of the repeated exposure (Fujisaki et al., 2004; Vroomen et al., 2004; Navarra et al., 2005; Vatakis et al., 2007; Hanson et al., 2008; Keetels and Vroomen, 2008). In contrast, no work has examined whether the size of the multisensory temporal window can be enlarged or contracted, a change that would be of strong ethological and perceptual relevance because of the importance of this window in the binding of cross-modal cues and because there is increasing evidence that this window may be enlarged in several prominent neurobiological disorders (de Gelder et al., 2003, 2005; Virsu et al., 2003; Hairston et al., 2005). In the current study, we set out to examine whether we could alter the temporal characteristics of multisensory processing in adults by engaging participants in two perceptual training paradigms in which they were given feedback as to the correctness of their simultaneity judgments.
Materials and Methods
Twenty-two Vanderbilt undergraduate and graduate students (mean age 20.73 years; 11 female) underwent the two-alternative forced choice (2-AFC) training portion of the study. All participants had self-reported normal hearing and vision, and none had any personal or close family history of neurological or psychiatric disorders. All recruitment and experimental procedures were approved by the Vanderbilt University Institutional Review Board (IRB).
2-AFC simultaneity judgment assessment.
In this task (Fig. 1), participants judged whether the occurrence of a visual stimulus and an auditory stimulus were “simultaneous” or “non-simultaneous” by pressing 1 or 2, respectively, on a response box (Psychology Software Tools Response Box Model 200A). Participants were seated in a dark and quiet room 48 cm from a computer monitor. E-Prime 2.0 was used to control all experiments.
A white crosshair fixation marker (1 cm × 1 cm) on a black background appeared 1 s before the stimuli were presented and persisted throughout the duration of each trial. The visual stimulus consisted of a white ring on a black background that subtended 15° of visual space with an outer diameter of 12.4 cm and an inner diameter of 6.0 cm (area = 369.8 cm²). This stimulus was presented for one refresh cycle on a high refresh-rate monitor (NEC MultiSync FE992, 120 Hz) and hence was 8.3 ms in duration.
The auditory stimulus was a 10 ms, 1800 Hz tone burst presented to both ears via headphones (Philips SBC HN110) with no interaural time or level differences. The acoustic stimulus was calibrated with a Larson-Davis sound level meter (Model 814). Auditory stimuli were presented at 110.4 dB SPL unweighted using impulse detection and flat weighting settings.
The stimuli had stimulus onset asynchronies (SOAs) ranging from −300 ms (auditory stimulus leading) to 300 ms (visual stimulus leading) at 50 ms intervals. SOAs were verified externally with an oscilloscope within an error tolerance of 10 ms arising from the inherent timing error of the auditory presentation hardware and drivers. In the simultaneity judgment assessment task, the lags were equally distributed. A total of 325 trials made up the task (25 cycles × 13 trials/cycle).
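As a quick check on the trial count, the assessment schedule described above can be sketched in Python. This is only an illustration: the function name `build_assessment_trials` and the shuffling within each cycle are assumptions, since the text does not state how E-Prime ordered the trials.

```python
import random

def build_assessment_trials(n_cycles=25, step_ms=50, max_soa_ms=300, seed=None):
    """Return a list of SOAs in ms (negative = auditory leading, positive = visual leading)."""
    soas = list(range(-max_soa_ms, max_soa_ms + 1, step_ms))  # 13 SOAs from -300 to 300
    rng = random.Random(seed)
    trials = []
    for _ in range(n_cycles):
        cycle = soas[:]      # every SOA appears once per cycle (equal distribution)
        rng.shuffle(cycle)   # assumed: order randomized within each cycle
        trials.extend(cycle)
    return trials

trials = build_assessment_trials(seed=0)
print(len(trials))  # 25 cycles x 13 SOAs = 325 trials
```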
2-AFC simultaneity judgment training.
The training tasks differed from assessments in that after making a response, the subject was presented with either the phrase “Correct!” paired with a happy face, or “Incorrect” paired with a sad face corresponding to the correctness of their choice. These faces (area = 37.4 cm², happy = yellow, sad = blue) were presented in the center of the screen for 500 ms. The white ring and fixation were the same size as in assessment trials and were presented for the same amount of time. Only SOAs between −150 and 150 ms, broken into 50 ms intervals, were used for the training phase. In addition, in the training phase the SOAs were not equally distributed: the veridical simultaneous condition had a 6:1 ratio to any of the other 6 nonsimultaneous conditions. In this way there was an equal likelihood of simultaneous/nonsimultaneous conditions, minimizing concerns about response bias. There were 120 trials in the training phase (20 cycles × 6 trials/cycle). See Figure 1, a and b, for illustrations of the temporal structure of stimulus presentation.
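The claim that a 6:1 weighting equalizes simultaneous and nonsimultaneous trials can be verified with a short calculation. The sketch below is illustrative only; the names `TRAINING_SOAS`, `WEIGHTS`, and `probability_simultaneous` are hypothetical and do not come from the study's software.

```python
from fractions import Fraction

# Training SOAs in ms: -150 to 150 in 50 ms steps (7 conditions).
TRAINING_SOAS = list(range(-150, 151, 50))

# Relative presentation weights: the simultaneous condition (0 ms) is weighted
# 6:1 against each of the six nonsimultaneous conditions.
WEIGHTS = {soa: (6 if soa == 0 else 1) for soa in TRAINING_SOAS}

def probability_simultaneous(weights):
    """Probability that a randomly drawn training trial is the 0 ms condition."""
    total = sum(weights.values())
    return Fraction(weights[0], total)

p_sim = probability_simultaneous(WEIGHTS)
print(p_sim)  # 6 / (6 + 6 * 1) = 1/2
```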
2-AFC training protocol.
Training consisted of 5 h (1 h per day) during which participants took part first in a pretraining simultaneity judgment assessment, then in 3 shorter simultaneity judgment training blocks, followed by a post-training simultaneity judgment assessment. An additional baseline assessment was performed at the outset of the study for each subject, followed by the typical training day; this served to detect any practice effects that may have resulted from completion of the assessment itself.
After 1 week without training, a subset of training subjects (n = 16) returned to the lab and underwent one simultaneity judgment assessment without any training.
Fourteen Vanderbilt undergraduate and graduate students (mean age 19.50 years; 4 female) underwent the 2-AFC exposure portion of the study. All participants had self-reported normal hearing and vision, and none had any personal or close family history of neurological or psychiatric disorders.
The exposure portion of the study differed from the 2-AFC training protocol only in that in lieu of the training blocks, participants underwent 2-AFC exposure blocks of the same length. Thus, all participants in both Experiment 1 and Experiment 2 took part in the same number of 2-AFC simultaneity judgment assessments. The details of the exposure blocks are outlined below.
To maintain attention, the 2-AFC exposure blocks consisted of an oddball task wherein participants were exposed to the same ring-tone pairs present in the simultaneity judgment training sessions but were instructed to press a button when they saw a red ring. As in the simultaneity judgment training sessions, SOAs were not equally distributed: the veridical simultaneous condition had a 6:1 ratio to any of the other 6 nonsimultaneous conditions. Red rings occurred with the same probability across all conditions, and were 1/10 as likely to appear as white rings. The rings and fixation were the same size as in the assessment trial and were presented for the same amount of time; the tone was identical to that presented during assessment and training sessions. Only SOAs between −150 and 150 ms, in 50 ms steps, were used for this task.
Twenty Vanderbilt undergraduate and graduate students (mean age 20.20 years; 13 female) underwent the two-interval forced choice (2-IFC) training portion of the study. All participants had self-reported normal hearing and vision, and none had any personal or close family history of neurological or psychiatric disorders.
2-IFC simultaneity judgment assessment.
The 2-IFC simultaneity judgment assessment used exactly the same stimuli as those used in the 2-AFC task. In this task, however, participants were presented with two visual-auditory pairs, one with an SOA of zero (simultaneously presented) and one with a non-zero SOA (nonsimultaneously presented). Presentations were separated by 1 s, during which a fixation cross alone was presented. Instructions asked participants to indicate by button-press which interval (first or second presentation) contained the flash and beep that happened at the same time. Participants were instructed to respond as quickly as possible. Simultaneous pairings were as likely to be presented in the first interval as in the second. A simultaneous-simultaneous condition was present in equal representation to other SOAs as a catch trial.
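The interval structure described above can be sketched as a small trial builder. This is a minimal illustration under stated assumptions: the function `build_2ifc_trials` and the parameter `reps_per_soa` are hypothetical (the text does not give the number of repetitions per SOA), and the strict alternation used to balance the two intervals is one of several ways to make each interval equally likely to hold the simultaneous pair.

```python
import random

def build_2ifc_trials(soas_ms, reps_per_soa, seed=None):
    """Build 2-IFC trials as dicts holding the SOA shown in each interval.

    reps_per_soa is a hypothetical parameter (assumed even) so that the
    simultaneous pair lands in each interval equally often.
    """
    rng = random.Random(seed)
    trials = []
    for soa in soas_ms:
        for rep in range(reps_per_soa):
            if soa == 0:
                # Catch trial: both intervals are simultaneous, so there is
                # no single correct interval.
                trials.append({"interval_1": 0, "interval_2": 0, "target": None})
            elif rep % 2 == 0:
                # Simultaneous pair presented in the first interval.
                trials.append({"interval_1": 0, "interval_2": soa, "target": 1})
            else:
                # Simultaneous pair presented in the second interval.
                trials.append({"interval_1": soa, "interval_2": 0, "target": 2})
    rng.shuffle(trials)
    return trials

soas = list(range(-300, 301, 50))
trials = build_2ifc_trials(soas, reps_per_soa=4, seed=1)
```

With 13 SOAs and 4 repetitions each, this yields 52 trials: 24 with the simultaneous pair first, 24 with it second, and 4 catch trials.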
2-IFC simultaneity judgment training.
The training phase of the 2-IFC portion of the study was identical to that of the assessment phase with two exceptions: (1) participants were given feedback as to the accuracy of their responses after each trial, in the same manner described in the 2-AFC training; (2) in a manner similar to the 2-AFC simultaneity judgment training protocol, the range of SOAs presented during training was restricted (−150 to 150 ms in 50 ms increments) compared with assessment (−300 to 300 ms). However, unlike the 2-AFC version of this training, and by virtue of the 2-IFC structure, the ratio of simultaneous to nonsimultaneous presentation was always 1:1.
2-IFC training protocol.
Participants underwent training in five 1 h blocks (1 h per day) on the two-interval forced choice version of the simultaneity judgment task. Similar to the 2-AFC training protocol, each day's 2-IFC training began with a simultaneity judgment assessment followed by three shorter blocks of training, and ended with a post-training simultaneity judgment assessment.
A subset of training subjects (n = 9) returned to the lab 1 week after cessation of training and underwent one simultaneity judgment assessment without any training.
All data were imported from E-Prime 2.0 .txt files into MatLab R2008b (MathWorks) via a custom script written for this purpose. Individual subject raw data were used to calculate the mean probability of simultaneity judgment (2-AFC) and accuracy (2-IFC) at each SOA for all assessments. These means were then analyzed in multiple ways as summarized in the following sections.
Grand mean SOA analysis.
To determine how overall group probability of simultaneity judgment (2-AFC) or accuracy (2-IFC) changed after training or exposure, individual means at each SOA were averaged to produce the grand average plots shown in Figures 2b, 3b, 4a, 5b, and 6a. Statistical analysis consisted of a two-factor (group, SOA) repeated-measures ANOVA followed, if significant, by post hoc t tests (with Holm correction for multiple comparisons) to determine which SOAs showed a statistically significant difference from pretraining to post-training assessment.
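For readers unfamiliar with the Holm correction mentioned above, the following is a minimal sketch of the standard step-down procedure. The function name `holm_adjust` and the example p-values are hypothetical and are not taken from the study.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical uncorrected p-values for three SOA comparisons (not study data).
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

In this made-up example, only the first comparison survives correction at the 0.05 level, even though all three uncorrected values fall below 0.05.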
Window size estimation.
Individual mean data were fit with two sigmoid curves generated using the MatLab glmfit function, splitting the data into left (auditory presented first) and right (visual presented first) sides and fitting them separately. A criterion at which to measure each individual temporal window size was then established. For the 2-AFC tasks, this criterion was equal to 75% of the maximum data point at baseline assessment. For the 2-IFC task, this criterion was set at half the distance between individuals' lowest accuracy point at baseline assessment and 1 (∼75% accuracy). These criterion lines were then used to assess the width of the distributions produced by each individual's assessment data throughout the duration of the training period. Distribution width was then assessed for both the left side (from zero to the left-most point at which the sigmoid curve crossed the criterion line) and the right side (from zero to right intersection point) and then combined to get an estimation of total distribution width. This was then used as a proxy for the size of each individual's window at each assessment. An example of the result of this process may be seen in Figure 2a. It should be noted that, when mean data from any individual assessment were unable to be fit with a sigmoid curve, all data from this individual were discarded for analysis of window size progression. Group-level analysis of differences in window size across time was conducted by performing a repeated-measures ANOVA (within-subject factor, assessment number) followed by post hoc t tests (corrected via the Holm method for multiple comparisons) to determine which differences between assessment measures were responsible for the variance observed.
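The window size procedure described above can be sketched in Python. This is not the authors' script: it replaces the maximum-likelihood fit performed by MatLab's glmfit with a simpler least-squares line on the logit scale, and all function names are hypothetical. It also omits the handling of failed fits (for example, a flat side with zero slope), which in the study led to the participant's data being discarded.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p, eps=1e-3):
    """Log-odds of p, clipped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def fit_sigmoid(soas, probs):
    """Fit p = sigmoid(b0 + b1 * soa) by least squares on the logit scale
    (a simplification, not the binomial GLM fit that glmfit performs)."""
    n = len(soas)
    ys = [logit(p) for p in probs]
    mean_x = sum(soas) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in soas)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(soas, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def criterion_2afc(baseline_probs):
    """75% of the maximum data point at baseline."""
    return 0.75 * max(baseline_probs)

def criterion_2ifc(baseline_accuracy):
    """Halfway between the lowest baseline accuracy and 1."""
    return (min(baseline_accuracy) + 1.0) / 2.0

def window_size(soas, probs, criterion):
    """Return (left, right, total) widths in ms, each measured from 0 ms to
    the point where that side's fitted sigmoid crosses the criterion."""
    widths = []
    for sign in (-1, 1):  # -1: auditory-leading side, +1: visual-leading side
        side = [(x, p) for x, p in zip(soas, probs) if sign * x >= 0]
        xs, ps = zip(*side)
        b0, b1 = fit_sigmoid(xs, ps)
        cross = (logit(criterion) - b0) / b1   # SOA where the sigmoid equals the criterion
        widths.append(max(0.0, sign * cross))  # a width cannot be negative
    return widths[0], widths[1], widths[0] + widths[1]

# Synthetic, noise-free data (illustrative values only, not from the study).
soas = list(range(-300, 301, 50))
probs = [sigmoid(3 + 0.03 * x) if x < 0 else sigmoid(3 - 0.015 * x) for x in soas]
left, right, total = window_size(soas, probs, criterion_2afc(probs))
```

Because the synthetic right side falls off half as steeply as the left, the right width comes out twice the left width, mimicking the visual-leading asymmetry reported in the Results.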
Judgments of audiovisual simultaneity can be used to define a multisensory temporal binding window
The data produced by the 2-AFC training protocol from one participant are shown in Figure 2a. Here, the mean probability of simultaneity judgment is plotted as a function of SOA and then fitted with two sigmoid curves to model the left and right sides of the plot. The resulting distribution was used to create a single metric to serve as an index of the multisensory temporal binding window. The value of this window was set as the breadth of the distribution (in ms) at which individual participants reported simultaneity at or above 75% of their maximum at baseline (full width at 75% height). This level was chosen because it represents half the distance between the 50% level and each individual's highest likelihood of reporting simultaneity. Note that for this individual's initial assessment the mean span for the multisensory temporal window at this criterion was 321 ms (blue points and curve).
Perceptual training on a 2-AFC task results in a significant narrowing of the multisensory temporal binding window
Immediately after the training period there was a dramatic shift in judgments of simultaneity (Fig. 2). In the individual shown in Figure 2a, this was manifested as a decrease in window size from 321 ms at baseline assessment to 115 ms at the post-training assessment on day 5. Comparisons of group pretraining and post-training simultaneity judgments at each SOA also reveal a strong effect (Fig. 2b). The largest training-related effects were seen on the right side of the distributions, corresponding to those conditions in which the visual stimulus preceded the auditory stimulus: for all stimulus conditions, a repeated measures ANOVA with within-subject factors SOA and pre-/post-training status resulted in a significant interaction (F(12,238) = 10.11, p = 0.005). Post hoc paired-samples t tests revealed significant decreases in mean probability judgment at the 100 ms (from 0.826 to 0.633, p = 0.025), 150 ms (from 0.709 to 0.507, p = 0.016), and 200 ms (from 0.622 to 0.431, p = 0.020) SOA conditions after correction for multiple comparisons. Hence, the training effect appears to be driven largely by significant decreases in the probability of simultaneity judgment following training at the objectively nonsimultaneous conditions.
To examine the time course of the training-induced changes, we plotted the simultaneity distributions at each of the 11 assessments completed throughout the course of training. Quite surprisingly, the effect is evident after a single day of training and is also equivalent in magnitude to that seen after 5 d of training (Fig. 2c). Statistical analysis by repeated measures ANOVA revealed a significant main effect of assessment number (n = 19, F(10,179) = 3.459, p < 0.001) and post hoc paired-samples t tests with correction for multiple comparisons indicated a significant reduction in total window size from baseline assessment (mean of 294.59 ms) to post-training assessment day 1 (mean of 215.02 ms, p = 0.045, corrected). Window size did not differ significantly from post-training day 1 assessment onward (by repeated-measures ANOVA, F(8,143) = 1.566, p = 0.140), although means decreased from 215.02 ms to 194.87 ms. Interestingly, changes in window size seem to be wholly attributable to decreases in the right side of the temporal window. Thus, whereas the left side of the distribution did not change significantly over the course of training (F(10,179) = 1.637, p = 0.099), the right side showed strong training-related changes (from 159.57 ms at baseline to 109.56 ms at post-training day 5 assessment, F(10,179) = 4.360, p = 1.77 × 10⁻⁵).
Changes in the multisensory temporal binding window are not seen following passive exposure to the identical stimuli
As with the training group, the data generated by the 14 exposure control participants during each assessment were fitted with two sigmoid curves and window sizes were derived. Figure 3a shows data from a typical participant. Note that, in striking contrast to Figure 2a, the size of this individual's window appears to have increased after exposure (from 383 ms at baseline to 443 ms at postexposure day 5 assessment). This change parallels similar effects on the group level (Fig. 3b): comparison by repeated-measures ANOVA revealed a significant interaction between SOA and pre-/postexposure status (F(12,142) = 7.793, p = 0.015) and while uncorrected post hoc t tests indicate a significant increase in the probability of simultaneity judgment after exposure on the −50 ms, 0 ms, 200 ms, 250 ms, and 300 ms conditions (all p < 0.05), only the 300 ms condition shows a change after correction for multiple comparisons (from a mean of 0.409 at baseline to 0.663 at post-training day 5, p = 0.002). These results are upheld in analysis of group window size progression (Fig. 3c): one-way repeated-measures ANOVA indicated a main effect of assessment number (n = 11, F(10,99) = 2.212, p = 0.025), with a significant increase in mean window size (from 301.21 ms at baseline to 403.33 ms at postexposure day 5) first appearing at postexposure assessment on day 4 (p = 0.044, corrected). However, this increase did not remain significant on the pre-exposure (p = 0.660, corrected) and postexposure (p = 0.069, corrected) day 5 assessments. This difference in total window size appeared to be driven by an increase on the left side of the distribution (F(10,99) = 2.518, p = 0.011). In contrast, analysis of the right side of the distribution indicated no effect (F(10,99) = 0.771, p = 0.656).
Training-induced changes in the multisensory temporal binding window are stable for at least 1 week
Follow-up assessments were conducted on a subset of the participants in the training group (n = 16) 1 week after the completion of training. Participants underwent no additional assessments or training during this week. Analysis of group-level probability of simultaneity judgment at each SOA tested (Fig. 4a) revealed a significant interaction between SOA and pre-/post-training status (repeated measures ANOVA, F(12,166) = 6.394, p = 0.023) and post hoc t tests corrected for multiple comparisons revealed a number of significant decreases on the right side of the distribution (100 ms, from 0.833 to 0.565, p = 0.016; 150 ms, from 0.695 to 0.435, p = 0.009; 200 ms, from 0.603 to 0.355, p = 0.010; 250 ms, from 0.463 to 0.238, p = 0.004; 300 ms, from 0.333 to 0.140, p = 0.004). Analysis of window size change corroborates these results (Fig. 4b), indicating an overall effect of assessment number in follow-up participants (n = 14; repeated-measures ANOVA, F(10,129) = 3.873, p = 6.62 × 10⁻⁵). Further analysis demonstrated 1 week follow-up window size (mean of 184.69 ms) to be significantly smaller than that at baseline assessment (mean of 255.86 ms, p = 0.004) as well as at post-training day 1 assessment (mean of 235.24 ms, p = 0.039), but not significantly different from post-training assessment on day 5 (mean of 197.15 ms, p = 0.608). Thus, while training-induced narrowing remained unchanged 1 week after training cessation, there was evidence of continued narrowing after the initial post-training day 1 drop.
Perceptual training on a 2-IFC simultaneity judgment task results in a significant narrowing in the size of the multisensory temporal binding window
While the 2-AFC results indicate a substantial, rapid, and long-lasting alteration in the size of the multisensory temporal binding window after perceptual training, it is possible that the effects seen may be driven, at least in part, by changes in cognitive biases (i.e., criterion shifts) associated with the two-alternative design rather than by changes in sensory perceptual processes. To address this possibility, a cohort of 20 participants was recruited to take part in a two-interval forced choice (2-IFC) task wherein they were instructed to determine which of two sequential presentations of audiovisual pairs were simultaneous. This experimental structure does not require the setting of a cognitive criterion for simultaneity and thus is more likely to reveal true improvements in discrimination abilities following perceptual training on the same task. It also carries with it the additional benefit of having a constant 1:1 ratio of simultaneous-to-nonsimultaneous presentations, eliminating the need to alter this ratio for the training portion of the study.
As was done for the 2-AFC task results, individual data for each of the 20 subjects' assessments were fit with two sigmoid curves. Similar to the procedure used to determine window size in the 2-AFC task, the value of the temporal binding window was set as the breadth of the distribution (in ms) at which individual participants performed at a criterion defined as halfway between their lowest accuracy point at baseline and 1 (the mean criterion level was 72.3%). Figure 5a illustrates the results of this process in one individual. Note that this individual's window size narrows from 349 to 182 ms following training. Group grand mean SOA analysis (Fig. 5b) revealed no overall effect of training at individual SOAs (repeated measures ANOVA, F(11,196) = 0.792, p = 0.385), likely the result of high intersubject variability and the presence of individuals who fail to show an effect after training (see Figure 8). However, analysis of window size as a function of training day (Fig. 5c) revealed a highly significant main effect of assessment number (n = 17; F(10,159) = 4.503, p = 1.31 × 10⁻⁵), with the first significant drop occurring between the post-training assessment on day 1 (mean of 275.79 ms) and the pretraining assessment on day 2 (mean of 173.67 ms, p = 0.002, corrected). Window size measures from this time period forward did not differ significantly (F(7,111) = 2.067, p = 0.052), and all remained significantly lower than baseline (mean of 305.82 ms). In a striking similarity to the data derived from the 2-AFC portion of the study, the changes yielded by the 2-IFC training task seemed to be driven almost completely by shifts in the right side of the multisensory temporal distributions: although repeated-measures ANOVA indicated significant variation in left window size among the different assessments (F(10,159) = 3.442, p < 0.001), no left window size measurement proved to be significantly different from baseline after correction for multiple comparisons.
In contrast, window sizes on the right side differed significantly across assessments (F(10,159) = 3.450, p < 0.001) and, like the pattern in total window size change, first differed significantly from baseline (mean of 216.97 ms) at pretraining assessment on day 2 (mean of 107.68 ms, p = 0.002, corrected).
Changes induced by perceptual training on a 2-IFC simultaneity judgment task are stable for at least 1 week
One week after completion of their training on the 2-IFC task, 9 participants returned to complete a final assessment, the results of which are depicted in Figure 6. Analysis of grand mean accuracy as a function of SOA (Fig. 6a) revealed a significant interaction between SOA and pre-/post-training status (F(11,75) = 13.131, p = 0.007), and post hoc t tests showed statistically significant increases in mean accuracy at 150 ms (from 0.681 to 0.881, p = 0.010, corrected), 200 ms (from 0.794 to 0.900, p = 0.047, corrected), and 300 ms (from 0.794 to 0.950, p = 0.018, corrected) SOA conditions. Once again, these effects were also evident on individual window size analysis (Fig. 6b), which indicated that there was significant variation among window sizes across all assessments (repeated-measures ANOVA, n = 7, F(10,59) = 2.29, p = 0.019), and that window size on 1 week follow-up assessment (144.22 ms) was still significantly smaller than at baseline (357.53 ms, p = 0.001) but was not significantly different from post-training day 5 assessment (178.17 ms, p = 0.582).
The window narrowing produced by 2-AFC and 2-IFC training tasks is highly similar in both degree of narrowing and temporal dynamics
Examination of window size change over the course of training for both the 2-AFC and 2-IFC tasks allows for comparison of the dynamics of changes brought about by training under the two paradigms. Figure 7 highlights several similarities and differences between the two groups. Although training using the 2-AFC task results in a window size that is significantly narrower than baseline earlier (post-training day 1) than training under the 2-IFC task (pretraining day 2), and although the mean window size for the 2-IFC group is often lower than that of the 2-AFC group after baseline assessment, an ANOVA with between-subjects factor Group and within-subject factor Assessment Number indicated no main effect of group (F(1,19) = 2.673, p = 0.103) and no interaction between group and assessment number (F(10,188) = 0.993, p = 0.449). On the level of each individual assessment, 2-IFC window size (173.67 ms) was only significantly smaller than 2-AFC window size (234.50 ms) at pretraining day 2 assessment (p = 0.024), and this difference does not withstand correction for multiple comparisons. Overall, the degree and time course of narrowing are remarkably similar between the two paradigms.
Large initial window size predicts success during training
It was noted during analysis of the 2-IFC data that there appeared to be individuals whose mean window sizes decreased with training (dynamic participants) and those whose mean window sizes either remained the same or increased in size (static participants). 2-AFC dynamic participants' (n = 13) and static participants' (n = 6) window size progressions are plotted in Figure 8a. An ANOVA with between-subjects factor Group and within-subject factor Assessment Number revealed a significant interaction (F(10,58) = 14.358, p < 0.001), and the difference between groups at baseline assessment trended toward significance (dynamic participants, 344.01 ms, static participants, 187.40 ms, p = 0.086). Further analysis indicated that window size significantly decreased across the week's assessments in dynamic participants (repeated-measures ANOVA, F(10,119) = 4.125, p = 6.63 × 10⁻⁵) but that no such change occurred in static participants' window size (F(10,49) = 0.737, p = 0.687). Figure 8b highlights the differences in progression seen between 2-IFC dynamic participants (n = 11) and static participants (n = 6) over the course of the training week. Analysis of these differences by two-way ANOVA revealed a main effect of group (F(1,6) = 12.455, p < 0.001) and a significant interaction between group and assessment number (F(10,58) = 2.318, p = 0.014). Most importantly, the two groups differed significantly at baseline assessment (dynamic participants, 391.12 ms, static participants, 149.45 ms, p = 0.032), and while a second ANOVA excluding the baseline assessment continued to show a main effect of group (F(1,6) = 17.489, p = 4.82 × 10⁻⁵), the groups did not differ significantly at any other individual assessment number.
As was the case with 2-AFC dynamic participants, 2-IFC dynamic participants' window sizes decreased significantly over the course of the training week (F(10,99) = 5.656, p = 1.15 × 10⁻⁶), but there were also significant variations in window size in static participants over this period (F(10,49) = 3.604, p = 0.001), driven by the sharp increases on post-training day 4 and post-training day 5 assessments. These increases form part of an overall pattern characterized by increased window sizes during post-training assessment compared with pretraining, leading to the appearance of a sawtooth pattern in the window sizes of static participants over the training week. Interestingly, this pattern does not appear in dynamic participants but is prominent in the 2-IFC group data (Fig. 5c), indicating that the latter effect may be wholly driven by the increases exhibited by the 2-IFC static participants.
Examination of individual subjects' initial window sizes and the window size changes exhibited by these individuals after training yielded significant correlations in both the 2-AFC (R² = 0.695, p = 1.51 × 10⁻⁶) and 2-IFC (R² = 0.504; p = 4.61 × 10⁻⁴) data sets. Even more striking, the lines of best fit for these data sets have very similar slopes (0.93, 2-AFC; 0.94, 2-IFC) and x-intercepts converging near 200 ms, the approximate size of static participants' initial windows. Together, these analyses indicate that it is possible to predict the direction and magnitude of window size change based on initial window size.
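To make the role of the slope and x-intercept concrete, the sketch below fits an ordinary least-squares line and reads off the initial window size at which no change is predicted. The data are made up to lie exactly on a line and are not the study's measurements; the function names are hypothetical, and "change" is assumed here to mean baseline minus final window size, so that positive values indicate narrowing.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

def predicted_change(initial_window_ms, slope, intercept):
    """Predicted narrowing (ms) for a given initial window size."""
    return slope * initial_window_ms + intercept

# Made-up initial window sizes (ms) and reductions (ms), chosen to lie exactly
# on a line with slope 0.9 and x-intercept 200 ms; these are not study data.
initial = [150.0, 200.0, 300.0, 400.0]
change = [0.9 * (w - 200.0) for w in initial]

slope, intercept, r2 = linear_fit(initial, change)
x_intercept = -intercept / slope  # initial size at which no change is predicted
```

Under this reading, a participant whose initial window lies above the x-intercept is predicted to narrow, and one below it is predicted to stay the same or widen, which matches the dynamic/static distinction drawn in the text.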
We have demonstrated that two multisensory perceptual training paradigms are capable of effecting significant, lasting changes in participants' judgments of the perceived simultaneity between visual and auditory events. Moreover, we have provided strong evidence that these effects are driven by a true change in perceptual discrimination abilities engendered by training, and are not a result of simple exposure to the repeated statistical regularities of the training stimuli.
Examination of the multisensory temporal window distributions before training on both tasks revealed a strong asymmetry, with a shoulder of increased probability of simultaneity judgment on the right half of the distribution (i.e., when the onset of the visual stimulus precedes the auditory stimulus). This asymmetry is consistent with other measures of the multisensory temporal window (McGurk and MacDonald, 1976; Dixon and Spitz, 1980; McGrath and Summerfield, 1985), and may be explained by the fact that the visual-leading conditions (unlike auditory-leading conditions) are ethologically valid and must be flexibly specified based on the distance of the stimulus from the observer. This asymmetry is eliminated with training, most likely reflecting the symmetrical structure of the training tasks.
One of the most surprising effects of the perceptual training was its time course. In both the 2-AFC and 2-IFC tasks, significant effects emerged after a single day of training. Indeed, there is growing evidence that short-term exposure to asynchronous audiovisual pairs can drive temporal recalibration (Fujisaki et al., 2004; Vroomen et al., 2004; Navarra et al., 2005; Keetels and Vroomen, 2007, 2008; Navarra et al., 2007; Vatakis et al., 2007; Hanson et al., 2008). These short-term effects have been shown to be transient, and as a consequence of these previous studies our expectation was that the effects of multisensory perceptual learning would not be retained long after the cessation of training. In contrast, the training effects in the current study showed a stability that extended at least a week after the cessation of training. Indeed, performance further improved during this week. In the 2-AFC task, this improvement on follow-up is seen not only as a decline in reports of simultaneity for nonsimultaneous conditions, but also as an increase in the probability that participants will judge the veridical simultaneous condition as simultaneous. Possible explanations for this improvement include recovery from fatigue associated with five consecutive days of training, a possibility that needs further investigation (see below). Another intriguing explanation is that long-term memory consolidation may play a role in strengthening the original training effects (for review, see McGaugh, 2000). Indeed, several studies have demonstrated that perceptual learning effects may depend greatly on sleep-mediated consolidation (Karni et al., 1994; Maquet, 2001; Fenn et al., 2003; Walker and Stickgold, 2004).
The results of the passive exposure experiment deserve particular note, given the surprising finding of an increase in window size over the course of the week of exposure to training stimuli. Because the switch from assessment to training/exposure in the 2-AFC task required an alteration in the ratio of simultaneous to nonsimultaneous stimulus presentations from 1:1 to 6:1, the widening of the temporal binding window observed in exposure subjects may well represent an implicit learning phenomenon, wherein participants “learn” during exposure that an increased number of simultaneous presentations is occurring and subsequently bias their responses. This hypothesis is supported by similar data shown in supplemental Figure 1 (available at www.jneurosci.org as supplemental material), derived from a small number of participants (n = 5) who underwent the 2-AFC training paradigm without any explicit feedback; thus, these participants did not perform the exposure (oddball) task, but performed the original training task without the presence of feedback. These data show a similar (but not statistically significant) increase in window size over the course of the week. In addition, these results indicate that the training effects observed were not the result of a simple narrowing of the stimulus space (from 300 ms on each side of the distribution during assessment to 150 ms on each side during training) but were the true result of feedback training.
The importance of feedback for the observed narrowing of the multisensory temporal binding window fits well with what is known about the critical elements for engaging sensory plasticity. Seminal studies showed that significant reorganization could be driven in a bottom-up manner by exposure to a constrained set of sensory stimuli early in development (Hubel et al., 1977; Simons and Land, 1987; Zhang et al., 2001; de Villers-Sidani et al., 2007; Han et al., 2007), and that passive exposure to these same stimuli became less likely to drive behavioral change and neural reorganization as an animal reached the end of a critical period of development (Hubel and Wiesel, 1963). Later studies revealed that these anatomical, behavioral and physiological changes induced in developing animals by passive exposure could indeed take place in adults via top-down perceptual learning, wherein stimuli are paired with either positive or negative reinforcement (Salazar et al., 2004; Blake et al., 2006; Polley et al., 2006). Thus, it appears that the pairing of an alteration in the sensory statistics with an instructive signal (i.e., feedback) is crucial for adult sensory reorganization, and this principle is supported by the data reported here.
Further work is needed to better characterize the effects of fatigue and inattention on the size of the multisensory temporal binding window. As highlighted in the data, the increases in window size on each post-training assessment in the 2-IFC task are driven exclusively by static participants whose temporal windows are small before any training. This fact points strongly to the idea that these increases may be the result of fatigue or inattention and is further supported by an analysis of response bias in these individuals (supplemental Fig. 2, available at www.jneurosci.org as supplemental material). These data were derived from responses in 2-IFC catch trials during which participants were presented with two simultaneous (instead of one simultaneous and one nonsimultaneous) audiovisual events. Results show that, while all individuals appear to share a bias toward indicating that the simultaneous pair was present in the second interval during this condition (p = 1.421 × 10⁻⁴), static participants have a much more pronounced bias than dynamic participants during the pretraining (p = 0.035) and post-training (p = 0.037) assessments on day 1, during which participants spent the most time in the lab and showed large increases in window size (Fig. 8).
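Because both intervals in a catch trial contain a simultaneous pair, an unbiased observer should choose the second interval on about half of these trials. The text does not state which test produced the reported p values, so the following is only a sketch of one possible way to quantify such a bias, assuming an exact two-sided binomial test against a chance rate of 0.5. The function names and the trial counts are hypothetical.

```python
from math import comb


def second_interval_bias(second_choices, n_trials):
    """Proportion of catch trials answered 'second interval', minus 0.5.

    Zero means no bias; positive values mean a second-interval bias.
    """
    return second_choices / n_trials - 0.5


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under the null success rate p.
    """
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so ties are not lost to rounding error.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(q for q in pmf if q <= threshold))


if __name__ == "__main__":
    # Hypothetical counts for one participant: 20 catch trials,
    # 15 of them answered 'second interval'.
    n_trials = 20
    second_choices = 15
    print("bias:", second_interval_bias(second_choices, n_trials))
    print("p:", binomial_two_sided_p(second_choices, n_trials))
```

A group comparison such as the one reported between static and dynamic participants would require an additional between-group test on these per-participant bias scores, which is not sketched here.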
Importantly, and despite the above considerations, several pieces of evidence indicate that the phenomena observed here reflect changes in sensory perceptual rather than cognitive systems. First is the fact that, by and large, the size of the multisensory temporal binding window at baseline assessment and the narrowing brought about by training are remarkably similar despite alterations in task structure, pointing to a construct that is driven largely by changes in (multi)sensory representations. It should also be noted that the mean window sizes and shapes observed at baseline assessment (∼275–300 ms, skewed toward the positive side of the distribution) are very much in accord with those reported in previous work (Dixon and Spitz, 1980; Bushara et al., 2001; Zampini et al., 2003, 2005a,b; Colonius and Diederich, 2004; Fujisaki et al., 2004), making the training effects reported here even more striking in that they alter both of these characteristics of the temporal binding window. Second, and most notably, trends across the span of the training week indicate that dynamic participants tend to exhibit larger temporal windows at baseline assessment than do static participants and that dynamic participants' windows narrow until they are at a size that is comparable to those of static participants. Together, these results indicate that there may be a lower limit to the size of the multisensory binding window in typically developing adults. However, further studies must be done to rule out the possibility that this seeming lower limit is an artifact of the specifics of the tasks' training and reward structure.
A search for the neural bases of the multisensory temporal binding window described here has become an increasingly active area of inquiry of late, and can trace its origin to earlier studies that highlighted the importance of temporal factors in modulating multisensory integration at the level of the single cell (Meredith et al., 1987b; Wallace et al., 1996; Stanford et al., 2005). This work has now been extended to the network level, where a number of recent studies point to the presence of a large, dynamic network of areas that includes the insula, posterior parietal, and superior temporal cortices as being critical in the perception of audiovisual simultaneity (Bushara et al., 2001, 2003; Calvert et al., 2001; Noesselt et al., 2007, 2008). Most recently, interest has been focused on the potential role of neuronal oscillations in multisensory processing and temporal binding (Lakatos et al., 2007; Chandrasekaran and Ghazanfar, 2009). Together, this work points to cortex as the critical locus for perceptual plasticity (Schwartz et al., 2002; Pleger et al., 2003; Maertens and Pollmann, 2005).
Returning to the single cell, one readily envisioned mechanism to subserve the plastic changes in evidence here is a narrowing in the temporal tuning profile of multisensory neurons responsible for binding processes. In virtually all studies examining these tuning functions in individual multisensory neurons, the temporal windows within which significant multisensory interactions can be generated have been shown to be surprisingly wide [i.e., several hundred milliseconds (Meredith et al., 1987; Wallace et al., 1996; Stanford et al., 2005)]. Although these tuning functions have been shown to be relatively static in adults even in the face of significant changes in sensory statistics (Polley et al., 2008), the coupling with a reinforcement-based signal may be sufficient to engender significant change. Indeed, physiological studies of adult plasticity within sensory systems have focused on basal forebrain cholinergic signals as the instructive cue (Huerta and Lisman, 1995; Hohmann and Berger-Sweeney, 1998; Kilgard, 2003). Moving beyond the single cell, another plausible mechanism is a consolidation in the timing circuits that serve to perceptually anchor stimulus events from different modalities, with this consolidation serving to narrow the tolerance for the encoding of unity judgments. Whether the critical consolidation takes place in one of the nodes in the cortical network or is distributed across them awaits future study. A final potential mechanism could feature changes in oscillatory patterns within or across cortical domains that are integral in temporal binding. In addition to the role of cortex, increasing evidence indicates that subcortical structures are far more capable of plastic change than previously thought (Illing, 2001; de Boer and Thornton, 2008; Song et al., 2008; Tzounopoulos and Kraus, 2009).
Hence, future neuroimaging studies will focus on both cortical and subcortical structures to elucidate the neural bases of the temporal plasticity evident in multisensory systems.
Overall, the results reported here indicate that training on a simultaneity judgment task is capable of eliciting meaningful, lasting changes in the size of individuals' multisensory temporal binding windows. This ability holds particular promise in designing remediation strategies for disorders (e.g., dyslexia, autism, schizophrenia) in which altered multisensory temporal processing is a contributory factor.
This research was supported by the Vanderbilt Kennedy Center for Research on Human Development, the National Institute for Child Health and Development (Grant HD050860), and the National Institute for Deafness and Communication Disorders (Grant F30 DC009759). We thank Drs. Calum Avison, Randolph Blake, Maureen Gannon, and Daniel Polley, as well as Leslie Dowell, Matthew Fister, Matthew Hevey, Haleh Kadivar, and Juliane Krueger for their technical, conceptual, and editorial assistance.
Correspondence should be addressed to Albert R. Powers III, Medical Scientist Training Program, Vanderbilt University School of Medicine, 7110 MRB III, 465 21st Avenue South, Nashville, TN 37232.